"Kevin Becker" <starbugs@gmx.net> wrote in message news:bbc55e92.0311070813.507ea083@posting.google.com... > Francisco Rodriguez wrote: > > As your numbers are 32-bit wide, and overflow condition is a flag > > about the result of the _whole_ v := v + i operation, it can be determined > > by the MSB bits only (that is, from columns 31 and 30 of the addition). > > Thank you! I think I used the wrong word "underflow". What I mean is: > if "i" is negative, it might be that abs(lo(i)) is greater than lo(v). Yes, underflow is the right word. It isn't used very often, though. > What happens then? An example: > > v = 1234 0100 hex > i = 0000 0101 hex > > The operation for the lo bits will result in FFFF and the carry will > be set. Then when I add the high bits, it will be 1235 when it should > actually be 1233. How do I save that the first operation was not an > overflow but a borrow? (I don't know how to call this. That's what I > mistakenly called underflow). Hmm. X'0100' + X'0101' is X'0201' with no overflow. If you want to add a small twos complement negative number, then that number will have F's in the high bits which will take care of the 1233 part. In that case, borrow means that there is no carry, otherwise there is a carry. Say v=X'12340100' and i is negative 257. Negative 257 is X'FFFFFEFF' X'0100' + X'FEFF' is X'FFFF' with no carry. X'1234'+X'FFFF' is X'1233' with carry. The carry into the high bit is also 1, so that there is no overflow or underflow. > Glen Herrmannsfeld wrote: > > After the high halfword add, you compare the carry out to the carry out of > > the sign bit to the carry in of the sign bit. If they are different then it > > is overflow or underflow. The value of such bit tells you which one. > > So does that mean I have to modify my architecture and set TWO flags? > A carry flag and a negative flag (if sign of last operation was > negative), and then the Add-With-Carry instruction would look at both? No, add with carry doesn't need to know. You only need the two flags at the end if you want to detect overflow and underflow. -- glenArticle: 62801
Took care of the dumb problem.. It seemed XST (5.2) does not like a 'wire
type' output port driven by a reg output (e.g. 'Q'). A 'wire type' output
driven by combinational logic is okay though. For example, I have

output [127:0] a;
reg [31:0] c,d,e,f;

a[31:0]  = c;
a[63:32] = d;
etc.

will cause some of the 'a' wires to get 'stuck'..

Article: 62802
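What follows is only a guess at the structure the poster describes (the
register updates are invented), but it shows one legal way to drive a
wire-type output port from regs: continuous assigns outside the always
block. The alternative is to declare the port itself as 'output reg' and
write it procedurally.

module concat_out (
    input  wire         clk,
    input  wire  [31:0] data_in,
    output wire [127:0] a
);
    reg [31:0] c, d, e, f;

    // the regs are updated procedurally, as in the original design
    always @(posedge clk) begin
        c <= data_in;
        d <= c;
        e <= d;
        f <= e;
    end

    // the port is a wire, so it is driven by continuous assigns,
    // not by procedural statements
    assign a[31:0]   = c;
    assign a[63:32]  = d;
    assign a[95:64]  = e;
    assign a[127:96] = f;
endmodule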
Hi,

I have a PCI-X core in an FPGA, so after power-on reset it takes some time
(the FPGA configuration) before the PCI-X core logic exists. Let's say it
is 2 seconds, for example. Only after this time is the logic realised and
able to respond to PCI configuration cycles.

My question is: after power on, when does the PCI-X host controller start
its enumeration process, i.e., reading configuration space?

Thanks in advance.

Regards,
Muthu

Article: 62803
I've always been amazed that at a big company there can be two coders sitting next to each other with outputs that vary by a factor of ten, and their pay varies by a factor of 5%. Companies seem to be very good at laying off large swaths of workers, but not at firing really useless ones. -Kevin "Ken Land" <kland1@neuralog1.com> wrote in message news:vqnf5oatba4n85@news.supernews.com... > I've been a programmer for over 15 yrs. I'm still a programmer and I > employ programmers in my company. > > Programmer output can vary (easily) by a factor of 10 from programmer to > programmer. (This is documented BTW - see "Rapid Development") > > If you are an average or above programmer and you are *actually writing > code*, your output is so incredibly high that overtime will almost always be > unecessary. Also, average to above average programmers *love* to write code > and would work extra hours just for the enjoyment if they didn't have > families to go home to. > > One more thing. As an employer/business owner - we have no incentive or > inherent desire for people to work unpaid overtime. We just need the work > done to keep the business moving forward. If you can do your part in 10 > hours/wk. great, if not then whatever it takes is what it takes. > > Ken > > > "Nial Stewart" <nial@spamno.nialstewart.co.uk> wrote in message > news:3fab93a1$0$12691$fa0fcedb@lovejoy.zen.co.uk... > > > > Phil Hays <SpamPostmaster@attbi.com> wrote in message > > news:3FAA5342.B1F91A03@attbi.com... > > > > > The current law makes salaried people not get paid overtime. If you > don't > > think > > > that is fair, you need to convince voters to elect people that will > change > > the > > > laws. > > > > Surely all the law says is that if you sign a contract of employment > > which say you don't get paid overtime, then you can't expect to get > > paid for overtime? > > > > It's up to you whether you sign in the first place. > > > > ? > > > > > > Nial > > > > > > > > > >Article: 62804
Hi, It is not quite as simple as that. In case you are using a conservative wire-load model, provided by the silicon vendor, and a healthy margin for clock jitter, scan flip-flop timing overhead and second order effects, as well as a conservative setting for environmental parameters (for example 100+ deg. celsius temperature and voltage 15% lower than nominal for the process you are using) then the results could be quite realistic. In case you are running DC with an optimistic setup than you could be off by way more than 20%. You need to provide further info about your setup in order to get a realistic answer to your question. Ljubisa Bajic ATI Technologies -------------- My opinions do not represent those of my employer -------------- jon@beniston.com (Jon Beniston) wrote in message news:<e87b9ce8.0311070140.5bc4afb@posting.google.com>... > > Now my question is: Is the ASIC speed result reliable? > > If it's from DC, then no. > > Since we didn't > > do P&R( we don't have tools and experiences ), I really doubt the > > timing report may be over optimistically estimated and not reliable. I > > was told something about "wire load model" and ours is automatically > > selected by the compiler. > > Knock off 20%, as you're likely to have a more realistic figure. > > If you're working at .13, you probably want to be using physical > synthesis rather than synthesis based on wire load models. > > JonArticle: 62805
How fast can you really get data in and out of an FPGA?
With current pin layouts it is possible to hook four (or maybe even five)
DDR memory DIMM modules to a single chip.

Let's say you can create memory controllers that run at 200MHz (as claimed
in an Xcell article), for a total bandwidth of

  5 (modules/FPGA) * 64 (bits/word) * 200e6 (cycles/sec) * 2 (words/cycle)
    * (1 byte / 8 bits) = 5 * 3.2GB/s = 16GB/s

Assuming an application that needs more BW than this, does anyone know a
way around this bottleneck? Is this a physical limit with current memory
technology?

Fernando

Article: 62806
"Denis Gleeson" <dgleeson-2@utvinternet.com> wrote in message news:184c35f9.0311070333.7a6acaae@posting.google.com... > Hi Chuck > > Many thanks for your input on my question. > I have used your code and it leaves me with just one problem in my > simulator that you may be able to advise me on. > > It is a warning that Net "/clear" does not set/reset > "/".../Store_trigger_Acquisition_Count_reg<0> > all other bits for Store_trigger_Acquisition_Count get the same > warning. > > The result is that the synthesis tool warns that no global set/reset > (GSR) net could be used in the design as there is not a unique net > that sets or resets all the sequential cells. > > I have modified your code to include the use of the clear signal to > set Store_Trigger_Acquisition_Count > to 0. This has had no effect. > > Any suggestions? > > always @ (ACB_Decade_Count_Enable or OUT_Acquisition_Count or clear) > if(clear) > Store_Trigger_Acquisition_Count <= 14'b0; > else Shouldn't that be 15'b0? Best regards, BenArticle: 62807
Hi Kevin

I would recommend you to search and study the instruction set of a
microprocessor with the support you're trying to implement in your
processor. Many processors have two different add instructions (with and
without carry) to support large integer arithmetic. The simplest I know of
is the 8031 8-bit microcontroller from Intel, Infineon, Dallas and many
other manufacturers. The first hit I found in google points to the page
http://www.rehn.org/YAM51/51set/instruction.shtml
It contains the description and some numeric examples for arithmetic
operations. Of interest are:

ADD  (A = A + x, carry is not used)
ADDC (A = A + x + carry)
SUBB (A = A - x - carry)

"Kevin Becker" <starbugs@gmx.net> wrote in message
news:bbc55e92.0311070813.507ea083@posting.google.com...
> Francisco Rodriguez wrote:
> > As your numbers are 32-bit wide, and overflow condition is a flag
> > about the result of the _whole_ v := v + i operation, it can be determined
> > by the MSB bits only (that is, from columns 31 and 30 of the addition).
>
> Thank you! I think I used the wrong word "underflow". What I mean is:
> if "i" is negative, it might be that abs(lo(i)) is greater than lo(v).
> What happens then? An example:
>
> v = 1234 0100 hex
> i = 0000 0101 hex
>
> The operation for the lo bits will result in FFFF and the carry will
> be set. Then when I add the high bits, it will be 1235 when it should
> actually be 1233. How do I save that the first operation was not an

No. I assume you're subtracting the numbers, as i is positive and the
result you mention is v-i. Then, take into account that subtraction is
performed by an adder in 2's complement arithmetic as follows:

v - i = v + 2's complement(i) = v + not(i) + 1

So the v-i operation is converted to 12340100 + FFFFFEFE + 1 = 1233FFFF
The low part is 0100 + FEFF = FFFF, the carry from the low part is _not_
set, so the high part is 1234 + FFFF = 1233

> overflow but a borrow? (I don't know how to call this. That's what I
> mistakenly called underflow).

Go to the mentioned page, you'll see the different descriptions for carry
(or borrow) and the overflow.

Carry/borrow is the cy-16 out of the add operation (remember there's no
subtraction circuit). It is the overflow flag if and only if you're using
unsigned arithmetic.

Overflow for signed arithmetic is cy-16 xor cy-15. When this xor gives you
1 it means you've obtained a positive result adding two negative numbers,
or a negative result adding two positives.

> Glen Herrmannsfeld wrote:
> > After the high halfword add, you compare the carry out to the carry out of
> > the sign bit to the carry in of the sign bit. If they are different then it
> > is overflow or underflow. The value of such bit tells you which one.
>
> So does that mean I have to modify my architecture and set TWO flags?
> A carry flag and a negative flag (if sign of last operation was
> negative), and then the Add-With-Carry instruction would look at both?

Never look at both. Your ALU must provide two flags, carry and overflow,
and two different add instructions, x+y and x+y+carry. Of course, every
add must update both flags. If you also provide set-carry/clear-carry
instructions, the pseudocode of the 32-bit operations would be:

for 32-bit additions:
  add  x-low,  y-low
  addc x-high, y-high

for 32-bit subtractions:
  set carry
  addc x-low,  not(y-low)
  addc x-high, not(y-high)

When the whole operation is finished, check carry/borrow if the 32-bit
numbers are unsigned, or the overflow flag if the 32-bit numbers are
signed. But not both. Don't check the flags after the low part.

> Thanks a lot!

Best regards
Francisco

Article: 62808
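To see the worked example end to end, here is a small self-checking Verilog
snippet (my own demo, not code from the thread) that runs
v - i = 12340100 - 00000101 through two 16-bit add-with-carry steps with
the carry set first, exactly as in the subtraction sequence above:

module sub32_demo;
    reg [31:0] v = 32'h12340100;
    reg [31:0] i = 32'h00000101;
    reg [16:0] lo, hi;   // 16-bit partial results plus carry out

    initial begin
        // set carry, addc low halves, then addc high halves
        lo = {1'b0, v[15:0]}  + {1'b0, ~i[15:0]}  + 1'b1;    // carry in = 1
        hi = {1'b0, v[31:16]} + {1'b0, ~i[31:16]} + lo[16];  // carry in = low carry out
        $display("result = %h%h, borrow = %b", hi[15:0], lo[15:0], ~hi[16]);
    end
endmodule

Any Verilog simulator should print result = 1233ffff with borrow = 0,
matching the hand calculation, and the flags are only looked at after the
high halfword.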
> > Doesn't really matter, good enough in this case, lets any potential
> > commercial user get the message loud & clear. If its a 600 target,
> > thats one very expensive Arm compared to real thing. For an opensource
> > cpu to be useable, it must be competitive in size, speed, power with
> > commercial cpus.
> >
> > johnjaksonATusaDOTcom
>
> Does anyone know when the arm license is going to expire?

I think the exception processing patent was filed around 90-92, so it's got
quite a bit of time left in it..

JonB

Article: 62809
Amontec Team <laurent.gauch@www.DELALLCAPSamontec.com> writes:

> Petter Gustad wrote:
> > In SVF files generated by impact there will be delay statements on the
> > form:
> >
> > // Loading device with a 'ferase' instruction.
> > ...
> > RUNTEST 15000000 TCK;
> >
> > What is the minimum delay as a result of this statement, i.e. what is
> > the assumed TCK frequency for impact generated SVF files?
> >
> > TIA
> > Petter
>
> In the Xilinx SVF file, the assumed TCK is the maximum TCK frequency of
> the device. Look the datasheet of the FPGA or your CPLD (between 10 to
> 40 MHz).

Hmmm. But when there's a chain of different devices, or even other brand
names than Xilinx... I guess impact will use the lowest speed in the chain
based upon the attribute in the BSDL files:

attribute TAP_SCAN_CLOCK of TCK : signal is (10.00e6,BOTH);

Is my assumption correct?

Petter
--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 62810
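For a sense of scale (my arithmetic, not from the thread): RUNTEST n TCK
specifies a cycle count, not a time, so the wall-clock delay depends on the
TCK rate the SVF player actually drives. 15,000,000 TCK comes to about
1.5 s at 10 MHz, roughly 0.45 s at 33 MHz, and a full 15 s at 1 MHz, which
is why the TCK frequency impact assumed when it sized the count matters if
your cable runs TCK slower than the device maximum.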
Hello,

Once the power becomes "good" there is a minimum 100 ms delay before the
RST# signal deasserts. Unless you are designing a 32-bit PCI card, you MUST
have the FPGA finished with bitstream loading before RST# deasserts. This
applies to any compliant PCI-X design regardless of bus width, and also to
any PCI design that is 64-bits wide.

The reason for this is that your FPGA design MUST be loaded so that it can
detect the busmode initialization pattern, which is broadcast at the
deassertion of RST#. If you miss this, you are in big trouble...

Once RST# is deasserted, you then have 2^25 or 2^27 cycles, depending on
the bus frequency, until the first configuration access to your device.

Eric

Article: 62811
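To put rough numbers on that last window (my arithmetic, not Eric's): 2^25
clocks is about 0.5 s at 66 MHz and 2^27 clocks is about 1 s at 133 MHz, so
the gap before the first configuration access is on the order of a second.
As Eric notes, though, a 2-second configuration time is still too long,
because the bitstream has to be finished before RST# deasserts, which may
come as soon as 100 ms after power good.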
If I remember right, this is a counter application.
Forget the 16-bit ALU and use a 32-bit counter instead, and avoid all this
headache. KISS.

Peter Alfke
==========================
Glen Herrmannsfeldt wrote:
>
> "Kevin Becker" <starbugs@gmx.net> wrote in message
> news:bbc55e92.0311070813.507ea083@posting.google.com...
> > Francisco Rodriguez wrote:
> > > As your numbers are 32-bit wide, and overflow condition is a flag
> > > about the result of the _whole_ v := v + i operation, it can be determined
> > > by the MSB bits only (that is, from columns 31 and 30 of the addition).
> >
> > Thank you! I think I used the wrong word "underflow". What I mean is:
> > if "i" is negative, it might be that abs(lo(i)) is greater than lo(v).
>
> Yes, underflow is the right word. It isn't used very often, though.
>
> > What happens then? An example:
> >
> > v = 1234 0100 hex
> > i = 0000 0101 hex
> >
> > The operation for the lo bits will result in FFFF and the carry will
> > be set. Then when I add the high bits, it will be 1235 when it should
> > actually be 1233. How do I save that the first operation was not an
> > overflow but a borrow? (I don't know how to call this. That's what I
> > mistakenly called underflow).
>
> Hmm. X'0100' + X'0101' is X'0201' with no overflow. If you want to add a
> small twos complement negative number, then that number will have F's in
> the high bits which will take care of the 1233 part. In that case, borrow
> means that there is no carry, otherwise there is a carry. Say v=X'12340100'
> and i is negative 257. Negative 257 is X'FFFFFEFF'
>
> X'0100' + X'FEFF' is X'FFFF' with no carry. X'1234'+X'FFFF' is X'1233'
> with carry. The carry into the high bit is also 1, so that there is no
> overflow or underflow.
>
> > Glen Herrmannsfeld wrote:
> > > After the high halfword add, you compare the carry out to the carry out of
> > > the sign bit to the carry in of the sign bit. If they are different then it
> > > is overflow or underflow. The value of such bit tells you which one.
> >
> > So does that mean I have to modify my architecture and set TWO flags?
> > A carry flag and a negative flag (if sign of last operation was
> > negative), and then the Add-With-Carry instruction would look at both?
>
> No, add with carry doesn't need to know. You only need the two flags at
> the end if you want to detect overflow and underflow.
>
> -- glen

Article: 62812
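For the record, a fabric counter of that width is tiny -- something like
the sketch below (signal names invented), with the dedicated carry chain
handling the full 32 bits:

module count32 (
    input             clk,
    input             clear,
    input             enable,
    output reg [31:0] count
);
    always @(posedge clk)
        if (clear)       count <= 32'b0;
        else if (enable) count <= count + 1;
endmodule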
Kevin Neilson wrote: > > I've always been amazed that at a big company there can be two coders > sitting next to each other with outputs that vary by a factor of ten, and > their pay varies by a factor of 5%. Companies seem to be very good at > laying off large swaths of workers, but not at firing really useless ones. > -Kevin And some companies are very good at promoting and throwing great fistfuls of cash at coders with outputs of 100x the average who can also solve other technical problems. It's really hard to fire a useless person without being able to prove in court that they guy really IS useless, was given the appropriate number of chances to remedy his uselessness, and that the company bent over backwards to keep him gainfully employed in spite of his limitations, especially if said useless person is a member of some EEO "protected" class. You have problems even if you give such a person a charity layoff and a few months of severance pay. Carry on... -- Cheers, Bev +++++++++++++++++++++++++++++++++++++++++++++++++ "I don't care who your father is! Drop that cross one more time and you're out of the parade!"Article: 62813
Hi,

I'm trying to capture a video frame with the camera included on the RC200E
board from Celoxica. I'm basing my design on the PAL example VideoIn.

1. I capture a block of 640x9 pixels and store it in a RAM. (I capture the
   pixels with PalVideoInRead.)
2. I copy this block to a PalFrameBuffer.
3. I return to step 1 to capture the next block of the video-in frame.

In simulation it works, but when I download the bit stream to the RC200E
board, the result is wrong. I lose a few lines: the image displayed on the
monitor is shorter than it should be, so 3 or 4 frames end up displayed on
the monitor, each displaced upwards.

The idea is to capture a block of rows, do some processing, and then move
the resulting pixels to the PalFrameBuffer. I have read the PAL API
Reference manual many times and studied the examples, and the only cause I
can think of is that the PalVideoInRead function should be called
repeatedly without delay in order to be sure of not missing pixels.

Any ideas or suggestions? I'll appreciate whatever comments.

Thanks
Gerardo Sosa

P.D. If somebody needs to see my code, e-mail me.

Article: 62814
David Gesswein wrote:
> I tried the Xilinx support line Case # 503586 and haven't gotten a good
> answer so though I would try here.
>
> I am trying to interface to a ZBT SRAM from a Virtex II and was trying to
> do like in xapp 136 which used 2 DCM's to generate a internal FPGA and an
> external board clock using external DCM feedback that are aligned. That
> configuration in simulation (5.2i sp3) shows the external clock is .5ns
> delayed from the internal clock. We actually need to use a third DCM FX
> output to generate the clock for the SRAM. When we do that the external
> clock is now leading the internal clock by 1 ns. I didn't understand why
> what clock feeds both DCM's would change the timing and since our timing
> is tight I need them to be closely aligned.
>
> The external clock is output using a DDR FF and I used the DCM wizard which
> should of put in all the problem bufg/ibufg etc which the V2 users guide
> says are needed if it is going to compensate for the pad to DCM delay.
>
> I also think only 2 DCM's are needed, 1 to generate the internal clock using
> FX and a second to generate the deskewed external clock. That configuration
> seems to generate the same timing as the 3 DCM version.
>
> Anybody know the correct solution?

Howdy David,

I'm not quite clear on what you are using the FX output for, but I've used
ZBT SRAMs on a number of designs over the past couple of years. Here is how
I do it: feed the input/reference clock into two DCMs. The CLKFB of one is
just the output of the global buffer. The output of the other DCM goes off
chip (using a DDR FF doesn't get you anything... the deskew function takes
the delay out). Put two resistors on the output pin (keeping the resistors
as close as possible to the FPGA), and route from one resistor to the ZBT.
The other resistor routes to a GCLK input pin and feeds the CLKFB pin. As
long as you keep the two "long" traces (the outputs of the two resistors)
close to the same length, you won't get reflections, and the DCM will
remove the skew so that the rising edge of the clock arrives at the ZBT
around the same time as the rising edge occurs inside the FPGA.

As another poster mentioned, if you use external feedback, Xilinx
recommends (or used to) that you hold the second DCM in reset for a long
while so that the clock has time to propagate off chip and back into the
feedback pin.

As for how you can compare the output clocks of the two, do you really need
to? What you care about is alignment of the ZBT clock and the address or
data bus transitions coming from the FPGA (which are a function of the
internal clock).

Good luck,
Marc

Article: 62815
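A rough Verilog sketch of the two-DCM arrangement Marc describes is below.
It assumes the Virtex-II DCM/BUFG/IBUFG/OBUF primitives from the unisim
library; the pin names, parameters and port hookup are from memory and
invented for illustration, so check the Virtex-II user guide (and Marc's
note about holding the externally-fed-back DCM in reset longer) before
copying anything.

module zbt_clocking (
    input  clk_pad,      // reference clock input pin
    input  rst,
    input  clkfb_pad,    // board-level feedback trace back into a GCLK pin
    output zbt_clk_pad,  // clock output pin routed (via resistor) to the ZBT
    output clk_int,      // deskewed internal clock for the FPGA logic
    output locked
);
    wire clk_ibufg, clkfb_ext, clk0_int, clk0_ext;
    wire locked_int, locked_ext;

    IBUFG ibufg_clk (.I(clk_pad),   .O(clk_ibufg));
    IBUFG ibufg_fb  (.I(clkfb_pad), .O(clkfb_ext));

    // DCM 1: internal feedback, gives a zero-delay copy of the clock
    // to the fabric
    DCM #(.CLK_FEEDBACK("1X")) dcm_int (
        .CLKIN(clk_ibufg), .CLKFB(clk_int), .RST(rst),
        .CLK0(clk0_int), .LOCKED(locked_int),
        .DSSEN(1'b0), .PSEN(1'b0), .PSINCDEC(1'b0), .PSCLK(1'b0)
    );
    BUFG bufg_int (.I(clk0_int), .O(clk_int));

    // DCM 2: external feedback through the board trace, so the clock
    // edge arriving at the ZBT lines up with clk_int inside the FPGA
    DCM #(.CLK_FEEDBACK("1X")) dcm_ext (
        .CLKIN(clk_ibufg), .CLKFB(clkfb_ext), .RST(rst),
        .CLK0(clk0_ext), .LOCKED(locked_ext),
        .DSSEN(1'b0), .PSEN(1'b0), .PSINCDEC(1'b0), .PSCLK(1'b0)
    );
    OBUF obuf_zbt (.I(clk0_ext), .O(zbt_clk_pad));

    assign locked = locked_int & locked_ext;
endmodule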
News info below. Automotive customers tend to be tough on reliability, and
on standard supply voltages.

Of interest in this release are
+ 0.13u/150MHz core, but they manage to deliver 5V I/O, ADCs etc
  [ FPGA vendors could learn from this ]
+ Comment on error correcting FLASH

Not mentioned here, but also noted, is the trend to require a Vpp or PGM
enable pin on Automotive FLASH parts. Seems to be a concern about shipping
a part that MIGHT be able to re-program its own flash ?

- jg

Motorola news item :

"Based on 0.13-micron design rules, the MPC5554 chip operates at speeds of
50 to 150MHz. Though the design rules are advanced, Motorola made the part
so that its I/O and ADC will run at 5V, which automakers often prefer. The
company also said it designed the flash memory to be more reliable by
adding error correcting code. The flash is built to retain data for 20
years and withstand 100,000 read/erase cycles. The first MPC5554 will
include 2 Mbytes of flash, and the company is planning to come out with a
4Mbyte version next year, Cornyn said."

Article: 62816
Followup to: <3FAC46F0.31F9B374@myrealbox.com> By author: The Real Bev <bashley@myrealbox.com> In newsgroup: comp.arch.fpga > > Kevin Neilson wrote: > > > > I've always been amazed that at a big company there can be two coders > > sitting next to each other with outputs that vary by a factor of ten, and > > their pay varies by a factor of 5%. Companies seem to be very good at > > laying off large swaths of workers, but not at firing really useless ones. > > -Kevin > > And some companies are very good at promoting and throwing great > fistfuls of cash at coders with outputs of 100x the average who can also > solve other technical problems. > > It's really hard to fire a useless person without being able to prove in > court that they guy really IS useless, was given the appropriate number > of chances to remedy his uselessness, and that the company bent over > backwards to keep him gainfully employed in spite of his limitations, > especially if said useless person is a member of some EEO "protected" > class. You have problems even if you give such a person a charity > layoff and a few months of severance pay. > What's much worse than deadwood are people who are active obstructionists. They can also be really hard to get rid of, unfortunately. -hpa -- <hpa@transmeta.com> at work, <hpa@zytor.com> in private! If you send me mail in HTML format I will assume it's spam. "Unix gives you enough rope to shoot yourself in the foot." Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64Article: 62817
Followup to: <bbc55e92.0311061129.28af9d44@posting.google.com> By author: starbugs@gmx.net (Kevin Becker) In newsgroup: comp.arch.fpga > > I'm designing a processor for one specific application and in my > software I have need a counter. I have a problem figuring out how to > make Add-with-carry work for this. > > I want to do v := v + i. > v and i are both 32 bit values, my ALU is 16 bits wide. > Everything is 2-complement. > > I would add the lower 16 bits, then add the higher 16 bits with carry. > My problem: "i" may be positive or negative, so there are 3 things > that can occur: > - overflow > - underflow > - none of those > > If I have only one carry bit, those 3 possibilities cannot be > represented. Am I right that in such an architecture it is impossible > to achieve what I want? How do I have to change my ALU in order to do > that? And how do I handle the sign bits in the "middle" of the 32 bit > values? If possible, I would like to avoid an additional comparison > and use only flags. > No, you're not correct. What you're doing wrong is simply failing to recognize the fundamental reason why 2's complement is so ubiquitous: ADDITION AND SUBTRACTION OF 2'S COMPLEMENT NUMBERS IS IDENTICAL TO THE SAME OPERATIONS ON UNSIGNED NUMBERS Therefore, you don't care if you got overflow or underflow -- they are both represented by carry out. In other words, build your ALU just as if "v" and "i" were unsigned numbers, and everything is good. -hpa -- <hpa@transmeta.com> at work, <hpa@zytor.com> in private! If you send me mail in HTML format I will assume it's spam. "Unix gives you enough rope to shoot yourself in the foot." Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64Article: 62818
fortiz80@tutopia.com (Fernando) wrote in message news:<2658f0d3.0311071117.3bf6eaea@posting.google.com>... > How fast can you really get data in and out of an FPGA? > With current pin layouts it is possible to hook four (or maybe even > five) DDR memory DIMM modules to a single chip. > > Let's say you can create memory controllers that run at 200MHz (as > claimed in an Xcell article), for a total bandwidth of > 5(modules/FPGA) * 64(bits/word) * 200e6(cycles/sec) * (2words/cycle) * > (1byte/8bits)= > 5*3.2GB/s=16GB/s > > Assuming an application that needs more BW than this, does anyone know > a way around this bottleneck? Is this a physical limit with current > memory technology? > > Fernando OTOH If you want more bandwidth than DDR DRAM, you could go for RamBus, RLDRAM or the other NetRam or whatever its called. The RLDRAM devices separate the I/Os for pure bandwidth, no turning the bus or clock around nonsense and reduce latency from 60-80ns range down to 20ns or so, that is true RAS cycle. Micron & Infineon do the RLDRAM, another group does the NetRam (Hynix, Samsung maybe). The RLDRAM can run the bus upto 400MHz, double pumped to 800MHz and can use most every cycle to move data 2x and receive control 1x. It is 8 ways banked so every 2.5ns another true random access can start to each bank once every 20ns. The architecture supports 8,16,32-36 bit width IOs IIRC. Sizes are 256M now. I was quoted price about $20 something, cheap for the speed, but far steeper than PC ram. Data can come out in 1,2 or 4 words per address. Think I got all that right. Details on Micron.com. I was told there are Xilinx interfaces for them, I got docs at Xilinx but haven't eaten them yet. They also have interfaces for the RamBus & NetRam. AVNET (??) also has a dev board with couple of RLDRAM parts on them connected to a Virtex2 part, but I think these are the 1st gen RLDRAM parts which are 250MHz 25ns cycle so the interface must work. Anyway, I only wish my PC could use them, I'd willingly pay mucho $ for a mobo that would use them but that will never happen. I quite fancy using one for FPGA cpu, only I could probably keep 8 nested cpus busy 1 bank each since cpus will be far closer to 20ns cycle than 2.5ns. The interface would then be a mux-demux box on my side. The total BW would far surpass any old P4, but the latency is the most important thing for me. Hope that helps johnjakson_usa_comArticle: 62819
"H. Peter Anvin" <hpa@zytor.com> wrote in message news:bohuvm$4v4$1@cesium.transmeta.com... > Followup to: <bbc55e92.0311061129.28af9d44@posting.google.com> > By author: starbugs@gmx.net (Kevin Becker) > In newsgroup: comp.arch.fpga > > > > I'm designing a processor for one specific application and in my > > software I have need a counter. I have a problem figuring out how to > > make Add-with-carry work for this. (snip) > No, you're not correct. What you're doing wrong is simply failing to > recognize the fundamental reason why 2's complement is so ubiquitous: > > ADDITION AND SUBTRACTION OF 2'S COMPLEMENT NUMBERS IS IDENTICAL TO > THE SAME OPERATIONS ON UNSIGNED NUMBERS > > Therefore, you don't care if you got overflow or underflow -- they are > both represented by carry out. > > In other words, build your ALU just as if "v" and "i" were unsigned > numbers, and everything is good. This is true, except for generating the flags on the final add. Well, you can either generate all the flags, or only the signed or unsigned flags. For the intermediate adds only the carry, or lack of carry, from the high bit is important. To detect signed overflow or underflow (more negative than can be represented) requires comparing the carry into and out of the sign bit. -- glenArticle: 62820
Austin Lesea <Austin.Lesea@xilinx.com> wrote in message news:<3FABC4A2.E78A6D86@xilinx.com>... > Yu Jun, > > Knock off 20% for .13u from schematic to RC extracted. > > Also depends on what the foundry actually supports: is this based on lo-k > dieletric? > > If not, that will take you down another 5%. > > The Virtex II Pro IBM405PPC runs at 450 MHz, so I would expect any well > designed and semi-custom layout uP to be at least that fast in .13u. > > Austin > > Yu Jun wrote: > > > I'm working on a cpu core and intend to embed it into ASIC circuits, > > with the aim to do some network processing. Now the FPGA prototype is > > running and a 66M speed is achieved( xilinx virtexII-4 ). Wondering > > how fast it can run in ASIC, we had our ASIC guys to synthesize the > > codes and the result was shocking, it reached 400M! Far beyond our > > expectation of 150M. The library we used was of 0.13u, from TI, fairly > > fast, in which a NAND gate is around 0.03ns. > > > > Now my question is: Is the ASIC speed result reliable? Since we didn't > > do P&R( we don't have tools and experiences ), I really doubt the > > timing report may be over optimistically estimated and not reliable. I > > was told something about "wire load model" and ours is automatically > > selected by the compiler. > > > > Anybody can give me some hints or direct me to some documents will be > > very appreciated! Thank you very much. > > > > yu jun > > > > yujun@huawei.com Your surprise really reflects that your design is not Blockram limited but gate/logic level limited where ASICs will stay about 5x faster or more. If you were not going to ASIC, your design might be considered slow since you could push any Blockrams to 200MHz or so, but then it is very difficult to do much cpu logic with only a few LUT levels per cycle. MicroBlaze (at 120MHz)is probably limited to multiplier delay as well as cpu logic levels long before hitting BlockRam limit, and I am sure its hand placed where needed to boot. For those designs that are truly Blockram limited, an ASIC memory won't be much faster than BlockRams for the same architecture spec & process, they are also likely made by same foundry on similar process. Ofcourse ASICs can offer custom compiled SRAMs to get a bit more speed and they do allow 5x more logic layers in that cycle. The note of 30ps nand gates, that compares to 3GHz P4 cycle of 330ps or about 10 gate delays. Although I am sure Intel doesn't use many gates as we know them but various high speed pass logic schemes so they are using much shorter transit times. Also SRAMs have for decades had access times of about 10 gate delays too. And the old supercomputer designers used to clock cpus in 10 ECL layers of dotted logic, so I figure 10 Lut levels is fair enough cycle target. Luckily the carry chains we need are not done by Lut level logic or we would be really ____ed, but then we deal with switched wires instead. johnjakson_usa_comArticle: 62821
Hi Goran > > The new instruction in MicroBlaze for handling these locallinks are > simple but there is no HW scheduler in MicroBlaze. I have done processor > before with complete Ada RTOS in HW but it would be an overkill in a FPGA: > .. now that sounds like something we could chat about for some time. An Ada RTOS in HW certainly would be heavy, but the Occam model is very light. The Burns book on Occam compares them, the jist being that ADA has something for everybody, and Occam is maybe too light. Anyway they both rendezvous. At the beginning of my Inmos days we were following ADA and the iAPX32 very closely to see where concurrency on other cpus might go (or not as the case turned out). Inmos went for simplicity, ADA went for complexity. Thanks for all the gory details. > The Locallinks for MicroBlaze is 32-bit wide so they are not serial. > They can handle a new word every clock cycle. > > You could also connect up a massive array of MicroBlaze over FSL ala > transputer but I think that the usage of the FPGA logic as SW > accelarators will be a more popular way since FPGA logic can be many > magnitudes faster than any processor and with the ease of interconnect > as the FSL provides it will be the most used case. > I am curious what the typ power useage of MicroBlaze is per node, and has anybody actually tried to hook any no of them up. If I wanted large no of cpus to work on some project that weren't Transputers, I might also look at PicoTurbo, Clearspeed or some other BOPSy cpu array, but they would all be hard to program and I wouldn't be able to customize them. Having lots of cpus in FPGA brings up the issue of how to organize memory hierarchy. Most US architects seem to favor the complexity of shared memory and complicated coherent caches, Europeans seem to favor strict message passing (as I do). We agree that if SW can be turned into HW engines quickly and obviously, for the kernals, sure they should be mapped right onto FPGA fabric for whatever speed up. That brings up some points, 1st P4 outruns typ FPGA app maybe 50x on clockspeed. 2nd converting C code to FPGA is likely to be a few x less efficient than an EE designed engine, I guess 5x. IO bandwidth to FPGA engine from PC is a killer. It means FPGAs best suited to continuous streaming engines like real time DSP. When hooked to PC, FPGA would need to be doing between 50-250x more work in parallel just to be even. But then I thinks most PCs run far slower than Intel/AMD would have us believe because they too have been turned into streaming engines that stall on cache misses all too often. But SW tends to follow 80/20 (or whatever xx/yy) rule, some little piece of code takes up most of the time. What about the rest of it, it will still be sequential code that interacts with the engine(s). We would still be forced to rewrite the code and cut it with an axe and keep one side in C and one part in HDL. If C is used as a HDL, we know thats already very inefficient compared to EE HDL code. The Transputer & mixed language approach allows a middle road between the PC cluster and raw FPGA accelerator. It uses less resources than cluster but more than the dedicated accelerator. Being more general means that code can run on an array of cpus can leave decision to commit to HW for later or never. The less efficient approach also sells more FPGAs or Transputer nodes than one committed engine. In the Bioinformatics case, a whole family of algorithms need to be implemented, all in C, some need FP. 
An accelerator board that suits one problem may not suit others, so does Bio guy get another board, probably not. TimeLogic is an interesting case study, the only commercial FPGA solution left for Bio. My favourite candidate for acceleration is in our own backyard, EDA, esp P/R, I used to spend days waiting for it to finish on much smaller ASICs and FPGAs. I don't see how it can get better as designs are getting bigger much faster than pentium can fake up its speed. One thing EDA SW must do is to use ever increasingly complex algorithms to make up the short fall, but that then becomes a roadblock to turning it to HW so it protects itself in clutter. Not as important as the Bio problem (growing at 3x Moores law), but its in my backyard. rant_mode_off Regards johnjakson_usa_comArticle: 62822
Mario Trams <Mario.Trams@informatik.tu-chemnitz.de> wrote in message
>
> Hi John,
>
> do you know about this nice stuff developed by Cradle
> (http://www.cradle.com) ?
>
> They have developed something like an FPGA. But the PFUs
> do not consist of generic logic blocks but small processors.
> That's perhaps something you would like :-)
>
> Regards,
> Mario

Thanks for pointer, I hadn't seen it yet, will take a peek.

Article: 62823
Aaaah, now I got it!

The problem was that I read somewhere that a SUB instruction generates a
carry flag when subtracting a negative number and the result becomes too
big. So I automatically assumed somehow that an ADD instruction also
generates a carry flag when adding a negative number (which is wrong). I
also forgot that when I have a negative number, the high halfword will also
be FFFF. I thought it would be zero because abs(i) is small enough to fit
into the low halfword, but due to the sign it is NOT zero.

Peter: I am not using the counter macro because this operation is one of
many operations in an algorithm and the value needs to be in the RAM which
is only connected to the processor.

Thanks to everybody who helped me out with this.

Article: 62824
Lots of good points in your reply, here is why I think these technologies don't apply to problem that requires large and fast memory. RLDRAM: very promising, but the densities do not seem to increase significantly over time (500Mbits now ~ 64MB). To the best of my knowledge, nobody is making DIMMS with these chips, so they're stuck as cache or network memory. RDRAM (RAMBUS): as you said, only the slowest parts can be used with FPGAs because of the very high frequency of the serial protocol. The current slowest RDRAMs run at 800 MHz, a forbidden range for FPGAs (Xilinx guys, please jump in and correct me if I'm wrong) Am I missing something? Are there any ASICs out there that interface memory DIMMS and FPGAs? Is there any way to use the rocket I/Os to communicate with memory chips? or maybe a completely different solution to the memory bottleneck not mentioned here? johnjakson@yahoo.com (john jakson) wrote in message news:<adb3971c.0311072139.6dab6951@posting.google.com>... > fortiz80@tutopia.com (Fernando) wrote in message news:<2658f0d3.0311071117.3bf6eaea@posting.google.com>... > > How fast can you really get data in and out of an FPGA? > > With current pin layouts it is possible to hook four (or maybe even > > five) DDR memory DIMM modules to a single chip. > > > > Let's say you can create memory controllers that run at 200MHz (as > > claimed in an Xcell article), for a total bandwidth of > > 5(modules/FPGA) * 64(bits/word) * 200e6(cycles/sec) * (2words/cycle) * > > (1byte/8bits)= > > 5*3.2GB/s=16GB/s > > > > Assuming an application that needs more BW than this, does anyone know > > a way around this bottleneck? Is this a physical limit with current > > memory technology? > > > > Fernando > > OTOH > > If you want more bandwidth than DDR DRAM, you could go for RamBus, > RLDRAM or the other NetRam or whatever its called. The RLDRAM devices > separate the I/Os for pure bandwidth, no turning the bus or clock > around nonsense and reduce latency from 60-80ns range down to 20ns or > so, that is true RAS cycle. > > Micron & Infineon do the RLDRAM, another group does the NetRam (Hynix, > Samsung maybe). > > The RLDRAM can run the bus upto 400MHz, double pumped to 800MHz and > can use most every cycle to move data 2x and receive control 1x. It is > 8 ways banked so every 2.5ns another true random access can start to > each bank once every 20ns. The architecture supports 8,16,32-36 bit > width IOs IIRC. Sizes are 256M now. I was quoted price about $20 > something, cheap for the speed, but far steeper than PC ram. Data can > come out in 1,2 or 4 words per address. Think I got all that right. > Details on Micron.com. I was told there are Xilinx interfaces for > them, I got docs at Xilinx but haven't eaten them yet. They also have > interfaces for the RamBus & NetRam. AVNET (??) also has a dev board > with couple of RLDRAM parts on them connected to a Virtex2 part, but I > think these are the 1st gen RLDRAM parts which are 250MHz 25ns cycle > so the interface must work. > > Anyway, I only wish my PC could use them, I'd willingly pay mucho $ > for a mobo that would use them but that will never happen. I quite > fancy using one for FPGA cpu, only I could probably keep 8 nested cpus > busy 1 bank each since cpus will be far closer to 20ns cycle than > 2.5ns. The interface would then be a mux-demux box on my side. The > total BW would far surpass any old P4, but the latency is the most > important thing for me. > > Hope that helps > > johnjakson_usa_com