Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
On Jul 1, 4:15 am, Totally_Lost <air_b...@yahoo.com> wrote: > On Jun 25, 6:05 pm, John Williams <jwilli...@itee.uq.edu.au> wrote: > > > Regardless of the FPGA you choose, or the implementation of your main > > processing loop, don't even start until you've done a thorough IO / > > memory bandwidth analysis. Even in 2007 we're still seeing papers > > saying "we got 20X speed up in the core, then when we put it on the > > memory bus we got 1.5X". > > In many respects that is still putting the cart before the horse. > > One needs to step back, divine the global architecture of the > application, and make some determinations of if this is even a > practical problem for an FPGA, or should it maybe be moved to a more > optimal CPU/Cache/Memory architecture. For a large data set, memory > bandwidth isn't going to be substantially different, unless the > algorithm can be twisted to reduce memory bandwidth by a different > processing ordering or local caching. This is one area where a CPU/ > Cache/Memory architecture may well simply smoke any possible FPGA > design. > > For very large data sets, frequently processors like Itanium 2 with > 12MB and larger caches will smoke a typical PC or FPGA implementation > by simply getting rid of significant portions of the raw memory > bandwidth into faster Caches. > > One side effect of this is that the existing source code for the > application may have already been heavily optimized to squeeze every > possible memory cycle out of the problem. The resulting code is > probably larger, and possible way overly complex for a starting point > for any FPGA design. One might well have to reverse engineer the > original simpler algorithms first, in hopes that there is indeed an > embarassingly parallel FPGA solution that avoids the raw memory > bandwidth. > > Other cases of interest are choices in data path sizes ... they might > all be 64 bit, simply because it has been optimized for a high end HPC > engine that was 64bit native. 
After stepping back and looking at the > problem very closely from an architecture and requirements > perspective, sometimes insights emerge that the problem doesn't need > all that significance and dynamic range everywhere .... allowing the > FPGA implementation to be partitioned to match the real problem needs, > not what had formerly been done on the prior solution. > > Once architecture, data and algorithm issues are well understood, and > we have a fair idea what the processing kernel must do, then clearly > looking at matching interface designs is not only required, but > finally practical too. Designing interfaces before the architecture > and processing kernels are understood, is doing work toward an > undefined requirement. > > Performance by Design > John Check out the new NVIDIA Tesla GPU based compute station for scientific computing. http://www.nvidia.com/object/tesla_computing_solutions.html http://advancednano.blogspot.com/2007/06/nvidia-tesla-supercomputer-for-1500-to.html http://advancednano.blogspot.com/2007/06/more-specifications-of-nvidia-tesla.html Looks like the PCIe board will do 518 Gflops peak, for only $1500. The deskside version will do 1 Tflops peak for $7,500. They have their own C language API called CUDA... This might be significantly cheaper (time+money) than developing a custom FPGA board as well as the custom compute architecture.Article: 121326
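The "20X in the core, 1.5X on the memory bus" figure quoted in this thread can be put into numbers with Amdahl's law. This is only a rough sketch: Amdahl's law is brought in here as an illustration and is not something the posters used, and the function names are made up for the example. It assumes the runtime splits cleanly into a part that the accelerator speeds up and a part (memory and I/O) that it does not.

```c
/* Amdahl's law: overall speedup when only a fraction of the original
 * runtime is accelerated. Function names are hypothetical. */
double overall_speedup(double fraction_in_core_time, double core_speedup)
{
    return 1.0 / ((1.0 - fraction_in_core_time)
                  + fraction_in_core_time / core_speedup);
}

/* Inverse question: what fraction of the original runtime must have
 * been spent in the core for a given core speedup to produce a given
 * overall speedup? */
double fraction_in_core(double core_speedup, double overall)
{
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / core_speedup);
}
```

Under that assumption, fraction_in_core(20.0, 1.5) comes out at roughly 0.35: only about a third of the original runtime was in the accelerated kernel and the rest was spent moving data, which is the reason to do the bandwidth analysis first.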
Alex, Have you looked into: http://www.tapr.org/kits_dsp10.html Generally, using a FPGA for SDR is only required when the radio is doing something other than FM, AM, SSB, or low data rate digital mod/demod. http://hpsdr.org/ Details a 'solution' using the Cyclone FPGA. (This was before I gave my one day FPGA class to the TAPR folks). AustinArticle: 121327
Terje Mathisen wrote: > glen herrmannsfeldt wrote: >> I have been told that another reason is that it is too hard for >> engineers to learn and understand both C and verilog. This I also >> don't believe, but maybe that is just me. > Huh? > It is a pretty bad engineer who can only think in one language. As I said, I may not believe it, but in any case the thought process for C programming and verilog programming are different. Switching between, say, Fortran, C, and PL/I the thought process will be similar, though the details are different. > Always try to use the right tool for the job! I agree. -- glenArticle: 121328
Hello, I'm trying to decide whether to use an EPC16 or an EPCS64 to program the Stratix II EP2S601020C3 on my board. Can anyone comment on which method is better/faster? Altera's development kits use the EPCS64, so I'm leaning in that direction. Thanks, joeArticle: 121329
Xuint32 test1=0xFFFFFFFF; Xuint32 test2=0xBBBBBBBB; unsigned long long res64; res64 = ((unsigned long long) test1) * ((unsigned long long) test2); - Peter LilacSkin wrote: > Hello, > > I would like to do a 32bit multiplication. > The result must be stored in a "64bitregister". > > I did that : > > Xuint32 test1=0xFFFFFFFF; > Xuint32 test2=0xBBBBBBBB; > Xuint64 *res64; > res64 ->Upper = (test1 * test2) >> 32; > res64 ->Lower = test1 * test2; > > But it didn't work ! > Do you have an idea how to do that ? > > Regards, > Laurent. >Article: 121330
The definitions in xbasic_types.h are compiler independent. In other words, unfortunately, not all compilers support "unsigned long long". We also found that some of them do but the result of the computation was incorrect. GCC does support "unsigned long long" and produces the correct result. - Peter Sylvain Munaut wrote: >> You need to type cast test1 and test2 to be 64 bits before you >> multiply them together. >> >> Try: >> >> *res64 = ((Xuint64)test1) * ((Xuint64)test2); >> > > Nope ... that would work if xilinx used the "unsigned long long" type for their 64 bits type ... > But they didn't ... (don't ask me why they did that ...) > > > typedef struct > { > Xuint32 Upper; > Xuint32 Lower; > } Xuint64; > > > > SylvainArticle: 121331
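For a compiler that has no usable 64-bit integer type, which is the case Peter describes, the product can still be built from 16-bit halves using only 32-bit arithmetic. The following is a minimal sketch, not Xilinx code: Wide64 is a hypothetical stand-in for the Upper/Lower struct quoted above, and mul32x32_portable is a made-up name.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two-word struct quoted in the thread. */
typedef struct {
    uint32_t Upper;
    uint32_t Lower;
} Wide64;

/* 32x32 -> 64 bit unsigned multiply using only 32-bit operations. */
Wide64 mul32x32_portable(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Four 16x16 partial products; each fits in 32 bits. */
    uint32_t ll = a_lo * b_lo;  /* weight 2^0  */
    uint32_t lh = a_lo * b_hi;  /* weight 2^16 */
    uint32_t hl = a_hi * b_lo;  /* weight 2^16 */
    uint32_t hh = a_hi * b_hi;  /* weight 2^32 */

    /* Sum of the 2^16 column; at most 3 * 0xFFFF, so no overflow. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    Wide64 r;
    r.Lower = (ll & 0xFFFFu) | ((mid & 0xFFFFu) << 16);
    r.Upper = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
    return r;
}
```

With the values from the original question (0xFFFFFFFF and 0xBBBBBBBB) this gives Upper = 0xBBBBBBBA and Lower = 0x44444445.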
I would like to know if I can use Chipscope to look at the V5 GTP outputs. I am using 4 GTP outputs at 3.125 Gbps and want to study their timing.Article: 121332
On Jul 2, 11:27 am, wallge <wal...@gmail.com> wrote: > Check out the new NVIDIA Tesla > GPU based compute station > for scientific computing. As I noted earlier, the FCCM presentation detailing costs for custom computing based on FPGA, GPU, and Cell processors showed that FPGA vendors have failed to capitalize on this market, and will very likely see it gone as GPU and Cell solutions kick butt. Xilinx and Altera have simply made it impossible to create general purpose computing platforms out of FPGAs with difficult to use (for programmers) software tools that sometimes take days to place and route ... plus license costs which are as much as off the shelf GPU solutions like these. It still however, comes down to what these devices can do for such real-world applications ... including raw memory bandwidth. I'd guess that the FPGA vendors do not even have a clue what it takes to successfully compete as systems vendors, and are too tight with their IP to allow anyone else to do it right either. JohnArticle: 121333
The output runs at 3.125 Gbps? How can that be? I've successfully clocked the Chipscope cores at around 220 MHz with no timing problems. But that is about as high as I could go. That was in a V4 at the lowest speed grade.Article: 121334
pbFJKD@ludd.invalid wrote: > Does Xilinx ISE benefit from Multi CPU setups? > Like offered by AMD Athlon64 X2, AMD Opteron, Intel Core2Duo etc..? > > Also would AMD AM2 socket + 800MHz DDR2 be really beneficial compared to > non DDR2 motherboards? > A multi-core or multi-CPU setup isn't going to run ISE faster (AFAIK), but your computer will still be a lot more responsive while running CPU intensive tasks. However ISE will run faster with CPUs featuring 4 MB L2 cache, such as the mid/high-end Intel Core2 Duo CPUs (if I recall correctly models E6320, E6420, E6600 and higher).Article: 121335
On Jun 30, 3:53 am, <darrick> wrote: > I would like to ask some questions regarding a Xilinx JTAG programmer: > > First, it seems that the programmer doesn't actually connect to the LPT > port because of gender mismatch. Luckily I have a parallel port gender > changer. Is this still ok? Are you sure that it's a parallel adapter? It could be one of the old serial adapters. Note that old PCs might have used the standard 25-pin serial port connector, which is the opposite sex of the 25-pin Centronics parallel connector. Anyways -- if the adapter doesn't fit into what you think is the parallel port, it's probably the wrong port or the wrong adapter. > Second point, I connect the programmer and start up xilinx ise and impact. > I get a message that many unknown devices are being detected, is this > normal? Depends on what's on your JTAG chain. If the only things on the chain are the FPGA and the config PROM but it reports other stuff, then that's a problem. > Last point, the download cable seems to be some sort of 2 x 8 block > socket, i.e. 16 pins, how do I identify the required pins i.e. > vdd,gnd,tdi,tms,tck,tdo? RTFM! -aArticle: 121336
Eddie H wrote: > I would like to know if I can use Chipscope to look at the V5 GTP outputs. I am using 4 GTP outputs at 3.125 Gbps and wanted to study the timing of it Eddie, Nope. Chipscope can probe the signals to, and from, a GTP (not a V5 issue at all), but in no way may it "observe" the GTP outputs themselves. AustinArticle: 121337
Nice rant, but it wouldn't make any difference. Video cards are an ASIC that is built for one thing and one thing only, massively parallel stream processing. An FPGA is a dynamic device that can do that plus a myriad of other things, the problem being that it could NEVER do the job as well as a specialized ASIC can. By the way, the programming model for the cell is by far worse than anything I've seen on an FPGA, and the programming model for video cards in the user domain isn't much better, but it's getting there as Nvidia prioritizes it higher. ---Matthew Hicks > On Jul 2, 11:27 am, wallge <wal...@gmail.com> wrote: > >> Check out the new NVIDIA Tesla >> GPU based compute station >> for scientific computing. > As I noted earlier, the FCCM presentation detailing costs for custom > computing based on FPGA, GPU, and Cell processors showed that FPGA > vendors have failed to capitalize on this market, and will very likely > see it gone as GPU and Cell solutions kick butt. Xilinx and Altera > have simply made it impossible to create general purpose computing > platforms out of FPGAs with difficult to use (for programmers) > software tools that sometimes take days to place and route ... plus > license costs which are as much as off the shelf GPU solutions like > these. > > It still however, comes down to what these devices can do for such > real-world applications ... including raw memory bandwidth. > > I'd guess that the FPGA vendors do not even have a clue what it takes > to successfully compete as systems vendors, and are too tight with > their IP to allow anyone else to do it right either. > > John >Article: 121338
I am prototyping an IP core, written in Verilog, on a Cyclone II FPGA. My application engineer wrote the application-level code in C. Can I simulate both together in the Cadence simulation environment, so that I can find the bug under realistic conditions? Can anyone suggest how to do this? I am in a desperate situation. Please help me. Thanking you, kumarArticle: 121339
On Jul 2, 7:47 pm, Matthew Hicks <mdhic...@uiuc.edu> wrote: > Nice rant, but it would make any difference. Video cards are an ASIC that > is built for one thing and one thing only, massively parallel stream processing. > An FPGA is a dynamic device that can do that plus a myrid of other things, > the problem being that it could NEVER do the job as well as a specialized > ASIC can. By the way, the programming model for the cell is by far worse > than anything I've seen on an FPGA, and the programming model for video cards > in the user domain isn't much better, but it's getting there as Nvidia prioritizes > it higher. Actually, many of the markets that FPGA computing works well, are the same that GPU/CELL will do equally well ... the difference is that IBM/ Cell and Nvidia/GPU companies know how to make a product successful in a systems HPC market. And at the same time, continue to limit applications for difficult to program FPGAs. The programming for GPU and Cell's isn't difficult for someone with SIMD, MIMD parallel programming experience, and a lot less difficult than designing custom "circuits" in FPGAs to solve these classes of problems. The point is, just as you note, an FPGA can be far more, with the right software tools ... which are lacking, leaving the market to Cell/ GPU solutions providing the right finished hardware solutions for HPC, and the software tools to back it up. A systems problem, not solved by FPGA mfgrs.Article: 121340
After some searching on this group, on the internet and on the Xilinx web site I have not found conclusive information regarding the behavior of a Spartan-3's input pin under high-voltage signaling. The circuit would use a 27Kohm series resistor to sense the presence of a 24V signal. It is a very slow signal and the series limiting resistor would use the ESD clamp diodes to keep the input voltage below the gate oxide limits. The 27K value was chosen to have the zero state input with maximum leakage current for this device (25uA). I understand the 10mA limit on the clamp diodes (100mA max for this device) would not be stressed with the near 1mA current flow but I wonder if this situation could somehow reduce the part's MTBF, and by how much. I'd like to avoid using an external diode to VCCO (or a zener, also because of its knee) since any other part in the system will reduce the overall MTBF. In case it is important there will be 56 inputs in this condition in a TQ144 package. Using the help of this group I would like to include here another question: many Xilinx documents say the clamp diodes are not present when the pin is configured as an output. Is the electronic structure of these pins so complex that it can avoid the parasitic diodes (a natural feature of a CMOS architecture)? XAPP429 and the device's data sheet also suggest the input pin structure for CoolRunner-II devices doesn't have a clamp diode to VCCO. How can it deal with ESD without the diode to VCCO? Thanks in advance for your help in this matter. -AugustoArticle: 121341
On 2007-07-02, LilacSkin <lpaulo07@iseb.fr> wrote: > res64 ->Upper = (test1 * test2) >> 32; > res64 ->Lower = test1 * test2; > > But it didn't work ! > Do you have an idea how to do that ? C does what's called "integral promotion" when doing a math operation. Basically, if you do some math with two types, the compiler chooses a type for the intermediate results. The simplified rule is: if the operand types fit in an int, use an int, otherwise use an unsigned int. Your multiply on the first line is going to have a 32 bit result on any 32-bit-int system. That means when you shift it down 32, you'll get 0. The only way to get the compiler to do the math in a wider type is to explicitly force one of them to be wider. That's what all the casting is about in the other replies you've gotten. If it's not working, it's a flaw of your compiler. The PPC target of GCC could do what you want just fine. -- Ben Jackson AD7GD <ben@ben.com> http://www.ben.com/Article: 121342
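The casting approach described here can be sketched as follows. This is an illustration only: it assumes a compiler with a working 64-bit integer type (uint64_t from stdint.h), and U64Parts is a hypothetical stand-in for the Upper/Lower struct from the earlier posts. One caveat on the explanation above: shifting a 32-bit value by 32 is formally undefined behavior in C, so the uncast version is not guaranteed to give 0 on every target.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two-word struct quoted in the thread. */
typedef struct {
    uint32_t Upper;
    uint32_t Lower;
} U64Parts;

/* Widen the operands BEFORE multiplying so the multiply itself is
 * carried out in 64 bits, then split the product into two words. */
U64Parts mul32x32_cast(uint32_t a, uint32_t b)
{
    uint64_t product = (uint64_t)a * (uint64_t)b;

    U64Parts r;
    r.Upper = (uint32_t)(product >> 32);
    r.Lower = (uint32_t)(product & 0xFFFFFFFFu);
    return r;
}
```

For the original test values, mul32x32_cast(0xFFFFFFFF, 0xBBBBBBBB) yields Upper = 0xBBBBBBBA and Lower = 0x44444445. Casting only the result, as in (uint64_t)(a * b), would be too late, because the multiply has already been truncated to 32 bits by then.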
On 2007-07-02, jjlindula@hotmail.com <jjlindula@hotmail.com> wrote: > Hello, I'm trying to decide to use an EPC16 or EPCS64 to program the > Stratix II EP2S601020C3 on my board. Can any comment which method is > better/faster? Altera's development kits are using the EPCS64 so I > leaning that direction. If you like the EPCS64 ($32), you'll love the ST M25P64 ($10). -- Ben Jackson AD7GD <ben@ben.com> http://www.ben.com/Article: 121343
AugustoEinsfeldt wrote: > After some search on this group, at the internet and at Xilinx web > site I have not found a conclusive set of informations regarding the > behavior of a Spartan-3's input pin in a high voltage signaling. > The circuit would use a 27Kohm series resistor to sense the presence > of a 24V signal. It is a very slow signal and the series limiting > resistor would use the ESD clamp diodes to keep the input voltage > below the gate oxide limits. The 27K values was chosen to have the > zero state input with maximum leakage current for this device (25uA). > I understand the 10mA limit on the clamp diodes (100mA max for this > device) would not be stressed with the near 1mA current flow but I > wonder if this situation could somehow reduce the part's MTBF and in > which amount. > I'd like to avoid using an external diode to VCCO (or a zener also > because it's knee) since any other part in the system will reduce the > overall MTBF. What about a resistor to GND ? - that will improve the noise immunity, as right now you are sensing very close to 1V, whilst 12V is a better sense level for a 24V industrial type signal. > In case it is important there will be 56 inputs in this condition in a > TQ144 package. That's ~56mA of injection current - some devices have MAX limits on the allowable total injection. What else is the device doing ? Slow edges are also not great directly into an FPGA - how slow is very slow ? > > Using the help of this group I would like to include here another > question: many documents at Xilinx says the clamp diodes are not > present when the pin is configured as outputs. Is the electronic > structure of these pins with such level of complexity that can avoid > the parasitic diodes (a natural feature for a CMOS architecture)? The > XAPP429 and device's data sheet also suggest the input pin structure > for CoolRunner-II devices doesn't have clamp diode to VCCO. How can it > deal with ESD without the diode to VCCO? 
The N FET avalanches, typically between 5 & 6V, and the energy is absorbed that way. -jgArticle: 121344
On Jul 2, 9:34 pm, Jim Granville <no.s...@designtools.maps.co.nz> wrote: > AugustoEinsfeldt wrote: > > After some search on this group, at the internet and at Xilinx web > > site I have not found a conclusive set of informations regarding the > > behavior of a Spartan-3's input pin in a high voltage signaling. > > The circuit would use a 27Kohm series resistor to sense the presence > > of a 24V signal. It is a very slow signal and the series limiting > > resistor would use the ESD clamp diodes to keep the input voltage > > below the gate oxide limits. The 27K values was chosen to have the > > zero state input with maximum leakage current for this device (25uA). > > I understand the 10mA limit on the clamp diodes (100mA max for this > > device) would not be stressed with the near 1mA current flow but I > > wonder if this situation could somehow reduce the part's MTBF and in > > which amount. > > I'd like to avoid using an external diode to VCCO (or a zener also > > because it's knee) since any other part in the system will reduce the > > overall MTBF. > > What about a resistor to GND ? - that will improve the noise immunity, > as right now you are sensing very close to 1V, whilst 12V is better > sense level for a 24V industrial type signal. > > > In case it is important there will be 56 inputs in this condition in a > > TQ144 package. > > That's ~56mA of injection current - some devices have MAX limits on > the allowable total injection. > What else is the device doing ? > > Slow edges are also not great direct into a FPGA - how slow is very slow ?. > > > I agree with Jim. Add a 3.3 or 3.9 kilohm resistor from each FPGA pin to ground. You get much better noise immunity and avoid all the (imaginary) diode current issues. Resistors are cheap, small, and very reliable... 
Peter Alfke ====================== > > Using the help of this group I would like to include here another > > question: many documents at Xilinx says the clamp diodes are not > > present when the pin is configured as outputs. Is the electronic > > structure of these pins with such level of complexity that can avoid > > the parasitic diodes (a natural feature for a CMOS architecture)? The > > XAPP429 and device's data sheet also suggest the input pin structure > > for CoolRunner-II devices doesn't have clamp diode to VCCO. How can it > > deal with ESD without the diode to VCCO? > > The N FET avalanches, typically between 5 & 6V, and the energy is > absorbed that way. > > -jgArticle: 121345
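The effect of the suggested resistor to ground can be checked with plain voltage-divider arithmetic. A minimal sketch, assuming an ideal unloaded divider (pin leakage ignored); the function names are made up, and the 1.4 V pin threshold used in the note below is an illustrative assumption, not a data sheet value.

```c
/* Voltage at the FPGA pin for a series resistor from the field signal
 * and a resistor from the pin to ground (ideal, unloaded divider). */
double divider_pin_voltage(double v_in, double r_series, double r_to_gnd)
{
    return v_in * r_to_gnd / (r_series + r_to_gnd);
}

/* Field-side voltage at which the pin reaches a given threshold. */
double divider_input_threshold(double v_pin_threshold,
                               double r_series, double r_to_gnd)
{
    return v_pin_threshold * (r_series + r_to_gnd) / r_to_gnd;
}
```

With the 27 kilohm series resistor from the original post, a 3.9 kilohm resistor to ground puts about 3.03 V on the pin for a 24 V input, and 3.3 kilohm gives about 2.61 V, so the pin never needs to rely on the clamp diode if VCCO is 3.3 V. If the pin switches near 1.4 V, the field-side threshold moves to roughly 11 to 13 V, close to the 12 V sense level recommended above.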
Gents, this is to let you know that departing the two clock signals into two separate cplds and xoring them with external single-gate-logic has improved the situation a lot. Thank you for your suggestions. Ulrich Bangert "John Larkin" <jjlarkin@highNOTlandTHIStechnologyPART.com> schrieb im Newsbeitrag news:lbkb83pbag42itmtl43up1h11gt2566124@4ax.com... > On Thu, 28 Jun 2007 11:02:38 +0200, "Ulrich Bangert" > <df6jb@ulrich-bangert.de> wrote: > > >Gents, > > > >please allow me to confront you with some strange timing behaviour which I > >have measured with an Xilinx XC95108 cpld. > > > >Consider two well conditioned clock signals of 10 MHz (both having EXACTLY > >the same frequency) entering the cpld. Inside the cpld each clock signal is > >divided by 4 by means of two d-flip-flops. The two resulting 2.5 Mhz signals > >enter an exclusive-or-gate which delivers an output signal where the > >pulse/pause-relationship directly depends on the phase relationship of the > >two input clocks. > > > >If some of you feel reminded to something that you have seen before: Yes, > >basically this is the principle of an so called linear phase comparator > >which has been used to compare high stability clocks (for example cesium > >clocks) against each other before high resolution time interval counters > >like the HP5370 or the Stanford Research SR620 were available. > > > >Now imagine one of the two clocks is de-tuned by exactly 0.001 Hz. It is a > >bit beyond the discussion HOW this is achieved but you may believe me that > >this is possible and that THIS is not part of the discussed problem. Now the > >phase relationship of the clocks changes slowly in time as does the > >pulse/pause relationship behind the xor gate. 
The pulse/pause relationship > >of the xor's output can be measured by two completely different methods: > > > >a) by generating an dc voltage which is directly proportional to the > >pulse/pause relationship (again a bit tricky if you want it to be an really > >high resolution measurement, but it can be done) > > > >b) by directly measuring the output pulse width with an high resolution time > >interval counter like the SR620 having a 25 ps single shot resolution for > >time interval measurements. > > > >It is important to note that both methods to measure can be applied at the > >same time and that both methods (although based on completely different > >physical laws) deliver results that despite some statistical fluctuations > >are basically the same. That is why I am pretty sure that what I measure is > >really an property of the signal itself and not one of the measurement > >apparatus. > > > >If I record the pulse width over time using the two methods and display it > >graphically it looks like an pretty linear relationship at the first glance. > >If however some math is applied to make it evident how good the linear > >relationship really is met then the result is that there are fluctuations in > >the pulse width in the order of some +/-450 ps from the expected values. > > > >About these fluctuations the following facts are known: > > > >1) They are not existent in the inputs clocks > > > >2) Expressed in time units as well as expressed as an dc voltage the > >fluctuations are orders of magnitude bigger than the resolution and > >precision of the time/dc measurement. > > > >3) The fluctuations are by no means of stochastical nature. Instead, If an > >positive fluctuation is noticed at an certain phase between the clock > >signals, an fluctuation of the same magnitude and sign will be noticed the > >next time when the clock signals have the same phase relationship. 
Or in > >other words: The pulse width is an direct function of the phase relationship > >of the clocks + an error function which is an direct function of the phase > >relationship between the clocks. > > > >It seems as if the phase state of one of the signals can have an linear like > >modulating effect on the phase state of the second signal (and perhaps vice > >versa). Some of you may come to the conclusion that +/-450 ps is not an > >number to cause real world troubles but in my case: The whole arrangement > >has the intention to measure phase fluctuations of the input clocks that ARE > >REALLY THERE but that are smaller at least one order of magnitude than the > >noticed errors. And that is why +/-450 ps is an real annoying number for me. > > > >Any hint will be highly appreciated > >TIA, Ulrich Bangert > > > > > > > > Sounds like crosstalk internal to the cpld, and likely the fact that > the xor gate behaves differently in the case where one edge changes, > as opposed to when both edges change simultaneously. > > I'd suggest using discrete logic, ECL or Eclips for serious > performance. > > A d-type flipflop makes a good phase detector, too. > > John >Article: 121346
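For scale, the ideal behavior of the divide-by-4 plus XOR arrangement described above can be modeled in a few lines. This is a sketch under stated assumptions: ideal 50% duty square waves, edges aligned at t = 0, and a first-order approximation for the edge drift (detuning times elapsed time divided by clock frequency). The function names are invented for the example.

```c
/* Time offset (seconds) between corresponding edges of two clocks
 * after t seconds, when one clock is detuned by detune_hz.
 * First-order approximation, assuming aligned edges at t = 0. */
double edge_offset_seconds(double f_clock_hz, double detune_hz,
                           double t_seconds)
{
    return detune_hz * t_seconds / f_clock_hz;
}

/* Width of each XOR output pulse for two 50% duty square waves of
 * period period_s whose edges are offset_s apart. The width is the
 * offset folded into the range 0 .. period_s / 2. */
double xor_pulse_width(double offset_s, double period_s)
{
    /* Reduce the offset modulo one period without needing libm. */
    double w = offset_s - period_s * (double)(long long)(offset_s / period_s);
    if (w < 0.0)
        w += period_s;
    if (w > period_s / 2.0)
        w = period_s - w;
    return w;
}
```

With 10 MHz clocks detuned by 0.001 Hz, the edges drift by about 100 ps per second. After division by 4 the period is 400 ns, so the ideal pulse width ramps from 0 to 200 ns over 2000 s and back down over the next 2000 s. The reported +/-450 ps deviation from that straight line therefore corresponds to about 4.5 s of drift.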
Ulrich Bangert wrote: > Gents, > > this is to let you know that departing the two clock signals into two > separate cplds and xoring them with external single-gate-logic has improved > the situation a lot. Good to hear that :) Can you quantify "improved the situation a lot", so readers can know what the relative jitter levels are ? Which/how many single gate devices did you use ? & which CPLDs ? -jgArticle: 121347
Jim Granville wrote: >> In case it is important there will be 56 inputs in this condition in a >> TQ144 package. > > > That's ~56mA of injection current - some devices have MAX limits on > the allowable total injection. Another caution for this situation - is to watch that the IccIO of the FPGA does not fall below ~56mA - if it does, you will need a SINKING regulator, to avoid the supply rails being pulled up - IF the VccIO pulls high, that WILL do serious things to your MTBF! :) For Source/Sinking regulators, look at DDR Termination regulators. -jgArticle: 121348
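The injection current budget discussed here can be estimated as below. This is a rough sketch, not a data sheet calculation: it assumes the upper clamp diode conducts into VCCO, and the 3.3 V VCCO and 0.6 V diode drop used in the note are assumed values. The function names are hypothetical.

```c
/* Current pushed through one pin's upper clamp diode into VCCO when
 * the field signal exceeds VCCO plus the diode drop. */
double clamp_current_per_pin(double v_signal, double v_cco,
                             double v_diode, double r_series)
{
    double v_drop = v_signal - (v_cco + v_diode);
    return (v_drop > 0.0) ? v_drop / r_series : 0.0;
}

/* Worst case: every input is high at the same time. */
double total_injection_current(int pins, double v_signal, double v_cco,
                               double v_diode, double r_series)
{
    return (double)pins
         * clamp_current_per_pin(v_signal, v_cco, v_diode, r_series);
}
```

With 24 V inputs, 27 kilohm series resistors, VCCO = 3.3 V and a 0.6 V diode drop, each pin injects about 0.74 mA and 56 pins about 42 mA, the same order as the ~56 mA quoted above (which rounds to 1 mA per pin). If the FPGA's own VCCO load is smaller than this total, the rail is pushed upward unless the regulator can sink current.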
Matthieu <m.a.t.t.h.i.e.u.m.i.c.h.o.n@laposte.net> writes: > However ISE will run faster with CPUs featuring 4 MB L2 cache, such as > the mid/high-end Intel Core2 Duo CPUs (if I recall correctly models > E6320, E6420, E6600 and higher). To give you a single data point, my MAP/PAR time reduced by about 20% when I went from a 2MB Core2Duo (E6400?) to 4MB (E6600) at the same clock speed (2.4GHz). BTW, going to the 2MB Core2Duo from my previous 3GHz P4 halved the time! Cheers, Martin -- martin.j.thompson@trw.com TRW Conekt - Consultancy in Engineering, Knowledge and Technology http://www.conekt.net/electronics.htmlArticle: 121349
Hi All, does anybody have any experience with the Xilinx <-> Philips PX1011A solutions? Thanks in advance, Francesco