Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Hello all ... I know this is an odd request, but I'm looking for some old Xilinx software to program some Xilinx XC3000 parts. If anybody has such software they are looking to give away/sell (or knows where to find it) ... let me know. Thanks and Regards -Chris
Article: 88476
To simplify, use the attributes SYN_ENUM_ENCODING and SYN_ENCODING = "original", and don't forget to disable the FSM Compiler.
Article: 88477
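For anyone unfamiliar with those attributes, here is a minimal VHDL sketch of how they are typically written (Synplify-style syntax; the type and signal names are made up for illustration, and the exact set of legal attribute values should be checked against your synthesis tool's manual):

```vhdl
-- Sketch only: declarations as they would appear in the
-- architecture declarative region. Names here are hypothetical.
type state_t is (IDLE, LOAD, RUN);

attribute syn_enum_encoding : string;
-- Pin the enumeration to an explicit encoding of your choosing:
attribute syn_enum_encoding of state_t : type is "sequential";

signal state : state_t;
attribute syn_encoding : string;
-- "original" asks the tool to keep the source encoding rather than
-- letting the FSM extractor re-encode the state machine:
attribute syn_encoding of state : signal is "original";
```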
Hi Guys, I am working on the image processing side and I wanted to implement a function on the APEX20KE device. The clock that I want to use is the internal PLL, so kindly let me know how I can do that.
Article: 88478
Hi
According to the synthesis/implementation report, there is no indication that the clock is gated. This example is just an exercise. My intention is just to make the value of an internal signal 'counter' change "0->1->2->3->4" and see that in ChipScope Pro. The problem is that the ILA core captures only the static value '4', even though the 'RST' and 'EN' signals are driven by VIO "on the fly". If it is possible to see the change of the value, instead of a static value, I am doing something wrong. What am I missing? Thank you.
Article: 88479
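For reference, a minimal sketch of the kind of counter described above (the entity and port names are guesses, not the poster's actual code):

```vhdl
-- Hedged sketch: a counter that advances 0->1->2->3->4 and then
-- holds, with the synchronous RST and EN described in the post.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity count_to_four is
  port (
    clk, rst, en : in  std_logic;
    counter      : out unsigned(2 downto 0)
  );
end entity;

architecture rtl of count_to_four is
  signal cnt : unsigned(2 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        cnt <= (others => '0');
      elsif en = '1' and cnt /= 4 then
        cnt <= cnt + 1;
      end if;
    end if;
  end process;
  counter <= cnt;
end architecture;
```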
Paul Hartke <phartke@Stanford.EDU> wrote:
> Since you didn't list the errors, it is hard to tell.

I refrain from posting so many lines, but I put it on the net: http://www.minet.uni-jena.de/~adi/xflow.log

> My sense is two Microblazes in the S3 Starter Kit will be a real
> tight squeeze.

Oh, that's good to hear. They were guessing that even five Microblazes could fit into the S3, but I never agreed with this optimistic expectation. ;)

--
mail: adi@thur.de http://adi.thur.de PGP: v2-key via keyserver
Jeder is sein Glückes Schmied, doch nich jeder hat ein schmuckes Glied
Article: 88480
Replication is usually from two sources... 1/ the fanout, as you suggest; 2/ speed improvement; 3/ borg. I would suggest there is little hope for your design as you have too much logic. The correct answer is: get a bigger device :-) However... take a look at any memories: if they are distributed and not block RAM, they will eat LUTs. Then look at shift registers... SRL16? Think about what you are trying to achieve and see if there's a simpler solution.
Simon

"Brandon" <killerhertz@gmail.com> wrote in message news:1124375215.587610.307000@g43g2000cwa.googlegroups.com...
> Hello,
>
> I'm synthesizing a design in XST and I'm having a hard time figuring
> out what's consuming all of the device's resources.
>
> I wrote mostly structural VHDL, so I decided to synthesize each
> component separately to get a better idea of the low-level utilization.
> I haven't seen any option in XST to see a hierarchical analysis of
> area... Anyway, I estimated the resource consumption of my design,
> excluding routing, the FSM, and some other small amounts of logic and
> multiplexing:
>
>         Slice Count   Slice FFs   4-input LUTs
>         -----------   ---------   ------------
>  used:        10936       29048          12406
>  total:       23616       47232          47232
>         -----------   ---------   ------------
>              46.31%      61.50%         26.27%
>
> Here is the actual:
> Number of Slices:           45523 out of 23616  192% (*)
> Number of Slice Flip Flops: 22611 out of 47232   47%
> Number of 4 input LUTs:     78378 out of 47232  165% (*)
>
> When looking in the synthesis report, I noticed some warnings
> indicating that duplicate FFs were removed, so that explains the
> reduction in FF count. However, I cannot explain the HUGE increase in
> LUT and Slice usage. What can I infer from this?
>
> The report also tells me that some of my 6-bit counter signals are
> being replicated (once or twice). What is the cause of this? High
> fan-out?
> <SNIP>
> FlipFlop cnt_dout_ins_cnt_v_0 has been replicated 2 time(s)
> FlipFlop cnt_dout_ins_cnt_v_1 has been replicated 1 time(s)
> FlipFlop cnt_hreg_ins0_cnt_v_0 has been replicated 2 time(s)
> FlipFlop cnt_hreg_ins0_cnt_v_1 has been replicated 1 time(s)
> FlipFlop cnt_hreg_ins10_cnt_v_0 has been replicated 2 time(s)
> FlipFlop cnt_hreg_ins10_cnt_v_1 has been replicated 1 time(s)
> FlipFlop cnt_hreg_ins11_cnt_v_0 has been replicated 2 time(s)
> </SNIP>
>
> Is there any way to decipher the cell usage count, perhaps? Does anyone
> have a URL that includes an explanation of all the cell names? I also
> checked the macro statistics and everything is accounted for in that
> table.
>
> Thanks.
> -Brandon
Article: 88481
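On Simon's SRL16 hint: a shift register coded without a reset (and without reading intermediate taps) can usually be inferred into a single LUT-based SRL16 instead of sixteen discrete flip-flops, which can save a lot of slices. A hedged sketch of the coding style XST recognizes:

```vhdl
-- Sketch only: a 16-deep shift register written so the tool can
-- map it to one SRL16 LUT primitive rather than 16 flip-flops.
library ieee;
use ieee.std_logic_1164.all;

entity srl_shift is
  port (
    clk  : in  std_logic;
    din  : in  std_logic;
    dout : out std_logic
  );
end entity;

architecture rtl of srl_shift is
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- No reset on the shift chain: this is what permits SRL16 inference
      sr <= sr(14 downto 0) & din;
    end if;
  end process;
  dout <= sr(15);
end architecture;
```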
Click on
http://answers.altera.com/altera/resultDisplay.do?page=http%3A%2F%2Fwww.altera.com%2Fliterature%2Fan%2Fan115.pdf&result=12&responseid=b0b5a30041c7c737%3A1381e7%3A105cd8c9f6e%3A34&groupid=1&contextid=195%3A4512.4608%2C36605.36685%2C43030.43213&clusterName=DefaultCluster&doctype=1002&excerpt=APEX+20K+Devices+APEX+20K+devices+have+one+PLL+that+features+ClockLock+and+ClockBoost+circuitry.#Goto4512
If that does not work, go to www.altera.com. Click on "Find Answers" under Support on the altera.com home page. Enter "APEX20K and PLL" in the Ask a Question box. It is the 5th answer from the top, labeled "AN115: Using the ClockLock and ClockBoost features". Hope this helps.

Subroto Datta
Altera Corp.

"Designfreek" <vryaag@gmail.com> wrote in message news:1124429585.211289.227250@z14g2000cwz.googlegroups.com...
> Hi Guys
> i am working on the image processing side and i wanted to impliment a
> function on the APEX20KE device and the clock that i want to use is the
> internal PLL so Kindly let me know how can i do that.
Article: 88482
You are not using any Service Packs. I'd upgrade to both the latest EDK and ISE service packs as a first step. Don't forget that no matter what you are doing, it's very likely that more logic than just the Microblaze core(s) itself will be required to make a useful system.

Paul

Adrian Knoth wrote:
> Paul Hartke <phartke@Stanford.EDU> wrote:
>
> > Since you didn't list the errors, it is hard to tell.
>
> I refrain from posting so many lines, but I put it on the net:
>
> http://www.minet.uni-jena.de/~adi/xflow.log
>
> > My sense is two Microblazes in the S3 Starter Kit will be a real
> > tight squeeze.
>
> Oh, that's good to hear. There were guessing that even five
> Microblazes could fit into the S3, but I never agreed with
> this optimistic expectation. ;)
>
> --
> mail: adi@thur.de http://adi.thur.de PGP: v2-key via keyserver
>
> Jeder is sein Glückes Schmied, doch nich jeder hat ein schmuckes Glied
Article: 88483
Hi Designfreek,

> Hi Guys
> i am working on the image processing side and i wanted to impliment a
> function on the APEX20KE device and the clock that i want to use is the
> internal PLL so Kindly let me know how can i do that.

I guess Subroto's answer is as complete as it can get. Just a few comments: be sure that you're targeting an APEX20KEblablabla-X device - the non-X devices' PLLs are untested. I'd personally pick a newer Altera family member, such as the Cyclone or Stratix devices - much more versatile PLLs and much higher performance - but I guess you've got to work using an existing board.

Best regards,
Ben
Article: 88484
Pasacco wrote:
> Hi
>
> According to synthesis/implementation report, there is no indication
> that the clock is gated.
>
> This example is just an exercise.
> My intention is just to make the value of an internal signal 'counter'
> change "0->1->2->3->4" and see that in ChipScope Pro.
>
> Problem is that the ILA core captures only static value '4', even
> though 'RST' and 'EN' signal are driven by VIO "on the fly".
>
> If it is possible to see the change of the value, instead of a static
> value, I am doing something wrong. What am I missing?
>
> Thankyou.

You sent me more information on this in email and added another note here. Try doing this:
1) In the ILA window, change the trigger condition to a falling-edge transition of the RESET line instead of the static low that you have now.
2) In the VIO window, pulse the RESET button.
3) Go back to the ILA window and you should see 0, 1, 2, 3, 4, 4, 4, 4...

Ed
Article: 88485
Open a webcase. We will support old software if you tell us what you need.

Austin

Chris Beg wrote:
> Hello all ...
>
> I know this is an odd request, but i'm looking for some old xilinx software
> to program some xilinx xc3000 parts. If anybody has such software they are
> looking to give away/sell (or know where to find) ... let me know.
>
> Thanks and Regards
> -Chris
Article: 88486
Hello,
Has anybody already made a comparison of the high-performance FPGAs (Stratix II, V4, ...) with respect to double-precision floating-point performance (add, mult, div, etc.)? It's for an HPC application.
Thanks
Marc
Article: 88487
Hi
Falling-edge trigger, and now it works. Thank you for the nice comment and correction.
With gratitude
Article: 88488
Does anybody know how to program access to the Altera USB Blaster? I am trying to port Altera's SRunner software, which currently only supports a ByteBlaster II on a Windows environment, to support the USB Blaster (since this is the only download cable I have) on either Windows or Linux. Any help would be appreciated; Altera does not seem to release the API.

--
Dr Joachim Schambach
The University of Texas at Austin
Department of Physics
1 University Station C1600
Austin, Texas 78712-0264, USA
Phone: (512) 471-1303; FAX: (814) 295-5111
e-mail: jschamba@physics.utexas.edu
Article: 88489
I have inherited a nearly-working FPGA SDRAM controller, but my testing shows I have got the structure wrong, partly due to lack of data on Kingston's site. The module in question is the Kingston KVR133X64C3/1G. The Verilog I have inherited caters for 11 column bits, 13 row bits, 4 banks and two select lines. The module has sixteen chips on it, which I thought might be eight bits wide each, so there would have to be two chip select lines. But my testing shows something wrong with the way I assign row/column/bank/CS. Maybe it is in fact sixteen 4-bit chips and just the one chip select, but a test assuming that shows I'm still losing a bit somewhere. So what is the structure of this module, and does the column go out on A0-A9(,A11,A12)? Googling throws up surprisingly little data, given that I'm not out to buy them.

Jon
Article: 88490
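For what it's worth, the organization can be estimated from the capacity alone (a hedged back-of-envelope, since Kingston's datasheet isn't to hand): a 1 GB module built from sixteen chips implies 512 Mbit per chip, and a 64-bit DIMM bus built from x8 chips gives two ranks of eight, hence two chip selects. The column width then follows from the 4 banks and 13 row bits:

```latex
% 1 GB module, 16 chips, 64-bit data bus:
\frac{8192\ \text{Mbit}}{16\ \text{chips}} = 512\ \text{Mbit/chip}
\qquad
\frac{64\ \text{bit bus}}{8\ \text{bit/chip}} = 8\ \text{chips/rank} \Rightarrow 2\ \text{ranks}
% Column bits for a 512 Mbit x8 part, 4 banks, 13 row bits:
\qquad
\frac{2^{29}\ \text{bits}}{2^{2}\ \text{banks} \cdot 2^{13}\ \text{rows} \cdot 2^{3}\ \text{bits}} = 2^{11}\ \text{columns}
```

Eleven column bits on an x8 SDR SDRAM part are conventionally driven on A0-A9 plus A11, with A10 reserved for auto-precharge - but this should be confirmed against the actual chip datasheet.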
Marc,

IEEE floating point standard? You need to be more specific.

Does it need to integrate with a processor? I believe the Xilinx IBM 405 PowerPC using the APU interface in Virtex 4 with the floating point IP core provides the best and fastest performance. Especially since no other FPGA vendor has a hardened processor to compete with us.

If all you want is the floating point processing, without a microprocessor, then I think you will find similar performance between Xilinx and our competition, with us (of course) claiming the superior performance edge. It would not surprise me at all to see them also post claiming they are superior.

For a specific floating point core, with a given precision, for given features, it would be pretty easy to benchmark, so there is very little wiggle room here for marketing nonsense.

I would be interested to hear from others (not competitors) about what floating point cores they use and how well they perform (as you obviously are interested).

Austin

Marc Battyani wrote:
> Hello,
>
> Does anybody already made a comparison of the high performance FPGA (Stratix
> II, V4, ?) relative to double precision floating point performance (add,
> mult, div, etc.) ?
>
> It's for an HPC aplication.
>
> Thanks
>
> Marc
Article: 88491
Marc Battyani (Marc.Battyani@fractalconcept.com) wrote:
: Hello,

: Does anybody already made a comparison of the high performance FPGA (Stratix
: II, V4, ?) relative to double precision floating point performance (add,
: mult, div, etc.) ?

: It's for an HPC aplication.

Hi Marc,
I don't have a comparison of various cores, but a lot of info is out there in datasheets.

However, in an HPC application the performance of your maths cores may not be the bottleneck; rather, it is likely to be a question of how fast you can interface the host system to the FPGA, how fast you can shunt data around between CPU, CPU RAM, FPGA and FPGA RAM, etc.

The heavyweight HPC/FPGA hybrid systems I have seen, such as the Cray XD1 and SGI NUMAflex/Altix stuff, use Xilinx FPGAs. Although I wouldn't want to generalise for the whole field, other interested parties such as Nallatech and Starbridge Systems tend to go for Xilinx. Certainly Xilinx seem to have a head start in the field (not thanks to their tools, from the word on the street :-) - possibly this has more to do with interfacing than FP core performance.

Not answering the original question, but there you go :-)

Cheers,
Chris

(A strong believer in FPGA-type stuff for HPC, although perhaps the granularity is less than optimal and the tools not very well suited, but hey, it's early days.)

: Thanks
: Marc
Article: 88492
While an x86 or Cell cluster could whip an FPGA at IEEE FPU in raw clock speed (I am not sure about cost, though), you can flip the odds some by defining your own numerics with a direct mapping to the plentiful 18-bit muls.

If I am not mistaken, IEEE is not the be-all and end-all of FPU, and it has a certain number of detractors, especially in some fields, regarding rounding, exceptions, etc. If you do define your own FP set, you can simulate it fairly easily right in your HPC app and see if it gives comparable results. For instance, 1, 2 or 4 multipliers running a 37-bit mantissa might be enough to not use double IEEE; only you can figure that out.

I think I'd even go for a custom CPU design with a highly serial 18x18 datapath and try to pump it as fast as the fabric will allow. I notice that the soft-core FPUs out there don't run anywhere near the 300 MHz speeds being quoted for mul units. Perhaps the V4 500 MHz DSP block can be microcoded into a decent FPU unit, but as soon as you need the odd features...

Anyway, I think that's what I would do. If that doesn't work too well, then I'd look at QinetiQ and other vendors; these links can be found on the X, A sites.

So what is your app and what hardware are you running on?
Article: 88493
"Austin Lesea" <austin@xilinx.com> wrote
> Marc,
>
> IEEE floating point standard? You need to be more specific.

IEEE 754. It's for a computational accelerator. It will get values from a general purpose processor (Xeon, Itanium, etc.) and send the results back in the same format, though the internal computations could be done in another format. The other stuff needed is pretty standard (PCI(-X or Express), DDR2, etc.).

> Does it need to integrate with a processor?

No.

> I believe the Xilinx IBM 405 Power PC using the APU interface in Virtex
> 4 with the floating point IP core provides the best and fastest performance.
>
> Especially since no other FPGA vendor has a hardened processor to
> compete with us.

OK, that one is easy. ;-)

> If all you want is the floating point processing, without a
> microprocessor, then I think you will find similar performance between
> Xilinx and our competition, with us (of course) claiming the superior
> performance edge.

The idea is to hardwire some formula by doing the maximum number of concurrent FLOPs. This is the only way to go faster than a very fast processor like an Itanium II or even a simple Xeon.

> It would not surprise me at all to see them also post claiming they are
> superior.
>
> For a specific floating point core, with a given precision, for given
> features, it would be pretty easy to bench mark, so there is very little
> wiggle room here for marketing nonsense.
>
> I would be interested to hear from others (not competitors) about what
> floating point cores they use, and how well they perform (as you
> obviously are interested).

Sure! And this time it should be easy to get useful technical numbers.

Marc
Article: 88494
JJ,

Perhaps you should read http://www.xilinx.com/bvdocs/ipcenter/data_sheet/floating_point.pdf first?

At 429 MHz for a Virtex 4, a square root takes 56 clocks, or 130.5 ns for the answer: 7.663 million floating point square roots per second. And if you need more, you can implement more than one core and get more than one answer per 56 clocks... I am not aware of any x86 that can run quite that fast (even for one core). Their claims are that the floating point hardware unit speeds up the software execution by at least a factor of 5. We are talking here about a speedup of 80 to 100 times over using fixed point integer software to emulate a floating point square root... not a factor of 5!

Austin

JJ wrote:
> While an x86, or cell cluster could whip FPGA at IEEE FPU in raw clock
> speed ( I am not sure about cost though), you can flip the odds some by
> defining your own numerics with a direct mapping to the plentifull
> 18bit muls.
>
> If I am not mistaken IEEE is not the be all and end all of FPU and has
> a certain no of detractors esp in some fields regarding rounding,
> exceptions etc. If you do define your own FP set you can simulate it
> farely easly right on your HPC app and see if it gives comparable
> results. For instance 1,2 or4 multipliers running a 37b mantissa might
> be enough to not use double IEEE, only you can figure that out.
>
> I think I even go for a custom cpu design with a highly serial by 18.18
> datapath and try to pump it as fast as the fabric will allow. I notice
> that the soft core FPUs out there don't run anywhere near the 300MHz
> speeds being quoted for mul units. Perhaps the V4 500MHz DSP block can
> be microcoded into a decent FPU unit but as soon as you need the odd
> features,
>
> Anyway I think thats what I would do, if that doesn't work too well
> then I look at qinetix and other vendors, these links can be found on
> the X,A sites.
>
> So what is your app and what hardware are you running on?
Article: 88495
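Austin's figures are internally consistent; as a quick check of the arithmetic:

```latex
t_{\mathrm{sqrt}} = \frac{56\ \text{cycles}}{429\times 10^{6}\ \text{cycles/s}} \approx 130.5\ \text{ns}
\qquad
\frac{429\times 10^{6}\ \text{cycles/s}}{56\ \text{cycles/op}} \approx 7.66\times 10^{6}\ \text{square roots/s}
```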
"c d saunter" <christopher.saunter@durham.ac.uk> wrote:
> Marc Battyani (Marc.Battyani@fractalconcept.com) wrote:
> : Hello,
>
> : Does anybody already made a comparison of the high performance FPGA (Stratix
> : II, V4, ?) relative to double precision floating point performance (add,
> : mult, div, etc.) ?
>
> : It's for an HPC aplication.
>
> Hi Marc,
> I don't have a comparisom of various cores but a lot of info is out
> there in datasheets.
>
> However, in an HPC application the performance of your maths cores may not
> be the bottleneck, rather it is likely to be a question of how fast can
> you interface the host system to the FPGA, how fast can you shunt data
> around between CPU, CPU RAM, FPGA and FPGA RAM etc.

Yes, memory bandwidth is one of the bottlenecks, especially for the general purpose processors.

> The heavyweight HPC/FPGA hybrid systems I have seen, such as the Cray-XD1
> and SGI NUMAflex/Altix stuff use Xilinx FPGAs.

Very interesting. In fact this is what we want to do (on a smaller scale, probably ;-). I find it somewhat depressing to see that Cray can't come up with something much better than a bunch of FPGAs, but at the same time it's very cool to have access to the same technology as Cray. Or even better, as they seem to use Virtex II :)

> Although I wouldn't want to generalise for the whole field, other
> interested parties such as Nallatech and Starbridge Systems tend to go
> for Xilinx.

OK.

> Certianly Xilinx seem to have a head start in the field (not thanks to
> their tools from the word on the street :-) - possibly this has more to do
> with interfacing than FP core performance.
>
> Not answering the origional question, but there you go :-)

Well, in fact I'm also interested in the whole HPC/FPGA question anyway.

> Cheers,
> Chris
>
> (A strong believer in FPGA type stuff for HPC, although perhaps the
> granularity is less than optional and the tools not very well suited, but
> hey it's early days.)

Sure, much fun anyway.

Marc
Article: 88496
"JJ" <johnjakson@yahoo.com> wrote in message news:1124484934.397020.194050@g43g2000cwa.googlegroups.com...
> While an x86, or cell cluster could whip FPGA at IEEE FPU in raw clock
> speed ( I am not sure about cost though), you can flip the odds some by
> defining your own numerics with a direct mapping to the plentifull
> 18bit muls.

Using a grid is fine when the problem can be parallelized with rather coarse granularity, but that's not always the case.

> If I am not mistaken IEEE is not the be all and end all of FPU and has
> a certain no of detractors esp in some fields regarding rounding,
> exceptions etc. If you do define your own FP set you can simulate it
> farely easly right on your HPC app and see if it gives comparable
> results. For instance 1,2 or4 multipliers running a 37b mantissa might
> be enough to not use double IEEE, only you can figure that out.

Yes, I thought about using a 36-bit mantissa to reduce the number of hard multipliers needed and the latency. The inputs/outputs need to be in IEEE 754, though.

> I think I even go for a custom cpu design with a highly serial by 18.18
> datapath and try to pump it as fast as the fabric will allow. I notice
> that the soft core FPUs out there don't run anywhere near the 300MHz
> speeds being quoted for mul units. Perhaps the V4 500MHz DSP block can
> be microcoded into a decent FPU unit but as soon as you need the odd
> features,
>
> Anyway I think thats what I would do, if that doesn't work too well
> then I look at qinetix and other vendors, these links can be found on
> the X,A sites.
>
> So what is your app and what hardware are you running on?

The apps can be rather diverse. In fact, as Christopher pointed out, it looks like we are doing some kind of small Cray XD1 ;-) As for the hardware, we are designing it.

Marc
Article: 88497
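The appeal of trimming the mantissa to a multiple of 18 bits can be seen from a partial-product count (hedged arithmetic: an IEEE 754 double carries a 53-bit significand including the hidden bit, and this ignores the bookkeeping for the hard multipliers' signed inputs):

```latex
\lceil 53/18 \rceil = 3\ \text{limbs} \Rightarrow 3 \times 3 = 9\ \text{partial products per multiply}
\qquad
\lceil 36/18 \rceil = 2\ \text{limbs} \Rightarrow 2 \times 2 = 4\ \text{partial products}
```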
Hi Austin,

Very interesting, but V4/S3E is still pretty darn new; I don't check on it every 5 mins. QinetiQ is definitely hot in this area (not surprising given their (sq) roots at RSRE).

At some point I will do a detailed study of FPGA FPU design vs. x86 FPU numbers for my transputer project.

From the OP's website I can't guess what iron he'd use, but the application seems a bit clearer now. Usually when I see HPC-FPGA, I might infer somebody working with Opteron + Virtex-II Pro systems like the Cray or SGI kits, but it doesn't look like it here.

Regards
JJ
Article: 88498
JJ,

Something I just couldn't find anywhere was the actual performance of the x86 co-processor for something like a floating point square root. We have clock cycles for each IEEE floating point operator, and the speed of the synthesized, placed and routed core for various families, from Spartan 3 to Virtex 4, in that pdf file.

I suppose uP software people don't really care about performance in terms of cycles or ns or mops... it's all about what game screen graphics are displayed in the coolest fashion...

Does anyone have a link to a site that has 'real' data on floating point op performance?

Austin

JJ wrote:
> Hi Austin
>
> Very interesting, but V4,S3E is still pretty darn new, I don't check on
> it every 5mins but QinetiQ is definitely hot in this area (not
> surprising given their (sq) roots at RSRE).
>
> At some point I will do a detailed study of FPGA FPU design v x86 FPU
> numbers for my transputer project.
>
> From the OPs website I can't guess what iron he'd use but the
> application seems a bit clearer now.
>
> Usually when I see HPC-FPGA, I might infer somebody working with
> Opteron+VirtexII Pro sytems like Cray, SGI kits but doesn't look like
> it here.
>
> Regards
>
> JJ
Article: 88499
Austin,

Well, I can't help with any FPU performance links just yet; I would have thought that QinetiQ would have a lot of that. I always thought square root was in the same ballpark as division for cycles. The comp.arch NG often has sessions on FP math; some of the regulars are fairly clued up on it, especially Nick M. As they say, the only good benchmark is your own application, and usually there are many other factors involved than raw SpecFP numbers. At least FPGAs can be fairly "transparent" (if you can put a prototype synthesis together), whereas figuring timing for OoO code can be tricky.

JJ