On Fri, 02 Jul 1999 10:21:16 -0700, Peter Alfke <peter@xilinx.com> wrote: >I forgot them completely, sorry. >So, Lucent and Atmel are now the only remaining companies that handle their PLD >business as a sideline, deriving most of their revenues from other product lines. All >other PLD companies are "pure players". Considering that Lucent Technologies had something like a $23 BILLION revenue in 1997 (as I recall) it would be a little difficult for even the combined revenue of Xilinx, Altera and everybody else in the quaint little PLD industry to amount to anything more than a "sideline" of that size of revenue! However, as I'm sure Peter knows, the key in this game is MARGIN, right? Most PLD "players" manage to scrape by on margins that silicon foundries and vendors of standard products can only dream of. Cheers Stuart For Email remove "NOSPAM" from the addressArticle: 17151
> > Steven Casselman <sc@vcc.com> wrote in article > > <376FF120.6A53D036@vcc.com>... > > > A PCI target on an computer using an Intel PCI > > > bridge can expect 80MBytes/sec on transfers > > > going to a board and about 10-12MBytes/sec > > > comming from a board. > > > > > > These numbers vary. > > > > Do you have more specific data? As you said, these numbers can be all over > > the place, so including a bit more about your 'results' would certainly be > > helpful... > > > > What was the size of the transfer? > > Sustained or burst? > > From where to where? > > What chip set? > > What CPU? > > > > Using the CPU to do the transfer, you might see that for a single (whose > > transfer size would be limited by the x86 instruction set) transfer, but > > certainly not sustained. > > > > Austin > > The program takes an word (32-bits) buffer( 256 to 4K bytes on > word boundaries) then the CPU transfers this to board. Just > WinTell boxes. You have to write a little assembly > program (posted long ago) and call that for each of the transfers. Not quite what I was looking for. You said you have seen 80M writing from the CPU to a PCI target. I would like to know the specifics under which you saw 80M/sec. I have never seen anything close to that, except for a single CPU 'sized' burst transfer. Your read numbers seem more in line. AustinArticle: 17152
ronak@hclt.com wrote: > Hi, > > Does anybody know how to use Virtex Block ram in Verilog > HDL based Synopsys synthesis flow. The datasheets and app > notes gives only info about architecture and symbols but > not about how to put it in HDL based flow. > > Thanks in advance > -ronak > > Sent via Deja.com http://www.deja.com/ > Share what you know. Learn what you don't. Hello. If you have the symbols, you can just instantiate them and create the macro with Coregen. If Synopsys can recognize it, another solution is to write the Verilog source code. It is a classic memory with the appropriate depth and I/O. kunze@fbh_berlin.deArticle: 17153
As an experiment, I am trying to prototype SUN's picoJava-II processor (sans caches and FPU) on a Virtex 1000. However, PAR has already been working on the problem for 73h on a 300Mhz UltraSPARC-II machine and appears to be stuck after placement and the detection/disabling of circuit loops. I am very much willing to continue running PAR, but at the moment, it is not clear that anything useful is happening at all. The process has grown to over 700MB (no problem, this is a 1GB RAM machine) and does not perform any system calls (checked with truss). Should I be more patient, or is the tool just spinning its wheels? Thanks, Andreas Koch -- -- -- -- -- -- -- -- BEGIN PAR OUTPUT -- -- -- -- -- -- -- PAR: Xilinx Place And Route M1.5.25. Copyright (c) 1995-1998 Xilinx, Inc. All rights reserved. Fri Jul 2 15:44:16 1999 par -w -n 5 -l 5 -c 2 -d 2 picojava picojava.dir Constraints file: picojava.pcf Loading device database for application par from file "picojava.ncd". "cpu" is an NCD, version 2.27, device xcv1000, package bg560, speed -6 Loading device for application par from file 'v1000.nph' in environment /cad/xilinx. Device speed data version: x1_0.80 1.81 Advanced. Writing design to file "/var/tmp/xil_AAAa00117". Device utilization summary: Number of External GCLKIOBs 1 out of 4 25% Number of External IOBs 171 out of 404 42% Number of SLICEs 9056 out of 12288 73% Number of GCLKs 1 out of 4 25% Number of TBUFs 96 out of 12544 1% Overall effort level (-ol): 5 (set by user) Placer effort level (-pl): 5 (default) Placer cost table entry (-t): 1 Router effort level (-rl): 5 (default) Timing method (-kpaths|-dfs): -kpaths (default) Starting initial Timing Analysis. REAL time: 1 mins 11 secs 10607 circuit loops found and disabled. Finished initial Timing Analysis. REAL time: 15 mins 28 secs Starting initial Placement phase. REAL time: 15 mins 31 secs Finished initial Placement phase. REAL time: 17 mins 43 secs Writing design to file "picojava.dir/5_5_1.ncd". Starting the placer. 
REAL time: 17 mins 54 secs Placer score = 15545624 Placer score = 16969702 Placer score = 15249491 Placer score = 14640617 Placer score = 14266472 Placer score = 14042333 Placer score = 13540962 Placer score = 12933388 Placer score = 12478199 Placer score = 11958896 Placer score = 11519577 Placer score = 11223801 Placer score = 10854976 Placer score = 10546610 Placer score = 10270522 Placer score = 9925386 Placer score = 9577246 Placer score = 9242148 Placer score = 8931156 Placer score = 8628203 Placer score = 8308718 Placer score = 8056360 Placer score = 7820600 Placer score = 7567580 Placer score = 7221818 Placer score = 7067358 Placer score = 6841047 Placer score = 6652239 Placer score = 6496303 Placer score = 6298430 Placer score = 6121752 Placer score = 5995660 Placer score = 5840426 Placer score = 5681994 Placer score = 5575427 Placer score = 5440834 Placer score = 5331105 Placer score = 5236587 Placer score = 5139598 Placer score = 4988285 Placer score = 4919016 Placer score = 4832865 Placer score = 4731888 Placer score = 4639983 Placer score = 4552282 Placer score = 4491128 Placer score = 4413904 Placer score = 4336166 Placer score = 4263983 Placer score = 4204713 Placer score = 4142192 Placer score = 4072188 Placer score = 4019799 Placer score = 3959659 Placer score = 3925500 Placer score = 3867834 Placer score = 3820891 Placer score = 3778140 Placer score = 3735613 Placer score = 3698179 Placer score = 3650582 Placer score = 3617989 Placer score = 3590920 Placer score = 3552244 Placer score = 3527766 Placer score = 3490500 Placer score = 3462919 Placer score = 3445960 Placer score = 3370117 Placer score = 3304339 Placer score = 3255391 Placer score = 3218705 Placer score = 3194370 Placer score = 3175918 Placer score = 3164362 Placer score = 3156328 Placer score = 3151332 Placer score = 3148281 Placer score = 3146661 Placer completed in real time: 4 hrs 44 mins 7 secs Writing design to file "picojava.dir/5_5_1.ncd". Starting Optimizing Placer. REAL time: 4 hrs 44 mins 24 secs Optimizing . Swapped 471 comps. Xilinx Placer [1] 3138123 REAL time: 4 hrs 45 mins 23 secs Optimizing . Swapped 42 comps. Xilinx Placer [2] 3137363 REAL time: 4 hrs 46 mins 18 secs Optimizing . Swapped 10 comps. Xilinx Placer [3] 3137216 REAL time: 4 hrs 47 mins 12 secs Finished Optimizing Placer. REAL time: 4 hrs 47 mins 12 secs Writing design to file "picojava.dir/5_5_1.ncd". Starting IO Improvement. REAL time: 4 hrs 47 mins 30 secs Placer score = 3088022 Finished IO Improvement. REAL time: 4 hrs 47 mins 32 secs Total REAL time to Placer completion: 4 hrs 48 mins Total CPU time to Placer completion: 4 hrs 47 mins 22 secs 10607 circuit loops found and disabled. -- Andreas Koch Email : koch@eis.cs.tu-bs.de Technische Universit"at Braunschweig Phone : x49-531-391-2384 Abteilung Entwurf integrierter Schaltungen Phax : x49-531-391-5840 Gaussstr. 11, D-38106 Braunschweig, Germany * PGP key available on request *Article: 17154
What quantity do you need for your order, and which country are you in? Lewis
Karim LIMAM wrote in message <7lfnhf$oa1$1@arcturus.ciril.fr>... >Hi, > >I'm looking for the prices of the Altera Flex 10K (10K40 .. 10K130E). Has >somebody an idea ? > >Thanks. > > >kerim el imem > >Article: 17155
Ouch, I stand corrected! ----------------------------------------------------------- Steven K. Knapp OptiMagic, Inc. -- "Great Designs Happen 'OptiMagic'-ally" E-mail: sknapp@optimagic.com Web: http://www.optimagic.com ----------------------------------------------------------- Barry Gershenfeld wrote in message <377BD95B.468D@centercomm.com>... >Steven K. Knapp wrote: >> >> There are several companies that provide free or low-cost software for FPGAs >> and CPLDs. Check out The Programmable Logic Jump Station at >> http://www.optimagic.com/lowcost.shmtl. The site also has links to > ^^^ > shtml > >Moral: NEVER type a URL--paste it! > >:-) Barry >Article: 17156
Operand size is one problem. You need a 24 x 24 bit mantissa multiplier to do single precision FP multiplication (53 x 53 bits for double precision). In addition/subtraction you need to align the operands, add them and then normalize. This operation takes up a large number of logic levels, much larger than integer addition. FPUs are usually done in full custom designs. Even standard cell is not that performance/area efficient for a full IEEE compliant FPU. Roland Paterson-Jones <rolandpj@bigfoot.com> wrote: >Hi > >It has been variously stated that fpga's are no good for floating point >operations. Why? As I see it, floating point operations are typically >just shifted integer operations. Is the bit-width problematic? > >Thanks for any help/opinion >Roland muzo Verilog, ASIC/FPGA and NT Driver Development Consulting (remove nospam from email)Article: 17157
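To make the align/add/normalize sequence described above concrete, here is a minimal Verilog sketch of a toy floating-point adder with a 5-bit exponent and a 10-bit stored mantissa. It assumes positive operands only, truncates instead of rounding, and ignores zero, exponent overflow and denormals; the module and signal names are illustrative, not taken from any posted design.

// Toy floating point adder: 5-bit exponent, 10-bit stored mantissa with a
// hidden leading 1.  Positive operands only, truncation instead of rounding,
// no zero/overflow/denormal handling -- just the align/add/normalize flow.
module fp16_add_sketch (
    input  wire [14:0] a,    // {exp[4:0], mant[9:0]}
    input  wire [14:0] b,
    output reg  [14:0] sum
);
    wire [4:0]  ea = a[14:10], eb = b[14:10];
    wire [10:0] ma = {1'b1, a[9:0]};   // restore the hidden bit
    wire [10:0] mb = {1'b1, b[9:0]};

    // 1) compare exponents and swap so 'big' has the larger exponent
    wire        a_ge_b = (ea >= eb);
    wire [4:0]  e_big  = a_ge_b ? ea : eb;
    wire [4:0]  e_diff = a_ge_b ? (ea - eb) : (eb - ea);
    wire [10:0] m_big  = a_ge_b ? ma : mb;
    wire [10:0] m_sml  = a_ge_b ? mb : ma;

    // 2) align the smaller mantissa (this shift is one of the barrel shifters)
    wire [10:0] m_aligned = m_sml >> e_diff;

    // 3) add the 11-bit mantissas -> at most 12 bits
    wire [11:0] m_sum = m_big + m_aligned;

    // 4) normalize: same-sign addition overflows by at most one position
    always @* begin
        if (m_sum[11])
            sum = {e_big + 5'd1, m_sum[10:1]};  // shift right, bump exponent
        else
            sum = {e_big, m_sum[9:0]};
    end
endmodule

The alignment shift (m_sml >> e_diff) is one of the barrel shifters discussed later in the thread; handling signed operands would add a subtract path and a much wider normalization shifter after it.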
Hi All, The web newsgroup interface at http://www.dacafe.com/newsgroups is the best way I've found to access newsgroups. It lets me view every available newsgroup, see and listen to binaries, and it's really easy to use. Try it out. You'll find that it will save you time and effort. Best regards, David HellerArticle: 17158
Andreas Koch wrote: > As an experiment, I am trying to prototype SUN's picoJava-II processor > (sans caches and FPU) on a Virtex 1000. However, PAR has already been > working on the problem for 73h on a 300Mhz UltraSPARC-II machine and > appears to be stuck after placement and the detection/disabling of > circuit loops. > > I am very much willing to continue running PAR, but at the moment, it > is not clear that anything useful is happening at all. The process has > grown to over 700MB (no problem, this is a 1GB RAM machine) and > does not perform any system calls (checked with truss). > > Should I be more patient, or is the tool just spinning its wheels? > > Thanks, > Andreas Koch > > > 10607 circuit loops found and disabled. > -- > Andreas Koch Email : koch@eis.cs.tu-bs.de > Technische Universit"at Braunschweig Phone : x49-531-391-2384 > Abteilung Entwurf integrierter Schaltungen Phax : x49-531-391-5840 > Gaussstr. 11, D-38106 Braunschweig, Germany * PGP key available on request * Hello, I would stop it and try to understand why part of the design is being disabled. In any case, do you think the result will be usable with that logic disabled? Hope this helps, Michel Le Mer Gerpi sa (Xilinx Xpert) 3, rue du Bosphore Alma city 35000 Rennes (France) (02 99 51 17 18) http://www.xilinx.com/company/consultants/partdatabase/europedatabase/gerpi.htmArticle: 17159
Hi all, I need some help with a Xilinx FPGA (Spartan). I notice that when I modify one circuit module in the FPGA, other parts of the same chip are affected too. After examining it carefully, I found that the cause is a clock-derived signal that picks up an unexpectedly large delay from its routing path. For example, a global clock drives a divider or counter, and an output of that divider or counter is used to directly trigger or switch other function modules; this signal normally has an extremely large delay, and what's more, the delay varies with each new implementation (because the routing path changes), so in the end it produces undesirable results. This was driving me crazy. If I also connect this signal to a global clock buffer, the problem is solved, but the global resources are limited. So can somebody give some useful advice? If I change to VHDL instead of the schematic editor, can the problem be solved? I appreciate your kind help! Sent via Deja.com http://www.deja.com/ Share what you know. Learn what you don't.Article: 17160
Hello, Does anyone have recommendations for finding benchmark circuits for FPGAs - preferably in VHDL? Thanks in advance. email: csoolan@dso.org.sgArticle: 17161
I think your problem is that you are not using dedicated resources for the second clock. You have to locate all your clocks on PRI or SEC buffers (with appropriate synthesis attributes). You can also minimize the number of clocks by using clock enable inputs. Hope this helps. lingleq@my-deja.com wrote: > Hi, all > > I need some help about the Xilink FPGA. > > I notice that when I modify a circuit module in a FPGA (Sparten), the other > part in the same chip is subject to change too. After the carefully > examining, I found that it is because some clock driven signal cause a > unexpected large delay which is produced by different route path. For > example, a global clock is chosen to drive a divider or a counter, and an > output of this divider or counter is used to directly trigger or switch the > other function modules, this signal normally cause an extremely large delay, > what's more, the delay is varying with each new implantation (cause the > change of the route path), so at last it produces some undesirable result. > I was annoyed to put fire in the circuit design. If I also connect this > signal to the global clock, the problem can be solved. But the global > resource is limited. So can somebody give some useful advice? If I change to > vhdl language not the schematic editor , can the problem be solved? > > I appreciate your kindness help! > > Sent via Deja.com http://www.deja.com/ > Share what you know. Learn what you don't. -- Ilia Oussorov, Robert Bosch GmbH, FV/FLI (fliser6@fli.sh.bosch.de)Article: 17162
It sounds like you are using logic outputs as clocks elsewhere in the design. FPGAs are best designed as synchronous circuits with just one or a very small number of clocks. This means using synchronous counters instead of ripple counters for example. Using this design technique, all of the clocked logic is clocked by a common clock. The individual flip-flops can be controlled with the clock enable at the CLB or in the CLB logic. In cases where delays are still critical, it may be necessary to floorplan the design. Floorplanning means manually directing the placement of CLBs on the die using either the RLOC attributes and FMAP and HMAP primitives, the constraints file, or the graphical floorplanner tool. Floorplanning will pretty much eliminate timing variations each time the tool is run. Changing to VHDL will only complicate the problem, as VHDL intentionally insulates the user from the FPGAs structure. lingleq@my-deja.com wrote: > Hi, all > > I need some help about the Xilink FPGA. > > I notice that when I modify a circuit module in a FPGA (Sparten), the other > part in the same chip is subject to change too. After the carefully > examining, I found that it is because some clock driven signal cause a > unexpected large delay which is produced by different route path. For > example, a global clock is chosen to drive a divider or a counter, and an > output of this divider or counter is used to directly trigger or switch the > other function modules, this signal normally cause an extremely large delay, > what's more, the delay is varying with each new implantation (cause the > change of the route path), so at last it produces some undesirable result. > I was annoyed to put fire in the circuit design. If I also connect this > signal to the global clock, the problem can be solved. But the global > resource is limited. So can somebody give some useful advice? If I change to > vhdl language not the schematic editor , can the problem be solved? > > I appreciate your kindness help! > > Sent via Deja.com http://www.deja.com/ > Share what you know. Learn what you don't. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 17163
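As a minimal sketch of the single-clock style recommended in the post above: keep the divider and the downstream logic on the same global clock and use the divider's terminal count as a clock enable rather than as a derived clock. The names below are illustrative, and the divide-by-16 ratio is an arbitrary example.

// Everything runs on the single global clock 'clk'; the divider's terminal
// count becomes a one-cycle-wide clock enable instead of a derived clock.
module ce_divider_sketch (
    input  wire       clk,
    input  wire       rst,
    output reg  [7:0] slow_count   // advances once every 16 clk cycles
);
    reg [3:0] div;
    wire      tick = (div == 4'd15);   // terminal count, used as clock enable

    always @(posedge clk) begin
        if (rst)
            div <= 4'd0;
        else
            div <= div + 4'd1;
    end

    always @(posedge clk) begin        // same clock as the divider
        if (rst)
            slow_count <= 8'd0;
        else if (tick)                 // clock enable, not a derived clock
            slow_count <= slow_count + 8'd1;
    end
endmodule

Because slow_count is clocked by clk and only gated by tick, routing delay on tick eats into setup margin instead of creating clock skew, so the behaviour no longer shifts from one place-and-route run to the next.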
David Heller wrote: > > Hi All, > > The web newsgroup interface at http://www.dacafe.com/newsgroups > is the best way I've found to access newsgroups. > > It lets me view every available newsgroup, see and listen to > binaries, and its really easy to use. > > Try it out. You'll find that it will save you time and effort. > > Best regards, > > David Heller Gee, lots of advertisements. Why not just point my web browser to the ISP's mail server? With Netscape I can sort by thread. Sometimes I use Deja News for searches. -- Steve Nordhauser Embedded Systems Manager Phone: (518) 283-7500 InterScience, Inc. Fax: (518) 283-7502 105 Jordan Road email: nords@intersci.com Troy, NY 12180 web: http://www.intersci.com "Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. ClarkeArticle: 17164
Article: 17165
While we are on the FP in FPGA discussion... What algorithms exist for floating point counters? I would like to impliment a 10 bit mantissa, 5 bit exponent FP counter that could incriment in roughly 8 clock cycles. I have very little interest in using a full FP adder for the obvious reasons. -Trevor Landon landont@ttc.com Jan Gray <jsgray@acm.org.nospam> wrote in message news:DZrf3.21$qH2.1013@paloalto-snr1.gtei.net... > Roland Paterson-Jones wrote in message <377DC508.D5F1D048@bigfoot.com>... > >It has been variously stated that fpga's are no good for floating point > >operations. Why? As I see it, floating point operations are typically > >just shifted integer operations. Is the bit-width problematic? > > For 16-bit floats with (say) 10 bit mantissas, FPGAs should be *great* for > floating point. Indeed problems start with the wider bit-widths. The > area-expensive (and worse than linear scaling) FP components are the barrel > shifters needed for pre-add mantissa operand alignment and post-add > normalization in the FP adder, and of course the FP multiplier array. > > The FCCM papers on this subject include: > > Ligon et al, A Re-evaluation of the practicality of floating-point > operations on FPGAs, FCCM 1998 > > Louca et al, Implementation of IEEE single precision floating point addition > and multiplication on FPGAs, FCCM 1996 > > Shirazi et al, Quantitative analysis of floating point arithmetic on FPGA > based custom computing machines, FCCM 1995 > > and the neat Leong paper on avoiding the problem entirely, > > Leong et al, Automating floating to fixed point translation and its > application to post-rendering 3D warping, FCCM 1999 > > > See the Ligon paper for a nice presentation of speed-area tradeoffs of > various implementation choices. Ligon estimates their single-precision FP > adder resource use at between 563 and 629 LUTs -- 36-40% of a XC4020E. Note > this group used a synthesis tool; a hand-mapped design could be smaller. > > Put another way, that single precision FP adder is almost twice the area of > a pipelined 32-bit RISC datapath. Ouch. > > > The rest of this article explores ideas for slower-but-smaller FP adders. > > The two FP add barrel shifters are the problem. They each need many LUTs > and much interconnect. For example, a w-bit-wide barrel shifter is often > implemented as lg w stages of w-bit 2-1 muxes, optionally pipelined. > > Example 1: single-precision in << s, w=24 > m0 = s[0] ? in[22:0] << 1 : in; > m1 = s[1] ? m0[21:0] << 2: m0; > m2 = s[2] ? m1[19:0] << 4 : m1; > m3 = s[3] ? m2[15:0] << 8 : m2; // 16 wires 8 high > out = s[4] ? m3[7:0] << 16 : m3; // 8 wires 16 high > ---- > 5*24 2-1 muxes = 120 LUTs > > Example 2: double-precision in << s, w=53 > m0 = s[0] ? in[51:0] << 1 : in; > m1 = s[1] ? m0[50:0] << 2: m0; > m2 = s[2] ? m1[48:0] << 4 : m1; > m3 = s[3] ? m2[44:0] << 8 : m2; // 45 wires 8 high > m4 = s[4] ? m3[36:0] << 16 : m3; // 37 wires 16 high > out = s[5] ? m4[20:0] << 32 : m4; // 21 wires 32 high > ---- > 6*53 2-1 muxes = 318 LUTs > > In a horizontally oriented datapath, the last few mux stages have many > vertical wires, each many LUTs high. This is more vertical interconnect > than is available in one column of LUTs/CLBs, so the actual area can be much > worse than the LUT count indicates. > > > BUT we can of course avoid the barrel shifters, and do FP > denormalization/renormalization iteratively. > > Idea #1: Replace the barrel shifters with early-out iterative shifters. 
For > example, build a registered 4-1 mux: w = mux(in, w<<1, w<<3, w<<7). Then an > arbitrary 24-bit shift can be done in 5 cycles or less in ~1/3 of the area. > For double precision, make it something like w = mux(in, w<<1, w<<4, w<<12), > giving an arbitrary 53-bit shift in 8 cycles. > > > Idea #2: (half baked and sketchy) Do FP addition in a bit- or nibble-serial > fashion. > > To add A+B, you > > 1) compare exponents A.exp and B.exp; > 2) serialize A.mant and B.mant, LSB first; > 3) swap (using 2 2-1 muxes) lsb-serial(A.mant) and lsb-serial(B.mant) if > A.exp < B.exp > 4) delay lsb-serial(A.mant) in a w-bit FIFO for abs(A.exp-B.exp) cycles; > 5) bit-serial-add delay(lsb-serial(A.mant)) + lsb-serial(B.mant) for w > cycles > 6) collect in a "sum.mant" shift register > 7) shift up to w-1 cycles (until result mantissa is normalized). > > It may be that steps 4 and 6 are quite cheap, using Virtex 4-LUTs in shift > register mode -- they're variable tap, right? > > It is interesting to consider eliminating steps 2, 6, and 7, by keeping your > FP mantissa values in the serialized representation between operations, > counting clocks since last sum-1-bit seen, and then normalizing (exponent > adjustment only) and aligning *both* operands (via swap/delay) on input to > the next FP operation. A big chained data computation might exploit many > serially interconnected serial FP adders and serial FP multipliers... > > Is this approach better (throughput/area) than a traditional pipelined > word-oriented FP datapath? Probably not, I don't know. But if your FP > needs are modest (Mflops not 100 Mflops) this approach should permit quite > compact FP hardware. > > (Philip Freidin and I discussed this at FCCM99. Thanks Philip.) > > Jan Gray > > >Article: 17166
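As one possible reading of Idea #1 in the quoted post above (the registered 4-1 mux stepping by 1, 3 or 7), here is a small Verilog sketch of an early-out iterative left shifter for a 24-bit mantissa; the control scheme and all names are assumptions made for illustration, not code from the post.

// Iterative left shifter: shifts 'mant' left by 'amount' using repeated
// steps of 7, 3 or 1 through a single registered 4-1 mux, trading cycles
// (at most 5 for a 24-bit operand) for much less area than a barrel shifter.
module iter_shift_left (
    input  wire        clk,
    input  wire        load,          // load new operand and shift amount
    input  wire [23:0] din,
    input  wire [4:0]  amount,        // 0..23
    output reg  [23:0] mant,
    output wire        done
);
    reg [4:0] remaining;
    assign done = (remaining == 5'd0);

    always @(posedge clk) begin
        if (load) begin
            mant      <= din;
            remaining <= amount;
        end else if (remaining >= 5'd7) begin
            mant      <= mant << 7;
            remaining <= remaining - 5'd7;
        end else if (remaining >= 5'd3) begin
            mant      <= mant << 3;
            remaining <= remaining - 5'd3;
        end else if (remaining != 5'd0) begin
            mant      <= mant << 1;
            remaining <= remaining - 5'd1;
        end
    end
endmodule

Greedy selection of the largest usable step finishes any shift of 0 to 23 places in at most five clocks, matching the "5 cycles or less" figure, while needing only one mux level instead of the five cascaded stages of a full barrel shifter.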
Trevor Landon wrote: > > While we are on the FP in FPGA discussion... > > What algorithms exist for floating point counters? I would like to > impliment a 10 bit mantissa, 5 bit exponent FP counter that could incriment > in roughly 8 clock cycles. > > I have very little interest in using a full FP adder for the obvious > reasons. I don't know that a counter is compatible with a floating point format. If you have a 10 bit mantissa, once you reach a count of 1023 how do you continue to increment by one? At a value of 1024 (or 2048 if you use a hidden bit I guess) the lsb value is 2. So incrementing by one will not change the value of the counter. If the increment value is an arbitrary size, then you need an adder. So maybe I don't understand what you are trying to do. -- Rick Collins rick.collins@XYarius.com remove the XY to email me. Arius - A Signal Processing Solutions Company Specializing in DSP and FPGA design Arius 4 King Ave Frederick, MD 21701-3110 301-682-7772 Voice 301-682-7666 FAX Internet URL http://www.arius.comArticle: 17167
Check out app note 130, http://www.xilinx.com/xapp/xapp130.pdf

module MYMEM (CLK, WE, ADDR, DIN, DOUT);
input CLK, WE;
input [8:0] ADDR;
input [7:0] DIN;
output [7:0] DOUT;
wire logic0, logic1;
//synopsys dc_script_begin
//set_attribute ram0 INIT_00 "0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF" -type string
//set_attribute ram0 INIT_01 "FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210" -type string
//synopsys dc_script_end
assign logic0 = 1'b0;
assign logic1 = 1'b1;
RAMB4_S8 ram0 (.WE(WE), .EN(logic1), .RST(logic0), .CLK(CLK), .ADDR(ADDR), .DI(DIN), .DO(DOUT));
//synopsys translate_off
defparam ram0.INIT_00 = 256'h0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF;
defparam ram0.INIT_01 = 256'hFEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210;
//synopsys translate_on
endmodule

Le mer Michel wrote: > > ronak@hclt.com wrote: > > > Hi, > > > > Does anybody know how to use Virtex Block ram in Verilog > > HDL based Synopsys synthesis flow. The datasheets and app > > notes gives only info about architecture and symbols but > > not about how to put it in HDL based flow. > > > > Thanks in advance > > -ronak > > > > Sent via Deja.com http://www.deja.com/ > > Share what you know. Learn what you don't. > > Hello > > If you have the symbols, you can just instatiate them and create the > macro with Coregen. > > If Synopssys can recognize it, an other solution is to write the Verilog > source code. It is a classic memory with appropriated depth and I/O. > > kunze@fbh_berlin.de
-- Paulo Dutra (paulo@xilinx.com), Xilinx, hotline@xilinx.com, 2100 Logic Drive, San Jose, California 95124-3450 USA, (800) 255-7778 Article: 17168
> > > Not quite what I was looking for. You said you have seen 80M writing from > the CPU to a PCI target. I would like to know the specifics under which > you saw 80M/sec. I have never seen anything close to that, except for a > single CPU 'sized' burst transfer. Your read numbers seem more in line. > > Austin You have to use the assembly code to get the Intel chip set to aggregate the writes, otherwise you will see something more in the range of 20-40 meg/sec.

// word is unsigned int
void PCICore::write(word addr, word data, word count)
{
    // line things up: compute the mapped bus address for the transfer
    word busAddr = ((addr << 2) | _offset) + _memBase;
    word *dptr = &data;
    __asm {
        push edi
        push ecx
        push esi
        mov esi, dptr      // source: the data buffer
        mov edi, busAddr   // destination: the mapped board address
        mov ecx, count     // number of 32-bit words
        cld
        rep movsd          // burst the words out back-to-back
        pop esi
        pop ecx
        pop edi
    }
} // end write

-- Steve Casselman, President Virtual Computer Corporation http://www.vcc.comArticle: 17169
For FPGAs with carry chains, you'll find it extremely hard to beat the performance of a ripple carry adder for widths to around 32 bits. For multiplication in FPGAs, you might look at the summary I have on my website under the DSP in FPGAs page. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 17170
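A quick illustration of the point above, with an arbitrary 32-bit width: simply describing the add behaviourally lets the synthesizer map it onto the dedicated carry chain, which is the ripple-carry adder that is so hard to beat at these widths.

// A registered 32-bit adder.  On carry-chain FPGAs (XC4000X, Virtex, etc.)
// synthesis maps the '+' onto the dedicated carry logic, usually faster and
// smaller than a hand-built carry-lookahead or carry-select adder here.
module rca32 (
    input  wire        clk,
    input  wire [31:0] a,
    input  wire [31:0] b,
    output reg  [32:0] sum    // extra bit captures the carry out
);
    always @(posedge clk)
        sum <= a + b;         // infers the ripple-carry chain
endmodule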
I think the counter has to be fixed point with as many bits as is required to represent the maximum count desired. If it were floating point, you'd need a complementary counter underneath it to resolve increment values below the precision of the mantissa as the count increases. If you need a floating point output from the counter, you would use a fixed point counter of the appropriate width followed by a normalizing barrel shifter. Trevor Landon wrote: > While we are on the FP in FPGA discussion... > > What algorithms exist for floating point counters? I would like to > impliment a 10 bit mantissa, 5 bit exponent FP counter that could incriment > in roughly 8 clock cycles. > > I have very little interest in using a full FP adder for the obvious > reasons. > > -Trevor Landon > landont@ttc.com > > Jan Gray <jsgray@acm.org.nospam> wrote in message > news:DZrf3.21$qH2.1013@paloalto-snr1.gtei.net... > > Roland Paterson-Jones wrote in message <377DC508.D5F1D048@bigfoot.com>... > > >It has been variously stated that fpga's are no good for floating point > > >operations. Why? As I see it, floating point operations are typically > > >just shifted integer operations. Is the bit-width problematic? > > > > For 16-bit floats with (say) 10 bit mantissas, FPGAs should be *great* for > > floating point. Indeed problems start with the wider bit-widths. The > > area-expensive (and worse than linear scaling) FP components are the > barrel > > shifters needed for pre-add mantissa operand alignment and post-add > > normalization in the FP adder, and of course the FP multiplier array. > > > > The FCCM papers on this subject include: > > > > Ligon et al, A Re-evaluation of the practicality of floating-point > > operations on FPGAs, FCCM 1998 > > > > Louca et al, Implementation of IEEE single precision floating point > addition > > and multiplication on FPGAs, FCCM 1996 > > > > Shirazi et al, Quantitative analysis of floating point arithmetic on FPGA > > based custom computing machines, FCCM 1995 > > > > and the neat Leong paper on avoiding the problem entirely, > > > > Leong et al, Automating floating to fixed point translation and its > > application to post-rendering 3D warping, FCCM 1999 > > > > > > See the Ligon paper for a nice presentation of speed-area tradeoffs of > > various implementation choices. Ligon estimates their single-precision FP > > adder resource use at between 563 and 629 LUTs -- 36-40% of a XC4020E. > Note > > this group used a synthesis tool; a hand-mapped design could be smaller. > > > > Put another way, that single precision FP adder is almost twice the area > of > > a pipelined 32-bit RISC datapath. Ouch. > > > > > > The rest of this article explores ideas for slower-but-smaller FP adders. > > > > The two FP add barrel shifters are the problem. They each need many LUTs > > and much interconnect. For example, a w-bit-wide barrel shifter is often > > implemented as lg w stages of w-bit 2-1 muxes, optionally pipelined. > > > > Example 1: single-precision in << s, w=24 > > m0 = s[0] ? in[22:0] << 1 : in; > > m1 = s[1] ? m0[21:0] << 2: m0; > > m2 = s[2] ? m1[19:0] << 4 : m1; > > m3 = s[3] ? m2[15:0] << 8 : m2; // 16 wires 8 high > > out = s[4] ? m3[7:0] << 16 : m3; // 8 wires 16 high > > ---- > > 5*24 2-1 muxes = 120 LUTs > > > > Example 2: double-precision in << s, w=53 > > m0 = s[0] ? in[51:0] << 1 : in; > > m1 = s[1] ? m0[50:0] << 2: m0; > > m2 = s[2] ? m1[48:0] << 4 : m1; > > m3 = s[3] ? m2[44:0] << 8 : m2; // 45 wires 8 high > > m4 = s[4] ? 
m3[36:0] << 16 : m3; // 37 wires 16 high > > out = s[5] ? m4[20:0] << 32 : m4; // 21 wires 32 high > > ---- > > 6*53 2-1 muxes = 318 LUTs > > > > In a horizontally oriented datapath, the last few mux stages have many > > vertical wires, each many LUTs high. This is more vertical interconnect > > than is available in one column of LUTs/CLBs, so the actual area can be > much > > worse than the LUT count indicates. > > > > > > BUT we can of course avoid the barrel shifters, and do FP > > denormalization/renormalization iteratively. > > > > Idea #1: Replace the barrel shifters with early-out iterative shifters. > For > > example, build a registered 4-1 mux: w = mux(in, w<<1, w<<3, w<<7). Then > an > > arbitrary 24-bit shift can be done in 5 cycles or less in ~1/3 of the > area. > > For double precision, make it something like w = mux(in, w<<1, w<<4, > w<<12), > > giving an arbitrary 53-bit shift in 8 cycles. > > > > > > Idea #2: (half baked and sketchy) Do FP addition in a bit- or > nibble-serial > > fashion. > > > > To add A+B, you > > > > 1) compare exponents A.exp and B.exp; > > 2) serialize A.mant and B.mant, LSB first; > > 3) swap (using 2 2-1 muxes) lsb-serial(A.mant) and lsb-serial(B.mant) if > > A.exp < B.exp > > 4) delay lsb-serial(A.mant) in a w-bit FIFO for abs(A.exp-B.exp) cycles; > > 5) bit-serial-add delay(lsb-serial(A.mant)) + lsb-serial(B.mant) for w > > cycles > > 6) collect in a "sum.mant" shift register > > 7) shift up to w-1 cycles (until result mantissa is normalized). > > > > It may be that steps 4 and 6 are quite cheap, using Virtex 4-LUTs in shift > > register mode -- they're variable tap, right? > > > > It is interesting to consider eliminating steps 2, 6, and 7, by keeping > your > > FP mantissa values in the serialized representation between operations, > > counting clocks since last sum-1-bit seen, and then normalizing (exponent > > adjustment only) and aligning *both* operands (via swap/delay) on input to > > the next FP operation. A big chained data computation might exploit many > > serially interconnected serial FP adders and serial FP multipliers... > > > > Is this approach better (throughput/area) than a traditional pipelined > > word-oriented FP datapath? Probably not, I don't know. But if your FP > > needs are modest (Mflops not 100 Mflops) this approach should permit quite > > compact FP hardware. > > > > (Philip Freidin and I discussed this at FCCM99. Thanks Philip.) > > > > Jan Gray > > > > > > -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 17171
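A rough Verilog sketch of the arrangement suggested above: a plain fixed-point counter wide enough for the maximum count, followed by a combinational normalizer (priority encoder plus shifter) that presents the value as a 5-bit exponent and 10-bit mantissa. The 26-bit counter width, the absence of a hidden bit, and truncation instead of rounding are all assumptions made just for the sketch.

// Fixed-point counter plus combinational normalization into a simple
// {5-bit exponent, 10-bit mantissa} float (value ~= mant * 2^exp).
module fp_counter_sketch (
    input  wire       clk,
    input  wire       rst,
    input  wire       inc,
    output reg  [4:0] exp,
    output reg  [9:0] mant
);
    reg [25:0] count;
    reg [4:0]  msb;                       // index of the highest set bit
    integer    i;

    always @(posedge clk) begin
        if (rst)      count <= 26'd0;
        else if (inc) count <= count + 26'd1;
    end

    // priority encode the leading one, then take the top 10 bits below it
    always @* begin
        msb = 5'd0;
        for (i = 0; i < 26; i = i + 1)
            if (count[i]) msb = i;
        if (msb < 5'd10) begin            // value fits in the mantissa directly
            exp  = 5'd0;
            mant = count[9:0];
        end else begin
            exp  = msb - 5'd9;
            mant = count >> (msb - 5'd9); // truncating normalize
        end
    end
endmodule

As pointed out earlier in the thread, single increments stop being visible in the float output once the count outgrows the 10-bit mantissa; only the underlying fixed-point counter keeps the full resolution.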
Tom Kean wrote: > > > > > > I would argue with that. Its more like modern reconfigurable computers > > > were not possible before CMOS technology got to a certain level of > > > capability. Ross Freeman's genius was building a team that could turn > > > his idea into an industry and hitting the market just when process > > > technology made the overhead of reconfiguration economically viable. > > > > > > Tom. > > > > I certainly agree that Ross put together a great team. > > But by that logic Jules Verne invented interplanetary > > flight but did not have the team to make it happen. > > I don't think thats a fair analogy: the guys in the 60's and 70's were > engineers > not science fiction writers and some of them built working > systems. They published literaly hundreds of technical papers. They > had configuration memory controlling programmable function units and > switches. Their big problem was that their configuration memory was > shift registers > built from logic gates and their multiplexers were also built from logic > gates > and you did not get that many logic gates on a chip at that time. > > Tom. You have a point. Lots of great work has been done by lots of great engineers. The line of where something is new or not is very thin in places. Maybe you'll agree to this statement "Modern reconfigurable computing could not begin until Ross Freeman invented the FPGA." With the thought that "Modern" distinguishes the current type (FPGA based) of reconfigurable computers from projects that came before the FPGA. -- Steve Casselman, President Virtual Computer Corporation http://www.vcc.comArticle: 17172
I believe the posted example code of division by a 2^N constant can not be mapped to a right shift of N bits due to the use of a negative range in the integer operand. I don't have my VHDL references at hand (and I haven't ever used integer types in code for synthesis), but a quick check of the code with ModelTech shows VHDL signed integer division to be implemented using a "symmetrical around zero" rather than a "floored to -infinity" signed divide algorithm; as a result, the upper bits of the "q" counter do not match the output port "sigma" for negative "q" values.

                    |- symmetrical -|   |---- floored ----|
                          q/8                  q/8
   int.     7 bit     int.    4 bit       int.    4 bit
     q        q       sigma   sigma       sigma   sigma
 ------------------------------------------------------------
     9    0001_001      1     0001          1     0001
     8    0001_000      1     0001          1     0001
     7    0000_111      0     0000          0     0000
     .        .         .       .           .       .
     1    0000_001      0     0000          0     0000
     0    0000_000      0     0000          0     0000
    -1    1111_111      0     0000         -1     1111
     .        .         .       .           .       .
    -7    1111_001      0     0000         -1     1111
    -8    1111_000     -1     1111         -1     1111
    -9    1110_111     -1     1111         -2     1110

I'm not sure why Synplify has a go at building the extra divider hardware for the version with the '8' on the same line vs. dying with errors when the '8' is defined as a generic; in any case, the synthesized hardware is probably not what you would be expecting: the seven bit counter is followed by some sign dependent offset adder logic to produce the four output bits. Changing the range of the integers to positive values ("range 0 to 2*span-1") eliminates the Synplify error message when using the generic, and builds the 'expected' seven bit counter with the upper four bits directly connected to "sigma" port. Brian Davis In article <7ljcg9$nq5$1@nnrp1.deja.com>, ehiebert@my-deja.com wrote: > I spoke to Synplicity about this exact problem. I was trying to code a > div, and here is what they said. > > "Division is only supported for compile time constants that can be > guarenteed to be a power of 2." > > The key here is compile time. In order to do a division of vectors, I > had to write my own divide unit. The problem is, Synplify 5.1.4 has > problems synthesizing my divide unit. It is optimizing out the divisor > register that I have coded. no word back from Synplicity on this one > yet, except to say this is a new bug..... REALLY?!?!? > > Eldon. > > In article <376577f7.2542275@news.u-net.com>, > jonathan@oxfordbromley.u-net.com wrote: > > Synplify version 5.1.1 (free with Actel Desktop) can't cope with the > > following code. A very old version of Synplify (2.5) processes it > > just fine; FPGA Express has no problem. The compiler blows up > > on the line indicated, but in fact the culprit is a couple of > > lines later: if I make the divisor a hard- coded constant instead > > of the generic, all is well. Before I whinge to Synplicity, has > > anyone come across this one? BTW, I know that the generic 'span' > > has to be a power of 2; in the real thing there's an assert to > > test that. This example intentionally lobotomised.
> >
> > library ieee;
> > use ieee.std_logic_1164.all;
> >
> > entity accum is
> > generic (span: natural := 8);
> > port (
> > clk, rst, UpNotDown: in std_logic;
> > sigma: out integer range -span to span-1
> > );
> > end accum;
> >
> > architecture counter of accum is
> > -- compiler blows up at the following line:
> > signal q: integer range -span*span to span*span-1;
> > begin
> > sigma <= q/span; -- change to "q/8" and all is OK
> > count_proc: process (clk, rst)
> > begin
> > if rst='1' then
> > q <= 0;
> > elsif rising_edge(clk) then
> > if UpNotDown='1' then
> > q <= q+1;
> > else
> > q <= q-1;
> > end if;
> > end if;
> > end process;
> > end counter;
> >
> > Jonathan Bromley
> >
> >
>
> Sent via Deja.com http://www.deja.com/
> Share what you know. Learn what you don't.
>
Sent via Deja.com http://www.deja.com/ Share what you know. Learn what you don't.Article: 17173
If I can use it, how do I write it? Thanks!Article: 17174
The 4000XLA IOB contains a "tristate register" that can disable the tristate output buffer (OBUFT). Opening a 4000XLA design in EPIC and looking into any IO block shows this register. In EPIC it is called "TRIFF". But if I instantiate a primitive element TRIFF in an XNF file, then 'ngdbuild' does not recognize it. Xilinx Support didn't even know anything about this register. How can I use the new tristate flip-flop in the 4000XLA IOB?