The best luck I've had at that speed is to use a guideline that each pipeline stage can only be a flop and a simple equation; most of your logic should be only about 3 levels of LUTs between flops. You can probably go up to 4 or 5 levels of LUTs in A FEW places, but for those make sure you've constrained the logic so these long paths remain fully within a small floorplannable block (small meaning under 50 or 100 LUTs).

Another thing to be cautious about is Block RAMs: they have an access time of almost 50% of your cycle. Try to put RAM outputs directly into flops, maybe a 2-way mux then a flop, but any more and you'll spend your life floorplanning and re-running P&R. Also, some of the RAM setup times are 20% of your cycle, so be careful there, too.

Also, don't expect a given signal to drive a lot of loads. I had a wide data path, so I sliced it (8 or 16 bits per slice) so I could floorplan to a pretty low level; I also created a copy of the controls for each slice.

It helps to exploit the technology, such as minimizing logic myself and implementing it by directly instantiating primitives. Doing functional decomposition of a large equation so that the final stage is a MUXF5 is very effective, for example.

Good luck!
-Stan

Article: 49476
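Stan's Block RAM and fan-out advice can be illustrated with a minimal VHDL sketch. This is not Stan's code: the entity name, the 4 x 8-bit slicing, and the 9-bit address are assumptions. The RAM is inferred with a synchronous read, its output passes through at most a 2:1 mux into a fabric flop, and each 8-bit slice gets its own registered copy of the mux select so no single net drives every bit. The `syn_preserve` attribute is Synplify-specific and is assumed here to keep the tool from merging the duplicated control flops; other tools use different attributes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch: block RAM output -> (2:1 mux) -> flop, with the datapath split into
-- 8-bit slices and a private registered copy of the mux select per slice.
entity sliced_ram_path is
  generic (
    SLICES : positive := 4;   -- assumed: 4 slices x 8 bits = 32-bit datapath
    AW     : positive := 9);  -- assumed RAM address width
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  unsigned(AW-1 downto 0);
    din  : in  std_logic_vector(8*SLICES-1 downto 0);
    alt  : in  std_logic_vector(8*SLICES-1 downto 0);  -- other mux input
    sel  : in  std_logic;                              -- one shared control
    dout : out std_logic_vector(8*SLICES-1 downto 0));
end entity;

architecture rtl of sliced_ram_path is
  type ram_t is array (0 to 2**AW - 1) of std_logic_vector(8*SLICES-1 downto 0);
  signal ram   : ram_t;
  signal ram_q : std_logic_vector(8*SLICES-1 downto 0);
  -- one copy of 'sel' per slice so no single net fans out to every bit
  signal sel_r : std_logic_vector(SLICES-1 downto 0);
  -- Synplify attribute (assumed) to keep the duplicate flops from being merged
  attribute syn_preserve : boolean;
  attribute syn_preserve of sel_r : signal is true;
begin
  -- inferred block RAM with a synchronous (read-first) read port
  ram_proc : process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= din;
      end if;
      ram_q <= ram(to_integer(addr));
      sel_r <= (others => sel);
    end if;
  end process;

  -- per slice: RAM output through at most a 2:1 mux straight into a flop
  slice_gen : for s in 0 to SLICES-1 generate
    out_reg : process (clk)
    begin
      if rising_edge(clk) then
        if sel_r(s) = '1' then
          dout(8*s+7 downto 8*s) <= alt(8*s+7 downto 8*s);
        else
          dout(8*s+7 downto 8*s) <= ram_q(8*s+7 downto 8*s);
        end if;
      end if;
    end process;
  end generate;
end architecture;
```

Registering the select per slice adds one cycle of latency on the control path; that is the price of keeping each control net local to its slice.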
Austin Franklin wrote:
>
> "Phil Hays" <SpamPostmaster@attbi.com> wrote in message
> news:3DD05DC9.1395672B@attbi.com...
> > Mike Treseler wrote:
> > >
> > > Phil Hays wrote:
> > > >
> > > > Austin Franklin wrote:
> > > > >
> > > > > That's simply not true. The Alpha CPUs were designed using schematic
> > > > > capture
> > > >
> > > > ... by a large building full of designers.
> > >
> > > Who no longer work for Digital Equipment Corp.
> >
> > Yea. But fairness requires me to point out that schematic entry wasn't
> > the reason why DEC failed.
>
> Thanks for the laugh, Phil. I never even thought of that reply in that way
> ;-)

I didn't the first time as well. Glad to be of service, Austin.

--
Phil Hays

Article: 49477
Yep, 2V6000 @ 200 MHz.

1) The carry chains are slow. In a -4 device you'll barely make 200 MHz with 20-bit carry chains, and that is if the router is being nice that day (it probably won't make 200 in a densely packed arithmetic design, not because of the silicon but because the router is lazy).

2) The LUTs and routing are quite fast. As long as the logic is placed close together you can easily do at least 3 or 4 layers of logic between flip-flops. You may have to do some floorplanning, though, as the placer doesn't place the second level of LUTs very intelligently. Unfortunately, if you are using synthesis, inferred LUT names change from run to run, so you'll have to work around that.

3) We do a fair amount of building up hierarchical blocks starting with primitives. That lets us put RLOCs in the VHDL and structurally generate the data paths. Using hierarchy and doing the placement hierarchically like that saves a ton of time in the floorplanner. Too bad the tools can't do hierarchical floorplanning (no, Xilinx, they don't: to be hierarchical you need to be able to nest placement in multiple levels). The RLOCs are hierarchical, so you can do hierarchical floorplanning from within the source.

4) Occasionally you need to use syn_keeps or syn_preserves to enforce inferred structures.

5) We don't bother with Amplify. With our structural construction technique, it doesn't offer much added value. For someone working strictly from RTL it may be useful.

6) Multi-pass PAR helps a little. Unfortunately, the biggest problem with the current tools is that the router gives up too easily. It used to be that a good placement got you pretty consistent routing results regardless of the effort level, because the old algorithm found shortest routes for all connections and only compromised when there were conflicts. The new router (4.x and on) nails down a few critical paths based on estimated slack, then just routes the remaining runs willy-nilly without considering the obvious shortest routes. As long as it makes timing, no problem (other than the fact that you've just increased your power consumption dramatically, made nearly every net a critical net, and, in dense designs, needlessly congested the routing, making timing closure a dubious proposition).

7) The cost tables are more or less non-deterministic as soon as you make any changes to the design. It helps to run multi-pass to run through a number of cost tables. Sometimes you get lucky.

Amy Mitby wrote:
> Does anyone have any general suggestions or remarks
> from past work on a 200 MHz large Virtex2 design?
> For instance, did you have to do things like:
> - add input and output flops for each module and pipeline
>   extensively within modules?
>
> - other RTL tricks?
> - use a physical synthesis tool like Amplify?
> - run multi-pass place and route?
> - use different cost tables?
> - hand place some or all of the design?
> - etc...

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Article: 49478
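Ray's item 3 (RLOCs written in the VHDL, data paths generated structurally from primitives) might look like the minimal sketch below. It is not Ray's code. It assumes the Xilinx `unisim` library for the `FD` flip-flop primitive and Virtex-II "XxYy" RLOC coordinates with two flip-flops per slice; the entity name and layout are illustrative, and whether a given synthesis tool of that era accepts `integer'image` in an attribute value is an assumption worth checking.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;  -- Xilinx primitive library (provides FD)

-- Sketch: an N-bit register built from FD primitives with relative
-- placement attributes, so the block keeps its shape wherever it is placed.
entity rloc_reg is
  generic (WIDTH : positive := 16);
  port (
    clk : in  std_logic;
    d   : in  std_logic_vector(WIDTH-1 downto 0);
    q   : out std_logic_vector(WIDTH-1 downto 0));
end entity;

architecture structural of rloc_reg is
  attribute RLOC : string;
begin
  bits : for i in 0 to WIDTH-1 generate
    -- assumed Virtex-II "XxYy" slice coordinates: bits 2k and 2k+1 share
    -- a slice (two flip-flops per slice), stacked in a single column
    attribute RLOC of u_fd : label is "X0Y" & integer'image(i / 2);
  begin
    u_fd : FD port map (C => clk, D => d(i), Q => q(i));
  end generate bits;
end architecture;
```

Instantiating `rloc_reg` inside another relatively placed block is the kind of nesting Ray describes as hierarchical floorplanning from the source; the exact rules for combining RLOCs across hierarchy levels are in the Xilinx constraints documentation.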
Have you ever tried to push on a rope? Not too effective, is it? You can often get about the same level of satisfaction trying to get a synthesis tool to generate the structure you want (and then the next version of the tools changes everything).

aaron wrote:
> what does 'pushing the rope' mean, ray?
>
> aaron

--Ray Andraka, P.E.

Article: 49479
I bid one job that was to have 160 channels in one filter last year.

Ken Mac wrote:
> Hello folks,
>
> Xilinx CoreGen's DA filter supports up to 8 channels for some of the FIR
> filter types.
>
> Could you please let me know what is the largest number of channels you have
> used/seen used through a single FIR filter of any type (including
> rate-changing) on an FPGA?
>
> Thanks for your time,
>
> Ken

--Ray Andraka, P.E.

Article: 49480
You will get higher performance (as measured by minimum clock period) by putting those registers on your outputs rather than having none at all, as your question seems to indicate. The higher-performance synthesis tools can move those registers back and forth (balancing) in an effort to minimize clock period.

Not registering your outputs is something you might do for latency concerns, or for area optimization when you instruct the tool to optimize across hierarchical boundaries.

President, Quadrature Peripherals
Altera, Xilinx and Digital Design Consulting
email: kayrock66@yahoo.com
http://fpga.tripod.com
-----------------------------------------------------------------------------

amyks@sgi.com (Amy Mitby) wrote in message news:<2d2a8f5d.0211121059.5eb0a76d@posting.google.com>...
> Are there any major benefits or disadvantages to optimization
> with a synthesis design flow where module boundaries are
> registered either at inputs or outputs? In other words, do the
> tools' optimizations across module boundaries sometimes work
> better than self-imposed sequential boundaries for reaching
> better performance, or is it best to put those register boundaries
> in yourself and floorplan the location of those registers?

Article: 49481
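A module boundary registered on the output side, as Jay describes, might look like this minimal sketch. The adder and its names are hypothetical: all of the module's logic sits in front of the output flop, so any path into the next module starts at a register. Whether a retiming-capable tool then moves that register back into the adder is up to the tool; the sketch assumes nothing about it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical module whose output comes straight from a flip-flop.
entity reg_out_adder is
  generic (W : positive := 8);
  port (
    clk  : in  std_logic;
    a, b : in  unsigned(W-1 downto 0);
    sum  : out unsigned(W downto 0));  -- registered output (one extra bit for carry)
end entity;

architecture rtl of reg_out_adder is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- combinational logic first, then the boundary register
      sum <= resize(a, W+1) + resize(b, W+1);
    end if;
  end process;
end architecture;
```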
You're going to have to look up the UCF syntax yourself, but I wanted to give you a heads-up that if you're using Synplicity, it can put attributes in the netlist that will also turn on IOB registers even if the UCF says not to use them.

Regards
email: kayrock66@yahoo.com

Shareef Jalloq <sjalloq@arm_removeMe_.com> wrote in message news:<3DD13234.525C7CA6@arm_removeMe_.com>...
> Hi all,
>
> I'm trying to disable IOB register packing but am having trouble with
> the UCF syntax. I know I want to put IOB=FALSE; in there somewhere but
> how do I do it? I need to add the constraint to a number of top level
> registers that are already grouped by a TIMEGRP constraint. I tried
> using the following but it didn't like it:
>
> TIMEGRP "SRAMData" = FFS("*WDATABuf*");
> INST "SRAMData" IOB=FALSE;
>
> Any ideas guys? Thanks for your help, Shareef.

Article: 49482
Hi,

What are the simulation modes for HyperTransport (HT tunnel, HT slave, HT bridge), checkers, and monitors? Can anybody clarify this for me?

Sanjay

Article: 49483
Ray,

That is a lot of channels! Are you able to elaborate a little more on the specs of the system (fully parallel or N clocks per sample, filter type (single-rate or rate-changing), input sample widths, coefficient widths, clock rate/sampling rate, the device you put it on, etc.)? I am just interested to know what sorts of things people do DSP-wise in the real world (I am in academia just now).

Thanks for your time,

Ken

> I bid one job that was to have 160 channels in one filter last year.
>
> Ken Mac wrote:
>
> > Hello folks,
> >
> > Xilinx CoreGen's DA filter supports up to 8 channels for some of the FIR
> > filter types.
> >
> > Could you please let me know what is the largest number of channels you have
> > used/seen used through a single FIR filter of any type (including
> > rate-changing) on an FPGA?
> >
> > Thanks for your time,
> >
> > Ken

Article: 49484
Jay wrote:
> You're going to have to look up the UCF syntax yourself but I wanted
> to give you a heads up that if you're using Synplicity, it can put
> attributes in the netlist that will also turn on IOB registers even if
> the UCF says not to use them.

Thanks guys. I've had to create an instance that contains the flops, and I can then use the

INST "<inst.name>" IOB=FALSE;

syntax.

Shareef.

Article: 49485
<FAQ> wrote in message news:ee7a528.-1@WebX.sUN8CHnE...
> Is there anything 'wrong' with specifying an output of a lower level VHDL module as a buffer so that you can read the output within the module... versus a dummy signal placed between the assignment to just an output port.

Buffers have two restrictions:
- any net they drive must have only one driver (no tristate busses);
- if they are bound to another port at a higher level, that port must also be a buffer.

As a result, we recommend against them.

regards
Alan

P.S. Both these restrictions are removed in VHDL 2002! But I don't know any tools that support VHDL 2002 :-(

--
Alan Fitch [HDL Consultant]
DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * Perl * Tcl/Tk * Verification * Project Services
Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, Hampshire, BH24 1AW, UK
Tel: +44 (0)1425 471223 Fax: +44 (0)1425 471573
mail: alan.fitch@doulos.com
Web: http://www.doulos.com

Article: 49486
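The alternative Alan recommends, a readable internal signal copied to an ordinary `out` port instead of a `buffer` port, can be shown with a minimal sketch. The counter and its names are hypothetical; the point is only that the logic reads `count_i`, and the port itself is only ever written.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical counter whose value is also needed inside the module.
-- Rather than making 'count' a buffer port, keep an internal signal
-- and copy it to a plain out port.
entity counter_no_buffer is
  generic (W : positive := 4);
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;
    count : out unsigned(W-1 downto 0));
end entity;

architecture rtl of counter_no_buffer is
  signal count_i : unsigned(W-1 downto 0);  -- internal copy, freely readable
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        count_i <= (others => '0');
      else
        count_i <= count_i + 1;  -- reads the internal signal, not the out port
      end if;
    end if;
  end process;

  count <= count_i;  -- the out port is only driven, never read
end architecture;
```

The extra signal only renames the net, so it should not cost any hardware, and the higher-level port it connects to can be an ordinary `out` as well.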
"Nicholas C. Weaver" wrote: > In article <2d2a8f5d.0211121414.8a292e8@posting.google.com>, > Amy Mitby <amyks@sgi.com> wrote: > >Does anyone have any general suggestions or remarks > >from past work on a 200 MHz large Virtex2 design? > > I haven't done that on Virtex2, but I have done >100 MHz on Virtex I > (non E) and 175 MHz on VirtexE (AES encryption): > > >For instance, did you have to do things like: > >- add input and output flops for each module and pipeline > > extensively within modules? > > Yes, lots. > > >- hand place some or all of the design? > > Yes, lots. > > Hand mapping and placing isn't that bad, if you use a nice modular > design. The biggest annoying is actually the BlockRAMs, as on Virtex > 1, they can't be relatively placed, only absolute placement, which is > a pain when everything else is RLOCed modules. Yes, p&r engines were very bad during decision of BRAM placements. I had chosen LOC= constraint to get good results. That was for M2.1i and M3.1i. Design was XCV2000E-8. Operating frequency was 80 MHz. UtkuArticle: 49487
That wasn't the question. The question was referring to modules within the FPGA design, e.g. modular design. Putting registers on the I/O of the design alleviates the need for the synthesis tools to try to optimize across module boundaries. As we move into larger devices and more of a modular design flow, you'll generally want to at least register the module outputs, just to maintain consistency in timing. When you don't have registers, it can be ugly trying to trace a delay path, and the synthesis tools will not be able to optimize through a module boundary unless both the module and the next level up are visible at the time of the compilation.

Jay wrote:
> You will get higher performance (as measured by minimum clock period)
> by putting those registers on your outputs rather than none at all as
> your question seems to indicate. The higher performance synthesis
> tools can move those registers back and forth (balancing) in an effort
> to minimize clock period.
>
> Not registering your outputs is something you might do for latency
> concerns or area optimization when you instruct the tool to optimize
> across hierarchical boundaries.
>
> amyks@sgi.com (Amy Mitby) wrote in message news:<2d2a8f5d.0211121059.5eb0a76d@posting.google.com>...
> > Are there any major benefits or disadvantages to optimization
> > with a synthesis design flow where module boundaries are
> > registered either at inputs or outputs? In other words, do the
> > tools' optimizations across module boundaries sometimes work
> > better than self-imposed sequential boundaries for reaching
> > better performance, or is it best to put those register boundaries
> > in yourself and floorplan the location of those registers?

--Ray Andraka, P.E.

Article: 49488
Phil Hays <SpamPostmaster@attbi.com> wrote:
...
>Webpack XST (5.1) will not target the 4004, so I pointed it to a
>Spartan2.
>
>------------------------------------------
>Number of 4 input LUTs: 26
>------------------------------------------

OK, this is more consistent. Thanks! I wouldn't have guessed that there'd be that much difference between two synthesizers on the same code.

>> This seems to
>> imply that schematic entry is almost twice as efficient as
>> structured VHDL!
>
>Weak examples imply nothing.

They do show that how you use a language, and what you use it with, makes a big difference. Given more data points, anyway...

Terry Newton

Article: 49489
> sig <= A when sel = "11110" else 'Z';

if sel = "11110" then
  sig <= A;
else
  sig <= 'Z';
end if;

As I remember, the first form (when ... else) is called a dataflow statement (a concurrent conditional signal assignment), while if ... then ... else is a sequential statement and can only be used in processes.

Martin
--
JOP - a Java Optimized Processor for FPGAs.
http://www.jopdesign.com

"Anup Raghavan" <anup@itee.uq.edu.au> wrote in message news:9d80c593.0211121820.e951e8d@posting.google.com...
> Hello, when I try to synthesize the following code using Leonardo
> Spectrum for Xilinx FPGAs, I get errors "Syntax Error near 'when'".
> If I don't use a process and then synthesize this code, it works fine.
> But I do need to have a process in my design. Can someone provide me a
> solution for this?
>
> Thanks
> Anup Raghavan
>
> entity mux_tbuf is
>
> port (SEL: in STD_LOGIC_VECTOR (4 downto 0);
>       A,B,C,D,E: in STD_LOGIC;
>       clk : in std_logic;
>       SIG: out STD_LOGIC);
> end mux_tbuf;
>
> architecture RTL of mux_tbuf is
> begin
>
> sync: process (clk) is
>
> begin
>   if clk'event and clk = '1' then
>     sig <= B when sel = "11101" else 'Z';
>     sig <= C when sel(2)= '1' else 'Z';
>     sig <= D when sel(3)= '1' else 'Z';
>     sig <= E when sel(4)= '1' else 'Z';
>   end if;
>
> end process sync;
>
> end RTL;

Article: 49490
Jay <kayrock66@yahoo.com> wrote:
> You're going to have to look up the UCF syntax yourself but I wanted
> to give you a heads up that if you're using Synplicity, it can put
> attributes in the netlist that will also turn on IOB registers even if
> the UCF says not to use them.

UCF appears to override EDF, in my experience.

INST "*" IOB = FALSE;

is useful for test routes of sub-modules.

Hamish
--
Hamish Moffatt VK3SB <hamish@debian.org> <hamish@cloud.net.au>

Article: 49491
> Does anybody know about a free EPP (parallel port) slave interface
> module (preferably in VHDL)? I have checked on opencores, but it seems

If you can go with ECP, you can use a version I did some time ago. You can find it in a larger zip file in the download section at the link below. The file is ecp.vhd.

Martin
--
JOP - a Java Optimized Processor for FPGAs.
http://www.jopdesign.com

Article: 49492
Anup Raghavan <anup@itee.uq.edu.au> wrote:
> begin
>   if clk'event and clk = '1' then
>     sig <= A when sel = "11110" else 'Z';
>     sig <= B when sel = "11101" else 'Z';
>     sig <= C when sel(2)= '1' else 'Z';
>     sig <= D when sel(3)= '1' else 'Z';
>     sig <= E when sel(4)= '1' else 'Z';
>   end if;

Unfortunately you can't use '... when ... else ...' inside a process in VHDL, only outside. Annoying, isn't it? Use if/else instead.

Hamish
--
Hamish Moffatt VK3SB

Article: 49493
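Following Hamish's advice, the clocked process can be rewritten with if/elsif. One more point about the original: inside a single process, the five separate assignments to `sig` would not create five tristate drivers, because sequential assignments to the same signal override each other and only the last one executed takes effect. The minimal sketch below assumes the intended priority is the order in which the conditions were written; that priority is a guess, not something stated in the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux_tbuf is
  port (SEL           : in  std_logic_vector(4 downto 0);
        A, B, C, D, E : in  std_logic;
        clk           : in  std_logic;
        SIG           : out std_logic);
end mux_tbuf;

architecture RTL of mux_tbuf is
begin
  sync : process (clk) is
  begin
    if rising_edge(clk) then
      -- assumed priority: the first matching condition, in the original order, wins
      if sel = "11110" then
        sig <= A;
      elsif sel = "11101" then
        sig <= B;
      elsif sel(2) = '1' then
        sig <= C;
      elsif sel(3) = '1' then
        sig <= D;
      elsif sel(4) = '1' then
        sig <= E;
      else
        sig <= 'Z';  -- nothing selected: release the line
      end if;
    end if;
  end process sync;
end RTL;
```

If the real goal is an internal tristate bus with several independent drivers, each driver would instead need its own concurrent assignment (or its own process) outside this one.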
"Steven Derrien" <sderrien@irisa.fr> schrieb im Newsbeitrag news:3DD17371.5C0234D9@irisa.fr... > Hi folks, > > Does anybody knows about a free EPP (parallel port) slave interface > module (preferably in VHDL) ? I have checked on opencores, but it seems > that their EPP controler project has no file on the CVS and has not been > updated for a while. Have a look at www.beyondlogic.org They have tons of techical papers, also many about parallel port /EPP. Doing a state-machne to interface to a EPP is easy. Just sample Data_stobe and Address_strobe, thendo your descision on this. Se the code snippet below. -- MfG Falk ---------------------------------------------------------------------------- ----------------- -- -- A basic EPP state machine -- ---------------------------------------------------------------------------- ----------------- library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.NUMERIC_STD.ALL; entity epp_fsm is Port ( clk : in std_logic; -- clock input >1 Mhz reset : in std_logic; epp_data : inout unsigned(7 downto 0); -- data epp_write : in std_logic; -- epp_wait : out std_logic; epp_data_strobe : in std_logic; epp_adr_strobe : in std_logic); end epp_fsm; architecture Behavioral of epp_fsm is type state_type is (idle, wait_end, wait_end_read); signal state : state_type; signal data_strobe : std_logic; -- synchronized data strobe form EPP signal adr_strobe : std_logic; -- synchronized address strobe form EPP signal data_register : unsigned (7 downto 0); -- data register, just a example signal address_register : unsigned (7 downto 0); -- address register, just a example begin -- some signal assignments -- IO MUX for EPP data process(state, data_register, address_register) begin if state=wait_end_read then if data_strobe='1' then epp_data <= address_register; -- address read access else epp_data <= data_register; -- data read access end if; else epp_data -- no read access end if; end process; -- sample control lines from EPP process(clk, reset) begin if 
reset='1' then data_strobe <= '1'; adr_strobe <='1'; elsif clk='0' and clk'event then data_strobe <= epp_data_strobe; adr_strobe <= epp_adr_strobe; end if; end process; -- the state machine process(clk, reset) begin if reset='1' then state <= idle; epp_wait <='0'; elsif clk='0' and clk'event then case state is when idle => if data_strobe='0' then -- beginning of an data access epp_wait <= '1'; if epp_write='0' then -- it is a write access -- place instructions HERE for data write access data_register <= epp_data; -- example state <= wait_end; else -- it is a read access -- place instructions HERE for data read access state <=wait_end_read; end if; elsif adr_strobe='0' then -- adress access epp_wait <= '1'; if epp_write='0' then -- it is a write access; -- place instructions HERE for address write access address_register <= epp_data; -- example state <= wait_end; else -- its a read access -- place instructions HERE for address read access state <= wait_end_read; end if; end if; when wait_end => if data_strobe='1' and adr_strobe='1' -- wait for the end of a write access epp_wait <='0'; state <= idle; end if; when wait_end_read => if data_strobe='1' and adr_strobe='1' -- wait for the end of a read access epp_wait <='0'; state <= idle; end if; when others => null; end case; end if; end process; end Behavioral;Article: 49494
I know this is like asking how long a piece of string is, but being able to make a rough guess at how long a project is going to take is a useful thing. Does anybody have any pointers, rules of thumb, or golden rules they can pass on from their experience? For example, the time taken (say, as a proportion) for paper design, coding, and simulation. How much time should be allocated to simulation? Any comments will be useful!

Thanks
theo

Article: 49495
Dear Computer Arithmetic Gurus,

I am currently working on the implementation of an unsigned parallel multiplier. After reading some articles I found the modified Booth-2 algorithm suitable. It is described in Al-Twaijry's thesis "Area and Performance optimized CMOS multipliers" (1997), page 11.

I wonder whether the scheme shown in the figure on page 11 of the thesis is still the state-of-the-art way to produce partial products. Have more advanced techniques been discovered since 1997?

Thanks

Article: 49497
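For reference, the standard radix-4 (modified Booth-2) selection works as follows: each overlapping bit triplet (y(2i+1), y(2i), y(2i-1)) of the multiplier Y maps to a digit d = -2·y(2i+1) + y(2i) + y(2i-1) in {-2, -1, 0, 1, 2}, and the i-th partial product is d·X, weighted by 4^i. A minimal VHDL sketch of one partial-product generator follows; the entity name and widths are assumptions, and it says nothing about whether newer schemes have superseded the one in the thesis. For an unsigned M-bit multiplier, y(-1) = 0 and Y is zero-extended by one or two bits to an even width so the top digit is never negative, which gives floor(M/2) + 1 partial products.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- One radix-4 (modified Booth-2) partial-product generator: pp = d * x,
-- where d in {-2..2} is selected by the multiplier bits y(2i+1), y(2i), y(2i-1).
entity booth2_pp is
  generic (N : positive := 16);  -- assumed multiplicand width
  port (
    x   : in  unsigned(N-1 downto 0);        -- unsigned multiplicand
    sel : in  std_logic_vector(2 downto 0);  -- y(2i+1) & y(2i) & y(2i-1)
    pp  : out signed(N+1 downto 0));         -- wide enough for +/-2x
end entity;

architecture rtl of booth2_pp is
begin
  process (x, sel)
    variable x1 : signed(N+1 downto 0);
  begin
    x1 := signed(resize(x, N+2));  -- zero-extend so x stays non-negative
    case sel is
      when "000" | "111" => pp <= (others => '0');       -- d =  0
      when "001" | "010" => pp <= x1;                    -- d = +1
      when "011"         => pp <= shift_left(x1, 1);     -- d = +2
      when "100"         => pp <= -shift_left(x1, 1);    -- d = -2
      when "101" | "110" => pp <= -x1;                   -- d = -1
      when others        => pp <= (others => '0');       -- metavalues in simulation
    end case;
  end process;
end architecture;
```

In a real multiplier the negation is usually not done with a full adder per row; a common alternative is to invert the row and add the '1' as an extra bit in the reduction tree, which this sketch leaves out for clarity.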
hamish@cloud.net.au wrote:
> INST "*" IOB = FALSE;
> is useful for test routes of sub-modules.

Hi again guys,

Although I used the above syntax in the UCF file, the Xilinx tools still packed the flops into the IOB! What can I do, other than adding a dummy output so that the fanout of the Q pin is greater than 1? That would at least guarantee that the flops could not be packed into the IOB. I'm using version 4.2.03i of the Xilinx tools on Solaris.

Shareef.

Article: 49498
I'm trying to run the Xilinx App 134, "Synthesizable High-Performance SDRAM Controllers". I'm using the VHDL version. When I run "do run_sim.do" in ModelSim, the macro first compiles and then runs the testbench t_sdrm. This runs with the following warnings and errors (partial file only):

# ** Warning: NUMERIC_STD."=": metavalue detected, returning FALSE
#    Time: 88200 ps  Iteration: 1  Instance: /t_sdrm/sdrmc/sdrm_t_int/brst_cntr_inst
# ** Warning: NUMERIC_STD."=": metavalue detected, returning FALSE
#    Time: 88200 ps  Iteration: 1  Instance: /t_sdrm/sdrmc/sdrm_t_int/rcd_cntr_inst
# ** Warning: NUMERIC_STD."=": metavalue detected, returning FALSE
#    Time: 88200 ps  Iteration: 1  Instance: /t_sdrm/sdrmc/sdrm_t_int/ki_cntr_inst
# ** Warning: NUMERIC_STD."=": metavalue detected, returning FALSE
#    Time: 88200 ps  Iteration: 4  Instance: /t_sdrm/sdrmc/sdrm_t_int/ref_cntr_inst
# ** Error: mt48lc1m16a1.v(781): $hold( posedge Clk:88200 ps, Addr:88300 ps, 1 ns );
#    Time: 88300 ps  Iteration: 3  Instance: /t_sdrm/sdram0
# ** Error: mt48lc1m16a1.v(781): $hold( posedge Clk:88200 ps, Addr:88300 ps, 1 ns );
#    Time: 88300 ps  Iteration: 3  Instance: /t_sdrm/sdram1

I suspect the metavalue warnings are initialisation problems, because they don't recur later in the file. The errors, however, continue until the end of the file. What is going wrong?

Also, there are two versions of the source files: one in the directory vhdl\func_sim\ and another in vhdl\src. They are different. The RUN_SIM.DO macro uses the files in vhdl\func_sim. Which is correct? What is the difference?

Many thanks.

--
JP Nicholls / jpnicholls@pwav.com

Article: 49499
Hi Jan,

The area figure was actually for the fully pipelined version. The problem with using it in the processor is that instructions are sequential, unless you add vector instructions. You probably can't pipeline everything, since most algorithms have some intermediate calculations that will stall the pipeline.

A much better approach for the barrel shifter is to use a smaller shift in each clock cycle. Going to SRL16s is extreme, and I'm not sure it is actually going to save any area.

I remember a good report that an FPU core design house did in the '80s on what the average shift amount for floating point is. The report said that a shift of up to 8 bits covers more than 90% of all cases, and they implemented an FPU with just that. That core was quite small and efficient; the smaller shift only lowered the average performance by a few percent. BUT at that time I worked for a company that built computers for space applications, and thus needed to calculate everything on maximum latency (not average). This made the FPU core look bad, since its maximum latency was quite bad. The FPU core made its way into Sun's uSPARC processor, which was designed for desktop applications where average performance is more important.

So with this technique you might get down from 800 to at most 400-500 LUTs. If we also assume full pipelining all the time, the new value for floating point would be ((100_000_000/6)*6)/400 = 250000, which is still 30 times worse than integer operations. So if the FPGA were full of these operations, you would need an FPGA that is 30 times bigger.

Göran

Jan Gray wrote:
> "Goran Bilski" <Goran.Bilski@Xilinx.com> wrote
> > Quantitative (Number of operations per second / needed area)
> >
> > Floating point : (100_000_000/6)/800 = 20833
> > Integer : (250_000_000/1)/32 = 7812500
> >
> > Integer operations are roughly 400 times more efficient than floating
> > point.
>
> Thanks for the interesting data, Goran.
>
> Can you pipeline the above FP adder to get a factor of ~6 improvement in
> ops/area efficiency?
>
> Also, if you only care about ops/area cost efficiency, and not pure speed,
> you might be able to use bit or nybble serial approaches, use lots of SRL16s
> for delays, and thereby avoid the big expensive barrel shifters in the
> denormalize and renormalize paths.
>
> Jan Gray, Gray Research LLC
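Goran's efficiency figures can be checked directly. Each is throughput divided by area, in operations per second per LUT; the fully pipelined floating-point case issues one operation per cycle at 100 MHz, and 400 LUTs is the optimistic end of his 400-500 LUT estimate:

```latex
\begin{aligned}
\text{FP, one op per 6 cycles:}\quad & \frac{10^{8}/6}{800} \approx 20\,833 \\
\text{Integer:}\quad & \frac{2.5\times 10^{8}/1}{32} = 7\,812\,500 \\
\text{FP, fully pipelined, smaller shifter:}\quad & \frac{(10^{8}/6)\cdot 6}{400} = 250\,000 \\[4pt]
\frac{7\,812\,500}{20\,833} \approx 375, \qquad & \frac{7\,812\,500}{250\,000} = 31.25
\end{aligned}
```

The two ratios correspond to the "roughly 400 times" and "30 times worse" figures in the thread.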