Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
langwadt@ieee.org wrote:
> Peter Alfke wrote:
>
>>Rather than relying on perfect alignment (and risking race conditions
>>if it is not met), I would deliberately offset the two clock domains.
>>Obviously, this lowers the max frequency of operation, but it really
>>makes things safe, if you have the timing margin.
>>Peter Alfke, Xilinx Applications
>
> I think I remember Ray saying perfect alignment could not be relied upon
> especially if the load on the two net were very different
>
> -Lasse
>

I was burnt with Virtex I on relying on the 1x and 2x clock alignment. Ever since I have been careful to avoid direct transfers from one to the other on the same edge.

In my case, it turned out that a combination of a heavily loaded 2x clock tree, a lightly loaded 1x clock tree, and jitter on the input clock conspired to push the 1x and 2x clocks apart by over 500ps, which was enough to cause the design to fail. In that case, the input clock jitter was not from the clock source either, which made it a real bear to find; it was coming from the input clock buffer thresholds being modulated by outputs switching on the same bank as the clock input.

The bottom line is that jitter on the input to the DLL or DCM can cause the 1x and 2x output edges to become misaligned, even when the loads are perfectly balanced.
Article: 99301
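A rough illustration of the 1x/2x point in Ray's post above: for a same-edge transfer between two nominally aligned clocks, the hold margin is what the misalignment eats into. The sketch below is only a back-of-the-envelope model. The function name and every number except the 500ps figure are assumptions chosen for illustration, not values from any data sheet or timing report.

```python
def hold_slack_ps(clk_to_q_min_ps, route_min_ps, hold_ps, skew_ps):
    """Hold slack for a same-edge transfer between two related clocks.

    skew_ps is how much later the capturing clock edge arrives than the
    launching clock edge (tree loading difference plus jitter).
    A negative result means the new data can race through and be
    captured one cycle early.
    """
    return clk_to_q_min_ps + route_min_ps - hold_ps - skew_ps


# Assumed (hypothetical) path: 300 ps minimum clock-to-out, 100 ps minimum routing,
# 0 ps hold requirement at the capturing flip-flop.
aligned = hold_slack_ps(300, 100, 0, skew_ps=100)   # clocks nearly aligned
pushed = hold_slack_ps(300, 100, 0, skew_ps=500)    # edges pushed 500 ps apart

print("slack with 100 ps skew:", aligned, "ps")
print("slack with 500 ps skew:", pushed, "ps")
```

With these assumed numbers the path has positive slack at 100 ps of misalignment and negative slack at 500 ps. That is the kind of failure a deliberate offset between the domains, or a transfer on opposite edges, is meant to avoid.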
Ron wrote:
> I was considering purchasing an AMD64 quad-processor system, but have no
> idea whether any of the FPGA development tool software would take
> advantage of multi-processors (Synplicity for Lattice, Actel, Xilinx,
> etc). Any ideas or recommendations?

The primary CPU-bound process for Xilinx, par, can be run in a multiprocessor and cluster configuration to converge on the best routing solution sooner. Do a search for MPPR in your par documentation or the Xilinx answer base. At least on older ISE 6.x releases under Linux this is VERY useful.
Article: 99302
I might suggest that the standardization process is an open one and that individuals motivated to develop new ones are free to do so. You can use this thread to gather support and volunteers and then approach your favorite standards body (IEEE, JEDEC, etc.) and move ahead.

Further comments...

Brannon wrote:
> This post is a bit of a flame, but seriously, JTAG has got to go. The
> signals are weak. The various drivers and controllers for it are weak.
> It causes nonstop headaches for hardware developers and FPGA developers
> alike. It's slow, hardly customizable, hard to use, ultra extremely
> fantastically flaky on every piece of FPGA hardware I've ever used
> (which includes at least a dozen vendors), and ancient technology.
>
> Here is what I want:
>
> 1. Support for a lot of chips, say 2048 of them. JTAG supposedly
> supports 16 chips. Yeah, right. The 5MHz clock signal dies out after
> three or four. The 200KHz signal dies after eight or nine. This will
> require some strong signals with error correction, but, heck, if a
> basic ethernet layer can do it....

There is no limit. Additional drivers ought to be used on board to strengthen the parallel signals (TCK, TMS) on long chains. Just because TCK and TMS are "slow" doesn't mean that you can ignore layout and signal integrity considerations for them.

>
> 2. Endpoint enabling. The JTAG methods for specifying an endpoint are
> both flaky and redundant. We need some nice protocols, maybe even
> packets with headers, etc.
>
> 3. Speed. It needs to be as fast as my USB2 cable at a bare minimum.
> And put some standard, accessible plugs on there while you're at it.

Boundary-scan is as fast as the slowest device in the serial chain. This is a weakness of all serial protocols.

>
> 4. Standard driver interface. Need I say more? How many of you write
> directly to the parallel port? All of you? Uh huh, I knew it. I'm sure
> you all enjoy it too. How about something like this:
>

The development of standards of this sort is certainly easy to sketch out but the devil is in the details. Start a standards committee and have at it.

> mycard = code to locate the right driver and device and open it....
> ioctl(mycard, HOW_MANY_DEVICES, &devices)
> id_struct = new ID_STRUCT[devices]
> ioctl(mycard, IDENTIFY_DEVICES, &id_struct)
> for each d in devices {
>   if( id_struct[d].devId == Virtex4Id ) {
>     targetlist = { d }
>     ioctl(mycard, SET_TARGET_DEVICES, &targetlist)
>     command_struct.mode = programming
>     ioctl(mycard, SEND_COMMAND, &command_struct)
>     write(mycard, "c:\my programming file.bit")
>     ioctl(mycard, READ_STATUS, &status_struct)
>     if( status_struct.mode & programmed) break
>     else return failed
>   }
> }
> Then we go into a loop for reading and writing debug data, etc.
>
> We could have drivers for a dozen different interfaces including
> Firewire, parallel port (urrrg), serial port (double urrrg), etc.
>
> Yo Xilinx, let's remove the great mystery from Impact. Let's open the
> hat on the "platform" driver and make the thing useful for the parallel
> port as well.
>
> Maybe I'm taking this too far. I just want something that works
> reliably and is not a pain in the ars to use programmatically. Is that
> too much to ask?
>
Article: 99303
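To put rough numbers on the remark in the reply above that boundary-scan is only as fast as the slowest device in the serial chain, here is a small sketch. The helper names and the device figures are hypothetical. The only protocol fact relied on is that a device placed in BYPASS contributes a one-bit register to the data path.

```python
def chain_tck_hz(device_max_tck_hz):
    """The whole chain shares one TCK, so it is limited by the slowest device."""
    return min(device_max_tck_hz)


def shift_time_s(payload_bits, bypassed_devices, tck_hz):
    """Time to shift a payload through the chain, one bit per TCK cycle.

    Each device in BYPASS adds one extra bit to the shift path.
    """
    return (payload_bits + bypassed_devices) / tck_hz


# Hypothetical chain: three devices rated for 33, 10 and 5 MHz TCK.
tck = chain_tck_hz([33_000_000, 10_000_000, 5_000_000])
seconds = shift_time_s(payload_bits=4_000_000, bypassed_devices=2, tck_hz=tck)

print("chain TCK limit:", tck, "Hz")
print("time to shift 4 Mbit:", round(seconds, 3), "s")
```

The sketch ignores TAP state overhead and cable latency, so it is at best a lower bound on real programming time. It does show why splitting a board into several chains, as suggested elsewhere in this thread, keeps one slow part from setting the speed for everything.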
I suppose it depends on your Xilinx tool usage. If you develop in the hardware itself as some people do, the best routing is rarely needed. Hence, this feature of Xilinx is hardly useful in that case. If you develop large projects using software simulation, or if you make a small change to a large project that barely passes timespec on a regular basis, perhaps this is useful, but I remain unconvinced.Article: 99304
dp wrote: > While I agree with you that it is outdated and too slow, > I'd say some single chip USB 2.0 <-> much-faster-than-todays-JTAG > would be a practical enough solution. It will take care of the level > conversion and everything, and the speed will be as high as it > gets. Need more speed for too big a board, put several JTAG chains > on it, ready. > > I do not understand what you mean by "the 5 MHz clock dies out after", > what's wrong with buffering it? But 5 MHz is too slow for todays big > chips anyway, so the point is valid nonetheless . <snip> JTAG itself is OK, it is the implementations that are sometimes 'left till last'. Speed-sag is solved with TinyLogic buffers, as dp suggests. Seems there are two paths : a) Start a committee as Neil suggests (no smiley seen?) b) Start an openCore project, that defines a CPLD fast JTAG interface, to either a Parallel port, or a FTDI device, or a Cypress USB uC etc This would have a BAUD select, and have the ability to run multiple JTAG stubs - ie if Chain is broken ( seems to be common ) then run a star structure. -jgArticle: 99305
>2. Endpoint enabling. The JTAG methods for specifying an endpoint are
>both flaky and redundant. We need some nice protocols, maybe even
>packets with headers, etc.

>1. Support for a lot of chips, say 2048 of them. JTAG supposedly
>supports 16 chips. Yeah, right. The 5MHz clock signal dies out after
>three or four. The 200KHz signal dies after eight or nine. This will
>require some strong signals with error correction, but, heck, if a
>basic ethernet layer can do it....

>3. Speed. It needs to be as fast as my USB2 cable at a bare minimum.
>And put some standard, accessible plugs on there while you're at it.

Wouldn't a two-pin differential interface with a CRC check give more robustness, while keeping things simple (and therefore cheap)?

>4. Standard driver interface. Need I say more? How many of you write
>directly to the parallel port? All of you? Uh huh, I knew it. I'm sure
>you all enjoy it too. How about something like this:

>mycard = code to locate the right driver and device and open it....
>ioctl(mycard, HOW_MANY_DEVICES, &devices)
>id_struct = new ID_STRUCT[devices]
>ioctl(mycard, IDENTIFY_DEVICES, &id_struct)
>for each d in devices {
>  if( id_struct[d].devId == Virtex4Id ) {
>    targetlist = { d }
>    ioctl(mycard, SET_TARGET_DEVICES, &targetlist)
>    command_struct.mode = programming
>    ioctl(mycard, SEND_COMMAND, &command_struct)
>    write(mycard, "c:\my programming file.bit")
>    ioctl(mycard, READ_STATUS, &status_struct)
>    if( status_struct.mode & programmed) break
>    else return failed
>  }
>}
>Then we go into a loop for reading and writing debug data, etc.

>We could have drivers for a dozen different interfaces including
>Firewire, parallel port (urrrg), serial port (double urrrg), etc.

A JTAG <-> MCU <-> USB bridge could do the job?

>Yo Xilinx, let's remove the great mystery from Impact. Let's open the
>hat on the "platform" driver and make the thing useful for the parallel
>port as well.

One way is to make a fake USB device with the help of a virtual device driver. A kludge, of course, but still =)

>Maybe I'm taking this too far. I just want something that works
>reliably and is not a pain in the ars to use programmatically. Is that
>too much to ask?

As long as the money comes in.. :-)
Article: 99306
"Brannon" <brannonking@yahoo.com> wrote in message news:1143048778.266522.35020@z34g2000cwc.googlegroups.com... > > Here is what I want: > > 1. Support for a lot of chips, say 2048 of them. JTAG supposedly > supports 16 chips. Yeah, right. The 5MHz clock signal dies out after > three or four. The 200KHz signal dies after eight or nine. This will > require some strong signals with error correction, but, heck, if a > basic ethernet layer can do it.... > Hi Brannon, I agree with much of what you write. As a workaround for your clocking problems, you could try source terminating the clock driver from your JTAG controller. On my platform cable USB I use a 2x7 2mm header with 4 50 ohm resistors in series with the signal lines. Improves the performance significantly for me. HTH, Syms.Article: 99307
Ron wrote:
> I was considering purchasing an AMD64 quad-processor system, but have no
> idea whether any of the FPGA development tool software would take
> advantage of multi-processors (Synplicity for Lattice, Actel, Xilinx,
> etc). Any ideas or recommendations?

I've been using dual/quad/octal systems for UNIX/FreeBSD/Linux desktops for about 15 years. Learning to use threads, MPI, PVM is an important programming resource for many classes of difficult problems which can be parallelized, and can be a valuable resume builder. I've made part of my living over the years writing device drivers, where having SMP machines for testing multiprocessor locks is a requirement.

As a hardware developer of system boards, it's very useful to learn to write your own device drivers and diagnostics for system bringup, rather than waste a lot of time and resources educating software-only guys how to program your board (besides learning where to make hardware/software tradeoffs for performance or cost).

SMP machines as a UNIX/Linux desktop are much more responsive when doing background compiles, PCB routing, FPGA routing, etc., as your interactive desktop session doesn't need to fight for processor access. Justification of the increased cost depends greatly on your use.

As more machines are hyperthreaded or have multiple cores, and dual/quad systems are getting cheaper, SMP-aware programs are getting more common and expected. This should also show in EDA/CAD software over time as well.

I have several friends that do high end gaming on dual booted (linux/xp) systems, and feel the SMP gives them that FPS gaming edge.

So ... good luck with whatever new machine you get.
Article: 99308
On Wed, 22 Mar 2006 05:44:55 -0800, Ron wrote:
> I was considering purchasing an AMD64 quad-processor system, but have no
> idea whether any of the FPGA development tool software would take
> advantage of multi-processors (Synplicity for Lattice, Actel, Xilinx,
> etc). Any ideas or recommendations?

PAR is multithreaded but none of the simulators are, so it's not really cost effective to get an Opteron 8xx system. A couple of Athlon X2 4400+ systems are a much better choice. For simulator performance the cache size is the most important thing. NCVerilog is twice as fast on a 1M cache Athlon 64 as it is on a 1/2M cache system. I haven't updated this page in a while so it doesn't have the 4400+ on it, but these benchmarks are still valid,

http://www.polybus.com/linux_hardware/index.htm

The 4400+ is about 10% faster on single-threaded jobs than the 3400+ (both have 1M caches and run at 2.2GHz, but the 4400+ has two DDR400 memory buses while the 3400+ has a single DDR333 memory bus). The 4400+ is able to use both cores efficiently; I frequently run NCVerilog and PAR at the same time.
Article: 99309
I am trying to make a webserver on my ML401 development card and I use BSB to generate my hardware. When I try to "Generate libraries and BSPs" I get the error:

////////////////////////////////////
ERROR:MDT - ERROR FROM TCL:- lwip () - child process exited abnormally
    while executing
"exec bash -c "cd src;make all \"COMPILER_FLAGS=$compiler_flags\" \"EXTRA_COMPILER_FLAGS=$extra_compiler_flags\" >& logs""
    (procedure "::sw_lwip_v2_00_a::execs_generate" line 51)
    invoked from within
"::sw_lwip_v2_00_a::execs_generate 40121220"
//////////////////////////////////////

My EDK version is 8.101 and my ISE version is 8.102.

My Ethernet component looks like this in my MHS file:

/////////////////////////////////
BEGIN opb_ethernet
 PARAMETER INSTANCE = Ethernet_MAC
 PARAMETER HW_VER = 1.02.a
 PARAMETER C_DMA_PRESENT = 1
 PARAMETER C_IPIF_RDFIFO_DEPTH = 32768
 PARAMETER C_IPIF_WRFIFO_DEPTH = 32768
 PARAMETER C_OPB_CLK_PERIOD_PS = 10000
 PARAMETER C_BASEADDR = 0x40c00000
 PARAMETER C_HIGHADDR = 0x40c0ffff
 BUS_INTERFACE SOPB = mb_opb
 PORT OPB_Clk = sys_clk_s
 PORT PHY_rst_n = fpga_0_Ethernet_MAC_PHY_rst_n
 PORT PHY_crs = fpga_0_Ethernet_MAC_PHY_crs
 PORT PHY_col = fpga_0_Ethernet_MAC_PHY_col
 PORT PHY_tx_data = fpga_0_Ethernet_MAC_PHY_tx_data
 PORT PHY_tx_en = fpga_0_Ethernet_MAC_PHY_tx_en
 PORT PHY_tx_clk = fpga_0_Ethernet_MAC_PHY_tx_clk
 PORT PHY_tx_er = fpga_0_Ethernet_MAC_PHY_tx_er
 PORT PHY_rx_er = fpga_0_Ethernet_MAC_PHY_rx_er
 PORT PHY_rx_clk = fpga_0_Ethernet_MAC_PHY_rx_clk
 PORT PHY_dv = fpga_0_Ethernet_MAC_PHY_dv
 PORT PHY_rx_data = fpga_0_Ethernet_MAC_PHY_rx_data
 PORT PHY_Mii_clk = fpga_0_Ethernet_MAC_PHY_Mii_clk
 PORT PHY_Mii_data = fpga_0_Ethernet_MAC_PHY_Mii_data
END
///////////////////////////////////////////////

Does anyone have an idea what the problem might be?

Raymond
Article: 99310
Can someone explain the difference between the Xilinx shift_extract and shreg_extract constraints? Shreg_extract appears to prevent inferring SRL16 based shifters, what does shift_extract do? Thanks! John ProvidenzaArticle: 99311
I am moving a Spartan 2 design to Spartan 3. Despite my trying all the read/write options, the Spartan 3 BlockRAMs always seem to take one more clock to deliver valid data on one port after a write from the other side.

I'm using ISE 6.3 and have tried inferring the RAMs with the VHDL source and using the core generator, no change.

Is there something fundamentally different between Spartan 2 and 3 BlockRAMs timing-wise, or is this a tool issue?

Pulling what's left of my hair out...

Peter Wallace
Article: 99312
Peter C. Wallace wrote: > Despite trying all the read/write options on a Spartan 2 design I am moving > to Spartan 3, The Spartan 3 BlockRAMS always seem to take one more clock > to valid data output from a write from the other side. > > I'm using ISE 6.3 and have tried inferring the RAMS with the VHDL source > and using the core generator, no change. > > Is there something fundamentally different between Spartan 2 and 3 blockRAMS > timimg wise or is this a tool issue? > > Pulling whats left of my hair out... > > Peter Wallace The Spartan3 BlockRAMs will do everything the Spartan2 does. What changes with Spartan3 is the addition of write characteristics that allow "read first" and "no change" in addition to the Spartan2's "write first" mode. If all you're doing is reading, your results should be identical. Both Spartan2 and Spartan3 BlockRAMs have to register the input address to deliver the output data after the clock edge - neither are asynchronous memories. Does your simulation suggest the data doesn't follow the address after the clock edge?Article: 99313
fpga_toys@yahoo.com wrote:
> Tech speak, is part of this segment of the electronic world ... as I
> say ... get used to it. Feel free not to answer posters in tech speak
> ... they will appreciate the increased civility with the lack of your
> bitching.

After reading the thread I went back and re-read the original post. My suspicion now is that the strong reactions were not really triggered by the "tech speak" alone (which was, as you point out, quite minor) but by the combination of fairly ignorant usage of the synthesis tool coupled with a style of writing that similarly can at times give an impression of ignorance.

Two hypothetical posts each making one of these transgressions probably would have gotten different replies. For example, a post with the same amount of tech speak that showed a knowledge of the subject and asked a straightforward, interesting question would probably have been answered. And a well written post from an obvious beginner to the effect of "I made a really complicated behavioural design and my computer ran out of memory" might also have elicited helpful advice.

But in combination, the two individually forgivable "offenses" were too much for some...
Article: 99314
Ray Andraka wrote: > I was burnt with Virtex I on relying on the 1x and 2x clock alignment. > Ever since I have been careful to avoid direct transfers from one to the > other on the same edge. > > In my case, it turned out that a combination of a heavily loaded 2x > clock tree, a lightly loaded 1x clock tree, and jitter on the input > clock conspired to push the 1x and 2x clocks apart by over 500ps, which > was enough to cause the design to fail. In that case, the input clock > jitter was not from the clock source either, which made it a real bear > to find; it was coming from the input clock buffer thresholds being > modulated by outputs switching on the same bank as the clock input. > > The bottom line is that jitter on the input to the DLL or DCM can cause > the 1x and 2x output edges to become misaligned, even when the loads are > perfectly balanced. In the Virtex-I days, there wasn't proper hold timing analysis in the tools that Xilinx introduced recently. With this added hold timing analysis, the transfers between domains were supposed to be properly analyzed. Even with this help, a properly analyzed input clock would have understated the JITTER constraint needed to have the analysis be 100% and you probably still would have had troubles. If the internal JITTER value can be properly defined, the tools *may* get the designers to a safe point but I haven't gotten any positive feedback or horror stories since the tool changes came about. Ray - it's because of your earlier trials that I'm a little more protective of my domain crossings - thanks. - John_HArticle: 99315
johnp wrote:
> I'm upgrading to ISE8.1 from ISE6.2 and my (old) Modelsim V5.5
> is unhappy about some Verilog code in the Xilinx RAM16_S9.V
> model (as well as some other of the RAM16 models).
>
> The Xilinx model has two models built into it - a "legacy_model"
> and a new model. A `ifdef selects between them. In the newer model,
> there's some code that looks like:
>
> reg [7:0] mem[2047:0];
>
> for (count = 0; count < 32; count = count + 1) begin
>     mem[count] = INIT_00[(count * 8) +: 8];
>     mem[32 * 1 + count] = INIT_01[(count * 8) +: 8];
>     // more lines like the above.
>
> This is code straight from the Xilinx unisims directory.
>
> Note the +: near the end of each line.
>
> Is this legal Verilog? It looks illegal to me (and to Modelsim 5.5).
> It looks like an operator is missing. I don't believe Verilog allows this.
>
> It also looks like the code couldn't possibly do the intended
> initialization.
>
> Has anyone else run into this?
>
> Thanks for any help!
>
> John Providenza
>

The '+:' is an indexed part-select operator, which was introduced in Verilog-2001. You can find it on page 53 of the Standard. If you don't have access to it, you can find it in section 3.4 of the following paper.

http://sutherland-hdl.com/papers/2000-HDLCon-paper_Verilog-2000.pdf

Kunal
Article: 99316
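For readers whose tools predate Verilog-2001, here is what the indexed part-select in the quoted Xilinx model works out to, written as a small Python sketch. `INIT_00[(count * 8) +: 8]` selects 8 bits starting at bit `count * 8` and counting upward, which is bits `count*8 + 7` down to `count*8`. The helper name and the test pattern are made up for illustration. Only the loop shape comes from the quoted code.

```python
def part_select_up(value, base, width):
    """Model of Verilog-2001 value[base +: width] on a non-negative integer:
    'width' bits starting at bit 'base' and counting upward."""
    return (value >> base) & ((1 << width) - 1)


# Hypothetical 256-bit INIT_00 whose byte n holds the value n
# (byte 0 is the least significant byte).
INIT_00 = int.from_bytes(bytes(range(32)), "little")

# Same shape as the quoted loop: mem[count] = INIT_00[(count * 8) +: 8];
mem = [part_select_up(INIT_00, count * 8, 8) for count in range(32)]

print(mem[:4])
```

So the loop does perform the intended initialization: each pass peels one byte off the 256-bit INIT parameter and stores it in the next memory word.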
Morten Leikvoll wrote:
> Got a DCM (spartan3),
> CLK1X coming CLK90 pin, thru a BUFG (to get 90deg out of phase from the
> incoming clk)
> CLK2X coming from CLK2X180 pin, thru a BUFG (to get rising edge in phase
> with CLK1X)
>
> How safe is it to sample a signal coming from the CLK1X domain, into the
> CLK2X domain (and maybe back)?
> Does this create a race condition or will this always be safe? (I know it
> isnt when using modelsim, but when implementing?)
> I have problems matching simulation result with real life.

I did this recently and it worked fine for me, both in real life and in simulation (note that in simulation I generated the clocks myself; I didn't simulate the DCM). I'm working on Virtex4 with ISE7.1 & 8.1.

When looking at the timing analysis, I clearly see the path between 1x and 2x being analyzed, taking the skew and clock uncertainty into account.

Note that I still tried to minimize the logic between the 1x reg and the 2x reg; basically I just have a mux, so this part can run way faster than the 200 MHz I ask for ;)

To be sure, I think you can add a maxdelay constraint in your VHDL file and constrain the path to less than x% of your clock period, for example; that should give you some margin.

Sylvain
Article: 99317
Hi Brannon, Thank you for your help. It is great for several people in a group with great interest in an algorithm and their cooperation rediscovers its best implementation published somewhere and then shares it with everyone. WengArticle: 99318
Kunal - Thanks for the pointer, I found it in the spec. John PArticle: 99319
On Wed, 22 Mar 2006 14:18:31 -0800, John_H wrote: > Peter C. Wallace wrote: >> Despite trying all the read/write options on a Spartan 2 design I am >> moving to Spartan 3, The Spartan 3 BlockRAMS always seem to take one >> more clock to valid data output from a write from the other side. >> >> I'm using ISE 6.3 and have tried inferring the RAMS with the VHDL >> source and using the core generator, no change. >> >> Is there something fundamentally different between Spartan 2 and 3 >> blockRAMS timimg wise or is this a tool issue? >> >> Pulling whats left of my hair out... >> >> Peter Wallace > > The Spartan3 BlockRAMs will do everything the Spartan2 does. > > What changes with Spartan3 is the addition of write characteristics that > allow "read first" and "no change" in addition to the Spartan2's "write > first" mode. If all you're doing is reading, your results should be > identical. Both Spartan2 and Spartan3 BlockRAMs have to register the > input address to deliver the output data after the clock edge - neither > are asynchronous memories. > > Does your simulation suggest the data doesn't follow the address after > the clock edge? Data follows address in the expected way. What is different is how many clocks are required from write on port A to read of valid data at same address on port B. One more for Spartan3... Peter WallaceArticle: 99320
Andreas Ehliar wrote:
> To be honest, at this point I prefer to use XC3SProg
> http://www.rogerstech.force9.co.uk/xc3sprog/index.html
> in Linux. Sure, it is rather slow since it uses the parallel cable IV
> in cable III mode, but it feels much more stable in Linux than impact does.
>

What I would really like is something that can write Xilinx .ACE files. The impact from Foundation 6.2 is terribly slow, and the one from Webpack8.1 is pretty flaky. (It wrote ace files everywhere but where I wanted them.)

--
Steve Williams                "The woods are lovely, dark and deep.
steve at icarus.com           But I have promises to keep,
http://www.icarus.com         and lines to code before I sleep,
http://www.picturel.com       And lines to code before I sleep."
Article: 99321
cs_posting@hotmail.com wrote: > but in combination, the two individually forgivable "offenses" were too > much for some... I agree. The standard in-person, is to walk away. Not start a fight with insults, and rally a lynch mob.Article: 99322
Hi, Forewarning to those who have yet to install 8.1i and who use core generator. Hard to believe I know, but don't use a space in your 8.1i install path otherwise you _may_ have problems running core generator. (I certainly did.). Here is the relevant answer record, scroll down to solution 2. http://www.xilinx.com/xlnx/xil_ans_display.jsp?iLanguageID=1&iCountryID=1&getPagePath=21955 "Solution 2: In the ISE 8.1i design tools, this error message might occur if there is a space in the Xilinx install path" Regards AndrewArticle: 99323
Josh Rosen wrote: > On Wed, 22 Mar 2006 05:44:55 -0800, Ron wrote: > > > I was considering purchasing an AMD64 quad-processor system, but have no > > idea whether any of the FPGA development tool software would take > > advantage of multi-processors (Synplicity for Lattice, Actel, Xilinx, > > etc). Any ideas or recommendations? > > PAR is multithreaded but none of the simulators are so it's not really > cost effective to get an Opteron 8xx system. A couple of Athlon X2 4400+ > systems are a much better choice. For simulator performance the cache size > is the most important thing. NCVerilog is twice as fast on a 1M cache > Athlon 64 as it is on a 1/2M cache system. I haven't updated this page in > a while so it doesn't have the 4400+ on it, but these benchmarks are still > valid, > > http://www.polybus.com/linux_hardware/index.htm > > The 4400+ is about 10% faster on single threaded jobs then the 3400+ (both > have 1M caches and run at 2.2GHz, but the 4400+ has two DDR400 memory > buses while the 3400+ has a single DDR333 memory bus). The 4400+ is able > to use both cores efficiently, I frequently run NCVerilog and PAR at the > same time. YIKES, that thing must be a beast!! I feel so left behind with my XP 2500 @ 2.2Ghz. It is all I can afford for now. It seems fine for me though. But then again, I am not making the huge designs you guys are. -IsaacArticle: 99324
Hi Pablo, Thank you for your useful information. Weng