Alco,

Are you using an implementation of the XAPP058 XSVF player code? My answers to your original questions are inserted in the quoted text below... Send me direct email if you need more help.

Randal

alco wrote:
> Hello,
>
> I have been using an 8051 controller to program an XC9536 CPLD using the JTAG
> interface. I have programmed several hundred units in the last couple of
> years without any problems, but now programming the CPLD by the 8051 fails
> for more than half the units produced. The PCBs do not show shorts, open
> connections, or other production errors. With some care the CPLD (while on
> the PCB) was connected to a MultiLINX programming device, which successfully
> programs the CPLD.
>
> The programming algorithm on the 8051 is based on Xilinx app notes XAPP058
> and XAPP067. The CPLD fails to generate the correct TDO output for
> verification after a program or erase instruction. The scope shows
> appropriate XRUNTEST idle times for these instructions (640us/1300ms). The TCK
> period is larger than 2us.
>
> On the scope I compared the JTAG output from the 8051 and the MultiLINX device.
> The only real difference is that the MultiLINX does not instruct the CPLD to
> return to Run-Test/Idle mode after Update-IR (see figure 7 in XAPP058) for
> the instructions 'ISP enable', 'erase', and 'program', which are then
> immediately followed by an SDR instruction.
>
> Questions:
>
> - Has anything changed recently in the JTAG interface for the XC9536 that
> might cause a microcontroller to fail programming the CPLD?

Nothing has recently changed with the physical device. Many (2-3) years ago, the XC9536 fabrication facility was changed. Software changes have continuously occurred:

1. The XAPP058 SVF2XSVF translator and XSVF player have been updated (to v4.xx) to support new additions to the Xilinx family.
The SVF2XSVF v4.xx translator output may not be compatible with the old XSVF player. You should continue using the old SVF2XSVF translator if you have an older XSVF player implementation.

2. The new download software, called iMPACT, was released in the Xilinx 4.1i software package. The iMPACT SVF output is not compatible with the older XSVF tools.

> - Where do I find detailed information on JTAG timing and XC9536 timing
> during programming? It is not found in the Xilinx data book.

The only details are in the combination of the app notes you listed, the XC9536 data sheet, and the iMPACT/JTAG Programmer SVF output file for the XC9536.

> - Are there timing constraints associated with the JTAG interface or XC9536
> other than the TCK period and XRUNTEST idle times?

Since you have read XAPP067 and XAPP058, you are aware of the conditional "retry" loop. On a TDO test failure, the retry is invoked and the prior XRUNTEST time should be applied again. Actually, it's not guaranteed that the device will erase/program within the given XRUNTEST time. When the retry loop is invoked, it is safer to account for the longer device timing by incrementing the applied XRUNTEST time by 10-25% on each retry attempt.

> - Besides the xc9536.bsd file there exists an xc9536_v2.bsd file. The device
> marking does not say so, but could the device be a v2 type that must be used
> with the other BSDL file? And if so, how do I use that file? I cannot
> select a version-2 9536 as a target device.

The v2 BSDL file is for the XC9536 that is currently being made: the revision 2 device. The switch to this revision occurred with the fabrication facility change about 2-3 years ago. The v2 XC9536 is compatible with the original device. JTAG Programmer always generates SVF programming vectors for the original XC9536 which, as I mentioned, is forward compatible with the newer revisions. iMPACT generates SVF only for the XC9536 v2 version.
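Randal's retry advice is easy to get wrong in a player implementation. The sketch below shows the intended shape of the loop; it is plain Python pseudologic, not the actual XAPP058 C source, and the function names and the 15% growth factor are my own illustration of the "10-25% per retry" guidance.

```python
def program_with_retry(apply_xruntest_and_check, base_us: float,
                       max_retries: int = 32, growth: float = 1.15):
    """Erase/program retry loop per the advice above.

    apply_xruntest_and_check(wait_us) models shifting the instruction,
    idling in Run-Test/Idle for wait_us, then checking TDO; it returns
    True on a TDO match. Each retry lengthens the idle time by ~15%,
    because the nominal XRUNTEST time is not guaranteed to be enough.
    """
    wait = base_us
    for _ in range(max_retries + 1):
        if apply_xruntest_and_check(wait):
            return True
        wait *= growth  # back off: give the device more time
    return False

# Toy device that only succeeds once the idle time reaches 900 us:
ok = program_with_retry(lambda us: us >= 900.0, base_us=640.0)
print(ok)  # True (640 -> 736 -> 846.4 -> 973.4 us)
```

The key point is that the wait time carries over and grows across retries, rather than resetting to the nominal XRUNTEST value on each attempt.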
Randal

Article: 37626
Rick Filipkiewicz wrote:
> [essential ... helpful ... difficult to use ... waste of time & gates] ?

A vote for "Helpful". My latest design couldn't use it, as I use all of the block RAMs, but the gentlemen in the next office over did use it, and while they did swear at it, they also seemed to find it quite useful. I would hope that some of the bugs that bit them are now fixed, so I would think it's probably worth trying.

--
Phil Hays

Article: 37627
> > Last year, I used Xilinx Foundation Express 3.3i to develop
> > for a Virtex300 part. I recently went to Xilinx's homepage,
> > and found that 'Foundation ISE' has replaced the older
> > Foundation (non-ISE).
> >
> > Does this mean:
> >
> > 1) goodbye old Windows 16-bit legacy code
> > (3.3i would crash on average every 4-6 compiles, and
> > sometimes take down my NT4 workstation)
>
> What the heck were you doing with your computer that it would cause it
> to crash? I ran NT 4 SP6 with the Xilinx service packs, and it never
> crashed the computer.

Well, the crashes seem to center around the 'project manager' (the top-level Foundation application). When I clicked the "compile" button, it would ask me if I wanted to update some "libraries" (I'm sorry, I don't remember the exact dialogs; this was over a year ago). After clicking yes, I'd take a deep breath and cross my fingers. Sometimes the app would crash here, with a Win16 dialog box. (I have Visual Studio 6 installed on my system, and every other app crash has a 'debug' button. The Foundation Project Manager, Schematic Capture, and a few other Xilinx things did NOT.) Once the 'implementation' window pops up, which shows the progress of the compile, map, par, etc., the application would always complete successfully. Once I get a crash, I can quit the program. If I try to re-run it and recompile, the whole program just hangs, so I have to reboot.

Article: 37628
On Mon, 17 Dec 2001 10:31:33 -0500, "Pallek, Andrew [CAR:CN34:EXCH]"
<apallek@americasm01.nt.com> wrote:

> If you just want to divide by 64, shift right by 6 places. The modulo
> is what was shifted out.

What if the dividend is negative?

Muzaffer Kal
http://www.dspia.com
DSP algorithm implementations for FPGA systems

Article: 37629
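Muzaffer's caveat is real: for a negative two's-complement dividend, an arithmetic right shift rounds toward negative infinity, while the usual "divide" convention (e.g. C's `/`) rounds toward zero, so the shifted-out bits are not the remainder you might expect. A quick sketch (plain Python, not from the original thread):

```python
def shift_divmod(x: int, k: int) -> tuple[int, int]:
    """Divide by 2**k with an arithmetic right shift.

    Python's >> on ints is an arithmetic shift, and the masked-off
    low bits are the remainder -- both round toward -infinity.
    """
    return x >> k, x & ((1 << k) - 1)

def trunc_divmod(x: int, d: int) -> tuple[int, int]:
    """C-style division: quotient rounds toward zero."""
    q = int(x / d)          # truncate toward zero
    return q, x - q * d

# For a positive dividend the two agree:
assert shift_divmod(100, 6) == trunc_divmod(100, 64) == (1, 36)

# For a negative dividend they differ:
print(shift_divmod(-100, 6))   # (-2, 28)  floor division
print(trunc_divmod(-100, 64))  # (-1, -36) truncating division
```

So a bare shift-and-mask is only a drop-in replacement for divide/modulo when the dividend is unsigned, or when floor semantics are acceptable.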
Some hardware questions on FPGAs:

1) What's the difference between a part with speed grade -3 and another with speed -4? Is the number the number of metal layers?

2) I read the data sheets of Virtex and Virtex-E, and I didn't find much difference. Can you explain which is better and why?

Thanks

Article: 37630
I'd like to hear more inputs from current users, so I am posting this whole thing from the FAQ here. Please extract only the portions you want to comment on. Thanks.

===========================================================

FPGA-FAQ 0014  How do I choose between Altera and Xilinx?

Vendor: Both
FAQ Entry Author: Martin Thompson
FAQ Entry Rebuttal/Commentary: Anonymous Altera Fan
FAQ Entry Additional Analysis: Ray Andraka
FAQ Entry Editor: Philip Freidin
FAQ Entry Date: 7 June 2001

One of our many readers suggested that the only way to read this particular page was to do it listening to Arlo Guthrie's Alice's Restaurant, and I agree. Here it is!

Q. How do I choose between Altera and Xilinx?

A. Here's a quick (well, actually it's not as quick as I expected) description of how I chose between Altera and Xilinx. I was comparing Altera's Flex/APEX with Xilinx's XC4000/Virtex families. Some comments are thrown in about Virtex and Apex and so on, but I will be the first to admit I haven't done any more than read the datasheets. The Flex10KE I have used in anger. A lot of what I have learnt about the two families has come from reading comp.arch.fpga, and in particular a discussion I had with Ray Andraka. Most of the rest came from reading datasheets and appnotes. Due to the discursive nature of this bit, it is indented, with later comments being more indented...

Logic

The structure of the Xilinx logic cells is well suited to arithmetic structures, compared to the Altera Flex/Apex structure, due to the ability to generate both output and carry from one logic cell. Altera's 4-LUT is divided into two 3-LUTs for arithmetic.

I think you misunderstand the basic Altera logic element. The carry and sum outputs are implemented in a single logic cell. Just as a Xilinx logic cell can be reconfigured to act as a 16-bit memory cell, the Altera logic cell has several configurations optimized for arithmetic, counters, or misc logic.
There is no speed penalty for using these modes, and there is no inherent advantage to the Xilinx logic cell with regard to arithmetic. I hear all kinds of claims about Xilinx architectural advantages, but I have never heard even the most ardent Xilinx user claim that the Xilinx architecture has an arithmetic advantage in the logic cells.

I think the point is that Altera can only do a 2-bit arithmetic operation with carry in and out - in Flex at least (ref. figure 11 in the 10KE datasheet). Also, with Flex, a CE has to take the place of an arithmetic input as well, leading to a one-input function (without using cascade chains to make it wider, with their, admittedly small, routing delay).

Xilinx does have a distinct advantage over Altera when it comes to arithmetic circuits. For arithmetic, the Altera 4-LUT does indeed get partitioned into a pair of 3-LUTs: one for the 'sum' function and one for the 'carry' function. One input to each of these is connected to the carry out from the previous bit. As a result, you are limited to a two-input arithmetic function if you wish to stay in one level of logic. Arithmetic functions with more than two inputs, such as adder-subtractors, multiplier partial products, mux-adds, and accumulators with loads or synchronous clears (this last one is addressed by improvements in the 20K family), require two levels of logic to implement. The Xilinx logic cell does not use the LUT for the carry function; it has dedicated carry logic. The 4K/Spartan families use one LUT input to connect the carry chain, leaving three inputs for your function. Virtex, Virtex-II, and Spartan-II have a dedicated XOR gate after the LUT to do this, so these devices can handle 4-input arithmetic functions without having to go to two levels. The relatively limited arithmetic function of the Altera parts means as many as twice the LUTs are used in heavily arithmetic applications.
Two levels of logic also equates to a significant performance penalty, everything else being equal.

Xilinx's logic cells can also be used as 16-bit shift registers or 16x1 SRAMs for small amounts of storage. In addition, in Virtex there are BlockRAMs, which are larger blocks of dual-ported memory. Altera only has large blocks of RAM called EABs, which are configurable between 256x16 bits and 4096x1 bit. They are also only partially dual-ported (one read and one write port).

The ability to convert the logic cell into memory is a neat feature. This is one of the key differences in the architectures. My only comment on that is that it isn't used as much as you might think. Xilinx parts have a much lower logic cell count relative to device size since they include so much RAM (example: XCV600E: 13.8K logic cells, 288 Kbits RAM; 20K400E: 16.6K logic cells, 208 Kbits RAM). Because of this, it doesn't usually make sense to take away your less abundant resource (logic cells) to create more of something you already have lots of (memory). Nonetheless, it is a neat and sometimes quite useful feature.

For DSP designs, the CLB RAM capability is another significant advantage over the Altera offerings. DSP designs tend to have many small delay queues (filter tap delays, for example) which use up a lot of logic cells if implemented as flip-flops, or severely under-utilize block memories if done there. By using the CLB RAMs (or, in the case of Virtex, the shift register mode), you get up to a 17:1 area reduction over using LUT flip-flops. Similar reductions come into play for designs having register files and small FIFOs. The Virtex SRL16 primitive also gives you the capability to reload LUT contents without reconfiguring the device. This makes it possible to have reprogrammable coefficients in a distributed arithmetic filter, for instance. There is simply no equivalent capability in the Altera devices. My Virtex designs typically have more than half of the LUTs configured as SRL16s.
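The SRL16 mentioned above is simply a 16-deep shift register built inside one LUT; the 17:1 figure comes from replacing up to 16 discrete flip-flops of delay with a single cell. A behavioral sketch of such a tap delay line (plain Python, class and method names are mine):

```python
from collections import deque

class SRL16:
    """Behavioral model of a 16-deep LUT shift register (SRL16).

    One 4-LUT holds 16 bits, and a 4-bit address selects the tap,
    so a single cell provides up to 16 clocks of delay.
    """
    def __init__(self, depth: int = 16):
        assert 1 <= depth <= 16
        self.regs = deque([0] * depth, maxlen=depth)

    def clock(self, d: int) -> int:
        q = self.regs[-1]        # oldest bit = selected tap (depth-1)
        self.regs.appendleft(d)  # shift the new bit in
        return q

# A 4-cycle delay line: input bits reappear 4 clocks later
line = SRL16(depth=4)
out = [line.clock(b) for b in [1, 0, 1, 1, 0, 0, 0, 0]]
print(out)  # [0, 0, 0, 0, 1, 0, 1, 1]
```

Implemented in flip-flops, the same 4-cycle delay would cost four registers per bit of data width; in the LUT-RAM/SRL16 form it costs one LUT per bit regardless of depth (up to 16).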
(This is comparing marketing gate counts (600E vs 400E) against actual logic cells: Xilinx actually claims 15.5K, but 13.8K is the actual number of 4-LUTs. The 288 Kbits of RAM in Xilinx is the block RAM; there can be up to 216 Kbits more in the LUTs (which would leave zero for logic). The 208K for Altera is for block RAM only. For each user, a better measure might be to find the product in each vendor's product line that can hold a given design, and compare actual price. This gets away from inflated gate and RAM claims, and whether or not it makes sense to trade logic for RAM.)

Precisely - as with many of the architectural differences, if you need the feature, it's brilliant; otherwise, it has no (or even a negative) impact.

As far as the memory blocks go, the Altera blocks have built-in circuitry to allow them to be used as CAM (content addressable memory). The Altera CAMs have a huge performance advantage over trying to implement CAMs in Xilinx devices using memory blocks and some logic. The Altera memory blocks can also be used to implement fast, wide, product-term logic. (Xilinx block RAMs can too.) This is useful, for example, for implementing a wide address decode in few levels of logic. With that said, I will agree that the Xilinx dual-port mode is more full-featured than the APEX 20KE dual-port (although the advantage disappears when you compare Virtex-II vs. Apex II).

This is with APEX and later families. As far as I can tell, the Flex devices don't have this ability. Again, great if you need it!

On the subject of block memories, the advantages of one over the other are not as clear. Xilinx does have a true dual-port capability, where Altera's memory is at best (depending on the family) a read-only port and a write-only port through the 20K family. This is fine for many designs, so unless you need it, not having it is not a problem. Altera does have two very nice unique capabilities in the 20K memories: a CAM mode and a product-term mode.
The CAM is more than nice to have for network apps and places where you need to sort data. While you can do a CAM in Xilinx, the design is neither trivial nor particularly fast (either the fetch or the write operation has to take multiple clocks; see the Xilinx app notes for details). The product-term capability is reminiscent of a CPLD, which is very handy when dealing with big combinatorial functions such as address decodes.

The flip-flops in the logic cells differ in that the Xilinx logic cell has a dedicated clock enable input, whereas Altera uses one of the inputs to the LUT to create a CE signal. In addition, the Altera flip-flops only have a clear input. If you want a preset, the tools will put NOT gates on the input and output of the DFF. This means that you can't have a preset flip-flop implemented in the I/O cell - therefore your Tco can suffer badly. The diagram in the datasheet implies a preset input, but on reading the text you discover the truth!

The Altera part does have a true clock enable on the LE flip-flop, but (except for the 20K) it shares an input to the LE with one of the LUT inputs, so using the clock enable reduces the available functionality of the LUT. In the case of arithmetic logic, using the CE limits you to a single input for one level of logic.

FLEX 8000: No clock enable. Software emulates the clock enable by building it into the logic.
FLEX 6000: No clock enable. Software emulates the clock enable by building it into the logic.
FLEX 10K, 10KE, ACEX 1K: Clock enable uses one of the LUT's data inputs (per the author's original comment).
APEX 20K, 20KE, Mercury, APEX II: Regular clock enable.

The logic cells allow you to implement EITHER an asynchronous clear OR an asynchronous preset. You can't do both without using additional logic cells, but you can implement either, even in the I/O cell. By the way, the Tco increases by only 0.233ns when using a register near the periphery rather than a register in the I/O cell (APEX EP20K30ETC144-1).
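To return to the CAM point above and make it concrete: a CAM answers "which addresses hold this value?" in a single lookup, the inverse of a RAM. A toy behavioral model (plain Python, not vendor code; the class and method names are mine):

```python
class TinyCAM:
    """Toy content-addressable memory: value -> matching addresses.

    A hardware CAM compares the search key against every stored entry
    in parallel in one clock; the RAM-plus-logic workaround on Xilinx
    needs extra circuitry and multiple clocks per write, as noted above.
    """
    def __init__(self, depth: int):
        self.mem = [None] * depth

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value

    def match(self, value: int) -> list[int]:
        # Models the parallel compare across all entries.
        return [a for a, v in enumerate(self.mem) if v == value]

cam = TinyCAM(depth=8)
cam.write(2, 0xBEEF)
cam.write(5, 0xBEEF)
cam.write(6, 0xCAFE)
print(cam.match(0xBEEF))  # [2, 5]
print(cam.match(0x1234))  # []
```

The list comprehension in `match` is what the dedicated CAM circuitry does in one cycle; building the same parallel compare out of RAM blocks and general logic is where the Xilinx workaround loses speed.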
Provided you can get the register to be consistently located adjacent to the IOB (which can be difficult as the device gets full). Depending on registers placed in the core rather than in the IOB leads to external timing being a function of the place and route solution... not a good thing. Incidentally, this is also a problem in the 10K if you need bidirectional I/O, since there is only one flip-flop in the IOB.

If I can return to the Flex architecture, which is what I began the article comparing: according to the 10KE datasheet, an async preset is implemented in one of two ways. The first is using the clear and inverting both input and output: inverting the input is 'free', but inverting the output requires a LUT between your register and the pin. Hence, it's not just a case of the register being outside the I/O element; there's extra logic to consider. Admittedly, I missed the other way of doing it, which is to use one of the LUT inputs as a preset. But then you've lost a LUT input, so that's not always possible either.

Altera's new Mercury family has a different logic structure, including two carry chains, so the arguments are probably different. I haven't had the time/inclination to do any detailed analysis.

I/O

Both families offer similar I/O capabilities. The biggest difference is that the Altera I/O cell has a single register, which can be used as an output, input, or OE register. The Xilinx I/O has all three available for use. Note that the diagrams in the Altera datasheet imply that they have the same capability, but on reading the text you find that the picture shows all the possibilities at once!

You're right about that diagram in the datasheet. Also, you can't use the register in the I/O cell for the OE either - just input or output. However, note the comment above regarding using nearby registers not in the I/O cell. The performance penalty in most cases is less than 1ns for using a non-I/O cell register.
Fair comment - I admit to being a bit bitter about what I consider to be misrepresentation of the truth in the diagram - still, I've learned not to trust the pictures and to read the words now!

This is not true for Mercury, which has three flip-flops in the I/O cell, and Apex II, which has six, for DDR applications.

The Virtex/Apex comparison of their respective LVDS implementations is interesting. As far as I can gather, the SerDes function is implemented in the FPGA fabric for Virtex, and in custom silicon for Apex. This means that you only get proper SerDes LVDS support with the larger Apex devices.

The dedicated SerDes circuitry in the APEX devices allows you to move data around inside the device at 105 MHz and drive it out of the LVDS drivers at 840 Mbps. The Xilinx solution requires routing data and clocks around internally at 320 MHz (not simple), and it uses both edges of the clock to drive data at 640 Mbps. Also, the LVDS drivers in the Altera part are balanced (equal rise and fall times), providing a much better eye diagram than what you get from the unbalanced drivers in the Xilinx device. The Xilinx solution also requires an external resistor network to get the right LVDS voltage levels. Finally, the Apex 20KE devices have dedicated de-skew circuitry in the LVDS receivers. This saves the board designer from having to make all the signal traces exactly the same length. It's hard to argue that the Altera LVDS solution is significantly superior (Apex 20KE vs. Virtex-E), but I do have to admire the fact that Xilinx was able to coax 640 Mbps LVDS out of drivers that were never intended to do LVDS. Altera's general-purpose I/Os have trouble making it to 200 Mbps with a Xilinx-type solution. As far as Apex II and Virtex-II go, I have yet to see details on the Virtex-II LVDS. Apex II increased LVDS performance to 1 Gbps and put it on more channels.
Apex II also improved the clock de-skew circuitry to further reduce the need to carefully hand-route the board-level LVDS signals.

Also good comments, from someone who has actually done it, rather than simply my reading of those datasheets and appnotes!

Routing

The routing structures are also different. Altera's main routing strategy is to have many lines connecting the entire chip together. This is in contrast to the Xilinx approach, which consists of a hierarchy of short, medium, and long connections. This makes the job of the place and route tool harder in the Xilinx devices, unless it is guided. The downside for Altera is that larger devices get slower, as there is more capacitance to drive.

The routing structures of the Xilinx and Altera families are very different; each has different abilities. The Altera structure is a hierarchical structure akin to that of a CPLD. At the lowest level, there are very fast connections between the logic elements (LEs, which consist of a flip-flop and a 4-LUT each) within a LAB (logic array block, with 8 to 10 LEs). These connections are great for very fast state machines, but are useless for arithmetic because the carry chain also runs through the LAB. The next level up in the routing hierarchy connects the LABs in a row together. The row routes run halfway or all the way across the chip in 10K, with switches connecting to selected LABs. The rows are then interconnected by column routes. A LAB can drive a row or column route directly, but can only receive input from a row route. This structure has the advantage of having uniform delays for any connections using similar hierarchical resources. That in turn makes placement less critical. Unfortunately, it also means even local connections incur the delay associated with a cross-chip connection. A bigger problem appears with heavily arithmetic designs, because the routing in and out of every arithmetic LE is forced onto the row routing.
There are only six row routes for every eight LEs in a row, so even with perfect routing in a heavily arithmetic data-flow design, the row can only be 75% occupied. The row interconnect matrix is sparsely populated (any one LAB can only directly connect to a fraction of the LABs on the same row). As the row fills up, some of the connections have to be made via a third LAB, adding to the delay and further congesting the row routes. In a math-intensive design, system performance often falls off sharply at 50 to 60% device utilization. The global nature of the row and column routes also means that performance degrades with increasing device size.

The 20K architecture fixes many of the routing problems of the earlier families cited above. Another hierarchical layer is added between the row route and the LAB, which has the effect of localizing connections that previously had to go on the row tracks. Since those connections don't have to cross the chip, they are faster. To fix the arithmetic connections, direct connections have been added from each LAB to the LEs in the adjacent LABs in the so-called MegaLAB.

The Xilinx routing structure is a mix of different-length wires connected by switches. For the more local connections, very fast single-length connections are used. Longer connections use the longer wires to minimize the number of switch nodes traversed. The routing delays have a strong dependence on the connection distance, so placement is critical to performance. This can make performance elusive to the novice user, but on the other hand, the segmented routing means extreme performance is available if you are willing to do some work to get it.

Bottom line: the Altera routing is more forgiving for moderate designs at moderate densities, which makes it easier for users and tools alike. However, the same things that make it easier for those designs are roadblocks for higher performance.

Tools

Both vendors now ship FPGA Express for compiling/synthesising.
Altera also offers Leonardo Spectrum, which in my opinion is vastly better than the Synopsys tool. Synplify would still be my synthesiser of choice, but that isn't likely to be free any time soon!

Altera-specific versions of FPGA Express and Leonardo Spectrum are offered FREE on the Altera web site. You do not need a subscription to get them. However, if you do get a subscription, you also get ModelTech's ModelSim program.

The place and route (Xilinx) and fitter (Altera) tools both accomplish the same job. At the time of my investigations (1999), the design I was benchmarking would take several hours to place and route for Xilinx, rather than several minutes with the Altera tools. This is mainly due to the difficulties the Xilinx architecture poses for the tools. Note that no effort was made to guide the tools, other than providing timing constraints, as the environment I work in places a high priority on speed of turnaround. I'm told (by Xilinx) that things are much improved with the new tools, but I haven't been able to compare. It's quite possible that I could have done the job in a smaller/cheaper Xilinx part, but our production volumes were extremely small, so the time taken to create/debug the design on the bench was a priority.

Other bits

Xilinx have DLLs, Altera have PLLs. Altera claim PLLs are better because they give you proper 'analogue' control over the timing of your clocks. Xilinx claim DLLs are better because they are not analogue and are therefore easier to deal with. Xilinx have an interesting appnote comparing the two, but they have subtracted the jitter of their source clock from the Xilinx numbers and not from the Altera measurements. They didn't measure the jitter of the Altera input, so it's difficult to judge whether the PLLs are the cause of the jitter they measure or not. In the interests of fairness, you can look at Altera's jitter comparison - however, it seems to have a lot less experimental detail to it.
I feel I could reproduce the Xilinx experiment to verify the results if I wanted to!

One significant difference between the PLLs and the DLLs that you missed is the ability of the PLLs to create non-integer multiples of the input clock. In fact, the Altera PLL can multiply the input clock by m/(n*k), where m is any value from 1 to 160 and (n*k) is also any value from 1 to 160. Check out App Note 115 for details on the PLLs.

Summary

Xilinx:
- Potentially smaller and cheaper devices
- Good at arithmetic functions
- Flexible I/Os
- Longer compile times
- More complex tools
- More capable tools for the power user
- Both small and large blocks of embedded RAM
- Proper dual-port RAM

Altera:
- Quick compile
- Simple tools
- Less flexible tools for the power user
- Flex and Apex make it tricky to make fast bidirectional I/O
- Less capable arithmetic
- No small blocks of embedded RAM
- RAM has one read and one write port, not proper dual-ported

The conclusion about compile times does not hold for all designs. The compile time for dense arithmetic designs in Altera can literally take days, where a similar design in Xilinx can finish in under an hour with decent floorplanning. Floorplanning in Altera is not well supported and frankly won't provide as much as it does with Xilinx.

Because of Altera's row/column architecture, Altera has been able to design in redundant rows and columns. If a fab defect is found, a redundant row can be switched in and the die is saved rather than thrown away. Since the biggest cost-driver is die size and yield, I would have to dispute the "potentially... cheaper" devices claim. As far as smaller goes, I would have to agree that Xilinx has a wider product offering at the small end of the FPGA size spectrum.
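On the PLL m/(n*k) point above: because both m and the combined divider can range over 1..160, the PLL can land very close to arbitrary non-integer ratios. A small search sketch (plain Python; the function and the example target frequency are my own illustration, ranges as stated above):

```python
from fractions import Fraction

def best_pll_ratio(target: Fraction, max_m: int = 160, max_nk: int = 160):
    """Find m/(n*k) closest to target with m and n*k each in 1..160,
    per the APEX PLL description above. Returns (ratio, m, n*k)."""
    best = None
    for nk in range(1, max_nk + 1):
        # round m to the nearest legal integer for this divider
        m = min(max_m, max(1, round(target * nk)))
        ratio = Fraction(m, nk)
        if best is None or abs(ratio - target) < abs(best[0] - target):
            best = (ratio, m, nk)
    return best

# e.g. synthesize 77.76 MHz from a 33 MHz input: target ratio 77.76/33
ratio, m, nk = best_pll_ratio(Fraction(7776, 3300))
print(m, nk, float(ratio))
```

A DLL, by contrast, is limited to the fixed multiply/divide factors built into its delay-line architecture, which is the crux of the flexibility argument here.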
(The reality of whether one vendor's parts are cheaper than the other is independent of whether the device includes redundancy logic. The efficiency of the architecture (gates per some metric of silicon usage, such as area or transistors), the implementation geometry, test costs, volume, package type, and many other factors all affect the manufacturing cost. The user pays a "price", not a "cost", and this price depends on the cost, as well as the supplier's profit margin, and how good you are at negotiating lower prices :-) . While redundancy may help reduce the cost, what matters in the end to the end user is the price they pay for a device that meets their needs.)

Quite right, Philip. And there's more than the piece price to think about. If the tools/architecture/whatever allow you to get to market quicker, or your volumes are so low that the development costs outweigh the FPGA price (as they do in my particular application), different things become more important. Regarding "potentially... cheaper", maybe it would be better to say "in some applications, potentially cheaper". And therefore the same should apply to Altera!

I'm also going to have to raise an issue with "More capable tools for the power user". Just because Altera's tools have a nicer GUI doesn't mean that the tools are not for the power user. Quartus II has a built-in Tcl console for creating scripts that can do everything that you can do in the Xilinx tools.

Well... no! Show me where in their tools you can look at and edit individual wires in the device. You can do that in Xilinx's FPGA Editor. How about specifying placement in your source (the EDIF netlist)? It sure would be nice to be able to constrain the two-level arithmetic logic and the registers driving it to lie in the same row. Cliques gave the tools a *HINT* that you wanted to keep stuff together in the MAX+PLUS tools, but only if there was a small number of them. Last I checked, Quartus still could not use cliques.
If you don't like to use the menus, ask your local Altera FAE and he can provide you with a library of Tcl functions (ask for the PowerKit) that will allow you to create constraints like "Real Men" do, rather than use the GUI.

This is probably my fault - I was referring to MAX+PLUS II, which I have consistently failed to get to do what I want with placing certain logic cells - due to the Quartus fitter ignoring all my assignments - and the older fitter not being able to get close to my timing requirements. Approaches to our local FAE, Altera direct, and the c.a.f newsgroup all hit a brick wall. My cursory inspection of Quartus a while ago did lead me to the idea that it was much more capable in this area, but as I've not gone beyond 10K, I have no 'real world' comments to make. I do use emacs to enter my constraints in the .acf file, though :-)

I can only encourage you to check out the literature and talk to the FAEs from both Altera and Xilinx to get a more balanced view of the strengths and weaknesses of the two architectures.

I have read the literature and spoken to FAEs from both companies. I think much of our misunderstanding probably stems from the fact that I initially wrote this piece based on 10K compared with 4000, with comments thrown in about other architectures just to confuse the issue! Sorry about that!

Amazing as it may seem, other people have asked to contribute to this page, and editing each person's input (I am expecting more) is getting to be a bit much, so for your enjoyment, here are comments from others on the topic. Good luck with selecting a vendor of FPGAs :-)

Anonymous Designer:

1. TOOLS

Altera supports AHDL, which is more powerful than ABEL, but much easier to learn than VHDL/Verilog. The MAX+PLUS II tool allows you to target anything between a 7032 and a 10K200, almost seamlessly. When we were just getting started, this was a big advantage.

2.
SUPPORT Altera data sheets have to be read most carefully to check that the device has the features you want, in the package you want. E.g. only some 10K series have PLLs. When Xilinx says a family has DLLs, then the whole family has them. The summary data sheet for Altera's APEX 20KE series states that LVDS is supported, but does not mention that it is only really supported in the 20K400E and larger devices. There is no mention on the summary front page; you have to look really hard in the datasheet to find this. 3. Altera appears to be targeting the router and other network hardware market at the moment. Xilinx seems to be going towards DSP. 4. Some people are of the opinion that Xilinx appears to be far more innovative and open: there is code to make your FPGA into a DAC with one resistor and one capacitor! The Altera app notes amount to "Yes, we did it" but do not give sufficient detail for ME to do it. Xilinx app notes are far more helpful, and they will respond to postings in comp.arch.fpga. Altera NEVER do. An update from Anonymous Altera Fan: The product-term mode of the Altera memory blocks is something other than just using the RAMs as a big LUT. A single memory block can be configured to provide 16 product-term outputs based on 32 inputs. Although this can be duplicated using a generic RAM block as a big LUT, it would take an extremely large memory block (32 address lines = 2^32 locations of 16-bit memory cells) to do it in the brute-force manner. Note that this is only a feature in Apex, Apex-E, Mercury, and Apex II devices.
Article: 37631
Hi! > I'm curious to know if anyone out there knows where there are some examples > of an SPI interface coded in VHDL. Which type of interface? There is technical information available for the commercial Xilinx and Altera SPI cores, but of course they don't go into detail. > Just curious as I have to code one in the > near future and I always like to compare the various approaches taken by > others. Well, I have to code an SPI-4 (according to the standard, phase 2) interface. Maybe we can exchange some information. regards, Patrick
Article: 37632
hi: I am an FPGA beginner, and now I have a small design. Can you advise me how to implement it? There are 8 data words in a FIFO (16x255). They must be distinguished and divided when they are read out from the FIFO according to the clock, so that I can operate on any one of them later. For example: the first data word is Data0, the second is Data1, ..., and the eighth is Data7. At the beginning I wanted to implement it with a shift register, state machine, or counter, but I can't finish it alone because of my poor digital circuit skills. It would be better if you could write out Verilog source code for me. thanks!
Article: 37633
Hi, Does anyone know what the tradeoff is, or how we should decide whether to let Core Generator create an RPM or not? Also, what are the advantages/disadvantages of using pipelining when creating a DDS in CoreGen? thanks enny
Article: 37634
Now I want to implement it by counter control. Is it OK?

/* counter[2:0] runs while read enable is asserted; data are steered by the counter */
always @(posedge NA_Clock or negedge Rst)
begin
  if (!Rst)                    // active-low async reset, to match "negedge Rst"
    NA_Count <= 0;
  else if (NA_Read_Enable)
    NA_Count <= NA_Count + 1;
  else
    NA_Count <= 0;
end

/* data read out from the FIFO are distributed to NA_Des_Data0..7 individually
   NA_Data_Out[15:0]: FIFO data out
   NA_Des_Data0..NA_Des_Data7, each [15:0]: destination registers */
always @(posedge NA_Clock or negedge Rst)
begin
  if (!Rst)
    begin
      NA_Des_Data0 <= 16'b0;
      NA_Des_Data1 <= 16'b0;
      NA_Des_Data2 <= 16'b0;
      NA_Des_Data3 <= 16'b0;
      NA_Des_Data4 <= 16'b0;
      NA_Des_Data5 <= 16'b0;
      NA_Des_Data6 <= 16'b0;
      NA_Des_Data7 <= 16'b0;
    end
  else
    case (NA_Count)
      3'b000: NA_Des_Data0 <= NA_Data_Out;
      3'b001: NA_Des_Data1 <= NA_Data_Out;
      3'b010: NA_Des_Data2 <= NA_Data_Out;
      3'b011: NA_Des_Data3 <= NA_Data_Out;
      3'b100: NA_Des_Data4 <= NA_Data_Out;
      3'b101: NA_Des_Data5 <= NA_Data_Out;
      3'b110: NA_Des_Data6 <= NA_Data_Out;
      3'b111: NA_Des_Data7 <= NA_Data_Out;
      default:
        begin
          NA_Des_Data0 <= 16'b0;
          NA_Des_Data1 <= 16'b0;
          NA_Des_Data2 <= 16'b0;
          NA_Des_Data3 <= 16'b0;
          NA_Des_Data4 <= 16'b0;
          NA_Des_Data5 <= 16'b0;
          NA_Des_Data6 <= 16'b0;
          NA_Des_Data7 <= 16'b0;
        end
    endcase
end

is it OK? Thanks
Article: 37635
Rick, I've been using ChipScope for the past year. I find it very useful for system debug (it replaces the need for probing internal nodes). A nice feature I use a lot is the "Trigger in"/"Trigger out" pair that lets you sync to/from external equipment (LA / scope) with the ChipScope trigger while modifying the trigger settings through the JTAG port. The GUI is quite primitive compared to a commercial LA. Note that you need to have unused block RAMs in your chip in order to be able to use ChipScope. The amount of RAM required depends on the size of the capture buffer you need. Rotem Gazit Design Engineer High-speed board & FPGA design MystiCom LTD mailto:rotemg@mysticom.com http://www.mysticom.com/ Rick Filipkiewicz <rick@algor.co.uk> wrote in message news:<3C1E2EAB.2CA43896@algor.co.uk>... > As usual I'm in the position of trying to shut the stable door when the > horse is already 2 counties away and accelerating fast but ... > > Has anyone on c.a.f used the ChipScope ILA stuff ? > > Does it work as advertised ? > > Had successes/failures ? > > Does it take up a lot of space per embedded analyser ? > > In short where does it lie in the spectrum > > [essential ... helpful ... difficult to use ... waste of time & gates] ?
Article: 37636
The Altera NIOS soft-core processor comes with a flexible, parameterizable SPI interface module in VHDL or Verilog. The complete NIOS license with all tools, board, and of course SPI is US-$995. Check out: http://www.altera.com/literature/ds/ds_niosspi.pdf - Wolfgang "Jason Berringer" <jberringer@trace-logic.com> wrote in message news:S5bT7.2519$NC5.476993@news20.bellglobal.com... > Hello again > > I'm curious to know if anyone out there knows where there are some examples > of an SPI interface coded in VHDL. Just curious as I have to code one in the > near future and I always like to compare the various approaches taken by > others. > > Thanks > > Jason
Article: 37638
Muzaffer Kal wrote: > > On Mon, 17 Dec 2001 10:31:33 -0500, "Pallek, Andrew [CAR:CN34:EXCH]" > <apallek@americasm01.nt.com> wrote: > > >If you just want to divide by 64, shift right by 6 places. The modulo is what was shifted > >out. > > what if the dividend is negative ? The shift_right() function in ieee.numeric_bit operates on signed numbers by maintaining the sign bit. For an unsigned number, zeros are shifted in.
Article: 37639
Hi all. We're using Xilinx Virtex FPGAs and Xilinx Foundation 3.1i tools. We're currently trying to generate an FPGA configuration where certain parts of the FPGA remain completely unused. While it is possible to place all the logic in certain areas of the chip using placement constraints, it seems more difficult to influence the routing. Is it possible to (completely) prohibit the use of routing resources in a specific area of the FPGA? Regards, Christian
Article: 37640
Hi all, We've got a problem with FPGA express (FPGAexpress 3.6.6613 (attached bij Xilinx ISE 4.1)) and bidir pins with a Xilinx device: I've made two blocks and each block has control signals and one bidirectional pin (tri-state buffered). On the upper layer, this two signals are routed to the same output pin. (See attachments) The problem is a warning from FPGA express: "FPGA-pmap-18 (1 Occurrence) Warning: The port type of port '/TryOutBiDir-1/BiDirPin' is unknown. An output pad will be inserted" and FPGA express insert a Outputbuffer instead of a bidir buffer. Internal the signal is bidirectional, to the outside it's unidirectional. I want a bidirectional output pin !! Can somebody help me?? Thanks, Wilco wilco@cardiocontrol.com begin 666 upper.vhd M#0IL:6)R87)Y($E%144[#0IU<V4@245%12YS=&1?;&]G:6-?,3$V-"YA;&P[ M#0H-"F5N=&ET>2!4<GE/=71":41I<B!I<PT*("!P;W)T#0H@("@-"B @("!7 M<FET93)296%D960Q.B!I;B!35$1?3$]'24,[#0H@(" @5W)I=&4R4F5A9&5D M,CH@:6X@4U1$7TQ/1TE#.PT*(" @($AI9VA:,3H@:6X@4U1$7TQ/1TE#.PT* M(" @($AI9VA:,CH@:6X@4U1$7TQ/1TE#.PT*(" @(%)E861E9#$Z(&]U="!3 M5$1?3$]'24,[#0H@(" @4F5A9&5D,CH@;W5T(%-41%],3T=)0SL-"B @("!" 
M:41I<E!I;CH@:6YO=70@4U1$7TQ/1TE##0H@("D[#0IE;F0@5')Y3W5T0FE$ M:7([#0H-"F%R8VAI=&5C='5R92!4<GE/=71":41I<E]A<F-H(&]F(%1R>4]U M=$)I1&ER(&ES#0H@( T*("!C;VUP;VYE;G0@9')I=F5R#0H@(" @<&]R= T* M(" @("@-"B @(" @(%=R:71E,E)E861E9#H@:6X@(" @4U1$7TQ/1TE#.PT* M(" @(" @:&EG:%HZ(" @(" @("!I;B @("!35$1?3$]'24,[#0H@(" @("!2 M96%D960Z(" @(" @(&]U=" @(%-41%],3T=)0SL-"B @(" @($11.B @(" @ M(" @(" @:6YO=70@4U1$7TQ/1TE##0H@(" @*3L-"B @96YD(&-O;7!O;F5N M=#L-"B @#0H@(&)E9VEN#0H-"B @1')I=F5R7S$@(#H@1')I=F5R("!P;W)T M(&UA<" H5W)I=&4R4F5A9&5D,2Q(:6=H6C$L4F5A9&5D,2Q":41I<E!I;BD[ M#0H@($1R:79E<E\R(" Z($1R:79E<B @<&]R="!M87 @*%=R:71E,E)E861E M9#(L2&EG:%HR+%)E861E9#(L0FE$:7)0:6XI.PT*(" -"B @#0IE;F0@5')Y 03W5T0FE$:7)?87)C:#L-"@`` ` end begin 666 driver.vhd M;&EB<F%R>2!)145%.PT*=7-E($E%144N<W1D7VQO9VEC7S$Q-C0N86QL.PT* M#0IE;G1I='D@9')I=F5R(&ES#0H@('!O<G0-"B @* T*(" @(%=R:71E,E)E M861E9#H@:6X@(" @4U1$7TQ/1TE#.PT*(" @(&AI9VA:.B @(" @(" @:6X@ M(" @4U1$7TQ/1TE#.PT*(" @(%)E861E9#H@(" @(" @;W5T(" @4U1$7TQ/ M1TE#.PT*(" @($11.B @(" @(" @(" @:6YO=70@4U1$7TQ/1TE##0H@("D[ M#0H@(&5N9"!D<FEV97([#0H-"F%R8VAI=&5C='5R92!D<FEV97)?87)C:"!O M9B!D<FEV97(@:7,-"@T*("!B96=I;@T*#0H@(%)E861E9" \/2!$42!W:&5N M(%=R:71E,E)E861E9" ]("<Q)R!E;'-E("<P)SL-"B @1%$@/#T@)UHG('=H M96X@2&EG:%H@/2 G,2<@96QS92 G,2<[#0H@( T*96YD(&1R:79E<E]A<F-H #.PT* ` endArticle: 37641
When starting IDS 7.5 in Windows XP Home Edition, the following error description appears: vw25.exe has detected an error and has to be closed. (The message is in German because of the German installation.) AppName: vw25.exe AppVer: 0.0.0.0 ModName: ntdll.dll ModVer: 5.1.26.00.0 Offset: 0000222c Does anybody know a solution for this problem? THX a lot Elmar
Article: 37642
Hey there Rob, try

macro proc RandomGen(Random1, Random2)
{
    /* initialise random numbers */
    Random1 = 0x1234;
    Random2 = 0xabcd;

    /* random process */
    while (1)
    {
        par
        {
            Random1 = (Random1 <- 22) @ (Random1[19] ^ Random1[13] ^ ~Random1[22]);
            Random2 = (Random2 <- 22) @ (Random2[19] ^ Random2[13] ^ ~Random2[22]);
        }
    }
}

Noel robquigley@hotmail.com (rob) wrote in message news:<c48eed90.0112140543.7aa78fbc@posting.google.com>... > Hi folks, > > I was wondering if anyone knew if there is a random number generator > facility in Handel-C. I'm using version 3.0. > > Any help would be much appreciated, > > Cheers and THX, > > Rob.
Article: 37643
Hello All, My company is currently comparing 66MHz PCI core solutions from Xilinx and Altera, as well as debating using a home-spun core. One issue I've come upon is the PCI requirement for a MAX clock-to-out time of 6 ns and a MIN clock-to-out time of 2 ns. Both the Xilinx ISE and Altera Quartus II tools seem very helpful in supplying MAX (worst-case) Tco times, but I don't see any info on best-case times. Apparently the SDF files for back-annotated timing sim have the same worst-case numbers repeated 3 times, resulting in the same simulation regardless of case selection. My question is: how is anyone (FPGA vendors included) guaranteeing a MIN Tco of 2 ns across all conditions and parts if the design tools don't even yield that information? Thank You, Stephen Byrne
Article: 37644
"Jay Berg" <admin@eCompute.org> wrote in message news:3c1cfff8$0$34821$9a6e19ea@news.newshosting.com... > After making the mistake of getting involved in the current ECCp109 > distributed computing project (see URL below), I'm now casting around to > determine if there's a possibility of finding a PCI board with an FPGA > co-processor capable of handling a small set of modular math functions. > > http://www.nd.edu/~cmonico/eccp109/ > *************** I want to thank everyone (especially Larry) for helping refine an initial design for the problem I posed a couple of days ago regarding modulo math. I have a synopsis of the discussions below. - - - - - One of the biggest design points that fell out of the discussions is that I was looking at the problem with too fine a granularity. Rather than looking at simply providing a modulo multiply, it was strongly suggested that I look at replacing larger sections of logic with FPGA logic. By re-examining the client, it becomes obvious that it is possible to extract substantial math from the client within the very most inner loop. There are three paths in the inner most loop. These are examined below. Each of the first two paths require 5 inputs with two of the inputs being constants with the third path requireing 4 inputs with one being a constant. Thus each path requires in actuality a total of 3 (variable) inputs and each path producing a single result. I've been told that it would be "quite easy" to reduce all three paths to FPGA logic. Where the CPU provides the initial inputs and then selects the logic path to execute. I am now trying to decide whether I can learn enough about FPGAs to do the work myself, or whether I can find someone willing to donate the time in return for a couple of FPGA PCI boards. 
- - - - -

Path 1 - Total of input parameters needed: 5
  PY (constant value)
  PX (constant value)
  y[i]
  x[i]
  needInvert[i<<2]

  submod_p109 (lambda, PY, y[i]);
  mulmod_p109 (lambda, lambda, &needInvert[i << 2]);
  addmod_p109 (temp_ul, x[i], PX);
  mulmod_p109 (temp2_ul, lambda, lambda);
  submod_p109 (tempx, temp2_ul, temp_ul);
  submod_p109 (temp_ul, x[i], tempx);
  mulmod_p109 (temp_ul, lambda, temp_ul);
  submod_p109 (res_list[i].y, temp_ul, y[i]);

Path 2 - Total of input parameters needed: 5
  QY (constant value)
  QX (constant value)
  y[i]
  x[i]
  needInvert[i<<2]

  submod_p109 (lambda, QY, y[i]);
  mulmod_p109 (lambda, lambda, &needInvert[i << 2]);
  addmod_p109 (temp_ul, x[i], QX);
  mulmod_p109 (temp2_ul, lambda, lambda);
  submod_p109 (tempx, temp2_ul, temp_ul);
  submod_p109 (temp_ul, x[i], tempx);
  mulmod_p109 (temp_ul, lambda, temp_ul);
  submod_p109 (res_list[i].y, temp_ul, y[i]);

Path 3 - Total of input parameters needed: 4
  A (constant value)
  y[i]
  x[i]
  needInvert[i<<2]

  mulmod_p109 (temp_ul, x[i], x[i]);
  addmod_p109 (temp2_ul, temp_ul, temp_ul);
  addmod_p109 (temp2_ul, temp2_ul, temp_ul);
  addmod_p109 (lambda, temp2_ul, A);
  mulmod_p109 (lambda, lambda, &needInvert[i << 2]);
  mulmod_p109 (temp_ul, lambda, lambda);
  submod_p109 (temp_ul, temp_ul, x[i]);
  submod_p109 (tempx, temp_ul, x[i]);
  submod_p109 (temp_ul, x[i], tempx);
  mulmod_p109 (temp_ul, lambda, temp_ul);
  submod_p109 (res_list[i].y, temp_ul, y[i]);

- - - - -

A few side notes:

- - - - -

1. The following values are constants.
   a. PX = 000004CC974EBBCBFDC3636FEB9F11C7
   b. PY = 000007611B0EB1229C0BFC5F35521692
   c. QX = 00000233857E4E8B5F0055126E7D7B7C
   d. QY = 000019C8C91063EB4276371D68B6B4D9
   e. A = 00000FD4C926FD178E9805E663021744
   f. P = 00001BD579792B380B5B521E6D9FB599
   Note that P is the modulo value that all functions use in reducing results.
2. All math is 109-bit.
3. All routines reduce the result modulo P prior to storing the result.
4. All functions are in the form of (result, op1, op2).
Where the result of the operation is stored to 'result'. 5. Each of the three paths results in a single value. 6. The math functions each require between 25 (add and subtract) and 325 (multiply) CPU instructions. Using that estimate of function lengths, the three paths are approximately 1,000 CPU instructions each. 7. The SW seems to lend itself well to parallelism. Currently it appears that the SW is set up to provide calculations in groups of 128 at a time. I've been told that this would aid in pipelining within the FPGA. I am still waiting for confirmation from the SW author as to the exact behavior of the SW.
Article: 37645
You could simulate it, and find out for yourself if it is OK. OK? "chensw20hotmail.com" wrote: > > Now, i want to implement it by counter controlling. is it OK? > > [Verilog code from the previous article snipped] > > is it OK? > Thanks
Article: 37646
Patrick Loschmidt <Patrick.Loschmidt@gmx.net> wrote in message news:<3C1F002D.4060502@gmx.net>... > Hi! > > > I'm curious to know if anyone out there knows where there are some examples > > of an SPI interface coded in VHDL. > > > Which type of interface? There is technical information available for > the commercial Xilinx and Altera SPI cores, but of course they don't go > into detail. Modelware (www.modelware.com) also makes an SPI-4 core for Xilinx. Xilinx re-sells their core. You get much better support if you buy it directly from Modelware.
Article: 37647
"Wilco Vahrmeijer" <wilco@cardiocontrol.com> schrieb im Newsbeitrag news:9vnl4h$1j1i$1@news.versatel.net... > Hi all, > > We've got a problem with FPGA express (FPGAexpress 3.6.6613 (attached bij > Xilinx ISE 4.1)) and bidir pins with a Xilinx device: > > I've made two blocks and each block has control signals and one > bidirectional pin (tri-state buffered). On the upper layer, this two signals > are routed to the same output pin. (See attachments) > > The problem is a warning from FPGA express: > "FPGA-pmap-18 (1 Occurrence) Warning: The port type of port > '/TryOutBiDir-1/BiDirPin' is unknown. An output pad will be inserted" > > and FPGA express insert a Outputbuffer instead of a bidir buffer. Internal > the signal is bidirectional, to the outside it's unidirectional. > > I want a bidirectional output pin !! Can somebody help me?? To have a bidirectional bus inside AND outside the FPGA you have to isolate them. entity tristate is port ( BiDirPin: inout STD_LOGIC ); end TryOutBiDir; architecture TryOutBiDir_arch of TryOutBiDir is component driver port ( Write2Readed: in STD_LOGIC; highZ: in STD_LOGIC; Readed: out STD_LOGIC; DQ: inout STD_LOGIC ); end component; begin Driver_1 : Driver port map (Write2Readed1,HighZ1,Readed1,BiDirPin_int); Driver_2 : Driver port map (Write2Readed2,HighZ2,Readed2,BiDirPin_int); BiDirPin<=BidirPin_int when con='1' else 'Z'; end TryOutBiDir_arch; This code ist not complete, the signal declarations are missing. You also need to generate the con signal, which controls the Tristate driver of the IO Pin. -- MfG FalkArticle: 37648
"Christian Plessl" <plessl@remove.tik.ee.ethz.ch> schrieb im Newsbeitrag news:3c1f4e5f@pfaff.ethz.ch... > Is possible to (completely) prohibit the use of routing ressources on a > specific area of the FPGA? Why do you want to do so? -- MfG FalkArticle: 37649
Hi, I had 2 projects, entities A and B. They both fit into separate XC9572s. Then I wanted to know how much resource headroom I would have if I combined them into one XC95144XL, so I created another project, and in its main architecture I used entities A and B, connected them with a few signals, and copied the sources into this new project. I specified the XC95144XL chip and looked at the fitter report. So far so good. But my prototype board is built using XC9572s, so I changed the target chip to XC9572, selected entity A in the project window, and created a bit file. I got no visible warnings that I shouldn't do this. And when I programmed the XC9572 with the file that was supposed to be made for the XC9572, the chip let the magic smoke out. Tried twice on both CPLDs, got 4 fried chips. When I created two separate projects again with entities A and B in them, the new chips programmed fine. Any ideas?