Hi there,

My design has been in place-and-route for a long time and is still not routed... Here are the resource consumption figures and the P&R log. How can I make it run faster? It is a simple single-clock design with 118 multipliers.

Selected Device : 2v6000bf957-6

Number of Slices:            19984 out of 33792   59%
Number of Slice Flip Flops:  11278 out of 67584   16%
Number of 4 input LUTs:      31877 out of 67584   47%
Number of bonded IOBs:          93 out of   684   13%
Number of BRAMs:                27 out of   144   18%
Number of MULT18X18s:          122 out of   144   84%
Number of GCLKs:                 1 out of    16    6%

Phase 1: 141110 unrouted; REAL time: 21 mins 14 secs
Phase 2: 117379 unrouted; REAL time: 28 mins 10 secs
...
Intermediate status: 265 unrouted; REAL time: 3 days 8 hrs 16 mins 53 secs
Intermediate status: 265 unrouted; REAL time: 3 days 8 hrs 47 mins 21 secs
Intermediate status: 260 unrouted; REAL time: 3 days 9 hrs 18 mins 1 secs
Intermediate status: 259 unrouted; REAL time: 3 days 9 hrs 48 mins 37 secs
Intermediate status: 257 unrouted; REAL time: 3 days 10 hrs 19 mins 6 secs
Intermediate status: 246 unrouted; REAL time: 3 days 10 hrs 50 mins 16 secs
Intermediate status: 254 unrouted; REAL time: 3 days 11 hrs 20 mins 55 secs
Intermediate status: 242 unrouted; REAL time: 3 days 11 hrs 51 mins 34 secs
Intermediate status: 244 unrouted; REAL time: 3 days 12 hrs 22 mins 6 secs
Intermediate status: 255 unrouted; REAL time: 3 days 12 hrs 52 mins 50 secs
Intermediate status: 250 unrouted; REAL time: 3 days 13 hrs 23 mins 34 secs
Intermediate status: 248 unrouted; REAL time: 3 days 13 hrs 54 mins 39 secs
Intermediate status: 239 unrouted; REAL time: 3 days 14 hrs 25 mins 21 secs
Intermediate status: 239 unrouted; REAL time: 3 days 14 hrs 55 mins 51 secs
Intermediate status: 233 unrouted; REAL time: 3 days 15 hrs 26 mins 22 secs
Intermediate status: 238 unrouted; REAL time: 3 days 15 hrs 56 mins 50 secs
Intermediate status: 228 unrouted; REAL time: 3 days 16 hrs 27 mins 19 secs
Intermediate status: 236 unrouted; REAL time: 3 days 16 hrs 57 mins 46 secs
Intermediate status: 237 unrouted; REAL time: 3 days 17 hrs 28 mins 14 secs

Article: 67001
Hello,

I'm using the Nios Development Kit (general purpose, APEX). Is it outdated? It seems hard to find references; everything I find talks about Cyclone and Stratix. I could not even find the software development tutorial for APEX. I'm new to this tool suite. Does anyone have good suggestions for shortening the time to get hands-on? Thanks!

Chi

Article: 67002
Are you sure you aren't using 32/36-bit-wide BRAMs with co-located multipliers? A 32- or 36-bit-wide BRAM shares data lines with one of the multiplier multiplicand inputs. It could be that some of your BRAMs are 32/36 bit wide and there are not enough locations to put them all in places where the adjacent multiplier is not used. If the quantity of each indicates a fit is possible, then you may have to resort to floorplanning the multipliers and BRAMs, as the placer doesn't seem to do so well with either.

Kelvin wrote:

> Hi there,
>
> My design has been in place-and-route for a long time and is still not routed...
>
> Here are the resource consumption figures and the P&R log. How can I make it run faster?
> It is a simple single-clock design with 118 multipliers.
>
> Selected Device : 2v6000bf957-6
>
> Number of Slices:            19984 out of 33792   59%
> Number of Slice Flip Flops:  11278 out of 67584   16%
> Number of 4 input LUTs:      31877 out of 67584   47%
> Number of bonded IOBs:          93 out of   684   13%
> Number of BRAMs:                27 out of   144   18%
> Number of MULT18X18s:          122 out of   144   84%
> Number of GCLKs:                 1 out of    16    6%
>
> Phase 1: 141110 unrouted; REAL time: 21 mins 14 secs
> Phase 2: 117379 unrouted; REAL time: 28 mins 10 secs
> <snip>

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930  Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759
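If it does come to floorplanning, LOC constraints are the usual mechanism. Below is a minimal, hypothetical sketch: the module, instance names, site coordinates and data widths are all made up, and these constraints would normally live in the UCF (e.g. INST "u_bram" LOC = "RAMB16_X0Y7";), but XST meta-comments of the kind used later in this thread also carry them. The intent is simply to pin the 36-bit-wide BRAMs to sites whose neighbouring multiplier the design leaves unused, so the two never fight over the shared data lines described above. MULT18X18 and RAMB16_S36 are the Xilinx library primitives.

module mac_slice (
    input  wire        clk,
    input  wire        we,
    input  wire [8:0]  addr,
    input  wire [17:0] a,
    input  wire [17:0] b,
    output wire [35:0] p,
    output wire [31:0] dout
);
    // Lock the multiplier to a hand-picked site...
    // synthesis attribute LOC of u_mult is "MULT18X18_X0Y2"
    MULT18X18 u_mult (
        .A(a),
        .B(b),
        .P(p)
    );

    // ...and lock the 36-bit-wide BRAM to a site whose neighbouring
    // multiplier is not used by the design.
    // synthesis attribute LOC of u_bram is "RAMB16_X0Y7"
    RAMB16_S36 u_bram (
        .CLK (clk),
        .EN  (1'b1),
        .SSR (1'b0),
        .WE  (we),
        .ADDR(addr),
        .DI  (p[31:0]),
        .DIP (4'h0),
        .DO  (dout),
        .DOP ()
    );
endmodule

With a hundred-plus multipliers the same pattern would be repeated (or scripted) per instance; the point is only to take the wide-BRAM/multiplier pairing decision away from the placer.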
Article: 67003

Hi Jim,

> I think the OP was referring to the wider datapaths.
> I don't know the cycle-level details of the AMD or Intel 64 bit,
> but an obvious and simple speed gain can come from a wider HW fetch
> (even running < 64 bit opcodes) and then a simple check if the next
> opcode / next data value is in that block.

Yes, wider memory interfaces/cache data lines can help, but as you say, this is independent of op-code size. If I recall correctly, AMD and Intel processors already fetch 64-bit blocks, but this may have been increased. The latest m/b chipsets for both families of processors use dual-channel DDR (128 bits wide), so I would not be surprised if they've increased the size of fetches.

As vendors introduce 64-bit capable processors (such as the Opteron), they often also enhance various aspects of the CPU architecture in ways that help both 32- and 64-bit code. And while the 64-bitness of x86-64 may not matter much for speed, the doubling of the register file etc. could result in faster performance.

It's every computer engineer's dream to be a processor architect, isn't it? :-)

Regards,

- Paul

Article: 67004
Here are some things to check.

(a) Check you are not running out of real memory. If you are paging to disk then run times will get very large. You can check the memory usage in "TASK MANAGER" if you are running NT4, WIN2K or XP.

(b) Check that you are not overconstrained or simply can never meet the timing. Check this by running "TIMING ANALYSER" at the map stage.

You may consider using incremental design. Have a look on the Xilinx website for info. You may get some pointers from our current TechTips http://www.enterpoint.co.uk/techitips.html but it is more aimed at incremental synthesis.

You may also wish to consider some floorplanning to help the tools on their way.

John Adair
Enterpoint Ltd.
http://www.enterpoint.co.uk

This message is the personal opinion of the sender and not necessarily that of Enterpoint Ltd. Readers should make their own evaluation of the facts. No responsibility for error or inaccuracy is accepted.

"Kelvin" <kelvin8157@hotmail.com> wrote in message news:4045c71c$1@news.starhub.net.sg...
> Hi there,
>
> My design has been in place-and-route for a long time and is still not routed...
>
> Here are the resource consumption figures and the P&R log. How can I make it run faster?
> It is a simple single-clock design with 118 multipliers.
>
> Selected Device : 2v6000bf957-6
>
> Number of Slices:            19984 out of 33792   59%
> Number of Slice Flip Flops:  11278 out of 67584   16%
> Number of 4 input LUTs:      31877 out of 67584   47%
> Number of bonded IOBs:          93 out of   684   13%
> Number of BRAMs:                27 out of   144   18%
> Number of MULT18X18s:          122 out of   144   84%
> Number of GCLKs:                 1 out of    16    6%
>
> Phase 1: 141110 unrouted; REAL time: 21 mins 14 secs
> Phase 2: 117379 unrouted; REAL time: 28 mins 10 secs
> <snip>

Article: 67005
On Tue, 2 Mar 2004 19:57:58 -0600, Kenneth Land wrote:

> Seems to be a common misconception that 64bits just increases the amount of
> addressable memory.

The only common misconception is that swapping in a 64-bit processor in a desktop PC will lead to a large performance increase. It doesn't. (Other than any gain from a higher clock speed, of course.) Like to make a guess as to the extra overhead in a 64-bit version of current OSs, btw?

> More importantly for most applications is that twice
> the data is moved or operated on per clock cycle.

Data is only data if it's meaningful. The use of 64-bit arithmetic variables is comparatively rare in most applications. Certain scientific and CAD packages do make heavy use of 64-bit floats, but I doubt that's the case here (and high-end processors tend to use 80-bit data paths around the FPU anyway). There's not a lot to be gained from accessing memory in 64-bit chunks if you're only interested in 32 of them (there is an effect on cache hits with vectors, but it's not measurably worthwhile in practice). There will be some effect on prefetch, but it depends on the state of the L1 and L2 caches and the instruction pipeline(s) themselves. Tests I've seen suggest an increase of memory bandwidth efficiency of only around 1-2% at best.

If you want a 64-bitter to really earn its corn, use it in something like a database server with 64GB of RAM and a multi-TB disk farm. Give the poor thing something *meaningful* to do with the extra 32 bits. You'd still need 64-bit software though.

--
Max

Article: 67006
On Tue, 2 Mar 2004 20:05:37 -0600, Kenneth Land wrote:

> On the disk speed issue I have one data point. I upgraded my 1GHz PIII-M
> laptop drive from a slow 4200 RPM to the fastest 7200 RPM available (for
> laptops) and my Nios system build went from about 16 min. to about 15 min.
> Not worth the pain and expense of swapping the drive.

Not in a low-spec machine like that, no. The options in a laptop are limited, and there's no way to increase the disk controller bandwidth. But the effect on a powerful workstation of installing a RAID with a high-bandwidth controller and drives such as U-320 SCSI can have a dramatic impact. As always though, it depends on the application.

> On memory, I upgraded the memory in my 3.2 GHz P4 from 512 to 1GB and there
> was no noticeable difference until I set the memory from 333MHz to 400MHz
> dual channel. Then my system build went from 5 min. to 4 min. - 20%.

That doesn't mean a lot. You only need to add more memory if you're running out of it ;o)

--
Max

Article: 67007
Max <mtj2@btopenworld.com> writes:

> If you want a 64-bitter to really earn its corn, use it in something
> like a database server with 64GB of RAM and a multi-TB disk farm. Give

Or running synthesis, place & route, static timing analysis etc. on an ASIC design requiring 6GB RAM.

Petter

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 67008
"Paul Leventis (at home)" wrote: > > Hi Jim, > > > I think the OP was refering to the wider datapaths. > > I don't know the cycle level details of the AMD or Intel 64 bit > > but an obvious and simple speed gain can come from a wider HW fetch. > > (even running < 64 bit opcodes ) and then a simple check if the next > > opcode / next data value is in that block. > > Yes, wider memory interfaces/cache data lines can help, but as you say, this > is independent of op-code size. If I recall correctly, AMD and Intel > processors already fetch 64-bit blocks, but this may have been increased. > The latest m/b chipsets for both families of processors use dual-channel DDR > (128-bits wide) and so I would not be surprised if they've increased the > size of fetches. > > As vendors introduce 64-bit capable processors (such as Opteron), they often > also enhance various aspects of the CPU architecture in ways that help both > 32- and 64-bit code. And while the 64-bitness of x86-64 may not matter much > for speed, the doubling of the register files etc. could result in faster > performance. > > It's every computer engineers dream to be a processor architect, isn't it? > :-) We can all speculate about the relative merits of processor enhancements, but these machines are very complex and the only real way to tell what helps is to try it. Since we are not all ancient Greeks philosophizing in our armchairs, it would be a good idea to pick a design and to run it on a few different workstations, hopefully including an AMD64. I have always been surprised that the FPGA vendors don't put some effort into evaluating platforms and releasing the results. I know this can be a bit of a can of worms, but every time I look at buying a new machine, the first question I research is how fast it will run the FPGA design software. Then I am often trying to speculate on my own since I don't have much info to go on. I seem to recall that there at least used to be some available info on how much memory was needed to optimize run time as a function of part size. But I haven't seen new info on that in quite a while. -- Rick "rickman" Collins rick.collins@XYarius.com Ignore the reply address. To email me use the above address with the XY removed. Arius - A Signal Processing Solutions Company Specializing in DSP and FPGA design URL http://www.arius.com 4 King Ave 301-682-7772 Voice Frederick, MD 21701-3110 301-682-7666 FAXArticle: 67009
"rickman" <spamgoeshere4@yahoo.com> wrote in message news:404603D6.AA20F818@yahoo.com... > We can all speculate about the relative merits of processor > enhancements, but these machines are very complex and the only real way > to tell what helps is to try it. Since we are not all ancient Greeks > philosophizing in our armchairs, it would be a good idea to pick a > design and to run it on a few different workstations, hopefully > including an AMD64. > > I have always been surprised that the FPGA vendors don't put some effort > into evaluating platforms and releasing the results. I had assumed that had happened already. Silly me. Perhaps we'll just buy an AMD machine and see what it does, but I thought somebody might have tried that already. Anybody know how solid the Quartus II 4.0 Linux port is? I can't get an answer out of Altera.Article: 67010
The introduction to the following paper by Li and Hauck might help with your high-level understanding of the configuration architecture:

http://www.ee.washington.edu/people/faculty/hauck/publications/VirtexCompressJ.pdf

There are 48 frames in a column, with the size of a frame dependent on the number of (CLB) rows in the device. I'm not aware of any documentation on the mapping between individual resources (e.g. a LUT's content) and the configuration bitstream. As suggested by the previous poster, delving into JBits is probably your best option.

Irwin.

Article: 67011
On Wed, 03 Mar 2004 04:47:19 GMT, Paul Leventis (at home) wrote:

> Provided the peak memory consumption of Quartus for the compilation in
> question is less than the amount of physical memory in the system,
> increasing the amount of memory will not help compile time. For non-trivial
> designs, a Quartus compile will be most heavily influenced by CPU speed, and
> then by memory sub-system speed -- disk speed will have little influence.

I suspected that might be the case, but I wasn't quite sure. I'm more used to programming-language tools that use library files extensively, where a fast disk system (or a big ramdisk) can give very worthwhile speed gains.

Is there any possibility of making Quartus multi-threaded? That strikes me as the most likely way to get a dramatic performance increase, though I know it's not always easy to achieve with heuristic apps.

> CAD tools process a lot of data. I don't know if a Xeon (bigger cache) is
> much faster than a normal P4 (smaller cache), but I wouldn't be surprised if
> this were the case for the same reason that a Xeon processor is supposedly
> better for server applications -- bigger cache helps applications whose data
> set doesn't fit into the cache.

While the extra cache is important in itself, much of the performance gain of the Xeon is also due to the greater degree of parallelism and deeper prefetch lookahead, thus making better use of memory bandwidth throughout.

--
Max

Article: 67012
Max <mtj2@btopenworld.com> writes:

> Is there any possibility of making Quartus multi-threaded? That
> strikes me as the most likely way to get a dramatic performance
> increase, though I know it's not always easy to achieve with heuristic
> apps.

I would like to see synthesis and place-and-route tools I could run on a cluster of cheap PCs. I would be happy with less-than-linear speedups, e.g. using a 16-node cluster to get an 8x speedup.

Petter

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 67013
On 03 Mar 2004 18:46:34 +0100, Petter Gustad wrote:

> I would like to see synthesis and place-and-route tools I could
> run on a cluster of cheap PCs. I would be happy with less-than-linear
> speedups, e.g. using a 16-node cluster to get an 8x speedup.

I doubt you'd get anywhere near. Trying to implement those algorithms efficiently on the sort of loosely-coupled architecture you propose would be nigh-on impossible. It's not easy on a single SMP box, but it's doable.

A quad Xeon (8 x CPU) box would cost less than four single decent-spec machines anyway.

--
Max

Article: 67014
Max wrote:
>
> On 03 Mar 2004 18:46:34 +0100, Petter Gustad wrote:
>
> > I would like to see synthesis and place-and-route tools I could
> > run on a cluster of cheap PCs. I would be happy with less-than-linear
> > speedups, e.g. using a 16-node cluster to get an 8x speedup.
>
> I doubt you'd get anywhere near. Trying to implement those algorithms
> efficiently on the sort of loosely-coupled architecture you propose
> would be nigh-on impossible. It's not easy on a single SMP box, but
> it's doable.
>
> A quad Xeon (8 x CPU) box would cost less than four single decent-spec
> machines anyway.

Not if the four machines are sitting around all night running screen savers.

--
Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design
URL http://www.arius.com
4 King Ave                        301-682-7772 Voice
Frederick, MD 21701-3110          301-682-7666 FAX

Article: 67015
rickman wrote:
>
> I have always been surprised that the FPGA vendors don't put some effort
> into evaluating platforms and releasing the results.

Would seem a very good idea.

On this topic, I see Intel released a new Xeon with 3GHz and 4MB (!) cache, and they claim 25% faster. Of course, you pay - $3692 (Qty column not given) :)

The PR claims this is the last release before Intel adds 64-bit extensions...

Article: 67016
Max <mtj2@btopenworld.com> writes:

> On 03 Mar 2004 18:46:34 +0100, Petter Gustad wrote:
>
> > I would like to see synthesis and place-and-route tools I could
> > run on a cluster of cheap PCs. I would be happy with less-than-linear
> > speedups, e.g. using a 16-node cluster to get an 8x speedup.
>
> I doubt you'd get anywhere near. Trying to implement those algorithms
> efficiently on the sort of loosely-coupled architecture you propose
> would be nigh-on impossible. It's not easy on a single SMP box, but
> it's doable.

I disagree. Synthesis as well as P&R involves exploring many alternatives, sorting and exploring by some underestimate of expense/delay (typically using an A* search algorithm or similar). This can be done in parallel. The datasets can be copied to each node and there will be very little information which has to be exchanged over the interconnect.

Of course there is not much to gain if your P&R takes 1 minute, but for larger designs and/or more accurate wire delay models (e.g. non-linear delay modelling and physical synthesis) the benefit will be larger. This has been implemented in some ASIC tools already.

Actually, Xilinx has been doing some very simple parallel processing in ISE (on Solaris and now Linux) for a long time. Multiple iterations of "par" can run in parallel on multiple hosts, and then you pick the best result. This is, of course, extremely coarse-grained compared to what I indicated above.

Petter

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 67017
Hi there.

Did you look at the power supply? During configuration the FPGA draws a lot of power (can be > 500 mA!!). If the PSU cannot deliver this, you have a configuration problem.

Success,
ron proveniers

"Dimitris Kontodimopoulos" <dkonto@isd.gr> schreef in bericht news:1609ee5e.0403010148.1b9c61df@posting.google.com...
> Hello there
>
> I'm having serious problems configuring my FPGA using EPC2. We have
> designed the circuit exactly as stated in the Altera datasheet and
> even played around with the pullups and buffering that's recommended.
> To be more specific, we have a board with a FLEX10KE
> (EPF10K200SBC356-1) and an EPC2LC20 for in-system configuration. We
> also have provided for direct ByteBlaster configuration using a
> connector (using the same path towards the FPGA and selecting between
> them through enabling/disabling a buffer). Finally, we have a JTAG
> connector by which we can configure the FPGA directly using the SOF
> file generated by Quartus - pls read below.
>
> Anyway, what we're seeing is: the EPC2 gets programmed OK but then the
> problems start. When I turn the system off and on again to initiate
> configuration, the nCONFIG pin comes out of reset and so does nSTATUS,
> but only for a very small amount of time. During this time DCLK is
> enabled and DATA transfers configuration data, as normal. Then nSTATUS
> goes low again and the configuration is interrupted, as you would
> expect. There is nothing in the circuit that could pull this pin low -
> it is a point-to-point connection between FPGA and EPC2. It seems
> however that the EPC2 goes back into reset state, hence pulling its OE
> pin low. From that point onwards these signals go crazy, i.e. they
> randomly go high or low, so the FPGA never gets configured. I tried
> using external pullups whilst disabling the internal ones through
> Quartus, but there was no change.
>
> One last point is that so far I've been configuring the FPGA through a
> direct JTAG connection using the SOF file - this works fine. Does this
> perhaps confuse the device, i.e. how does it know whether it should be
> programmed through JTAG or EPC2? Do I need to set something there?
> Finally, I'm using the POF file to program the EPC2 - I'm assuming
> this is correct?? Please give me some feedback because I'm really
> stuck with this. Any tips would be much welcome. Thanks in advance

Article: 67018
Max <mtj2@btopenworld.com> writes:

> A quad Xeon (8 x CPU) box would cost less than four single
> decent-spec machines anyway.

My experience is the opposite. I've heard from users in the high-performance computing industry that the most cost-efficient systems are clusters of dual-CPU nodes (assuming your application will run efficiently on a cluster).

A 4-CPU Xeon system like a Dell PowerEdge 6650 with 4x Xeon, 3.0GHz and 4GB RAM costs $28,070. A single PowerEdge 750 (1U server) with a 3.4GHz P4 (higher clock frequency, but smaller cache) and 1GB RAM costs $3,165.

8-CPU Xeon SMPs (Profusion architecture) are very expensive. A Proliant 8500 costs $100,000+ if memory serves me right.

Petter

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 67019
Hello,

I would like to know about the different finite field multipliers used for doing finite field multiplication, specifically ones that work in polynomial basis, and that are suited to FPGA implementation. Any help with references to the design of these multipliers, and if possible a reference comparing the different multipliers, would be greatly appreciated.

Thanx.
OP.
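For readers finding this thread later: one of the simplest polynomial-basis structures, and one that maps well onto FPGA LUTs, is the bit-serial (MSB-first) multiplier. The sketch below is illustrative only, not from the original poster or any particular paper; the field GF(2^8), the reduction polynomial x^8 + x^4 + x^3 + x + 1, and all module/port names are assumptions.

// Bit-serial polynomial-basis multiplier over GF(2^M), one bit of b per cycle.
module gf2m_mult #(
    parameter         M    = 8,
    parameter [M-1:0] POLY = 8'h1B   // low M bits of x^8 + x^4 + x^3 + x + 1
) (
    input  wire         clk,
    input  wire         start,       // load operands and begin an M-cycle multiply
    input  wire [M-1:0] a,
    input  wire [M-1:0] b,
    output reg  [M-1:0] p,           // product a*b mod POLY, valid when done = 1
    output reg          done
);
    reg [M-1:0] a_r, b_r;
    reg [7:0]   cnt;                 // cycle counter, wide enough for typical M

    always @(posedge clk) begin
        if (start) begin
            a_r  <= a;
            b_r  <= b;
            p    <= {M{1'b0}};
            cnt  <= M;
            done <= 1'b0;
        end else if (cnt != 0) begin
            // MSB-first Horner step: p = p*x mod POLY, then add a if this bit of b is set
            p    <= ({p[M-2:0], 1'b0} ^ (p[M-1] ? POLY : {M{1'b0}}))
                    ^ (b_r[M-1] ? a_r : {M{1'b0}});
            b_r  <= {b_r[M-2:0], 1'b0};
            cnt  <= cnt - 1;
            done <= (cnt == 1);
        end
    end
endmodule

This version costs one multiply every M clock cycles; the usual trade-off is against digit-serial or fully combinational multipliers, which finish sooner at the price of more LUTs.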
Article: 67020

On 03 Mar 2004 22:58:46 +0100, Petter Gustad wrote:

> A 4-CPU Xeon system like a Dell PowerEdge 6650 with 4x Xeon, 3.0GHz
> and 4GB RAM costs $28,070. A single PowerEdge 750 (1U server) with
> a 3.4GHz P4 (higher clock frequency, but smaller cache) and 1GB RAM
> costs $3,165.

The hyperthreaded Xeons run as two processors, so a quad-Xeon board appears to an HT-aware OS as an 8-CPU system.

Why pay for all the extra high-end hardware in a top-end server if you don't need it? When I was last looking at building systems like this, about 18 months or so ago, a quad-Xeon mobo from Supermicro was <$2000, and the processors were around $450 apiece.

--
Max

Article: 67021
Max <mtj2@btopenworld.com> writes:

> On 03 Mar 2004 22:58:46 +0100, Petter Gustad wrote:
>
> > A 4-CPU Xeon system like a Dell PowerEdge 6650 with 4x Xeon, 3.0GHz
> > and 4GB RAM costs $28,070. A single PowerEdge 750 (1U server) with
> > a 3.4GHz P4 (higher clock frequency, but smaller cache) and 1GB RAM
> > costs $3,165.
>
> The hyperthreaded Xeons run as two processors, so a quad-Xeon board
> appears to an HT-aware OS as an 8-CPU system.

Then you would call a system with a single P4 with HyperThreading a dual-processor system as well? That would be a little "unfair" when comparing to a full dual-core CPU like the rumored UltraSparc-IV.

> Why pay for all the extra high-end hardware in a top-end server if you
> don't need it? When I was last looking at building systems like this,

My point was that you usually get lots of extra high-end hardware when you buy large SMP systems, especially when you need to go beyond 4-way. Also, it's usually cheaper to get 4x4GB RAM rather than 16GB RAM for a single MB (unless you have a large enough number of DIMM slots).

> about 18 months or so ago, a quad-Xeon mobo from Supermicro was
> <$2000, and the processors were around $450 apiece.

This is pretty good; I was not aware of the low cost of the Supermicro MB. You would end up at close to $4000, e.g. in the same ballpark as buying 4 P4 systems. So if the application performs better on the SMP than on the cluster, I would definitely go with the SMP.

Petter

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 67022
Rick,

The abort happens asynchronously (it doesn't need to see a CCLK rising edge to take place). To avoid this, CS must be deasserted first. The data pins will be asynchronously driven by the FPGA when WRITE_B changes to high, but the status word (cfgerr_b, dalign, rip, in_abort_b, 4'b1111) is clocked out on data pins [7:0] by CCLK.

Regards,
Wei

rickman wrote:
> I am looking at the data sheet for the Spartan 3 parts trying to figure
> out how to configure them. It seems like it is the same as most of the
> other families, but there is one note that I don't completely
> understand. Section 3, page 12, has the following text...
>
> Figure 5: Waveforms for Master and Slave Parallel Configuration
> Notes:
> 1. In a given CCLK cycle, when RDWR_B transitions High or Low while
> holding CS_B Low, the next rising edge on the CCLK pin will abort
> configuration.
>
> This is not exactly the same as XAPP176 describing the Spartan II
> configuration, page 14...
>
> While CS is High, the Slave Parallel interface does not expect any data
> and ignores all CCLK transitions. However, WRITE must continue to be
> asserted while CS is asserted. If WRITE is High during a positive CCLK
> transition while CS is asserted, the FPGA aborts the operation.
>
> In the first case it sounds as if the abort condition is created by CS-
> being low and an edge on the RDWR- signal followed by a rising edge on
> CCLK (without making it clear if this also has to be during CS- low).
>
> In the second case, it is just the state of the two signals, sampled at
> the rising edge of CCLK, which will create an abort.
>
> If I am trying to use the CS-, WR- and IO signals from an MCU to control
> this, the difference between these two descriptions is significant. Am
> I making this more difficult than it is? Can I connect the signals as
> shown below and make this work ok?
>
>   MCU    FPGA       Write              NO
>   ---    ----       Byte               Write
>   CS-    RD_WR-     ----_______------______---
>   WR-    CCLK       -----_____--------____----
>   IO     CS-        -_____________------------
>
> The other thing I am not clear about is how to use these same signals
> after configuration. It looks like I have to set "persist" to off if I
> want to put these signals on the MCU bus after config in order to have a
> bus interface to the chip. But if I want to perform partial
> reconfiguration, I think I have to have "persist" set to on, no? Does
> this mean I will have to double up on all these signals, one for
> (re)configuration and one for operation?
>
> I seem to recall that the Lucent chips allowed you to use the MCU
> interface after configuration. Do the Xilinx chips have that as well?
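In other words, the safe ordering is: drive RDWR_B low before asserting CS_B, keep it low for the whole burst, and only raise it again after CS_B has been deasserted, so RDWR_B never changes while CS_B is low. A simulation-only sketch of that ordering follows; it is an illustration of the sequencing discussed above, not a real controller, and the signal names, delays and the cfg_byte array are placeholders. The actual setup/hold and maximum CCLK figures must come from the Spartan-3 data sheet.

module selectmap_write_order;
    reg        cclk   = 0;
    reg        cs_b   = 1;
    reg        rdwr_b = 1;
    reg [7:0]  d;
    reg [7:0]  cfg_byte [0:1023];   // hypothetical configuration data
                                    // (load it, e.g. with $readmemh, before use)
    integer    i;

    initial begin
        rdwr_b = 0;                 // 1. WRITE low *before* the device is selected
        #20 cs_b = 0;               // 2. select the FPGA
        for (i = 0; i < 1024; i = i + 1) begin
            d = cfg_byte[i];
            #10 cclk = 1;           // byte is sampled on this rising edge
            #10 cclk = 0;
        end
        #20 cs_b   = 1;             // 3. deselect first...
        #20 rdwr_b = 1;             // 4. ...only then is it safe to change WRITE
    end
endmodule

Because RDWR_B only toggles while CS_B is high, the abort condition quoted from the Spartan-3 note above can never be triggered by this sequence.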
Article: 67023

On Tue, 2 Mar 2004 13:34:26 +0800, "Peng Cong" <pc_dragon@sohu.com> wrote:

> Why do you need the following 2 attributes?
> // synthesis attribute keep of e is "true"
> // synthesis attribute keep of f is "true"
> e and f are not flip-flops.
>
> Removing them should be OK.

Remove them and watch XST merge the two flip-flops. (I know; I tried it.) I suggest you read the XST documentation.

Regards,
Allan.
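Since the original code is not quoted in this thread, here is a reconstructed sketch of the kind of situation being described; only the names e and f come from the quoted attributes, everything else is assumed. Two registers are deliberately loaded with the same data (for example to split fanout), and the "keep" attributes stop XST from optimizing the duplicate away.

// Deliberate register duplication; without the keep attributes XST is
// free to collapse e and f back into a single flip-flop.
// (Reconstructed example: module, clock and data names are made up.)
module dup_reg (
    input  wire clk,
    input  wire d,
    output wire e_out,
    output wire f_out
);
    reg e, f;
    // synthesis attribute keep of e is "true"
    // synthesis attribute keep of f is "true"

    always @(posedge clk) begin
        e <= d;   // copy driving one region of the chip
        f <= d;   // second copy driving another region
    end

    assign e_out = e;
    assign f_out = f;
endmodule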
Article: 67024

I was just trying to be helpful by sharing my experience. We're only interested in speeding up Quartus builds in this thread, and some have been suggesting more memory (32 GB in some instances) and faster drives. I've done both in two different machines, and the biggest improvement came from tweaking the memory subsystem, not from adding more memory above 512MB or a faster drive.

The 7200 RPM drive is very much faster, as can be seen from the much faster boot times. It didn't mean much for Quartus builds though. It seems Quartus needs (for my Nios system) a fast CPU with at least several hundred MBs of well-tuned memory.

I don't write slow image processing algorithms, and I use as many wires as the system can provide. If it's an 8-bit CPU then I use 8-bit optimizations; if it's 32-bit then 32-bit optimizations. Haven't tried 64-bit yet, but I plan to. Can't imagine any developer worth their salt that wouldn't.

Ken

"Max" <mtj2@btopenworld.com> wrote in message news:5vtb40lq1kmtcfqefbhdr69ei29kpq6h60@4ax.com...
> On Tue, 2 Mar 2004 20:05:37 -0600, Kenneth Land wrote:
>
> > On the disk speed issue I have one data point. I upgraded my 1GHz PIII-M
> > laptop drive from a slow 4200 RPM to the fastest 7200 RPM available (for
> > laptops) and my Nios system build went from about 16 min. to about 15 min.
> > Not worth the pain and expense of swapping the drive.
>
> Not in a low-spec machine like that, no. The options in a laptop are
> limited, and there's no way to increase the disk controller bandwidth.
> But the effect on a powerful workstation of installing a RAID with a
> high-bandwidth controller and drives such as U-320 SCSI can have a
> dramatic impact. As always though, it depends on the application.
>
> > On memory, I upgraded the memory in my 3.2 GHz P4 from 512 to 1GB and there
> > was no noticeable difference until I set the memory from 333MHz to 400MHz
> > dual channel. Then my system build went from 5 min. to 4 min. - 20%.
>
> That doesn't mean a lot. You only need to add more memory if you're
> running out of it ;o)
>
> --
> Max