Jim Granville wrote:
> Sounds impressive.
> You have seen the AS Assembler, and the Mico8 from Lattice ?

Yes, I am very much aware of Mico8 and I have used AS in several projects in the past. I know that it supports PicoBlaze (and Mico8 now). But what I want to do now is a small version of a language like HLA or terse for PicoBlaze. Something simple and readable that is easy to modify like the current KCAsm (hey, adding the mul and add/sub instructions took less than one minute. ;o)

Here is what sarKCAsm looks like at the moment (currently a JavaCC implementation, but I am swapping to ANTLR now because it has better support for trees).

---8<---

s0 = $ca           ; load
s1 = s0 + $fe      ; same as s1 = s0, s1 += $fe
func($be, $ef)     ; function call, s0 = $be, s1 = $ef

s3 = 16

loop:
    func(s0, s1)
    s0 == $55      ; compare
    done Z?        ; conditional jump
    s3 -= 1
    done Z?
    loop           ; unconditional jump

done:
    done

func(s0: s0, s1):  ; result + clobber list
    s0 <- $0       ; read from port 0
    s0 ^= s1       ; xor
    s1 << C        ; sla
    #              ; return

--->8---

> FWIR the Mico8 is very similar to PicoBlaze ( as expected, both are
> tiny FPGA targeted CPUs ), but I think with a larger jump and call reach
> (but simpler RET options).
> If you are loading on features, the call-lengths might need attention ?

For now the limits of the PicoBlaze model have been within my needs (IIRC, Mico8 has the same 10-bit jumps/calls as PB3 and it is very isomorphic to it). My main drive to create PacoBlaze was to get the most versatile processor that I could use as a peripheral controller in my projects (eg motor control, bus controller, PWM generator, audio co-processor, specifically in the JBRD of my Javabotics project, http://bleyer.org/javabotics/). It isn't difficult to extend the memory model of PicoBlaze using PacoBlaze, though.

> Have you tried targeting this to a lattice device ?

Not yet. I plan to synthesize the core using different tools that I may have access to, but that is not in my list of priorities.

Cheers.

--
                 /"Naturally, there's got to be some
PabloBleyerKocik / limit, for I don't expect to live
pablo            / forever, but I do intend to hang on
@bleyer.org      / as long as possible." -- Isaac Asimov

Article: 99151
I've got a Dini DN8000K10 in hand that seems to work quite well and has the features you were looking for.

Article: 99152
Hello Ivan. Which bus macro is it better to use for assured correct coupling? Probably a hard macro? I would be very thankful if you could post something like a *.nmc or *.xdl file separately. Thanks in advance.

Article: 99153
Hi, I created my test IP core (ANDgate) in ISE and imported it into XPS to connect it to the MicroBlaze. I am using the Spartan 3 starter kit. I defined my core to have only one register, 32 bits wide: slv_reg0. This means I have to put my ANDgate result (X AND Y = Z) into slv_reg0: slv_reg0 <= Z. Then I tried to read this value from the MicroBlaze processor, but I couldn't read it. My result is always 0x00000000. Can someone help me with this problem? Mogogo

Article: 99154
Isaac Bosompem wrote:
> Allan Herriman wrote:
>> On 20 Mar 2006 07:41:37 -0800, "Isaac Bosompem" <x86asm@gmail.com>
>> wrote:
>>
>>> John_H wrote:
>>>
>>>> Isaac Bosompem wrote:
>>>>
>>>>> Hi Ray and Peter,
>>>>>
>>>>> I am sorry for hijacking your thread Roger, but I think my question is
>>>>> relevant.
>>>>>
>>>>> I was thinking of using about 8 FIR (bandpass) filters in parallel to
>>>>> create a graphic equalizer. Now I know there are some phase problems
>>>>> with this method but it seems to me like a very logical way to go about
>>>>> this problem. I was wondering if you guys know of any better methods?
>>>>>
>>>>> I also was thinking of using 16 taps each.
>>>>>
>>>>> 320 FF's is not a lot actually. My XC3S200 (which is probably dirt
>>>>> cheap) has almost 4000 FF's. Enough for your filter network and much
>>>>> more.
>>>>>
>>>>> -Isaac
>>>>
>>>> Another great advantage of FPGA FIRs: most of the time the FIRs are
>>>> symmetric, which allows half the taps (half the multipliers) to implement
>>>> the full FIR by adding t-d to t+d before multiplying by the common
>>>> coefficient; the implementation is more elegant.
>>>
>>> Hi John,
>>>
>>> I cannot see what you mean? Can you offer a quick example?
>>
>> An FIR filter is implemented as a dot product of a constant vector and
>> a vector made up of the input samples delayed,
>>
>> y[n] = c[0].x[n] + c[1].x[n-1] + ... c[m-1].x[n-m+1]
>>
>> for an m tap filter.
>>
>> The c[n] are the coefficients. If the filter has a linear phase
>> response, the coefficients are symmetric, so
>> c[0] = c[m-1], c[1] = c[m-2], etc.
>>
>> We can group the expression for y[n] as follows:
>>
>> y[n] = c[0].(x[n] + x[n-m+1]) + c[1].(x[n-1] + x[n-m+2]) + ...
>>
>> This has (roughly) halved the number of multipliers.
>> I say roughly, because m is often odd.
>>
>> Regards,
>> Allan
>
> Ahh, I see, thanks.
>
> Do you guys know of a good filter design software? I have an old one
> for DOS, but it is quite difficult to use (also I do not have access to
> MATLAB).
>
> -Isaac

One of the tools I use is ScopeFIR, which is a FIR filter design tool. http://www.iowegian.com For the money, it is a good value.

Article: 99155
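Allan's symmetric-coefficient folding can be sketched in a few lines of Python. This is only a behavioral model for checking the arithmetic - in hardware the pre-adders and multipliers are parallel logic, and the function names here are invented for illustration:

```python
def fir_direct(coeffs, x):
    """Plain dot-product FIR: one multiply per tap."""
    m = len(coeffs)
    return [sum(coeffs[k] * x[n - k] for k in range(m) if n - k >= 0)
            for n in range(len(x))]

def fir_folded(coeffs, x):
    """Symmetric FIR: pre-add mirrored delay taps (t-d and t+d), then
    multiply by the shared coefficient, halving the multiplier count."""
    m = len(coeffs)
    assert list(coeffs) == list(coeffs)[::-1], "folding needs symmetric taps"
    half = m // 2
    out = []
    for n in range(len(x)):
        def d(k):  # delayed sample, zero before the start of the stream
            return x[n - k] if n - k >= 0 else 0
        acc = sum(coeffs[k] * (d(k) + d(m - 1 - k)) for k in range(half))
        if m % 2:  # odd tap count: the centre tap has no mirror partner
            acc += coeffs[half] * d(half)
        out.append(acc)
    return out
```

For a 16-tap symmetric filter this is 8 multipliers instead of 16, which is the saving John_H describes; with an odd tap count the unpaired centre tap keeps one extra multiplier (Allan's "roughly halved").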
I don't take issue with anything Tim stated, but I will add a few comments.

I think that the added complexity of floating point in an FPGA will probably be enough to rule it out. Fixed point implementations are often better than floating point implementations. This comparison tends to be true when the result of a multiplication is twice the width of the inputs in the fixed point case and when a floating point result is the same size as its inputs. This is usually the case in a DSP processor. This also assumes that you use a filter structure that takes advantage of the long result.

Most IIR filters are constructed as cascaded biquads (and sometimes one first order section). The choice of the biquad structure has a significant impact on performance. If we restrict our choices to one of the direct forms, then usually the direct form I (DF I) structure is best for fixed point implementations. This assumes that we have a double wide accumulator. If this is not the case, the DF I is not a particularly good structure. Floating point implementations are usually implemented as DF II or the slightly better transposed DF II.

You can also improve the performance of a fixed point DF I by adding error shaping. This is relatively cheap from a resource point of view in this structure. As Tim pointed out, you have to pay attention to scaling with fixed point implementations.

Like every design problem, you need to examine the performance requirements carefully. I would look at the pole-zero placement on the unit circle. If you need a high Q filter at some low frequency compared to the sampling rate, the math precision is going to be critical. The poles might not be on the unit circle, but they will be very close. If the precision is poor, the filter is likely to blow up. In other situations, just about anything will work.

Here is a good link describing biquad structures: http://www.earlevel.com/Digital%20Audio/Biquads.html

--
Al Clark
Danville Signal Processing, Inc.
--------------------------------------------------------------------
Purveyors of Fine DSP Hardware and other Cool Stuff
Available at http://www.danvillesignal.com

Tim Wescott <tim@seemywebsite.com> wrote in
news:FNadnRoXwvTbsYLZRVn-qQ@web-ster.com:

> Tim Wescott wrote:
>
>> Roger Bourne wrote:
>>
>>> Hello all,
>>>
>>> Concerning digital filters, particularly IIR filters, is there a
>>> preferred approach to implementation - are fixed-point calculations
>>> preferred over floating-point? I would be tempted to say yes.
>>> But my google search results leave me baffled, for it seems that
>>> floating-point computations can be just as fast as fixed-point.
>>> Furthermore, assuming that fixed-point IS the preferred choice, the
>>> following question crops up:
>>> If the input to the digital filter is 8 bits wide and the
>>> coefficients are 16 bits wide, then it would stand to reason that the
>>> products between the coefficients and the digital filter
>>> intermediate data values will be 24 bits wide. However, when this
>>> 24-bit value is to get back in the delay element network (which is
>>> only 8 bits wide), some (understatement) resolution will be lost. How
>>> is this resolution loss dealt with so it will not lead to an erroneous
>>> filter?
>>> -Roger
>>>
>> This is a simple question with a long answer.
>>
>> Floating point calculations are always easier to code than
>> fixed-point, if for no other reason than you don't have to scale your
>> results to fit the format.
>>
>> On a Pentium in 'normal' mode floating point is just about as fast as
>> fixed point math; with the overhead of scaling floating point is
>> probably faster -- but I suspect that fixed point is faster in MMX
>> mode (someone will have to tell me). On a 'floating point' DSP chip
>> you can also expect floating point to be as fast as fixed.
>>
>> On many, many cost effective processors -- including CISC, RISC, and
>> fixed-point DSP chips -- fixed point math is significantly faster
>> than floating point. If you don't have a ton of money and/or if your
>> system needs to be small or power-efficient, fixed point is mandatory.
>>
>> In addition to cost constraints, floating point representations use
>> up a significant number of bits for the exponent. For most filtering
>> applications these are wasted bits. For many calculations using
>> 16-bit input data the difference between 32 significant bits and 25
>> significant bits is the difference between meeting specifications and
>> not.
>>
>> For _any_ digital filtering application you should know how the data
>> path size affects the calculation. Even though I've been doing this
>> for a long time I don't trust my intuition -- I always do the
>> analysis, and sometimes I'm still surprised.
>>
>> In general for an IIR filter you _must_ use significantly more bits
>> for the intermediate data than the incoming data. Just how much
>> depends on the filtering you're trying to do -- for a 1st-order
>> filter you usually need to do better than the fraction of the sampling
>> rate you're trying to filter; for a 2nd-order filter you need to go
>> down to that fraction squared*. So if you're trying to implement a
>> 1st-order low-pass filter with a cutoff at 1/16th of the sample rate
>> you need to carry more than four extra bits; if you wanted to use a
>> 2nd-order filter you'd need to carry more than 8 extra bits.
>>
>> Usually my knee-jerk reaction to filtering is to either use
>> double-precision floating point or to use 32-bit fixed point in 1r31
>> format. There are some less critical applications where one can use
>> single-precision floating point or 16-bit fractional numbers to
>> advantage, but they are rare.
>>
>> * There are some special filter topologies that avoid this, but if
>> you're going to use a direct-form filter out of a book you need
>> fraction^2.
>>
> Oops -- thought I was responding on the dsp newsgroup.
>
> Everything I said is valid, but if you're contemplating doing this on
> an FPGA the impact of floating point vs. fixed is in logic area and
> speed (which is why fast floating point chips are big, hot and
> expensive). Implementing an IEEE compliant floating point engine takes
> a heck of a lot of logic, mostly to handle the exceptions. Even if
> you're willing to give up compliance for the sake of speed you still
> have some significant extra steps you need to take with the data to
> deal with that pesky exponent. I'm sure there are various forms of
> floating point IP out there that you could try on for size to get a
> comparison with fixed-point math.

Article: 99156
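Al's point about DF I with a double-wide accumulator can be illustrated with a small behavioral model (Python rather than HDL; the Q14 coefficient / Q15 signal formats and the coefficient values are arbitrary choices for the sketch). The products accumulate at full width and are quantized back to the state width only once per output sample, which is exactly why DF I behaves well in fixed point:

```python
Q = 14                      # coefficient fraction bits (Q14 in 16-bit words)

def to_q(v):
    """Quantize a float coefficient to Q14."""
    return int(round(v * (1 << Q)))

def biquad_df1_float(b, a, x):
    """Reference DF I biquad: y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0]*xn + b[1]*x1 + b[2]*x2 - a[0]*y1 - a[1]*y2
        y.append(yn)
        x2, x1, y2, y1 = x1, xn, y1, yn
    return y

def biquad_df1_fixed(bq, aq, x_q15):
    """Fixed-point DF I: signals in Q15, coefficients in Q14.  The
    accumulator holds full Q29 products (the 'double wide accumulator');
    rounding back to Q15 happens once per sample, at the output."""
    y, x1, x2, y1, y2 = [], 0, 0, 0, 0
    for xn in x_q15:
        acc = bq[0]*xn + bq[1]*x1 + bq[2]*x2 - aq[0]*y1 - aq[1]*y2
        yn = (acc + (1 << (Q - 1))) >> Q     # round Q29 -> Q15, one quantizer
        y.append(yn)
        x2, x1, y2, y1 = x1, xn, y1, yn
    return y
```

With a DF II structure the same word lengths would quantize the state variables themselves, and for high-Q, low-frequency poles (Al's critical case) that quantization noise gets amplified by the feedback.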
fpga_toys@yahoo.com wrote:
> Seems that it can be completely transparent with very very modest
> effort. The parts all have non-volatile storage for configuration. If
> the defect list is stored with the bitstream, then the installation
> process to that storage just needs to read the defect list out before
> erasing it, and merge the defect list into the new bit stream as the part
> is linked (place and routed) for that system.
> With a system level design based on design for test and design for
> defect management, the costs are ALWAYS in favor of defect management,
> as it increases yields at the mfg and extends the life in the field by
> making the system tolerant of intermittents that escape ATE and of life
> induced failures like migration effects.

Which reconfigurable FPGAs would those be with the non-volatile bitstreams? I'm not aware of any. Posts like these really make me wonder whether you've really done any actual FPGA design. They instead indicate to me that perhaps it has been all back of the envelope concept stage stuff with little if any carry through to a completed design (which is fine, but it has to be at least tempered somewhat with actual experience garnered from those who have been there). In particular, your concerns about power dissipation being stated on the data sheet, your claims of high performance using HLLs without getting into hardware description, your complaints about tool licensing while not seeming to understand the existing tool flow very well, the handwaving in the current discussion you are doing to convince us that defect mapping is economically viable for FPGAs, and now this assertion that all the parts have non-volatile storage sure make it sound like you don't have the hands-on experience with FPGAs you'd like us to believe you have.

>> 5) Timing closure has to be considered when re-spinning an FPGA
>> bitstream to avoid defects. In dense high performance designs, it may
>> be difficult to meet timing in a good part, much less one that has to
>> allow for any route to be moved to a less direct routing.
>
> In RC that is not a problem ... it's handled by design. For embedded
> designs, that is a different problem.

What are you doing differently in the RC design then? From my perspective, the only ways to tolerate changes in the PAR solution and still make timing are to either leave a considerable amount of excess performance margin (ie, not running the parts at the high performance/high density corner), or spend an inordinate amount of time looking for a suitable PAR solution for each defect map, regardless of how coarse the map might be. From your previous posts regarding open tools and use of HLLs, I suspect it is more on the leaving-lots-of-performance-on-the-table side of things. In my own experience, the advantage offered by FPGAs is rapidly eroded when you don't take advantage of the available performance.

However, you also had a thread a while back where you were overly concerned about thermal management of FPGAs, claiming that your RC designs could potentially trigger a mini China syndrome event in your box. If you are leaving enough margin in the design so that it is tolerant to fortuitous routing changes to work around unique defects, then I sincerely doubt you are going to run into the runaway thermal problems you were concerned with. I've got a number of very full designs in modern parts (V2P, V4) clocked at 250-400 MHz that function well within the thermal spec with at most a passive heatsink and modest airflow. Virtually none of those designs would tolerate a quick reroute to avoid a defect on a critical route path without going through an extensive reroute of signals in that region, and that is assuming there were the necessary hooks in the tools to mark routes as 'do not use' (I am not aware of any hooks like that for routing, only for placement).
Still, I'd like to hear what you have to say. If nothing else, it has sparked an interesting conversation. Having done some work in the RC area, and having done a large number of FPGA designs over the last decade (my 12 year old business is exclusively FPGA design, with a heavy emphasis on high performance DSP applications), most of which are pushing the performance envelope of the FPGAs, I am understandably very skeptical about your chances of achieving all your stated goals, even if you did get everything you've complained about not having so far. Show me that my intuition is wrong.

Article: 99157
In article <1142888577.488377.237030@t31g2000cwb.googlegroups.com>,
"Pablo Bleyer Kocik" <pablobleyer@hotmail.com> wrote:

> Hello people.
>
> As I announced some days ago, I updated the PacoBlaze3 core
> [http://bleyer.org/pacoblaze/], now with a wide ALU that supports an 8x8
> multiply instruction ('mul') and 16-bit add/sub operations ('addw',
> 'addwcy', 'subw', 'subwcy'). The new extension core is called
> PacoBlaze3M. It could be useful for performing small DSP functions and math
> subroutines when there is a spare hardware multiplier block.

Cool, though I have not had time to even get 2.0 running yet.
( life got in the way of fun stuff )

Article: 99158
Pablo Bleyer Kocik wrote:
> Jim Granville wrote:
>
>> Sounds impressive.
>> You have seen the AS Assembler, and the Mico8 from Lattice ?
>
> Yes, I am very much aware of Mico8 and I have used AS in several
> projects in the past. I know that it supports PicoBlaze (and Mico8
> now). But what I want to do now is a small version of a language like
> HLA or terse for PicoBlaze.

I realised that; - just checking you knew of them :)

> Something simple and readable that is easy
> to modify like the current KCAsm (hey, adding the mul and add/sub
> instructions took less than one minute. ;o)

Good targets.

> Here is what sarKCAsm is currently looking like (currently a JavaCC
> implementation, but I am swapping to ANTLR now because it has better
> support for trees).
> ---8<---
>
> s0 = $ca ; load
> s1 = s0 + $fe ; same as s1 = s0, s1 += $fe
> func($be, $ef) ; function call, s0 = $be, s1 = $ef
>
> s3 = 16
>
> loop:
> func(s0, s1)
> s0 == $55 ; compare
> done Z? ; conditional jump
> s3 -= 1
> done Z?
> loop ; unconditional jump
>
> done:
> done
>
> func(s0: s0, s1): ; result + clobber list
> s0 <- $0 ; read from port 0
> s0 ^= s1 ; xor
> s1 << C ; sla
> # ; return

Will you also do boolean (Flag) functions ?

General comments: ( feel free to ignore... )

The expression clarity makes good sense, and I also like languages that can accept flexible constants: viz $55 or 0x55 or 55H, or 2#01010101 or 16#55, or 2#01_0101_01. I've also seen XOR AND OR NOT etc keywords supported, as well as the terse C equivalents ( which are a real throwback to when source size mattered ).

but I'm not sure about labels in the left-most code column - that makes code harder to scan and indent etc, and not as clear in a syntax highlighted editor.... ie If you have to add a comment, then the language is probably not clear enough....

# for return ? => why? - why not return, or RET, or IFnZ RET
label then condition ? => most languages are IF_Z THEN or if_nZ DestAddr
Label for Loop jmp ? => REPEAT Label, or LOOP label

If a 12yr old kid can read the source, and not need a raft of prior knowledge, then that's a good test of any language :)

-jg

Article: 99159
Pablo Bleyer Kocik wrote:
> For now the limits of the PicoBlaze model have been within my needs
> (IIRC, mico8 has the same 10-bit jumps/calls as PB3 and it is very
> isomorphic to it).

I think I recall the Mico8 had more obvious expansion space in the opcodes - but either way, this is the sort of expansion that is nice to allow for early on. With more smarts, users _are_ going to need larger address space :)

The assembler should accept either size, and warn on the smaller/larger ceiling, based on a target/build family define.

-jg

Article: 99160
On Mon, 20 Mar 2006 17:06:45 GMT, "John_H" <johnhandwork@mail.com> wrote:

> "Allan Herriman" <allanherriman@hotmail.com> wrote in message
> news:tumt12l1jq2cnppqn4e9ki3iiceni5d5bb@4ax.com...
>> On Mon, 20 Mar 2006 15:52:41 -0000, "Symon" <symon_brewer@hotmail.com>
>> wrote:
>> <snip>
>>> Well, from a 200MHz clock, you can get exactly 1uHz if you make the
>>> accumulator overflow at 200,000,000,000,000. i.e.
>>>
>>> accum <= (accum + freq) mod 2E14;
>>>
>>> The accumulator doesn't have to saturate at a power of two.
>>
>> Depends on your definition of "regular" DDS I suppose...
>>
>> The logic to implement the pipelined mod 2e14 operation will probably
>> be a lot harder than simply making a regular binary phase accumulator
>> a few bits wider. Still, if the requirement is for a step size of
>> *exactly* 1uHz, then the mod operation is needed.
> <snip>
>
> The logic is extremely simple:
> When the MSbit changes, rather than adding PhaseInc add
> ((2^48)-(2e+14))/2+PhaseInc. The dual-increment value is very easy to
> support. I added the /2 in there for "whenever" the MSbit changes rather
> than just tracking the high-to-low transition.

Of course! I was thinking that the decision to add the extra (2^48-2e14) would have to take place prior to the register, but now I realise that it can be pipelined, which makes it possible to get it to run at 200MHz.

This may complicate downstream processing, e.g. use of CORDIC to generate a sinusoid.

Regards,
Allan

Article: 99161
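John_H's dual-increment trick can be checked with a quick behavioral model (Python, not HDL; the constants follow the 48-bit / 2e14 numbers in the thread). Adding the extra ((2^48)-2e14)/2 on each MSbit toggle means the correction is applied twice per output cycle, so the accumulator effectively wraps modulo 2e14 and the average output frequency is fclk*PhaseInc/2e14:

```python
W = 48                              # accumulator width in bits
MASK = (1 << W) - 1
MOD = 200_000_000_000_000           # desired modulus, 2e14
K = ((1 << W) - MOD) // 2           # extra increment on each MSB toggle

def dds_msb_rises(phase_inc, n_clocks):
    """Count MSB rising edges (output cycles) of the dual-increment
    phase accumulator over n_clocks clock periods."""
    acc, prev_msb, rises = 0, 0, 0
    for _ in range(n_clocks):
        acc = (acc + phase_inc) & MASK
        if (acc >> (W - 1)) != prev_msb:
            acc = (acc + K) & MASK  # MSbit changed: apply the correction
        msb = acc >> (W - 1)
        if msb and not prev_msb:
            rises += 1
        prev_msb = msb
    return rises
```

At fclk = 200 MHz the tuning resolution is 200e6/2e14 = exactly 1 uHz, which is Symon's point about not wrapping at a power of two. The phase jump at each correction is what Allan means by complicating downstream processing such as a CORDIC sine generator.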
Ray Andraka wrote:
> fpga_toys@yahoo.com wrote:
>
>> Seems that it can be completely transparent with very very modest
>> effort. The parts all have non-volatile storage for configuration. If
>> the defect list is stored with the bitstream, then the installation
>> process to that storage just needs to read the defect list out before
>> erasing it, merge the defect list into the new bit stream, as the part
>> is linked (place and routed) for that system.
>> With a system level design based on design for test, and design for
>> defect management, the costs are ALWAYS in favor of defect management
>> as it increases yields at the mfg, and extends the life in the field by
>> making the system tolerant of intermittents that escape ATE and life
>> induced failures like migration effects.
>
> Which reconfigurable FPGAs would those be with the non-volatile
> bitstreams?

I think John meant storing the info in the ConfigFlashMemory. Thus the read-erase-replace steps. ... but you STILL have to get this info into the FIRST design somehow....

-jg

Article: 99162
His 'complete disregard for manners' consisted of 6 abbreviations. And while I agree his English was thoughtless, I think trashing him the way everyone has is even more thoughtless. Instead of calling his actions deplorable, and ridiculing him with bad imitations, perhaps it would have been better manners to politely ask him to rephrase his question, and help him only once his English was satisfactory.

I'm a student as well, which may be why his post didn't offend me as it has some. But if he was out to 'butcher' the English language as has been accused, he could have done far far worse (simply refer to the bad imitations). I don't claim to know much about languages, but these abbreviations that have got everyone worked up are becoming more and more mainstream. And when one spends a lot of time informally communicating on the internet, it is understandable that speaking in this fashion wouldn't immediately register as offensive. Given the effort the OP went to in constructing his post (not a simple "help plx!!!11!one"), a gentle request for him to fix his English would have been sufficient IMO.

Article: 99163
Ray Andraka wrote:
> Which reconfigurable FPGAs would those be with the non-volatile
> bitstreams? I'm not aware of any.

What are XC18V04's? Magic ROMs? What are the platform flash parts? Magic ROMs? They are CERTAINLY non-volatile every time I've checked. In fact, non-volatile includes disks, optical, and just about any other medium that doesn't go poof when you turn the power off.

> and now this assertion that all the parts
> have non-volatile storage sure makes it sound like you don't have the
> hands on experience with FPGAs you'd like us to believe you have.

Ok Wizard God of FPGA's ... just how do you configure your FPGA's without having some form of non-volatile storage handy? Whatever the configuration bit stream source is, if it is reprogrammable ... IE ignore 17xx proms ... you can store the defect list. UNDERSTAND? Now, the insults are NOT -- I REPEAT NOT -- being civil.

> What are you doing different in the RC design then?

With RC there is an operating system, complete with disk based filesystem. The intent is to do fast (VERY FAST) place and route on the fly.

> From my
> perspective, the only ways to tolerate changes in
> the PAR solution and still make timing are to either be leaving a
> considerable amount of excess performance margin (ie, not running the
> parts at the high performance/high density corner), or spending an
> inordinate amount of time looking for a suitable PAR solution for each
> defect map, regardless of how coarse the map might be.

You are finally getting warm. Several times in this forum I have discussed what I call "clock binning", where the FPGA accel board has several fixed clocks arranged as integer powers. The dynamic runtime linker (very fast place and route) places, routes, and assigns the next slowest clock that matches the code block just linked. The concept is to use the fastest available clock at which the code block meets timing, NOT to change the clocks to fix the code.

> From your previous posts regarding open tools and use of HLLs, I
> suspect it is more on the leaving lots of performance on the table side
> of things.

Certainly ... it may not be hardware optimized to the picosecond. Some will be, but that is a different problem. Shall we discuss every project you have done in 12 years as though it was the SAME problem with identical requirements? I think not. So why do you for me?

> In my own experience, the advantage offered by FPGAs is
> rapidly eroded when you don't take advantage of the available
> performance.

The performance gains are measured against single threaded CPU's with serial memory systems. The performance gains are high degrees of parallelism with the FPGA. Giving up a little of the best case performance is NOT a problem. AND if it was, for a large dedicated application, then by all means use traditional PAR and fit the best case clock to the code body.

> If you are leaving enough margin in the design so that it is
> tolerant to fortuitous routing changes to work around unique defects,
> then I sincerely doubt you are going to run into the runaway thermal
> problems you were concerned with.

This is a completely different problem set than that particular question was addressing. That problem case was about hand packed serial-parallel MACs doing Red-Black ordered simulations with kernel sizes between 80-200 LUT's, tiled in tight, running at best case clock rate. 97% active logic. VERY high transition rates. About the only thing worse would be purposefully toggling everything.

A COMPLETELY DIFFERENT PROBLEM is compiling arbitrary C code and executing it with a compile, link, and go strategy. An example is a student iteratively testing a piece of code in an edit, compile and run sequence. In that case, getting the netlist bound to a reasonable set of LUTs quickly and running the test is much more important than extracting the last bit of performance from it. Like it or not .... that is what we mean by using the FPGA to EXECUTE netlists. We are not designing highly optimized hardware. The FPGA is simply a CPU -- a very parallel CPU.

> Show me that my intuition is wrong.

First you have taken and merged several different concepts, as though they were somehow the same problem .... from various posting topics over the last several months. Surely we can distort anything you might want to present by taking your posts out of context and arguing them in the worst possible combination against you. Let's try - ONE topic, one discussion.

Seems that you have made up your mind. As you have been openly insulting and mocking ... have a good day. When you are really interested, maybe we can have a respectful discussion. You are pretty clueless today.

Article: 99164
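The "clock binning" scheme described above - several fixed board clocks, with the runtime linker assigning the fastest one the just-linked block meets timing at - reduces to a one-line table lookup. A minimal sketch; the clock table and function name are invented for illustration:

```python
# Hypothetical board clock table, fastest first, arranged as integer
# powers of two of a base clock (values invented for illustration).
BOARD_CLOCKS_MHZ = [200, 100, 50, 25]

def bin_clock(fmax_mhz):
    """Return the fastest fixed board clock at which a code block with
    post-PAR Fmax of fmax_mhz still meets timing; None if even the
    slowest bin is too fast for it."""
    for clk in BOARD_CLOCKS_MHZ:
        if clk <= fmax_mhz:
            return clk
    return None
```

A block that closes timing at 137 MHz would be bound to the 100 MHz bin: the clock is fitted to the code, rather than re-running PAR until the code fits a fixed clock.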
Jim Granville wrote:
> Ray Andraka wrote:
>> fpga_toys@yahoo.com wrote:
>>> The parts all have non-volatile storage for configuration.
>
> I think John meant storing the info in the ConfigFlashMemory.
> Thus the read-erase-replace steps.
> ... but you STILL have to get this info into the FIRST design somehow....

Thanks Jim ... that is EXACTLY what I did say. It doesn't matter if the configuration storage is on an 18V04, platform flash card, or a disk drive.

Article: 99165
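The read-erase-merge-replace flow being argued over could be modeled roughly as below. Everything here is hypothetical - the class and function names are invented, and the "place and route" is a toy that merely skips resources on the defect list - but it shows the ordering John describes: read the defect list out of the config storage first, link the new design against it, then write both back:

```python
class ConfigFlash:
    """Toy stand-in for the config PROM / platform flash: holds a
    bitstream image plus the per-device defect list stored alongside it
    (all names hypothetical)."""
    def __init__(self, defects):
        self.defects = set(defects)   # routing-resource ids known bad
        self.image = None

def place_and_route(netlist, avoid):
    """Toy 'PAR': give each net the lowest-numbered route not in avoid.
    A real router would re-time the result; see Ray's timing objection."""
    routes, used = {}, set(avoid)
    for net in netlist:
        r = 0
        while r in used:
            r += 1
        used.add(r)
        routes[net] = r
    return routes

def install(flash, netlist):
    """Read defect list out BEFORE erasing, link around it, then
    re-write image and defect list together."""
    defects = set(flash.defects)
    image = place_and_route(netlist, avoid=defects)
    flash.image, flash.defects = image, defects
    return image
```

Jim's caveat still stands: something has to seed the defect list into the first image, whether that is vendor test data or in-system self-test.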
Hi,

What simulation tool will enable me to read internal signals/registers?

---
leaf

Article: 99166
John, last time I checked, FPGAs did not get delivered from Xilinx with the config prom. Sure, you can store a defect map on the config prom, or on your disk drive, or battery backed sram or whatever, but the point is that the defect map has to get into your system somehow. Earlier in this thread you were asking/begging Xilinx to provide the defect map, even if just to one of 16 quadrants for each non-zero-defect part delivered. That leads to the administration nightmare I was talking about.

In the absence of a defect map provided by Xilinx (which you were lobbying hard for a few days ago), the only other option is for the end user to run a large set of test configurations on each device while in system to map the defects. Writing that set of test configurations requires a knowledge of the device at a detail that is not available publicly, or getting ahold of the Xilinx test configurations and expanding on them to obtain fault isolation. I'm not sure you realize the number of routing permutations that need to be run just to get fault coverage of all the routing, switchboxes, LUTs, etc in the device, much less achieve fault isolation. Your posts regarding that seem to support this observation.

> With RC there is an operating system, complete with disk based
> filesystem. The intent is to do fast (VERY FAST) place and route on the
> fly.

Now see, that is the fly in the ointment. The piece that is missing is the "very fast place and route". There is and has been a lot of research into improving place and route, but the fact of the matter is that making the FPGA compete favorably against a microprocessor is going to require a time to completion that is orders of magnitude faster than what we have now, without giving up much in the way of performance.
Sure, I can slow a clock down (by bin steps or using a programmable clock) to match the clock to the timing analysis for the current design, but that doesn't help you much for many real-world problems where you have a set time to complete the task. (Yes, I know that many RC apps are not explicitly time constrained, but they do have to finish enough ahead of other approaches to make them economically justifiable.)

Remember also that the RC FPGA starts out with a sizable handicap against a microprocessor: the time to load a configuration, plus, if the configuration is generated on the fly, the time to perform place and route. Once that hurdle is crossed, you still need enough of a performance boost over the microprocessor to amortize that set-up cost over the processing interval to come out ahead. Obviously, you gain from the parallelism in the FPGA, but if you don't also mind the performance angle, it is quite easy to wind up with designs that can only be clocked at a few tens of MHz, and often that use up so much area that you don't have room for enough parallelism to make up for the much lower clock rate.

So that puts the dynamically configured RC in a box, where problems that aren't repetitive and complex enough to overcome the PAR and configuration times are better done on a microprocessor, and problems that take long enough to make the PAR time insignificant may be better served by a more optimized design than what has been discussed - and we're talking not only about PAR results, but also architecturally optimizing the design to get the highest clock rates and density. In my experience, FPGAs can do roughly 100x the performance of similar generation microprocessors, give or take an order of magnitude depending on the exact application and provided the FPGA design is done well. It is very easy to lose the advantage by sub-optimal design.
If I had a dollar for every time I've gotten remarks that 100x performance is not possible, or that so-and-so did an FPGA design expecting only 10x and it turned out slower than a microprocessor because it wouldn't meet timing, etc., I'd be retired. I guess I owe you an apology for merging your separate projects. I was under the impression (and glancing back over your posts I can still interpret them this way) that these different topics were all addressing facets of the same RC project. I assumed (apparently erroneously) that this was all towards the same RC system. I also apologize for the insults, as I didn't mean to insult you or mock you; rather, I was trying to point out that, taking all your posts together, I thought you were trying to hit all the corners of the design space at once, and at the same time do it on the cheap with defect ridden parts. I am still not convinced you aren't trying to hit everything at once... you know that old good, fast, cheap, pick any two thing. Rereading my post, I see that I let my tone get out of hand, and for that I ask your forgiveness. In any event, truly dynamic RC remains a tough nut to crack because of the PAR and configuration time issues. By adding the desire to use defect ridden parts, you are only making an already tough job much harder. I respectfully suggest you try first to get the system together using perfect FPGAs, as I believe you will find you already have an enormous task in front of you between the HLL to gates, the need for fast PAR, partitioning the problem over multiple FPGAs and between FPGAs and software, making a usable user interface and libraries, etc., without exponentially compounding the problem by throwing defect tolerance into the mix. Baby steps are necessary to get through something as complex as this.Article: 99167
Ray Andraka wrote: <snip> > In my experience, FPGAs can > do roughly 100x the performance of similar generation microprocessors, > give or take an order of magnitude depending on the exact application > and provided the FPGA design is done well. It is very easy to lose the > advantage by sub-optimal design. If I had a dollar for every time I've > gotten remarks that 100x performance is not possible, or that so and so > did an FPGA design expecting only 10x and it turned out slower than a > microprocessor because it wouldn't meet timing etc, I'd be retired. How does an FPGA compare with something like the Cell processor? I'd have thought that for reconfig computing, something like an array of CELLs, with FPGA bridge fabric, would be a more productive target for RC. FPGAs are great at distributed fabric, but not that good at memory bandwidth, especially at bandwidth/$. DSP tasks can target FPGAs OK, because the datasets are relatively small. Wasn't it Seymour Cray who found that IO and memory bandwidths were the key, not the raw CPU grunt? -jgArticle: 99168
Allan - why not use a down-counter and load it with 2e14, rather than doing a pipelined 2e14 decode?Article: 99169
In article <1142889836.685207.307560@g10g2000cwb.googlegroups.com>, <burn.sir@gmail.com> wrote: >Göran Bilski wrote: >> My bible on CPU design is "Computer Architecture, A Quantitative Approach". >> >> I never stop reading it. >> >> Göran Bilski > > >Hello Göran and Ziggy, and thanks for your replies. > >Göran: I have the book right here on my desk and it is great. However, >I was looking for something more hands on. You know, more code & >algorithms and less statistics :) Something like a grad level textbook. I just bought "CPU DESIGN - Answers to Frequently Asked Questions" by Chandra M.R. Thimmannagari (ISBN 0-387-23799-2). It's kind of a dense quick reference guide to every interesting topic in modern CPU design. It covers everything from architecture to Verilog to circuit design to improving timing to test-benches to physical layout. I like the low level details. For example, "9. Describe with an example a Picker Logic associated with an Issue Queue in an Out-Of-Order Processor?" Between this and the previous thread about multi-write port memory, I'm ready to write my superscalar PIC :-) -- /* jhallen@world.std.com (192.74.137.5) */ /* Joseph H. Allen */ int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0) +r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2 ]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}Article: 99170
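For readers curious about the picker-logic question quoted above, here is an illustrative sketch (not the book's answer, and not any real processor's implementation): a picker selects, from an issue queue, the oldest instruction whose operands are ready. Real hardware does this with single-cycle priority logic; the loop below only models the selection function.

```python
# Hypothetical model of an issue-queue "picker": choose the oldest ready
# entry. Each queue slot holds (age, ready); lower age = older instruction.

def pick_oldest_ready(queue):
    """Return the slot index of the oldest ready instruction, or None
    if nothing in the queue is ready to issue this cycle."""
    best = None
    for slot, (age, ready) in enumerate(queue):
        if ready and (best is None or age < queue[best][0]):
            best = slot
    return best

# Slot 1 holds the oldest instruction but its operands aren't ready,
# so the picker chooses slot 2 (age 1) over slot 0 (age 3).
q = [(3, True), (0, False), (1, True)]
print(pick_oldest_ready(q))  # -> 2
```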
On 20 Mar 2006 21:35:38 -0800, "PeterC" <peter@geckoaudio.com> wrote: > >Allan - why not use a down-counter and load it with 2e14, rather than >doing a pipelined 2e14 decode? It's not a counter, it's an accumulator. With a down counting accumulator, one would need to decode the underflow and then load with 2e14 minus the current frequency input value. I think this has the same complexity as the up counting accumulator. Besides, the OP stated that exact 1uHz resolution was not required, so it's cheaper and simpler to avoid decoding altogether and use the full 48 bit binary range. I agree with your sentiments though - it is sometimes easier to implement a down counter than an up counter. Regards, AllanArticle: 99171
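Allan's point can be sketched in a few lines: rolling over at an exact non-power-of-two modulus like 2e14 needs a wide magnitude compare (the "decode"), while letting the accumulator run over the full 48-bit binary range makes the adder's carry-out the rollover flag for free. The model below is illustrative only; the moduli match the thread's numbers but the increment is made up.

```python
# Minimal model of the two accumulator styles discussed above.

MOD_EXACT = 2 * 10**14   # exact modulus: needs a wide magnitude compare
MOD_BIN   = 1 << 48      # full 48-bit binary range: carry-out is the tick

def tick_exact(acc, inc):
    """Add, then decode rollover against the non-power-of-two modulus."""
    acc += inc
    over = acc >= MOD_EXACT          # this compare is the costly decode
    return (acc - MOD_EXACT if over else acc), over

def tick_binary(acc, inc):
    """Add modulo 2^48; the rollover falls out of the addition itself."""
    acc += inc
    return acc & (MOD_BIN - 1), acc >= MOD_BIN

# One step just past each rollover point behaves identically,
# but only the binary version gets its overflow flag for free.
acc, over = tick_exact(MOD_EXACT - 1, 2)
assert over and acc == 1
acc, over = tick_binary(MOD_BIN - 1, 2)
assert over and acc == 1
```

The price of the binary version is that the output period is a power-of-two fraction of the clock rather than an exact decimal one, which is why Allan notes it only works because exact 1 uHz resolution was not required.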
fpga_toys@yahoo.com wrote: > > Emily Postnews ... this is sarcasm: > > Q: Another poster can't spell worth a damn. What should I post? > > A: Post a followup pointing out all the original author's spelling and > grammar mistakes. You were almost certainly the only one to notice > them, genius that you are, so not only will others be intrigued at your > spelling flame, but they'll get to read such fine entertainments rather > than any actual addressing of the facts or issues in the message. Mr. John Bass, As this is a thinly veiled reference to my last post, I am going to treat it as directed at me, personally. My response is almost certainly ill-advised, as I fear you are going to demonstrate to me and a lot of other witnesses very shortly. Einstein's definition of insanity applies to me directly, for as I write, I do so with the expectation of a different result than all of those that have come before me by acknowledging, and not agreeing with, you. Your post was actually almost funny. You got the format of the question and answer right, so either you know the reference, or you allowed Google to be your friend and got bored before actually learning anything. Unfortunately, it once again demonstrates that you don't quite understand. You were able to focus in on a detail and understand it well enough to make the joke, but you have in the post, like many of your other posts, completely missed the bigger picture. The bigger picture is this: you are a guest in this community. If you actually understood the underlying concept about which you made your joke, you would have been behaving much better leading up to this. As a guest in any community, proper etiquette is that which the community defines, not that which the guest does. Mark Twain's "Innocents Abroad" is my reading suggestion for you this week. It is the first account of what many today have come to call the "ugly American". 
(as I am American, I am entitled to go here, flame away) The amusement comes from the characters going to far away lands and expecting it to be like where they came from and complaining about it not being so. Like many fellow Americans that I have experienced abroad, your behavior in this group reminds me very much of this book. You come here, as a guest in this foreign land, and you expect our community to bow to your demands. We as a community ask that people make an effort to write in proper English, making a best effort to use proper spelling and grammar, be considerate of others, keep commercial postings to a minimum, etc. We have invested (many far more than I) in keeping this the community that we have wanted it to become. This community existed for ten years before you got here. You proudly profess that you like to play the role of devil's advocate in your posts. But you seem to lack sufficient understanding of the underlying concepts to do this effectively, i.e. your argument that insisting on formal spelling and grammar (as best as any poster is able) is exclusionary to those who don't have English as a first language. Slang and idiomatic language is what comes last before fluency is achieved. If anything, insisting on the minimization of slang and abbreviations greatly helps those who might have to use dictionaries and other translation aids. Try putting your "techno speak", as you like to call it, through Babelfish, and see what kind of crap comes out the other side. Many of us in this forum know each other personally. My company, Ray's, and a number of others' that participate here are all in the Alliance partnership with Xilinx. The cumulative experience here is huge. We will defend our turf from those that do not show it the respect we ask for. Ray's posts are always informative and well thought through. Last time I spoke with Ray on the telephone I was teasing him about having free time on his hands, that he must have been between projects. 
He asked how I knew. I replied, simple, whenever you have the time for a handful of posts in an afternoon, I know it is a safe time to call and say hello. The funny thing about this is when I see that you have the time to post almost 300 times in this group in 60 days, I have to wonder if you have any idea just how much time it is going to take you to complete your groundbreaking work in reconfigurable computing. If you did, you would be spending this time actually working on this, or whatever you actually get paid to do, rather than trying to convince us just how smart and benevolent you are. Not only do you lack the bigger picture understanding with much of the subject material that you chose to write on, you also seem to lack an understanding of who you are actually communicating with. So when you, as you did this afternoon, rip someone like Ray a new one, it sends an even more significant message: you not only don't have enough respect for this forum, you don't know enough about the subject matter to know who some of the strongest minds in the field are. I have read as you have been given sensible information by many in this group that reflects years of practical experience with FPGAs, silicon fabrication and test, partial reconfiguration and place and route, and stood back in utter disbelief as you couldn't be bothered to digest this information and allow it to sculpt your view in the least. You simply defend your original view, or one that is simply contrary to the masses, and keep arguing. Mr. John Bass, fpga_toys, you are what is wrong with our community. Welcome to my witch trial. This forum existed for ten years before you got here, and it will exist for many more after you get bored and leave to torment another community. 
Actually, as you seem to be a rather intelligent individual that might be able to become a member of this community, I respectfully ask that you drop the devil's advocate role and leave it for those that have spent more time in our community and more than a few days in our industry. You might actually discover that what you want to do with reconfigurable computing is possible in ways that you have not imagined. You have been so busy going off on the ways that this is not possible given the current framework of the tools, the licensing, the evil vendors that focus on embedded markets, blah, blah, blah... you have missed the bigger picture that what you want to do is a whole lot more similar to the current use of the devices than you realize. Xilinx wants to sell chips. Xilinx has all kinds of programs that help outside companies that are consumers of Xilinx chips or offer companion products or services that help sell its chips. Xilinx refers to the three hundred or so companies in the Alliance partnership as the "xilinx eco-system". Xilinx has a venture fund that was created to fund this eco-system, and it is huge. At least a half dozen presidents of companies in this eco-system, and at least as many influential people at Xilinx, frequent this group. Just a suggestion: it will probably require less time as the devil's advocate, and less time evangelizing, to actually get any positive attention from those that could help you if they wanted to. This post is the extent of the help that I am likely to offer you, as you have royally pissed me off. Maybe some of the others will eventually be more charitable. Regards, Erik. --- Erik Widding President Birger Engineering, Inc. (mail) 100 Boylston St #1070; Boston, MA 02116 (voice) 617.695.9233 x207 (fax) 617.695.9234 (web) http://www.birger.com .Article: 99172
Ray Andraka wrote: > John, last time I checked, FPGAs did not get delivered from Xilinx with > the config prom. Sure, you can store a defect map on the config prom, > or on your disk drive, or battery backed sram or whatever, but the point > is that defect map has to get into your system somehow. Earlier in this > thread you were asking/begging Xilinx to provide the defect map, even if > just to one of 16 quadrants for each non-zero-defect part delivered. > That leads to the administration nightmare I was talking about. Since NOTHING exists today, I've offered several IDEAS, including the board mfg taking responsibility for the testing and including it to the end user .... as well as being able to do the testing at the end user using a variety of options including triple redundancy and scrubbing. Multiple ideas have been presented to provide options and room for discussion. Maybe you missed that. Not discussed was a proposal that the FPGA vendor could provide maybe subquadrant level defect bin sorting .... which could be transmitted via markings on the package, or by order selection, or even by using 4 balls on the package to specify the subquadrant. For someone interested in finding solutions, there is generally the intellectual capacity to connect the dots and finish a proposal with alternate ideas. For someone being obstructionist, there is no end to the objections that can be raised. > I'm not sure you realize > the number of routing permutations that need to be run just to get fault > coverage of all the routing, switchboxes, LUTs, etc in the device, and > much less achieve fault isolation. Your posts regarding that seem to > support this observation. I'm not sure that you understand: where there is a will, it certainly can and will be done. After all, when it comes to routers for FPGA's there are many independent implementations .... it's not a Christ delivered on the mount technology for software guys to do these things. 
> > With RC there is an operating system, complete with disk based > > filesystem. The intent is to do fast (VERY FAST) place and route on the > > fly. > > > > but the fact of the matter is > that in order to get performance that will make the FPGA compete > favorably against a microprocessor is going to require a fast time to > completion that is orders of magnitude faster than what we have now > without giving up much in the way of performance. Ray, the problem is that you clearly have lost sight that sometimes the expensive and critical resource to optimize for is people. Sometimes it's the machine. > I know that may RC apps are not explicitly time > constrained, but they do have to finish enough ahead of other approaches > to make them economically justifiable). Ray .... stop lecturing ... I understand, and you are worried about YOUR problems here, and clearly lack the mind reading ability to understand everything from where I am coming or going. There are a set of problems, very similar to DSP filters, which are VERY parallel and scale very nicely in FPGA's. For those problems, FPGA's are a couple orders of magnitude faster. Others, that are truly sequential with limited parallelism, are much better done on a traditional ISA. It's useful to mate an FPGA system with a complementary traditional CPU. This is true in each of the prototypes I built in the first couple years of my research. More recently I've also looked at FPGA centric designs for a different class of problems. > Remember also, that the RC FPGA > starts out with a sizable handicap against a microprocessor with the > time to load a configuration, plus if the configuration is generated on > the fly the time to perform place and route. Once that hurdle is > crossed, you still need enough of a performance boost over the > microprocessor to amortize that set-up cost over the processing interval > to come out ahead. 
> Obviously, you gain from the parallelism in the > FPGA, but if you don't also mind the performance angle, it is quite easy > to wind up with designs that can only be clocked at a few tens of MHz, > and often that use up so much area that you don't have room for enough > parallelism to make up for the much lower clock rate. So? What's the point .... most of these applications run for hours, even days. I would like a future generation FPGA that has parallel, memory-like access to the configuration space with high bandwidth ... that is not today, and I've said so. You are lecturing again, totally clueless about the issues I've considered over the last 5 years, the architectures I've explored, the applications I find interesting, or even what I have long term intent for. There are a lot of things I will not discuss without a purchase order and under NDA. > So that puts the > dynamically configured RC in a box, where problems that aren't > repetitive and complex enough to overcome the PAR and configuration > times are better done on a microprocessor, and problems that take long > enough to make the PAR time insignificant may be better served by a more > optimized design than what has been discussed, and we're talking not > only about PAR results, but also architecturally optimizing the design > to get the highest clock rates and density. So, what's your point? Don't think I've gone down that path? .... there is a big reason I want ADB and the related interfaces that were done for JHDLBits and several other university projects. Your obsession with "highest clock rates" leaves you totally blind to other tradeoffs. 
> If I had a dollar for every time I've > gotten remarks that 100x performance is not possible, or that so and so > did an FPGA design expecting only 10x and it turned out slower than a > microprocessor because it wouldn't meet timing etc, I'd be retired. With hand layout, I've done certain very small test kernels which, replicated to fill a dozen 2V6000's, pull three orders of magnitude over the reference SMP cluster for some important applications I wish to target ... you don't get to a design that can reach petaflops by being conservative, which is my goal. I've used live tests on the Dini boards to confirm the basic processing rate and data transfers between packages, for a number of benchmarks and test kernels, and they seem to scale at this point. I've also done similar numbers with a 2V6000 array. Later this year my goal is to get a few hundred LX200's, and see if the scaling predictions are where I expect. So, I agree, or I wouldn't be doing this. > Rereading my post, I > see that I let my tone get out of hand, and for that I ask your forgiveness. Accepted. And I do have nearly six different competitive market requirements to either fill concurrently, or with overlapping solutions. It is six projects at a time at this point, and will later settle into several clearly defined roles/solutions. 
I've spent 35 years knocking off man-year-plus software projects by myself in 3-4 months, and 5-8 man-year projects with a small team of 5-7 in similar time frames with a VERY strong KISS discipline. I see defect parts as a gold mine that brings volumes up, and prices down, to make RC systems very competitive for general work, as well as highly optimized work where they will shine big time. I'm used to designing for defect management ... in disks, in memories, and do not see this as ANY concern. > I respectfully suggest you try first to get the system together > using perfect FPGAs, I've built several, and have several more ready to fab. > as I believe you will find you already have an > enormous task in front of you between the HLL to gates, FpgaC I've been using just over 2-1/2 years, even with its current faults which impact density by between 2-20%. Enough to know where it needs to go, and have that road map in place. There is a slowly growing user base and developer group for that project. The project will mature during 2006, and in some ways I've yet to talk about. > the need for fast PAR, This is a deal breaker, and why I've put my head up after a couple years and started pushing when JHDLBits with ADB was not released. There is similar code in several other sources that will take more work. I've a good handle on that. > partitioning the problem over multiple FPGAs and between FPGAs > and software, making a usable user interface and libraries etc, without > exponentially compounding the problem by throwing defect tolerance into > the mix. Baby steps are necessary to get through something as complex > as this. I've done systems level design for 35 years ... operating systems, drivers, diagnostics, hardware design, and large applications. I do everything with baby steps and KISS, but by tackling the tough problems as early in a design as possible for risk management. Again ... 
defect management may be scary to you because of how it impacts YOUR projects; in this project it is NOT a problem. Reserving defect resources is very similar to having the same resource already allocated. OK?Article: 99173
Jim Granville wrote: > FPGAs are great at distributed fabric, but not that good at memory > bandwidth, especially at bandwidth/$. A traditional multiprocessor shares one or more moderately wide DRAM systems, which are inherently sequential from a performance perspective, even when shared/interleaved. Caches for some applications create N memory systems, but can also become an even worse bottleneck. The basic building block with FPGAs is lots of 16x1 memories with an FF ... with FULL PARALLELISM. The trick is to avoid serialization with FSM's, and bulk memories (such as BRAM and external memories) which are serial. DSP and numerical applications are very similar data flow problems. Ditto for certain classes of streaming problems, including wire speed network servers.Article: 99174
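The bandwidth claim above is just multiplication, and a sketch makes the magnitude concrete. The device size, clock rates, and DRAM width below are illustrative assumptions, not figures for any particular FPGA or memory part.

```python
# Back-of-envelope arithmetic for the distributed-memory argument:
# thousands of independent 16x1 LUT RAMs, each accessed every cycle,
# versus one shared DRAM port that serializes every access.
# All numbers are assumptions chosen for illustration.

lut_rams   = 10_000          # assumed 16x1 distributed memories in use
fabric_clk = 100e6           # assumed 100 MHz fabric clock
fabric_bw  = lut_rams * 1 * fabric_clk   # bits/s, all ports in parallel

dram_width = 64              # one shared 64-bit DRAM interface (assumed)
dram_clk   = 200e6
dram_bw    = dram_width * dram_clk       # bits/s, strictly sequential

print(f"aggregate fabric advantage: {fabric_bw / dram_bw:.0f}x")
```

The ratio only holds for problems that can actually be spread across all those tiny memories, which is why the post singles out data-flow-style DSP and streaming workloads.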
I'm currently using the Xilinx ML310 development board. Does anybody know how to generate an ACE file for designs with dual PPC405 cores? Thank you.