If your synthesis tool (e.g. Amplify) supports it, you can precisely control the pipelining levels and style of the multiplier inferred by the * operator.

Regards,
--yka

Article: 61351
Hi,

I am using EDK 3.2 to build a MicroBlaze design with an Ethernet MAC attached. When synthesizing my design, it gives me a warning: "Emac Licensing in effect". My question is: how long (in hours) can I evaluate the core before the FPGA needs to be reloaded?

Best regards,
Don

Article: 61352
Looks like you have an equivalent serial rate of 320 Mbps (32 bits every 100 ns), plus some sort of framing overhead so that you can find bit/byte boundaries. You can certainly do this with an electrical interface over 10 m and save a lot of money over optical.

What time frame are you looking at to implement the solution? The reason I ask is that a small company called Core Foundry is finalizing a new FPGA development system called PROTEUS. Their intent is to create a modular FPGA development platform that can be customized to an application simply by changing removable plug-in modules.

The first incarnation is a small aluminum box, about 7"w x 9"d x 1.5"h, that has room for 4 I/O modules out the front. I've seen modules for dual T1/E1 (so up to 8 ports can fit in a box), a T3/E3/STS-1 single-port electrical interface, and a multirate SFP module with CDR that can handle 30 Mbps through 3.2 Gbps. This might be the module you care about, as it will accept any SFP plug-in, including the copper GbE SFP modules from Molex and others.

The I/O modules plug into a main board that has a simple serial interface on it to the outside world. Behind the I/O modules are connectors to accept FPGA boards. A single-wide board connects one I/O module to one FPGA. A double-wide board connects one FPGA to two I/O modules, and a quad-wide board connects one FPGA to 4 I/O modules. Behind that are connectors for a more powerful microcontroller board that is optional. This will have the 10/100, USB and more serial ports on it. The first uC board available will be based on soft FPGA cores as opposed to hard uC chips.

The initial FPGA boards will be Cyclones by Altera, as they are readily available. All part sizes will be available. Spartan 3 boards will most likely follow next year, whenever they are production qualified and readily available. Beyond that they've talked about plans for Stratix and Virtex FPGA boards, and perhaps some non-FPGA boards with DSP chips, or maybe combinations of DSP and FPGA.
For your situation, a single SFP I/O module and a single Cyclone 1C3 FPGA module should suffice. You probably wouldn't need the uC board at all. The FPGA board has ribbon cable connectors on top so that you could cable into it (with the box lid removed). Then serialize and format the data and shoot it out the SFP optical port (or maybe an SFP GbE port, but you'd have to do more work on the data up front to make the transceivers happy). Mapping your data into an OC-12 payload would be trivial, on the other hand.

The PROTEUS system is in hardware testing now, and I don't know when it will become available, or at what price points. They don't have any info posted on their website yet either, although they say that info will be posted in October. I'm looking to be an early customer myself.

I'm not a wireless expert by any stretch, but Infineon makes a nice-looking BT module, the ROK104001, that looks like it would make BT simple. I think in small quantities they are $20 or so. I heard about it from a local Insight sales rep.

"Patrick Twomey" <patrickt@rennes.ucc.ie> wrote in message news:1d183274.0309300805.472f07a1@posting.google.com...
> Thanks for replying to my post. To answer your first question, I want to
> mate an optical or wireless communication interface to the Celoxica
> RC100 boards.
> The setup so far is as follows:
>
> Camera -> Celoxica Board -> Ribbon Cable -> Celoxica Board -> Monitor
>
> The ribbon cable is connected to the Celoxica boards using the expansion
> header on the Celoxica boards. This expansion header allows digital
> communication in and out of the FPGA on the Celoxica board. The data on
> the ribbon cable has a bus width of 32 (i.e. is 32 bits wide) and the
> data changes every 100 ns (10 MHz). All I want to do is remove the
> ribbon cable and replace it with an optical or wireless communication
> system (preferably a wireless system).
> So the system would be:
>
> Camera -> Celoxica Board -> Wireless transmitter -> Receiver -> Celoxica
> board -> Monitor
>
> One board and the transmitter would be at one end of a room, the other
> board and receiver at the other end of the room. The transmission range is
> to be small, e.g. a maximum of 5-8 meters. The data rate is fairly high, so
> I am not sure if a wireless system would be up to the task. Hope this has
> clarified my situation.
>
> "Patrick MacGregor" <patrickmacgregor@comcast.net> wrote in message news:<Vc6dncJWbLocWOWiXTWJhg@comcast.com>...
> > Can you explain a bit more? Are you planning on looking to replace the
> > Celoxica boards with something else, or do you want to mate the Celoxica
> > boards to some optical or wireless transmission system? If so, how would
> > you want to transfer data to/from the optical or wireless interface boards?
> >
> > "Patrick Twomey" <patrickt@rennes.ucc.ie> wrote in message
> > news:1d183274.0309290336.5aa14a7c@posting.google.com...
> > > I am trying to connect two FPGA development boards together. The boards in
> > > question are two Celoxica RC100 development boards. Video in is from an
> > > analog camera. The video data is converted to digital and stored in
> > > SRAM. There is an expansion header for inter-connectivity. On the other
> > > board, video out to a monitor occurs after reading data from the SRAM on
> > > this board. I have connected the two boards via a ribbon cable connected
> > > to the expansion headers. I want to replace this cable with wireless or
> > > optical transmission. Are there any development boards available for
> > > this? The pixel clock is 10 MHz and there are at least 16 bits per pixel
> > > (32 after error correction encoding). Access to an 80 MHz on-board clock
> > > is available. Any help would be much appreciated.

Article: 61353
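The rate arithmetic in this thread can be checked with a few lines of Python. This is only a sketch: the bus width, word rate and SFP module range are the figures quoted in the posts above, while the 10/8 line-coding overhead is an assumption standing in for the unspecified "framing overhead" (it is what an 8b/10b-style code would cost, and nobody in the thread names a particular code).

```python
# Figures quoted in the thread.
BUS_WIDTH_BITS = 32            # ribbon cable bus width
WORD_RATE_HZ = 10_000_000      # data changes every 100 ns
SFP_MIN_BPS = 30_000_000       # multirate SFP module, lower limit
SFP_MAX_BPS = 3_200_000_000    # multirate SFP module, upper limit

# Assumption, not from the thread: 10 line bits are sent per 8 payload bits.
ASSUMED_LINE_BITS = 10
ASSUMED_PAYLOAD_BITS = 8


def payload_rate_bps(bus_width_bits, word_rate_hz):
    """Raw payload rate of a parallel bus once it is serialized."""
    return bus_width_bits * word_rate_hz


def line_rate_bps(payload_bps, line_bits, payload_bits):
    """Serial line rate after adding a fixed coding overhead."""
    return payload_bps * line_bits // payload_bits


payload = payload_rate_bps(BUS_WIDTH_BITS, WORD_RATE_HZ)
line = line_rate_bps(payload, ASSUMED_LINE_BITS, ASSUMED_PAYLOAD_BITS)
fits_in_sfp_module = SFP_MIN_BPS <= line <= SFP_MAX_BPS

print(f"payload rate: {payload / 1e6:.0f} Mbps")
print(f"line rate with assumed coding overhead: {line / 1e6:.0f} Mbps")
print(f"within the quoted SFP module range: {fits_in_sfp_module}")
```

Under that assumption the link stays well inside the quoted 30 Mbps to 3.2 Gbps range; a different framing scheme would change only the overhead ratio.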
Stephan Buchholz wrote:
> Laurent
>
> The Philips Semiconductor GTL2010 10-bit bi-directional low voltage
> translator might help
> Steve Buchholz
> "Amontec Team, Laurent Gauch" <laurent.gauch@amontecDELETEALLCAPS.com> wrote
> in message news:3f7ac650$1@news.vsnet.ch...
>
>>Hi all,
>>
>>Sorry to ask an analog question, but it's related to FPGAs too.
>>
>>Do you know a schematic to do 'Automatic I/O voltage sensing' as XILINX
>>does with the ParallelCable IV?
>>
>>I am designing a new JTAG interface (USB), and I want to be able to
>>drive correctly the target JTAG signals (3.3V, 2.5V, 1.8V, 1.2V).
>>
>>Are there any LVTTL level shifter devices to do this work?
>>
>>Thanks for your advice.
>>
>>Laurent Gauch
>>www.amontec.com
>>
>

Thanks. You confirm what I was thinking of using. For your info, the GTL2010 corresponds to the TVC family from Texas Instruments. I will try this!

Laurent
-> www.amontec.com

Article: 61354
> I still like Aldec for design entry. Editor is very much studio editor
> like, plus you can run sims right there as well as integrate in the rest
> of your tool flow. For the price, I think it is a great value.

Heh, I appreciate Aldec because it is not studio-like, as opposed to Xilinx's WebPack. It is also more feature-rich, and I like its level of integration of different tools (like jumping to the erroneous code line, and more).

Article: 61355
I am looking at using CUPL to implement some simple-ish functions in the Atmel ATF150x series parts. I have WinCUPL and its manual, but I feel like there's something missing in the documentation: I've found some use of a "property" directive in CUPL, which seems to be used to switch device/vendor-specific functions, things like (for Atmel) pin-keeper, JTAG on/off, etc.

So, two questions:

First, I don't find a "PROPERTY" directive in the WinCUPL manual. Am I missing it, or is there some more complete CUPL language reference available?

Second, it seems like the available "properties" and syntax for the "property" directive would need to be identified for each device. The functions are described in the spec sheets, but I find no information on how to activate them with CUPL.

It seems like I'm missing a layer of documentation. Any pointers would be appreciated.

TIA,
George

-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
-----== Over 100,000 Newsgroups - 19 Different Servers! =-----

Article: 61356
Never mind. Once again, as soon as I ask the question, I find the answer. It's in the fitter help and Atmel's FAQ.

Article: 61357
Have a look at http://www.ultraedit.com
---
jakab

Article: 61358
On Thu, 2 Oct 2003 09:07:16 -0400, "jakab tanko" <jtanko@ics-ltd.com> wrote:

>Have a look at http://www.ultraedit.com
>---
>jakab
>

I use UltraEdit, too. I also run emacs Verilog mode from inside UltraEdit, in batch mode, to automatically generate port lists, etc. And I store commonly used Verilog code snippets as UltraEdit templates.

Verilog (and VHDL) syntax coloring files are available on the UltraEdit web site. I also have a Xilinx UCF syntax coloring file; if anyone needs it, just send e-mail to bp <at> cambriandesign <dot> com.

Bob Perlman
Cambrian Design Works

Article: 61359
Brian,

Excellent list.

But I have one correction: the capacitance to ground is ~8 pF, thus the differential capacitance is 4 pF (two 8 pF in series). Unfortunately, to meet ESD, and have the IOB also do the other 35 standards, the capacitance is not as low as everyone would like. Simulations at the die, however, show a very nice waveform, even though it may look questionable at the pins of the device (due to the t-line effects).

Nothing beats an on-die 100 ohm termination.

LVDS_25_DCI was never intended to replace a simple 100 ohm external termination. That was reserved for the improved input terminator (a simple 100 ohms) that was added to Virtex 2 Pro. It was also an afterthought that was suggested to us by a customer when they messed up and forgot all the resistors. It is VERY ugly in the power department, and we did not realize that the power could be as high as ~85 mW per pair due to the way the DCI circuit operates. Also, freezing DCI does mean that you might be trying to measure the 25 ohm termination voltage with the reference resistors, so the current in them does increase, too.

If I may suggest, use LVDCI_25_DCI only for clock inputs, or a few signals. Always use DCI_Freeze to reduce the jitter. Also look at what happens when you do not have a 100 ohm termination. For some signals, and lengths of PCB, it may not be required.

And we will check out the IBIS model issue.

As for allowing the power estimator, spreadsheet, answers, etc. to all catch up with all of the "top ten" list: that is just tough to do, but you are right, we should do it (and will).

Spartan 3 addresses a different market than Virtex II, or II Pro, and was never intended to replace them. We reserve the right to differentiate product lines by having different features. I am sure everyone would like to have a Spartan 3 that could replace a Virtex II or II Pro, but that was a) not the market we were after, and b) not possible with the process/design/technology we chose.
The Spartan folks are busily planning and designing their next chip(s), and we in the Virtex camp are busy with our next product offering.

Thanks for your comments,

Austin

Brian Davis wrote:

> Top Ten Things I wish I never had needed to learn about LVDS_25_DCI:
>
> 1) Parallel DCI input standards in Virtex2 continuously modulate
>    the input termination offset voltage unless you enable bitgen's
>    FreezeDCI option
>
> 2) With FreezeDCI on, the entire bottom half of 2V40, 2V80, and
>    any CS144 packages are unavailable for LVDS_25_DCI inputs (this
>    includes half the global clock inputs to the chip) due to DCI
>    unavailability in banks having only ALT_VRP/N pins
>
> 3) With FreezeDCI on, dual purpose config pins cannot be used as
>    LVDS_25_DCI inputs
>
> 4) 5.2i S/W doesn't catch illegal pin assignments due to #2 and #3
>
> 5) With FreezeDCI on, input terminator accuracy for 2R values
>    degrades to +/-20%
>
> 6) With FreezeDCI on, each bank will have a (different) random
>    input offset voltage due to split terminator 2R variations
>
> 7) LVDS_25_DCI terminator overhead power per input pair far exceeds
>    the theoretical 62.5 mW number published in Answer Record 15633
>
> 8) With FreezeDCI on, worst case VCCO power overhead per
>    LVDS_25_DCI input pair approaches 100 mW
>
> 9) With FreezeDCI on, worst case DCI VRP/N VCCO power overhead
>    per I/O bank approaches 200 mW
>
> 10) 5.2i Xpower incorrectly assigns DCI power to the 1.5V VCCINT
>     supply, and it doesn't use the worst case DCI power numbers
>
> 11) V2 Power Estimator spreadsheet doesn't support LVDS_25_DCI,
>     but if you fake it by using two single ended DCI 2R split
>     terminated inputs per actual LVDS pair, it also uses the
>     wildly optimistic power numbers
>
> 12) LVDS_25_DCI IBIS models don't work in HyperLynx
>
> 13) Massive 8 pF IBIS C_COMP input capacitance value for the
>     V2 LVDS inputs requires external back termination and/or
>     input matching scheme to achieve reasonable signaling when
>     driving FPGA inputs
>     from a modern high speed LVDS driver
>
> Interesting Answer Database Search Keywords:
>
>   FreezeDCI
>   LVDS AND DCI AND termination
>   DCI AND power
>   IBIS AND Hyperlynx ( in answer archive )
>
> Suggestions to Xilinx:
>
> - Have somebody document the plethora of V2 DCI hardware
>   and software problems ('challenges'? 'features'?) in one
>   place ( a detailed application note? ) ASAP.
>
> - Hiding the FPGA IOB/CLB/FF/interconnect power consumption
>   numbers within an encrypted spreadsheet and buggy SW makes
>   it impossible to cross-check the resulting power calculations.
>
> - Please take a look at page 145 of the ORCA-4 datasheet
>   ("Package Parasitics"): there, in human readable form, is a
>   usable package model that can be simulated in any SPICE.
>
> - Also note that the ORCA-4 IBIS C_COMP value for the general
>   purpose LVDS inputs is a much more reasonable 2 pF.
>
> - Real differential LVDS input terminators are quite wonderful
>   (no VCCO power hit, no split terminator DC offset problems).
>
>   Making them available (LXXX_25_DT) only in the V2Pro, and
>   not in the Spartan3, is an exceptionally HUGE mistake.
>
>
> Brian

Article: 61360
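Austin's capacitance correction at the top of this post is the ordinary series-capacitor formula, and it can be checked numerically. The second half of the sketch below is a guess at where the "theoretical 62.5 mW" figure in item 7 could come from: it assumes VCCO = 2.5 V and a split termination of 100 ohms to VCCO plus 100 ohms to ground on each pin of the pair. Those resistor values are an assumption made for illustration, not something either poster states, and the estimate ignores signal current entirely.

```python
def series_capacitance(c1, c2):
    """Equivalent capacitance of two capacitors in series (same units in and out)."""
    return (c1 * c2) / (c1 + c2)


# From the post: ~8 pF from each input pin to ground.
C_PIN_PF = 8.0
c_diff_pf = series_capacitance(C_PIN_PF, C_PIN_PF)

# Assumed values for the static power estimate (not stated in the thread).
ASSUMED_VCCO_V = 2.5
ASSUMED_R_TO_VCCO_OHM = 100.0
ASSUMED_R_TO_GND_OHM = 100.0
PINS_PER_PAIR = 2

# Static power burned in one pin's resistor divider, then scaled to the pair.
p_pin_w = ASSUMED_VCCO_V ** 2 / (ASSUMED_R_TO_VCCO_OHM + ASSUMED_R_TO_GND_OHM)
p_pair_mw = PINS_PER_PAIR * p_pin_w * 1000.0

print(f"differential capacitance: {c_diff_pf} pF")
print(f"static divider power per pair (under the assumptions): {p_pair_mw} mW")
```

If the assumed resistor values are right, the static divider power alone accounts for the published number, which would be consistent with Brian's point that real circuits dissipate more.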
Tom Seim wrote:
>
> Xilinx's marketing is about as bad as it gets. Frankly, I'm surprised
> that they are the largest FPGA vendor. I have had bad experiences with
> them in the (far) past. In particular, when they changed vendors for
> the serial proms. They cut off the old vendor with the (wishful)
> thinking that the new one would take over. Well, the new one choked
> big time and us users were left holding the bag. At the time I was
> running my own company and desperately needed those parts. Good
> luck!!! I was F**KED!!! Peter took exception the last time I mentioned
> this. In private e-mail I reminded him that if Xilinx doesn't ship
> product he still collects his pay check - as a private business owner
> if I didn't ship product the revenue stopped.
>
> My latest run in with brand X shouldn't have happened. I thought I was
> doing them a favor by ordering a license renewal for $4K. Guess what?
> XILINX SCREWED UP!!! We have a year end deadline (Sep 25); did Xilinx
> care? NO!!!! Only by Herculean effort did I manage to get the order
> placed (after I started a week and a half before the deadline). I got
> an apology from them. But SO WHAT!!
>
> I think they have gotten full of themselves and don't really care.
> They know us suckers have to deal with them no matter what. Well,
> maybe we do. Doesn't make me feel any better.

There are a great many aspects of this line of work that put the small business owner at a great disadvantage. Allocation is one of the big ones. Right now everyone is trying to get my business even though it is not very large at this time. But as soon as the market starts growing again I am sure I will be back at the bottom of the "call" list.

I won't say your experiences are unique, but I don't think Xilinx is in the habit of ignoring their customers either. But I do agree that the growth of a company makes it much harder to do business with in an efficient way. In that regard, Xilinx is no exception.
A larger company has the option of redesigning a product with an alternate FPGA if a vendor switches to the "dark side". But the small company with lower volumes does not have that luxury until the problem becomes untenable. Even then, schedules may preclude such a change. In those cases, the small business is just SOL.

That is why all new boards use as few parts as possible that cannot be replaced. I much prefer to not use serial proms of any kind and like to keep the FPGAs as generic as possible. The Xilinx parts would have had an advantage on our new board, but they are not supporting modular configuration on the Spartan 3s and so they are the same to us as the Cyclone chips at this point.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 61361
Hi,

could somebody suggest a high-performance workstation I can use to compile my designs in the fastest possible way?

I use Altera Quartus II and an EP1C12Q240 device. It is 90% full, with most of the memory used. My Athlon XP 1800+ on an ASUS A7V266-C motherboard with 512M of RAM takes 8/23 minutes to compile/fit. I did try a dual Xeon 2200 on a TYAN MB and it was only 5% faster.

Did somebody try an Athlon 64-bit processor?

--
Dennis

Article: 61362
> >Could be a typing error, I already found one. In ECS 6.1i (the schematic
> >editor), if you put the USELOWSKEWLINES=TRUE attribute on a signal, the
> >resulting .vhf file contains a spelling error that keeps the design from
> >synthesizing (SIGNAL is spelled "SIGANL").
>
> Interesting. I wonder how that one got past testing. I'll bet it
> gets tested now/tomorrow.
>
> Many years ago, a friend described a neat trick for testing software.
> The idea was to make sure that each line of code was at
> least exercised in order to find the gross bugs. He started with
> a clean listing. Install breakpoints. When you get there, mark
> that line and the rest of the block, that is until the next
> branch, skip, return or such. Eventually, you have marked off
> all the easy stuff and now you have to generate test cases
> to tickle the hard/obscure ones.
>
> This is just the software version of making sure your test bench
> wiggles all the signals at least once.
>

If only software testing were that easy. Actually, there are tools (code coverage tools) that exist for this very purpose: measuring the percentage of blocks (lines) that were executed during a test. However, it's not that easy at all to reach 100% coverage. You might need fault injection and other techniques, and even with that some SW constructs cannot be covered 100% (think about testing a piece of SW for *all possible* out-of-memory conditions or for *all possible* floating-point exceptions). And in many cases the problem is not the branches that are there (and thus can be tested by code coverage) but the ones that are missing: unchecked special conditions, return values, etc.

Anyway, I'm sure you all know all this... This bug should have been found, though.

Regards,
Andras Tantos

Article: 61363
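The "mark each line as you reach it" trick and the code coverage tools Andras mentions can be illustrated with a toy line-coverage tracer. The sketch below uses Python's standard `sys.settrace` hook; the helper `run_with_line_coverage` and the sample function `classify` are made up for this illustration and are not taken from any tool discussed in the thread.

```python
import sys


def run_with_line_coverage(func, *args):
    """Call func(*args); return (result, executed line offsets inside func).

    Offsets are relative to the 'def' line, so offset 1 is the first body line.
    """
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        # Only trace the function under test; ignore every other frame.
        if frame.f_code is not code:
            return None
        if event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, executed


def classify(x):
    if x < 0:
        return "negative"
    if x == 0:
        return "zero"
    return "positive"


# One test case leaves body lines unmarked; three cases mark all five.
_, only_positive = run_with_line_coverage(classify, 5)
covered = set(only_positive)
for value in (-1, 0):
    _, lines = run_with_line_coverage(classify, value)
    covered |= lines

print("after one test :", sorted(only_positive))
print("after three    :", sorted(covered))
```

Even with every line marked, this says nothing about checks that were never written, which is the limitation described in the post.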
rickman <spamgoeshere4@yahoo.com> wrote in message news:<3F7B3F59.191901A4@yahoo.com>...
> My experience has been that it does not much matter how you code
> combinatorial logic like this. The tools run it through a grinder and
> produce an optimal version (in its own mind). When I want to optimize
> like this, I either use a "keep" attribute on the wire, or sometimes you
> can instantiate primitives. For logic I don't think primitives work
> since gates just get remapped.

I overuse the syn_keep attribute and I hate the idea of instantiating LUTs. My Karnaugh skills aren't exactly used regularly.

> But I still don't understand your code. Why does the outer loop range
> over 64 values.

I've had problems with bit ranges in the past where [i+4:i] draws a complaint. Perhaps this isn't an issue with for loops, but I've learned to avoid them in general logic. They do work fine in generate blocks, however. I stepped through every bit to make a comparison to the adjacent bit; 3 adjacent comparisons lumped into one variable (with an eventual syn_keep) would give me 4-input functions that should pack into LUTs. The complex end of the inside loop is so that the three "LUTs" per byte are 4-input, 4-input, and 3-input functions.

> I would code two nested loops where the outer loop
> ranges over the 8 outputs and the inner loop ranges over the 9 inputs
> for each output. Or just skip the inner loop and use two outputs from
> two sets of four inputs feeding a 3 input function and use keeps on the
> first two output arrays. Maybe that is what you are doing, but I can't
> figure out the code easily.
>
> I see you are incrementing the i variable by j and ranging j in the
> second loop by some complex control expression. Can't you just
> increment i by 8?
>
> for( i=0; i<64; i=i+8 ) begin
>   k = i % 8;
>   for( j=0; j<4; j=j+1 ) begin
>     runBitsA_[k] = runBitsA_[k] & bytePlus1[i+j];
>     runBitsB_[k] = runBitsB_[k] & bytePlus1[i+j+4];
>   end
>   runByte_[i] = runBitsA_[i] & runBitsB_[k] & bytePlus1[i+9];
> end
>
> Put the keep on runBitsA_ and runBitsB_ and you should get your two
> level structure.

This works very well for runs of ones only. I need to identify runs of ones or runs of zeros. The technique can be expanded to my needs, resulting in runBitsA, B, and C, where one of them needs to cover 2 comparisons, not 3 like the others.

...which really is the approach I was coding, but using consecutive bits in a vector rather than {A,B,C}, and using one statement rather than 3 to make the assignments, dealing with the 2-comparison exception by terminating the inside loop early.

Thanks for the help.

Article: 61364
"Vinh Pham" <a@a.a> wrote in message news:<XcSeb.39218$5z.21702@twister.socal.rr.com>... > Whoopsy, brain-fart. My previous code will create 3 levels of logic. If we > didn't have to detect both nine 1s or nine 0s, then it'd work okay. Thanks for noticing :-) I like the code below with respect to its symmetry - it's a lot easier to read than the stuff I generated. The four 3-input LUTs feed a single 4-input LUT with (only a) little arguement from the synthesizer, I'm sure. It can be done with fewer LUTs by using 4-input LUTs covering 3 compares each but then the symmetry gets lost and the coding gets unpleasant. I think I have an acceptable solution together that gives me good speed and good utilization which I'll post separately. Thanks for your thoughs with this. > Here's an idea for one that should generate 2 levels, but it looks uglier. > Definately not as compact as rickman's. > > > data[64:0] -- input signal > ninth_bit[7:0] -- intermediate signal > run_dibble[31:0] -- intermediate signal > run_byte[7:0] -- output signal > > > for byte 0...7 > > ninth_bit[byte] = data[(byte+1)*8] > > for dibble 0...3 > > lsb = byte*8 + dibble*2 > msb = byte*8 + dibble*2 + 1 > > if data[lsb] = ninth_bit[byte] AND data[msb] = ninth_bit[byte] then > run_dibble[byte*4 + dibble] = 1 > else > run_dibble[byte*4 + dibble] = 0 > end > > end loop > > lsb = byte*4 > msb = byte*4 + 3 > > if run_dibble[msb:lsb] = "1111" then > run_byte[byte] = 1 > else > run_byte[byte] = 0 > end > > end loopArticle: 61365
"Martin Euredjian" <0_0_0_0_@pacbell.net> wrote in message news:<FALeb.6677$fB4.1788@newssvr29.news.prodigy.com>... > John, > > 1- How many 65 bit words per second (ms, ns?) do you have to process? This run detection is one part of a 100MHz-200MHz mechanism. > 2- Where do the 65 bits come from? (internal, external) Internal, blindsided from BlockRAMs with a new value per cycle. > 3- Do they get into the FPGA in parallel or serially? Entirely parallel, into the BlockRAMs at full width. > 4- Why are you saying that you need two levels of logic? (trying to control > delay with combinatorial logic is not a great idea). If I go from BlockRAM to registers, I have the (relatively) long Tcko delay for the BlockRAM read and associated routing leaving little time to manipulate the data within the period. If I register the data from the BlockRAM, it's best to generate and use the run values in the next cycle requiring moe logic after I flag the runs, suggesting minimum delay is best. > 5- Why fight with inference? Instantiate what primitives you need. The logic primitives are what I've avoided. I don't want to use LUT4 primitives with INIT attributes since I might mess up the carnot map. This is why the inference has been broken down into bits that can be retained (with syn_keep or other method). > Two logic levels? > > Two LUT's to look at two consecutive nibbles. > One LUT to AND the output of the above with the next most significant bit > (the ninth bit). > That's it. Two levels. 24 LUT's. > Is that what you wanted? Almost. The LUTs can't look at full nibbles. Since I need to make sure all bits are equal to each other, there's a "smear." One attempt was to XOR adjacent bits, then to do the 8-wide AND of the result, letting the synth give me the "best" results. It didn't. Thinking about the XOR-to-AND progression, 4 bits are needed to implement 3 bits of the AND, so two 4-bit LUTs and one 3-bit LUT are needed, feeding a 3-input AND. That's it. Two levels. 32 LUTs. 
But the synth doesn't like my inferences. I think I have a solution that "works."

Article: 61366
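The 32-LUT count in this post follows from the way adjacent compares share a bit: a 4-input function can cover 3 compares, so 8 compares per byte need a 4-input, a 4-input and a 3-input function, plus one 3-input AND. The Python sketch below checks that decomposition exhaustively over all 512 nine-bit patterns. The particular bit groupings (0-3, 3-6, 6-8) are one plausible reading of the description and are an assumption, not code from the thread.

```python
def all_equal(bits):
    """True when every element of a non-empty sequence matches the first."""
    return all(b == bits[0] for b in bits)


def two_level_run(nine_bits):
    """Detect nine equal bits using three first-level functions and one AND."""
    a = all_equal(nine_bits[0:4])   # 4-input function, covers 3 compares
    b = all_equal(nine_bits[3:7])   # 4-input function, shares bit 3 with 'a'
    c = all_equal(nine_bits[6:9])   # 3-input function, shares bit 6 with 'b'
    return a and b and c            # second level: 3-input AND


FIRST_LEVEL_FUNCTIONS_PER_BYTE = 3
SECOND_LEVEL_FUNCTIONS_PER_BYTE = 1
BYTES_PER_WORD = 8
total_luts = BYTES_PER_WORD * (FIRST_LEVEL_FUNCTIONS_PER_BYTE +
                               SECOND_LEVEL_FUNCTIONS_PER_BYTE)

# Exhaustive check: only all-zeros and all-ones should be flagged.
mismatches = 0
for value in range(512):
    bits = [(value >> i) & 1 for i in range(9)]
    expected = value in (0, 511)
    if two_level_run(bits) != expected:
        mismatches += 1

print("LUT-sized functions per 65-bit word:", total_luts)
print("mismatches over all 9-bit patterns:", mismatches)
```

The shared boundary bits are what make the three partial results sufficient: equality within each group chains through bits 3 and 6 to cover all nine.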
For those who have the same problem, here is a reply from Xilinx tech support:

--------------------------------------------------------------------------------------
This is a recent issue that just came up. It looks like the "books" directory is not installed with the Webpack download. I've put one up on the ftp site. It goes in the %WebPACK%\doc\usenglish\ directory. Here is the link to the books.zip:

ftp://customer:xilinx@ftp.xilinx.com/download/books.zip
--------------------------------------------------------------------------------------

The help browser, though, seems to be a different issue...

/Mikhail

Article: 61367
For anyone interested in how I got things together, I ended up using one generate loop to instantiate 8 MUXF5s. Why MUXF5s?

1) One can make an 8-input AND with 2 LUTs and a little extra delay by having the first 4-bit AND feed the select and the sel==0 input - if the AND is false the result is false, if the AND is true, the result is the other AND.

2) By using a primitive, the logic feeding the primitive's pin isn't optimized across the primitive.

The synthesizer will produce a nice 2-level implementation for 5 compares but not 8, so splitting it up into 5 compares and 3, the MUXF5 used as an AND can give a nice balance of delays. It's slightly more than 2 LUTs of delay, but very slightly. The code looks cleaner and the implementation is tight.

===============================================================
module testRun ( input             clk
               , input      [64:0] bytesPlus1
               , output reg [ 7:0] runByte
               );

wire [ 7:0] runMux;
wire [63:0] xnorBits = bytesPlus1[63:0] ^~ bytesPlus1[64:1];
// the result of a bit compare a==b is the same as a^~b

genvar h;
generate
  for( h=0; h<8; h=h+1 )
  begin : run
    MUXF5 mux ( .O(runMux[h]), .S ( & xnorBits[h*8+2:h*8+0] )
                             , .I1( & xnorBits[h*8+7:h*8+3] )
                             , .I0( & xnorBits[h*8+2:h*8+0] ) );
  end
endgenerate

always @(posedge clk) runByte <= runMux;

endmodule

Article: 61368
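The combinational part of the module above can be mirrored in a short Python model to see why a MUXF5 with its select tied to the same signal as I0 behaves as an AND. This is a behavioural sketch only: it ignores the output register and all timing, and the helper names `muxf5` and `run_byte_model` are invented for the illustration.

```python
MASK64 = (1 << 64) - 1


def muxf5(i0, i1, s):
    """2:1 mux: output follows i1 when s is true, otherwise i0."""
    return i1 if s else i0


def run_byte_model(bytes_plus1):
    """Model of runMux for a 65-bit input given as a Python int."""
    # xnor bit i is 1 when input bits i and i+1 are equal.
    xnor_bits = ~((bytes_plus1 & MASK64) ^ ((bytes_plus1 >> 1) & MASK64)) & MASK64
    run_mux = 0
    for h in range(8):
        group = (xnor_bits >> (h * 8)) & 0xFF
        low3 = (group & 0x07) == 0x07    # AND of 3 compares
        high5 = (group & 0xF8) == 0xF8   # AND of 5 compares
        # With s == i0: if low3 is false the output is low3 (false),
        # if low3 is true the output is high5, i.e. low3 AND high5.
        if muxf5(i0=low3, i1=high5, s=low3):
            run_mux |= 1 << h
    return run_mux


for sample in (0x0, 0xFF, 0x1FF):
    print(hex(sample), format(run_byte_model(sample), "08b"))
```

Each output bit is set exactly when the byte and the bit above it are nine equal bits, for runs of ones and runs of zeros alike.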
A simple and workable approach is to use one or more BRAMs as a LINE buffer and a LINE Z buffer, perhaps double buffered for simplicity.

Then by sorting (and incrementally updating on the fly) your display list of graphics primitives by Y coordinate and then by X coordinate, you can iterate over them and render them into the line buffer. Works fine for Gouraud shaded filled primitives like trapezoids too. (Textures will require more memory ports, perhaps on-chip, perhaps not.)

So long as you can render a line's worth of graphics faster than you shift out the previously rendered line, you're looking good. No frame buffer, no high bandwidth frame buffer memory, no frame buffer memory I/Os. Just pretty raster graphics.

With a soft CPU core to do display list management, and a simple hardware span-filler coprocessor on the interface to the line buffer, scenes of limited complexity seem quite doable in even a rather spartan FPGA. For more complexity and more performance, move the display list manager and span edge DDAs to hardware.

See also my 1995 article on an FPGA-based rendering coprocessor: http://fpgacpu.org/usenet/render.html.

Jan Gray, Gray Research LLC

Article: 61369
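The line-buffer idea can be sketched in software to make the data flow concrete. The Python below is a simplified model, not the design from the post or the linked article: primitives are reduced to flat-colored horizontal spans (no Gouraud shading, no edge DDAs), the `Span` record and `render_frame` function are hypothetical names, and rendered lines are collected in a list purely so the result can be inspected, whereas hardware would shift each line out to the display.

```python
from dataclasses import dataclass


@dataclass
class Span:
    y: int       # scanline the span lies on
    x0: int      # first pixel, inclusive
    x1: int      # last pixel, exclusive
    z: int       # depth, smaller means nearer
    color: int


def render_frame(spans, width, height, background=0, far=2**31 - 1):
    """Render spans one scanline at a time using a line buffer and a line Z buffer."""
    display_list = sorted(spans, key=lambda s: (s.y, s.x0))
    frame = []
    index = 0
    for y in range(height):
        line = [background] * width   # line buffer, cleared per scanline
        zline = [far] * width         # line Z buffer, cleared per scanline
        # Consume every span at or above this scanline; draw only those on it.
        while index < len(display_list) and display_list[index].y <= y:
            span = display_list[index]
            if span.y == y:
                for x in range(max(span.x0, 0), min(span.x1, width)):
                    if span.z < zline[x]:
                        zline[x] = span.z
                        line[x] = span.color
            index += 1
        frame.append(line)            # stands in for shifting the line out
    return frame


demo = render_frame(
    [Span(y=1, x0=0, x1=4, z=5, color=1),
     Span(y=1, x0=2, x1=6, z=3, color=2)],
    width=6, height=2)
for row in demo:
    print(row)
```

Only two width-sized arrays are live at any time, which is the point of the approach: storage scales with one scanline rather than with the whole frame.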
In article <3f7af52e_1@newsfeed.slurp.net>, DK <dknews@ueidaq.com> wrote:
>could somebody suggest hi-performance workstation I can use to compile my
>designs in a fastest possible way?
>
>I use Altera Quartus II and EP1C12Q240 device. It 90% full with most of the
>memory used. My Athlon XP 1800+ on ASUS A7V266-C motherboard with 512M of
>RAM takes 8/23 minutes to compile/fit. I did try dual Xeon 2200 on TYAN MB
>and it did only 5% faster..

If it doesn't swap, getting the latest hardware gives a small but noticeable speed increase, but you aren't going to see an order of magnitude faster anytime soon.

One of the problems is just that many of the techniques are memory latency bound, and memory latency is not getting better. Others are cache bound, and the Athlons have pretty good cache, but cache is still a bottleneck.

AFAIK, none of the programs are yet dual-processor or SMT optimized; if they were, a dual SMT P4 machine would be good, but as I said, not currently.

>Did somebody try Athlon 64-bit processor?

The 64-bit Athlon's real improvement is going to be on address space size, which will matter on the largest designs, not on performance.

--
Nicholas C. Weaver nweaver@cs.berkeley.edu

Article: 61370
Hi,

Why not use the carry chain? You can do any kind of detection on that primitive and it will save you LUTs.

Göran

John_H wrote:

>For anyone interested in how I got things together, I ended up using
>one generate loop to instantiate 8 MUXF5s. Why MUXF5s?
>
>1) One can make an 8-input AND with 2 LUTs and a little extra delay by
>having the first 4-bit AND feed the select and the sel==0 input - if
>the AND is false the result is false, if the AND is true, the result
>is the other AND.
>
>2) By using a primitive, the logic feeding the primitive's pin isn't
>optimized across the primitive.
>
>The synthesizer will produce a nice 2-level implementation for 5
>compares but not 8 so splitting it up into 5 compares and 3, the MUXF5
>used as an AND can give a nice balance of delays. Its slightly more
>than 2 LUTs of delay, but very slightly. The code looks cleaner and
>the implementation is tight.
>
>===============================================================
>module testRun ( input             clk
>               , input      [64:0] bytesPlus1
>               , output reg [ 7:0] runByte
>               );
>
>wire [ 7:0] runMux;
>wire [63:0] xnorBits = bytesPlus1[63:0] ^~ bytesPlus1[64:1];
>// the result of a bit compare a==b is the same as a^~b
>
>genvar h;
>generate
>  for( h=0; h<8; h=h+1 )
>  begin : run
>    MUXF5 mux ( .O(runMux[h]), .S ( & xnorBits[h*8+2:h*8+0] )
>                             , .I1( & xnorBits[h*8+7:h*8+3] )
>                             , .I0( & xnorBits[h*8+2:h*8+0] ) );
>  end
>endgenerate
>
>always @(posedge clk) runByte <= runMux;
>
>endmodule
>
>

Article: 61371
DK wrote:
> Hi,
>
> could somebody suggest hi-performance workstation I can use to compile my
> designs in a fastest possible way?
>
> I use Altera Quartus II and EP1C12Q240 device. It is 90% full with most of the
> memory used. My Athlon XP 1800+ on ASUS A7V266-C motherboard with 512M of
> RAM takes 8/23 minutes to compile/fit. I did try dual Xeon 2200 on TYAN MB
> and it was only 5% faster.

Your machine is fine. 23 minutes is not bad.

Consider using more simulation before synthesis. A sim is 10x faster than a synth.

Consider loading SuSE or Red Hat Linux and running Quartus under Linux (you can dual boot Windows/Linux if you like).

--Mike Treseler
Article: 61372
John_H wrote:
>
> rickman <spamgoeshere4@yahoo.com> wrote in message news:<3F7B3F59.191901A4@yahoo.com>...
> > My experience has been that it does not much matter how you code
> > combinatorial logic like this. The tools run it through a grinder and
> > produce an optimal version (in its own mind). When I want to optimize
> > like this, I either use a "keep" attribute on the wire, or sometimes you
> > can instantiate primitives. For logic I don't think primitives work
> > since gates just get remapped.
>
> I overuse the syn_keep attribute and I hate the idea of instantiating
> LUTs. My Karnaugh map skills aren't exactly used regularly.

Actually, I don't think logic primitives will work since the back-end mapper can redo logic at will. The keep attribute is what is required to define the LUTs, and even that is not guaranteed since it only results in a wire being kept; the LUT can still be split if other logic uses the same inputs.

> > But I still don't understand your code. Why does the outer loop range
> > over 64 values?
>
> I've had problems with bit ranges in the past where [i+4:i] draws a
> complaint. Perhaps this isn't an issue with for loops but I've
> learned to avoid them in general logic. They do work fine in generate
> blocks, however. I stepped through every bit to make a comparison to
> the adjacent bit; 3 adjacent comparisons lumped into one variable
> (with an eventual syn_keep) would give me 4-input functions that
> should pack into LUTs. The complex end of the inside loop is so that
> the three "LUTs" per byte are 4-input, 4-input, and 3-input functions.

I don't really see what problem you are trying to solve with that, but then I am not as well versed in Verilog as I am in VHDL.

> > I would code two nested loops where the outer loop
> > ranges over the 8 outputs and the inner loop ranges over the 9 inputs
> > for each output. Or just skip the inner loop and use two outputs from
> > two sets of four inputs feeding a 3 input function and use keeps on the
> > first two output arrays. Maybe that is what you are doing, but I can't
> > figure out the code easily.
> >
> > I see you are incrementing the i variable by j and ranging j in the
> > second loop by some complex control expression. Can't you just
> > increment i by 8?
> >
> > for( i=0; i<64; i=i+8 ) begin
> >   k = i / 8;
> >   for( j=0; j<4; j=j+1 ) begin
> >     runBitsA_[k] = runBitsA_[k] & bytePlus1[i+j];
> >     runBitsB_[k] = runBitsB_[k] & bytePlus1[i+j+4];
> >   end
> >   runByte_[k] = runBitsA_[k] & runBitsB_[k] & bytePlus1[i+8];
> > end
> >
> > Put the keep on runBitsA_ and runBitsB_ and you should get your two
> > level structure.
>
> This works very well for runs of ones only. I need to identify runs
> of ones or runs of zeros. The technique can be expanded to my needs
> resulting in runBitsA, B, and C where one of them needs to cover 2
> comparisons, not 3 like the others. ...which really is the
> approach I was coding but using consecutive bits in a vector rather
> than {A,B,C} and using the one statement rather than 3 to make the
> assignments, dealing with the 2 comparison exception by terminating
> the inside loop early.

Again, I may not completely understand your problem. This was intended to show you how to solve the problem. To cover the adjacent zeros, you just do the same logic using the OR operator and invert the result.

--
Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Article: 61373
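Rick's closing suggestion (AND the nine bits to find a run of ones, OR them and invert to find a run of zeros) can be sketched as below. This is an illustrative Python model under stated assumptions, not code from the thread: the name `detect_runs` is hypothetical, the input is taken to be a 65-bit integer, and each output covers bits 8k through 8k+8. The A and B intermediates correspond to the two four-input groups that the post proposes protecting with a keep attribute.

```python
def detect_runs(bytes_plus1):
    """Return (ones_flags, zeros_flags), each an 8-bit integer.

    Bit k of ones_flags is set when bits 8k .. 8k+8 are all 1.
    Bit k of zeros_flags is set when bits 8k .. 8k+8 are all 0.
    """
    ones_flags = 0
    zeros_flags = 0
    for k in range(8):
        i = 8 * k
        window = [(bytes_plus1 >> (i + j)) & 1 for j in range(9)]

        # First level: two 4-input functions per polarity.
        a_and = all(window[0:4])
        b_and = all(window[4:8])
        a_or = any(window[0:4])
        b_or = any(window[4:8])

        # Second level: a 3-input function combining A, B and the ninth bit.
        run_of_ones = a_and and b_and and window[8] == 1
        run_of_zeros = not (a_or or b_or or window[8] == 1)

        ones_flags |= int(run_of_ones) << k
        zeros_flags |= int(run_of_zeros) << k
    return ones_flags, zeros_flags
```

OR-ing the two results together gives one "byte plus next bit are all equal" flag per byte, which appears to be the quantity John_H is after; keeping them separate costs nothing in the model and makes the two polarities easy to inspect.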
http://www.crimsoneditor.com/ and it's free!

--Lee

"Valentin Tihomirov" <valentin@abelectron.com> wrote in message news:<3f7c1470_1@news.estpak.ee>...
> > I still like Aldec for design entry. Editor is very much studio editor
> > like, plus you can run sims right there as well as integrate in the rest
> > of your tool flow. For the price, I think it is a great value.
>
> Heh, I appreciate Aldec because it is not studio-like, as opposed to Xilinx's
> WebPack. It is also more feature rich, and its level of integration of different
> tools is better (like jumping to the erroneous code line and more).
Article: 61374
Thanks, I'll go look at your article.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Martin Euredjian

To send private email:
0_0_0_0_@pacbell.net
where
"0_0_0_0_" = "martineu"

"Jan Gray" <jsgray@acm.org> wrote in message news:QwYeb.11170$RW4.4195@newsread4.news.pas.earthlink.net...
> A simple and workable approach is to use one or more BRAMs as a LINE buffer
> and a LINE Z buffer, perhaps double buffered for simplicity. Then by sorting
> (and incrementally updating on the fly) your display list of graphics
> primitives by Y coordinate and then by X coordinate, you can iterate over
> them and render them into the line buffer. Works fine for Gouraud shaded
> filled primitives like trapezoids too. (Textures will require more memory
> ports, perhaps on-chip, perhaps not.)
>
> So long as you can render a line worth of graphics faster than you shift out
> the previously rendered line, you're looking good. No frame buffer, no high
> bandwidth frame buffer memory, no frame buffer memory I/Os. Just pretty
> raster graphics.
>
> With a soft CPU core to do display list management, and a simple hardware
> span-filler coprocessor on the interface to the line buffer, scenes of
> limited complexity seem quite doable in even a rather spartan FPGA. For
> more complexity and more performance, move the display list manager and span
> edge DDAs to hardware.
>
> See also my 1995 article on an FPGA-based rendering coprocessor:
> http://fpgacpu.org/usenet/render.html.
>
> Jan Gray, Gray Research LLC
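The line-buffer approach in Jan Gray's quoted post can be illustrated with a small software model. The sketch below is Python and rests on several simplifying assumptions that are not in the post: primitives are flat-colored, axis-aligned rectangles with a constant depth (no Gouraud shading, trapezoids, or textures), the display list is sorted once per frame rather than updated incrementally, and a smaller `z` means nearer the viewer. The names `Rect`, `render_line` and `scan_out` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Flat-colored rectangle; x1 and y1 are exclusive bounds."""
    x0: int
    y0: int
    x1: int
    y1: int
    z: int       # smaller z is nearer the viewer (assumption)
    color: int


def render_line(y, sorted_list, width, background=0):
    """Render one scanline into a fresh line buffer using a line Z buffer.

    sorted_list must be ordered by (y0, x0), so the scan can stop at the
    first primitive that starts below the current line.
    """
    line = [background] * width
    zbuf = [float("inf")] * width
    for p in sorted_list:
        if p.y0 > y:
            break                 # every later primitive also starts below y
        if y >= p.y1:
            continue              # primitive already ended above this line
        for x in range(max(p.x0, 0), min(p.x1, width)):
            if p.z < zbuf[x]:     # depth test against the line Z buffer
                zbuf[x] = p.z
                line[x] = p.color
    return line


def scan_out(display_list, width, height):
    """Yield scanlines top to bottom; no full frame buffer is ever stored."""
    ordered = sorted(display_list, key=lambda p: (p.y0, p.x0))
    for y in range(height):
        yield render_line(y, ordered, width)
```

In hardware the line produced for row y would be shifted out to the display while row y+1 is being rendered into the other half of a double buffer. The generator mimics that by holding only one line at a time.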