
Compare FPGA features and resources


vikas_akalwadi@indiatimes.com (vikas) writes:
> hi all,
> I am presently working on hardware implementation of viterbi decoder
> with constraint length K=7 with soft decision width = 3. It would be
> very helpful to me if the knowledgeable persons can answer my doubts
> regarding the issues:
>
> 1. The ACS modules (64 in number in my case) are taking most of the
> area. Some documents mention modified ACS units. But I could not get
> those documents as they were privileged for IEEE members.
>
> 2. To avoid the overflowing of partial path metric values, I am doing
> normalisation, i.e., subtracting the lowest value from all the partial
> path metric values. Some documents mention "localised
> normalisation". How does that work?
>
> 3. After survival data equal to trace back depth has been stored, we
> start trace back. If we are to start the traceback with lowest partial
> path metric, how do we determine that state if we do "localised
> normalisation"?
>
> 4. What are the different techniques for trace back operation?
> Since I am implementing it on hardware, functionality, area and timing
> all are very important.

Jens Sparsø and Steen Pedersen at the Technical University of Denmark did a full-custom implementation of a Viterbi decoder back in around 1990. They wrote several articles based on that, and we poor students had it as a "standard case-study" for several years :-/

Maybe you could find something on their website. I believe the right website is http://www.imm.dtu.dk/ [...] Ah, yes. Go search for "Viterbi" and you'll get a bunch of references.

Kai
--
Kai Harrekilde-Petersen <khp@vitesse.com> Opinions are mine...

There was a time way back in Sept 2000 when the Verilog simprim I was using had a problem where the notifier would kick in because of the setup/hold numbers defined in the X_SRL16E.v simprim. I just added a delay in the simprim where the assignment was made, from: {data[15:0]} <= {data[14:0], d_in}; to: #2 {data[15:0]} <= {data[14:0], d_in}; All it took was a couple picoseconds. I found this out by drilling down into the simprim in my simulation to find *why* I was getting the "x" values for my SRL outputs. The "notifier" was kicking in when it made no sense. This may have no bearing on your current situation but it struck a familiar chord. - John_H "Frank Hoffmann" <fh215@xxx.yyy.ac.uk> wrote in message news:b85uui$27k$1@pegasus.csx.cam.ac.uk... > I hope that somebody can help me with this ? > > I have a small design which uses instantiated SRL16 primitves. > > The design simulates fine with webpack 4.2 and the 'matching' modelsim > simulator, but generates loads of timing errors with the latest set of > tools. The timing errors all seem to be caused by the SRL16 primitives. > > Has anybody come across this and can tell me why this happens and how to > fix it ? > > Thanks for your help in advance, > > - Frank > > PS: > to send no-spam email, replace "xxx" with "eng" and "yyy" with "cam". > > > > ================================================== > Frank Hoffmann > > Laboratory for Communication Engineering (LCE) > University of Cambridge - Dept. of Engineering > William Gates Building, JJ Thomson Avenue, > Cambridge, CB3 0FD, UK > > phone : +44 1223 767031 fax : +44 1223 767010 > ================================================== >

"John Milbanks" <phony@nowhere.cc> wrote in message news:<w61oa.7947$5R6.7215@fed1read01>...
> Out of curiosity, does anyone know the reason Xess discontinued its Virtex
> prototyping XSV boards and now offers only the SpartanII XSA ones? No
> demand? Too expensive?

Check at http://www.associatedpro.com APS still carries Virtex boards in a small form factor, PC104 or stand-alone.

It seems like XILINX also quit placing the third-party board links on its website, or the list is incomplete. Neither APS nor Xess is listed, nor is Annapolis Micro or many others. Maybe that is influencing the board vendors.

I am very impressed with the AMONTEC website and product line. The VHDL online reference is also very nice. http://www.amontec.com I understand AMONTEC does consulting work in Europe. Laurent Gauch has done some work for me in the US and I highly recommend them. Also, the JTAG/I2C/COOLRUNNER/WIGGLER POD is a really cool way to load your FPGA and get a COOLRUNNER prototype system all at the same time, and it's only $150.00 and comes with tons of core and source code. It does everything but catch fish! Great buy! Rick http://www.associatedpro.com

Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a 10-bit unsigned number and 3 is a constant. This has to be done in the same cycle (combinatorial!). Now what's a good way to implement that? I thought of a lookup table (distributed RAM) but this takes quite a lot of space. Any better ideas? (Ray, the arithmetic guru? :-) Do you think I can perform this operation at 200 MHz in a Virtex-II? Thanks! RISC_taker

Robert <rpudlik@poczta.onet.pl> wrote in message news:<3EA6A600.5060405@poczta.onet.pl>... > It was so close: 1.2V core voltage. > In my current design I'm going to use processor with 1.26V core > voltage;) It would be nice to have one regulator less... Howdy Robert, Assuming the regulator can handle the current, you should only need one for voltages that close. The SP3 datasheet states that 1.26 volts is acceptable. Or (making an educated guess) you could probably run your processor 5% low - which would get you to 1.2V. Or perhaps the best of both worlds: you could just split the difference and run that rail at 1.23 Volts. Then each part would be only ~2.5% from the spec'ed mid-point voltage, giving you some margin on both sides. Have fun, Marc
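The "split the difference" arithmetic can be checked in a few lines. This is only a sketch: the 1.20 V and 1.26 V nominal values come from the thread, the function name split_rail is made up for illustration, and the tolerance band each part actually allows still has to come from its datasheet.

```python
def split_rail(v_a, v_b):
    """Return the midpoint voltage and its relative deviation from each nominal."""
    mid = (v_a + v_b) / 2
    return mid, (mid - v_a) / v_a, (mid - v_b) / v_b

# 1.20 V FPGA core and 1.26 V processor core, as discussed in the thread
mid, dev_fpga, dev_cpu = split_rail(1.20, 1.26)
print(f"rail = {mid:.2f} V, FPGA {dev_fpga:+.1%}, processor {dev_cpu:+.1%}")
```

Both deviations come out at roughly 2.5% in magnitude, which is the margin Marc quotes.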

IRQs 1-15 are 'reserved' for system-level exceptions - check out the Nios Programmer's Reference Manual (PDF) file. IRQ 1 is for register window underflow... something that would happen if you return (ret instruction) from a subroutine call, but without entering (call instruction) the subroutine. Are you doing anything fancy during the boot-up process with your own startup code? If you're doing something such as compiling a traditional C program (with main(), and a bunch of subroutines), built with nios-build, then we will link in code that sets up interrupts, the register window, etc. and prevents this sort of thing. - Jesse jim006@att.net (Jim M.) wrote in message news:<6f3fc0f8.0304230440.35a703d9@posting.google.com>... > The error message prints in hex huh? Well that's probably worth > knowing. > > That explains IRQ 17 and 19 (timer and lan respectively) > > Is it possible to receive a spurious interrupt for an IRQ not assigned > in SOPC Builder. I recall having a spurious IRQ #1, although I may > have been mistaken. > > You mention that IRQs 16-64 are for user exceptions. What about IRQs > 1-15 ? > >

"RISC taker" <RISC_taker@alpenjodel.de> wrote in message news:18c289aa.0304230854.6897fb3b@posting.google.com...
> Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a
> 10-bit unsigned number and 3 is a constant. This has to be done in the
> same cycle (combinatorial!). Now what's a good way to implement that?
>
> I thought of a lookup table (distributed RAM) but this takes quite a
> lot of space. Any better ideas? (Ray, the arithmetic guru? :-)

I am not a guru, but how about a BRAM, used in x4 configuration? Just store the truth table there, done. OK, it has 1 cycle latency, but hey, it's pretty easy and fast. 200 MHz should be possible.

Otherwise you could try to use an Excel sheet or something to generate the truth table. Maybe you will find some clever optimization possibilities.

--
Regards,
Falk

Robert wrote:
>
> It was so close: 1.2V core voltage.
> In my current design I'm going to use processor with 1.26V core
> voltage;) It would be nice to have one regulator less...

You must be talking about the TI C6711C or C6713. Check the tolerances. I bet you can pick one voltage that will suit both. It is not like the voltage is *that* critical to these chips. If it is 50 mV too high the chip won't *blow*. It will use about 8% more power by my estimation.

The match in power voltage is one reason I would like to use the Spartan 3. But I can't wait for 9 months. Besides, this design is not for a single board. We plan to use other DSP chips which will not have the low 1.26 volt power.

Fortunately they *do* make LDOs that will drop the 1.5 volt power to 1.26 volts! Think about it. That is 84% efficiency. Not bad for an LDO. You can do a bit better by adding yet another switcher to your design, but is it worth it?

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
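The two figures in this post (84% and about 8%) can be reproduced with a rough calculation. The sketch below rests on two assumptions that are not stated in the post: an ideal LDO whose efficiency is simply Vout/Vin (quiescent current ignored), and core power that scales with the square of the supply voltage. The function names are illustrative only.

```python
def ldo_efficiency(v_in, v_out):
    """Ideal linear-regulator efficiency: the load current also flows through the input."""
    return v_out / v_in

def relative_power(v_nominal, v_actual):
    """Dynamic CMOS power relative to nominal, assuming P is proportional to V**2."""
    return (v_actual / v_nominal) ** 2

print(f"LDO 1.5 V -> 1.26 V: {ldo_efficiency(1.5, 1.26):.0%}")
print(f"Core 50 mV above 1.20 V: {relative_power(1.20, 1.25) - 1:+.1%} power")
```

Under those assumptions the LDO figure is 84% and the extra power is a little over 8%, in line with the estimates above.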

Bob wrote: > Hi, > > When calculating an FFT, how does the inclusion of the DC value affect > things i.e. I have seen some examples of FFT code where the DC is > removed and some where it left. > > Is the best strategy to remove the DC or leave it. For example in a > COFDM system, will its removal or inclusion have any effect? I am > thinking of its > implementation in an ASIC, where DC removal adds overhead and I'm > wondering if it really necessary to remove it. How will adding > removing the DC level affect the ifft? What is the problem ? the DC is an additional number and corresponds to a DC bias. It depends on the problem whether you need it or not. Rene -- Ing.Buero R.Tschaggelar - http://www.ibrtses.com & commercial newsgroups - http://www.talkto.net
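The point that the DC level is "an additional number" can be illustrated numerically: adding a constant to every sample changes only bin 0 of the transform. The sketch below uses a direct DFT on made-up sample values with a hypothetical offset of 2.0; it says nothing about whether a particular COFDM receiver needs DC removal for other reasons such as dynamic range.

```python
import cmath

def dft(x):
    """Direct O(N^2) discrete Fourier transform, adequate for a small demonstration."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

samples = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, -0.25]   # arbitrary test data
offset = 2.0                                              # hypothetical DC bias
plain = dft(samples)
biased = dft([s + offset for s in samples])

# Difference per bin: N * offset in bin 0, numerically zero everywhere else.
diff = [abs(b - p) for p, b in zip(plain, biased)]
print([round(d, 9) for d in diff])
```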

How much older. There was a significant slowdown in the SRL16's in the updated speed files, I think it was v3.3 service pack 8. The newer version of the tools will have the most recent speed files. Frank Hoffmann wrote: > Hi- > thanks for that tip. I'll investigate it, but I wonder whether this is > the reason for the unexplained errors I'm getting ? Mind you, the > identical design was simulating without any errors in the older version > of tools ? > > - Frank > > Ray Andraka wrote: > > The clock to Y timing of the SRL16 is not very impressive. You make things > > work much better if you feed the output of the SRL16 directly to the > > flip-flop in the same slice before using it. It adds one more clock delay, > > but eliminates the extra routing and FF set up. > > > > Frank Hoffmann wrote: > > > > > >>I hope that somebody can help me with this ? > >> > >>I have a small design which uses instantiated SRL16 primitves. > >> > >>The design simulates fine with webpack 4.2 and the 'matching' modelsim > >>simulator, but generates loads of timing errors with the latest set of > >>tools. The timing errors all seem to be caused by the SRL16 primitives. > >> > >>Has anybody come across this and can tell me why this happens and how to > >>fix it ? > >> > >>Thanks for your help in advance, > >> > >>- Frank > >> > >>PS: > >>to send no-spam email, replace "xxx" with "eng" and "yyy" with "cam". > >> > >>================================================== > >>Frank Hoffmann > >> > >>Laboratory for Communication Engineering (LCE) > >>University of Cambridge - Dept. of Engineering > >>William Gates Building, JJ Thomson Avenue, > >>Cambridge, CB3 0FD, UK > >> > >>phone : +44 1223 767031 fax : +44 1223 767010 > >>================================================== > > > > > > -- > > --Ray Andraka, P.E. > > President, the Andraka Consulting Group, Inc. 
> > 401/884-7930 Fax 401/884-7950 > > email ray@andraka.com > > http://www.andraka.com > > > > "They that give up essential liberty to obtain a little > > temporary safety deserve neither liberty nor safety." > > -Benjamin Franklin, 1759 > > > > -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

With only 10 bits of input, you only have a 1K address space, and the output only needs 2 bits (0, 1, 2), so this can easily fit in a single block RAM. That would be your best bet for single cycle operation at 200 MHz.

If you are really averse to using block RAM, there is a shortcut similar to the one for decimal numbers, but I don't recall it offhand. The decimal shortcut is to sum the digits and divide that sum by 3. The remainder of that division is the mod. It would take a fair amount of area to attempt that combinatorially, and you'd be hard pressed to do it in a single cycle at 200 MHz... go with the BRAM.

RISC taker wrote:
> Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a
> 10-bit unsigned number and 3 is a constant. This has to be done in the
> same cycle (combinatorial!). Now what's a good way to implement that?
>
> I thought of a lookup table (distributed RAM) but this takes quite a
> lot of space. Any better ideas? (Ray, the arithmetic guru? :-)
>
> Do you think I can perform this operation at 200 MHz in a Virtex-II?
>
> Thanks!
> RISC_taker

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
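Both ideas in this reply are easy to try in software. The sketch below first builds the 1024-entry, 2-bit table that would be stored in the block RAM, then checks the decimal digit-sum shortcut against it. The function names are illustrative, and how the table gets into a memory initialisation file is left open because that depends on the tool flow.

```python
def mod3_table(addr_bits=10):
    """Contents of a ROM that maps every addr_bits-wide input n to n mod 3."""
    return [n % 3 for n in range(1 << addr_bits)]

def mod3_by_digit_sum(n):
    """n mod 3 via the decimal shortcut: sum the digits, then reduce the sum."""
    return sum(int(ch) for ch in str(n)) % 3

table = mod3_table()
print(len(table), "entries, values used:", sorted(set(table)))
for n in (7, 11, 1023):
    print(n, "table:", table[n], "digit sum:", mod3_by_digit_sum(n))
```

The shortcut works in decimal because 10 leaves remainder 1 when divided by 3, so every digit contributes its own value to the remainder.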

Great! I posted questions about this but never got anything concrete. This is interesting and partly answers my questions. I have extra questions (sorry) : - Is there any VHDL similar example? - What software do you use to play with the JTAG USER1 commands on the PC side? Thanks for your help, Frederic Bastenaire "Philip Freidin" <philip@fliptronics.com> a écrit dans le message de news: n7hv9v438bfif84cg3mjoaeim15b3qkla0@4ax.com... > You need to create your own data register, and connect it to > the JTAG primitive. > > For your entertainment, here is an example of doing it for > Virtex-II. > > > (...) > Philip Freidin > Fliptronics

In article <18c289aa.0304230854.6897fb3b@posting.google.com>, RISC taker <RISC_taker@alpenjodel.de> wrote: >Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a >10-bit unsigned number and 3 is a constant. This has to be done in the >same cycle (combinatorial!). Now what's a good way to implement that? Off the top of my head, I think so: Consider any all-1s binary number with an even number of bits. That number is a multiple of 3. 3, 15, 63, 255, or in your case, 1023. For every EVEN bit NOT set in that number, the remainder goes up by 2. For every ODD bit not set in that number, the remainder goes up by 1. For example: 1111 = 15 = multiple of 3 0111 = 7 = remainder 1 1011 = 11 = remainder 2 1101 = 13 = remainder 1 1110 = 14 = remainder 2 and any combination: 0101 = 5 = remainder (1+1) => 2 0110 = 6 = remainder (1+2) => 3 => 0 Conveniently each pair of bits makes a 2-bit number which you can add up. Now if you add up all of the possible bits that way for a 10 digit number the most you can get is 15 (1 x 5 odd bits + 2 x 5 even bits = 15). You could then handle that with a lookup table or a few well-chosen gates. Or you could repeat this process again. -- Ben Jackson <ben@ben.com> http://www.ben.com/
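A quick software check of this rule, as a sketch only: bit 0 is treated as an even bit (which is what the worked examples imply), and the function name is made up for illustration.

```python
def mod3_missing_bits(n, width=10):
    """n mod 3 from the bits that are NOT set, counting bit 0 as an even bit."""
    assert width % 2 == 0 and 0 <= n < (1 << width)
    total = 0
    for bit in range(width):
        if not (n >> bit) & 1:
            total += 2 if bit % 2 == 0 else 1   # even bit missing: +2, odd bit missing: +1
    return total % 3                            # total is at most 15 when width is 10

# The 4-bit examples from the post
for n in (15, 7, 11, 13, 14, 5, 6):
    print(f"{n:04b} -> remainder {mod3_missing_bits(n, width=4)}")
```

The rule holds because an all-ones word of even width is a multiple of 3, an even bit has a weight that is 1 mod 3, and an odd bit has a weight that is 2 mod 3, so clearing one subtracts 1 or 2 respectively.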

On 23 Apr 2003 09:54:22 -0700, RISC_taker@alpenjodel.de (RISC taker) wrote: >Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a >10-bit unsigned number and 3 is a constant. This has to be done in the >same cycle (combinatorial!). Now what's a good way to implement that? > >I thought of a lookup table (distributed RAM) but this takes quite a >lot of space. Any better ideas? (Ray, the arithmetic guru? :-) > >Do you think I can perform this operation at 200 MHz in a Virtex-II? > >Thanks! >RISC_taker Hi RISC, If you break up your 10 bit input word into five 2 bit words, you can take the modulus of each (using five pairs of 2 input LUTs), then add the results (to get a four bit number), then take the modulus of that. This works since a mod (a - 1) = 1 and b mod (a - 1) = (a x b) mod (a - 1) (Think of a as being an even power of 2, which means we don't change the modulus if we shift the input by 2 bits.) We can improve the timing by using pairs of 4 input LUTS to take four bit slices of your input. We then sum the three 2 bit values (using 2 levels of logic) to get a 4 bit result, then take the modulus of that in another pair of LUTs. This takes a total of four levels of logic, which should work at 200MHz in Virtex-II, depending on the speed grade, your patience, etc. Regards, Allan.
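The first scheme described here (five 2-bit words, reduce each, add, reduce again) can be modelled in a few lines. This is a behavioural sketch with an illustrative function name, not a description of how the LUTs would be packed.

```python
def mod3_radix4(n, width=10):
    """n mod 3 by summing the 2-bit (radix-4) digits, each reduced mod 3 first."""
    digits = [(n >> shift) & 0b11 for shift in range(0, width, 2)]
    partial = sum(d % 3 for d in digits)    # at most 2 * 5 = 10, so it fits in 4 bits
    return partial % 3

print([mod3_radix4(n) for n in range(12)])
```

With a = 4 in the identities above, shifting by two bits multiplies by 4, which leaves the remainder mod 3 unchanged, so the digits can simply be added.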

Mike is right, you cannot synthesize the hdl code generated by the coregen. The core generator outputs HDL code for simulation, and a *.edn for synthesis. You need to place this file in the folder of your top level *.edf file. You can use the HDL code to write the instantiation of the core. I tend not to use the coregen because of the flow it forces.

On Thu, 24 Apr 2003 04:20:49 +1000, Allan Herriman <allan_herriman.hates.spam@agilent.com> wrote: >On 23 Apr 2003 09:54:22 -0700, RISC_taker@alpenjodel.de (RISC taker) >wrote: > >>Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a >>10-bit unsigned number and 3 is a constant. This has to be done in the >>same cycle (combinatorial!). Now what's a good way to implement that? >> >>I thought of a lookup table (distributed RAM) but this takes quite a >>lot of space. Any better ideas? (Ray, the arithmetic guru? :-) >> >>Do you think I can perform this operation at 200 MHz in a Virtex-II? >> >>Thanks! >>RISC_taker > >Hi RISC, > >If you break up your 10 bit input word into five 2 bit words, you can >take the modulus of each (using five pairs of 2 input LUTs), then add >the results (to get a four bit number), then take the modulus of that. > >This works since > >a mod (a - 1) = 1 > >and > >b mod (a - 1) = (a x b) mod (a - 1) > >(Think of a as being an even power of 2, which means we don't change >the modulus if we shift the input by 2 bits.) > > >We can improve the timing by using pairs of 4 input LUTS to take four >bit slices of your input. >We then sum the three 2 bit values (using 2 levels of logic) to get a >4 bit result, then take the modulus of that in another pair of LUTs. > >This takes a total of four levels of logic, which should work at >200MHz in Virtex-II, depending on the speed grade, your patience, etc. BTW, the total hardware is 13 LUTs, which might fit into 2 CLBs. (Or you could use a block ram, as other posters have suggested.) Regards, Allan.

You are on the right track...

There are three keys to this solution:

The first is (n*4^m mod 3) = n mod 3 for all values of m
The second is (4n mod 3) = n mod 3
The third is ((a + b) mod 3) = ((a mod 3) + (b mod 3)) mod 3

Using these reductions you can construct a VERY fast mod 3 calculator with extremely small gate count. I have used this to do 19 bits at over 100MHz with no problems at all, and I am sure it will do 10 bits at 200 MHz. Its size is TINY. For 10 bits it will require 3 levels of LUTs and a total of (I think) 8 LUTs.

The basis of the algorithm is that you can take the input 4 bits at a time, and generate the mod 3 (which is 2 bits) in two LUTs (one for each output bit). Then adjacent pairs of these two bit quantities can be concatenated to generate 4 more bits, which can be reduced to 2. You need to do this log2(width)-1 times to get to the final 2 bits.

Here it is. You can let the synthesis tool optimize out the LUTs you don't need (since you are only doing 10 bits), or you can chop out the stages you don't need by hand.

Avrum

------------------------------------------------------------

/*
 * Module: syn_mod3_32
 * Creation Date: Tue Feb 20 2000
 * Author: Avrum Warshawsky
 * Description: Synthetic Mod3 calculator
 * Instantiated models: none
 * DEFINE: WIDTH
 *
 *
 * Description:
 *
 * This module will calculate mod 3 for any number up to 32 bits.
 * The parameter WIDTH determines the width of the input.
 * The width of the output is always 2 bits, which will be
 * 0, 1, or 2 (never 3) - the MOD3 of the input data
 *
 */
module syn_mod3_32(out, in);

//******************************************************************************
// Port Declarations
//******************************************************************************
parameter WIDTH=19;

input  [WIDTH-1:0] in;
output [1:0]       out;

function [1:0] digit_mod;
input [3:0] digit;
  case(digit)
    4'h0: digit_mod = 2'd0;
    4'h1: digit_mod = 2'd1;
    4'h2: digit_mod = 2'd2;
    4'h3: digit_mod = 2'd0;
    4'h4: digit_mod = 2'd1;
    4'h5: digit_mod = 2'd2;
    4'h6: digit_mod = 2'd0;
    4'h7: digit_mod = 2'd1;
    4'h8: digit_mod = 2'd2;
    4'h9: digit_mod = 2'd0;
    4'ha: digit_mod = 2'd1;
    4'hb: digit_mod = 2'd2;
    4'hc: digit_mod = 2'd0;
    4'hd: digit_mod = 2'd1;
    4'he: digit_mod = 2'd2;
    4'hf: digit_mod = 2'd0;
  endcase
endfunction

wire [1:0] m00, m01, m02, m03, m04, m05, m06, m07;
wire [1:0] m10, m11, m12, m13;
wire [1:0] m20, m21;

wire [31:0] my_in = in; // Let it zero extend for us

assign m00 = digit_mod(my_in[ 3:0 ]);
assign m01 = digit_mod(my_in[ 7:4 ]);
assign m02 = digit_mod(my_in[11:8 ]);
assign m03 = digit_mod(my_in[15:12]);
assign m04 = digit_mod(my_in[19:16]);
assign m05 = digit_mod(my_in[23:20]);
assign m06 = digit_mod(my_in[27:24]);
assign m07 = digit_mod(my_in[31:28]);

assign m10 = digit_mod({m01, m00});
assign m11 = digit_mod({m03, m02});
assign m12 = digit_mod({m05, m04});
assign m13 = digit_mod({m07, m06});

assign m20 = digit_mod({m11, m10});
assign m21 = digit_mod({m13, m12});

assign out = digit_mod({m21, m20});

// synthesis translate_off
initial
begin
  if (WIDTH > 32)
  begin
    $display("%t ERROR: Mod3 width must be <= 32 in %m",$realtime);
  end
end
// synthesis translate_on

endmodule

"Ben Jackson" <ben@ben.com> wrote in message news:jFApa.584264$3D1.324134@sccrnsc01...
> In article <18c289aa.0304230854.6897fb3b@posting.google.com>,
> RISC taker <RISC_taker@alpenjodel.de> wrote:
> >Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a
> >10-bit unsigned number and 3 is a constant. This has to be done in the
> >same cycle (combinatorial!). Now what's a good way to implement that?
>
> Off the top of my head, I think so:
>
> Consider any all-1s binary number with an even number of bits. That
> number is a multiple of 3. 3, 15, 63, 255, or in your case, 1023.
> For every EVEN bit NOT set in that number, the remainder goes up by 2.
> For every ODD bit not set in that number, the remainder goes up by 1.
> For example:
>
> 1111 = 15 = multiple of 3
> 0111 = 7 = remainder 1
> 1011 = 11 = remainder 2
> 1101 = 13 = remainder 1
> 1110 = 14 = remainder 2
>
> and any combination:
> 0101 = 5 = remainder (1+1) => 2
> 0110 = 6 = remainder (1+2) => 3 => 0
>
> Conveniently each pair of bits makes a 2-bit number which you can add
> up.
>
> Now if you add up all of the possible bits that way for a 10 digit number
> the most you can get is 15 (1 x 5 odd bits + 2 x 5 even bits = 15). You
> could then handle that with a lookup table or a few well-chosen gates.
> Or you could repeat this process again.
>
> --
> Ben Jackson
> <ben@ben.com>
> http://www.ben.com/
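For anyone who wants to convince themselves of the reduction tree before synthesising it, here is a small behavioural model of the syn_mod3_32 module posted above. It is a software sketch only: it mirrors the nibble-wise first stage and the pairwise combining stages, and says nothing about timing or LUT packing.

```python
def digit_mod(digit):
    """Model of the 4-bit-in, 2-bit-out case table in the Verilog module."""
    return (digit & 0xF) % 3

def syn_mod3_32(value):
    """Reduce a zero-extended 32-bit value nibble by nibble, then pair up 2-bit results."""
    stage = [digit_mod(value >> shift) for shift in range(0, 32, 4)]   # m00..m07
    while len(stage) > 1:
        # The {hi, lo} concatenation of two 2-bit values equals hi * 4 + lo
        stage = [digit_mod((hi << 2) | lo)
                 for lo, hi in zip(stage[0::2], stage[1::2])]
    return stage[0]

print([syn_mod3_32(n) for n in range(10)])
```

Each combining stage is valid because 16 and 4 both leave remainder 1 mod 3, so concatenating two partial remainders and reducing again is the same as adding them mod 3.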

Hello all, I am considering experimenting with FPGA's. I was hoping someone here could point me in a good starting direction. BTW, I would prefer to use Linux as a development platform although it is not critical. Are there emulators out there that I can get my feet wet with? I have been a programmer for some time and have programmed a number of different microcontrollers. FPGA stuff is new to me. I am well versed in electronics and am currently doing some EE work. My goal with FPGA's is to work with multi-agent AI systems in hardware. Systems such as genetic algorithms, classifier systems, neural nets, fuzzy logic engines, etc... Is my understanding of FPGA's proper? Can this type of thing be done? Any comments would be greatly appreciated. Even if I am completely whacked... ;-) Thanks in advance. -- Keith Youngblood lifer@notthistime.invalid To email me, replace domain name with o l y w a DOT n e t (minus the spaces)

"Basuki Endah Priyanto" <EBEPriyanto@ntu.edu.sg> writes: > I have Xilinx virtex2-1000 and Textronix Logic Analyzer. The problem > is the TTL output level from my logic analyzer is 3.8 volt and the > maximum volatge to my Virtex2 is 3.3 volt. I asked: > Are you using the Tek as a pattern generator or something? Normally > a logic analyzer has *inputs*, not outputs. "Basuki Endah Priyanto" <EBEPriyanto@ntu.edu.sg> writes: > yes .. the "Tex" is as a pattern generator. Well, if you're sure it's a "Tex", I can't help you much. We only use "Tek" (Tektronix) gear around here. I've never even heard of "Textronix", but if they made logic analyzers I'd expect Tektronix to sue them for trademark infringement. Anyhow, if your "Tex" doesn't have settings for 3.3V CMOS output from the pattern generator, you may need to kludge up some buffers or quickswitches to do level conversion.

Allan Herriman wrote: > > On Thu, 24 Apr 2003 04:20:49 +1000, Allan Herriman > <allan_herriman.hates.spam@agilent.com> wrote: > > >On 23 Apr 2003 09:54:22 -0700, RISC_taker@alpenjodel.de (RISC taker) > >wrote: > > > >>Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a > >>10-bit unsigned number and 3 is a constant. This has to be done in the > >>same cycle (combinatorial!). Now what's a good way to implement that? > >> > >>I thought of a lookup table (distributed RAM) but this takes quite a > >>lot of space. Any better ideas? (Ray, the arithmetic guru? :-) > >> > >>Do you think I can perform this operation at 200 MHz in a Virtex-II? > >> > >>Thanks! > >>RISC_taker > > > >Hi RISC, > > > >If you break up your 10 bit input word into five 2 bit words, you can > >take the modulus of each (using five pairs of 2 input LUTs), then add > >the results (to get a four bit number), then take the modulus of that. > > > >This works since > > > >a mod (a - 1) = 1 > > > >and > > > >b mod (a - 1) = (a x b) mod (a - 1) > > > >(Think of a as being an even power of 2, which means we don't change > >the modulus if we shift the input by 2 bits.) > > > > > >We can improve the timing by using pairs of 4 input LUTS to take four > >bit slices of your input. > >We then sum the three 2 bit values (using 2 levels of logic) to get a > >4 bit result, then take the modulus of that in another pair of LUTs. > > > >This takes a total of four levels of logic, which should work at > >200MHz in Virtex-II, depending on the speed grade, your patience, etc. > > BTW, the total hardware is 13 LUTs, which might fit into 2 CLBs. > > (Or you could use a block ram, as other posters have suggested.) > > Regards, > Allan. If I understand how this algorithm works, I think the logic can be reduced to 10 LUTs in two levels. Instead of adding the values of the pairs of inputs, group them into even and odd bits. Use the LUTs with the F5 mux to implement the modified two bit sum of five equal value inputs. 
When I say modified, you need to produce a two bit result so logically add the result bit 2^2 back in as another input bit. Five bits in, two bits out. Or you can think of this as a 32 entry truth table. The point is that you only need two outputs from any of these functions to produce a 0, a 1 or a 2. This will use a total of 8 LUTs to give you a two bit even bit sum and a two bit odd bit sum. These four signals can then be run through a pair of LUTs to give you the two bit modulo three result by using a simple truth table. -- Rick "rickman" Collins rick.collins@XYarius.com Ignore the reply address. To email me use the above address with the XY removed. Arius - A Signal Processing Solutions Company Specializing in DSP and FPGA design URL http://www.arius.com 4 King Ave 301-682-7772 Voice Frederick, MD 21701-3110 301-682-7666 FAX
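One way to read this proposal is sketched below: each 5-input table reports how many of its inputs are set, reduced mod 3, and the final 4-input table weights the odd-bit result by 2. The full reduction to 0, 1 or 2 inside the first-level tables is an assumption made for this sketch; the post allows any 2-bit encoding as long as the final table matches it. Function names are illustrative.

```python
def count_mod3(bits):
    """Model of one 5-input, 2-output table: number of set inputs, mod 3."""
    return sum(bits) % 3

def mod3_even_odd(n):
    """n mod 3 for a 10-bit n from separate even-bit and odd-bit counts."""
    even = count_mod3([(n >> b) & 1 for b in range(0, 10, 2)])   # weights 1, 4, 16, ...
    odd = count_mod3([(n >> b) & 1 for b in range(1, 10, 2)])    # weights 2, 8, 32, ...
    return (even + 2 * odd) % 3      # the final 4-input, 2-output truth table

print([mod3_even_odd(n) for n in range(12)])
```

Every even-position bit has a weight that is 1 mod 3 and every odd-position bit a weight that is 2 mod 3, which is why only the two counts matter.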

You are right - it can be done with 2 levels of logic and some F5MUXes (which will be faster than the three levels of logic I proposed above).

Referring to my earlier implementation, you will still need the function digit_mod, and you will also need an equivalent 5 bit function, let's call it five_bit_mod - it will be a 5 bit input, 2 bit output case table like digit_mod (which just lists the mod3 of all 32 5 bit combinations). This SHOULD be implemented as 2*(2*LUT4+F5MUX), if the synthesis tool does its job. The digit_mod should be implemented as 2*LUT4.

wire [1:0] m0, m1;

assign m0 = five_bit_mod(in[4:0]);
assign m1 = five_bit_mod(in[9:5]);

assign out = digit_mod({m1,m0});

This will (oddly) take two more LUTs, but should be faster.

Avrum

"rickman" <spamgoeshere4@yahoo.com> wrote in message news:3EA6EEEF.9EEC9047@yahoo.com...
> Allan Herriman wrote:
> >
> > On Thu, 24 Apr 2003 04:20:49 +1000, Allan Herriman
> > <allan_herriman.hates.spam@agilent.com> wrote:
> >
> > >On 23 Apr 2003 09:54:22 -0700, RISC_taker@alpenjodel.de (RISC taker)
> > >wrote:
> > >
> > >>Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a
> > >>10-bit unsigned number and 3 is a constant. This has to be done in the
> > >>same cycle (combinatorial!). Now what's a good way to implement that?
> > >>
> > >>I thought of a lookup table (distributed RAM) but this takes quite a
> > >>lot of space. Any better ideas? (Ray, the arithmetic guru? :-)
> > >>
> > >>Do you think I can perform this operation at 200 MHz in a Virtex-II?
> > >>
> > >>Thanks!
> > >>RISC_taker
> > >
> > >Hi RISC,
> > >
> > >If you break up your 10 bit input word into five 2 bit words, you can
> > >take the modulus of each (using five pairs of 2 input LUTs), then add
> > >the results (to get a four bit number), then take the modulus of that.
> > >
> > >This works since
> > >
> > >a mod (a - 1) = 1
> > >
> > >and
> > >
> > >b mod (a - 1) = (a x b) mod (a - 1)
> > >
> > >(Think of a as being an even power of 2, which means we don't change
> > >the modulus if we shift the input by 2 bits.)
> > >
> > >
> > >We can improve the timing by using pairs of 4 input LUTS to take four
> > >bit slices of your input.
> > >We then sum the three 2 bit values (using 2 levels of logic) to get a
> > >4 bit result, then take the modulus of that in another pair of LUTs.
> > >
> > >This takes a total of four levels of logic, which should work at
> > >200MHz in Virtex-II, depending on the speed grade, your patience, etc.
> >
> > BTW, the total hardware is 13 LUTs, which might fit into 2 CLBs.
> >
> > (Or you could use a block ram, as other posters have suggested.)
> >
> > Regards,
> > Allan.
>
> If I understand how this algorithm works, I think the logic can be
> reduced to 10 LUTs in two levels.
>
> Instead of adding the values of the pairs of inputs, group them into
> even and odd bits. Use the LUTs with the F5 mux to implement the
> modified two bit sum of five equal value inputs. When I say modified,
> you need to produce a two bit result so logically add the result bit 2^2
> back in as another input bit. Five bits in, two bits out. Or you can
> think of this as a 32 entry truth table. The point is that you only
> need two outputs from any of these functions to produce a 0, a 1 or a
> 2.
>
> This will use a total of 8 LUTs to give you a two bit even bit sum and a
> two bit odd bit sum. These four signals can then be run through a pair
> of LUTs to give you the two bit modulo three result by using a simple
> truth table.
>
> --
>
> Rick "rickman" Collins
>
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY
> removed.
>
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design URL http://www.arius.com
> 4 King Ave 301-682-7772 Voice
> Frederick, MD 21701-3110 301-682-7666 FAX

I take that back - it won't work (sorry). 32 mod 3 is 2, NOT 1, therefore you
cannot use five_bit_mod on bits 9:5.

You can build a different 32->2 function for bits 9:5, but it would be

5'd0: out = 0;
5'd1: out = 2; // the mod3 of 2
5'd2: out = 1; // the mod3 of 4
5'd3: out = 0; // the mod3 of 6
5'd4: out = 2; // the mod3 of 8
etc...

This would account for the 1 bit shift.

Avrum

"Avrum" <avrum@REMOVEsympatico.ca> wrote in message
news:tPCpa.2998$2g5.436588@news20.bellglobal.com...
> You are right - it can be done with 2 levels of logic and some F5MUXes
> (which will be faster than the three levels of logic I proposed above).
>
> Referring to my earlier implementation, you will still need the function
> digit_mod, and you will also need an equivalent 5 bit function, let's call it
> five_bit_mod - it will be a 5 bit input, 2 bit output case table like
> digit_mod (which just lists the mod3 of all 32 5 bit combinations). This
> SHOULD be implemented as 2*(2*LUT4+F5MUX), if the synthesis tool does its
> job. The digit_mod should be implemented as 2*LUT4
>
> wire [1:0] m0, m1;
>
> m0 = five_bit_mod(in[4:0]);
> m1 = five_bit_mod(in[9:5]);
>
> out = digit_mod({m1,m0});
>
> This will (oddly) take two more LUTs, but should be faster.
>
> Avrum
> "rickman" <spamgoeshere4@yahoo.com> wrote in message
> news:3EA6EEEF.9EEC9047@yahoo.com...
> > Allan Herriman wrote:
> > >
> > > On Thu, 24 Apr 2003 04:20:49 +1000, Allan Herriman
> > > <allan_herriman.hates.spam@agilent.com> wrote:
> > >
> > > >On 23 Apr 2003 09:54:22 -0700, RISC_taker@alpenjodel.de (RISC taker)
> > > >wrote:
> > > >
> > > >>Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a
> > > >>10-bit unsigned number and 3 is a constant. This has to be done in the
> > > >>same cycle (combinatorial!). Now what's a good way to implement that?
> > > >>
> > > >>I thought of a lookup table (distributed RAM) but this takes quite a
> > > >>lot of space. Any better ideas? (Ray, the arithmetic guru? :-)
> > > >>
> > > >>Do you think I can perform this operation at 200 MHz in a Virtex-II?
> > > >>
> > > >>Thanks!
> > > >>RISC_taker
> > > >
> > > >Hi RISC,
> > > >
> > > >If you break up your 10 bit input word into five 2 bit words, you can
> > > >take the modulus of each (using five pairs of 2 input LUTs), then add
> > > >the results (to get a four bit number), then take the modulus of that.
> > > >
> > > >This works since
> > > >
> > > >a mod (a - 1) = 1
> > > >
> > > >and
> > > >
> > > >b mod (a - 1) = (a x b) mod (a - 1)
> > > >
> > > >(Think of a as being an even power of 2, which means we don't change
> > > >the modulus if we shift the input by 2 bits.)
> > > >
> > > >
> > > >We can improve the timing by using pairs of 4 input LUTS to take four
> > > >bit slices of your input.
> > > >We then sum the three 2 bit values (using 2 levels of logic) to get a
> > > >4 bit result, then take the modulus of that in another pair of LUTs.
> > > >
> > > >This takes a total of four levels of logic, which should work at
> > > >200MHz in Virtex-II, depending on the speed grade, your patience, etc.
> > >
> > > BTW, the total hardware is 13 LUTs, which might fit into 2 CLBs.
> > >
> > > (Or you could use a block ram, as other posters have suggested.)
> > >
> > > Regards,
> > > Allan.
> >
> > If I understand how this algorithm works, I think the logic can be
> > reduced to 10 LUTs in two levels.
> >
> > Instead of adding the values of the pairs of inputs, group them into
> > even and odd bits. Use the LUTs with the F5 mux to implement the
> > modified two bit sum of five equal value inputs. When I say modified,
> > you need to produce a two bit result so logically add the result bit 2^2
> > back in as another input bit. Five bits in, two bits out. Or you can
> > think of this as a 32 entry truth table. The point is that you only
> > need two outputs from any of these functions to produce a 0, a 1 or a
> > 2.
> >
> > This will use a total of 8 LUTs to give you a two bit even bit sum and a
> > two bit odd bit sum. These four signals can then be run through a pair
> > of LUTs to give you the two bit modulo three result by using a simple
> > truth table.
> >
> > --
> >
> > Rick "rickman" Collins
> >
> > rick.collins@XYarius.com
> > Ignore the reply address. To email me use the above address with the XY
> > removed.
> >
> > Arius - A Signal Processing Solutions Company
> > Specializing in DSP and FPGA design URL http://www.arius.com
> > 4 King Ave 301-682-7772 Voice
> > Frederick, MD 21701-3110 301-682-7666 FAX
> >
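Editor's sketch: the correction in Avrum's post can be checked with a small behavioural model. This is Python, not synthesisable HDL, and the names FIVE_BIT_MOD_HI, mod3_naive and mod3_two_level are made up for illustration; they do not appear in the thread. The assumption is that the "different 32->2 function" for bits 9:5 is simply (2 * v) mod 3, which agrees with the five table entries Avrum lists.

```python
# Behavioural model of the two-level mod-3 scheme (illustrative names only).

# five_bit_mod: plain mod 3 of a 5 bit value, used on bits 4:0.
FIVE_BIT_MOD = [v % 3 for v in range(32)]

# Assumed table for bits 9:5: the field has weight 32, and 32 mod 3 = 2,
# so each entry is (2 * v) mod 3 -> 0, 2, 1, 0, 2, ...
FIVE_BIT_MOD_HI = [(2 * v) % 3 for v in range(32)]

# digit_mod: mod 3 of a 4 bit value, used on {m1, m0}.
DIGIT_MOD = [v % 3 for v in range(16)]


def mod3_naive(n):
    """Original proposal: five_bit_mod on both halves (wrong for bits 9:5)."""
    m0 = FIVE_BIT_MOD[n & 0x1F]
    m1 = FIVE_BIT_MOD[(n >> 5) & 0x1F]
    return DIGIT_MOD[(m1 << 2) | m0]


def mod3_two_level(n):
    """Corrected proposal: separate table for the upper five bits."""
    m0 = FIVE_BIT_MOD[n & 0x1F]
    m1 = FIVE_BIT_MOD_HI[(n >> 5) & 0x1F]
    # {m1, m0} has the value 4*m1 + m0, and 4 mod 3 = 1, so digit_mod
    # of the concatenation is (m1 + m0) mod 3.
    return DIGIT_MOD[(m1 << 2) | m0]


if __name__ == "__main__":
    bad = [n for n in range(1024) if mod3_naive(n) != n % 3]
    good = all(mod3_two_level(n) == n % 3 for n in range(1024))
    print("naive scheme first failure:", bad[0])
    print("corrected scheme matches n % 3 for all 10 bit inputs:", good)
```

In this model the naive version first goes wrong at n = 32, the smallest input with a bit set in the upper field, which is the case the retraction is about.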

Mike Harrison wrote:
>
> On Wed, 23 Apr 2003 16:57:29 +0800, "Joshua Yin" <joshuayin@cytecht.com> wrote:
> <snip>
>
> >Do you really need a microcontroller?
>
> I think the fact that we don't see low pin-count PLDs is that for the vast majority of applications
> that might use one, a micro is a better solution, in terms of cost, flexibility, functionality and
> power consumption.
>
> In most cases the argument would be 'do you really need a PLD?'
> Micros are infinitely more flexible and powerful, usually take a lot less power, and the only reason
> to use a PLD is that a micro isn't fast enough.

You need to be careful to compare like-process devices.

Microcontrollers are highly flexible devices, especially at variable
manipulation, and state-variable designs. That said, they are NOT lower
power (same process), and using software to 'spin fast' to emulate
hardware is inherently inefficient, from a power viewpoint, and also
from a time-domain viewpoint.

Scenix (Ubicom) took the pathway of 'all in SW', and they have very
high Icc levels - this is why most uC have an extensive HW peripheral
array.

A PLD is inherently parallel, is very fast (eg protection), and it does
not need 'SW refresh'. It is also harder to 'crash' a PLD :)

There are many TimerChain / Peripheral IO expansion / FastPWM /
DataPath / Power Management tasks where a low pin count PLD would
complement a uC very nicely, and we do many designs where a uC is used
with one (or more) SPLD/CPLD. The end result is much better than trying
to get the uC to 'be all things' :)

One trend we see with uC is as they get smaller (8/11/14 pin devices),
there is more opening for distributed IO expansion: eg with lower pin
count PLDs.

The PCF8574 is a good IO expansion example, and the prices of these are
HIGHER than many CPLDs, and they are slower, and far less flexible.

Atmel CPLDs have uA region static Icc's, and the newer devices from
Xilinx/Lattice also have uA Icc, but they are following the
speed-dominant path, and there is certainly room for a smaller package
PLD device that follows a uAFrugal-dominant pathway.

Imagine what you could do with a uA PLD that was a morph of a
PCF8563(RTC) / PCF8574 (IOexp) / 4060(Counter) / TinyLogic ?

-jg

On Wed, 23 Apr 2003 15:52:15 -0400, rickman <spamgoeshere4@yahoo.com>
wrote:

>Allan Herriman wrote:
>>
>> On Thu, 24 Apr 2003 04:20:49 +1000, Allan Herriman
>> <allan_herriman.hates.spam@agilent.com> wrote:
>>
>> >On 23 Apr 2003 09:54:22 -0700, RISC_taker@alpenjodel.de (RISC taker)
>> >wrote:
>> >
>> >>Hey, I need to calculate (n mod 3) in a Virtex-II design. n is a
>> >>10-bit unsigned number and 3 is a constant. This has to be done in the
>> >>same cycle (combinatorial!). Now what's a good way to implement that?
>> >>
>> >>I thought of a lookup table (distributed RAM) but this takes quite a
>> >>lot of space. Any better ideas? (Ray, the arithmetic guru? :-)
>> >>
>> >>Do you think I can perform this operation at 200 MHz in a Virtex-II?
>> >>
>> >>Thanks!
>> >>RISC_taker
>> >
>> >Hi RISC,
>> >
>> >If you break up your 10 bit input word into five 2 bit words, you can
>> >take the modulus of each (using five pairs of 2 input LUTs), then add
>> >the results (to get a four bit number), then take the modulus of that.
>> >
>> >This works since
>> >
>> >a mod (a - 1) = 1
>> >
>> >and
>> >
>> >b mod (a - 1) = (a x b) mod (a - 1)
>> >
>> >(Think of a as being an even power of 2, which means we don't change
>> >the modulus if we shift the input by 2 bits.)
>> >
>> >
>> >We can improve the timing by using pairs of 4 input LUTS to take four
>> >bit slices of your input.
>> >We then sum the three 2 bit values (using 2 levels of logic) to get a
>> >4 bit result, then take the modulus of that in another pair of LUTs.
>> >
>> >This takes a total of four levels of logic, which should work at
>> >200MHz in Virtex-II, depending on the speed grade, your patience, etc.
>>
>> BTW, the total hardware is 13 LUTs, which might fit into 2 CLBs.
>>
>> (Or you could use a block ram, as other posters have suggested.)
>>
>> Regards,
>> Allan.
>
>If I understand how this algorithm works, I think the logic can be
>reduced to 10 LUTs in two levels.

Yes, I realised how to do it in 10 LUTs in *three* levels just after I
posted, but by that time Avrum had already posted the equivalent
solution in Verilog so I didn't bother with a retraction.

If anyone is interested, I have appended the equivalent code in VHDL
(with "automatic" depth detection, and without the 32 bit limitation)
to this post.

Your solution using the F5 mux is faster of course (if less portable).
I'll think about adding it to the VHDL when I get some time.

>Instead of adding the values of the pairs of inputs, group them into
>even and odd bits. Use the LUTs with the F5 mux to implement the
>modified two bit sum of five equal value inputs. When I say modified,
>you need to produce a two bit result so logically add the result bit 2^2
>back in as another input bit. Five bits in, two bits out. Or you can
>think of this as a 32 entry truth table. The point is that you only
>need two outputs from any of these functions to produce a 0, a 1 or a
>2.
>
>This will use a total of 8 LUTs to give you a two bit even bit sum and a
>two bit odd bit sum. These four signals can then be run through a pair
>of LUTs to give you the two bit modulo three result by using a simple
>truth table.

Regards,
Allan.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mod3 is
    generic (
        width : positive := 10
    );
    port (
        input  : in  unsigned(width - 1 downto 0);
        output : out unsigned(1 downto 0)
    );
end entity mod3;

architecture rtl of mod3 is

    pure function digit_mod (arg : unsigned(3 downto 0)) return unsigned is
        type mod3_table is array (0 to 15) of unsigned(1 downto 0);
        constant luts : mod3_table := (
            "00","01","10","00",
            "01","10","00","01",
            "10","00","01","10",
            "00","01","10","00"
        );
    begin
        return luts(to_integer(arg));
    end digit_mod;

    pure function work_out_depth (width : positive) return positive is
        variable depth    : integer := 1;
        variable my_count : integer := (width - 1) / 4;
    begin
        while my_count > 0 loop
            depth    := depth + 1;
            my_count := my_count / 2;
        end loop;
        return depth;
    end work_out_depth;

    constant depth : positive := work_out_depth(width);

    type t_unsigned_array is array (depth downto 0) of unsigned(width + 3 downto 0);
    signal unsigned_array : t_unsigned_array := (others => (others => '0'));

begin

    unsigned_array(0)(input'range) <= input; -- zero extend input

    g1: for d in 1 to depth generate
        g2: for w in 0 to (width - 1) / 4 generate
            unsigned_array(d)(2 * w + 1 downto 2 * w) <=
                digit_mod(unsigned_array(d - 1)(4 * w + 3 downto 4 * w));
        end generate g2;
    end generate g1;

    output <= unsigned_array(depth)(1 downto 0);

end architecture rtl;
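
Editor's sketch: the even/odd grouping that rickman describes (and that Allan says he may add to the VHDL later) can be modelled in a few lines to check the arithmetic. This is a Python behavioural model, not HDL, and the names GROUP_SUM, FINAL, gather and mod3_even_odd are hypothetical. It assumes the "modified two bit sum" is the count of set bits in a group of five, reduced mod 3. Even bit positions have weights 1, 4, 16, 64, 256 (all 1 mod 3) and odd positions have weights 2, 8, 32, 128, 512 (all 2 mod 3), so the final table only has to compute (even + 2 * odd) mod 3.

```python
# Behavioural model of the even/odd bit grouping (illustrative names only).

# 32 entry truth table: five equal-weight bits in, two bits out.
# Assumed meaning of the "modified sum": number of set bits, mod 3.
GROUP_SUM = [bin(v).count("1") % 3 for v in range(32)]

# 16 entry final table indexed by {odd[1:0], even[1:0]}.
# Even bits weigh 1 (mod 3) and odd bits weigh 2 (mod 3).
FINAL = [((idx & 3) + 2 * (idx >> 2)) % 3 for idx in range(16)]


def gather(n, start):
    """Pack bits start, start+2, ..., start+8 of n into a 5 bit value."""
    out = 0
    for k in range(5):
        out |= ((n >> (start + 2 * k)) & 1) << k
    return out


def mod3_even_odd(n):
    """n mod 3 for a 10 bit n, using two group tables and one final table."""
    even = GROUP_SUM[gather(n, 0)]  # bits 0, 2, 4, 6, 8
    odd = GROUP_SUM[gather(n, 1)]   # bits 1, 3, 5, 7, 9
    return FINAL[(odd << 2) | even]


if __name__ == "__main__":
    ok = all(mod3_even_odd(n) == n % 3 for n in range(1024))
    print("even/odd scheme matches n % 3 for all 10 bit inputs:", ok)
```

Each GROUP_SUM output bit is a 5 input function, which is where the 2*LUT4 + F5 mux per bit (8 LUTs for both groups) in rickman's count comes from; FINAL is a 4 input, 2 output function, i.e. the last pair of LUTs.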
