I too suspect a timing problem. First, you are using a very large device with very little of it used, and since the router does not get any constraints it probably spreads the logic all around, so you have quite large register-to-register delays. Second, as data_out is cleared by the reset signal, the data_out register cannot be pushed into the I/O pads, and you might end up with a problem between r3 and data_out. In a Xilinx FPGA the natural directions are horizontal for data and vertical for control signals. This means P&R forces the data path from the left to the right side of the device and ends up with large routing delays.
- Try constraining the design (UCF)
- keep the reset off data_out
- or select a smaller array
- or force the inputs and outputs onto the same corner, close to each other
I hope it will help
-- Use our news server 'news.foorum.com' from anywhere. More details at: http://nnrpinfo.go.foorum.com/
Article: 43801
hi, I have an application in mind which needs a lot of RAM (maybe SDRAM is a good solution), I need a lot of flash for program code, and I can also dump the configuration data for the FPGA into the flash. I need a lot of glue logic for other dedicated ICs (CAN controllers, ethernet controller, ...) (I thought of using IP cores for these, but the number of systems we intend to make is too low for this (approx 100 pcs / yr)). I also have to integrate a 150k-gate custom logic block that needs to communicate with the CPU. So in this case, where the logic content is already 200k gates, I think I can afford the extra 100k gates for the processor if this means that I can benefit from the flexibility (more or less the flexibility of added timers, serial ports, dual-ported RAMs (= integrated RAM blocks), ...). Our customers (and sales people) never agree on the currently implemented features; specialties are our normal business. Another nice thing about the use of an FPGA as a processor is that the board layout can be very easy, because you can place the I/O where you want and you can even duplicate the address/data bus for better routing. The only disadvantage I see with the use of flash as a configuration memory is that you need an extra "configuration EPLD", although this can be a very small one. Stefaan
"rickman" <spamgoeshere4@yahoo.com> wrote in message news:3CFA3688.8E661E69@yahoo.com...
> Janusz Raniszewski wrote: > > > > Hi, > > > > > There may also be soft core starting points for this, plus you > > > have reference silicon to compare with :-), and the option of a > > > hard core, with FLASH, which is likely to be more economical > > > than a soft core. > > > > > > -jg > > > > I think in prototypes or small production runs the soft core is more economical. Why?
> > - It is possible to unify the device hardware and change it in soft logic
> > - First of all it reduces the cost of board design and manufacturing
> > - It reduces the cost of the chips around the processor
> > - It is possible to design new commands or processor hardware
> > - The board design is simpler because all the chips around the processor are in the FPGA structure
> > - FPGA chips have more pins than a typical hard processor, which eases the design of the device
> > - Getting a new device working is quicker
> > - It is possible to design a new device architecture, e.g. a delta-sigma DAC converter, which in traditional technique requires many chips or an inflexible hard-core DAC solution
> > - Generally a soft core increases the freedom of the design process and reduces the cost of bringing up a new device
> >
> > One disadvantage is the difficulty of protecting the design from copying.
> >
> > JanuszR
> You mention cost as two of the reasons for using a soft CPU, but I think > you will find that the cost of a CPU includes a lot of peripherals that > are not digital in nature and therefore have to be added back to an FPGA > implementation. The clock is one, power on reset, ADC/DAC and analog > comparators are others. Flash and large RAM are other things missing > from FPGAs, they have a few kBytes in the small ones at best. The large > ones cost big, big $$$. By the time you have added back the missing > functions, you will likely have added more $$ and size than you would > have saved with the single MCU and a smaller FPGA.
> > Of course some of the other issues you have raised are not mitigated by > the MCU. Adding commands is the only one that especially seems to stand > out however.
> > The idea that a soft core makes the design process easier is not > especially valid. Just the fact that you need to design a new C > compiler seems to make this harder. Until the new compiler has been > worked enough to fully debug, it will be a major liability.
> > The idea that an FPGA has more pins is not valid if you combine an MCU > with an FPGA of a size that matches the task. Keeping them separate > helps to match the hardware to the task. If I use a soft CPU I may need > to buy a huge FPGA to get enough RAM for my task while only needing a > small amount of logic.
> > Starting a design more quickly is a feature of using an MCU. I can > start writing and testing code as soon as I have hardware while an FPGA > design is still messing with P&R. I can then integrate my FPGA design > once I have the basic MCU working. I can even use the MCU as a debug > tool for the FPGA.
> > So unless you have a unique design that needs little RAM and nearly no > analog functions, you will do well to use one of the many MCUs on the > market. They have been designed for tight integration of peripherals and > cost effective deployment. They are hard to beat.
> > > -- > > Rick "rickman" Collins > > rick.collins@XYarius.com > Ignore the reply address. To email me use the above address with the XY > removed.
> > Arius - A Signal Processing Solutions Company > Specializing in DSP and FPGA design URL http://www.arius.com > 4 King Ave 301-682-7772 Voice > Frederick, MD 21701-3110 301-682-7666 FAX
Article: 43802
Eyal Shachrai (eyals@hywire.com) wrote: : does anyone know of an elegant way to divide a number : of 21 bits by 5 ?
Eyal, I've outlined the way I do this for various fixed divisors below, I hope it is useful. I also hope I am not doing this in some silly way... How precise do you need the result to be? Is the divisor fixed at 5? If the divisor is fixed, you are better off thinking of the operation as a 'multiply by 1/5' rather than a 'divide by 5'. If this is the case, you can do the 'divide' as a multiply by 0.2, which you can decompose into a series of bit shifts (not really existent in an FPGA...) and additions. 0.2*X can be approximated to within ~0.02% as x/8 + x/16 + x/128 + x/256 + x/2048 + x/4096. I am doing something similar on data with a highish data rate (~100MHz). If it is hard to meet the timing requirements of your design adding all 6 signals in one clock cycle, the additions can be performed in pairs, pipelined. (how I am doing it) If you have a lower speed requirement, you could use bit serial arithmetic, which would result in a highly reduced logic usage. This may be a daft question, but are you sure you can't just divide by 4 and compensate somewhere else? Probably not! Also, I note you are using a Virtex-II - this leads me on to a question / idea..... I am working on something where a low latency divider (<10 cycles) with variable coefficients (a/b) could be a 'magic bullet' - b is ~16 bits wide, a ~10. Is there any reason I can't use the embedded multiplier, connected to a BlockRAM acting as a look up table for the coefficients, to convert a divider of X into a multiplier of 1/X? I understand this will have serious scaling issues as the number of bits of accuracy required by 'b' rises, mind... If you have a situation of a/b, where a has a high data rate, and b changes infrequently, perhaps c=1/b could be generated with bit serial arithmetic, and a fast parallel multiplier used to calculate a*c. Comments appreciated!
Regards, Chris Saunter
Article: 43803
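To make the six-term decomposition above concrete, here is a minimal pipelined sketch (my own illustration, not Chris's actual code). The six terms work out to x*819/4096, about 0.02% low; the sum is built at 4096x scale so that only the final shift truncates, and the module name, register names and the pairing of the adders are assumptions.

module div5_approx (
    input  wire        clk,
    input  wire [20:0] x,
    output reg  [18:0] y          // ~ x/5, roughly 0.02% low plus truncation
);
    // Stage 1: the six shifted terms, summed in pairs at 4096x scale
    // (x/8 -> x<<9, x/16 -> x<<8, ... x/4096 -> x<<0).
    reg [30:0] p0, p1, p2;
    // Stage 2: combine the first two pairs; delay the third to stay aligned.
    reg [30:0] s0, p2_d;

    always @(posedge clk) begin
        p0   <= ({10'b0, x} << 9) + ({10'b0, x} << 8);   // x/8   + x/16
        p1   <= ({10'b0, x} << 5) + ({10'b0, x} << 4);   // x/128 + x/256
        p2   <= ({10'b0, x} << 1) +  {10'b0, x};         // x/2048 + x/4096
        s0   <= p0 + p1;
        p2_d <= p2;
        y    <= (s0 + p2_d) >> 12;                       // remove the 4096x scale
    end
endmodule

A fully parallel single-cycle version is just the same sum without the intermediate registers; pairing the adders across pipeline stages only matters when the six-input add will not make timing.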
I don't know if there's an elegant solution, because you need to multiply by 1/5, which is a number that can't be represented exactly in a binary number of finite digits (I don't think). However, 51/256ths is pretty close: 0.1992. When this is expressed in binary, it is 0.00110011, which has only four ones so you can do the multiplication with four shifted adds (if you don't have a multiplier). So the optimized code would look something like:
wire [20:0] in; // number to be divided
wire [20:0] result = (in + (in<<1) + (in<<4) + (in<<5)) >> 8;
The right shift is to get the radix point in the correct place. As an example, we'll try 21: 21 + 42 + 336 + 672 >> 8 = 1071 >> 8 = 4.18. Since we made the result an integer, the result would be truncated and only the '4' would be retained. You can keep some fraction if you like. You can pipeline the adders too, for more speed. -Kevin
"Eyal Shachrai" <eyals@hywire.com> wrote in message news:70029bf5.0206030753.79ed5416@posting.google.com... > does anyone know of an elegant way to divide a number > of 21 bits by 5 ? > please note that I'm using xilinx's virtex 2 > and mentor's leonardo for synthesys.
Article: 43804
"Michael Boehnel" <boehnel@iti.tu-graz.ac.at> schrieb im Newsbeitrag news:3CFB4771.F64A2370@iti.tu-graz.ac.at... > Hello! > > Is it possible to kill (thermically destroy) an FPGA by a highly > optimized design (hand-placed; high-density; litte unrelated logic) ;-)))) Nice phrase. (My design is too good for the technology nowadays ) If you have a good (optimized) design, wouldnt it dissipate LESS power?? > assuming that interface lines are OK/room temperature? > > Did anybody observe such a behavior? Hmm, no. The IOs of FPGAs are really though guys, even a short for hours doest damage them too much, I heard. But for a medium sized (lets say 200k gates) FPGA, its hard to overheat them with a normal design, unless you turn them into a 10.000 stage shift register and clock them with 200 MHz. I did this with a Spartan-II 100, draws ~2.7 W, gets real hot in a PQ208 but doesnt melt (at least not after 30s of my testing) With the big guys (1M gates++), there are good chances to fry the FPGA, since power density is much bigger. -- MfG FalkArticle: 43805
"harkirat" <i1073@yahoo.com> schrieb im Newsbeitrag news:e3e8e2b7.0206021520.4592ebc3@posting.google.com... > Hi All:) > Can anyone tell me how i can get the B5 Spartan 2 > board(http://www.burched.com.au/B5Spartan2.html)to communicate with a > 68HC11 motorola EVB11 board that is to say what would be needed to > interface the two so that they can communicate to each other There are many ways to rome. What do you mean with communication? Via SPI? So you need a SPI target inside your FPGA. Or memory mapped onto the processor bus? So you have to connect the Adress/Data Bus plus the control lines (RD/WR etc) -- MfG FalkArticle: 43806
"John Williams" <j2.williams@qut.edu.au> schrieb im Newsbeitrag news:3CFB0167.C3CF2E15@qut.edu.au... > Hi, > > I'm experimenting with some pipeline architectures to speed up some of > my designs. I'm targetting a Virtex 300K (speed grade 4) using XST > under ISE4.2i. > > For a test, I created a 4 stage pipeline that does nothing, just passes > data from the inputs, through 4 registers, then off to the outputs, all > synched on a common clock. There is also a ready output that goes high > once the pipeline is full, which would be used like a clock enable > signal for a downstream module. > > I synthesised targetting maximum speed, and am doing post-P&R sims to > see just how fast I could clock this thing. What I found is that at low > clock speeds (<20 MHz), it behaves as expected. However, if I sim it Hmm, 20 Mhz is no problem, you can have tons of logic levels. What does P&R say? Have you entered a timing constraint for your clock in the UCF? NET my_clock period ns; Then after P&R, you will see how fast yur design can run. > at, say, 50MHz, there's no output for the first 10 cycles then all of a > sudden it starts working, with the appropriate delay. This is rather > disconcerting, I would have thought either it works or it doesn't. The > core VHDL code is attached - any tips appreciated. Also, any Be carefull, in a post P&R simulation, there is a gloabl reset signal, that helds a logic in a reset state. It takes somewhere 200ns (dont know how much excatly, but less than a microsecond) to release the reset signal. -- MfG FalkArticle: 43807
Kevin, I think at 21 bits resolution, this is a really big error (0.02...)......but......
I thought of using Newton's method, whereby you initially "guess" by shifting the 21 bit number to the right (divide by four), and then subtract another shifted value (divide by 8). (Your first guess could be just 1/4 the original value.) The first guess is then multiplied by 5 (easy to do, shift by two to the left (X4) and add to the original guess). If it is larger, you shift the guess by two right (divide) and add or subtract from the original guess (save the new add/sub value), and repeat the mult by 5. Also save the running corrected guess (which at the end is the answer). At each compare, you continue to divide the add/sub value by 2, getting closer to the final answer. At each successive compare, you are converging on the solution. After 20 cycles, you have converged on the answer to the required 21 bits of resolution.
I think this is 21 cycles; each cycle is two shifts, add, compare, two shifts, then add or subtract. If each operation takes a clock cycle, that is 7*21 clocks. I am sure with some pipelining it can be done in less.
Austin
Kevin Neilson wrote: > I don't know if there's an elegant solution, because you need to multiply by > 1/5, which is a number that can't be represented exactly in a binary number > of finite digits (I don't think). However, 51/256ths is pretty close: > 0.1992. When this is expressed in binary, it is 0.00110011, which has only > four ones so you can do the multiplication with four shifted adds (if you > don't have a multiplier). > > So the optimized code would look something like: > > wire [20:0] in; // number to be divided > wire [20:0] result = (in + (in<<1) + (in<<4) + (in<<5)) >> 8; > > The right shift is to get the radix point in the correct place. As an > example, we'll try 21: > 21 + 42 + 336 + 672 >> 8 > = 1071 >> 8 = 4.18. > > Since we made the result an integer, the result would be truncated and only > the '4' would be retained. You can keep some fraction if you like. You can > pipeline the adders too, for more speed. > > -Kevin > > "Eyal Shachrai" <eyals@hywire.com> wrote in message > news:70029bf5.0206030753.79ed5416@posting.google.com... > > does anyone know of an elegant way to divide a number > > of 21 bits by 5 ? > > please note that I'm using xilinx's virtex 2 > > and mentor's leonardo for synthesys.
Article: 43808
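If it helps to see the iterative idea in RTL, below is a rough successive-approximation sketch along the lines Austin describes: test one quotient bit per cycle, from the MSB down, and keep the bit whenever 5x the trial guess still fits under the dividend. The module name, widths and control handshake are my own assumptions, and it follows the bit-per-cycle view rather than Austin's exact shift/add schedule.

module div5_sar (
    input  wire        clk,
    input  wire        rst,
    input  wire        start,
    input  wire [20:0] x,        // dividend
    output reg  [18:0] q,        // quotient, floor(x/5)
    output reg         done
);
    reg [20:0] xr;
    reg [4:0]  i;                // index of the quotient bit under test
    reg        busy;

    // trial quotient with bit i set, and 5*trial kept wide enough
    wire [18:0] q_try  = q | (19'd1 << i);
    wire [23:0] five_q = {3'b000, q_try, 2'b00} + {5'b00000, q_try};  // 4*q_try + q_try

    always @(posedge clk) begin
        if (rst) begin
            q <= 19'd0; xr <= 21'd0; i <= 5'd0;
            busy <= 1'b0; done <= 1'b0;
        end else begin
            done <= 1'b0;
            if (start && !busy) begin
                xr   <= x;
                q    <= 19'd0;
                i    <= 5'd18;            // start at the quotient MSB
                busy <= 1'b1;
            end else if (busy) begin
                if (five_q <= {3'b000, xr})
                    q <= q_try;           // keep the bit: 5*trial still fits
                if (i == 5'd0) begin
                    busy <= 1'b0;
                    done <= 1'b1;         // 19 cycles after start
                end else begin
                    i <= i - 5'd1;
                end
            end
        end
    end
endmodule

Compared with the pipelined multiply-by-reciprocal approaches elsewhere in this thread it is small, but it takes 19 clocks per result rather than one result per clock.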
hi John, see some notes in the design below "John Williams" <j2.williams@qut.edu.au> wrote in message news:3CFB0167.C3CF2E15@qut.edu.au... > Hi, > > I'm experimenting with some pipeline architectures to speed up some of > my designs. I'm targetting a Virtex 300K (speed grade 4) using XST > under ISE4.2i. > > For a test, I created a 4 stage pipeline that does nothing, just passes > data from the inputs, through 4 registers, then off to the outputs, all > synched on a common clock. There is also a ready output that goes high > once the pipeline is full, which would be used like a clock enable > signal for a downstream module. > > I synthesised targetting maximum speed, and am doing post-P&R sims to > see just how fast I could clock this thing. What I found is that at low > clock speeds (<20 MHz), it behaves as expected. However, if I sim it > at, say, 50MHz, there's no output for the first 10 cycles then all of a > sudden it starts working, with the appropriate delay. This is rather > disconcerting, I would have thought either it works or it doesn't. The > core VHDL code is attached - any tips appreciated. Also, any > recommended resources (online or print) for learning the nitty gritty of > pipelined architecture design? > > Cheers, > > John > > entity pipeline is > Port (clk : in std_logic; > reset : in std_logic; > data_in : in std_logic_vector(7 downto 0); > data_out : out std_logic_vector(7 downto 0); > rdy : out std_logic); > end pipeline; > > architecture Behavioral of pipeline is > > signal r1,r1_next : std_logic_vector(7 downto 0); just to tidy up: you dont use R1_next so erase it. > signal r2,r2_next : std_logic_vector(7 downto 0); > signal r3,r3_next : std_logic_vector(7 downto 0); > signal r4,r4_next : std_logic_vector(7 downto 0); you dont use r4 either > signal counter,counter_next: integer range 0 to 5; and you dont use counter next > > begin > > -- connections between registers > r2_next <= r1; > r3_next <= r2; > r4_next <= r3; > > -- register process > process(clk,reset) > begin > if(reset='1') then > r1 <= (others => '0'); > r2 <= (others => '0'); > r3 <= (others => '0'); > data_out <= (others => '0'); > elsif(clk'event and clk='1') then > r1 <= data_in; > r2 <= r2_next; > r3 <= r3_next; > data_out <= r4_next; i'd rewrite this r1 <= data_in; r2 <= r1; r3 <= r2; data_out <= r3; > end if; > end process; > > -- counter update process > process(clk,reset) > begin > if(reset='1') then > counter <= 0; you have forgotten the reset of rdy. > elsif(clk'event and clk='1') then > if(counter<3) then > counter <= counter+1; > rdy <= '0'; > else > counter <= 3; your value of counter cannot be moved once it reaches 3. is this right? it might make some kind of feed back mux here, because you have a combinational loop with counter. if you are using asynchronous reset, you maybe better off renaming this (i.e. add a port input reset_pipeline) else you could get a combinational reset across your device (not good). > rdy <= '1'; > end if; > end if; > end process; > > end Behavioral; I assume that you are using modelsim as you use ISE4.2, it can be a pain to do a post p&r sim: make sure its taking the correct version of the code!!! in most cases a manual compile and load helps this Hope that helps -- Benjamin Todd European Organisation for Particle Physics SL SPS/LHC -- Control -- Timing Division CERN, Geneva, Switzerland, CH-1211 Building 864 Room 1 - A24Article: 43809
Are you just looking for a USB download cable (e.g. Xilinx Multilinx), or an FPGA protoboard that has a USB connector on it? "Kyle Davis" <kyledavis@nowhere.com> wrote in message news:<g%gK8.3805$Zd.261084368@newssvr13.news.prodigy.com>... > Hi folks, > I am looking for FPGA board that use USB port or IEEE 1394 (Firewire) for > downloading to the chip. My notebook only comes with USB and IEEE1394 port > so using FPGA board that only use parallel or serial port won't work! > > Thanks in advance!Article: 43810
What was the expected operating frequency as calculated by the tool? John Williams <j2.williams@qut.edu.au> wrote in message news:<3CFB0167.C3CF2E15@qut.edu.au>... > Hi, > > I'm experimenting with some pipeline architectures to speed up some of > my designs. I'm targetting a Virtex 300K (speed grade 4) using XST > under ISE4.2i. > > For a test, I created a 4 stage pipeline that does nothing, just passes > data from the inputs, through 4 registers, then off to the outputs, all > synched on a common clock. There is also a ready output that goes high > once the pipeline is full, which would be used like a clock enable > signal for a downstream module. > > I synthesised targetting maximum speed, and am doing post-P&R sims to > see just how fast I could clock this thing. What I found is that at low > clock speeds (<20 MHz), it behaves as expected. However, if I sim it > at, say, 50MHz, there's no output for the first 10 cycles then all of a > sudden it starts working, with the appropriate delay. This is rather > disconcerting, I would have thought either it works or it doesn't. The > core VHDL code is attached - any tips appreciated. Also, any > recommended resources (online or print) for learning the nitty gritty of > pipelined architecture design? > > Cheers, > > John > > entity pipeline is > Port (clk : in std_logic; > reset : in std_logic; > data_in : in std_logic_vector(7 downto 0); > data_out : out std_logic_vector(7 downto 0); > rdy : out std_logic); > end pipeline; > > architecture Behavioral of pipeline is > > signal r1,r1_next : std_logic_vector(7 downto 0); > signal r2,r2_next : std_logic_vector(7 downto 0); > signal r3,r3_next : std_logic_vector(7 downto 0); > signal r4,r4_next : std_logic_vector(7 downto 0); > > signal counter,counter_next: integer range 0 to 5; > > begin > > -- connections between registers > r2_next <= r1; > r3_next <= r2; > r4_next <= r3; > > -- register process > process(clk,reset) > begin > if(reset='1') then > r1 <= (others => '0'); > r2 <= (others => '0'); > r3 <= (others => '0'); > data_out <= (others => '0'); > elsif(clk'event and clk='1') then > r1 <= data_in; > r2 <= r2_next; > r3 <= r3_next; > data_out <= r4_next; > end if; > end process; > > -- counter update process > process(clk,reset) > begin > if(reset='1') then > counter <= 0; > elsif(clk'event and clk='1') then > if(counter<3) then > counter <= counter+1; > rdy <= '0'; > else > counter <= 3; > rdy <= '1'; > end if; > end if; > end process; > > end Behavioral;Article: 43811
In article <adgdkh$114bvl$1@ID-84877.news.dfncis.de>, Falk Brunner <Falk.Brunner@gmx.de> wrote: >> Is it possible to kill (thermally destroy) an FPGA by a highly >> optimized design (hand-placed; high-density; little unrelated logic) > >;-)))) Nice phrase. (My design is too good for the technology nowadays ) >If you have a good (optimized) design, wouldn't it dissipate LESS power??
Not if, in the process of hand optimization, you really bump up the clock. But even then, it's pretty hard to get above 110 MHz on the 0.22 µm parts, 160-170 MHz on the 0.18 µm parts.
>> assuming that interface lines are OK/room temperature? >Hmm, no. The IOs of FPGAs are really tough guys, even a short for hours >doesn't damage them too much, I heard. >But for a medium sized (let's say 200k gates) FPGA, it's hard to overheat them >with a normal design, unless you turn them into a 10,000-stage shift >register and clock them with 200 MHz. I did this with a Spartan-II 100, >draws ~2.7 W, gets real hot in a PQ208 but doesn't melt (at least not after >30s of my testing)
Well, SII-100 is "Small" these days.
>With the big guys (1M gates++), there are good chances to fry the FPGA, >since power density is much bigger.
It's pretty easy to dissipate ~1.5W or so with a ~1000 slice design, so on a bigger FPGA, 20W+ should be straightforward to draw, which WILL require some significant cooling to keep from melting the chip.
-- Nicholas C. Weaver nweaver@cs.berkeley.edu
Article: 43812
Was the CF card at 5V and the FPGA at 3.3V? jetmarc@hotmail.com (jetmarc) wrote in message news:<af3f5bb5.0206011953.44d7423a@posting.google.com>... > Hi. I'm prototyping a new design with AT40K FPGA and > CompactFlash memory card. The CompactFlash is being > talked to in the "common memory mode", meaning that > the data is not addressed with ADDRx lines but read by > consecutive read cycles to one single address. > > The CompactFlash card has an internal address pointer > that increments after each read cycle. > > My circuit doesn't work. After reading 512 bytes I > find that some are missing, and the buffer is padded > with dummy bytes. > > Obviously the CompactFlash card has taken a few of > the read cycles for two (double-clocked on the falling > /OE edge), so that its internal address has reached > the end while my FPGA was still reading more data. > > I spent hours already, trying to remove this problem. > But no avail. I tried all software options that the > AT40K gives (io pin slew rate, pullup/pulldown). I > added external RC filters on the /OE, /CS, /WE pins > (making the problem worse). I inserted schmitt trigger > '244 bus drivers in the signal path. > > On the scope, the signals look OK, but it is only a > 5ns/200MHz model. > > The problem only occurs when there is a lot of change > on the 8bit data bus. Reading a sector with all 0x00 > or all 0xff is possible (without errors). The data > signals are far away from the clock signals (/OE, /CS, > /WE) both on the (short) cable and the FPGA pinout. > > The same card (with same cable) has previously worked > fine on another prototype. That prototype used 5.0v > (while the new one uses 3.3V) and connected the CF card > to an ATmega microcontroller (while the new one has an > AT40K FPGA). > > I believe that the AT40K IO pin driver generates noisy > signals, at least more noisy than the ATmega. I don't > know how to fix that. Schmitt trigger buffers didn't help. > > Do you have an idea what I can try to fix it? > > MarcArticle: 43813
jerry1111 wrote: > > > Starting a design more quickly is a feature of using an MCU. I can > > start writing and testing code as soon as I have hardware while an FPGA > > design is still messing with P&R. I can then integrate my FPGA design > > once I have the basice MCU working. I can even use the MCU as a debug > > tool for the FPGA. > > > > So unless you have a unique design that needs little RAM and nearly no > > analog functions, you will do well to use one of the many MCUs on the > > market. They have been designed for tight integration of perpherals and > > cost effective deployment. They are hard to beat. > > But, when you develop one device, after a month you must > develop other device, you can use the same board with the same > uC (I assume that you invest some time and/or money to make own > board with CPU-able FPGA with at least flash, ram and reset). > My work is (generally) developing controlling systems which are producted > in small quantities & I save much, much time by making ONE uC board for > almost all designs (it's small, about 6x6cm). > > This is the point, where savings can be done. > I know, I could make a board with uC and FPGA, but it still needs more than > one piece of silicon. It's simpler to keep all the logic in one reconfigurable device > (when I made the board, I've forgotten where is my soldering tool ;)) I solder i Quartus ;)) > > jerry I just don't follow your point. Either way, it is multiple chips and it is one design. With the soft CPU you have an FPGA, Flash and likely RAM along with various analog chips such as power on reset, etc. With an MCU the Flash and RAM are in the chip unless you need huge amounts of RAM. The MCU also comes complete with full power on/off reset (brown out) ADC/DAC and other functions as well. Where is the advantage of the soft CPU other than being able to change the CPU instruction set (which is VERY hard to support in the compiler, which is where this thread started... "how do I get a custom C compiler?"). -- Rick "rickman" Collins rick.collins@XYarius.com Ignore the reply address. To email me use the above address with the XY removed. Arius - A Signal Processing Solutions Company Specializing in DSP and FPGA design URL http://www.arius.com 4 King Ave 301-682-7772 Voice Frederick, MD 21701-3110 301-682-7666 FAXArticle: 43814
Hi, Interesting discussion, a couple of points:
Prager Roman <rprager@frequentis.com> wrote in message news:<ad74mv$leh$1@frqvie15ux.frequentis.frq>... > > If you add some PIOs (especially with interrupt ability) you soon end up > with 2000 LEs. However, if you also want to use internal RAM/ ROM, nearly > all of the APEX is used for NIOS. > On a recent board I have an APEX 20K400E in the 672pin- BGA- package, but > I do not have enough pins left for SRAM and Flash for NIOS. Thats why I > decided to use the system flash where the APEX image is stored and use > internal flash memory and RAM for Nios. But then it gets very large ( about > 7000 LEs).
The APEX chips are really scant when it comes to memory, but the memory that is there is of a much higher speed compared to traditional processor instruction memory... running anything beyond a basic software application out of FPGA memory can be likened to running a car on rocket fuel :) Since there isn't much of it (1 LE = 1 register, and then there are the embedded memory arrays, ESBs I think they call them), memory on an APEX chip for storing code is very expensive - better to find a way to scrape up the I/Os for a single tri-state bus interface! The literature I've read says that the newer architecture (Stratix) has much more embedded memory (~10x more than an APEX chip of the same density).
> > You can use the NIOS to write the new image. There should be a description > in the Nios documentation how to write such a routine, or maybe there is > already some example included. >
Check out the GERMS monitor.. yes, the default boot monitor included with the kit. The source code will be under your project's SDK directory\lib folder, called "nios_germs_monitor.s". This monitor program can read in the Motorola S-record standard (usually to download volatile code into SRAM for debugging & testing). When you make a ".flash" image, the s-rec you're working with has GERMS monitor commands appended to it (to erase & program flash). Check it out!
- Jesse
Article: 43815
Sometimes it IS neccessary to run post place and route timing simulatuion to verify the functionality. One of my customers requires me to use some auto-generated VHDL which I am not allowed to change. This code contains a multi-cycle path (as may other code originally designed for ASIC). This path requires upto 8 clock cycles from input to output. The post P&R sim is required to verify that the data strobe is delayed in its pipeline by more than the data is delayed in the multi-cycle path. (Sim without timing would indicate correct operation even if there was only one cycle delay on the data strobe, which in reality is incorrect). Regards, Ken Morrow, Morrow Electronics Limited, Milton Keynes, UK. Allan Herriman <allan_herriman.hates.spam@agilent.com> wrote in message news:3cfb78c0.35718610@netnews.agilent.com... > On Mon, 03 Jun 2002 02:46:57 GMT, Ray Andraka <ray@andraka.com> wrote: > > >Main reason is just that you have the mapped output without going through the > >Xilinx tools too. A functional sim using the timing output is about the same > >simulation time from what I have seen (I don't often go either place). > > Thanks Ray, I thought they'd be about the same speed. > > Our standard build scripts generate the post-PAR VHDL, so that's why > I've only ever used that. > > Regards, > Allan. > > >Allan Herriman wrote: > > > >> On Sat, 1 Jun 2002 17:18:03 +0000 (UTC), nweaver@CSUA.Berkeley.EDU > >> (Nicholas Weaver) wrote: > >> > >> >In article <3cf8fc35.13519329@netnews.agilent.com>, > >> >Allan Herriman <allan_herriman.hates.spam@agilent.com> wrote: > >> >>>Ray Andraka proposed to do post-mapping simulation to veryfy the sythesis > >> >>>result, since a post-maping is (much?) faster than post P&R. > >> >> > >> >>Why is it faster? The simprim blocks take most of the simulation > >> >>time, and these will be the same post map and post PAR. > >> >>(Note that you *don't* have to load the SDF if you are doing a post > >> >>PAR functional simulation.) > >> > > >> >I think because you end up ignoring all the timing info (which ends up > >> >being pretty substantial post-routing. > >> > >> I thought it was possible to ignore all timing in the post-PAR VHDL. > >> You don't have to load the SDF and (in Modelsim) you can use the > >> +notimingchecks command line option to turn off the VITAL timing. > >> > >> I haven't ever done a post-map sim to know if there's a difference or > >> not. Does anyone have any quantitative results? > >> > >> Allan. > > > >-- > >--Ray Andraka, P.E. > >President, the Andraka Consulting Group, Inc. > >401/884-7930 Fax 401/884-7950 > >email ray@andraka.com > >http://www.andraka.com > > > > "They that give up essential liberty to obtain a little > > temporary safety deserve neither liberty nor safety." > > -Benjamin Franklin, 1759 > > > > >Article: 43816
You can do serial division if you can spend a few cycles getting your answer. If you need the result pipelined, multiply by 1/5. Since the fraction is a little more than 20'h33333 / 21'h100000, you can do some addition stages. To account for the rounding problems (1) from truncating the accumulator and (2) from the fraction resolution, you need to make some adjustments. You can play with the number to figure out a "best fit". Since you probably need at least 18 bits you'd need a minimum of 3 additions, making a 32 bit equivalent fraction just as easy to implement. I've had trouble getting a balance between truncating the adders mid-stream and rounding but the following scheme appears to work. In Verilog, some of the elements might look like:
input clk;
input [20:0] in;
reg [22:0] x3;
reg [26:0] x33;
reg [21:0] x3333;
reg [18:0] result;
always @(posedge clk)
begin
    x3     <= (in<<1) + in;                       // 3 * in
    x33    <= (x3<<4) + x3;                       // 51 * in
    x3333  <= (x33 + (x33>>8)) >> 5;              // ~1.6 * in  (in * 'h3333 / 2^13)
    result <= (x3333 + (x3333>>16) + 1) >> 3;     // ~in / 5, with rounding
end
Eyal Shachrai wrote: > does anyone know of an elegant way to divide a number > of 21 bits by 5 ? > please note that I'm using xilinx's virtex 2 > and mentor's leonardo for synthesys.
Article: 43817
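For the "serial division" option mentioned at the top of that post, a restoring (shift-and-subtract) divider is one way to spend those few cycles; the sketch below is my own illustration with made-up port names, not code from the thread. It produces one quotient bit per clock, 21 clocks for a 21-bit dividend, and gives the remainder for free.

module div5_serial (
    input  wire        clk,
    input  wire        rst,
    input  wire        start,
    input  wire [20:0] x,
    output reg  [20:0] q,        // quotient, floor(x/5)
    output reg  [2:0]  r,        // remainder, 0..4
    output reg         done
);
    reg [20:0] xr;               // dividend bits, consumed MSB first
    reg [4:0]  cnt;
    reg        busy;

    // next partial remainder: shift in the current MSB of xr
    wire [3:0] r_shift = {r, xr[20]};

    always @(posedge clk) begin
        if (rst) begin
            q <= 21'd0; r <= 3'd0; xr <= 21'd0;
            cnt <= 5'd0; busy <= 1'b0; done <= 1'b0;
        end else begin
            done <= 1'b0;
            if (start && !busy) begin
                xr   <= x;
                q    <= 21'd0;
                r    <= 3'd0;
                cnt  <= 5'd21;
                busy <= 1'b1;
            end else if (busy) begin
                if (r_shift >= 4'd5) begin
                    r <= r_shift - 4'd5;      // fits in 3 bits: r_shift is at most 9
                    q <= {q[19:0], 1'b1};
                end else begin
                    r <= r_shift[2:0];
                    q <= {q[19:0], 1'b0};
                end
                xr  <= {xr[19:0], 1'b0};
                cnt <= cnt - 5'd1;
                if (cnt == 5'd1) begin
                    busy <= 1'b0;
                    done <= 1'b1;
                end
            end
        end
    end
endmodule

This is the classic long-division recurrence, so it also generalises to any fixed or variable divisor by widening r and the compare.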
Jake, The 3.1i Aldec Schematics are not compatible with the ISE version of 4.1i/4.2i. There is a Schematic Editor that comes with the ISE tool. 4.1i/4.2i also comes with StateCAD (for state machines), HDL Bencher (an easy way to make HDL testbenches), and a whole other array of new and updated tools (i.e., iMPACT instead of JTAG Programmer). The 4.1i/4.2i tools and upgrades will be needed to target the newer technologies as they become available. For more information and pricing, Xilinx telesales can be reached at: 1-800-888-FPGA (3742) or email fpga@xilinx.com. If you would like to take a tour of the new features in 4.1i/4.2i, the following website has information on them and also links to more in-depth information: http://www.xilinx.com/ise/products/foundation_config.htm
Regards, Brian
Article: 43818
Brian Philofsky wrote: > > The Xilinx VHDL timing netlists have a global set/reset signal that is > defaultly held for 100 ns. That means for the first 100 ns, the entire > design is held in a reset state and therefore does not do anything. Hold off > inputing data (you can continue running the clock, just no data) until at > least 100 ns and I'll bet it will work as you expect. Thanks Brian, you were spot on. A followup question, related to the setup/hold time comments some folks made: In the Clock Contraints dialog in HDL bencher you can specifiy the clock period, and also setup/hold times. I've currently got the clock high and low periods at 10 ns each (thus 50MHz clock), and the input setup time / output valid delay I've put 5 ns (for want of any better idea). During post P&R sim I get warnings like: # ** Warning: */X_SUH SETUP High VIOLATION ON I WITH RESPECT TO CLK; # Expected := 5.426 ns; Observed := 5 ns; At : 145 ns # Time: 145 ns Iteration: 2 Instance: /testbench/uut/gsuh_en_clk It seems that this warning is related to the setup/hold times I've specified - any advice on what causes this and how I can fix up my testbench to more accurately reflect "reality"? Thanks again to all who responded to my original question. Regards, JohnArticle: 43819
Hi:) I'm trying to implement a genetic algorithm on the FPGA board, which will do control computations based on parameters fed to it by the 68HC11 EVB (which in turn takes its input, the motor speed, from a motor) via the serial interface, and then sends the control signal back to the EVB (and hence to the motor). The FPGA board doesn't have any facility for doing this. I was wondering what components I need to make it communicate with the EVB via its serial interface. I found this peripheral connector http://www.burched.com.au/B5PeripheralConnectors.html on the FPGA manufacturer's website. I'm not too familiar with electronics, I'm a mechanical guy..:o) Could you tell me if that would do the trick? Thanx a heap:) Warm Regards Harkirat
"Falk Brunner" <Falk.Brunner@gmx.de> wrote in message news:<adgdkk$114bvl$2@ID-84877.news.dfncis.de>... > "harkirat" <i1073@yahoo.com> wrote in message > news:e3e8e2b7.0206021520.4592ebc3@posting.google.com... > > Hi All:) > > Can anyone tell me how i can get the B5 Spartan 2 > > board(http://www.burched.com.au/B5Spartan2.html)to communicate with a > > 68HC11 motorola EVB11 board that is to say what would be needed to > > interface the two so that they can communicate to each other > > There are many roads to Rome. What do you mean by communication? > Via SPI? Then you need an SPI target inside your FPGA. > Or memory mapped onto the processor bus? Then you have to connect the > Address/Data bus plus the control lines (RD/WR etc.)
Article: 43820
Without multiplying per se, consider that to 32 bits, 1/5 = 858993459/2^32 (rounded). The numerator is 3 * 17 * 257 * 65537. Each of these factors has two non-zero bits. So you get:
n := n + (n<<1);
n := n + (n<<4);
n := n + (n<<8);
n := n + (n<<16);
return n>>32;
Of course, rounding and moving the final shift to among the prior four can make for a smaller implementation. But that's essentially 4 pipelined adders.
Jim Horn
Article: 43821
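As a rough illustration of the chain Jim describes (not his code), a pipelined version with one registered stage per factor could look like the sketch below. The widths and the small rounding bias before the final shift are my own choices; the bias is there so that exact multiples of 5 do not come out one short after truncation.

module div5_recip (
    input  wire        clk,
    input  wire [20:0] x,
    output reg  [18:0] y              // ~ floor(x/5), 5 cycles of latency
);
    reg [22:0] n3;                    // x * 3
    reg [26:0] n51;                   // x * 51
    reg [34:0] n13107;                // x * 13107
    reg [50:0] n858993459;            // x * 858993459 (~ x * 2^32 / 5)

    always @(posedge clk) begin
        n3         <= x      + (x      << 1);
        n51        <= n3     + (n3     << 4);
        n13107     <= n51    + (n51    << 8);
        n858993459 <= n13107 + (n13107 << 16);
        y          <= (n858993459 + 51'd4194304) >> 32;   // add bias, then truncate
    end
endmodule

For this 21-bit input range any bias from roughly 2^19 up to about 2^29 gives exact results for multiples of 5; 2^22 here is an arbitrary pick inside that window.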
> I hadn't found this anywhere (forgive the redundancy if someone else > already did), so I put together a port of the NIOS GNU Pro environment > for Linux. > > There are three files available: > > http://www.cryptoapps.com/~chris/NIOS/nios-linux-src.tar.bz2
Hi, I can't compile the SDK in a Cygwin environment. The "configure" command completes correctly, but "make install" reports an error. Has anybody compiled the SDK?
JanuszR
Article: 43822
A slightly simpler approach may be to notice that 1/5 = 3/16 + 3/256 + 3/4096 + ... 3/(2^(4n))... = .0011001100110011001100110011001100110011.... ---- -------- ---------------- -------------------------------- ---------------------------------------------------------------- First approximation is 3/16, a shift, add, and a shift. Take result, shift down by four and add again, this gives 8 bits precision. Take result, shift down by eight, add again, 16 bits of precision. Take result, shift down by sixteen, add again, 32 bits of precision. Thus Four adds => 32-bits of precision Five adds => 64 bits ... This is a general technique that can be used for _any_ _fixed_ divisor, as there is always a repeating pattern... 1/5 is nice illustrative case, 1/3 and 1/7 also work well. HTH Austin Lesea <austin.lesea@xilinx.com> wrote in message news:<3CFBB6A3.59598A79@xilinx.com>... > Kevin, > > I think at 21 bits resolution, this is a really big error > (0.02...)......but...... > > I thought of using Newton's method, whereby you initially "guess" by shifting > the 21 bit number to the right (divide by four), and then subtract another > shifted value (divide by 8). > > (Your first guess could be just 1/4 the original value). > > The first guess is then multiplied by 5 (easy to do, shift by two to the left > (X4) and add to the original guess). If it is larger, you shift the guess by > two right (divide) and add or subtract from the orgginal guess (save the new > add/sub value), and repeat the mult by 5. Also save the running corrected guess > (which at the end is the answer). > > At each compare, you continue to divide the add/sub value by 2, getting closer > to the final answer. > > At each successive compare, you are converging on the solution. After 20 > cycles, you have converged on the answer to the required 21 bits of resolution. > > I think this is 21 cycles, each cycle is a two shifts, add, compare, two shifts, > then add or subtract. If each operation takes a clock cycle, that is 7*21 > clocks. I am sure with some pipelining it can be done in less. > > Austin > > Kevin Neilson wrote: > > > I don't know if there's an elegant solution, because you need to multiply by > > 1/5, which is a number that can't be represented exactly in a binary number > > of finite digits (I don't think). However, 51/256ths is pretty close: > > 0.1992. When this is expressed in binary, it is 0.0110011, which has only > > four ones so you can do the multiplication with four shifted adds (if you > > don't have a multiplier). > > > > So the optimized code would look something like: > > > > wire [20:0] in; // number to be divided > > wire [20:0] result = (in + (in<<1) + (in<<4) + (in<<5)) >> 8; > > > > The right shift is to get the radix point in the correct place. As an > > example, we'll try 21: > > 21 + 42 + 336 + 672 >> 8 > > = 1071 >> 8 = 4.18. > > > > Since we made the result an integer, the result would be rounded and only > > the '4' would be retained. You can keep some fraction if you like. You can > > pipeline the adders too, for more speed. > > > > -Kevin > > > > "Eyal Shachrai" <eyals@hywire.com> wrote in message > > news:70029bf5.0206030753.79ed5416@posting.google.com... > > > does anyone know of an elegant way to divide a number > > > of 21 bits by 5 ? > > > please note that I'm using xilinx's virtex 2 > > > and mentor's leonardo for synthesys.Article: 43823
That's pretty smooth. "John" <john.l.smith@titan.com> wrote in message news:5b9931fd.0206031557.3502685b@posting.google.com... > A slightly simpler approach may be to notice that > > 1/5 = 3/16 + 3/256 + 3/4096 + ... 3/(2^(4n))... > > = .0011001100110011001100110011001100110011.... > ---- > -------- > ---------------- > -------------------------------- > ---------------------------------------------------------------- > > First approximation is 3/16, a shift, add, and a shift. > > Take result, shift down by four and add again, this gives 8 bits precision. > Take result, shift down by eight, add again, 16 bits of precision. > Take result, shift down by sixteen, add again, 32 bits of precision. > > Thus Four adds => 32-bits of precision > > Five adds => 64 bits ... > > This is a general technique that can be used for _any_ > _fixed_ divisor, as there is always a repeating pattern... > 1/5 is nice illustrative case, 1/3 and 1/7 also work well. > HTH > > > Austin Lesea <austin.lesea@xilinx.com> wrote in message news:<3CFBB6A3.59598A79@xilinx.com>... > > Kevin, > > > > I think at 21 bits resolution, this is a really big error > > (0.02...)......but...... > > > > I thought of using Newton's method, whereby you initially "guess" by shifting > > the 21 bit number to the right (divide by four), and then subtract another > > shifted value (divide by 8). > > > > (Your first guess could be just 1/4 the original value). > > > > The first guess is then multiplied by 5 (easy to do, shift by two to the left > > (X4) and add to the original guess). If it is larger, you shift the guess by > > two right (divide) and add or subtract from the orgginal guess (save the new > > add/sub value), and repeat the mult by 5. Also save the running corrected guess > > (which at the end is the answer). > > > > At each compare, you continue to divide the add/sub value by 2, getting closer > > to the final answer. > > > > At each successive compare, you are converging on the solution. After 20 > > cycles, you have converged on the answer to the required 21 bits of resolution. > > > > I think this is 21 cycles, each cycle is a two shifts, add, compare, two shifts, > > then add or subtract. If each operation takes a clock cycle, that is 7*21 > > clocks. I am sure with some pipelining it can be done in less. > > > > Austin > > > > Kevin Neilson wrote: > > > > > I don't know if there's an elegant solution, because you need to multiply by > > > 1/5, which is a number that can't be represented exactly in a binary number > > > of finite digits (I don't think). However, 51/256ths is pretty close: > > > 0.1992. When this is expressed in binary, it is 0.0110011, which has only > > > four ones so you can do the multiplication with four shifted adds (if you > > > don't have a multiplier). > > > > > > So the optimized code would look something like: > > > > > > wire [20:0] in; // number to be divided > > > wire [20:0] result = (in + (in<<1) + (in<<4) + (in<<5)) >> 8; > > > > > > The right shift is to get the radix point in the correct place. As an > > > example, we'll try 21: > > > 21 + 42 + 336 + 672 >> 8 > > > = 1071 >> 8 = 4.18. > > > > > > Since we made the result an integer, the result would be rounded and only > > > the '4' would be retained. You can keep some fraction if you like. You can > > > pipeline the adders too, for more speed. > > > > > > -Kevin > > > > > > "Eyal Shachrai" <eyals@hywire.com> wrote in message > > > news:70029bf5.0206030753.79ed5416@posting.google.com... 
> > > > does anyone know of an elegant way to divide a number > > > > of 21 bits by 5 ? > > > > please note that I'm using xilinx's virtex 2 > > > > and mentor's leonardo for synthesys.Article: 43824
Falk Brunner (Falk.Brunner@gmx.de) wrote: : Hmm, no. The IOs of FPGAs are really tough guys, even a short for hours : doesn't damage them too much, I heard. : But for a medium sized (let's say 200k gates) FPGA, it's hard to overheat them : with a normal design, unless you turn them into a 10,000-stage shift : register and clock them with 200 MHz. I did this with a Spartan-II 100, : draws ~2.7 W, gets real hot in a PQ208 but doesn't melt (at least not after : 30s of my testing) : With the big guys (1M gates++), there are good chances to fry the FPGA, : since power density is much bigger.
I have it on good authority that connecting a 3.3 volt 1M gate part to a flakey connector that shorts I/O pads to 5 volts WILL destroy the FPGA. And in the true interest of science, the folks that performed that little experiment then verified that it was reproducible.
John Eaton