Some time ago I wrote a small program that does configuration in serial mode (but not JTAG) through the printer port. If you're interested, drop me a note.

Martin

"Juha Pajunen" <juha.pajunen@bitboys.com> wrote in message news:4b980638.0207190558.37890b2d@posting.google.com...
> Hi All,
>
> I am planning to write my own stand-alone software
> that programs an Altera APEX via the ByteBlasterMV
> with *.SOF and *.POF (FlexChain and JTAG)
> files, so I do not need Altera Quartus II
> for sending data to the device. (I can use my HW design
> without the huge Quartus II software...)
>
> So, I have been looking all over the WWW to find
> information about how Quartus does the programming
> and how the signals act on the ByteBlasterMV
> cable.
>
> Can you help me find some kind of timing
> waveform / wave diagram where I can start to learn
> the ByteBlasterMV "protocol"?
>
> Is it possible to write that kind of program...?
>
> If there is existing software I am also interested
> in those *.EXE files.
>
> Thank you very much and have a nice weekend :-)
>
> Sincerely,
> Juha Pajunen, Hw Engineer

Article: 45576
Interesting idea. IIRC, the Altera cascade chain was inferred by Synplicity pretty well if we used predecoded enables.

I am curious about the rotator you mention. You said that you could implement a 64 bit rotator in 192 slices (384 LUTs?) with a standard method and 66 slices (132 LUTs) with an optimal technique. I can only picture an N/2 x log(N) array of 2:1 muxes where N is 64 bits. This gives 256 LUTs, which is neither of your answers. Even if you find a way to use an extra embedded 2:1 mux in the slice, that would only bring it down to about 171 LUTs and would not change the architecture at 8 levels of logic.

Care to share your techniques, both the large and the small one?

John_H wrote:
>
> The carry chain in the Xilinx part can do the same thing as the Altera cascade
> chain if I recall correctly. If the Xilinx MUXCY element passes a 1 on the carry
> and a zero when the LUT result is false, you get a wide AND cascade chain. Wide
> word muxes can still take N/2 LUTs in the Xilinx architecture independent of which
> method you use. The cascade chain would probably need a manual instantiation in
> Xilinx, possibly in Altera. A 4-1 mux ends up being the same in either
> architecture, really: 2 LUTs. The rotator I was talking about ends up beating
> out the cascade approach significantly in either architecture.
>
> rickman wrote:
> >
> > I am reaching back now, but I seem to remember that when it came to
> > implementing muxes the Altera parts (maybe only the 10K parts) have a
> > "cascade" backbone in each group of LEs that allows them to do very fast
> > muxes as well as AND-OR or just wide AND type logic. The cascade logic
> > is a two input AND (or is it an OR?) gate that combines the cascade
> > input with the LUT output. Although the delays are additive, they are
> > very short like a carry chain and can frequently beat an equivalent tree
> > mux.
> >
> > But to use the cascade chain for a mux you need to change your logic to
> > use decoded enables rather than encoded selects. The number of LEs for
> > the mux then becomes N/2 where N is the number of inputs. This can be
> > very optimal for wide word muxes where decoding the enables uses
> > much less logic than what is saved in the mux.
> >
> > I don't remember Synplify doing a great job of synthesis with these
> > structures. It may have worked well if you used a particular coding
> > style. But otherwise it would only use two LEs instead of the four or
> > five that were optimal.

--
Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 45577
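The decoded-enable mux rickman describes in the post above can be modelled behaviourally to see why it costs N/2 LEs. This Python sketch is not vendor code: it assumes 4-input LUTs, so each LUT absorbs two (enable, data) pairs, and it assumes an OR-type cascade element (rickman himself was unsure whether it is an AND or an OR; with an AND you would use the inverted form). All names are hypothetical.

```python
# Behavioural sketch only: a wide mux built from decoded (one-hot) enables,
# combined AND-OR style the way a cascade chain would, versus a plain
# encoded-select mux. Assumes 4-input LUTs and an OR-type cascade.

def decode(sel: int, n: int) -> list[int]:
    """One-hot enables: enables[k] is 1 only when sel == k."""
    return [1 if k == sel else 0 for k in range(n)]

def mux_encoded(data: list[int], sel: int) -> int:
    """Reference mux driven by the encoded select."""
    return data[sel]

def mux_and_or(data: list[int], enables: list[int]) -> int:
    """Each LUT takes two (enable, data) pairs (4 inputs); the cascade
    element ORs each LUT output into the running chain result."""
    chain = 0
    for k in range(0, len(data), 2):
        lut_out = enables[k] & data[k]
        if k + 1 < len(data):
            lut_out |= enables[k + 1] & data[k + 1]
        chain |= lut_out
    return chain

def les_for_mux(n: int) -> int:
    """LEs in the chain: one per pair of inputs, i.e. N/2 rounded up."""
    return (n + 1) // 2
```

The decode itself is extra logic, which is why rickman notes the approach pays off mainly for wide word muxes, where one set of enables is shared across many bits.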
Hi,

This is the add/sub part of the ALU from the risc5x core on OpenCores, where you can also get the package and the generic VHDL / simulation model. The output is A + B, A - B or A. The trick is to force a one on the carry-in when doing a subtract. Logic usage: 1 slice for every 2 bits.

Some people argue (rightly so) that this level of code is unreadable. True, it is, but you can build up a library of these things which have been simulated to death (I have simulation models of LUT4, MUXCY etc.) and then just use them when you need them.

hope this helps,
Mike.

--
-- Risc5x
-- www.OpenCores.Org - November 2001
--
--
-- This library is free software; you can distribute it and/or modify it
-- under the terms of the GNU Lesser General Public License as published
-- by the Free Software Foundation; either version 2.1 of the License, or
-- (at your option) any later version.
--
-- This library is distributed in the hope that it will be useful, but
-- WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-- See the GNU Lesser General Public License for more details.
--
-- A RISC CPU core.
--
-- (c) Mike Johnson 2001. All Rights Reserved.
-- mikej@<NOSPAM>opencores.org for support or any other issues.
--
-- Revision list
--
-- version 1.0 initial opencores release
--

library ieee;
  use ieee.std_logic_1164.all;
  use ieee.std_logic_arith.all;
  use ieee.std_logic_unsigned.all;

--
-- op <= A +/- B or A
--
entity ADD_SUB is
  generic (
    WIDTH : in natural := 8
    );
  port (
    A          : in  std_logic_vector(WIDTH-1 downto 0);
    B          : in  std_logic_vector(WIDTH-1 downto 0);
    ADD_OR_SUB : in  std_logic; -- high for DOUT <= A +/- B, low for DOUT <= A
    DO_SUB     : in  std_logic; -- high for DOUT <= A - B, low for DOUT <= A + B
                                -- keep low when ADD_OR_SUB is low: it also drives carry(0)
    CARRY_OUT  : out std_logic_vector(WIDTH-1 downto 0);
    DOUT       : out std_logic_vector(WIDTH-1 downto 0)
    );
end;

use work.pkg_xilinx_prims.all;

library ieee;
  use ieee.std_logic_1164.all;
  use ieee.std_logic_arith.all;
  use ieee.std_logic_unsigned.all;

architecture VIRTEX of ADD_SUB is

  signal lut_op      : std_logic_vector(WIDTH-1 downto 0);
  signal mult_and_op : std_logic_vector(WIDTH-1 downto 0);
  signal carry       : std_logic_vector(WIDTH downto 0);
  signal op_int      : std_logic_vector(WIDTH-1 downto 0);

  -- RLOC row for bit i: two bits per slice, bit 0 in the highest-numbered row
  function loc(i : integer) return integer is
  begin
    return (((WIDTH+1)/2)-1) - i/2;
  end loc;

begin
  -- forcing a one into the carry-in turns A + (not B) into A - B
  carry(0) <= DO_SUB;

  INST : for i in 0 to WIDTH-1 generate
    attribute RLOC of u_lut : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute RLOC of u_1   : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute RLOC of u_2   : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute RLOC of u_3   : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute INIT of u_lut : label is "C66C";
  begin
    -- propagate term: A xor B (add), A xor (not B) (subtract), A (pass-through)
    u_lut : LUT4
      --pragma translate_off
      generic map (
        INIT => str2slv(u_lut'INIT)
        )
      --pragma translate_on
      port map (
        I0 => ADD_OR_SUB,
        I1 => A(i),
        I2 => B(i),
        I3 => DO_SUB,
        O  => lut_op(i)
        );

    -- value the carry mux selects when the propagate term is low: A(i) gated by ADD_OR_SUB
    u_1 : MULT_AND
      port map (
        I0 => ADD_OR_SUB,
        I1 => A(i),
        LO => mult_and_op(i)
        );

    u_2 : MUXCY
      port map (
        DI => mult_and_op(i),
        CI => carry(i),
        S  => lut_op(i),
        O  => carry(i+1)
        );

    u_3 : XORCY
      port map (
        LI => lut_op(i),
        CI => carry(i),
        O  => op_int(i)
        );
  end generate;

  CARRY_OUT <= carry(WIDTH downto 1);
  DOUT <= op_int;
end Virtex;
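Mike's slice logic can be checked bit by bit. The Python below is an illustrative model, not a replacement for the simulation models he mentions; it assumes the usual primitive behaviour (LUT4 output = INIT bit at index I3*8 + I2*4 + I1*2 + I0, MULT_AND = I0 AND I1, MUXCY passes CI when S is high and DI otherwise, XORCY = LI XOR CI). The function names are hypothetical.

```python
# Illustrative bit-level model of the ADD_SUB generate loop, assuming the
# usual primitive behaviour:
#   LUT4     -> bit (I3*8 + I2*4 + I1*2 + I0) of INIT
#   MULT_AND -> I0 and I1
#   MUXCY    -> CI when S = 1, else DI
#   XORCY    -> LI xor CI

INIT = 0xC66C  # the u_lut INIT attribute from the VHDL

def lut4(init: int, i0: int, i1: int, i2: int, i3: int) -> int:
    return (init >> (i3 * 8 + i2 * 4 + i1 * 2 + i0)) & 1

def add_sub(a: int, b: int, add_or_sub: int, do_sub: int,
            width: int = 8) -> tuple[int, int]:
    """Return (DOUT, CARRY_OUT) as integers, one loop pass per bit slice."""
    carry = do_sub                  # carry(0) <= DO_SUB
    dout = 0
    carry_out = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        lut_op = lut4(INIT, add_or_sub, a_i, b_i, do_sub)   # u_lut
        mult_and_op = add_or_sub & a_i                      # u_1
        next_carry = carry if lut_op else mult_and_op       # u_2 (MUXCY)
        dout |= (lut_op ^ carry) << i                       # u_3 (XORCY)
        carry_out |= next_carry << i                        # carry(i+1)
        carry = next_carry
    return dout, carry_out
```

Decoding INIT = C66C this way shows the LUT produces A xor B for an add, A xor (not B) for a subtract and plain A for the pass-through case. It also shows why DO_SUB has to stay low when ADD_OR_SUB is low: DO_SUB also feeds carry(0), so `add_sub(0, 0, 0, 1)` returns 1 rather than 0.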
<SNIP>

Article: 45578
Thank you guys for your valuable comments.

Dmitri

dmitrik@mailandnews.com (Dmitri Katchalov) wrote in message news:<3db7c986.0207250834.7ae051c6@posting.google.com>...
> I'm trying to synthesize a simple ALU.

Article: 45579
A couple of comments for points that were not fully addressed.

Dmitri Katchalov wrote:
>
> Hi,
>
> I'm new to FPGA. I'm trying to replicate the PIC16Fxxx core as an exercise
> (any real programmer should write at least one OS and compiler :)
>
> I'm trying to synthesize a simple ALU. I'm using VHDL and XST (WebPack).
> Target is Spartan-IIE. It sort of works but is rather inefficient.
> At first I tried a big case statement for all ALU operations.
> XST happily infers lots of built-in macros (one for each ALU op)
> and a huge output mux. For example it produces 6 carry-chain adders
> (one each for ADD, SUB, INC, DEC and another two to get the
> half-carry bit for ADD/SUB) where I would think one is enough.
>
> I've narrowed the problem down to a simple adder/subtractor:
>
>   if add='1' then
>     Y <= A + B;
>   else
>     Y <= A - B;
>   end if;
>
> This works fine and produces a single 8-bit adder/subtractor, 4 slices in total.
> But this does not give me the carry/borrow bit.
>
>   if add='1' then
>     Y <= ('0' & A) + ('0' & B);
>   else
>     Y <= ('0' & A) - ('0' & B);
>   end if;
>
> produces an 8-bit adder with carry-out, a separate 9-bit subtractor and
> a 9-bit 2x1 mux: 9 slices. I tried different variations of the above
> with the same results.
>
> Finally I have come up with the following code.
> It uses the fact that A-B = A + (-B) = A + ((not B) + 1).
>
>   variable tmp: integer;
>   variable cin: std_logic;
>
>   if op = '1' then
>     tmp := conv_integer(B);
>     cin := '0';
>   else
>     tmp := conv_integer(not B);
>     cin := '1';
>   end if;
>
>   Y <= conv_std_logic_vector(conv_integer(A) + tmp + conv_integer(cin), 9);
>
> This infers one "9-bit adder carry in" and eight 2x1 muxes and takes only 4 slices.
> Much better. One small detail: if I declare cin as integer instead
> of converting it from std_logic at the last step, I'm back to 9 slices.
>
> Now the questions.
>
> * Am I on the right track?

Yes, but this will be somewhat compiler dependent.

> * I'm trying to describe purely combinatorial logic here. The output
> is supposed to be the same fixed boolean function of inputs no matter
> how it is described. Why such big variations (more than 2 times the area)?
> Is this a problem with the tool or are they all like that?

As you said, "any real programmer should write at least one OS and compiler", so try writing code to translate this stuff into hardware, or even just figuring out how you would. It is not so easy. Compilers are simple in comparison.

> * Should I be tweaking XST settings instead? Is there a magic setting
> like "Do what I mean not what I say" :)

No. Issues with carry and the like are not easy, since different chips deal with them differently, so the compiler needs to be able to map to different architectures.

> * Xilinx lib has "8bit adder carry out" but it doesn't seem to have
> "8bit subtractor borrow out". Is this right?

Don't know, but as you found, an adder and a subtractor are the same thing with inverters on one input and the carries.

> * How do I get the half-carry bit out of the 8bit adder? I guess I can
> instantiate/infer two separate 4bit adders. Is there a better way?

The last time I tried to get a carry out of the middle of a carry chain, I found that the Xilinx architecture does not support that without breaking the carry chain. So it will need to be done with two 4-bit adders, as you say.

> * What's the story with IEEE.std_logic.SIGNED vs .UNSIGNED? I heard that
> they are mutually exclusive and math operations produce different
> results depending on which one is in use. Webpack automatically inserts
> IEEE.STD_LOGIC_UNSIGNED.ALL at the beginning of every VHDL source it
> creates. Should I always use UNSIGNED?

Neither of these libraries is an IEEE standard. They are Synopsys proprietary IIRC. So avoid using them and use the "numeric_std" library instead:

  use IEEE.NUMERIC_STD.all;

> * Is there a decent on-line reference for all those IEEE.* libraries?
> I've found several good VHDL tutorials but none of them covers
> std_logic in detail.

If you find one, let us all know. Type conversion is the only thing I have trouble with in VHDL. I recently worked with some Verilog people and could not convince them that VHDL was even viable because of all the issues created by strong typing. Verilog is much like C and lets you do anything you want, no matter how stupid or wrong. But then in a year of coding, I only made two mistakes from that and it was the same mistake twice! Sometimes I am a little slow to learn :)

--
Rick "rickman" Collins

Article: 45580
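Dmitri's trick from the thread above, A - B = A + (not B) + 1, can be checked exhaustively for 8 bits. This Python sketch mirrors his `tmp`/`cin` variables (the function names are hypothetical). Bit 8 of the result is the carry for an add and the inverted borrow (1 = no borrow) for a subtract, which is what rickman means by an adder and a subtractor being the same thing with inverters on one input and on the carries.

```python
# Sketch of Dmitri's single-adder add/subtract: subtract by adding the
# inverted operand with a carry-in of 1. Function names are hypothetical.

WIDTH = 8
MASK = (1 << WIDTH) - 1

def alu_addsub(a: int, b: int, op_add: bool) -> int:
    """Return the 9-bit result, like conv_std_logic_vector(..., 9)."""
    tmp = b if op_add else (~b & MASK)   # conv_integer(B) or conv_integer(not B)
    cin = 0 if op_add else 1             # cin := '0' or '1'
    return a + tmp + cin

def split(result: int) -> tuple[int, int]:
    """(8-bit Y, bit 8): carry for an add, NOT borrow for a subtract."""
    return result & MASK, (result >> WIDTH) & 1
```

The largest possible value is 255 + 255 + 1 = 511, so the result always fits in the 9 bits Dmitri asks for.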
"Falk Brunner" <Falk.Brunner@gmx.de> wrote in message news:ahtt4h$vv1v5$1@ID-84877.news.dfncis.de...
> "suchitra" <ssbhide@rediffmail.com> wrote in message
> news:110cc2fe.0207262205.2e37143c@posting.google.com...
> > hello all
> > i just wanted to know whether a 555 can be used as a clock input for CPLDs if
> > the frequency is very low, something like 1 Hz.
>
> Hmm, the datasheet says something about 100..300 ns rise/fall time. This is
> terribly slow for an FPGA. You should use a 74xx14 (Schmitt trigger) after
> the NE555 to get fast edges.

An RC oscillator can be implemented with one 74HC14 inverter, followed by another of its inverters as a buffer.

Leon
--
Leon Heller, G1HSM
leon_heller@hotmail.com
http://www.geocities.com/leon_heller

Article: 45581
First to Jeff:

Jeff, I have been working with the OpenCores PCI core at www.opencores.org. It consists of about 50 Verilog modules and contains both a target and a master implementation. It is also a bridge between PCI and something called Wishbone, which is a SoC (System-on-Chip) type of internal bus. I have gotten it to synthesize with ISE 4.2 just fine and load into a VirtexE, where I am currently reading/writing configuration and memory spaces. It looks like a very reasonable way to implement a PCI target.

Dear Kevin:

If you don't mind, I would be most appreciative if you would e-mail me the Verilog code you described in your last post. It would be interesting to compare it with the OpenCores PCI implementation to see differences in design philosophy. Having two PCI implementations to compare strikes me as very useful in trying to understand this somewhat complicated concept. My e-mail address is cfk@pacbell.net.

Charles Krinke

"Kevin Brace" <killspam4kevinbraceusenet@killspam4hotmail.com> wrote in message news:ahn2ha$9if$1@newsreader.mailgate.org...
>
>
> Jeff Reeve wrote:
> >
> > I'm looking for a synthesizable 32-bit 33MHz PCI Target only design to be
> > placed into an FPGA or large CPLD. Minimal implementation is fine. Does
> > anybody know if such a thing is available in VHDL or Verilog and is open
> > sourced? I seem to recall Xilinx publishing a target only design quite some
> > time ago but I can no longer find it on their web site.
> >
> > Any help is much appreciated!
> > Jeff
>
>
> This is what you are probably talking about.
>
> ftp://ftp.xilinx.com/pub/applications/pci/
> ftp://ftp.xilinx.com/pub/applications/pci/00_index.htm
>
>
> For some reason, a Verilog version of the reference design is missing,
> but if you want it I can E-mail it to you (Some kind, long time Xilinx
> user sent it to me.).
> I also believe Lattice Semiconductor and Quicklogic also have their own
> PCI reference design (I know the Lattice one is written in Verilog, but
> not sure about the Quicklogic one.).
> However, here is a caveat of using reference designs offered by
> device manufacturers.
> Even if the design is written in a device independent form (Uses generic
> Verilog or VHDL statements, and no vendor specific primitives.), when
> using reference designs offered by device manufacturers, you are often
> legally required to use the reference designs on their devices.
> Opencores.org also has a free PCI IP core, but it is a lot more
> complex (Supports initiator and target transfers.) than any of the above
> mentioned reference designs, so I feel like you will likely have a hard
> time modifying it to suit your own needs.
> When modifying a PCI interface, PCI specification Appendix B's
> state machine examples and the following article may be helpful.
>
> http://www.eedesign.com/editorial/1995/fpgafeature9502.html
>
>
>
> Kevin Brace (In general, don't respond to me directly, and respond
> within the newsgroup.)

Article: 45582
Dear Broto:

I can definitely tell you that the top.v that comes with the OpenCores PCI interface will synthesize with all of its sub-modules and load into both a Spartan and a VirtexE, as I have done both.

I have seen this "Unable to combine" message a couple of times in the last month or so, and it invariably had to do with my defining either two gates trying to drive the same IOB, or both a GCK input and a normal IOB input trying to come from the same pin.

Go back to the original top.v that came with the OpenCores PCI interface, synthesize that and then add your changes. Somewhere along the way, the problem will become obvious.

Charles

>
> BROTO Laurent wrote:
> >
> > Hi!
> > I've succeeded in synthesizing the OpenCores PCI IP core and now I am trying to build
> > a top level with this core and another one.
> > I can synthesize without problems, but when WebPack tries to map this top level,
> > I get the following error:
> >
> > ERROR:Pack:1107 - Unable to combine the following symbols into a single IOB
> > component:
> > PAD symbol "CLK" (Pad Signal = CLK)
> > BUF symbol "CLK_IBUF" (Output Signal = CLK_IBUF)
> > Each of the following constraints specifies an illegal physical site for a
> > component of type IOB:
> > Symbol "CLK" (LOC=C11)
> > Please correct the constraints accordingly.
> > Problem encountered during the packing phase.
> >
> > I would like to know how I can solve this problem.
> >
> > Thanks,
> >
> > BROTO Laurent

Article: 45583
rickman wrote:
> A couple of comments for points that were not fully addressed.
>

I think there is an adder/subtractor in the Coregen, if you insist on using a generated core.

> > * Xilinx lib has "8bit adder carry out" but it doesn't seem to have
> > "8bit subtractor borrow out". Is this right?
>
> Don't know, but as you found, an adder and a subtractor are the same
> thing with inverters on one input and the carries.
>
> > * How do I get the half-carry bit out of the 8bit adder? I guess I can
> > instantiate/infer two separate 4bit adders. Is there a better way?

It can be done, but it takes a little mind-bending. Basically, you need to turn your 8-bit adder into a 9-bit one, with bit 4 being a dummy, so that you can pull the carry out through that bit. It takes a bit of caressing the tools to make them infer it.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 45584
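Ray's dummy-bit suggestion in the post above can be checked arithmetically. In this Python sketch the dummy operand values (1 in A, 0 in B) are my assumption: with exactly one operand bit set, that position passes the low-nibble carry straight through to bit 5, and its sum bit is that carry inverted. Names are hypothetical, and getting a synthesizer to infer a single chain from such a description (the "caressing the tools" part) is not shown.

```python
# Sketch of the dummy-bit trick: one 9-bit add with an extra position at
# bit 4. Assumption (mine): the dummy bit is 1 in A and 0 in B, so that
# position propagates the low-nibble carry unchanged and its sum bit is
# that carry inverted.

def widen(x: int, dummy: int) -> int:
    """Bits 3..0 stay put, the dummy goes to bit 4, bits 7..4 move to 8..5."""
    return (x & 0xF) | (dummy << 4) | (((x >> 4) & 0xF) << 5)

def add8_with_half_carry(a: int, b: int, cin: int = 0) -> tuple[int, int, int]:
    s = widen(a, 1) + widen(b, 0) + cin           # one continuous carry chain
    result = (s & 0xF) | (((s >> 5) & 0xF) << 4)  # drop the dummy sum bit
    half_carry = 1 - ((s >> 4) & 1)               # dummy sum bit = NOT carry into bit 4
    carry_out = (s >> 9) & 1
    return result, carry_out, half_carry
```

For a subtract, the same structure works with B inverted and `cin = 1` on the real bits; the dummy position keeps its fixed 1/0 values.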
Maybe you got your math wrong and you really do envision the large solution: 64/2 x log2(64) = 192, not 256.

The large rotator is the standard mux arrangement using 4:1 multiplexers, requiring 2 LUTs per element (in either Altera or Xilinx) for 128 LUTs (64 slices) per stage. Since two bits of the rotation are taken care of at each stage, it takes 3 stages to accommodate the full 64-bit rotation. Three stages of 4:1 muxes give a 64:1 effective multiplexer. At any stage of a rotator, N bits of input map to N bits of output, so, in this case, the 64-bit width is maintained in the interim stages as well as at the output. Three stages result in 3 x 128 = 384 LUTs. The 4:1 muxes are inferred very nicely with MUXF5 elements in the Synplify Xilinx flow. The 6 address bits are used directly, so there is no replication required unless the three stages are pipelined.

As I was starting to put together the smaller solution, I realized I goofed once again. The cross-coupled MUXF6 elements don't give me the 4 outputs per CLB for an 8-bit rotate that I "remembered", but just 2 outputs. The number I supplied should have been 76, not 66, in the first place, and to compound things the real value ends up being 140 slices (280 LUTs) because of my forgotten efficiency. So, on to the (not as spectacular) implementation...

In the Virtex(E)/Spartan-II(E) devices, the 4 LUTs of a CLB can implement an 8:1 mux, or two 4:1 muxes, using the MUXFn elements and a 2:1 selection in each LUT. The classic 8:1 mux would have one select bit go to all the LUTs, another select bit to the two MUXF5s and the third select bit to the MUXF6. The interesting thing is that there's still an unused MUXF6 in the CLB. This extra MUXF6 can be tied to the same select control as the MUXF6 in the standard 8:1 mux, and the result is a bit that's 180 degrees out of phase for an 8-bit rotate (assuming the 4 LSbits are in one slice and the 4 MSbits in the other).
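The arithmetic of the large rotator described above can be checked with a behavioural model: three stages, each a 4:1 mux per output bit steered by two of the six rotate-amount bits, with stage k rotating by 0 to 3 times 4**k. This Python sketch works under those assumptions (N = 64, 2 LUTs per 4:1 mux as stated above); it is not a synthesizable description, and the names are hypothetical.

```python
# Behavioural model of the "large" rotator: three stages of 4:1 muxes,
# each stage steered by two of the six rotate-amount bits. Stage k picks,
# for every output bit, one of four inputs spaced 4**k apart.

N = 64

def rotl(x: int, n: int) -> int:
    """Reference rotate-left of an N-bit value."""
    n %= N
    return ((x << n) | (x >> (N - n))) & ((1 << N) - 1)

def mux4_stage(x: int, sel: int, step: int) -> int:
    out = 0
    for j in range(N):
        src = (j - sel * step) % N   # the one input (of 4) selected for bit j
        out |= ((x >> src) & 1) << j
    return out

def rotl_three_stages(x: int, amount: int) -> int:
    for k in range(3):
        sel = (amount >> (2 * k)) & 0b11   # two select bits per stage
        x = mux4_stage(x, sel, 4 ** k)
    return x

def rotator_luts(width: int = N, stages: int = 3, luts_per_mux: int = 2) -> int:
    return width * stages * luts_per_mux
```

The count comes to 64 x 3 x 2 = 384 LUTs, i.e. 192 slices at 2 LUTs per slice, the same 192 as 64/2 x log2(64).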
Using simple 8:1 muxes in the 64-bit rotator would require 4 LUTs per bit per stage for 2 stages, or 512 LUTs (256 slices). 8 unique inputs would be required for each bit at each multiplex stage. If rotators are used instead, the 8 inputs don't have to be unique, which lets us take advantage of the other MUXF6 in the CLB.

Two stages of 8-bit rotators don't quite make a 64-bit rotator without a little help. If the first stage is rotated in the simple sense, the second stage can be rotated partially by the "simple" value and the rest rotated by the simple value plus one. A rotate of 37, given the original ordering in the grid below, would be a rotate within the rows of 5 (37 mod 8) followed by a rotate between the rows in the same column of either 4 (37\8) or 5 (37\8+1), where "\" indicates integer divide. Be sure to view this in a fixed-space font.

Original:
  3f 3e 3d 3c 3b 3a 39 38
  37 36 35 34 33 32 31 30
  2f 2e 2d 2c 2b 2a 29 28
  27 26 25 24 23 22 21 20
  1f 1e 1d 1c 1b 1a 19 18
  17 16 15 14 13 12 11 10
  0f 0e 0d 0c 0b 0a 09 08
  07 06 05 04 03 02 01 00

Rotate left 5:
  3a 39 38 3f 3e 3d 3c 3b
  32 31 30 37 36 35 34 33
  2a 29 28 2f 2e 2d 2c 2b
  22 21 20 27 26 25 24 23
  1a 19 18 1f 1e 1d 1c 1b
  12 11 10 17 16 15 14 13
  0a 09 08 0f 0e 0d 0c 0b
  02 01 00 07 06 05 04 03

Rotate up 4 or 5:
  _4 _4 _4 _5 _5 _5 _5 _5
  1a 19 18 17 16 15 14 13
  12 11 10 0f 0e 0d 0c 0b
  0a 09 08 07 06 05 04 03
  02 01 00 3f 3e 3d 3c 3b
  3a 39 38 37 36 35 34 33
  32 31 30 2f 2e 2d 2c 2b
  2a 29 28 27 26 25 24 23
  22 21 20 1f 1e 1d 1c 1b

The replication of the address bits for the control over n\8 vs n\8+1 needs to be done for 7 of the 8 columns (the leftmost is always n\8, i.e. the upper 3 bits). This decision and replication increased the 128 slices to about 140. That is a full 64-bit rotate in 2 stages with 73% of the resources (140 vs 192 slices). Not quite the gains I claimed, but pretty respectable.
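The two-stage decomposition above can be checked against a plain rotate. In this Python sketch (helper names are hypothetical), bit k sits at row k // 8 and column k % 8, so row 7 is the top line of the grids and column 7 the leftmost. Stage 1 rotates each row left by n mod 8; stage 2 rotates each column up by n\8, or by n\8 + 1 for the columns numbered below n mod 8, which is why the leftmost column never needs the "+1" control.

```python
# Sketch of the two-stage 64-bit rotate built from 8-bit rotators,
# working on a list of bit labels so the grids can be printed.

W = 8
N = W * W

def rotl_ref(bits: list, n: int) -> list:
    """Reference: output position j takes input position (j - n) mod N."""
    return [bits[(j - n) % N] for j in range(N)]

def column_shift(col: int, n: int) -> int:
    """Row rotation for one column: n\\8, or n\\8 + 1 below column n mod 8."""
    s, q = n % W, n // W
    return q if col >= s else q + 1

def rotl_two_stage(bits: list, n: int) -> list:
    s = n % W
    # stage 1: every row is an 8-bit rotator, rotated left by s
    t = [bits[W * (k // W) + (k % W - s) % W] for k in range(N)]
    # stage 2: every column is an 8-bit rotator, rotated up by q or q + 1
    return [t[W * ((k // W - column_shift(k % W, n)) % W) + k % W]
            for k in range(N)]

def show(bits: list) -> list[str]:
    """Rows from the top (row 7) down, leftmost column = column 7."""
    return [" ".join(f"{bits[W * r + c]:02x}" for c in reversed(range(W)))
            for r in reversed(range(W))]
```

For example, `show(rotl_two_stage(list(range(64)), 37))` reproduces the eight rows of the "Rotate up 4 or 5" grid.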
The technique can be applied to 4-bit rotators instead of 8-bit rotators (don't use one LUT in each slice of the CLBs with the cross-coupled MUXF6s) for 16 and 32 bit rotators with good resource savings. The resource savings might not be worth the trouble for many designs, but there are gains in speed due to reduced fanout and fewer stages of decode.

- John_H

rickman wrote:
> Interesting idea. IIRC, the Altera cascade chain was inferred by
> Synplicity pretty well if we used predecoded enables.
>
> I am curious about the rotator you mention. You said that you could
> implement a 64 bit rotator in 192 slices (384 LUTs?) with a standard
> method and 66 slices (132 LUTs) with an optimal technique. I can only
> picture a N/2 x log(N) array of 2:1 muxes where N is 64 bits. This
> gives 256 LUTs which is neither of your answers. Even if you find a way
> to use an extra embedded 2:1 mux in the slice, that would only bring it
> down to about 171 LUTs and would not change the architecture at 8 levels
> of logic.
>
> Care to share your techniques, both the large and the small one?

Article: 45585
Hi Hal Murray,

> I think the answer depends upon how many you want to build
> and who is going to be using them.

Only one device, for a private project.

> Several years ago, I had the same problem.

Is the PCI core design better in a faster chip? Do you have fewer problems with the timing?

> I put a scope on the system I was interested in running in.
> I didn't see any reflections significantly over 3 V.
>
> We decided it was a risk we were willing to take.

I have the same results on my board.

Greetings,
Erik

Article: 45586
>, or read/write config registers. I guess there is a concern of
> metastability when a status bit is changing during a read, is this a problem
> I should be concerned with?

Metastability is evil. Far better to avoid it with clean design (even if it looks like overkill) than to have to track it down.

I/we got bit on a case like the one you are describing on a PCI bus that computes parity. A junk status bit was changing during the read cycle.

--
The suespammers.org mail server is located in California. So are all my other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited commercial e-mail to my suespammers.org address or any of my other addresses. These are my opinions, not necessarily my employer's. I hate spam.

Article: 45587
suchitra wrote:
>
> hello all
> i just wanted to know whether a 555 can be used as a clock input for CPLDs if
> the frequency is very low, something like 1 Hz.
> regards

Probably, but you might prefer to look at:

- Tiny Logic Schmitt gates (x14), in SOT23 packages
- xx4060 counter chains, in SO16/TSSOP16, that have 2^14 dividers, so they allow more precise and smaller/cheaper RC components, as well as a fast test mode.

1 Hz from an NE555 will be something of a lottery :)

-jg

Article: 45588
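Following up jg's suggestions above with the component-value arithmetic: the textbook 555 astable frequency is f = 1/(ln 2 * (R1 + 2*R2) * C), about 1.44/((R1 + 2*R2) * C), and a 4060 tapped at its divide-by-2^14 output needs an oscillator running 16384 times faster than the wanted clock. This Python sketch only does that arithmetic; the part values in the note are illustrative, the 4060's own oscillator formula is not modelled, and neither are component tolerances (the "lottery").

```python
import math

# Arithmetic sketch for a slow clock. The 555 formula is the standard
# astable one; part values used with it are only examples.

def ne555_astable_hz(r1_ohm: float, r2_ohm: float, c_farad: float) -> float:
    """f = 1 / (ln 2 * (R1 + 2*R2) * C), i.e. about 1.44 / ((R1 + 2*R2) * C)."""
    return 1.0 / (math.log(2) * (r1_ohm + 2 * r2_ohm) * c_farad)

def osc_hz_for_divider(target_hz: float, divider_stages: int = 14) -> float:
    """Oscillator frequency needed so a /2**stages counter output gives target_hz."""
    return target_hz * 2 ** divider_stages
```

With R1 = R2 = 48 kOhm and C = 10 uF the 555 formula gives about 1.002 Hz, while the 4060 route only needs an oscillator around 16.4 kHz, which is what allows the smaller RC components jg mentions.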
> * How do I get the half-carry bit out of the 8bit adder? I guess I can
> instantiate/infer two separate 4bit adders. Is there a better way?

Ray Andraka <ray@andraka.com> writes:
> It can be done, but it takes a little mind-bending. Basically, you
> need to turn your 8 bit adder into a 9 bit one with bit 4 being a
> dummy so that you can pull out the carry out through the bit. It
> takes a bit of caressing the tools to make them infer it.

Is there any advantage to doing that rather than two four-bit adders? For instance, with two four-bit adders, does the synthesizer not recognize that it can continue the carry chain between them? Or does the FPGA not allow you to tap the carry from intermediate stages of the chain?

Article: 45589
Hello :)

I want to configure my Virtex-II parts from Flash (Atmel or Intel). I found an app note on how to do that, but the question is: how do I program the Flash?

I am thinking of using a CPLD whose logic converts the "Xilinx Parallel Cable III" output into the signals needed to program the Flash. Is that possible? Any other ideas?

I want to use the Xilinx software and their cable, but I can add only a CPLD to my board.

Thanks in advance

Article: 45590
I have updated SFL2VL, a conversion program from SFL (Structured Functional Language) to Verilog. If you want a quick overview of SFL, see Jan Gray's article at http://www.fpgacpu.org/

The program is now compatible with Exemplar Leonardo.

On the following web site I have placed the program along with some test suites, such as:

m65: 6502 compatible processor
mz80: Z80 semi-compatible processor
my88: i8088 semi-compatible processor

SFL2VL is free to use and redistribute; feel free to download it.

http://shimizu-lab.dt.u-tokai.ac.jp/pgm/sfl2vl/index.html

Enjoy.
---------------------------------->--------------------------->>
Naohiko Shimizu
Department of Communications Engineering,
School of Information Technology and Electronics,
Tokai University
1117 Kitakaname Hiratsuka 259-1292 Japan
TEL.+81-463-58-1211(ext. 4084) FAX.+81-463-58-8320
http://shimizu-lab.dt.u-tokai.ac.jp/
<<--------------------------------<-----------------------------

Article: 45591
> I have one more question:
> Our company has ModelSim and Tanner L-Edit.
> What other tools do I need for complete IC development?
>
> Which tools are free and which must be bought?
> (Personally, I am interested in designing a chip for practice, so I do not
> need powerful tools.)

Well, if you're designing an ASIC with just digital logic (no analog blocks or other 'custom IP', like a custom-layout multiplier block), and you want to carry the design all the way through the 'backend' process, at a minimum you need the following:

#1) synthesis tool (example: Synopsys Design Compiler)
#2) place & route tool (example: Cadence PKS)
#3) clock-tree insertion (not sure, could be part of #1 or #2?!?)
#4) design rule check, layout verification?!? (not sure)

I'm not aware of any "free" development tools. The ones I list above are all commercial, and range in cost (for a 1-year license) from $90,000 USD to upward of $1 million USD.

Unfortunately, I'm not that familiar with the backend process, so I'm not 100% certain about the tools. There's some overlap in capability among the vendors. For example, Cadence has a synthesis tool (BuildGates), which you can acquire along with their PKS tool. Synopsys is in the process of acquiring Avanti, so when all is said and done, Synopsys will offer a place & route tool, too.

You are better off posting this sort of question in comp.cad.cadence (where you'll get answers heavily biased toward Cadence's tools!), or comp.lang.verilog and comp.lang.vhdl.

Article: 45592
In order to tap the carry chain you need to add an extra bit in the carry chain. The synthesis tools won't do that for you, and in fact will not infer a carry chain for less than about 7 bits. Using two 4-bit adders, you incur the delay to get off the first chain and then onto the second, whereas with a single chain you only incur ~100 ps. With two 4-bit adders, it is likely not your worst-case path anyway, so for the sake of simplicity, readability and maintainability of the code, it is probably better to just infer them as separate adders. My point was that what you asked about could be done, but it is not done automatically by the tools and it takes a bit of finagling to make it work.

Eric Smith wrote:
> > * How do I get the half-carry bit out of the 8bit adder? I guess I can
> > instantiate/infer two separate 4bit adders. Is there a better way?
>
> Ray Andraka <ray@andraka.com> writes:
> > It can be done, but it takes a little mind-bending. Basically, you
> > need to turn your 8 bit adder into a 9 bit one with bit 4 being a
> > dummy so that you can pull out the carry out through the bit. It
> > takes a bit of caressing the tools to make them infer it.
>
> Is there any advantage to doing that rather than two four-bit adders?
> For instance, with two four-bit adders, does the synthesizer not
> recognize that it can continue the carry chain between them? Or
> does the FPGA not allow you to tap the carry from intermediate stages
> of the chain?

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.

Article: 45594
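Functionally, the two-adder version Ray recommends in the post above is simple: the carry out of the low nibble is the half-carry, and it becomes the carry-in of the high nibble. This Python sketch (hypothetical names) shows only the logic; the extra delay of leaving one carry chain and entering another, which Ray describes, is a placement effect and is not modelled.

```python
# Sketch of the two-adder approach: the carry out of the low 4-bit adder is
# the half-carry, and it feeds the carry-in of the high 4-bit adder through
# general routing. Function names are hypothetical.

def add4(a: int, b: int, cin: int) -> tuple[int, int]:
    """One 4-bit adder: (sum nibble, carry out)."""
    t = (a & 0xF) + (b & 0xF) + cin
    return t & 0xF, t >> 4

def add8_two_chains(a: int, b: int, cin: int = 0) -> tuple[int, int, int]:
    """Return (8-bit sum, carry out, half-carry)."""
    lo, half_carry = add4(a, b, cin)
    hi, carry_out = add4(a >> 4, b >> 4, half_carry)
    return (hi << 4) | lo, carry_out, half_carry
```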
Hello, this may be a basic question. If someone has to implement an FIR filter using bit-serial arithmetic, he has to take into account the output wordlength, i.e. the bit growth through the FIR. He then needs to extend the input data with zeros so that the wordlength is the same throughout the structure. In a parallel implementation we do not have to do that. What about digit-serial: do we still need to extend the input data with zero digits? Many thanksArticle: 45597
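A rough way to size the serial word, as a sketch under stated assumptions: two's-complement data, the conservative worst-case bound of b_in + b_coef + ceil(log2(N)) output bits for an N-tap filter, and a serial structure that handles one full output word per sample. Under those assumptions a digit-serial design still needs the input extended, but only up to a whole number of digits. For signed data the extension is sign extension rather than zeros. The function names are illustrative only.

```python
def fir_output_width(b_in, b_coef, taps):
    """Conservative two's-complement output width of a `taps`-tap FIR."""
    growth = (taps - 1).bit_length()   # ceil(log2(taps)) without floating point
    return b_in + b_coef + growth

def serial_word_width(b_out, digit):
    """Round b_out up to a whole number of digits (digit=1 means bit-serial)."""
    return digit * -(-b_out // digit)  # ceiling division

def input_extension_bits(b_in, b_out, digit):
    """Bits to add above the input MSB (zeros if unsigned, sign bits if signed)."""
    return serial_word_width(b_out, digit) - b_in
```

With 12-bit samples, 12-bit coefficients and 32 taps this gives a 29-bit result. A 4-bit digit-serial datapath would therefore carry 32-bit (8-digit) words, and the input would be extended by 20 bits (5 digits). A bit-serial datapath would need only the 17 bits of growth.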
Thanks a lot, John_H. It is very useful to me. Best Regards, Daryl "John_H" <johnhandwork@mail.com> wrote in message news:3D3E3E00.2796573F@mail.com... > You're trying to @(posedge clk) increment the counter and provide a > comparison value on... the new value? The old value? In the telecom > stuff I worked with, there were typically frame counters to track the > bytes and provide gates for various operations. If you only need one > gate signal, things are too simple. If you need a separate gate for > each of 20 bit positions, it's a little tougher but your speeds should > be extreme with a little care. > > If you're doing an equality compare for each gate, there are two ways to > do it, with a tree or a carry chain. I'll be playing with my first > Virtex-II in a week or two but I've heard the carry chains aren't as > effective as they were in the Virtex-E parts but they should still > provide excellent results. > > A 14 bit *constant* equality compare in a tree would require 3.5 LUTs > for the first level of comparison and another LUT to assemble all those > together. Since there are 4 slices (8 LUTs) in one Virtex-II CLB, this > should scream! If it's a variable equality compare, the 7 LUTs feeding > 2 LUTs feeding 1 final LUT isn't as clean but you should still get great > speed. One of the key factors is that the *registered* count value > needs to be compared to a constant or a *registered* comparison value. > > The carry chain is probably better for a 14 bit equality compare since > the 7 LUTs can cascade into one carry chain. If you want to do a 98 bit > equality compare, you could assemble the 7 bit carry chains into a > series of (horizontal) cascade ORs (if that's what they're called - I > won't look it up now). > > The point is, things should scream in either format compared to the > speeds you're getting. > > Check out your logic and routing delays to see how your timing goes from > source register to destination. Ask yourself if some of the stages can > be pipelined. 
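The LUT counts John_H quotes for a 14-bit compare fall out of a simple tree count. The sketch below makes several assumptions. It uses 4-input LUTs. A constant compare checks 4 bits per LUT, while a variable compare checks only 2 bit pairs per LUT because both operands must enter the LUT. Every higher level is a plain AND tree. The carry-chain alternative is ignored. The function names are hypothetical.

```python
def lut_tree(leaf_luts, k=4):
    """LUTs per level when AND-reducing `leaf_luts` match signals with k-input LUTs."""
    levels = [leaf_luts]
    while levels[-1] > 1:
        levels.append(-(-levels[-1] // k))   # ceiling division
    return levels

def eq_compare_luts(width, constant, k=4):
    """Per-level LUT counts for a `width`-bit equality compare."""
    # A constant compare can test k bits per LUT; a variable compare needs
    # both operands in the LUT, so each LUT tests only k // 2 bit pairs.
    bits_per_leaf = k if constant else k // 2
    return lut_tree(-(-width // bits_per_leaf), k)
```

For 14 bits this gives `[4, 1]` for a constant compare (the "3.5 LUTs plus one", two logic levels) and `[7, 2, 1]` for a variable compare (three levels), which is why a registered constant compare is the faster of the two.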
One of the beautiful things about counters is that they > increment predictably! (Unless they decrement) > > You could assemble a huge comparison tree and register each level to > attain outrageous pipelined speeds. Look at your requirements and > figure out what you can back into a previous pipeline stage. Very good > things should come together with nice design work. > > An example of a counter with a single compare output (apologies if > you're VHDL): > > always @(posedge clk) > if( count == max_count ) count <= 0 + ena; > else count <= count + ena; > assign out_gate = (count == max_count); > > The structure above isn't very efficient because a wide compare is > needed in the logic while it isn't needed in the design. The logic may > not synthesize into a simple counter, either, requiring two stages of > logic for the counter to add to the compare. > > You could use a registered compare of > > out_gate <= (count == max_count - 1) & ena; > > which (in the always block) has the gate go active when you want it. > > But you could do better by resetting your counter with a different > value: > > always @(posedge clock) > if( out_gate ) {out_gate,count} <= {1'b0,-max_count} + ena; > else {out_gate,count} <= count + ena; > > Note that the gate is now synchronous and there is NO compare required. > (Apologies that things look a little strange... the constant "max_count" > should be dimensioned the same as the "count" vector so the out_gate > initializes properly false) > > The structure can be made "synthesis friendly" to use one level of > synthesized logic (if it doesn't already) by using an equation that's > more friendly to the Xilinx carry chain configuration: > > always @(posedge clock) > {out_gate,count} <= (out_gate ? {1'b0,-max_count} : count) + ena; > > The conditional operator works in place of the if/else construct and > "fits" in the carry structure. > > Many things to do. Happy coding! 
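A cycle-level Python model offers a quick check that the preloaded counter gives the same gate period as the compare-based one. The model assumes `ena` is tied high, `max_count` lies between 1 and 2**width - 1, and both counters reset to zero. The function names are made up for this sketch and only mirror the two Verilog fragments above.

```python
def gate_times_compare(max_count, cycles):
    """First version: combinational out_gate = (count == max_count)."""
    count, hits = 0, []
    for t in range(cycles):
        if count == max_count:       # out_gate is high this cycle
            hits.append(t)
            count = 1                # count <= 0 + ena, with ena = 1
        else:
            count += 1               # count <= count + ena
    return hits

def gate_times_preload(max_count, width, cycles):
    """Last version: {out_gate,count} <= (out_gate ? {0,-max_count} : count) + ena."""
    mask = (1 << width) - 1
    gate, count, hits = 0, 0, []
    for t in range(cycles):
        if gate:                     # registered out_gate is high this cycle
            hits.append(t)
        base = ((-max_count) & mask) if gate else count
        total = base + 1             # + ena, with ena = 1
        gate, count = total >> width, total & mask   # carry out becomes out_gate
    return hits
```

With `max_count = 10` and a 4-bit counter, both models pulse every 10 cycles. The preloaded version's first pulse only arrives after the counter first wraps from its reset value of 0 (cycle 16 here). A reset that loads -max_count would avoid that initial delay.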
> > - John_H > > > > > Sniper Daryl wrote: > > > > Hi, > > > > I am Daryl and I have to trouble you. :-) > > > > When I design a chip used for optical networks, a lot of effort must > > be made to increase the clock speed and reduce the chip resource cost. > > In a timing interface module, there is a counter with 14-bit width to > > provide timing to the outgoing frame. A comparator is used to compare > > the counter word with a series of registers set by the controller. > > I've noticed that the slice cost increases seriously and the maximum > > clock speed decreases a lot when the counter and the comparator get > > wider. > > > > Troubled by this, I first tried a wider counter (14-bit) and a > > narrower comparator (4-bit) and got a 20MHz speed increase and more > > than 20 slices saved. Then, a 4-bit counter and 14-bit comparator > > gave a 10MHz speed increase and about 10 slices saved. So, I think > > the critical factor is the wide comparator. This is confirmed by studying > > the reports and schematics from the synthesis tools (FCII 3.6.1 and > > Synplify Pro with Amplify). > > > > To improve the performance, I've tried to use the CoreGen tool to > > generate a comparator core. But after implementation, the result is no > > better than from my own code. > > > > The synthesis tool I used is FCII 3.6.1, the device is a > > VirtexII1000, implemented with ISE4.2SP3. Here are the results of my trials > > : > > > > 14-bit counter, 14-bit comparator and other logic : 63 > > slices used (36 FFs and 105 LUTs); 95MHz > > > > 4-bit counter, 14-bit comparator and other logic : 50 > > slices used (26 FFs and 85 LUTs); 115MHz > > > > 14-bit counter, 4-bit comparator and other logic : 41 > > slices used (26 FFs and 62 LUTs); 127MHz > > > > Would you give me some advice about it from your experience? Or > > some resources to study? > > > > > > > > Thanks in advance for your time! > > > > DarylArticle: 45598
Hi everybody, I am looking for an FPGA that I have to use in a secure manner. I have to do a project in which the FPGA is used in a 'military'-like environment. Can somebody tell me which vendors and FPGA families offer 'special' security features, what those features are, and what they are good for? The FPGA can use any technology (SRAM, antifuse, flash ...), but it has to be secure against as many attacks as possible. Thanks, everybody, for your time. I really appreciate your help. Thomas P.S.: If you'd like to email me, just delete XY in the following email address: wollingerXY@crypto.ruhr-uni-bochum.de (I do not know if this is the right newsgroup to post this question; if not, could somebody please let me know a better place to post.)Article: 45599
Thanks again, everyone. Using your suggestions I've managed to implement PIC-style ADD/SUB/INC/DEC with carry and half-carry out in just 4 slices; see the code below. I'm not sure about the polarity of the borrow bit, though. Synthesis infers two 5-bit adders, later optimised into 4-bit adders with carry in/out. P&R places them in one column, one immediately on top of the other (in an otherwise empty FPGA). I don't have sufficient knowledge to tell from all those reports whether the carry chain is broken or continues over. It does seem to continue over. Here is the code; comments appreciated.

Regards,
Dmitri

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity alu_adder is
  Port ( A, B      : in  std_logic_vector(7 downto 0);
         op        : in  std_logic_vector(1 downto 0);
         Y         : out std_logic_vector(7 downto 0);
         carry_out : out std_logic;
         dc_out    : out std_logic );
  constant ADD : std_logic_vector(1 downto 0) := "00";
  constant SUB : std_logic_vector(1 downto 0) := "01";
  constant DEC : std_logic_vector(1 downto 0) := "10";
  constant INC : std_logic_vector(1 downto 0) := "11";
end entity alu_adder;

architecture Behavioral of alu_adder is
begin
  process( A, B, op )
    variable tmp: std_logic_vector(7 downto 0);
    variable lo_nibble, hi_nibble: unsigned(5 downto 0);
    variable cin: std_logic;
  begin
    -- Select the second operand and the carry-in for each operation
    case op is
      when INC => tmp := (others => '0'); cin := '1';   -- A + 0 + 1
      when DEC => tmp := (others => '1'); cin := '0';   -- A + FF
      when SUB => tmp := not B;           cin := '1';   -- A + not B + 1
      when ADD => tmp := B;               cin := '0';   -- A + B
      when others => tmp := (others => '-'); cin := '-';
    end case;
    -- Each nibble is widened to '0' & nibble & cin. With cin in bit 0 of both
    -- operands, cin + cin carries exactly cin into bit 1; bit 5 is the carry out.
    lo_nibble := unsigned('0' & A(3 downto 0) & cin)
               + unsigned('0' & tmp(3 downto 0) & cin);
    hi_nibble := unsigned('0' & A(7 downto 4) & lo_nibble(5))
               + unsigned('0' & tmp(7 downto 4) & lo_nibble(5));
    Y <= std_logic_vector(hi_nibble(4 downto 1) & lo_nibble(4 downto 1));
    dc_out    <= lo_nibble(5);   -- half (digit) carry out of the low nibble
    carry_out <= hi_nibble(5);
  end process;
end architecture Behavioral;

Ray Andraka <ray@andraka.com> wrote in message news:<3D43FE74.BC6780AD@andraka.com>...
> In order to tap the carry chain you need to add an extra bit in the carry > chain. The synthesis tools won't do that for you, and in fact will not > infer a carry chain for less than about 7 bits. Using 2 four bit counters > you incur the delay to get off and then onto the second chain, where with a > single chain you only incur ~100ps. With 2 4 bit counts, it is likely not > your worst case path anyway, so for the sake of simplicity, readability and > maintainability of the code, it is probably better to just infer them as > separate counters. My point was that what you asked about could be done, > but it is not done automatically by the tools and it takes a bit of > finagling to make it work. > > Eric Smith wrote: > > > > * How do I get the half-carry bit out of the 8bit adder? I guess I can > > > instantiate/infer two separate 4bit adders. Is there a better way? > > > > Ray Andraka <ray@andraka.com> writes: > > > It can be done, but it takes a little mind-bending. Basically, you > > > need to turn your 8 bit adder into a 9 bit one with bit 4 being a > > > dummy so that you can pull out the carry out through the bit. It > > > takes a bit of caressing the tools to make them infer it. > > > > Is there any advantage to doing that rather than two four-bit adders? > > For instance, with two four-bit adders, does the synthesizer not > > recognize that it can continue the carry chain between them? Or > > does the FPGA not allow you to tap the carry from intermediate stages > > of the chain?
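On the borrow-polarity question: because SUB is computed as A + not B + 1, carry_out comes out as 1 when A >= B. It is therefore an inverted borrow, which as far as I know is also the convention PIC datasheets use for C and DC after a subtract. A bit-accurate Python model of the alu_adder entity makes this easy to confirm, checked exhaustively against plain integer arithmetic. The model is a hypothetical sketch, not part of Dmitri's code, and the opcode values are copied from the VHDL constants.

```python
ADD, SUB, DEC, INC = 0b00, 0b01, 0b10, 0b11   # same encodings as the VHDL constants

def alu_adder(a, b, op):
    """Bit-accurate model of alu_adder; returns (Y, carry_out, dc_out)."""
    if op == INC:
        tmp, cin = 0x00, 1
    elif op == DEC:
        tmp, cin = 0xFF, 0
    elif op == SUB:
        tmp, cin = (~b) & 0xFF, 1
    else:  # ADD
        tmp, cin = b, 0
    # '0' & nibble & cin: cin sits in bit 0 of both operands, so cin + cin
    # injects exactly cin into bit 1; bit 5 of the 6-bit result is the carry.
    lo = (((a & 0xF) << 1) | cin) + (((tmp & 0xF) << 1) | cin)
    dc = (lo >> 5) & 1
    hi = (((a >> 4) << 1) | dc) + (((tmp >> 4) << 1) | dc)
    y = (((hi >> 1) & 0xF) << 4) | ((lo >> 1) & 0xF)
    return y, (hi >> 5) & 1, dc
```

For example, `alu_adder(0x05, 0x07, SUB)` returns `(0xFE, 0, 0)`. Both flags are 0 because a borrow occurred out of the full byte and out of the low nibble.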