Hey all. I have a uC that's updating a register in my XC4010E via a standard "three-wire" SPI.

The uC writes to an 8-bit shift register in the FPGA -- the uC's SPI clock is used to clock the FPGA flip-flops, and SPI_nCS (chip select) is used as the ENABLE (active low) to the register for shifting.

To prevent any odd behavior, I am double buffering my data -- the shift register is one buffer, and I parallel load the data from the shift register into a second 8-bit register, which is clocked by my FPGA's native 40MHz clock. (The 2nd register feeds the inputs of a loadable free-running counter.)

My problem is in the control of the second register. I only want this register to update when the data (in the SPI register) is valid, but the SPI register is being clocked by something completely asynchronous to the FPGA's clock.

My first idea was to use SPI_nCS as the ENABLE (active high) on the second register; the register would clock in data on rising edges of the FPGA clock, and only when SPI_nCS was high. Since the SPI_nCS "envelope" surrounds an SPI transaction, when the signal is NOT low I know an SPI operation wouldn't be occurring.

I also wondered if a better solution might be to use three T-flip-flops and divide down the uC's SPI clock by 8, so on the 8th clock (when the last bit from the uC gets clocked into the shift register) I register the SPI register's data. I would use the output from the third T-FF as the ENABLE (active high) on my 2nd register and still clock the 2nd register from the FPGA 40MHz clock.

I'm sure all of the above will work, but I didn't know which would be a better solution (if any) and if there are other things to keep in mind.

Thanks,
VR.
Article: 37176
Did you read the PCI spec carefully? The PCI spec requires power and ground planes, since the maximum distance from the PCI power/ground connector pads to the plane is .25", as stated in 4.4.2.1. Also, it is a requirement to decouple the 3.3V PCI pins too, even though they are not used, per the same section.

"Dan" <daniel.deconinck@sympatico.ca> wrote in message news:I8TN7.26205$cC5.2965973@news20.bellglobal.com...
> Hello,
>
> I am shipping a 2 layer PCI card (33MHz-32bit). It uses a Xilinx with a 2.5V core and 5Volt tolerant IOs. (XC2S50-5PQ208C)
>
> I laid out the board with as much ground plane on the bottom and as much routing on the top as was possible. It's 90% ground plane. I believe that this works OK on many PCs but I think I still need to improve the electrical characteristics of the board for proper operation across all PCs.
>
> I currently use through hole bypass caps all around the perimeter of the Xilinx chip.
>
> I am sure things will get better by switching to both surface mount caps and a four layer PCB. My question is how important is each of these two improvements when compared to one another? For example, X% of the improvement will come by switching from through hole caps to surface mount and (100-X)% of the improvement will come from switching from two layers to four layers.
>
> I am wondering if simply switching to surface mount caps will give enough of a boost in performance.
>
> Sincerely
> Daniel DeConinck
> www.PixelSmart.com
> TEL: 416-248-4473
Article: 37177
On Mon, 3 Dec 2001 04:43:19 +0000 (UTC), VR <crossing@notjordanbutaclockdomain.com> wrote:

>Hey all.
>
>I have a uC that's updating a register in my XC4010E via a standard "three-wire" SPI.
>
>The uC writes to an 8-bit shift register in the FPGA -- the uC's SPI clock is used to clock the FPGA flip-flops and the SPI_nCS (chip select) is used as the ENABLE (active low) to the register for shifting.

So far, so good. We will call this reg the SPIReg.

>To prevent any odd behavior, I am double buffering my data -- the shift register is one buffer, and I parallel load the data from the shift register into a second 8-bit register, which is clocked by my FPGA native 40MHz clock. (The 2nd register feeds the inputs of a loadable free running counter).

OK, we call this reg the ReloadReg. Although you don't say so, all the following assumes that your free running counter will reload from the ReloadReg at some arbitrary time with respect to the updating process. I.e. it could use the second register while a new value is arriving, has just arrived, or is about to arrive.

>My problem is in the control of the second register.

Well actually, the problem is when to load ReloadReg (the second register).

>I only want this register to update when data (in the SPI register) is valid

Right

>but the SPI register is being clocked by something completely asynchronous to the FPGA's clock.

Right

>My first idea was to use the SPI_nCS as the ENABLE (active high) on the second register; the register would clock in data on rising edges of the FPGA clock and only when SPI_nCS was high.

This won't work. The failure scenario is that there is a race condition between SPI_nCS going high and the next FPGA clock. ENABLE for ReloadReg has a setup and hold time requirement for reliable loading. This can be violated by this scheme. This will lead to an incorrect value loaded into ReloadReg, comprising some new bits and some old bits, or metastability. If you are unlucky, while the ReloadReg has junk in it, it will be used as the reload value. Following FPGA clocks will correct the value, but there is a finite probability of the above scenario happening.

>Since the SPI_nCS "envelope" surrounds an SPI transaction, when the signal is NOT low, I know an SPI operation wouldn't be occurring.

The problem is on the boundary of SPI_nCS going from low to high. There is also an issue with SPI_nCS going from high to low, and potentially corrupting a load at this point too, but because the prior contents and the data on the D pins are the same, this won't cause problems.

>I also wondered if a better solution might be to use three T-flip-flops and divide down the uC's SPI clock by 8, so on the 8th clock (when the last bit from the uC gets clocked into the shiftreg) I register the SPI register's data. I would use the output from the third T-FF as the ENABLE (active high) on my 2nd register and still clock the 2nd register from the FPGA 40MHz clock.

This still has the problem of your ReloadReg enable being in the wrong clock domain.

>I'm sure all of the above will work, but I didn't know which would be a better solution (if any) and if there are other things to keep in mind.

I'm sure the above will eventually fail. Here's what I would do (have done in dozens of production designs). You are right to be concerned about the data for your counter being in a register that is clocked by the same clock as the counter. The only issue is how to load ReloadReg reliably.

Consider the following:

1) Once the SPIReg has been updated, we only need to copy it once into ReloadReg.

2) The SPI clock is significantly slower than the 40MHz FPGA clock, so you can take a few 40MHz cycles to get the data safely over into the 40MHz domain, since it will take far longer for the SPI system to send a new value.

3) Since the SPI system is async to the 40MHz system, your design can't be sensitive to the exact 40MHz cycle in which ReloadReg is updated. So taking 3 * 40MHz cycles to do it safely is fine.

Here is a structure that will work reliably:

Connect 4 D flip-flops as a 4-bit shifter (Q0 to D1, Q1 to D2, Q2 to D3). Clock the shifter with the 40MHz. Connect the input D0 to SPI_nCS. Connect a 4-input AND gate to Q0, Q1, Q2, and ~Q3, to detect the sequence 1,1,1,0. This will occur after SPI_nCS has gone high, and 3 clocks of 40MHz.

If a metastability occurred or you failed a setup/hold requirement, you might see something like the following:

1,0,0,0
0,0,0,0
1,0,0,0
1,1,0,0
1,1,1,0  <- match case
1,1,1,1
1,1,1,1

Regardless of metastabilities, by the time you get a match, things will be settled, as you have taken 75 ns, which should be more than enough given the characteristics of current FPGAs.

Take the output of the AND gate and pass it through 1 more FF. Now we have 100ns of resolution time (and update latency). Note that the match signal will only be high for one 25ns cycle. Use the output of this FF as the active high enable for ReloadReg.

This should be extremely reliable.

>Thanks,
>VR.

Philip Freidin

Philip Freidin
Fliptronics
Article: 37178
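One way the structure Philip describes above could be written in VHDL is sketched below. It is a sketch, not code from the thread: the signal and port names (clk40, spi_ncs, spi_reg, reload_reg) are invented, and the 8-bit width simply follows VR's description.

library ieee;
use ieee.std_logic_1164.all;

entity spi_sync is
  port (
    clk40      : in  std_logic;                     -- 40 MHz FPGA clock
    spi_ncs    : in  std_logic;                     -- asynchronous SPI chip select
    spi_reg    : in  std_logic_vector(7 downto 0);  -- shift register, SPI clock domain
    reload_reg : out std_logic_vector(7 downto 0)   -- 40 MHz domain copy
  );
end spi_sync;

architecture rtl of spi_sync is
  signal sync  : std_logic_vector(3 downto 0) := (others => '0');
  signal match : std_logic := '0';
begin
  process (clk40)
  begin
    if rising_edge(clk40) then
      -- 4-bit shifter sampling SPI_nCS in the 40 MHz domain (sync(0) is newest)
      sync  <= sync(2 downto 0) & spi_ncs;
      -- registered AND gate: detect the 1,1,1,0 pattern (nCS high for three samples)
      match <= sync(0) and sync(1) and sync(2) and not sync(3);
      -- one-cycle active-high enable for the reload register
      if match = '1' then
        reload_reg <= spi_reg;
      end if;
    end if;
  end process;
end rtl;

Only spi_ncs crosses the clock boundary; the data path needs no per-bit synchronisers because spi_reg has been stable for several 40MHz cycles by the time match asserts.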
Some questions about this pair:

1) In Xilinx I use Synplify to synthesize, so why is there still the possibility to create a constraint file in Xilinx as well? Do I have to clear it?

2) Synplify produces a .ncf file containing P&R constraints. How can I specify it as an input to the P&R?
Article: 37179
How can I recognize that a path is a multicycle path, so that I can specify more clock cycles for it in the Synplify constraints? What happens if it is not really a multicycle path? For example, for the following divide-by-3 counter:

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

entity counter_divider_3 is
  port (
    clk       : in  STD_LOGIC;
    reset     : in  STD_LOGIC;
    count_3   : out STD_LOGIC_VECTOR (2 downto 0);
    clk_div_3 : out STD_LOGIC
  );
end counter_divider_3;

architecture counter_divider_3_arch of counter_divider_3 is

  signal int_count_3     : STD_LOGIC_VECTOR (1 downto 0);
  signal reset_clk_a_b   : STD_LOGIC_VECTOR (3 downto 0);
  signal count_0_delayed : STD_LOGIC;

begin

  process (clk, reset)
  begin
    if reset = '1' then
      int_count_3 <= "01";
    elsif falling_edge(clk) then
      -- & acts as bit concatenation here, not a logical AND!!
      int_count_3 <= int_count_3(0) & not(int_count_3(0) or int_count_3(1));
    end if;
  end process;

  process (clk)
  begin
    if rising_edge(clk) then
      count_0_delayed <= int_count_3(0);
    end if;
  end process;

  clk_div_3 <= int_count_3(0) nor count_0_delayed;

  process (clk)
  begin
    if falling_edge(clk) then
      count_3 <= '0' & int_count_3;
    end if;
  end process;

end counter_divider_3_arch;

I use these constraints for it:

# Synplicity, Inc. constraint file
# C:\Tesi\Aggiunte_fino_al_1_Gennaio_2002\Xilinx\CounterDivider3_2Dic2001\counter_divider_3.sdc
# Written on Sun Dec 02 17:37:07 2001
# by Synplify Pro, 7.0.1 Scope Editor

#
# Clocks
#
define_clock -disable -comment {-improve} -name {clk} -freq 165.000 -clockgroup default_clkgroup

#
# Inputs/Outputs
#
define_input_delay -disable -default
define_output_delay -disable -default
define_output_delay -disable {clk_div_3}
define_output_delay -disable {count_3[2:0]}
define_input_delay -disable {reset}

#
# Multicycle Path
#
define_multicycle_path -comment {-improve} -from {i:count_0_delayed} -to {p:clk_div_3} 4
define_multicycle_path -comment {-improve} -from {i:int_count_3[1:0]} -to {p:clk_div_3} 4
define_multicycle_path -comment {-improve} -from {i:int_count_3[1:0]} -to {i:count_0_delayed} 4

I obtain an estimated clock of 185MHz, but then when I implement it in Xilinx 4.1 the result is really bad, about 140MHz. What is wrong?

Thanks
Article: 37180
rickman wrote:
>
> Simon Gornall wrote:
> > [Reasons why GCC is a good but limited analogy to FPGA P&R]
>
> That may all be true. But I still maintain that place and route software is inherently more complex than compilers.

No argument here!

> The tasks required to convert C language instructions to machine code for a given, well defined architecture are conceptually straightforward and well understood by nearly anyone graduating with a computer science degree. On the other hand, place and route algorithms are in a class of problems known as NP complete if my schooling has not failed me (or my memory). This means essentially that you can NEVER deterministically find the best solution to the problem for a realistic application given the state of technology in the foreseeable future. At least this is true until we are using Quantum computing which can explore all solution sets simultaneously.

Well, not quite. NP-Complete means you're both NP and NP-hard. "NP" means a "Non-deterministic Turing machine can solve the problem in Polynomial time". In practice, this means the solution will take a loooong time, because most NP problems involve either an enormous number of iterations to get the answer, or they have a lot of variables, increasing the search space. Sometimes (as in FPGA routing, I'd expect) both :-( Polynomials can get big very rapidly when you have "lots" of potential solutions to examine :-((

There is the interesting factor that if you solve one NP problem, you can in theory solve them all, because any NP problem can be transformed into any other in polynomial time as well...

> The difference in problem statement means that the algorithms for solving them and the means of developing them are very, very different. The suboptimal solution hunt will always require custom algorithms and special tuning that are far more device specific than what is done to write a code optimizer for a processor.

<grin> I did a PhD using neural networks to map feature spaces into decision trees. My major discovery was that the relaxation-labelling equations that were developed for optic-flow are actually an instance of the Hopfield neural network solution set.

I'd expect that behind the scenes, you'd probably need a peer-voting scheme with conventional constraint-based logic as inputs to multiple types of solver - for example, you could have a genetic algorithm, a K-nearest-neighbour and a neural network all providing possible solutions to localised routing, with a second tier above making the decision as to which one to "accept" as a potential solution - the one that best matches the other localised areas. I worked on some similar stuff when I was a post-doc.

> You obviously understand compilers pretty well. But what do you know about designing place and route software? I don't profess to be an expert, but this is a very different animal than writing a compiler.

Very little :-) It seems to me that the routing is the problem though, and there are *lots* of techniques to try and maximise global "fit" over local minima in the solution space.

> > One could look at:
> >
> > http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html
> > http://www.eecg.toronto.edu/~vaughn/vpr/e64.html (routing images)
> >
> > as a darn good start. I mailed the guy who wrote the package about a year ago though, and he said specifying the 'resource descriptions' as I refer to them above is by far the hardest problem, because of course you have to specify the constraints under which the resources operate as well as the method by which you instantiate the constraint on the resource.
>
> This is encouraging. But how does it compare to the commercial tools? They don't say what the "chip" is. I assume it is an imaginary one; the routing appears to be very, very simplistic. Most FPGAs have multiple levels of routing and important limitations on how you can use that routing. I expect this would greatly complicate routing algorithms.

It will, but not necessarily to the level you expect. A constraint is a constraint - whether it spans one CLB block or 4 or 16 doesn't really matter. What does matter is the weighting given to how you would use the resource, but that's part of the problem...

> But then maybe I am overstating the complexity of P&R algorithms. But they have been the bane of FPGA design for as long as there have been FPGAs. If you have a chip that runs 20% slower and have tools that optimize the P&R to give 20% better results, you will be able to meet or beat your competition. I am sure that every FPGA company works very hard to improve the P&R tools.

I'm not actually claiming it would be easy :-) I said I thought it would be hard. I do think it's in the realm of the possible though. At the moment I have too much to do (I'm building a radio telescope and writing the s/w to control it - I can do that in Linux so it takes priority over the FPGA stuff).

Vaughn founded a company that's been bought by Altera, so he works for them now. It'll be interesting to see if 'vpr' will stick around. Grab a copy now!

> But none of that changes the viability of open source tools for FPGA design. Perhaps the availability of free (as in beer) tools and low cost hardware will encourage more "amateur" work in the tool area and we will start to see some open source tools. But I don't expect to see them being used much professionally during my career. I have about 10 - 15 years left. We will see if anything changes my mind by then.

Agreed. I'm working on the premise that if Xilinx get moaned at often enough, they will eventually listen. If only companies were as predictable as FPGA routing :-)

ATB,
Simon.
Article: 37181
Hi all gurus out there,

I got very curious after reading all the posts in this thread.

I know that we can use a lookup table method to implement CRC in parallel. E.g. we can use a 256-byte table to calculate the CRC 8 bits per cycle.

It seems to me that to use the same trick for a 128-bit input would require a 2^128 element table, which must be a no-go.

Many people have talked about things like unrolling and pipelining the input. Can anybody point me to a source where such approaches are explained in greater detail, so that I can appreciate what you all have been driving at?

Thanks in advance.

TA TA
kahhean
Article: 37182
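For reference, the byte-at-a-time table method described above looks roughly like this in VHDL. It is a sketch rather than anything from the thread: it assumes the CRC-16-CCITT polynomial (x"1021") processed MSB first and builds the 256-entry table at elaboration time, so the polynomial and bit order would have to be adjusted to whatever standard is actually in use.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity crc16_bytewise is
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;
    byte_vld : in  std_logic;
    byte_in  : in  std_logic_vector(7 downto 0);
    crc_out  : out std_logic_vector(15 downto 0)
  );
end crc16_bytewise;

architecture rtl of crc16_bytewise is
  type crc_table_t is array (0 to 255) of unsigned(15 downto 0);

  -- build the 256-entry table: each entry is 8 serial LFSR steps for one byte value
  function build_table return crc_table_t is
    variable t : crc_table_t;
    variable c : unsigned(15 downto 0);
  begin
    for i in 0 to 255 loop
      c := shift_left(to_unsigned(i, 16), 8);
      for b in 0 to 7 loop
        if c(15) = '1' then
          c := shift_left(c, 1) xor x"1021";
        else
          c := shift_left(c, 1);
        end if;
      end loop;
      t(i) := c;
    end loop;
    return t;
  end function;

  constant CRC_TABLE : crc_table_t := build_table;

  signal crc_r : unsigned(15 downto 0) := (others => '1');
begin
  process (clk)
    variable idx : integer range 0 to 255;
  begin
    if rising_edge(clk) then
      if rst = '1' then
        crc_r <= (others => '1');
      elsif byte_vld = '1' then
        -- standard table update: index with (high CRC byte xor data byte)
        idx   := to_integer(crc_r(15 downto 8) xor unsigned(byte_in));
        crc_r <= shift_left(crc_r, 8) xor CRC_TABLE(idx);
      end if;
    end if;
  end process;
  crc_out <= std_logic_vector(crc_r);
end rtl;

The table grows exponentially with the number of bits consumed per cycle, which is exactly why the wide-word approaches discussed in the replies stick to XOR equations instead.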
Does anyone know where I can find a free PCI simulation model in VHDL? 32-bit @ 33MHz or 66MHz is preferable.

thanks
SW
Article: 37183
On 3 Dec 2001 02:37:44 -0800, kahhean@hotmail.com (Chua Kah Hean) wrote:

>Hi all gurus out there,
>
>I got very curious after reading all the posts in this thread.
>
>I know that we can use a lookup table method to implement CRC in parallel. E.g. we can use a 256-byte table to calculate the CRC 8 bits per cycle.
>
>It seems to me that to use the same trick for a 128-bit input would require a 2^128 element table, which must be a no-go.
>
>Many people have talked about things like unrolling and pipelining the input. Can anybody point me to a source where such approaches are explained in greater detail so that I can appreciate what you all have been driving at?

Instead of a monster lookup table mimicking a bunch of XOR gates, just use the XOR gates directly. Many of the terms cancel out: A xor A = 0, 0 xor A = A, etc. so the number of xor gates usually isn't excessive and you avoid the exponential growth in table size. (It's actually the depth of the xor gates, not the number of them, that matters, because the depth determines the delay and hence the clock rate.)

Take a look at the logic generated by some of the free online parallel CRC generators:

http://www.easics.be/webtools/crctool
http://www.geocities.com/steve0192/vhdl.htm

The first one (crctool) will generate a function that turns an input word and a feedback word into a new CRC value, which is the feedback word for the next clock.

Here's the logic generated by crctool for one bit of a 16 bit CRC with 128 bit input word:

D := Data;   -- the input word
C := CRC;    -- the feedback word

NewCRC(0) := D(127) xor D(125) xor D(124) xor D(123) xor D(122) xor
             D(121) xor D(120) xor D(111) xor D(110) xor D(109) xor
             D(108) xor D(107) xor D(106) xor D(105) xor D(103) xor
             D(101) xor D(99) xor D(97) xor D(96) xor D(95) xor
             D(94) xor D(93) xor D(92) xor D(91) xor D(90) xor D(87) xor
             D(86) xor D(83) xor D(82) xor D(81) xor D(80) xor D(79) xor
             D(78) xor D(77) xor D(76) xor D(75) xor D(73) xor D(72) xor
             D(71) xor D(69) xor D(68) xor D(67) xor D(66) xor D(65) xor
             D(64) xor D(63) xor D(62) xor D(61) xor D(60) xor D(55) xor
             D(54) xor D(53) xor D(52) xor D(51) xor D(50) xor D(49) xor
             D(48) xor D(47) xor D(46) xor D(45) xor D(43) xor D(41) xor
             D(40) xor D(39) xor D(38) xor D(37) xor D(36) xor D(35) xor
             D(34) xor D(33) xor D(32) xor D(31) xor D(30) xor D(27) xor
             D(26) xor D(25) xor D(24) xor D(23) xor D(22) xor D(21) xor
             D(20) xor D(19) xor D(18) xor D(17) xor D(16) xor D(15) xor
             D(13) xor D(12) xor D(11) xor D(10) xor D(9) xor D(8) xor
             D(7) xor D(6) xor D(5) xor D(4) xor D(3) xor D(2) xor
             D(1) xor D(0) xor C(8) xor C(9) xor C(10) xor C(11) xor
             C(12) xor C(13) xor C(15);

(Switch to fixed point font.)

Here's the logic you'll end up with:

  clock-----------------------+
                              |
           +-------+     +----------+
           | huge  |     | register |
  input--->|  xor  |---->|d        q|--+--> CRC out
   (128)   | tree  |(16) |          |  |     (16)
           +-------+     +----------+  |
              ^                        |
              |                        |
              +------------------------+
                    feedback (16)

The "speed" is determined by the minimum clock period, which in this case is limited by the number of logic levels in the xor tree - i.e. the maximum delay between any flip flop output and any flip flop input. You can't do anything with this directly, as the feedback must happen in a single clock cycle.

If you look more closely at the logic expression, you'll see that it can be decomposed into the form (input xor feedback) where input is the xor of a bunch of input bits, and feedback is the xor of a bunch of feedback bits.

This leads to the following design:

  clock----------------------------------------+
                                               |
           +--------+     +-------+      +----------+
           | medium |     | small |      | register |
  input--->|  xor   |---->|  xor  |----->|d        q|--+--> CRC out
   (128)   |  tree  |(16) | tree  | (16) |          |  |     (16)
           +--------+     +-------+      +----------+  |
                              ^                        |
                              |                        |
                              +------------------------+
                                    feedback (16)

This isn't any faster than the first attempt, but notice that the "medium xor tree" is not in the feedback path. This means it can be pipelined - we can put flip flops in the logic so that the calculation is performed over several clock cycles. The logic depth between any flip flop output and any flip flop input is reduced - we can have a faster clock.

This is shown here:

  clock-----------------------+-----------------------------
                              |
           +--------+    +----------+     +-------+      +-
           | medium |    | register |     | small |      |
  input--->|  xor   |--->|d        q|---->|  xor  |----->|d
   (128)   |  tree  |(16)|          |(16) | tree  | (16) |
           +--------+    +----------+     +-------+      +-
                                              ^
                                              |
                                              +------------
                                                   feedbac

(I pruned the right side to avoid line wrap, but you should get the idea.)

In theory the synthesis tools can do all this for you. E.g. you can describe a serial CRC calculation, put it in a for loop to iterate over the input word, tell it how many clock cycles to take, and the synthesiser should spit out something equivalent to the above. (I have used this approach with LFSRs with some success at these bit rates.)

I could make a comment about the relative benefits of HDLs and schematics for high speed design, but I don't want to ignite yet another religious war.

Regards,
Allan.
Article: 37184
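As a concrete illustration of the "serial CRC in a for loop" approach Allan describes above, here is a sketch in VHDL. It is not code from the thread: it assumes the CRC-16-CCITT polynomial (x^16 + x^12 + x^5 + 1), MSB-first bit ordering and an all-ones preset, so those details would need to match whatever standard the real design targets. The loop unrolls at synthesis into the same kind of wide XOR network shown in the diagrams; pipelining it as described is a further, separate step.

library ieee;
use ieee.std_logic_1164.all;

entity crc16_par128 is
  port (
    clk  : in  std_logic;
    rst  : in  std_logic;
    dv   : in  std_logic;                        -- input word valid
    data : in  std_logic_vector(127 downto 0);   -- one 128-bit word per clock
    crc  : out std_logic_vector(15 downto 0)
  );
end crc16_par128;

architecture rtl of crc16_par128 is
  -- one serial LFSR step for x^16 + x^12 + x^5 + 1, MSB first
  function crc16_step (c : std_logic_vector(15 downto 0); d : std_logic)
    return std_logic_vector is
    variable fb : std_logic;
    variable n  : std_logic_vector(15 downto 0);
  begin
    fb := c(15) xor d;
    n  := c(14 downto 0) & '0';
    if fb = '1' then
      n := n xor x"1021";
    end if;
    return n;
  end function;

  signal crc_r : std_logic_vector(15 downto 0) := (others => '1');
begin
  process (clk)
    variable c : std_logic_vector(15 downto 0);
  begin
    if rising_edge(clk) then
      if rst = '1' then
        crc_r <= (others => '1');
      elsif dv = '1' then
        c := crc_r;
        -- unrolled by synthesis into a wide XOR network: 128 bits per clock
        for i in 127 downto 0 loop
          c := crc16_step(c, data(i));
        end loop;
        crc_r <= c;
      end if;
    end if;
  end process;
  crc <= crc_r;
end rtl;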
In article <u0m7n0ku4ccvd2@corp.supernews.com>, "Austin Franklin" <austin@dark98room.com> writes:
|> Did you read the PCI spec carefully? The PCI spec requires power and ground
|> planes, since the maximum distance for the PCI power/ground connector pads
|> to the plane is .25", as stated in 4.4.2.1. <...>

Then 80% of the cheaper network and sound cards violate the spec. I have never seen an RTL8139-based network card with a multilayer PCB.

--
Georg Acher, acher@in.tum.de
http://www.in.tum.de/~acher/
"Oh no, not again !" The bowl of petunias
Article: 37185
We need a PROM that can program two Spartan-II XC2S50-5FG256Cs at the same time. The XC2S50 has a configuration file size of 559,200 bits, so we need a PROM of size 2 x 559,200 = 1,118,400 bits.

The devices I have found are:

XC17S00A   3.3V
XC17S200A  one-time programmable
XC17V00A   3.3V
XC17V02    in-system programmable

The problem is, we need an OTP SPROM, but our Data I/O programmer device only supports XC17S00 and XC17S00XL devices, not XC17S00A devices. However, when I looked at the datasheet of the XC17S00/XL
http://www.xilinx.com/support/programr/files/17s00.pdf
and the datasheet of the XC17S00A
http://www.xilinx.com/partinfo/ds078.pdf
I saw that the internal logic of these devices is the same. I haven't found any difference.

The XC17S00/XL family doesn't have any PROM that can hold two Spartan-II XC2S50s at the same time. On the other hand, Xilinx doesn't say that Spartan-II devices can be programmed with XC17S00/XL PROMs. The reason we look at XC17S00/XL devices for Spartan-II is that our Data I/O programmer only supports XC17S00/XL devices. My thought is that the XC17S00/XL support in the Data I/O programmer is compatible with XC17S00A PROMs, so I can use XC17S00/XL mode in the Data I/O programmer to program XC17S00A PROMs.

Utku
Article: 37186
Hello,

I'm a grad student trying to benchmark an FPGA board. Are there any non-proprietary benchmarks available for FPGAs? Specifically, I want benchmark efforts that measure the performance of multiple FPGA chips and the interconnect between them. Pointers to any benchmark effort will be helpful.

thank you very much,
hananiel
Article: 37187
Alex,

I prefer to look at this as "what is the jitter noise floor" in a CMOS FPGA? Getting into, and out of, the FPGA is the biggest problem, followed by the internal distribution of the clock signals. This is something we have carefully characterized, as we are the 'FPGA Lab' responsible for the verification of the design.

To get in, get onto a BUFG (global clock resource) and then get out (by using the DDR clock forwarding FFs) is about 35 to 55 ps P-P (nothing else happening). If you have another BUFG operating, the jitter goes up to 55 ps to 65 ps P-P. If you then have 10% of all nodes in a 2V3000 toggle at the same time on the same clock domain (BUFG), the jitter measured is ~150 ps P-P on an asynchronous clock domain.

The primary source of jitter is coupling through the ground, which affects the slicing level of all of the logic. Use of LVDS input buffers and output buffers helps with the external jitter contributors, but does nothing for the internal contributors.

150 ps P-P of jitter in a design was ignored up until recently. With DDR (double data rate) logic designs and clock periods of 4 ns in some designs, the half clock period is 2 ns, and 150 ps becomes a significant part of the timing budget.

See: http://www.xilinx.com/support/techxclusives/slack-techX21.htm

Austin

Alex Sherstuk wrote:
> Dear colleagues,
>
> Some time ago there was discussion about phase noise (jitter) introduced by XILINX FPGA DLLs.
>
> Here is another question:
>
> What phase noise (jitter) is introduced by a regular logic element of a XILINX FPGA (e.g. SPARTAN2)? What is the timing uncertainty introduced by a XILINX CLB trigger?
>
> Thanks,
> Alex
Article: 37188

"Ed Browne, Precision Electronic Solutions" wrote:
>
> It's appalling that Xilinx would sell a product to design an FPGA/CPLD without the ability to simulate the design unless you buy a $1000+ simulator. Neither the free version nor the eval version allows testing on anything over 500 lines. At that limit on my machine, it simply closes - no slowing down.
>
> Does anyone have a lower cost alternative, preferably one that would accept the HDL Bencher output?

Hi Ed and all:

I have used the WebPACK HDL simulator (i.e. the Mentor Graphics at-last-free ModelSim simulator) and it runs OK for small-to-medium designs (see http://www.DTE.eis.uva.es/OpenProjects/OpenDSP/index.htm). The *trick* is to use VHDL or Verilog to design the circuit (up to the well-known 500 lines) and use TCL (*.cmd) files to simulate. (In HDL, of course; I wish I could simulate the routed design with it, but I cannot: I use Foundation.) Yes, if you use *.cmd files for simulations instead of HDL testbenches, you can simulate bigger designs.

Another difference: the same 100K-line code simulated with the 500-line limitation took 120 seconds; without such limitation it took 2 seconds. Nice, but why pay!

> Ed Browne
> Precision Electronic Solutions

Regards,
Santiago (sanpab@eis.uva.es).
Article: 37189
Allan Herriman wrote: > Here's the logic generated by crctool for one bit of a 16 bit CRC with > 128 bit input word: > > D := Data; -- the input word > C := CRC; -- the feedback word > > NewCRC(0) := D(127) xor D(125) xor D(124) xor D(123) xor D(122) xor > D(121) xor D(120) xor D(111) xor D(110) xor D(109) xor > D(108) xor D(107) xor D(106) xor D(105) xor D(103) xor > D(101) xor D(99) xor D(97) xor D(96) xor D(95) xor > D(94) xor D(93) xor D(92) xor D(91) xor D(90) xor D(87) xor > D(86) xor D(83) xor D(82) xor D(81) xor D(80) xor D(79) xor > D(78) xor D(77) xor D(76) xor D(75) xor D(73) xor D(72) xor > D(71) xor D(69) xor D(68) xor D(67) xor D(66) xor D(65) xor > D(64) xor D(63) xor D(62) xor D(61) xor D(60) xor D(55) xor > D(54) xor D(53) xor D(52) xor D(51) xor D(50) xor D(49) xor > D(48) xor D(47) xor D(46) xor D(45) xor D(43) xor D(41) xor > D(40) xor D(39) xor D(38) xor D(37) xor D(36) xor D(35) xor > D(34) xor D(33) xor D(32) xor D(31) xor D(30) xor D(27) xor > D(26) xor D(25) xor D(24) xor D(23) xor D(22) xor D(21) xor > D(20) xor D(19) xor D(18) xor D(17) xor D(16) xor D(15) xor > D(13) xor D(12) xor D(11) xor D(10) xor D(9) xor D(8) xor > D(7) xor D(6) xor D(5) xor D(4) xor D(3) xor D(2) xor > D(1) xor D(0) xor C(8) xor C(9) xor C(10) xor C(11) xor > C(12) xor C(13) xor C(15); > > (Switch to fixed point font.) > > Here's the logic you'll end up with: > > clock-----------------------+ > | > +-------+ +----------+ > | huge | | register | > input-->| xor |----->|d q|--+-> CRC out > (128) | tree | (16) | | | (16) > +-------+ +----------+ | > ^ | > | | > +------------------------+ > feedback (16) > > The "speed" is determined by the minimum clock period, which in this > case is limited by the number of logic levels in the xor tree - i.e. > the maximum delay between any flip flop output and any flip flop > input. > You can't do anything with this directly, as the feedback must happen > in a single clock cycle. > > If you look more closely at the logic expression, you'll see that it > can be decomposed into the form (input xor feedback) where input is > the xor of a bunch of input bits, and feedback is the xor of a bunch > of feedback bits. > > This leads to the following design: > > clock--------------------------------------+ > | > +-------+ +-------+ +----------+ > | medium| | small | | register | > input-->| xor |----->| xor |----->|d q|--+-> CRC > (128) | tree | (16) | tree | (16) | | | out > +-------+ +-------+ +----------+ | (16) > ^ | > | | > +------------------------+ > feedback (16) > > This isn't any faster than the first attempt, but notice that the > "medium xor tree" is not in the feedback path. This means it can be > pipelined - we can put flip flops in the logic so that the calculation > is performed over several clock cycles. The logic depth between any > flip flop output and any flip flop input is reduced - we can have a > faster clock. > > This is shown here: > > clock-----------------------+----------------------------- > | > +-------+ +----------+ +-------+ +- > | medium| | register | | small | | > input-->| xor |----->|d q|----->| xor |----->|d > (128) | tree | (16) | | (16) | tree | (16) | > +-------+ +----------+ +-------+ +- > ^ > | > +------------ > feedbac > > (I pruned the right side to avoid line wrap, but you should get the > idea.) > > In theory the synthesis tools can do all this for you. E.g. 
you can > describe a serial CRC calculation, put it in a for loop to iterate > over the input word, tell it how many clock cycles to take, and the > synthesiser should spit out something equivalent to the above. > (I have used this approach with LFSRs with some success at these bit > rates.) > > I could make a comment about the relative benefits of HDLs and > schematics for high speed design, but I don't want to ignite yet > another religious war. > > Regards, > Allan.

I am not clear about how you generated this logic, but it does not match the general problem. Even though there are only 16 bits in the CRC, there should be 128 bits in the "feedback" register as well as in the input. This means that there would be about the same number of feedback signals to the "small" XOR tree as there are input signals to the medium tree. So pipelining will improve your complexity roughly by a factor of 2, but not by as much as your analysis above indicates. This of course does not reduce the number of logic levels by 2, but only by half a LUT when using 4-input LUTs.

Try this with a very simple one like X43. You start with 43 bits in the register and have to add one bit for every extra bit in the input word. If you have 16 bits in at one time, you need a 58 bit feedback word.

Hmmm... does that mean that there should be 128 + C - 1 bits in the register, where C is the size of your CRC? I don't remember that being the case.

--
Rick "rickman" Collins
rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design
URL http://www.arius.com

4 King Ave                    301-682-7772 Voice
Frederick, MD 21701-3110      301-682-7666 FAX
Article: 37190
Theron Hicks <hicksthe@egr.msu.edu> wrote in message news:<3C0930CF.8C3C49B4@egr.msu.edu>...
> I have a problem with the manual floorplanner in ISE 4.1. I have a design which I know will place and route, but the system will not quite make it if I use all coregen parts. If I use mostly inferred counters and adders it will work OK. If I try to manually place the parts then they get screwed up. I would like to use the absolute simplest method to locate the coregen parts using the UCF file
> ...

I've seen this problem too. Usually, it works to floorplan one flip-flop from the middle of a counter. The others will be placed correctly by PAR. If you just have carry logic without flip-flops, you're out of luck.

You can try an area constraint (fit the logic in a rectangle). I think this is possible with the floorplanner.

The best solution is to instantiate each element of the carry chain. Even this doesn't always work with the floorplanner.
Article: 37191
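To make the "area constraint" suggestion above concrete: with the Virtex/Spartan-II generation of the tools, UCF syntax along these lines was used. This is a hypothetical sketch; the instance names are invented and the CLB row/column coordinates depend entirely on the target device, so treat it as a shape rather than a recipe.

# Pin one flip-flop from the middle of the coregen counter (invented names)
INST "my_counter/ff_bit8" LOC = "CLB_R10C5.S0" ;

# Or pull the whole macro into a rectangle with an area group
INST "my_counter/*" AREA_GROUP = "AG_counter" ;
AREA_GROUP "AG_counter" RANGE = CLB_R8C4:CLB_R16C6 ;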
I have been looking at an upcoming design where I will have 2 16-bit channels of downconversion/decimation and a 16-bit channel of upconversion/interpolation. My two sample rates are 102.4MHz and 25.6MHz, with overall dec. by 4 or interp. by 4. I am also performing shifts of 12.8MHz which works out to fs/8, fs/4 depending on which stage I perform them. The fs/8 shift is less attractive because of the root(2)/2 terms and it would run at 102.4MHz. However, this fs/8 shift would allow me to use a single filter per channel, versus 2 filter stages and fs/4 at 51.2MHz. I have been targeting the VirtexE or Virtex2 families. Along with Matlab sims, I have been generating DA FIR cores to get some size/speed estimates for FIRs with 16-bit inputs and 16-bit coefficients. While the size/speed of Serial DA and nearly-Serial DA approaches are attractive, I need full rate or nearly-full rate filters. This has been leading me towards full parallel DA FIRs. The first thing apparent is that these start to get large, but intuitively I would think that these would approach the size of multiplier-based designs? So, if I'm heading towards MAC based FIRs, I wonder about using the Virtex2 and it's dedicated multipliers. I understand from my local Xilinx FAE that MAC-based FIR cores may be in the next Coregen update? I could use 4 Block Multipliers in a polyphase-type arrangement for my dec. by 4 paths. I could also exploit Halfband or other filter symmetry to implement efficient FIRs. I would like to get some input from others as to what they have done for similar applications. Thanks for any insight! Brady Gaughan Airnet Communications bgaughan@nospam.aircom.comArticle: 37192
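On the DA-versus-MAC question above, the multiplier-based side can be sketched quickly. The following is a hypothetical transposed-form FIR in VHDL, not anything from Coregen: the tap count and coefficients are placeholders, and it simply relies on the synthesiser to map the signed multiplies onto the Virtex-II 18x18 multiplier blocks (or LUT logic on Virtex-E).

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir_transposed is
  generic (
    N_TAPS : integer := 4                    -- placeholder tap count
  );
  port (
    clk  : in  std_logic;
    din  : in  signed(15 downto 0);          -- 16-bit input samples
    dout : out signed(37 downto 0)           -- full-precision output
  );
end fir_transposed;

architecture rtl of fir_transposed is
  type coef_t is array (0 to N_TAPS-1) of signed(15 downto 0);
  -- placeholder coefficients; the list must match N_TAPS and the real filter design
  constant COEF : coef_t := (to_signed(1024, 16), to_signed(2048, 16),
                             to_signed(2048, 16), to_signed(1024, 16));
  type sum_t is array (0 to N_TAPS-1) of signed(37 downto 0);
  signal s : sum_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- transposed direct form: din is broadcast to every multiplier,
      -- partial sums ripple through the register chain toward s(0)
      s(N_TAPS-1) <= resize(din * COEF(N_TAPS-1), 38);
      for k in 0 to N_TAPS-2 loop
        s(k) <= s(k+1) + resize(din * COEF(k), 38);
      end loop;
    end if;
  end process;
  dout <= s(0);
end rtl;

A polyphase decimate-by-4 arrangement would run one such structure per input phase and sum the outputs, which is where the dedicated multipliers start to pay for themselves.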
For this case, I would suggest instantiating an FDCE rather than an FDC. An FDCE is a primitive rather than a macro, so you would not need an external XNF or other file to describe it to the tools. You can tie the clock enable to a logic 1 to keep it permanently enabled, and it should serve the same function. This does not explain your problem, but should get you going much quicker.

--- Brian

Don Teeter wrote:
> Please help if you can. My VHDL design in Xilinx Foundation 2.1i uses library macro FDC. To instantiate, I include the provided file FDC.XNF as a source file. At some point during synthesis the XNF file changes, losing some ports. Then when I attempt to implement, I get error messages:
>
> Error: The pin 'D' of the cell 'cmdproc/U1' does not have an associated signal in the XNF design 'fdc'. (FPGA-LINK-17)
>
> One of these messages for each of three now-missing ports. If I look in the XNF file I see it has changed and the ports are missing from it. What gives? How to prevent? Thank you,
>
> Don T.
Article: 37193
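A minimal VHDL instantiation along the lines Brian suggests above could look like this. It is a sketch, assuming the standard Xilinx unisim FDCE primitive (D flip-flop with clock enable and asynchronous clear); check the component declaration shipped with your own tool version before relying on the port names.

library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

entity fdce_wrapper is
  port (
    clk : in  std_logic;
    rst : in  std_logic;   -- asynchronous clear
    d   : in  std_logic;
    q   : out std_logic
  );
end fdce_wrapper;

architecture rtl of fdce_wrapper is
begin
  -- FDCE primitive with the clock enable tied high, so it behaves like an FDC
  u_ff : FDCE
    port map (
      Q   => q,
      C   => clk,
      CE  => '1',
      CLR => rst,
      D   => d
    );
end rtl;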
I love seeing comments from others that reinforce the gripes I've had over time. My experience with the Altera MaxPlus-II tools is dated, with no Quartus to back me up, but what I saw then is consistent with what I continue to see in Xilinx: nobody is doing "critical route" placement.

In my opinion, the best way to place and route a design is to figure out which paths will be the most difficult to route. When the delay paths with two carry chains and four levels of logic are routed first, the paths with two levels of logic should be cake to P&R with fewer resources available. I don't care if my logic goes from one corner to another if it doesn't impact the timing for that path and the critical routing resources have already been used. What does irk me is finding part of my few tight paths getting placed inefficiently. To have these critical paths in different rows in my Altera design, or in different, non-adjacent CLBs in my Xilinx designs, is irresponsible.

The Xilinx mapper in particular works against the concept of a critical-route-based place and route. The idea that the design should 1) be placed then 2) be routed doesn't work (to any level of efficiency). The P&R tool should be 1) place, 2) route, 3) place, 4) route, 5) place, etc....

At least in (an upcoming service pack of) the version 4.1i tools there's some attention given to critical routes, though more of a rip-up and retry approach. Strike that, I think they refer to it as a retry: no rip-up of any paths that meet timing. This is a big step in the right direction, but it still may be too little, too late in the P&R process to give the leaps in performance. With the proper P&R strategies, the silicon that's been designed to kick some serious butt will finally be able to do just that. Having the P&R kick the engineer in the butt really needs to stop.

I hope the research you're doing is toward a very good end!
Article: 37194
Just because someone violates the spec, doesn't make it right, or something other designers can/should do. The spec IS the spec, like it or not, agree with it or not... It doesn't mean something done outside the spec won't "work", depending on your definition of "work". Having designed a dozen or so PCI cards (as well as PCI cores), I would strongly urge people to stick to the spec. That typically minimizes problems, especially unless you're willing to do VERY extensive testing with all existing motherboards and plug-in cards, in every conceivable configuration...fully loaded, in every different slot etc, through full temperature and voltage ranges...and continue testing as new boards etc. come out... > |> Did you read the PCI spec carefully? The PCI spec requires power and ground > |> planes, since the maximum distance for the PCI power/ground connector pads > |> to the plane is .25", as stated in 4.4.2.1. <...> > > Then 80% of the cheaper network and soundcards violate the spec. I have never > seen a RTL8139 based network card with a multilayer PCB. > > -- > Georg Acher, acher@in.tum.de > http://www.in.tum.de/~acher/ > "Oh no, not again !" The bowl of petuniasArticle: 37195
Well, here's a complaint about Lattice's tools. For some reason, Lattice thinks that designers care about how many logic levels it takes to implement a function. See, I don't care. All I care about is that the finished design meets my timing constraints (and fits).

Problem is, Lattice's tools don't know a timing constraint from a hole in the wall. What their tools expect you to do is pick a combination of fitter options, press "go", and after the place and route completes (if, in fact, it does), manually go through the timing reports to see if you win or lose. And when you lose, you have to go back in and pick a different bunch of options. The fitter "effort" switch doesn't do what you think it does; it just picks a different algorithm. The "Explore" feature is broken.

If you want to take advantage of the fast I/O output enables, you have to set a constraint in a constraint file that's called "end critical path." And the fitter will then warn you that there's "no combinational logic..." to minimize if you drive your output enable from a flop. (It still "does the right thing," but the warning is stupid.)

I've told the Lattice rep more than once: I want to be able to set a period constraint and I/O timing constraints, push the "start" button, go get a cup of coffee or some lunch, and come back to find my chip either routed, or failed to meet timing (or it wouldn't fit).

I haven't even mentioned how unroutable their chips are.

---a
Article: 37196
Neil Franklin wrote:
> Presently my real application is my design for running on such a board (and being developed on a normal prototype board). Custom board will follow after, to let more users use the design with less hassle.

What's a "normal prototype board"? Don't tell me you're gonna wire-wrap this thing.

Question: which open-source PCB layout tool will you be using for your custom circuit-board layout?

Comment: all of the freeware/inexpensive board-layout tools suck, for many reasons.

> Directly drive SDRAM off of the FPGA. There exist XAPPs on that.

You don't need an XAPP for that. Just read any SDRAM data sheet. Piece of cake. I hope that non-lazy college professors will start having their students design DDR SDRAM controllers instead of "Traffic Controllers" and "Vending Machines."

--andy
Article: 37197
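For anyone wondering what "just read the data sheet" boils down to above: the command set of a standard single-data-rate SDRAM is a handful of encodings on {CS#, RAS#, CAS#, WE#}. A small, hedged VHDL fragment follows (the names are mine; device-specific timing such as tRCD, tRP and the refresh interval still has to come from the actual data sheet).

library ieee;
use ieee.std_logic_1164.all;

package sdram_cmds is
  -- command encoding on (cs_n & ras_n & cas_n & we_n), per the JEDEC SDR SDRAM truth table
  constant CMD_NOP        : std_logic_vector(3 downto 0) := "0111";
  constant CMD_ACTIVE     : std_logic_vector(3 downto 0) := "0011";  -- open a row
  constant CMD_READ       : std_logic_vector(3 downto 0) := "0101";
  constant CMD_WRITE      : std_logic_vector(3 downto 0) := "0100";
  constant CMD_PRECHARGE  : std_logic_vector(3 downto 0) := "0010";  -- close row(s)
  constant CMD_REFRESH    : std_logic_vector(3 downto 0) := "0001";  -- auto refresh
  constant CMD_LOAD_MODE  : std_logic_vector(3 downto 0) := "0000";  -- CAS latency, burst length
  constant CMD_BURST_STOP : std_logic_vector(3 downto 0) := "0110";
end package sdram_cmds;

The controller itself is then a state machine that issues LOAD_MODE and a couple of REFRESHes at power-up, ACTIVE/READ/WRITE/PRECHARGE for accesses, and a REFRESH on a timer.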
So it should be possible to simulate large designs with the free version, as long as you have the time? How big is the time penalty?

"Ed Browne, Precision Electronic Solutions" <ed_b_pes@swbell.net> wrote in message news:05uN7.600$oO4.343960630@newssvr11.news.prodigy.com...
> It's appalling that Xilinx would sell a product to design an FPGA/CPLD without the ability to simulate the design unless you buy a $1000+ simulator. Neither the free version nor the eval version allows testing on anything over 500 lines. At that limit on my machine, it simply closes - no slowing down.
>
> Does anyone have a lower cost alternative, preferably one that would accept the HDL Bencher output?
>
> Ed Browne
> Precision Electronic Solutions
>
> "Theron Hicks" <hicksthe@egr.msu.edu> wrote in message news:3BFA68F1.10118196@egr.msu.edu...
> >
> > Leon Heller wrote:
> >
> > > Sorry, I've just checked the Xilinx version. It is only for small designs.
> > >
> > > --
> > > Leon Heller, G1HSM leon_heller@hotmail.con
> > > http://www.geocities.com/leon_heller
> > > Low-cost Altera Flex design kit: http://www.leonheller.com
> >
> > It will work with much larger designs. It just runs slower.
Article: 37198
Andy Peters wrote: > > Directly drive SDRAM off of the FPGA. There exist XAPPs on that. > > You don't need an XAPP for that. Just read any SDRAM data sheet. Piece > of cake. I hope that non-lazy college professors will start having > their students design DDR SDRAM controllers instead of "Traffic > Controllers" and "Vending Machines." Agreed, but I would still encourage FPGA users to consult the free app notes ( Xilinx labels them XAPP ). They are sometimes very good, sometimes so-so, but they usually are well-documented, and they are FREE. And you can do with them whatever you like, just don't ignore them off-hand. Peter Alfke, Xilinx ApplicationsArticle: 37199
Dan wrote:
>
> Hello,
>
> I am shipping a 2 layer PCI card (33MHz-32bit). It uses a Xilinx with a 2.5V core and 5Volt tolerant IOs. (XC2S50-5PQ208C)
>
> I laid out the board with as much ground plane on the bottom and as much routing on the top as was possible. It's 90% ground plane. I believe that this works OK on many PCs but I think I still need to improve the electrical characteristics of the board for proper operation across all PCs.
>
> I currently use through hole bypass caps all around the perimeter of the Xilinx chip.
>
> I am sure things will get better by switching to both surface mount caps and a four layer PCB. My question is how important is each of these two improvements when compared to one another? For example, X% of the improvement will come by switching from through hole caps to surface mount and (100-X)% of the improvement will come from switching from two layers to four layers.
>
> I am wondering if simply switching to surface mount caps will give enough of a boost in performance.

The PCI spec does not specify the type of caps because it is in your own interest to keep the supply clean. The spec does specify a four layer board and a certain track geometry because the mainboard expects certain impedances and timings. If you don't stick to this, it can cause other, unrelated things in the system to break in a most amusing way.

It is no secret that most motherboards can work with cards outside the spec. One of my designs (2 layers) did work on top of a (2 layer, 15cm) slot riser card in all the computers I could get hold of. This was for a research project and I wouldn't even dream of selling something like that.

Your design may work or it may not. Which depends on the design of the rest of the system, the particular chips used, the temperature and the moon phase. This also means that it is not necessarily your card that stops working; there can be side effects.

Iwo