Ben Twijnstra wrote: > Jim Granville wrote: > >>Any info out yet on what MAX3 looks like ? >>Does it improve Static Icc, and lack of memory of MAX II, for example ? >>Smallest devices / largest devices ? > > > Nope - Altera's silicon design team is quite busy with Cyclix III. If you > have any good suggestions on specs outside the hobby sphere, now might be a > good time to post them... Now there is a strange reply. Do you mean suggestions on MAX 3, or this Cyclix III ? If the silicon design team is quite busy, the Architecture team has therefore finished, and so suggestions are probably too late anyway ? -jgArticle: 107576
On 2006-08-30, tullio <tullio.grassi@gmail.com> wrote: > I know this topic has been discussed before, but i am looking for some > ideas. > My design is synchronous, I use ISE8.2.02 XST and ModelSim, I have a > lot of signed logic; I get different results from behavioral and > post-Route simulations. I had some problems with signed logic in Verilog using an older version of XST, although I do not remember the version any longer. I introduced a workaround to avoid the bug. I haven't checked whether the bug is still present in the latest version of XST. (Since I was not allowed to open a webcase at the time I discovered this bug I reported the bug on the Xilinx University Program forum, but I didn't hear anything back about it.) The source code and screenshots of the waveforms of behavioral and post-map simulation are available at http://www.da.isy.liu.se/~ehliar/xstproblem/ . I'm afraid dct_workaround.v and dct.v differ quite a lot, but you can search for "synthesis bug" in dct_workaround.v to see the workaround I did. /Andreas
Article: 107577
"Martin Schoeberl" <mschoebe@mail.tuwien.ac.at> wrote in message news:44f4e0c4$0$11352$3b214f66@tunews.univie.ac.at... >> proprietary Altera design, or are there open source implementations >> available?" The switch fabric itself can easily be written, it is on the >> order of six lines of code per interface for a point to point connection, >> there is nothing really magic in what Altera spits out of SOPC Builder >> based > > I think that this is actually the power of SOPC builder. It will > do all your glue logic stuff for the interconnect (address > decoding, byte order managing, byte enable on write,...). > > And that's a lot more than 6 lines of code ;-) You missed my caveats about the 6 lines of code being 'per interface' for a 'point to point connection'. Below is the six lines of code that will implement what SOPC Builder spits out for such an interface when neither master or slave has latency. 1: slave_chipselect <= (master_read or master_write) and To_Std_Logic(master_address = Hard_coded_Address_range); 2: slave_write <= master_write; 3: slave_read <= master_read; 4: slave_writedata <= master_writedata; 5: master_waitrequest <= slave_chipselect and slave_waitrequest; 6: master_readdata <= slave_readdata; When you have a latency aware master and a slave then there is the one additional line of code to connect the two 'readdatavalid' signals. Obviously these are the most simple cases, but surprisingly enough I've found that between functional blocks on a single chip the six lines (or seven) are in many cases sufficient. When you get into the various forms that a master and slave can take the number of lines of code per interface can balloon quite rapidly since you need to take into account differences in data bus widths, connecting latency aware master/slaves to ones that are not, multiple bus masters, etc. I completely agree with you though about the power of SOPC Builder as a tool to implement that somewhat rote exercise of interconnect based on information solely in the .PTF file. I did run into several problems in using SOPC Builder that generated several service requests to Altera that resulted in - Me having to completely replace the SOPC Builder generated code. But since I was connecting all Avalon components it was a tedious (but totally straightforward) process. - Getting a T-shirt from Altera for pointing out the various deficiencies in their tool....and I thought the tool, having been around for several years, would've been a bit sturdier. - Promises from Altera to improve the tool. Supposedly it is much better in Quartus 6, I haven't tried yet but I will because even if I end up having to go down the same path, what SOPC Builder brings to the table has the potential to replace a good chunk of tedious, mindless work. > >> an open source version of this connection logic, but whether simple use >> of the Avalon bus without also targetting an Altera device (even if no >> Altera software is involved) is violating anything is an open legal >> question as you've pointed out (I'm guessing that it might but not really >> sure). > > AFAIK the bus definition is kind of open-source - free. That's my belief but I haven't really investigated whether that is true or not. > However, > I'm sure you're not allowed to use the SOPC builder generated > VHDL code on a Xilinx device ;-) That is definitely the case. 
> BTW: I asked Altera Austria about a related topic: Is it allowed > to 'use' the DRAM controller in an open-source environment (meaning, > can I upload the VHDL code to a web server)? However, they had > no real answer to this. They said that the SDRAM controller > is part of NIOS and only works with NIOS. Therefore, > one has to buy a NIOS license. But it works quite well with JOP > too ;-) I'm not sure that's really true. I think the SDRAM and DDR controllers are bundled as part of Altera's MegaCore, which is a part of Quartus. You can use all of those and SOPC Builder without a NIOS license. I'm also thinking though that the DRAM controller itself would not be open source, since it was generated by Altera for use by people they would like to see targeting Altera devices. > >> Section 2 >> In the paragraph starting "The third issue is..." you ask the question >> "Why not force the slaves to hold the data on the output registers as >> long as there is no new transaction?" A couple of follow-up questions to >> that though would be >> - What is a SimpCon slave to do if there IS a new transaction before the >> old one has been acknowledged? > > good question ;-) It depends on the pipeline level. It can accept > it. But this is not directly related to the request to 'just' > keep the data valid until a newer one is available (and was > requested). I was going the devil's advocate route and asking what 'should' happen, since if a new read starts before the old one has been acknowledged this implies that the slave would then have to keep a queue of 'previously requested but not yet read acknowledged' data somewhere. Presumably when that queue fills up the slave would have to have some way to say 'Stop! I can't take it anymore', which I *think* might mean that the SimpCon slave would keep rdy_cnt set to 3. > >> - Does the SimpCon fabric prevent this from happening? (I think it does, >> but not exactly sure) > > It is the master who decides when to issue a new request > and when to leave the slave with the last data. And this may also be the answer to the above: a SimpCon master is not allowed to initiate a new request until it has acknowledged reading the previous one. >> having the master 'know' the latency of the 'slave' would seem to be >> cheating (since this wasn't required by the SimpCon implementation as I >> had thought) but I guess I'll fall back to what Tommy mentioned earlier >> that since JOP was optimized for SimpCon in the first place it implies >> that an Avalon/SimpCon bridge must be built and such bridges can tend to >> be either > > about cheating and bridge: I'm already cheating on the Avalon > interface (as mentioned in an earlier post) to generate the > address/control/data holding in the master - I switch between > the original single cycle register at the first cycle > and a hold register on the following cycles - that's not > allowed in the original spec. Actually I'm not sure that what you had is cheating. Avalon requires address, read, write and writedata to hold when the master receives waitrequest, but it also says that it really only cares about signals at the rising edge of the clock. The SOPC Builder generated code, though, generates an assert if any of those signals change without regard to the clock. This to me is an overly aggressive assert that I think goes beyond what the Avalon bus specifies. Maybe a question for Altera. >> be). Had JOP been optimized for Avalon to begin with, would the numbers >> be any better without any cheating?
>> That's sort of the open question and I'm not necessarily expecting an answer. > > Mmh, hard to say. I implemented the memory interface without > any SoC interconnect in mind - just tried to get the best > performance on SRAMs. The bus thing came very late in the > design. So it looks like I'm now defining a bus that 'fits' > the way the original memory interface of JOP was. > However, perhaps that's not that bad, as a new idea for a > different bus comes up ;-) No, in fact it's generally a good idea, since you've at least used that bus definition to implement something real. > >> My main interest in the thread was in understanding what sort of >> bottlenecks might be lurking in Avalon that I'm not aware of. A couple >> areas of 'possible' weakness for Avalon that I am aware of are: >> - It can hang (there is no requirement for a max time for a cycle). >> Something to be aware of, but generally not an issue since whoever is >> designing the slave components had better address this and not allow for >> a hang to occur. >> - No notion of a 'retry'. Again, given the environment of being on a >> chip, the slave design shouldn't be allowed to say 'try again later >> please' so I don't think this should be an actual design issue, just >> something to be aware of. > > Implementing those two features makes your bus (and interfacing) > way more complicated. AFAIK OPB does it, but you end up with > so many signals... I agree. It's good to be aware of the issues, but I don't think they are that important in many cases and not worth overcomplicating the bus to accommodate them. > >> - Can't have pending reads from multiple slaves. I suppose this could be >> important to some, it hasn't been for me. > > That's more an issue of the interconnect logic and not the > bus definition. Specifically, logic resource usage in the interconnect. Allowing for this feature would bloat the interconnect logic and, in many cases, such bloat would not be required....at best, if such a thing is available you, as the designer, would like to be able to specify whether such a thing is really needed so you can make an informed decision on the tradeoff. > > You can do single cycle pipelined read with a latency of two > cycles (compared to just a single one in AMBA AHB). Perhaps > I should provide that example too. When the slave supports it > you can issue a new read request when rdy_cnt is 2. When > you do it then rdy_cnt will stay at 2 - the 2 cycle pipelining. OK. Not sure what the pipeline delay through an off-the-shelf DRAM controller is. It might be more than two even when things are flowing along (i.e. no refreshes are occurring). I'm guessing that it is probably less than 8, which might mean that one more bit in rdy_cnt might allow you to connect more cleanly to DRAMs and get full performance. > > BTW: I don't know VSIA. Any link to it? > http://www.vsia.org/documents/vsiadocuments.htm I'm aware of it, but I haven't designed anything to their 'on chip bus' specification. I get the impression from what I've heard that the ASIC guys may use it. Since our volumes never justify the NRE to do an ASIC, we use FPGAs, so I've been more focused on tools and techniques to improve design productivity inside an FPGA. > Completely agree - will remove this 'controversal' as this thread > is more 'constructive'. Thus ends the controversy over the use of the word 'controversal'. KJ
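As a concrete illustration of the six numbered interconnect lines KJ lists earlier in this post, here is one way they might be wrapped into a self-contained block. This is only a sketch: the entity name, the generics and the hard-coded address decode are illustrative assumptions, not actual SOPC Builder output, and a real fabric would also handle byte enables, data-width adaptation and multiple masters.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Point-to-point master/slave connection, zero-latency case (sketch).
entity p2p_fabric is
  generic (
    SLAVE_BASE : unsigned(31 downto 0) := x"00001000";  -- hypothetical decode window
    SLAVE_SPAN : natural := 16
  );
  port (
    -- master side
    master_address     : in  unsigned(31 downto 0);
    master_read        : in  std_logic;
    master_write       : in  std_logic;
    master_writedata   : in  std_logic_vector(31 downto 0);
    master_readdata    : out std_logic_vector(31 downto 0);
    master_waitrequest : out std_logic;
    -- slave side
    slave_chipselect   : out std_logic;
    slave_read         : out std_logic;
    slave_write        : out std_logic;
    slave_writedata    : out std_logic_vector(31 downto 0);
    slave_readdata     : in  std_logic_vector(31 downto 0);
    slave_waitrequest  : in  std_logic
  );
end entity p2p_fabric;

architecture rtl of p2p_fabric is
  signal cs : std_logic;
begin
  -- line 1: chip select = access requested AND address in the slave's window
  cs <= (master_read or master_write) when
          (master_address >= SLAVE_BASE and
           master_address <  SLAVE_BASE + SLAVE_SPAN)
        else '0';
  slave_chipselect <= cs;
  -- lines 2-4: pass the strobes and write data straight through
  slave_write     <= master_write;
  slave_read      <= master_read;
  slave_writedata <= master_writedata;
  -- lines 5-6: return path back to the master
  master_waitrequest <= cs and slave_waitrequest;
  master_readdata    <= slave_readdata;
end architecture rtl;

For a latency-aware pair, the seventh line would simply connect the two readdatavalid signals straight through in the same way.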
Article: 107578
Hi, if anyone has seen the same or has any ideas how to avoid the issue I am facing, please help - I am trying to get it solved myself (my deadline is today 21:00 German time) but am kinda stuck currently. Problem:
Virtex-4FX12
2 DCMs in series
DCM1 uses only the FX output for the PPC system clock (to get the clock into the DLL input range)
DCM2 generates the 3X proc clock for the PPC
It all works for 360 milliseconds after configuration. Then the first DCM will remove lock, the outputs stop, everything stops. The 360 millisecond delay is not dependent on the first DCM clock ratio settings. If the PPC is held in reset then the DCMs still shut down after the same 360 milliseconds. Any ideas? What to check? I have a LeCroy 2GS/s DSO on some signals and power supply lines but am not seeing anything at the time where the DCMs shut off. Thanks in advance for any suggestions, Antti
Article: 107579
Martin Thompson wrote: > mikegurche@yahoo.com writes: > > <snip commentary on variables> > > > > > In synthesis, the problem is normally the abuse of sequential > > statements, rather than the use of variable. I have seen people trying > > to convert C segment into a VHDL process (you can have variables, for > > loop, while loop, if, case, and even break inside a process) and > > expecting synthesis software to figure out everything. > > > > Why not do this? Synthesis software is good at figuring all this > out. If it does what you need it to and meets timing, you're done. > Move on to the next problem. > > Personally, I have seen people spend far too long doing very explicit > coding of detailed stuff, giving the synth tool very little to do, > which for a relatively low-performance (still in the 10s of MHz > though) design, was a waste of effort. The so-called "naive" approach > of writing code in a natural "softwary" way and letting the synth sort > it out would have left us more time to sort out the one nitty-gritty > bit of code which did have a performance problem. > > Sure, if you are pushing the performance envelope, you're going to > have to put more work in. If you are doing a high-volume design then > you might get in a smaller part and save some money by putting the > effort in. But that's just an engineering-tradeoff like any other. > Softies do it all the time, optimising their hardcore interrupt > handlers, leave the rest to the tools. I assume civil engineers do > similar things with their bridges as well :-) Is TRW still around? I thought they were bought by Northrop Grumman. I guess some part of TRW was not part of that deal? I used to work in Defense Systems in McLean, or whatever they called it that week. I guess I am too old school to feel good about using 'C' like code. Sure, if it works, do it. But I always think in terms of hardware and like to know what I am building before I let the tool build it. I guess I would not want to debug a design where I didn't know what the tool was doing. Then I would be debugging software and not hardware. Maybe that works for some people, but I like to know the hardware I am building so I know exactly how to debug it. That also includes avoiding certain types of bugs that are caused by poorly designed hardware. If the tool generated the hardware then I can't say it doesn't have race conditions and such.
Article: 107580
Symon wrote: > "rickman" <gnuarm@gmail.com> wrote in message > > Having a peak is not bad if it is still below the max impedance you are > > trying to achieve. > > > But it's better not to have it, right? Better than what? If it meets your requirements, then you are done. > So, you loaded up the LTSpice simulations I posted, right? That's not > speculation. And no, you haven't lost the battle with a few tiny traces, the > vias are the bad guys as they give you loop area. Your simulation was for two caps with no power plane, right? That is not a useful simulation. > So, you advocate decoupling the power plane without considering what effect > this has on the IC? Why would you go to all that effort if the package > you're stuck with prevents your efforts making any difference? A quiet plane is part of the solution. If your package produces ground bounce that blows your noise budget, you have no hope of building a good design. If so, you need to get a part with a better package. I don't get what you are saying. > > This is not borne out by the facts. If you can get your hands on > > Richey's book I would suggest that it is a valuable addition to any > > library on SI and EMI. His volume 2 will cover EMI in more detail and > > I am looking forward to it. > > > I have different facts, look at the sims I posted. Ok, is this conversation coming to an end? I don't want to argue about this. Your simulation was for two capacitors if I understood correctly. That has no bearing on the problem of power plane decoupling. Without simulating the power planes you aren't simulating anything useful. > > Wow! If your design is not high speed and the edge rates are not very > > fast, then power distribution is not a big deal. But nearly everything > > in this paragraph is incorrect. Yes, power planes cost money, that is > > true. Now that I understand how simple it is to figure out how power > > distribution works, I would never use any of these ideas on a board > > where I needed good noise margin or had high speed signals. > > > I see from this paragraph you may not have grasped the effect that the BGA > package connections are having on the PDS design. As I said, the whole point > of the exercise is to get good supplies on the IC, not the power plane. The > plane capacitance has such high Q it's good to severel GHz, I reiterate that > you can't benefit from this on the device. I suggest you look at how Xilinx > themselves route the power to their Rocket I/Os on their demo boards. The > power supplies aren't on planes. The connection between the PCB and the IC > mean it's a waste of time, I suspect for these Gbit circuits they embed caps > on the FBGA. Yes, you need to consider the package inductance. But you can't expect the power plane to fix a problem with the package. You can analyze them separately. The power plane will have noise from the effects of all chips on the board. The chip package inductance will only affect that one part. So analyze how much noise each will contribute and do what it takes to stay within your noise margin. Beyond that it is not useful to analyze them together. > > BTW, I never said a high Q power plane pair is bad. Yes, it can create > > impedance holes at very high frequencies, but the alternative for your > > approach would raise the floor, not lower the ceiling. > > > I'm not saying it's necessarily bad. But it's not a great deal of help ON > THE SILICON. You've gotta get that HF energy through vias, bga balls, > traces(maybe) to the device. 
Any noise you have on the power planes will add to the noise that the silicon sees. > Only a nutter would do this without thinking about it and running some > simulations. So, take a look at my LTSpice sim posts. Are you thinking of simulating with the power planes?Article: 107581
Symon wrote: > "rickman" <gnuarm@gmail.com> wrote in message > news:1156900847.672642.116440@b28g2000cwb.googlegroups.com... > > I can take your word for your results as I don't understand any of the > > code you posted. But this simulation is not of a power distribution > > system. Try adding a power plane and just one value of cap (the 0.1uF > > is what you suggested IIRC) and simulate up to 200 MHz or so. I think > > you will find that you get a *huge* parallel resonance peak from the > > two. If you add a few 0.01 uF caps this peak will be split in two and > > the highest will be much smaller. Then add a few more 0.001 uF caps > > and you will see the peaks reduced further. It also helps if you use > > caps with a low Q factor or high ESR (relatively speaking). But then > > at 0.1 uF, I don't think you will find a C0G so X7R should be fine. > > > > Symon wrote: > > > OK, here's a LTSpice file that shows a resonance between disimilar cap > > > values. The circuit sweeps from 5MHz to 45MHz back and forth. The first > > > three sweeps are done with two 1uF caps, the second three with a 1uF and > a > > > 0.1uF cap. Notice the big resonance at about 10MHz for the second set of > > > sweeps. The performance of the second circuit is slightly better at > 45MHz, > > > worse at 5MHz, MUCH worse at 10MHz. > > > HTH, Syms. > > > > > > Version 4 > > > SHEET 1 1516 904 > > > WIRE 1040 432 800 432 > > > > ....snip... > > > Tell you what, why don't YOU download the simulator from Linear Tech's > website and learn how to simulate what you suggest? Then you can prove stuff > to yourself without having to go to expensive classes. :-) You might even > want to simulate the stuff you learned at your class with a real world > situation and see how much benefit you get. Let us know how you get on. > Good luck, Syms. I will be doing that when I am ready to layout my next board.Article: 107582
Martin Thompson wrote: > mikegurche@yahoo.com writes: > > <snip commentary on variables> > > > > > In synthesis, the problem is normally the abuse of sequential > > statements, rather than the use of variable. I have seen people trying > > to convert C segment into a VHDL process (you can have variables, for > > loop, while loop, if, case, and even break inside a process) and > > expecting synthesis software to figure out everything. > > > > Why not do this? Synthesis software is good at figuring all this > out. If it does what you need it to and meets timing, you're done. > Move on to the next problem. If the synthesis software is really this capable, there is no need for hardware engineers. Everyone can do hardware design after taking C programming 101 and we all will become unemployed :( Let me give an example. Assume that we want to design a sorting circuit that sorts a register of 1000 8-bit words with minimal hardware. For simplicity, let us use the bubble sort algorithm:

n = 1000;
for (i = 0; i < n-1; i++) {
  for (j = 0; j < n-1-i; j++)
    if (a[j+1] < a[j]) {   /* compare the two neighbors */
      tmp = a[j];          /* swap a[j] and a[j+1] */
      a[j] = a[j+1];
      a[j+1] = tmp;
    }
}

The hardware designer's approach is to develop a control FSM to mimic the algorithm. It can be done with one 8-bit comparator in about 0.5*1000*1000 clock cycles. If we ignore the underlying hardware structure and just translate C constructs to corresponding VHDL constructs directly (the C programmer's approach), we can still derive correct VHDL code:

process(clock)
  type word_array is array (0 to 999) of std_logic_vector(7 downto 0);
  variable a   : word_array;
  variable tmp : std_logic_vector(7 downto 0);
begin
  if (clock'event and clock='1') then
    -- current register contents
    a := q;
    -- combinational sorting circuit based on
    -- one-to-one mapping of C constructs
    for i in 0 to N-2 loop
      for j in 0 to N-2-i loop
        if (a(j+1) < a(j)) then
          tmp := a(j);
          a(j) := a(j+1);
          a(j+1) := tmp;
        end if;
      end loop;
    end loop;
    -- sorted result back to the register
    q <= a;
  end if;
end process;

The resulting circuit can complete sorting in one clock cycle but requires 0.5*1000*1000 8-bit comparators. We need an extremely large target device to accommodate the synthesized circuit. It will be very demanding for synthesis software to convert this code into a circuit with only one comparator. I think my job is still safe, for now :) Mike G.
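To make the contrast concrete, the "one comparator, many cycles" version might look like the following sketch. The array is kept in a register array and loaded through a hypothetical write port; the names, the load interface and the index counters are illustrative assumptions, and a real design would more likely keep the data in a block RAM with a slightly more involved FSM.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Serial bubble sort: one compare (and possible swap) per clock cycle.
entity serial_bubble_sort is
  generic (N : natural := 1000);
  port (
    clock : in  std_logic;
    start : in  std_logic;
    wr_en : in  std_logic;                        -- load interface (hypothetical)
    wr_ix : in  natural range 0 to N-1;
    wr_dt : in  unsigned(7 downto 0);
    done  : out std_logic
  );
end entity serial_bubble_sort;

architecture rtl of serial_bubble_sort is
  type word_array is array (0 to N-1) of unsigned(7 downto 0);
  signal a       : word_array;
  signal i, j    : natural range 0 to N-1;
  signal running : std_logic := '0';
begin
  process(clock)
  begin
    if rising_edge(clock) then
      if wr_en = '1' then                          -- fill the array before sorting
        a(wr_ix) <= wr_dt;
      elsif start = '1' and running = '0' then
        i <= 0;
        j <= 0;
        running <= '1';
        done    <= '0';
      elsif running = '1' then
        -- the single 8-bit comparator; the swap uses the pre-clock values of
        -- both elements, so compare-and-swap completes in one cycle
        if a(j+1) < a(j) then
          a(j)   <= a(j+1);
          a(j+1) <= a(j);
        end if;
        if j = N-2-i then                          -- end of an inner pass
          j <= 0;
          if i = N-2 then                          -- last pass finished
            running <= '0';
            done    <= '1';
          else
            i <= i + 1;
          end if;
        else
          j <= j + 1;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;

This needs roughly N*N/2 clock cycles but only one comparator, which is exactly the trade-off being argued above.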
Article: 107583
> > > > I agreed with you completely. What I am trying to say is that variable > > may not be synthesizable if you write the code with a "C > > mentality." > > I'm not sure I agree that variables are the problem at all. There are > many ways to write code that is not synthesizable. This can be done > with signals as well as variables. The difference between signals and > variables is just that the value of a variable is updated immediately > just like a 'C' variable. Signals are only updated at the end of the > process. So if you make an assignment to a variable and then use that > value in a calculation in the same process, the new value will be used. > If you do the same thing with a signal, the old value of the signal > will be used. I don't know of a way that this can be unsynthesizable. > Variables can not exist outside of a process, IIRC. So the variable > must be assigned to a signal in order for it to affect anything outside > the process. So in reality, it can only be used as an intermediate > value in an assignment to a signal. I guess my wording is not very clear. Let me elaborate on the statement. A variable in C is a symbolic memory location in a computer and its function is close to a register in hardware. A statement like

sum = 0;
for (i = 0; i < 10000; i++)
  sum = sum + a[i];

implies that the addition is done 10000 times sequentially. With the "C mentality" and with no knowledge of the underlying hardware structure, the C code can be translated directly to a VHDL variable and sequential statements:

sum := 0;
for i in 0 to 9999 loop
  sum := sum + a(i);
end loop;

When synthesized, this will infer 10000 adders. The problem with this code is that it can be simulated correctly but leads to excessive logic. A few statements like this make the circuit too complex to be synthesized. To derive the right code, we have to think hardware, use a register for sum, and derive an FSM to add the a(i) sequentially over 10000 cycles. Replacing the variable with a signal in the previous code cannot solve the problem and actually renders the code incorrect. It at least forces us to seek an alternative and think more about hardware. Variables/sequential statements themselves do not lead to a good or bad design, but provide a mechanism to describe the circuit in a very abstract fashion. Careless use of these constructs may lead to a description that is too far from the underlying hardware structure. Mike G.
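The "register plus FSM" alternative that the paragraph above alludes to can be sketched in a few lines. The interface names are illustrative, and the sketch assumes a(i) is readable combinationally (e.g. from a register array or an asynchronous-read memory) in the same cycle its index is presented:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- One shared adder, one addition per clock: 10000 cycles instead of 10000 adders.
entity serial_sum is
  generic (N : natural := 10000);
  port (
    clock   : in  std_logic;
    reset   : in  std_logic;
    start   : in  std_logic;
    a_data  : in  unsigned(7 downto 0);      -- a(i), one word per cycle
    a_index : out unsigned(13 downto 0);     -- element currently being requested
    sum     : out unsigned(21 downto 0);     -- wide enough for 10000 * 255
    done    : out std_logic
  );
end entity serial_sum;

architecture rtl of serial_sum is
  signal i       : unsigned(13 downto 0);
  signal acc     : unsigned(21 downto 0);
  signal running : std_logic;
begin
  process(clock)
  begin
    if rising_edge(clock) then
      if reset = '1' then
        running <= '0';
        done    <= '0';
      elsif start = '1' and running = '0' then
        i       <= (others => '0');
        acc     <= (others => '0');
        running <= '1';
        done    <= '0';
      elsif running = '1' then
        acc <= acc + a_data;                 -- the single adder
        if i = N - 1 then
          running <= '0';
          done    <= '1';
        else
          i <= i + 1;
        end if;
      end if;
    end if;
  end process;

  a_index <= i;
  sum     <= acc;
end architecture rtl;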
Article: 107584
David Brown wrote: > I've not any high speed boards - the last board I made had internal > frequencies at 150 MHz, and an external bus at 75 MHz (overclocked in > testing to about 240/120 MHz), so maybe I'm missing something that > happens at higher frequencies. > > Using a simple tool such as Murata's software, I looked at the > impedances for different capacitors at different frequencies. To a fair > extent, the inductance is determined by the package size (and the board > vias and traces), while the capacitance obviously goes up with the cap's > value. So choosing a 0.01 uF instead of a 0.1 uF cap increases the > capacitance side of the impedance curve by a factor of 10, and leaves > the inductive side unchanged. It changes the peak frequency, but I fail > to see why that should make a real difference - it has the same or > higher impedance across the frequency range. Given that the 0.1 uF type > has lower ESR (being made of more parallel plates), I can't find any way > in which the 0.01 uF is better. So as Symon says (unless I'm > misinterpreting him), the best arrangement is to pick the smallest size > package you can conveniently mount (0603 for us), then the largest > capacitance value you can conveniently and economically get in that size > (100 nF), and use as many as needed for the board. Placement should be > close to the device where possible, but is not very critical as long as > it is within the range of the mini power plane (i.e., polygon on a > signal layer). > > It works for me - but then again, I'm not doing really high-end cards. The amount of decoupling required is not a function of the clock rate. It depends on the slew rate of your signals and the length of the transmission lines. The length determines the lower frequencies you will need to decouple and the slew rate determines the highest frequencies. Of course there are other aspects that you need to decouple, such as switching inside the chips. For that you need to compare the maximum transition in current the chip will produce to the maximum noise voltage you can tolerate. Then use the resulting impedance as the goal for decoupling. Ritchey's data was very clear on this. Adding a single value of caps to a power plane produced a resonance with a higher impedance than that of the plane alone over a significant frequency range. By using multiple cap values he was able to decouple a board with just a handful of caps rather than the mountain that is normally used. Most importantly, he could show that his decoupling design worked correctly before he built the board rather than verifying it in testing. I wish I could post the images from Ritchey's book. I have tried to describe his measurements in detail, but a picture is worth a thousand words (or maybe more). Moving the SRF is what makes it work. If you use a hundred 0.1 uF caps you should get a parallel resonant peak in impedance as the capacitor resonates with the power plane. Assuming the capacitor has a high Q, then no number of capacitors will significantly reduce that peak. Of course the caps don't have a high Q so some number of caps *will* reduce the peak to an acceptable level. Or you can add a smaller number of caps with a smaller value. These caps will produce a higher frequency resonance with the power plane. You can then add a third value of cap to move that resonance higher. Each time you add a value of cap you flatten the impedance curve.
If the caps are not high Q, by the time you have added 0.1, 0.01 and 0.001 caps you will have flattened it enough to not see any real peaks, but rather just ripple in the frequency response. This will take a lot fewer caps than the hundreds that are often used on boards, even at the frequencies you are using. Or think of it from the other direction. You add the 0.001 uF caps to decouple the plane at the highest frequencies that caps can be effective. But they don't work well at lower frequencies so add a smaller number of 0.01 uF caps to provide decoupling at a lower frequency. Then add just a small number of 0.1 uF caps for the lower freqs and so on down to the tantalum caps for bulk at the lowest frequencies until the PSU response time can adequately maintain the voltage.Article: 107585
Antti wrote: > When I opened webcase about the issue that Xilinx tools made fatal > failure when I tried to use the PMV from an hard macro the response was > that, "you dont need to know" - well now I know :) Ok, but if you create a schematic for an FPGA that uses this undocumented feature and you produce some million parts, and Xilinx then decides to cancel it in later revisions of the FPGA, you are lost. -- Frank Buss, fb@frank-buss.de http://www.frank-buss.de, http://www.it4-systems.de
Article: 107586
Frank Buss schrieb: > Antti wrote: > > When I opened webcase about the issue that Xilinx tools made fatal > > failure when I tried to use the PMV from an hard macro the response was > > that, "you dont need to know" - well now I know :) > > Ok, but if you create a schematic for an FPGA that uses this > undocumented feature and you produce some million parts, and Xilinx then > decides to cancel it in later revisions of the FPGA, you are lost. > > -- > Frank Buss, fb@frank-buss.de > http://www.frank-buss.de, http://www.it4-systems.de Sure - care should be taken when making any decisions. Currently it *IS* used by all Virtex-4 silicon, as the Xilinx tools use it for every design you make; you are just not seeing it. It is added silently, and is only visible in FPGA Editor. So as long as the new Virtex-4 steppings are bitstream backward compatible, the PMV feature will be there as well. It's just the configuration clock primitive that is, for some reason, made inaccessible (or not easily accessible) to FPGA users. Antti
Article: 107587
mikegurche@yahoo.com wrote: > Let me give an example. Assume that we want to design a sorting > circuit that sorts a register of 1000 8-bit word with minimal hardware. > For simplicity, let us use the bubble sort algorithm: > > n=100 > for (i=0; i<n-1; i++) { > for (j=0; j<n-1-i; j++) > if (a[j+1] < a[j]) { /* compare the two neighbors */ > tmp = a[j]; /* swap a[j] and a[j+1] */ > a[j] = a[j+1]; > a[j+1] = tmp; > } > } > > The hardware designer's approach is to develop a control FSM to mimic > the algorithm. It can be done with one 8-bit comparator in > 0.5*1000*1000 clock cycles. > > If we ignore the underlying hardware structure and just translate C > constructs to corresponding VHDL constructs directly (the C > programmer's approach), we can still derive correct VHDL code: > > process(clock) > variable a: std_logic_vector(999 downto 0) of > std_logic_vector(7 dwonto 0); > variable tmp: std_logic_vector(7 dwonto 0); > begin > if (clock'event and clock='1') then > -- register > q <= d; > a := q; > -- combinational sorting circuit based on > -- one-to-one mapping of C constructs > for i in 0 to N-2 loop > for j in 0 to N-2-i loop > if (a(j+1) <a(j)) then > tmp := a(j); > a(j) := a(j+1); > a(j+1) := tmp; > end if; > end loop; > end loop; > -- result to register input > d <= a; > end process; > > The resulting circuit can complete sorting in one clock cycle but > requires 0.5*1000*1000 8-bit comparators. We need a extremely large > target device to accommodate the synthesized circuit. It will be very > demanding for synthesis software to convert this code into a circuit > with only one comparator. What is missing is some form of control to say, how many clock cycles it can take to implement the algorithm. Given that additional control on the synthesis the designer can make tradeoffs on how much logic versus how much latency is acceptable and the surrounding circuitry can be designed appropriately. Neither C nor VHDL inherently have this. I don't think this is equivalent to what is generally meant when synthesis tools do 'register retiming/balancing' either since even after this task they still will meet the overall clock cycle latency of the function. Recognizing that the overall clock cycle latency is really an up front design tradeoff to be made and correctly synthesizing code using that as an input parameter would be a leap forward. > I think my job is still safe, for now :) Yep, mine too. KJArticle: 107588
Has anyone actually measured the static power of the V4FX12 and V4FX20? I'd like to upgrade my design to the larger chip, but I'm very power constrained, so I don't want the larger chip (which I don't need for the current app) if it burns too much extra battery. Thanks, Clark
Article: 107589
> > I don't think this is equivalent to what is generally meant when > synthesis tools do 'register retiming/balancing' either since even > after this task they still will meet the overall clock cycle latency of > the function. Recognizing that the overall clock cycle latency is > really an up front design tradeoff to be made and correctly > synthesizing code using that as an input parameter would be a leap > forward. > Hi, I think that's the intention of mentor's Catapult C synthesis program, though I haven't used it myself. Cheers, AndyArticle: 107590
Anonymous schrieb: > Has anyone actually measured the static power of the V4FX12 and V4FX20? I'd > like to upgrade my design to the larger chip, but I'm very power constrained, > so I don't want the larger chip (which I don't need for the current app) if > it burns too much extra battery. > > Thanks, > Clark Hi Clark, I don't have comparison data for the FX20, but I have compared the FX12, LX15 and LX25, and from what I can measure (and feel with my nose at the LX25) I'd say that if you are building a battery-powered gadget, then do whatever you can to stick with the FX12! If needed, add some other ICs to the system - some low-power MCU or low-power PLD - but don't use a larger FPGA. Antti
Article: 107591
Andy Ray wrote: > > > > I don't think this is equivalent to what is generally meant when > > synthesis tools do 'register retiming/balancing' either since even > > after this task they still will meet the overall clock cycle latency of > > the function. Recognizing that the overall clock cycle latency is > > really an up front design tradeoff to be made and correctly > > synthesizing code using that as an input parameter would be a leap > > forward. > > > > I think that's the intention of mentor's Catapult C synthesis program, > though I haven't used it myself. > Cool, thanks for the tip. Sounds like it's worth investigating. KJArticle: 107592
rickman wrote: > David Brown wrote: >> I've not any high speed boards - the last board I made had internal >> frequencies at 150 MHz, and an external bus at 75 MHz (overclocked in >> testing to about 240/120 MHz), so maybe I'm missing something that >> happens at higher frequencies. >> >> Using a simple tool such as Murata's software, I looked at the >> impedances for different capacitors at different frequencies. To a fair >> extent, the inductance is determined by the package size (and the board >> vias and traces), while the capacitance obviously goes up with the cap's >> value. So choosing a 0.01 uF instead of a 0.1 uF cap increases the >> capacitance side of the impedance curve by a factor of 10, and leaves >> the inductive side unchanged. It changes the peak frequency, but I fail >> to see why that should make a real difference - it has the same or >> higher impedance across the frequency range. Given that the 0.1 uF type >> has lower ESR (being made of more parallel plates), I can't find any way >> in which the 0.01 uF is better. So as Symon says (unless I'm >> misinterpreting him), the best arrangement is to pick the smallest size >> package you can conveniently mount (0603 for us), then the largest >> capacitance value you can conveniently and economically get in that size >> (100 nF), and use as many as needed for the board. Placement should be >> close to the device where possible, but is not very critical as long as >> it is within the range of the mini power plane (i.e., polygon on a >> signal layer). >> >> It works for me - but then again, I'm not doing really high-end cards. > > The amount of decoupling required is not a function of the clock rate. > It depends on the slew rate of your signals and the length of the > transmission lines. The length determines the lower frequencies you > will need to decouple and the slew rate determines the highest > frequencies. Of course there are other aspects that you need to > decouple, such as switching inside the chips. For that you need to > compare the maximum transition in current the chip will produce to the > maximum noise voltage you can tolerate. Then use the resulting > impedance as the goal for decoupling. > > Ritchey's data was very clear on this. Adding a single value of caps > to a power plane produced a resonance with a higher impedance than that > of the plane alone over a significant frequency range. By using > multiple cap values he was able to decouple a board with just a > handfull of caps rather than the mountain that are normally used. Most > importantly, he could show that his decoupling design worked correctly > before he built the board rather than verifying it in testing. > > I wish I could post the images from Ritchey's book. I have tried to > describe his measurements in detail, but a picture is worth a thousand > words (or maybe more). Moving the SRF is what makes it work. If you > use a hundred 0.1 uF caps you should get a parallel resonant peak in > impedance as the capacitor resonates with the power plane. Assuming > the capacitor has a high Q, then no number of capacitors will > significantly reduce that peak. Of course the caps don't have a high Q > so some number of caps *will* reduce the peak to an acceptable level. > Or you can add a smaller number of caps with a smaller value. These > caps will produce a higher frequency resonance with the power plane. > You can then add a third value of cap to move that resonance higher. > Each time you add a value of cap you flatten the impedance curve. 
> If the caps are not high Q, by the time you have added 0.1, 0.01 and 0.001 caps you will have flattened it enough to not see any real peaks, but rather just ripple in the frequency response. This will take a lot fewer caps than the hundreds that are often used on boards, even at the frequencies you are using. > Or think of it from the other direction. You add the 0.001 uF caps to decouple the plane at the highest frequencies that caps can be effective. But they don't work well at lower frequencies so add a smaller number of 0.01 uF caps to provide decoupling at a lower frequency. Then add just a small number of 0.1 uF caps for the lower freqs and so on down to the tantalum caps for bulk at the lowest frequencies until the PSU response time can adequately maintain the voltage. Thanks for taking the time to explain this - between you and Symon I'm hopefully learning something! However, I've a couple of issues here. First off, I can't see that the power planes have much capacitive effect at these frequencies (the "planes" being polygons, with other signals on the same layer, and thus having plenty of gaps). But I'll happily admit to not having a clear idea how to model such planes or polygons. Secondly, I understand about different caps working better at different frequencies, and obviously have bulk capacitors for the lower frequencies (electrolytics near the regulators, and a few 4.7uF ceramics around the board). But I still can't find any reason to expect a 0.001 uF ceramic 0603 capacitor to be significantly better at higher frequencies than a 0.1 uF ceramic (same dielectric) 0603 capacitor. Using the muRata software, I picked a 0603 X7R 100 nF capacitor. The software gives it an SRF of 21 MHz, L of 0.63 nH, R of 0.027 ohm, and an impedance of 0.14 ohm at 10 MHz, 0.02 ohm at 20 MHz, 0.16 ohm at 50 MHz, 0.38 ohm at 100 MHz, 0.78 ohm at 200 MHz, and 1.97 ohm at 500 MHz. Picking a 10 nF cap with the same setup gives an SRF of 67 MHz, and impedances at these frequencies of 1.66, 0.77, 0.16, 0.24, 0.71 and 1.95 ohms. In other words, it is better at around 100 MHz, but not vastly better. Until we start looking at special 0306 caps for frequencies of several hundred MHz, I just don't see the benefit of smaller capacitance values. Even then, it is more economical to simply use a few extra caps of the same type (assuming the board has space for it). It doesn't even take that many caps - I've got about a dozen for the processor (which has two main supplies and a PLL supply), two or three for each of the SDRAM chips, and one or two for each of the other major chips. One thing that makes a significant difference is that I'm not driving any fast, high current lines - signalling is (almost) all TTL levels. Higher current drives would mean more capacitors, but I'd still expect to use the same types.
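As a cross-check, the figures quoted above are consistent with the usual series-RLC model of a mounted capacitor; the arithmetic below just re-derives two of them from the L and R values given (no new data, same assumptions as the muRata tool):

$$|Z| = \sqrt{R^2 + \left(2\pi f L - \tfrac{1}{2\pi f C}\right)^2}$$

At f = 100 MHz with C = 100 nF, L = 0.63 nH, R = 0.027 ohm:
$$2\pi f L \approx 0.396\ \Omega,\qquad \tfrac{1}{2\pi f C} \approx 0.016\ \Omega \;\Rightarrow\; |Z| \approx \sqrt{0.027^2 + 0.380^2} \approx 0.38\ \Omega$$

With C = 10 nF (same L and R):
$$\tfrac{1}{2\pi f C} \approx 0.159\ \Omega \;\Rightarrow\; |Z| \approx \sqrt{0.027^2 + 0.237^2} \approx 0.24\ \Omega$$

In both cases the 0.63 nH of mounted inductance already dominates at 100 MHz, which is the point being made here: well above the SRF the capacitance value barely matters, only the mounting inductance does.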
Article: 107593
bazarnik@hotmail.com wrote: > Thanks! I stand corrected. > > In the paper there are several arbiter implementations with varying > request, grant protocols. > I was assuming the simples protocol being used: Figure 1 and 2, 3 > (also valid for Fig 6 logic) > > So the answer is in fact yes (req and grant exactly one cycle) > for protocols used in cases shown on Figures 7-9 > [snip] I think the ideal arbiter is pure combinational logic, but some arbiters are so large that we have to split them with flip-flops. When we use flip-flops, the grant arrives a cycle later and the request would have to be sent again. So the solution is to add a queue (FIFO) to the arbiter and pull down the request once the queue has captured it. Any comments are welcome! Thanks! Davy > (answer is no for Fig 1-6) > > Cheers, > Przemek > > > > Doug MacKay wrote: > > bazarnik@hotmail.com wrote: > > > <snip> > > > > > > Obviously request will be active for several clock cycles. > > > This is beacuse some waiting time for acknowledge is necessary. > > > (If not then why would we need arbiter?) > > > > > > <snip> > > > > > > Cheers, > > > Przemek > > > > Not so obvious. Some arbiters contain logic to queue incoming requests > > and (for example) will interpret 4 continuous cycles of request > > assertion as 4 separate requests. > > > > This can be useful if your arbitration logic requires multiple cycles > > while being expected to handle a new request every cycle.
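The one-cycle grant latency being discussed here falls straight out of registering the grant. The sketch below is only an illustration of that effect (a plain fixed-priority arbiter with registered grants, names invented for the example); it is not the arbiter from the paper, and it assumes requesters hold req until granted unless a request queue is added in front of it, as suggested above.

library ieee;
use ieee.std_logic_1164.all;

-- Fixed-priority arbiter with a registered grant: grant(i) appears one clock
-- after req(i) is seen, so the requester must hold req or the request must
-- be queued.
entity prio_arbiter is
  generic (N : natural := 4);
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;
    req   : in  std_logic_vector(N-1 downto 0);
    grant : out std_logic_vector(N-1 downto 0)
  );
end entity prio_arbiter;

architecture rtl of prio_arbiter is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      grant <= (others => '0');
      if rst = '0' then
        for i in 0 to N-1 loop          -- bit 0 has the highest priority
          if req(i) = '1' then
            grant(i) <= '1';
            exit;
          end if;
        end loop;
      end if;
    end if;
  end process;
end architecture rtl;

Removing the clock edge (a purely combinational process on req) gives the zero-latency "req and grant in the same cycle" behaviour, at the cost of a longer combinational path through the priority chain.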
Article: 107594
Hi Karl, I have already got the document and tried the estimation calculation, but the delay shift I worked out with the th, tsu and tco values from my compilation report is quite different from the one (-3.35 ns) demonstrated in the example calculation. (I compiled the example standard design from Altera and used the figures in its compilation report for the calculation, just in case some of my custom components affect the SDRAM timing.) In the compilation report, the tco values (max and min) I found are similar to the ones used in the example calculation, but the th and tsu I got are quite different. What I did is look into the "th" and "tsu" sections of the timing analysis report and take the largest possible th and tsu I could find that are related to the SDRAM. Is this right? What I have been doing now is trial and error, and from that I am pretty sure the delay for my SDRAM should be around -3.5 ns. But somehow it's still not stable; I tried some printf statements to print data from the SDRAM, and the program always gets stuck after printing the second of all the characters I wanted to print out. Thank you very much, Tony
Article: 107595
I found the problem; it was actually in the signed logic. I had something like:

wire my_wire;
reg signed [10:0] power = 11'h688;
parameter [10:0] m33dBm = 11'h6F8;  // unsigned declaration :(
...
assign my_wire = (power > m33dBm);

I did not realize that I had an expression with one signed and one unsigned operand. This is clearly bad practice, but I think the Verilog-2001 standard forces a deterministic result for it, with conversion rules. But XST 8.2.02i and ModelSim interpret the expression differently; I think ModelSim is correct (my_wire evaluated to 0 as expected) and XST is wrong (my_wire evaluated to 1 in post-P&R - why?). The two simulations agree with this declaration:

parameter signed [10:0] m33dBm = 11'h6F8;

PS: Xilinx should give a bonus to users for finding their bugs... Plus I could not find good guidelines on how XST interprets signed logic. Andreas Ehliar ha scritto: > On 2006-08-30, tullio <tullio.grassi@gmail.com> wrote: > > I know this topic has been discussed before, but i am looking for some > > ideas. > > My design is synchronous, I use ISE8.2.02 XST and ModelSim, I have a > > lot of signed logic; I get different results from behavioral and > > post-Route simulations. > > I had some problems with signed logic in Verilog using an older version > of XST although I do not remember the version any longer. I introduced > a workaround to avoid the bug. I haven't checked whether the bug is > still present in the latest version of XST. (Since I was not allowed > to open a webcase at the time I discovered this bug I reported the bug > on the Xilinx University Program forum but I didn't hear anything back > about it.) > > The source code and screenshots of the waveforms of behavioral and > post-map simulation are available at > http://www.da.isy.liu.se/~ehliar/xstproblem/ . > > I'm afraid dct_workaround.v and dct.v differ quite a lot, but you can > search for "synthesis bug" in dct_workaround.v to see the workaround > I did. > > /Andreas
Article: 107596
Antti schrieb: > Hi > > if anyone has seen the same or has any ideas how to avoid the issue I > am faced please help - I am trying to get it solved myself (my deadline > is today 21:00 german time) but kinda stuck currently. > > problem: > > Virtex-4FX12 > 2 DCMs in series > DCM1 uses only FX output for PPC system clock (to get the clock into > DLL input range) > DCM2 generates 3X clock proc clock for PPC > > it all works for 360 milliseconds after configuration. then the first > DCM will remove lock, output stop, everythings stops. the delay 360 > milliseconds is not dependand on the first DCM clock ratio settings. if > PPC is held in reset then the DCMs still shut down after the same 360 > milliseconds. > > any ideas? what to check? I have Lecroy 2GS/s DSO on some signals and > power supply lines but not seeing anything at the time where the DCM > shut off. > > thanks in advance for any suggestions, > > Antti thanks for all the fine suggestions, the issue has been fixed. AnttiArticle: 107597
Martin Thompson wrote: > I think that's also low-level though, LUTs and wiring? I was thinking > of something higher-level, where if you have an adder, you put a<=b+1 > in the VHDL and let the synth sort it out. That might make sense for a language translator, but that isn't the synthesis goal for FpgaC. > Or are there benefits to optimising across the great sea of LUTs that > a normal synth doesn't get to? For TMCC (the starting point for FpgaC) the goal was just to get a portable form, optimized for C. It wasn't particularly optimal, just something that worked. The goal was a C language semantic HDL, as an educational exercise. The FpgaC project is targeting the use of FPGAs to provide a VERY fast execution environment for ANSI C. As it matures it will do some things in a very different way. One, for starters, will be to keep all expressions and variable assignments in "carry save" format right up to the point they are committed to being saved in their final form as DFFs. This may use a little more logic to gain performance, and avoids unnecessary ripple carry resolution of intermediate terms. With LUT term sharing and packing, it may actually use the same or less, by folding several operations into a single LUT.
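For readers who haven't met the term, "carry save" here means keeping each intermediate result as a sum/carry pair instead of resolving the carries after every operation. A minimal 3:2 compressor stage is sketched below; the entity and signal names are invented for the illustration and this is not FpgaC output:

library ieee;
use ieee.std_logic_1164.all;

-- One carry-save (3:2 compressor) stage: reduces a + b + c to two vectors
-- with no carry propagation; the carry vector is weighted by 2, i.e. it must
-- be shifted left by one place when the result is finally resolved.
entity csa_3to2 is
  generic (W : natural := 16);
  port (
    a, b, c : in  std_logic_vector(W-1 downto 0);
    s       : out std_logic_vector(W-1 downto 0);
    cy      : out std_logic_vector(W-1 downto 0)
  );
end entity csa_3to2;

architecture rtl of csa_3to2 is
begin
  s  <= a xor b xor c;
  cy <= (a and b) or (a and c) or (b and c);
end architecture rtl;

Only when the final value is needed (for example, just before it is stored in DFFs) does a single carry-propagate addition of s and the shifted cy have to be performed, which is the "ripple carry resolution" the post says FpgaC intends to defer.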
Article: 107598
Antti, The PMV is not used for configuration. In fact, it was not intended to be used by anything at all. It is there for test only. As such, it is uncharacterized. Anything that is uncharacterized can NOT be made available for customer use. The DCM NBTI macro did make use of it, as that macro needed an oscillator for when no clock was present, and the fewer resources a macro uses, the better the macro (least impact on any design). If you want a ring oscillator (which is all that the PMV is), you can easily make one out of a chain of LUTs. Austin Antti wrote: > Hi All, > > as I had guessed for long time the PMV primitive is actually the > on-chip oscillator, most likely it is the same oscillator that is used > for configuration. And it can be used from user designs as well. PMV is > present in all recent FPGAs. > > http://xilant.com/index.php?option=com_content&task=view&id=29&Itemid=32 > > When I opened webcase about the issue that Xilinx tools made fatal > failure when I tried to use the PMV from an hard macro the response was > that, "you dont need to know" - well now I know :) > > Antti
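Austin's "chain of LUTs" suggestion can be sketched as below. This is a rough illustration only: it assumes the Xilinx unisim LUT1 primitive and XST-style "keep" attributes, the stage count is arbitrary (it must be odd), the resulting frequency is process- and temperature-dependent and uncharacterized (much like the PMV itself), and plain RTL simulation of a combinational loop like this will not oscillate without extra initialization.

library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

-- Ring oscillator built from an odd number of LUT1 inverters (sketch).
entity lut_ring_osc is
  generic (STAGES : integer := 15);       -- must be odd
  port (osc_out : out std_logic);
end entity lut_ring_osc;

architecture rtl of lut_ring_osc is
  signal ring : std_logic_vector(STAGES downto 0);
  -- stop the synthesizer from collapsing the inverter chain (XST syntax assumed)
  attribute keep : string;
  attribute keep of ring : signal is "true";
begin
  ring(0) <= ring(STAGES);                -- close the loop

  gen_stages : for i in 0 to STAGES - 1 generate
    inv_i : LUT1
      generic map (INIT => "01")          -- O = not I0
      port map (I0 => ring(i), O => ring(i + 1));
  end generate gen_stages;

  osc_out <= ring(STAGES);
end architecture rtl;

Expect the tools to warn about the combinational loop, and expect the frequency to drift with voltage and temperature; if anything repeatable is needed, divide it down and/or calibrate it against a known clock.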
Article: 107599
> thanks for all the fine suggestions, the issue has been fixed. Out of curiosity, what was the problem? Sylvain