PLEASE HELP: In my design I have a bank of memory made up of 64 reg[31:0]'s. They get synthesized into latches, and kill my timing because of a huge 64 X 32 sensitivity list.(See below). How else can I synthesize a 64X32 block of Memory with Asynch Read, Asynch Write, and where the Data_In shows up immediately on the Data_Out? Can I do it with Xilinx CoreGen Modules? //This part handles the read for the 64 32-bit registers which store the data. //The rd_address is 1-hot, 64-bit reg. always @ (rd_address or q0 or q1 or q2 or q3 or q4 or q5 or q6 or q7 or q8 or q9 or qa or qb or qc or qd or qe or qf or q10 or q11 or q12 or q13 or q14 or q15 or q16 or q17 or q18 or q19 or q1a or q1b or q1c or q1d or q1e or q1f or q20 or q21 or q22 or q23 or q24 or q25 or q26 or q27 or q28 or q29 or q2a or q2b or q2c or q2d or q2e or q2f or q30 or q31 or q32 or q33 or q34 or q35 or q36 or q37 or q38 or q39 or q3a or q3b or q3c or q3d or q3e or q3f) begin case (1'b1) // synopsys full_case parallel_case rd_address[0]: DO = q0; rd_address[1]: DO = q1; rd_address[2]: DO = q2; rd_address[3]: DO = q3; rd_address[4]: DO = q4; rd_address[5]: DO = q5; rd_address[6]: DO = q6; rd_address[7]: DO = q7; rd_address[8]: DO = q8; rd_address[9]: DO = q9; rd_address[10]: DO = qa; rd_address[11]: DO = qb; rd_address[12]: DO = qc; rd_address[13]: DO = qd; rd_address[14]: DO = qe; rd_address[15]: DO = qf; rd_address[16]: DO = q10; rd_address[17]: DO = q11; rd_address[18]: DO = q12; rd_address[19]: DO = q13; rd_address[20]: DO = q14; rd_address[21]: DO = q15; rd_address[22]: DO = q16; rd_address[23]: DO = q17; rd_address[24]: DO = q18; rd_address[25]: DO = q19; rd_address[26]: DO = q1a; rd_address[27]: DO = q1b; rd_address[28]: DO = q1c; rd_address[29]: DO = q1d; rd_address[30]: DO = q1e; rd_address[31]: DO = q1f; rd_address[32]: DO = q20; rd_address[33]: DO = q21; rd_address[34]: DO = q22; rd_address[35]: DO = q23; rd_address[36]: DO = q24; rd_address[37]: DO = q25; rd_address[38]: DO = q26; rd_address[39]: 
DO = q27; rd_address[40]: DO = q28; rd_address[41]: DO = q29; rd_address[42]: DO = q2a; rd_address[43]: DO = q2b; rd_address[44]: DO = q2c; rd_address[45]: DO = q2d; rd_address[46]: DO = q2e; rd_address[47]: DO = q2f; rd_address[48]: DO = q30; rd_address[49]: DO = q31; rd_address[50]: DO = q32; rd_address[51]: DO = q33; rd_address[52]: DO = q34; rd_address[53]: DO = q35; rd_address[54]: DO = q36; rd_address[55]: DO = q37; rd_address[56]: DO = q38; rd_address[57]: DO = q39; rd_address[58]: DO = q3a; rd_address[59]: DO = q3b; rd_address[60]: DO = q3c; rd_address[61]: DO = q3d; rd_address[62]: DO = q3e; rd_address[63]: DO = q3f;Article: 120501
<Mavrick> wrote in message news:eea754b.4@webx.sUN8CHnE... > Bob, > > As per my understanding using the bufio the fpga internal clock tree skew > is 50 ps. I am still not able to understand the 10 ps skew per fpga output > pin and you multiply that by the number of outputs used. Is this 10ps skew > due to mismatch in trace routing inside the fpga? If so why we need to > multiiply by the number of pins used? > > Mavrick The thing that drives a global clock line is a BUFG (aka, BUFGMUX). Each clock line is driven from one end (as far as I know) and the IOBs tap off of this (long) clock line. So, as a given clock edge moves down the clock line it passes by the IOBs. Each IOB gets this clock edge a little later than the previous one, so an output generated (clocked out) by that IOB is delayed (a little) from the previous one. It's that simple. BobArticle: 120502
On Jun 8, 7:50 am, Frank <FrankT...@gmail.com> wrote: > PLEASE HELP: > > In my design I have a bank of memory made up of 64 reg[31:0]'s. They > get synthesized into latches, and kill my timing because of a huge 64 > X 32 sensitivity list.(See below). > > How else can I synthesize a 64X32 block of Memory with Asynch Read, > Asynch Write, and where the Data_In shows up immediately on the > Data_Out? > > Can I do it with Xilinx CoreGen Modules? > > //This part handles the read for the 64 32-bit registers which store > the data. > //The rd_address is 1-hot, 64-bit reg. > > always @ (rd_address or q0 or q1 or q2 or q3 or q4 or q5 or q6 or q7 > or q8 or q9 or qa or qb or qc or qd or qe or qf > or q10 or q11 or q12 or q13 or q14 or q15 or q16 > or q17 > or q18 or q19 or q1a or q1b or q1c or q1d or q1e > or q1f > or q20 or q21 or q22 or q23 or q24 or q25 or q26 > or q27 > or q28 or q29 or q2a or q2b or q2c or q2d or q2e > or q2f > or q30 or q31 or q32 or q33 or q34 or q35 or q36 > or q37 > or q38 or q39 or q3a or q3b or q3c or q3d or q3e > or q3f) > begin > case (1'b1) // synopsys full_case parallel_case > rd_address[0]: DO = q0; > rd_address[1]: DO = q1; > rd_address[2]: DO = q2; > rd_address[3]: DO = q3; > rd_address[4]: DO = q4; > rd_address[5]: DO = q5; > rd_address[6]: DO = q6; > rd_address[7]: DO = q7; > rd_address[8]: DO = q8; > rd_address[9]: DO = q9; > rd_address[10]: DO = qa; > rd_address[11]: DO = qb; > rd_address[12]: DO = qc; > rd_address[13]: DO = qd; > rd_address[14]: DO = qe; > rd_address[15]: DO = qf; > rd_address[16]: DO = q10; > rd_address[17]: DO = q11; > rd_address[18]: DO = q12; > rd_address[19]: DO = q13; > rd_address[20]: DO = q14; > rd_address[21]: DO = q15; > rd_address[22]: DO = q16; > rd_address[23]: DO = q17; > rd_address[24]: DO = q18; > rd_address[25]: DO = q19; > rd_address[26]: DO = q1a; > rd_address[27]: DO = q1b; > rd_address[28]: DO = q1c; > rd_address[29]: DO = q1d; > rd_address[30]: DO = q1e; > rd_address[31]: DO = q1f; > 
rd_address[32]: DO = q20; > rd_address[33]: DO = q21; > rd_address[34]: DO = q22; > rd_address[35]: DO = q23; > rd_address[36]: DO = q24; > rd_address[37]: DO = q25; > rd_address[38]: DO = q26; > rd_address[39]: DO = q27; > rd_address[40]: DO = q28; > rd_address[41]: DO = q29; > rd_address[42]: DO = q2a; > rd_address[43]: DO = q2b; > rd_address[44]: DO = q2c; > rd_address[45]: DO = q2d; > rd_address[46]: DO = q2e; > rd_address[47]: DO = q2f; > rd_address[48]: DO = q30; > rd_address[49]: DO = q31; > rd_address[50]: DO = q32; > rd_address[51]: DO = q33; > rd_address[52]: DO = q34; > rd_address[53]: DO = q35; > rd_address[54]: DO = q36; > rd_address[55]: DO = q37; > rd_address[56]: DO = q38; > rd_address[57]: DO = q39; > rd_address[58]: DO = q3a; > rd_address[59]: DO = q3b; > rd_address[60]: DO = q3c; > rd_address[61]: DO = q3d; > rd_address[62]: DO = q3e; > rd_address[63]: DO = q3f; Hi Two of the most over used and abused directives included in Verilog models are the directives "//synopsys full_case parallel_case". The popular myth that exists surrounding "full_case parallel_case" is that these Verilog directives always make designs smaller, faster and latch-free. This is false! Indeed, the "full_case parallel_case" switches frequently make designs larger and slower and can obscure the fact that latches have been inferred . These switches can also change the functionality of a design causing a mismatch between pre-synthesis and post-synthesis simulation, which if not discovered during gate-level simulations will cause an ASIC to be taped out with design problems. it is generally a bad coding practice to give the synthesis tool different information about the functionality of a design than is given to the simulator. 
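To make the simulation/synthesis mismatch concrete, here is a small Python model of the one-hot read mux from the question, cut down to four registers. It is only a sketch, not the output of any tool. `simulate_case` follows Verilog simulation semantics for `case (1'b1)` with no default: the first matching item wins, and if nothing matches DO keeps its old value (the inferred latch). `synthesized_and_or` is one plausible gate-level result under "full_case parallel_case": a plain AND-OR mux with no priority and no hold. Both function names and the AND-OR structure are assumptions made for illustration; a real synthesis tool is free to build something else for the cases it has been told to ignore.

```python
def simulate_case(rd_address, regs, previous_do):
    """Verilog simulation semantics of `case (1'b1)` without a default."""
    for i, q in enumerate(regs):
        if (rd_address >> i) & 1:
            return q  # first matching item wins (priority)
    return previous_do  # no item matched: DO holds its value (latch)


def synthesized_and_or(rd_address, regs):
    """One possible gate-level result under full_case parallel_case."""
    do = 0
    for i, q in enumerate(regs):
        if (rd_address >> i) & 1:
            do |= q  # every selected register is ORed onto the output
    return do


regs = [0x11, 0x22, 0x44, 0x88]
cases = {"one-hot": 0b0100, "two bits set": 0b0110, "no bits set": 0b0000}
for label, addr in cases.items():
    sim = simulate_case(addr, regs, previous_do=0xAA)
    syn = synthesized_and_or(addr, regs)
    print(f"{label:13s} sim={sim:#04x} synth={syn:#04x} match={sim == syn}")
```

Only the legal one-hot address agrees in both models. The other two addresses are exactly the cases the directives tell the synthesis tool it may ignore, while the simulator still evaluates them.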
Whenever either "full_case" or "parallel_case" directives are added to the Verilog source code, more information is potentially being given about the design to the synthesis tool than is being given to the simulator.

In general, do not use "full_case parallel_case" directives with any Verilog case statements.

ARH
Article: 120503
cs_posting@hotmail.com wrote: > You might want to think about using a flash memory card instead of a > physical disk... a bit more reliable and even easier to source today. Agreed. > If you are only handling a sector, you may be able to use a state- > machine like architecture rather than a processor core. All you really > have to do is grab the data from one sector quickly, and then let the > 8080 read it out more slowly. I've actually implemented a read-only (state machine) emulation of the WDC-1793 in an FPGA which sources data from a serial flash device. It interfaces to a TRS-80 implemented within the same FPGA. So state-machine is definitely an option. > I would assume controllers of that vintage would be state machines and > not microprogrammed machines... but then I could be wrong. IIUC the WDC 1793 did actually have a (purpose-built?) microprocessor. Regards, -- Mark McDougall, Engineer Virtual Logic Pty Ltd, <http://www.vl.com.au> 21-25 King St, Rockdale, 2216 Ph: +612-9599-3255 Fax: +612-9599-3266Article: 120504
Eric asked 'how' you would do it with logic gates. Once you know that, the question becomes how to describe that behavior in VHDL, Verilog, C, C++...

If you can write behavioral descriptions for your FFs and logic gates in VHDL (etc.), you have a solution to your problem.

If you can abstract the sequence to a state machine, and code that in VHDL (etc.), you have another solution to your problem.

If you understand what it is you really want to do, and appropriately describe and code it, you are likely to have a better solution to your problem. It is better because it will respond to other sequences the way you intend, rather than just matching the small set of stimulus-response vectors described. There will be many solutions that pass the test vectors but do not 'do the right thing' in general.

JTW

<willwestward@gmail.com> wrote in message news:1181261769.985939.84080@x35g2000prf.googlegroups.com...
> On Jun 7, 3:54 pm, Eric Smith <e...@brouhaha.com> wrote:
>> willwestw...@gmail.com writes:
>> > I hope I explained ok here. I'm having trouble putting this behavior
>> > into codes in VHDL. Someone has any idea?
>>
>> How would you design it with logic gates (and flip-flops, if required)?
>>
>> Once you know that, you can just do the same thing in VHDL.
>
> Yes you can do it in logic gates, but I want behavior modeling. If
> using FF and logic gates, it would make this totally a netlist
> coding.
>
> Here is the Request, Reset, and Acknowledge again:
>
> Request(0 to 3): 0000 1010 0110 1101 0010 1001 0001 1000 1011 1011
> 1000 0101 1001
> Reset:
> HHHHLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLHHHHLLLLLLLLLLLLLLLLLLLLL
> Ack(0 to 3): 0000 1000 0010 0100 0010 1000 0001 1000 0000 1000
> 1000 0100 0001
>
> I also plan to convert this code into Verilog, C or C++ later on, so
> if you have algorithm or codes in those languages, it's fine too.
Article: 120505
Thanks for your answer, we're working on it!

"Peter Ryser" <peter.ryser@xilinx.com> wrote in message news: f47n98$pa91@cnn.xilinx.com...
> For standalone applications the heap is initialized in sbrk.c which you
> can find in ppc405_0/libsrc/standalone*/src/sbrk.c
>
> You can see that heap_ptr is initialized on the first call to sbrk() and
> increases its value after that on every call. It maintains its value
> through a reset.
>
> To change the behavior you can copy sbrk.c into your own project and
> change it.
>
>
> Alternatively, there might also be a compiler option that allows you to
> initialize static values at run time instead of compile time. You might be
> able to use
> static char *heap_ptr=0;
> in combination with the compiler option to get the desired behavior.
>
> - Peter
>
>
> sjulhes wrote:
>> Hi all,
>>
>> I have a SOC with a Power PC running on a Virtex II pro.
>> On FPGA configuration, the PPC firmware runs correctly, heap reservation
>> on firmware initialisation is correct.
>> When I reset the SOC with the reset controller, the PPC restarts
>> correctly, but when it tries to use the heap to initialize some dynamic
>> variables, I get a fail on the malloc.
>>
>> It seems that on the reset the power PC context is not reinitialized.
>> I guess we have to add something in the boot sequence of the PPC to reset
>> the heap pointer.
>>
>> Am I right?
>> It is something that must be standard in the power PC world, does someone
>> have an example how to do this ??
>>
>> Thanks.
>>
>> Stéphane.
Article: 120506
Thanks for looking, and please help me.

I have a 16-bit ADC sampling at 50 MHz. I take only the 10 MSBs and display them on my 1024 x 768 video output to check the signal. I see a continuous line on my monitor, but it still toggles within 1 or 2 pixels.

My question is: do I need an adaptive filter to smooth the line down to a width of only 1 pixel? To make the question clear: wherever there is supposed to be a horizontal line, it has random noise about 2 pixels wide in y. In this situation, how can I smooth the data efficiently?

I used a moving average to smooth the data, but every pixel is important, and the moving average gives me a small wave along a straight line and also rounds off the edges. The input to the ADC is DC coupled, so it accepts a DC voltage level continuously.

I have real doubts about choosing between an FIR filter and an adaptive filter. Can somebody tell me whether this should be done with an adaptive filter? I tried the DA FIR from Xilinx on a Spartan-3, but it distorts the signal; does that mean my cutoff frequency is too low? Also, I use 10 bits as the input of the adaptive filter; should I use all 16 bits as input?
Article: 120507
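For the ADC display question in Article 120506, the edge rounding described for the moving average can be reproduced with a few lines of Python. This is a sketch on made-up sample values (a flat line with 1-pixel noise followed by a step), not data from the poster's hardware. For comparison it also runs a sliding median over the same samples; the median filter is not something proposed anywhere in this thread, it is shown only as an assumption-labelled alternative, and the names `moving_average` and `median_filter` are hypothetical.

```python
from statistics import median_low


def moving_average(samples, width):
    """Integer boxcar average over the most recent `width` samples."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - width + 1): i + 1]
        out.append(sum(window) // len(window))
    return out


def median_filter(samples, width):
    """Sliding median over the most recent `width` samples (assumed alternative)."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - width + 1): i + 1]
        out.append(median_low(window))
    return out


# Flat line near 100 with +/-1 noise, then a step to about 200.
samples = [100, 101, 100, 99, 100, 101, 100, 200, 201, 200, 199, 200, 200]

print("average:", moving_average(samples, 5))
print("median: ", median_filter(samples, 5))
```

With a 5-sample window the average turns the step into a ramp several pixels long, which matches the curved edges described in the question, while the median output jumps from the low level to the high level within one sample (delayed by two samples). Whether that trade-off suits a real signal depends on the noise, which this toy data does not capture.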
On Jun 7, 10:24 am, "comp.arch.fpga" <ksuli...@googlemail.com> wrote:
> All of these architectures require N*N/K 1-bit adders when multiplying
> N bit numbers in K clock cycles.

Hi Kolja,

Without a doubt the product of two numbers of length M and N yields at most M*N full adders, or LUTs, by most decompositions we teach students ... or N*N = N^2 in your form. For the product of two 64-bit numbers, that yields roughly 4096 LUTs using the basic student forms which are taught in most EE courses. Actually, it's (N-1) rows of adders of length N, or 4032 full adders or LUTs. However, that is about what we expect from students and novice engineers who fail to study best-practice algorithms and think about the problem in a bit more detail.

Karatsuba's form requires two products of (N/2) bits plus a product of ((N/2)+1) bits, plus three more adders of length (N/2)+1, plus some careful merging of the data streams. Roughly 992+992+1056+99=3139 full adders, or LUTs, which is a significant savings over N^2. A good student would produce this result without question.

An excellent student would recursively implement the two 32-bit multiplies using Karatsuba's form as well, saving roughly another 378 full adders or LUTs, for a rough total of 2761 LUTs. And probably one more time to pick up another hundred or so full adders, or LUTs, but also to better constrain the input routing and locality.

One of the side effects of using Karatsuba's form is that it reduces the fan-in required for the input digits, and greatly relieves input routing compared with a basic student form solution where every input term is in a full N-bit row or column.

It's with this starting point, significantly better than N^2 (about 60% the size of the basic student adder tree form), that we can do even better with careful optimization and layout, using a host of salty dog tricks to implement a slightly better variation of this model.
Article: 120508
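The adder counts in the Karatsuba post (Article 120507) can be checked with a short Python sketch. It assumes the cost model stated in that post: a schoolbook n x n array costs (n-1)*n full adders, and one Karatsuba level costs two half-width products, one (n/2+1)-bit schoolbook product, and three adders of length n/2+1. The function names are hypothetical, and the model ignores the "careful merging of the data streams" mentioned in the post. Evaluating (n-1)*n for n = 64 gives 4032.

```python
def schoolbook_adders(n):
    """(n - 1) rows of n full adders for an n x n array multiplier."""
    return (n - 1) * n


def karatsuba_adders(n, levels):
    """Full-adder estimate with `levels` levels of Karatsuba splitting.

    Only the two half-width products are split again, as in the post;
    the (n/2 + 1)-bit product stays a schoolbook array at every level.
    """
    if levels == 0:
        return schoolbook_adders(n)
    half = n // 2
    return (2 * karatsuba_adders(half, levels - 1)
            + schoolbook_adders(half + 1)
            + 3 * (half + 1))


for levels in range(4):
    print(f"{levels} level(s): {karatsuba_adders(64, levels)} full adders")
```

Under these assumptions the model reproduces the 3139 and 2761 totals quoted in the post, and a third level saves a further 116 adders, in line with "another hundred or so".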
I have a warning from the Quartus 6.0 software in the Fitter section. It says: "Found 71 output pins without output pin load capacitance assignment".

In the "Device and Pin Options" I can see that for LVTTL the "Capacitive loading" is set to 0! I don't know what capacitive loading I should enter in my case. The device is a Stratix II EP2S15F484C5 mounted on our board...

Thanks...
Article: 120509
Thanks. What I need is to have the phase relationship updated automatically whenever I change any settings in the PLL. Anyway, my question was probably stupid. It can be done just with the features of Tcl:

derive_pll_clocks
set out_clk80 "out_clk_pll_1|altpll_component|pll|clk[0]"
set out_clk80_latch "out_clk_pll_1|altpll_component|pll|clk[1]"
set out_clk80_data "out_clk_pll_1|altpll_component|pll|clk[2]"

then:

set_output_delay -clock $out_clk80_latch -add_delay 1.0 [get_ports {IFIFO_OUT_DPRSNT}]
set_output_delay -clock $out_clk80_latch -add_delay 1.0 [get_ports {IFIFO_OUT_CT[1]}]

--
Thanks & Regards,
Wojtek
Article: 120510
"Totally_Lost" <air_bits@yahoo.com> wrote in message news:1180769858.872726.132740@i13g2000prf.googlegroups.com... > I've been looking at the various core/macro generators and they all > seem horribly large and slow, almost like student designs. Has anyone > seriously taken a good look at hand fitting multipliers and squarers > into Altera/Xilinx FPGA's? This has been an interesting thread. But I still wonder why you want to implement a LUT-based 64x64 multiplier? The reason the multiplier generator core in Xilinx Coregen doesn't use a whole lot of tricks in this configuration is that it seems like a very unlikely use case, and not one worth concentrating a lot of effort on. Ever since hard multiplier/DSP blocks began appearing in FPGAs, they have been the best building block for these large multiplier structures. Maybe you can share some details of why exactly you need such a macro? Cheers, -Ben-Article: 120511
Hi All,

It is obviously my fault, and the correct answer is probably "RTFM" (1), but I've just lost a significant amount of time because TimeQuest treats clocks as related by default. The design, which worked perfectly when compiled with the standard Timing Analyzer, went crazy after switching to TQ. I discovered the problem when I got strange warnings like:

"Warning: 26 (of 12252) connections in the design require a large routing delay to achieve hold requirements. Please check the circuit's timing constraints and clocking methodology, especially multicycles and gated clocks."

TQ considers all clocks as related unless specified otherwise. Is it reasonable to consider two clocks with different frequencies (e.g. 10.3213 MHz and 13.43345 MHz, just random values) as related? In this case there must ALWAYS be ts and th violations. Shouldn't TQ complain loudly about unspecified relationships between the clocks instead of assuming by default that all clocks are related?

--
TIA & Regards,
Wojtek.

(1) qts_qii53019.pdf
Page 7-1: All clocks are related by default. (Refer to "Related and Unrelated Clocks" on page 7-13.)
Page 7-13: Related and Unrelated Clocks
In the Quartus II TimeQuest Timing Analyzer, all clocks are related by default, and you must add assignments to indicate unrelated clocks. However, in the Quartus II Classic Timing Analyzer, all base clocks are unrelated by default. All derived clocks of a base clock are related to each other, but are unrelated to other base clocks and their derived clocks.

Then on 7-14 it is explained how to set clocks to unrelated with
set_clock_groups -exclusive -group {clock_a} -group {clock_b}
Article: 120512
In news:lgfg635gsikhnj2ps84lv24fs0qpj84mt4@4ax.com timestamped Thu, 07 Jun 2007 18:55:05 +0100, Evan Lavelle <nospam@nospam.com> posted: "On 7 Jun 2007 11:02:48 GMT, Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> wrote: [..] >standard. It is very clearly written in Section 4.2.1 The scheduling >algorithm: >"[..] > >4.2.1.2 Evaluation phase > >[..] > >Since process instances execute without interruption, only a single >process instance can be running at any >one time, and no other process instance can execute until the >currently executing process instance has >yielded control to the kernel. [..] I'm not quite sure what you're saying here, but SystemC would be useless if it didn't support a user-level notion of concurrency. [..]" If one thread/process is running and all other threads/processes are not running, then they are not running concurrently. They are not running, actually. "Or are you saying that you can't implement SystemC on concurrent hardware/multi-threaded processors/whatever?" I am not saying that. However, I am not aware of a SystemC(R) implementation (aside from synthesizers of course) which actually exploits concurrent hardware (e.g. a multiprocessor workstation). If you check one of the forums on SystemC.org you can notice people who were not pleased that their OSCI simulators would use just one operating system process. "[..] But, if you look at 4.2.1 in more detail, you'll see that you can implement your scheduler in any way you want, as long as you preserve the simulation semantics. See note 3 in 4.2.1.2: "3 An implementation running on a machine that provides hardware support for concurrent processes may permit two or more processes to run concurrently provided that the behavior appears identical to the co-routine semantics defined in this clause"." I am aware of that. It is also not necessary to have that written in the standard: e.g. 
if a particular function is always called with a parameter of exactly the same value then it might be optimized to not bother passing the parameter in, so long as the optimized version does what is required of it. The designer would not even notice.

Are you aware of a SystemC(R) simulation engine which actually does run threads/processes concurrently? I am aware of one which does not. I do not contend that concurrency cannot be achieved by a SystemC(R) simulation engine, but before such a simulator is made, it will be imaginary. Do people really want to bother? The SystemC(R) standard was being drafted for years. How many more years until concurrent simulations?

"Besides, there's nothing wrong with non-preemptive coroutine semantics."

True. Sequential programming can be useful.

"Why would you want to pre-empt an executing thread? [..]"

Perhaps I would, perhaps I would not. It depends on what I am trying to do. Preemptive multitasking operating systems do exist. I spoke about concurrency, whereas both preempting something and non-preemptive coroutine semantics involve sequential programming, which is not concurrent.

"[..]

BTW, what is SystemC(R)? 'SystemC' is an OSCI trademark, in the same way that 'Verilog' is a Cadence trademark."

Terms and/or conditions similar to what were imposed on me can be found on WWW.SystemC.org/account/register.php , such as:
"[..]

EXHIBIT D
Trademark Usage Policy
[..]
II. PROPER USE OF MARKS
Trademarks and service marks function as adjectives and generally should not be used as nouns [..]
[..]
III. PROPER ATTRIBUTION
Trademark ownership is attributed in two ways, with the use of a symbol [..]
[..]"

ASCII does not have a registered trademark character, so "(R)" can suffice instead.

Regards,
Colin Paul Gloster
Article: 120513
On Jun 7, 21:39, Ken Ryan <newsr...@leesburg-geeks.org> wrote:
> Well, I believe xilkernel is theoretically needed even with raw api.
> In any case, I've never tried raw mode myself, but I thought I saw posts
> in this NG that said raw mode doesn't actually work yet.

lwIP in raw API mode (for the hard TEMAC only; no problem for other MACs) won't be supported before EDK v9.2, due at the end of the summer. With raw mode you don't need xilkernel.

If you want early TEMAC support for raw mode, you might want to give the driver Paul Tobias wrote a try:
http://www.paultobias.com/Xilinx/

Search for his posts in this NG for more info.

Patrick
Article: 120514
I like your analogy but we are using bufio to clock the oserdes. The Xilinx spec says that maximum skew with bufio is 50ps. Thus if I drive 40 loads with bufio clock output then the maximum skew should not exceed 50 ps. Is that correct? MavrickArticle: 120515
On Jun 7, 21:29, Ken Ryan <newsr...@leesburg-geeks.org> wrote:
> Patrick Dubois wrote:
> > I
> > sold the idea around here to use the Avnet FX12 mini-module as a nice
> > solution to ethernet connectivity. I thought that there would be lots
> > of working examples for it (as ethernet seems to be its biggest
> > selling point). Wrong.
>
> Hoboy, you stepped on a rant there. :-)
>
> I initially procured the Avnet v2pro FF672 board with the comm3 module.
> I figured, how could I go wrong, there's an example design that does 90%
> of what I want to do - the only bit missing was a specialized gigabit
> serial link that I could adapt from a previous design.
>
> I checked out the board, the example seemed to work OK, so I proceeded
> with my design. Then sometime later I connected the board to my
> company's intranet - instead of a private wire link. Wham, the TCP
> stack fell over, crashed the processor hard. This was my introduction
> to xilnet. Then I realized that EDK didn't support the 91C111 MAC that
> was on that board - I'd have to port an ethernet stack myself. The
> Avnet demo was a hacked-up port of xilnet with a very rudimentary
> driver, and completely useless for anything halfway realistic.
>
> I actually started down the path of porting Linux to the board, when I
> discovered another problem. The comm3 module uses all 3.3V signaling to
> the FPGA. The FF672 board has a bank voltage select to support it.
> Unfortunately *some* of the I/Os come from a fixed 2.5v bank. So
> Avnet's stack was driving about a half-dozen 2.5v inputs with 3.3v
> levels. I considered switching to the comm2 module, but when I looked
> at the schematics I saw the same problem. Eventually I got hold of
> someone in the know at Avnet; turns out that was a known design bug,
> known for a long time (through two design generations!).
>
> My mood wasn't helped by the complete and utter failure to get their
> SystemAce Module to work (I eventually gave up and tossed it in a drawer
> to rot).
>
> Normally when one uses one of these development boards one is just
> fiddling with it or experimenting. Sometimes we want to embed it into a
> product, if it's appropriate. This was to go into some equipment with a
> fairly long expected lifetime, so I couldn't afford to have hardware I
> knew would eat itself. $4500 and six weeks down the drain. I turned
> the Avnet sales rep's ears around for that one. Avnet is great with the
> vast majority of what they do, but it'll be a long time before I
> consider another of their development boards for anything more
> sophisticated than fixing a short table leg.
>
> Now I'm going through the process of finding all the other little
> surprises (I went to an ML405 board; much better piece of hardware
> though it took me a while to port my rocketio design).
>
> OK, I'll quit frothing now...
>
> ken

Hehe, thanks for sharing your experience :-)
Article: 120516
I have a Spartan 3 interfaced to a TigerSHARC via its "link port", which is a 4-bit wide DDR communications interface. The receive side of this interface on the FPGA consists of two 16x4 bit FIFOs implemented as LUTRAM, and the write address of each one is driven by a 4-bit register, explicitly implemented as FD or FDC.

I added "LOC" constraints to my VHDL source for the four LUTRAMs and four write-address flip-flops, in order to control skew among the data lines and to optimize potential performance. I'm currently running this interface at 125 MHz, but hope to be able to support 250 MHz eventually. In particular, the flip-flops are packed in pairs, two to a given location.

Anyway, my problem is this: All of this has been working fine for the last two years while the rest of the FPGA was being developed. However, on the last iteration of adding some more logic to the FPGA (unrelated to the DSP link port), I suddenly started getting an error from the "Directed Packing" step of the Map process saying that it couldn't pack one of my flip-flops with the other, because the set/resets were not identical.

It turns out that XST had assigned the reset of one of the flip-flops to a different copy of the global reset signal, which presumably had been created because the fanout of the reset signal had reached some threshold. Apparently, the assignment of loads to specific copies of replicated nets occurs at an earlier step, and doesn't take into account directed packing constraints.

So, my question is this: Is there an easy way, in my VHDL source file, to ensure that all of my flip-flops are connected to the *same* copy of the reset signal, without introducing additional logic into the path?

The file is attached below for reference. Thanks in advance for any suggestions!

-- Dave Tweed
============================================================================
-- dsp_link_rx.vhd
-- Receive side of a 4-bit wide ADSP-TS201 link port.
-- This is based on the design found in XAPP635, but without the block RAMs
-- used to convert the internal data path to 128 bits width. It is designed
-- to accept one quadword at a time into its LUTRAM 16x8 FIFO. The FIFO can
-- be unloaded at the system clock rate, and then another quadword can be
-- accepted from the DSP transmitter.

-- To do:
-- For now, we're not interested in the BCOMP signal, so it is ignored.

-- History:
-- 2007/06/05 DT Fix sensitivity list.
-- 2005/08/19 DT Fix problem with duplicated last byte.
-- 2005/08/17 DT Fix problem with duplicated first byte.
-- 2005/08/10 DT Eliminate generate statements so that we can label
--               primitives for placement.
-- 2005/08/08 DT Fix FIFO state machine.
-- 2005/08/02 DT Take over from HDLmaker; fix trigger mechanism.
-- 2005/06/07 DT Move the I/O buffers to the pad ring.
-- 2005/05/22 DT Start.

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;

-- for correct simulation of RAM16X1D
library unisim;
use unisim.vcomponents.all;

entity dsp_link_rx is
  port (
    clock    : in  std_logic;
    rst      : in  std_logic;
    -- external interface from DSP
    lxacko   : out std_logic;
    lxbcompi : in  std_logic;
    lxclk    : in  std_logic;
    lxdata   : in  std_logic_vector (3 downto 0);
    -- internal interface (to bytestream decoder, fast version)
    dataout  : out std_logic_vector (7 downto 0);
    data_en  : out std_logic;
    ready    : in  std_logic
  );
end dsp_link_rx;

architecture BEHAVIOR of dsp_link_rx is
  -- --------------------------------------------------------------------
  signal high        : std_logic;
  signal notlxclk    : std_logic;
  signal int         : std_logic_vector(7 downto 0);
  signal rd_addr     : std_logic_vector(3 downto 0);
  signal wr_addrp_d  : std_logic_vector(3 downto 0);
  signal wr_addrp    : std_logic_vector(3 downto 0);
  signal wr_addrn_d  : std_logic_vector(3 downto 0);
  signal wr_addrn    : std_logic_vector(3 downto 0);
  signal trigger_d   : std_logic;
  signal data_en_int : std_logic;
  signal set_lxacko  : std_logic;
  signal lxacko_int  : std_logic;

  type fifo_state_t is (fifo_empty, fifo_active, fifo_final);
  signal fifo_state : fifo_state_t;

  attribute loc      : string;
  attribute syn_keep : boolean;
  attribute syn_keep of lxclk : signal is TRUE;

  attribute loc of rp0 : label is "slice_x40y8";
  attribute loc of rp1 : label is "slice_x38y10";
  attribute loc of rp2 : label is "slice_x40y10";
  attribute loc of rp3 : label is "slice_x38y8";
  attribute loc of rn0 : label is "slice_x40y9";
  attribute loc of rn1 : label is "slice_x38y11";
  attribute loc of rn2 : label is "slice_x40y11";
  attribute loc of rn3 : label is "slice_x38y9";

  attribute loc of ff_wr_addrp0 : label is "slice_x39y9";
  -- attribute loc of ff_wr_addrp1 : label is "slice_x39y9";
  attribute loc of ff_wr_addrp2 : label is "slice_x39y8";
  attribute loc of ff_wr_addrp3 : label is "slice_x39y8";
  attribute loc of ff_wr_addrn0 : label is "slice_x41y9";
  attribute loc of ff_wr_addrn1 : label is "slice_x41y9";
  attribute loc of ff_wr_addrn2 : label is "slice_x41y8";
  attribute loc of ff_wr_addrn3 : label is "slice_x41y8";
  -- --------------------------------------------------------------------
begin
  -- --------------------------------------------------------------------
  high     <= '1';
  notlxclk <= not lxclk;

  -- --------------------------------------------------------------------
  -- FIFO write addresses for both phases of the clock
  wr_addrp_d <= wr_addrp + 1;
  wr_addrn_d <= not (rst & rst & rst & rst) and wr_addrp;

  ff_wr_addrp0 : fdc port map (d => wr_addrp_d(0), c => lxclk, clr => rst, q => wr_addrp(0));
  ff_wr_addrp1 : fdc port map (d => wr_addrp_d(1), c => lxclk, clr => rst, q => wr_addrp(1));
  ff_wr_addrp2 : fdc port map (d => wr_addrp_d(2), c => lxclk, clr => rst, q => wr_addrp(2));
  ff_wr_addrp3 : fdc port map (d => wr_addrp_d(3), c => lxclk, clr => rst, q => wr_addrp(3));

  ff_wr_addrn0 : fd port map (d => wr_addrn_d(0), c => notlxclk, q => wr_addrn(0));
  ff_wr_addrn1 : fd port map (d => wr_addrn_d(1), c => notlxclk, q => wr_addrn(1));
  ff_wr_addrn2 : fd port map (d => wr_addrn_d(2), c => notlxclk, q => wr_addrn(2));
  ff_wr_addrn3 : fd port map (d => wr_addrn_d(3), c => notlxclk, q => wr_addrn(3));

  -- --------------------------------------------------------------------
  -- The FIFO memories (LUTRAM)
  rp0 : ram16x1d port map (
    d => lxdata(0), we => high, wclk => lxclk,
    a0 => wr_addrp(0), a1 => wr_addrp(1), a2 => wr_addrp(2), a3 => wr_addrp(3),
    dpra0 => rd_addr(0), dpra1 => rd_addr(1), dpra2 => rd_addr(2), dpra3 => rd_addr(3),
    spo => open, dpo => int(0) );
  rp1 : ram16x1d port map (
    d => lxdata(1), we => high, wclk => lxclk,
    a0 => wr_addrp(0), a1 => wr_addrp(1), a2 => wr_addrp(2), a3 => wr_addrp(3),
    dpra0 => rd_addr(0), dpra1 => rd_addr(1), dpra2 => rd_addr(2), dpra3 => rd_addr(3),
    spo => open, dpo => int(1) );
  rp2 : ram16x1d port map (
    d => lxdata(2), we => high, wclk => lxclk,
    a0 => wr_addrp(0), a1 => wr_addrp(1), a2 => wr_addrp(2), a3 => wr_addrp(3),
    dpra0 => rd_addr(0), dpra1 => rd_addr(1), dpra2 => rd_addr(2), dpra3 => rd_addr(3),
    spo => open, dpo => int(2) );
  rp3 : ram16x1d port map (
    d => lxdata(3), we => high, wclk => lxclk,
    a0 => wr_addrp(0), a1 => wr_addrp(1), a2 => wr_addrp(2), a3 => wr_addrp(3),
    dpra0 => rd_addr(0), dpra1 => rd_addr(1), dpra2 => rd_addr(2), dpra3 => rd_addr(3),
    spo => open, dpo => int(3) );

  rn0 : ram16x1d port map (
    d => lxdata(0), we => high, wclk => notlxclk,
    a0 => wr_addrn(0), a1 => wr_addrn(1), a2 => wr_addrn(2), a3 => wr_addrn(3),
    dpra0 => rd_addr(0), dpra1 => rd_addr(1), dpra2 => rd_addr(2), dpra3 => rd_addr(3),
    spo => open, dpo => int(4) );
  rn1 : ram16x1d port map (
    d => lxdata(1), we => high, wclk => notlxclk,
    a0 => wr_addrn(0), a1 => wr_addrn(1), a2 => wr_addrn(2), a3 => wr_addrn(3),
    dpra0 => rd_addr(0), dpra1 => rd_addr(1), dpra2 => rd_addr(2), dpra3 => rd_addr(3),
    spo => open, dpo => int(5) );
  rn2 : ram16x1d port map (
    d => lxdata(2), we => high, wclk => notlxclk,
    a0 => wr_addrn(0), a1 => wr_addrn(1), a2 => wr_addrn(2), a3 => wr_addrn(3),
    dpra0 => rd_addr(0), dpra1 => rd_addr(1), dpra2 => rd_addr(2), dpra3 => rd_addr(3),
    spo => open, dpo => int(6) );
  rn3 : ram16x1d port map (
    d => lxdata(3), we => high, wclk => notlxclk,
    a0 => wr_addrn(0), a1 => wr_addrn(1), a2 => wr_addrn(2), a3 => wr_addrn(3),
    dpra0 => rd_addr(0), dpra1 => rd_addr(1), dpra2 => rd_addr(2), dpra3 => rd_addr(3),
    spo => open, dpo => int(7) );

  -- --------------------------------------------------------------------
  -- The FIFO readout state machine
  -- This assumes that the readout clock is no faster than the input
  -- clock.

  -- When the FIFO is not empty, a trigger is generated that enables
  -- the reading process. Once the FIFO is full, further transfers from
  -- the transmitting DSP are inhibited until it is completely empty
  -- again.

  -- Data is transferred out of the FIFO if the ready signal is asserted.
  -- data_en is driven high when data is available from the FIFO.
  -- If ready is negated on a particular clock edge, that means that
  -- the data for the previous clock period was not accepted by the
  -- downstream device.

  process (clock)
  begin
    if clock'event and clock = '1' then
      trigger_d <= (not lxacko_int) and (not set_lxacko);
      if rst = '1' then
        rd_addr     <= X"0";
        data_en_int <= '0';
        fifo_state  <= fifo_empty;
      else
        case fifo_state is
          when fifo_empty =>
            data_en_int <= '0';
            if trigger_d = '1' then
              fifo_state <= fifo_active;
              rd_addr    <= rd_addr + 1;
              dataout    <= int;
            end if;
          when fifo_active =>
            data_en_int <= ready;
            if data_en_int = '1' and ready = '1' then
              if rd_addr = X"F" then
                fifo_state <= fifo_final;
              end if;
              rd_addr <= rd_addr + 1;
              dataout <= int;
            end if;
          when fifo_final =>
            if data_en_int = '1' and ready = '1' then
              data_en_int <= '0';
              fifo_state  <= fifo_empty;
            else
              data_en_int <= ready;
            end if;
        end case;
      end if;
    end if;
  end process;

  data_en <= data_en_int;

  -- --------------------------------------------------------------------
  -- LXACKO handshake with transmitting DSP
  -- This asynchronous process identifies the conditions in the above state
  -- machine under which the FIFO becomes empty.
Note that this happens when -- the last byte has been transferred to the output register, even before -- the desination has accepted it. process (rst, fifo_state, data_en_int, ready, rd_addr, lxacko_int) begin if rst = '1' or (fifo_state = fifo_active and data_en_int = '1' and ready = '1' and rd_addr = X"F" ) or lxacko_int = '1' then set_lxacko <= '1'; else set_lxacko <= '0'; end if; end process; -- This flip-flop is driven high synchronously by the system clock to -- enable transmission, but is immediately cleared asynchronously by -- the first clcok pulse from the transmitting DSP. ff_lxack : fdc port map ( d => set_lxacko , c => clock , clr => lxclk , q => lxacko_int ); lxacko <= lxacko_int; -- -------------------------------------------------------------------- end; ============================================================================Article: 120517
Bob,

Yes.

Austin
Article: 120518
On Jun 7, 10:40 pm, ARH <haghdo...@gmail.com> wrote:
> On Jun 8, 7:50 am, Frank <FrankT...@gmail.com> wrote:
>
> > PLEASE HELP:
> >
> > In my design I have a bank of memory made up of 64 reg[31:0]'s. They
> > get synthesized into latches, and kill my timing because of a huge
> > 64 X 32 sensitivity list. (See below.)
> >
> > How else can I synthesize a 64X32 block of Memory with Asynch Read,
> > Asynch Write, and where the Data_In shows up immediately on the
> > Data_Out?
> >
> > Can I do it with Xilinx CoreGen Modules?
> >
> > //This part handles the read for the 64 32-bit registers which store
> > the data.
> > //The rd_address is 1-hot, 64-bit reg.
> >
> > always @ (rd_address or q0 or q1 or q2 or q3 or q4 or q5 or q6 or q7
> >           or q8 or q9 or qa or qb or qc or qd or qe or qf
> >           or q10 or q11 or q12 or q13 or q14 or q15 or q16 or q17
> >           or q18 or q19 or q1a or q1b or q1c or q1d or q1e or q1f
> >           or q20 or q21 or q22 or q23 or q24 or q25 or q26 or q27
> >           or q28 or q29 or q2a or q2b or q2c or q2d or q2e or q2f
> >           or q30 or q31 or q32 or q33 or q34 or q35 or q36 or q37
> >           or q38 or q39 or q3a or q3b or q3c or q3d or q3e or q3f)
> > begin
> >   case (1'b1) // synopsys full_case parallel_case
> >     rd_address[0]: DO = q0;
> >     rd_address[1]: DO = q1;
> >     rd_address[2]: DO = q2;
> >     rd_address[3]: DO = q3;
> >     rd_address[4]: DO = q4;
> >     rd_address[5]: DO = q5;
> >     rd_address[6]: DO = q6;
> >     rd_address[7]: DO = q7;
> >     rd_address[8]: DO = q8;
> >     rd_address[9]: DO = q9;
> >     rd_address[10]: DO = qa;
> >     rd_address[11]: DO = qb;
> >     rd_address[12]: DO = qc;
> >     rd_address[13]: DO = qd;
> >     rd_address[14]: DO = qe;
> >     rd_address[15]: DO = qf;
> >     rd_address[16]: DO = q10;
> >     rd_address[17]: DO = q11;
> >     rd_address[18]: DO = q12;
> >     rd_address[19]: DO = q13;
> >     rd_address[20]: DO = q14;
> >     rd_address[21]: DO = q15;
> >     rd_address[22]: DO = q16;
> >     rd_address[23]: DO = q17;
> >     rd_address[24]: DO = q18;
> >     rd_address[25]: DO = q19;
> >     rd_address[26]: DO = q1a;
> >     rd_address[27]: DO = q1b;
> >     rd_address[28]: DO = q1c;
> >     rd_address[29]: DO = q1d;
> >     rd_address[30]: DO = q1e;
> >     rd_address[31]: DO = q1f;
> >     rd_address[32]: DO = q20;
> >     rd_address[33]: DO = q21;
> >     rd_address[34]: DO = q22;
> >     rd_address[35]: DO = q23;
> >     rd_address[36]: DO = q24;
> >     rd_address[37]: DO = q25;
> >     rd_address[38]: DO = q26;
> >     rd_address[39]: DO = q27;
> >     rd_address[40]: DO = q28;
> >     rd_address[41]: DO = q29;
> >     rd_address[42]: DO = q2a;
> >     rd_address[43]: DO = q2b;
> >     rd_address[44]: DO = q2c;
> >     rd_address[45]: DO = q2d;
> >     rd_address[46]: DO = q2e;
> >     rd_address[47]: DO = q2f;
> >     rd_address[48]: DO = q30;
> >     rd_address[49]: DO = q31;
> >     rd_address[50]: DO = q32;
> >     rd_address[51]: DO = q33;
> >     rd_address[52]: DO = q34;
> >     rd_address[53]: DO = q35;
> >     rd_address[54]: DO = q36;
> >     rd_address[55]: DO = q37;
> >     rd_address[56]: DO = q38;
> >     rd_address[57]: DO = q39;
> >     rd_address[58]: DO = q3a;
> >     rd_address[59]: DO = q3b;
> >     rd_address[60]: DO = q3c;
> >     rd_address[61]: DO = q3d;
> >     rd_address[62]: DO = q3e;
> >     rd_address[63]: DO = q3f;
>
> Hi
>
> Two of the most overused and abused directives included in Verilog
> models are the directives "//synopsys full_case parallel_case". The
> popular myth surrounding "full_case parallel_case" is that these
> Verilog directives always make designs smaller, faster and latch-free.
> This is false! Indeed, the "full_case parallel_case" switches
> frequently make designs larger and slower and can obscure the fact
> that latches have been inferred. These switches can also change the
> functionality of a design, causing a mismatch between pre-synthesis
> and post-synthesis simulation, which, if not discovered during
> gate-level simulations, will cause an ASIC to be taped out with design
> problems.
>
> It is generally a bad coding practice to give the synthesis tool
> different information about the functionality of a design than is
> given to the simulator.
>
> Whenever either "full_case" or "parallel_case" directives are added to
> the Verilog source code, more information is potentially being given
> about the design to the synthesis tool than is being given to the
> simulator.
>
> In general, do not use "full_case parallel_case" directives with any
> Verilog case statements.
>
> ARH

Additionally, Xilinx Block Memory does not support asynchronous read and
write modes. If you use asynchronous modes, you are hurting the timing of
your design from the start. If meeting timing is your objective, ideally
you would map to Block Memory, and to do this you need to change to
synchronous timing. Asynchronous timing, especially when you are not
careful about how you use it, tends to end up with bad timing scores.

If you do want to do this in asynchronous mode, then the best you can do
is to code it to use distributed memory. If you want the tools to infer
the memory, then please refer to the coding templates for the synthesis
tool you are using, and you will achieve the correct inference. As coded
above, this is a latch-based implementation.

Thanks
Duth
Article: 120519
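For the 64 x 32 memory discussed in the reply above, the distributed-memory route might look roughly like the sketch below. It is only an illustration, not Frank's design and not a vendor template: the module and port names are hypothetical, it assumes a 6-bit binary address instead of the original 64-bit one-hot rd_address, and it assumes the write can be made synchronous while the read stays asynchronous. The bypass on the output is one possible way to get Data_In to appear on Data_Out in the same cycle. Whether this maps to distributed RAM depends on the synthesis tool, so check its coding templates.

```verilog
// Sketch only: names, widths, and the binary addressing are assumptions.
module dist_ram_64x32 (
    input  wire        clk,      // write clock
    input  wire        we,       // write enable
    input  wire [5:0]  wr_addr,  // binary write address (not one-hot)
    input  wire [5:0]  rd_addr,  // binary read address (not one-hot)
    input  wire [31:0] di,       // write data
    output wire [31:0] dout      // read data
);
    // One array instead of 64 separately named registers q0..q3f.
    reg [31:0] mem [0:63];

    // Synchronous write: edge-triggered, so no latches are inferred.
    always @(posedge clk)
        if (we)
            mem[wr_addr] <= di;

    // Asynchronous (combinational) read of the array.
    wire [31:0] mem_q = mem[rd_addr];

    // Write-through bypass: when the location being read is also being
    // written, forward the incoming data straight to the output.
    assign dout = (we && (wr_addr == rd_addr)) ? di : mem_q;
endmodule
```

No full_case or parallel_case directive is needed here, because the array read covers every address and simulation and synthesis see the same description.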
http://www.zylogic.com.cn/english/products04.htm

I wonder what that is? It looks like the product Triscend never
announced, but maybe it's a hoax. Still funny: at least the specs are now
known for what Triscend was about to announce just before it was
purchased by Xilinx.

Antti
Article: 120520
Antti wrote:
> http://www.zylogic.com.cn/english/products04.htm
>
> I wonder what that is? It looks like the product Triscend never
> announced, but maybe it's a hoax. Still funny: at least the specs are
> now known for what Triscend was about to announce just before it was
> purchased by Xilinx.

It doesn't just look like it: if you click on the product brief
(http://www.zylogic.com.cn/download/pdf/products01_2/A7VProductBrief.pdf),
it's from Triscend... with the Triscend logo and everything.

cu,
Sean

--
My email address is only valid until the end of the month.
Try figuring out what the address is going to be after that...
Article: 120521
Dear all,

I want to get the actual clock frequency of my design (in the ISE tool),
but I am having trouble because of the following problem.

I implemented a 20-port crossbar network module, in which the number of
I/O pins is greater than the number of FPGA I/O pins. I synthesized it
and got an estimated clock frequency. My goal is to place and route my
design.

What I did was to put a dummy module on each crossbar port. The task of
these dummy modules is to:
- Get signals from the crossbar network
- Register them (or apply simple dummy arithmetic functions)
- Forward the registered signals to the crossbar network

In this way, I have only clock, reset, and result pins in my TOP module.

The problem is that the synthesizer optimizes the design such that my
dummy modules are almost entirely trimmed away. Accordingly, part of my
crossbar network module is also removed. It is very time-consuming to
manually modify 20 dummy modules.

My questions are:
- Can we make these dummy modules 'locked' in the VHDL description, so
  that the synthesizer will not try to optimize them?
- Is there any other way to place and route my design when the number of
  I/O pins is greater than the number of FPGA pins?

Thank you again for your comments.
Article: 120522
Hi

I am involved in a high-reliability application where the device we want
to use (Quicklogic EclipsePlus QL7160) is only available in a 208-pin
plastic PQFP (30 mm sq) or a 280-pin plastic BGA (17 mm sq). The
environment will subject the device to elevated temperature (~115 deg)
and vibration (TBD).

Does anyone have experience with using these types of devices in such an
environment, and what kind of measures can be taken to improve mechanical
reliability?

Also, does anyone have any general comments on Quicklogic devices and/or
support?

Thanks
Dave
Article: 120523
motty wrote:
> I have upgraded to EDK 9.1 and am trying to generate the HDL
> simulation files. I have successfully compiled both the ISE libraries
> and the EDK libraries. They live in C:\xilinx_sim_libs and
> C:\xilinx_sim_libs\EDK respectively.

Make sure that those directories contain the file "_info".
Maybe the compiled directory is really C:\xilinx_sim_libs\work

-- Mike Treseler
Article: 120524
Antti wrote:
> http://www.zylogic.com.cn/english/products04.htm
>
> I wonder what that is? It looks like the product Triscend never
> announced, but maybe it's a hoax. Still funny: at least the specs are
> now known for what Triscend was about to announce just before it was
> purchased by Xilinx.
>
> Antti
>

Zylogic and Rochester were handling the end-of-life commitments that
were in place for the Triscend parts. I would be a bit shocked if they
would even take an order at this point unless they still had some parts
in stock.

Ed McGettigan
--
Xilinx Inc.