hrocarina@gmail.com wrote: > We are looking for a consultant for a project that involves an FPGA to > implement our data manipulation algorithm to scale up to 5000 > simultaneous sessions and interface with our Opteron system via either > PCI or Hypertransport. > > Please contact me if interested as soon as possible. > > hrocarina@gmail.com > > Regards, > HR I'll give you a call next week - NOT. Get a marketing consultant, yesterday.Article: 105176
I am rusty on Verilog, so I can't remember whether you have a generate statement available, but another way to cut the work is to use a layered component in which the bottom level contains, say, four 4-i/p OR gates. The layer above has 4 of these super gates, and so on. If you start at the bottom with an OR gate instantiation and do the same all the way up with component instantiations, the synthesiser won't be able to do much to insert other gates.

The MUXCY is probably being used because the carry chain is a fast route compared to general routing and can be used to make a wide OR function from 2 or more LUTs. To a degree this may be the fastest way to get your OR, but probably tempered with some imposed structure. As a guess, the synthesiser is currently generating a number of 220-228 i/p OR gates and then putting the outputs together in another OR function.

John Adair
Enterpoint Ltd.

mk wrote:
> Hi everyone,
> I am trying to 'or' a 2K vector in Virtex4. Looking at the problem as
> a first approximation, it would need 6 levels of 4 input lookup
> tables. So far I have tried XST but it seems to be using the initial
> 512 LUT4s and then 56 levels of MUXCY. Any ideas why it's using the
> MUXCYs? They seem to be quite fast at 45ns each but number of levels
> is quite high. I'm curious what the timing would look like if I could
> force it to use only LUT4s but I really don't want to code it by hand
> and I am too lazy to write a perl script to do it either. Any
> suggestions ?
>
> Thanks.
>
> PS Here is what I am using as a test module. I am trying to map it to
> a virtex4-10.
>
> module orlt(clk, in, out);
> input clk;
> input [2047:0] in;
> output out;
>
> reg [2047:0] inr;
> reg out;
> wire outw;
>
> orl u0(inr, outw);
>
> always @(posedge clk)
> begin
> out <= outw;
> inr <= in;
> end
>
> endmodule
>
> module orl(in, out);
> input [2047:0] in;
> output out;
>
> wire out = |in[2047:0];
> endmodule
Article: 105177
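The layered structure described in the post above can be sketched in Verilog-2001, which does provide generate. This is only an illustration under stated assumptions: the module names or4, or_layer and or2048_layered are hypothetical, and whether a given synthesiser keeps the hierarchy intact (rather than flattening it and re-optimising) depends on tool settings that are not shown here.

```verilog
// Hypothetical leaf cell: one 4-input OR, intended to fit a single LUT4.
module or4 (input wire [3:0] a, output wire y);
  assign y = |a;
endmodule

// Hypothetical layer: reduces 4*N inputs to N outputs with N or4 instances.
module or_layer #(parameter N = 1) (
  input  wire [4*N-1:0] in,
  output wire [N-1:0]   out
);
  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : g
      or4 u (.a(in[4*i +: 4]), .y(out[i]));
    end
  endgenerate
endmodule

// Six levels: 2048 -> 512 -> 128 -> 32 -> 8 -> 2 -> 1
module or2048_layered (input wire [2047:0] in, output wire out);
  wire [511:0] l1;
  wire [127:0] l2;
  wire [31:0]  l3;
  wire [7:0]   l4;
  wire [1:0]   l5;

  or_layer #(.N(512)) s1 (.in(in), .out(l1));
  or_layer #(.N(128)) s2 (.in(l1), .out(l2));
  or_layer #(.N(32))  s3 (.in(l2), .out(l3));
  or_layer #(.N(8))   s4 (.in(l3), .out(l4));
  or_layer #(.N(2))   s5 (.in(l4), .out(l5));

  assign out = |l5;  // final level is a 2-input OR
endmodule
```

The level count matches the "6 levels of 4 input lookup tables" estimate in the quoted question; or2048_layered could stand in for the orl instance in the quoted test module.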
Hi google really finds things: http://www.ccm.ece.vt.edu/~amarschn/openfire/tools.html at the above link you can download OpenFire verilog source code a small writeup with synthesis results http://xilant.com/content/view/43/2/ AnttiArticle: 105178
"John Adair" <g1@enterpoint.co.uk> writes: > It's now been christened and had the obligatory bottle of Champers > smashed. Darnaw1 is the name to look for. > Go on John, enlighten us :-) Where do you get your names from? Do you just open a random OS (that's Ordnance Survey for non-Brits - not operating system!) map and stick your finger on a remote village or something? Cheers, Martin -- martin.j.thompson@trw.com TRW Conekt - Consultancy in Engineering, Knowledge and Technology http://www.trw.com/conektArticle: 105179
Thomas Reinemann <thomas.reinemann@aucotronics.de> writes: > Hello, > > usually a reset signal is applied to put the FFs of an FPGA into a known > state. Just some days ago I had a discussion. Someone's point of view > is, that a reset is not necessary, since the FF's output will be always > zero, after applying the voltage. Does this happen in FPGAs really, > especially in a Spartan3? > The FFs always come up in a guaranteed way after power-up - the bitstream you feed into the FPGA defines their power-up state. Normally this will be a zero, but there are situations where this can change. Either because you tell the tools you want a '1' there instead - or because the tools decided it would make their life easier for it to power-up to '1'. If you're lucky they'll even tell you they did it :-) The old Altera 10K series FFs could only reset to '0', so if you asked for a preset FF, the mapper (or whatever it was called back in those days) would stick not gates either side of it to make it behave how you asked it to. This had the side-effect of power-up to '1' also, which was *usually* what you wanted... Anyway, those days are passed, I'll be quiet now :-) Martin -- martin.j.thompson@trw.com TRW Conekt - Consultancy in Engineering, Knowledge and Technology http://www.trw.com/conektArticle: 105180
"mk" <kal*@dspia.*comdelete> wrote in message news:574jb2h3o7cv3viul4ghq7j3gt2pen6l2d@4ax.com... > Hi everyone, > I am trying to 'or' a 2K vector in Virtex4. Looking at the problem as > a first approximation, it would need 6 levels of 4 input lookup > tables. So far I have tried XST but it seems to be using the initial > 512 LUT4s and then 56 levels of MUXCY. Any ideas why it's using the > MUXCYs? They seem to be quite fast at 45ns each but number of levels > is quite high. I'm curious what the timing would look like if I could > force it to use only LUT4s but I really don't want to code it by hand > and I am too lazy to write a perl script to do it either. Any > suggestions ? > > Thanks. > Your synthesiser is using the MUXCYs because it uses less resource (about 75% of the tree method) and is faster. If the MUXCY propagation delay was 45ns, I'd be worried, but it's really only 45ps! :-) If you build a tree, it'll be slower. It's not just the LUT delay, it's all that routing you need for a wide OR gate. To show it, you could try synthesising a 2k XOR gate. Your synthesiser might struggle to implement that with a carry structure. HTH, Syms.Article: 105181
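The XOR experiment suggested in the post above only needs the reduction operator changed in the original test module. A minimal sketch, assuming a hypothetical module name xorl that is dropped in where orl was instantiated:

```verilog
// Hypothetical counterpart of orl: a 2048-input XOR (parity) reduction.
// Unlike OR, the result depends on every input bit, so no single input
// can decide the output early.
module xorl(in, out);
  input  [2047:0] in;
  output out;

  assign out = ^in[2047:0];
endmodule
```

Comparing the synthesis and timing reports of the two versions would show how the tool treats each function; no particular result is assumed here.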
Antti wrote: > Hi > > google really finds things: > > http://www.ccm.ece.vt.edu/~amarschn/openfire/tools.html > Another ( maybe the same ;-) ) is at http://www.opencores.org/projects.cgi/web/aemb/overview SandroArticle: 105182
Sandro wrote:
> Antti wrote:
> > Hi
> >
> > google really finds things:
> >
> > http://www.ccm.ece.vt.edu/~amarschn/openfire/tools.html
> >
>
> Another ( maybe the same ;-) ) is at
> http://www.opencores.org/projects.cgi/web/aemb/overview
>
> Sandro

It's not the same at all - the OpenFire docs explain why they did not use the aeMB but rather designed a new core. The author of the aeMB has other priorities (university study) and has dropped the development, AFAIK.

Antti
Article: 105183
Austin Lesea wrote:
> Gee,
>
> Thanks. (Austin for Austin)
>
> Peter is on vacation, so I will say thanks for him as well:
>
> Danke,
>
> Austin (for Peter)

Add me to the list. While I have not always agreed with Austin, don't come here asking for free help and then start crying 'Xilinx is the evil empire' when you don't like the answer. Any time there has been a serious problem on the Xilinx side (like incorrect documentation, a bad chip, etc.) they have provided help. When it comes to design advice or something that is screwed up (for any reason), don't expect a group of engineers to form a consensus that you were the helpless victim of a bad chip or poor documentation.

Any time you get lots of engineers together, there is never agreement. Everyone thinks their way is better.

-Eli
Article: 105184
It is important to have a reset that is synchronously deasserted relative to every clock used. These may be fully synchronous resets, or asynchronous resets that have the trailing edge synchronized. The reason for this is that if a reset input is not synchronized to the same clock as the circuitry being reset, then the flops in the circuit will not all come out of reset on the same clock, which, unless it is handled very carefully, will cause problems that can be very hard to debug.

Whether or not you have a separate reset, or are only resetting on configuration, the above requirements hold true.

Andy

Nial Stewart wrote:
> "Thomas Reinemann" <thomas.reinemann@aucotronics.de> wrote in message
> news:e981ph$ur5$1@news.boerde.de...
> > Hello,
> > usually a reset signal is applied to put the FFs of an FPGA into a known
> > state. Just some days ago I had a discussion. Someone's point of view
> > is, that a reset is not necessary, since the FF's output will be always
> > zero, after applying the voltage. Does this happen in FPGAs really,
> > especially in a Spartan3?
> > Bye Tom
> >
> If you use any form of PLL/DLL in your design I don't think you can
> be sure of what's going to happen until it's locked. This can throw
> logic/state machines into complete disarray.
>
> I generate a synchronous reset which de-activates some period
> after all my PLLs have locked.
>
>
>
> Nial
Article: 105185
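A common way to get the "asynchronous assert, synchronous deassert" behaviour described above is a two-flop reset synchronizer, one per clock domain. The following is a minimal sketch, not anyone's posted design: the module name reset_sync and the signal names arst_n and rst_n are hypothetical, and an active-low reset is assumed.

```verilog
// Hypothetical reset synchronizer: reset asserts immediately
// (asynchronously) and is released only on a clock edge of clk.
module reset_sync (
  input  wire clk,
  input  wire arst_n,  // asynchronous, active-low reset request
  output wire rst_n    // active-low reset, released synchronously to clk
);
  reg [1:0] sync;

  always @(posedge clk or negedge arst_n) begin
    if (!arst_n)
      sync <= 2'b00;              // assert at once, no clock needed
    else
      sync <= {sync[0], 1'b1};    // shift in 1s: release after two edges
  end

  assign rst_n = sync[1];
endmodule
```

One instance would be placed in each clock domain, with rst_n driving only the flops clocked by that domain's clk. To follow the approach in the quoted post, arst_n could be held low until all PLLs/DLLs report lock (for example by ANDing an external reset with a lock signal); that wiring is an assumption and is not shown.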
Synplicity has a product designed for ASIC verification using FPGAs that can semi-automate the partitioning problem. I have no experience with the product. Andy Brannon wrote: > > I am interested to learn more about techniques for design partition > > across multiple FPGAs. > > Traditionally people have tried to come up with auto-partitioners that > are somehow smart enough to split up connections between chips. The > scope of that problem is too large. I propose you do it this way: > > First, you have to define a dataset as partitionable. You cannot break > apart objects unless they are connected by this specific dataset that > is allowed to be broken. You'll need some communication core that goes > with that dataset on both ends of the transfer. Then your partition > software will automatically insert those communication cores in after > it decides to separate a certain line with the breakable dataset. Um. > I'm not sure I'm describing this very well. Does that make sense? > > So for example, suppose you have a dataset that is made of some data > bits, an enable bit, a clock, and a busy signal going the opposite > direction. That dataset is breakable because you can send the data, > clock, and enable to a fifo on the far chip; that fifo can send back an > almost busy signal to stop data from being sent. A simpler case would > be a control line that is stable ages before it is needed; your > separation objects for those will just be buffers and pads.Article: 105186
Symon wrote: > "mk" <kal*@dspia.*comdelete> wrote in message > news:574jb2h3o7cv3viul4ghq7j3gt2pen6l2d@4ax.com... > > Hi everyone, > > I am trying to 'or' a 2K vector in Virtex4. Looking at the problem as > > a first approximation, it would need 6 levels of 4 input lookup > > tables. So far I have tried XST but it seems to be using the initial > > 512 LUT4s and then 56 levels of MUXCY. Any ideas why it's using the > > MUXCYs? They seem to be quite fast at 45ns each but number of levels > > is quite high. I'm curious what the timing would look like if I could > > force it to use only LUT4s but I really don't want to code it by hand > > and I am too lazy to write a perl script to do it either. Any > > suggestions ? > > > > Thanks. > > > Your synthesiser is using the MUXCYs because it uses less resource (about > 75% of the tree method) and is faster. If the MUXCY propagation delay was > 45ns, I'd be worried, but it's really only 45ps! :-) If you build a tree, > it'll be slower. It's not just the LUT delay, it's all that routing you need > for a wide OR gate. To show it, you could try synthesising a 2k XOR gate. > Your synthesiser might struggle to implement that with a carry structure. Is that 45 ps per LUT of the carry or 45 ps per CLB in the carry chain? If I use the 56 elements that the OP said, I get 2.52 ns total carry delay. That is pretty remarkable if it is correct. Increasing that to 45 ps per each of the 512 LUTs the carry delay is still only 23.04 ns. A combination approach combining say 16 LUTs with the carry then using an 8 input OR gate should be a bit faster. 16 carries is about the same speed as a LUT. I have not looked at the Virtex 4 architecture so I don't know for sure if this is needed or if the carry delay is 45 ps per CLB.Article: 105187
"rickman" <spamgoeshere4@yahoo.com> wrote in message news:1153146462.943222.181680@b28g2000cwb.googlegroups.com... > Symon wrote: > > Is that 45 ps per LUT of the carry or 45 ps per CLB in the carry chain? > If I use the 56 elements that the OP said, I get 2.52 ns total carry > delay. That is pretty remarkable if it is correct. > Hi Rick, Yes, that's 45ps per LUT. I believe the carry is actually implemented as a two bit look ahead, so that each CLB is a two bit carry with delay of 90ps. But, now you mention it, I don't understand the 56 levels thing. > > Increasing that to 45 ps per each of the 512 LUTs the carry delay is > still only 23.04 ns. A combination approach combining say 16 LUTs with > the carry then using an 8 input OR gate should be a bit faster. 16 > carries is about the same speed as a LUT. I have not looked at the > Virtex 4 architecture so I don't know for sure if this is needed or if > the carry delay is 45 ps per CLB. > Thinking about it a bit harder, and after reading your post, I reckon the synthesiser must be doing what you suggest, dividing the chain up into sections, and oring together the output. Cheers, Syms.Article: 105188
"Symon" <symon_brewer@hotmail.com> wrote in message news:44bba39a$1_2@x-privat.org... > "rickman" <spamgoeshere4@yahoo.com> wrote in message > news:1153146462.943222.181680@b28g2000cwb.googlegroups.com... >> Symon wrote: >> >> Is that 45 ps per LUT of the carry or 45 ps per CLB in the carry chain? >> If I use the 56 elements that the OP said, I get 2.52 ns total carry >> delay. That is pretty remarkable if it is correct. >> > Hi Rick, > Yes, that's 45ps per LUT. I believe the carry is actually implemented as a > two bit look ahead, so that each CLB is a two bit carry with delay of > 90ps. But, now you mention it, I don't understand the 56 levels thing. >> >> Increasing that to 45 ps per each of the 512 LUTs the carry delay is >> still only 23.04 ns. A combination approach combining say 16 LUTs with >> the carry then using an 8 input OR gate should be a bit faster. 16 >> carries is about the same speed as a LUT. I have not looked at the >> Virtex 4 architecture so I don't know for sure if this is needed or if >> the carry delay is 45 ps per CLB. >> > Thinking about it a bit harder, and after reading your post, I reckon the > synthesiser must be doing what you suggest, dividing the chain up into > sections, and oring together the output. > Cheers, Syms. More specifically the synthesizer is probably splitting into two levels of carry chains. Rather than 512 LUTs feeding a carry chain that's 128 rows high (there are 2 carry chain paths in a CLB, 4 LUTs per carry chain) using 2 levels of carry chains with the first at 5 MUXCY stages (32 inputs) and the second at 6 MUXCY stages (64 inputs, specifying 64 initial carry chains) the delay ends up being shorter still. The Tbyp value, by the way, is about 103 ps in the Spartan3E (-5 speed grade) and corresponds to 2 LUTs worth of carry chain since the bypass is on a slice-by-slice basis. ***** Dadgummit. 
The 8.2.01i speedprint numbers for Tbyp don't match my Timing Analyzer numbers (which did seem to correspond in speedprint 8.1.03i). I've submitted a case to Xilinx on this. *****

In the Spartan3E -5 speed grade, for instance, using timing numbers from my 8.2.01i Timing Analyzer (a mixed bag of SliceM and SliceL values, so the actual numbers will vary), the 6-level OR would end up

  Tcko+5*(Tnet+Tilo)+Tnet+Tfck
  = 0.567+6*Tnet+5*0.660+0.776
  = 4.643+6*Tnet

An average Tnet of 1 ns (a routing-to-logic split of 56% to 44%, which is much better than what I'd expect for a wide distribution of inputs) gives

  = 10.643 ns

While a single carry chain across 128 CLB rows would be

  Tcko+Tnet+Topcyf+255*Tbyp+Tcinck
  = 0.567+Tnet+1.011+255*(0.103)+0.518
  = 28.361+Tnet

or probably under

  = 29.361 ns

Which is much worse than either the tree or 2 levels of carry chains, the latter of which would be

  Tcko+Tnet+Topcyf+2*Tbyp+Tnet+Topcyf+2*Tbyp+Tcinck
  = 0.567+Tnet+1.011+2*0.103+Tnet+1.011+2*0.103+0.518
  = 3.519 + 2*Tnet

or around

  = 5.519 ns

Two levels of carry chains use significantly fewer resources than an OR tree while the delay is about half what the tree would need.

The key to the number of carry chains the tool generates for the longest delay would be the number of Topcyf (or Topcyg) values in the path as reported by Timing Analyzer.

Ain't optimization fun?
Article: 105189
"Symon" <symon_brewer@hotmail.com> wrote in message news:44bba39a$1_2@x-privat.org...
> Yes, that's 45ps per LUT. I believe the carry is actually implemented as a
> two bit look ahead, so that each CLB is a two bit carry with delay of 90ps.
> But, now you mention it, I don't understand the 56 levels thing.
> >
> > Increasing that to 45 ps per each of the 512 LUTs the carry delay is
> > still only 23.04 ns. A combination approach combining say 16 LUTs with
> > the carry then using an 8 input OR gate should be a bit faster. 16
> > carries is about the same speed as a LUT. I have not looked at the
> > Virtex 4 architecture so I don't know for sure if this is needed or if
> > the carry delay is 45 ps per CLB.
> >
> Thinking about it a bit harder, and after reading your post, I reckon the
> synthesiser must be doing what you suggest, dividing the chain up into
> sections, and oring together the output.

If you think about it just a tiny bit harder, the structure of the optimal circuit comes down to an assessment of the relative performance of the LUT delay + routing, and the carry chain delay. Intuitively, the best circuit will have minimal disparity between the fastest and slowest path.

Say for the sake of argument that four stages of carry-OR takes as long as one LUT-OR. Then an extremely coarse rendition of the fastest circuit to do a big OR will look a bit like this (L = LUT, ^ = carry-mux OR, inputs [not shown] on left):

  L-L-L-^   (top (result))
  L-L-L-^
  L-L-L-^
  L-L-L-^
    L-L-^
    L-L-^
    L-L-^
    L-L-^
      L-^
      L-^
      L-^
      L-^   (bottom)

The further up the carry chain you get, the more the inputs to the carry-mux elements are just "waiting around" for the carry propagation. Eventually it reaches the point where you can squeeze in an extra level of LUTs in these higher stages, and thus reduce the total size of the carry chain. Go further up, and you can afford two extra levels, and so on. I'd hope that at least some tools are clever enough to exploit this.
(Note: in reality, the ratio of LUT:CY speed in this context is somewhere in the 12:1 to 16:1 ballpark for most Xilinx architectures.) Hope this makes sense... perhaps someone can take it a step further and work out where the 56 levels thing really comes from (and thus deduce what this particular synthesis tool believes the LUT:CY speed ratio is!). Cheers, -Ben-Article: 105190
Has anyone successfully used the PowerPC Instruction-Set Simulator packaged with EDK (8.1)? I can set it up and launch XMD, but whenever I try to download an elf file it lists all the parts of the file and then reports "Failed to download ELF file Unable to write to Sim" Any subsequent attempts to download or run result in the error "Error in Resetting Target" I seem to be able to download just fine to hardware with XMD, but not everyone involved in the research I'm doing has access to the actual board, so it would be nice to have the simulator working, even if its functionality is limited.Article: 105191
"Ben Jones" <ben.jones@xilinx.com> wrote in message news:e9gfcl$4mg1@cliff.xsj.xilinx.com... > > > Hope this makes sense... perhaps someone can take it a step further and > work > out where the 56 levels thing really comes from (and thus deduce what this > particular synthesis tool believes the LUT:CY speed ratio is!). > > Cheers, > > -Ben- > > Hi Ben, Thanks for that, it made sense to me. I think we might need to know what part the design was in because the carry chain length is limited by the number of rows in the FPGA. Smaller parts have smaller maximum length chains. Also, as a BTW, I see from the datasheet that the ORCY structure that was in V2PRO has been dropped from the V4. That made wide gates even faster. Cheers, Syms.Article: 105192
WebPack and SP1 are available but it looks like the ISE WebPack does not support any Virtex-5 devices at all? I did assume smallest Virtex-5 would be supported. what a pity. AnttiArticle: 105193
<Sean> schrieb im Newsbeitrag news:ee9d076.-1@webx.sUN8CHnE... > Has anyone successfully used the PowerPC Instruction-Set Simulator > packaged with EDK (8.1)? I can set it up and launch XMD, but whenever I > try to download an elf file it lists all the parts of the file and then > reports > > "Failed to download ELF file > > Unable to write to Sim" > > Any subsequent attempts to download or run result in the error > > "Error in Resetting Target" > > I seem to be able to download just fine to hardware with XMD, but not > everyone involved in the research I'm doing has access to the actual > board, so it would be nice to have the simulator working, even if its > functionality is limited. - same here I tried once got the same error and gave up. AnttiArticle: 105194
"John_H" <johnhandwork@mail.com> wrote in message news:t3Pug.5676$Oh1.1853@news01.roc.ny...
<snip>
> Which is much worse than the tree or for 2 levels of carry chains which
> would be
>
> Tcko+Tnet+Topcyf+2*Tbyp+Tnet+Topcyf+2*Tbyp+Tcinck
> = 0.567+Tnet+1.011+2*0.103+Tnet+1.011+2*0.103+0.518
> = 3.519 + 2*Tnet
> or around
> = 5.519 ns
>
> Two levels of carry chains use significantly fewer resources than an OR
> tree while the delay is about half what the tree would need.
>
> The key to the number of carry chains the tool generates for the longest
> delay would be the number of Topcyf (or Topcyg) values in the path as
> reported by Timing Analyzer.
>
> Ain't optimization fun?

I thought through this too quickly. The first stage in the example I was drawing out could do 64-wide ORs with the first carry chain, which is 8 slices or 7*Tbyp, not 2*Tbyp. The second stage would be from 32 carry chains for 4 slices of MUXCY-based OR for 3*Tbyp, not 2*Tbyp, so the timing would be more like 6.137 ns, still significantly better than the LUT tree.

I missed the 56 elements mentioned initially; this is probably just poor partitioning, relying instead on a "maximum carry width" value.

I'd manually partition the OR into two sets based on the 2 levels of carries. The generate can be used to shorthand the 32 intermediate values. The KEEP attribute may be what's needed in XST - I use syn_keep=1 in the Synplicity synthesizer. This synthesized okay, but I didn't put a wrapper around it to get it into a physical part (2k I/O is too much for me).

  module orlt(clk, in, out);
    input clk;
    input [2047:0] in;
    output out;

    reg [2047:0] inr;
    reg out;
    wire outw;

    orl u0(inr, outw);

    // register inputs and output so the OR is timed flop to flop
    always @(posedge clk)
    begin
      out <= outw;
      inr <= in;
    end
  endmodule

  module orl(in, out);
    input [2047:0] in;
    output out;

    // 32 intermediate results, one per 64-input first-level OR
    (* KEEP *) wire [31:0] mid;

    generate
      genvar i;
      for( i=0; i<32; i=i+1)
      begin : MUXCYtree
        assign mid[i] = |in[i*64 +: 64];
      end
    endgenerate

    // second level: OR of the 32 intermediate results
    wire out = |mid[31:0];
  endmodule
Article: 105195
Antti, not nice to hijack Sean's thread.

Anyway, perhaps Austin's attitude, "I still have no idea why this matters whatsoever", is the official Xilinx position (cf. http://groups.google.com/group/comp.arch.fpga/tree/browse_frm/thread/d3a75da111b452a3/a852a6a48db9a88b?rnum=1&q=new+largest+&_done=%2Fgroup%2Fcomp.arch.fpga%2Fbrowse_frm%2Fthread%2Fd3a75da111b452a3%2F462b1ea94d885aa4%3Fq%3Dnew+largest+%26rnum%3D1%26#doc_a0535aeea2a09638)

Maybe ISE 9.0 will be better.

Tommy

Antti Lukats wrote:
> WebPack and SP1 are available but it looks like the ISE WebPack does not
> support any Virtex-5 devices at all? I did assume smallest Virtex-5 would be
> supported. what a pity.
>
> Antti
Article: 105196
"Tommy Thorn" <tommy.thorn@gmail.com> wrote in message news:1153159547.039604.67790@b28g2000cwb.googlegroups.com...
> Antti, not nice to hijack Sean thread.
>
> Anyway is perhaps Austin's attitude "I still have no idea why this
> matters whatsoever" is the official Xilinx position (cf.
> http://groups.google.com/group/comp.arch.fpga/tree/browse_frm/thread/d3a75da111b452a3/a852a6a48db9a88b?rnum=1&q=new+largest+&_done=%2Fgroup%2Fcomp.arch.fpga%2Fbrowse_frm%2Fthread%2Fd3a75da111b452a3%2F462b1ea94d885aa4%3Fq%3Dnew+largest+%26rnum%3D1%26#doc_a0535aeea2a09638)
>
> Maybe ISE 9.0 will be better.
>
> Tommy
>
>
> Antti Lukats wrote:
>> WebPack and SP1 are available but it looks like the ISE WebPack does not
>> support any Virtex-5 devices at all? I did assume smallest Virtex-5 would
>> be
>> supported. what a pity.
>>
>> Antti
>

I already apologized! I haven't been able to post with Outlook Express for a while, and I had forgotten that by hitting reply and changing the subject to a completely new one, the post still goes out as a reply. Silly, stupid me.

Sorry again, it wasn't intentional.

Antti
Article: 105197
"Antti Lukats" <antti@openchip.org> wrote in message news:e9gj1o$pqj$1@online.de...
> WebPack and SP1 are available but it looks like the ISE WebPack does not
> support any Virtex-5 devices at all? I did assume smallest Virtex-5 would
> be supported. what a pity.
>
> Antti
>

Oops, I posted incorrectly as a reply. Sorry.

And another oops: 5 seconds ago I claimed that I had already said sorry, but that sorry was sent to the OP only, not as a reply to my wrong posting.

Antti
Article: 105198
On Mon, 17 Jul 2006 18:04:33 GMT, "John_H" <johnhandwork@mail.com> wrote:
...
>> Ain't optimization fun?
>
>I thought through this too quickly. The first stage in the example I was
>drawing out could do 64-wide ORs with the first carry chain which is 8
>slices or 7*Tbyp, not 2*Tbyp. The second stage would be from 32 carry
>chains for 4 slices of MUXCY-based OR for 3*Tbyp, not 2*Tbyp so the timing
>would be more like 6.137 ns, still significantly better than the LUT tree.
>
>I missed the 56 elements mentioned initially; this is probably just poor
>partitioning, relying instead on a "maximum carry width" value.
>
>I'd manually partition the OR into two sets based on the 2 levels of
>carries. The generate can be used to shorthand the 32 intermediate values.
>The KEEP attribute may be what's needed in XST - I use the syn_keep=1 in the
>synplicity synthesizer. This synthesized okay but I didn't put a wrapper
>around it to get into a physiacl part (2k I/O is too much for me).
...

Thanks John and everyone else,

So far I have tried all three options. It turns out a LUT4 tree is slightly faster, at 6.26ns, than what XST comes up with (6.613ns), whereas the number of LUT4s goes from 515 to 811. John's two-level LUT4+muxcy, on the other hand, has a delay of 4.94ns at 648 LUT4s.

In terms of generating the LUT4 tree by hand, I used 5 different generate statements with keeps on the outputs, which convinces XST to give me what I wanted. By the way, the 32x64 vs 64x32 partition does not make a difference, but 64x32 is very slightly larger.
Article: 105199
"mk" <kal*@dspia.*comdelete> wrote in message news:rs5lb2ltpdofqci81sr6hnvpjraub3rduc@4ax.com...
> Thanks John and everyone else,
> So far I tried all three options. It turns out a LUT4 tree is slightly
> faster at 6.26ns than what XST comes up with (6.613ns) where as the
> number of LUT4s go from 515 to 811. John's two level LUT4+muxcy on the
> other hand has a delay of 4.94ns at 648 LUT4s.
> In terms of generating the LUT4 tree by hand, I used 5 different
> generate statements with keeps on the outputs which convinces XST to
> give me what I wanted. By the way 32x64 vs 64x32 partition does not
> make a difference but 64x32 is very slightly larger.

I would have thought the result would be 512+16+1 LUTs -- 2048/4 LUTs feeding 64 carry chains, 64/4 LUTs feeding the final carry chain, and 1 to register the carry at the top of the chain -- for 529 total, not 648.

_____

For the OR tree, rather than 5 generates you could be creative with one big wire and do one generate loop:

  (*KEEP*) wire [681:0] ORs; // 512+128+32+8+2 intermediate OR results
  wire [2729:0] XtraWideOR = {ORs,inr};

  generate
    genvar i;
    for( i=0; i<682; i=i+1)
    begin : ORtree
      // each entry ORs 4 bits taken from inr or from an earlier level
      assign ORs[i] = |XtraWideOR[i*4 +: 4];
    end
  endgenerate

  // the last two intermediate results form the final 2-input OR
  assign outw = |XtraWideOR[682*4 +: 2];