Ben Jackson wrote:
> On 2006-09-11, David Ashley <dash@nowhere.net.dont.email.me> wrote:
> > I want to create a 1152 by 6 bit rom and I want to use
> > a bram. It can be clocked or not clocked, but I'd prefer
> > not clocked. Can someone point me to a template?
>
> There's a whole PDF of them called "xst.pdf" which you can google.

I was not aware of this document. It looks very useful, but it seems to be a bit out of date. It does not mention a number of newer families including Spartan 3. I guess the information applies as appropriate depending on the feature. Will Xilinx be updating this document anytime soon?
Article: 108551
David Ashley wrote:
> John_H wrote:
>> A 2-D example using fixed length SRLs that comes to my mind is a 90 degree
>> pixel rotation.
>>
>> If you have a 16x16 array of vectors that come in in the order
>>
>> A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 Aa Ab Ac Ad Ae Af
>> B0 B1 B2 B3 B4 ...
>> C0 C1 ...
>> .
>> .
>> .
>> P0 P1 P2 P3 ...
>>
>> And want to send them back out rotated 90 degrees so the order is
>>
>> A0 B0 C0 D0 E0 F0 G0 H0 I0 J0 K0 L0 M0 N0 O0 P0
>> A1 B1 C1 D1 E1 ...
>> A2 B2 ...
>
> Just a nitpick but wouldn't this be a transpose? You'd need to
> invert in X or Y to get a 90 degree rotation.
>
> -Dave

If Sally comes into the room followed by Barbara then Sheila and finally Carol but exiting the room is four pairs of shoes followed by four nicely folded outfits followed by a basket of lingerie and finally four unclad women racing after their departed belongings, is it just a transposition? Things got very rearranged in the process.

In the example above the A values enter first followed by the B values and so on. When they exit the rotator scheme, they exit as the zero label values followed by the 1 label values and so on. The transpose is a 90 degree rotation of 16x16 blocks within a 256 element grid. To get this to run continuously with simple registers would require 384 registers. When the resource usage can be nearly quartered, isn't it something to consider?

The issue at hand was data reordering. The rotation is a simple reorder but in a way that isn't easy to parallelize at high speeds without throwing a huge number of resources at the problem when the information is available in a serial fashion.
Article: 108552
Here's a handy link to this whole thread, provided by Google: http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/6d594b2ab04beb4b/e39055a323c18cd6#e39055a323c18cd6 KJ wrote: > james..@yahoo.ca wrote: > > The "redundant" and "unused" logic terms I am copying from the mapper > > report and Xilinx documentation. The mapper report (see my > > first post) says "redundant" logic is being removed, not "unused > > logic". > >From my reading of the Xilinx manuals I understand that "unused logic" > > means logic that is not connected to anything, so it can be removed > > (this latter is not what is happening to me). > > However, I haven't found anything in the manuals that explains what > > "redundant logic" is or how to write the code to avoid it. > A simple example of the 'redundant' logic that I was asking about is > something that one might decide to put in to avoid race conditions is > the following code which implements a transparent latch (By the way, do > not implement this in real code in an FPGA). > Q <= (en and D) -- #1 > or (not(en) and Q) -- #2 > or (en and Q); --#3 > > The point is that #3 is a redundant logic term and any synthesis tool > will be able to recognize this and remove it. If you remember how to > do Karnaugh maps this example is also easy enough to see it for > yourself. If you don't know about Karnaugh maps just take my word on > it that #1 and #2 are 'logically' all you need. Term #3 is something > that you would need to put into any actual implementation because using > only #1 and #2, although they are logically complete have a race > condition when 'en' is switching. Yes, I'm familiar with Karnaugh maps and I understand the point. Remember, I am past synthesis and my problem is in the mapper, going from .NGD (Native Generic Database) to .NCD (Native Circuit Description) files. Does this redundant logic removal process you just described happen at this stage? 
Remember, the "Redundant" terminology is Xilinx's, not mine, and it is being invoked by the mapper. I am just wondering what Xilinx means by "Redundant Blocks" (sic) of logic. This terminology can be seen in the section from the mapper that I included with my first post. > > > I have a lot > > of identical ROMs that I use to do parallel processing; those were > > being removed in the synthesis and translate step due to not having > > clocks on them. > I don't doubt what you say but I also don't quite understand why ROMs > would be 'removed' either. Maybe all you meant is that is that you > couldn't find specific entities in the post-map VHDL that equated to > the various 'ROMs' that you instantiated in the original code....but > that's OK, a ROM is simply an array of constants, I would expect those > to get rolled right into the logic. I can see where targetting a > particular family might have to use logic blocks instead of embedded > memory to implement what your code says (but could use embedded memory > if you chose to implement a clocked ROM) but that doesn't mean that > that the original unclocked ROM is not synthesizable at all. The explanation I received was that without a clock, they were being interpreted as asynchronous RAMs and were optimized away. Further explanation was not given to me. That was happening at the Translate step, which was the previous step to the mapping step, and is fixed. > > > So my mind is pretty much a blank as to what is > > meant by "redundant" logic, other than the common meaning that it > > is repetitive -- but it isn't really, of course, because I'm using them > > simultaneously for different data. > 'Redundant' in this context generally means that the fitter found that > you have two equations that are logically equivalent. An example... > > d <= a or b or c; > h <= e or f or g; > .... 
> a <= e;
> b <= f;
> c <= g;
>
> The signal 'h' is redundant since it is logically equivalent to 'd'
> since, although the signals appear to be different for calculating 'h',
> from a logic perspective they are identical because of the 'a<= e....'
> assignments.

Does the mapper really do this? Is this what Xilinx means by "Redundant Blocks" of logic at the mapping stage?

Thanks again,
Best regards,
-James
Article: 108553
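The latch example quoted in this post can be checked by brute force. The sketch below is an illustration in Python rather than VHDL, and the function names are invented for it. It enumerates every input combination and compares the two-term latch against two three-term versions: one with the consensus term `D and Q`, and one with `en and Q` exactly as typed in the quoted post. Under this model only the `D and Q` version leaves the truth table unchanged; with `en and Q` the output can no longer be cleared once it is set (en=1, D=0, Q=1 gives 1), so the term as typed looks like a slip for the consensus term.

```python
from itertools import product

def latch_two_terms(en, d, q):
    # Terms #1 and #2 only.
    return (en and d) or ((not en) and q)

def latch_with_consensus(en, d, q):
    # Terms #1 and #2 plus the consensus (hazard cover) term D and Q.
    return (en and d) or ((not en) and q) or (d and q)

def latch_as_posted(en, d, q):
    # Terms #1, #2 and #3 exactly as typed in the quoted post.
    return (en and d) or ((not en) and q) or (en and q)

def equivalent(f, g):
    # Compare two 3-input Boolean functions over all 8 input combinations.
    return all(bool(f(*v)) == bool(g(*v))
               for v in product([False, True], repeat=3))

consensus_is_redundant = equivalent(latch_two_terms, latch_with_consensus)
posted_is_redundant = equivalent(latch_two_terms, latch_as_posted)
print(consensus_is_redundant, posted_is_redundant)  # True False
```

A term that leaves the truth table unchanged is exactly what a logic optimizer is entitled to drop, which is the point the quoted post is making about "redundant" logic.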
Here's a handy link to this whole thread, provided by Google: http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/6d594b2ab04beb4b/e39055a323c18cd6#e39055a323c18cd6 David Ashley wrote: > James, > > Maybe there are switches to the synthesizer that would allow > turning off the optimization? Yes, but absolutely nothing is for turning off optimization of "Redundant Blocks" (sic)* of logic; everything is for turning off removal of "Unused" logic. The mapper -u option, the "keep" constraint and the "save" constraint, are all for preventing removal of "Unused" logic, not "Redundant Blocks" (sic)* of logic. It's enough to make me tear my hair out. Anyway, as you can read from the other posts, doing that is a kludge and at best a debugging step to identify the problem area, not the real way I want to solve the problem. *See mapper report in my first post in this thread. > I would tend to agree that looking for bugs in the toolchain might > not be the best way to work through this. > > I haven't been following this thread all along, but one thing occurs > to me. I'm new to VHDL and have settled in to an approach where > I make little incremental changes, then immediately test and verify > something didn't break. That way I can go back and the source of > the problem is obvious, because there is only a little bit of code to > examine. > > In your case it's like maybe the sequence is > working, change code > working, change code > working, change code > working, change code > broken, change code <-- it broke here but you didn't discover it > broken, change code > broken, change code > broken, change code > broken <--- you're here > > It's just a theory. But I've seen this sort of thing before. The > most recent change didn't cause the problem and in fact > couldn't have caused the problem, but it's not working. > Therefore the tools must be broken. Really the problem > occured earlier... 
I think I may very well have to try that, building up my project piece by piece. > > Sorry to intrude... > -Dave Not at all. I'm grateful for your input. Best regards, -JamesArticle: 108554
John_H wrote:
> If Sally comes into the room followed by Barbara then Sheila and finally
> Carol but exiting the room is four pairs of shoes followed by four
> nicely folded outfits followed by a basket of lingerie and finally four
> unclad women racing after their departed belongings, is it just a
> transposition? Things got very rearranged in the process.

Transpose - it's a term from linear algebra, at least that's what I'm thinking of. A[i][j] becomes A[j][i] for 0 <= i < N, 0 <= j < N. It's a reflection over the 45 degree line from 0,0 to N,N.

> In the example above the A values enter first followed by the B values
> and so on. When they exit the rotator scheme, they exit as the zero
> label values followed by the 1 label values and so on. The transpose is
> a 90 degree rotation of 16x16 blocks within a 256 element grid. To get
> this to run continuously with simple registers would require 384
> registers. When the resource usage can be nearly quartered, isn't it
> something to consider?

I don't know if we're discussing the same thing. The way your data goes from input to output is a transpose, not a rotation. I'm just complaining about the terminology. BTW I'm not making this up :).

> The issue at hand was data reordering. The rotation is a simple reorder
> but in a way that isn't easy to parallelize at high speeds without
> throwing a huge number of resources at the problem when the information
> is available in a serial fashion.

I don't really follow how the circuit works. I mean, before you can output P0 you would have had to read in every single row from A to O, that's a lot of data you need to store. Perhaps on the order of a 384 element shift register?

-Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
Article: 108555
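The terminology point can be made concrete with a small grid. This is a minimal sketch in Python, using a 3x3 grid in place of the 16x16 one and helper names chosen only for the illustration: the transpose swaps indices, while a 90 degree clockwise rotation is the transpose followed by reversing each row.

```python
def transpose(m):
    # A[i][j] becomes A[j][i].
    return [list(col) for col in zip(*m)]

def rotate_cw(m):
    # 90 degrees clockwise: transpose, then reverse each row.
    return [list(col)[::-1] for col in zip(*m)]

grid = [["A0", "A1", "A2"],
        ["B0", "B1", "B2"],
        ["C0", "C1", "C2"]]

print(transpose(grid)[0])   # ['A0', 'B0', 'C0']
print(rotate_cw(grid)[0])   # ['C0', 'B0', 'A0']
```

The output order given in the quoted example (A0 B0 C0 ...) matches `transpose`; the rotation differs from it only by a reversal along one axis.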
skyworld wrote: > Hi Ray & Kolja, > thanks for your reply. The advice is very helpful, but the question is > that the code is for ASIC design and is frozen. I just migate the code > to FPGA to check its function. So do you have any suggestion on how to > setup constraints? something like DC do in ASIC design? thanks > If you are using Quartus II V6.0 Full Edition, you may want to take a look at the new TimeQuest Timing Analyzer. Assuming your ASIC had an SDC file with timing constraints (e.g. from DC Compiler), you should be able to use it if you only map the signal names (e.g. ports, pins and cells) to the Quartus names. You may want to check the following link: http://www.altera.com/support/software/quartus2/timequest/tq-spt-index.html If you don't have access to V6.0 Full Edition, you can still use the Classic Timing Analyzer and do a lot of the same constraints, only you will have to learn a different constraint format and some of the differences between the Classic Timing Analyzer and a Timing Analyzer like PrimeTime. For more help on the Classic Timing Analyzer, check: http://www.altera.com/support/software/quartus2/timing/sof-qts-timing.html Hope this helps. -David Karchmer AlteraArticle: 108556
jidan1@hotmail.com wrote: > 1) Can also the confg. dedicated pins made 5V tolerant through a serial > resistor although they are powered from 2.5V? (I calculated this an I > came to Rser=220OHM) Be cautious while working with CCLK configuration pin. Our experience: for +3.3v CMOS driver serial resistor to that pin should be no more than 100 ohm. Otherwise configuration clock CCLK doesn't work. I highly recommend to read XAPP453 "The 3.3V Configuration of Spartan-3 FPGAs": http://direct.xilinx.com/bvdocs/appnotes/xapp453.pdf It seems to me that 100 ohm required for proper "zero" level translating to CCLK pin (at least it looks so on oscilloscope). If it's true, this can lead to failure with Rser=220OHM. I recommend to use for level shifting simple gate logic, for example SN74LVC3G17.Article: 108557
David Ashley wrote:
> Transpose - it's a term from linear algebra, at least that's what I'm
> thinking of. A[i][j] becomes A[j][i] for 0 <= i < N, 0 <= j < N.
> It's a reflection over the 45 degree line from 0,0 to N,N.

My apologies. I took "just the transpose" to be along the lines of a bit swizzle. The transpose as you properly describe moves the top and bottom edges of a region to the right and left and the right and left edges to the top and bottom. Rotation does the same thing, just reflected across another axis, changing the way the SRLs are arranged.

As with a rotate, a "simple" transpose is resource intensive especially if the desire is to maintain the transpose or rotation. This is why the SRLs can significantly help out.

<snip>
>> The issue at hand was data reordering. The rotation is a simple reorder
>> but in a way that isn't easy to parallelize at high speeds without
>> throwing a huge number of resources at the problem when the information
>> is available in a serial fashion.
>
> I don't really follow how the circuit works. I mean, before you can
> output P0 you would have had to read in every single row from A to O,
> that's a lot of data you need to store. Perhaps on the order of a
> 384 element shift register?
>
> -Dave

The rotate/transpose uses an input SRL "triangle," increasing SRL delays from the earliest bit to leave per word (0-length SRL or direct connect) to the latest bit to leave (15-length SRL). The barrel shifter transposes the input SRL outputs to an output SRL triangle. The earliest bit to leave from the first word goes directly from the input to the longest output delay (15-length SRL) so it will match up with the shortest output delay (0-length SRL or direct connect) that takes the last word's earliest bit directly; the first bit from the 16th word shows up at the same time as the first bit of the first word.

The latency from the start of the 16x16 square to the start of the output is 15 clocks plus any pipeline stages (such as in the barrel shifter). When one block ejects from the mechanism, the next block loads.

A "simple" transpose or rotate that maintains the pipeline would require a large number of parallel-in, serial-out shift registers which must be implemented as discrete registers. 384 registers for the selective load/global shift approach. The same mechanism takes less than 100 LUTs to accomplish the same goal with the same speed capability.

SRLs are a win for a transpose or rotate where the function size is almost 1/4 of a more traditional approach.

- John_H
Article: 108558
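The triangle scheme described in this post can be modelled in a few lines. The following is a behavioural sketch in Python, not a description of the actual circuit: `delay_line`, `clock` and `corner_turn` are names made up for the illustration, each deque stands in for one SRL, and the lane permutation `(t - r) % n` is an assumption about how the shifter stage is wired (a rotate combined with a fixed lane reversal, which yields a transpose; a different fixed arrangement would yield a rotation). It uses 4x4 blocks instead of 16x16 to keep the output readable.

```python
from collections import deque

def delay_line(depth):
    # Fixed-length delay standing in for an SRL of the given depth.
    return deque([None] * depth)

def clock(line, value):
    # Shift one value in; return the value shifted in `depth` clocks ago.
    line.append(value)
    return line.popleft()

def corner_turn(words, n):
    # words: a stream of n-element rows, one row per clock, block after block.
    in_tri = [delay_line(c) for c in range(n)]           # lane c delayed c clocks
    out_tri = [delay_line(n - 1 - r) for r in range(n)]  # lane r delayed n-1-r clocks
    flush = [[None] * n for _ in range(n - 1)]           # push the last block out
    out = []
    for t, word in enumerate(list(words) + flush):
        skewed = [clock(in_tri[c], word[c]) for c in range(n)]
        # Time-varying lane permutation (the shifter stage, assumed wiring).
        routed = [skewed[(t - r) % n] for r in range(n)]
        out.append([clock(out_tri[r], routed[r]) for r in range(n)])
    return out[n - 1:]  # drop the n-1 clocks of latency

block1 = [[f"{row}{col}" for col in range(4)] for row in "ABCD"]
block2 = [[f"{row}{col}" for col in range(4)] for row in "EFGH"]
turned = corner_turn(block1 + block2, 4)
print(turned[0])  # ['A0', 'B0', 'C0', 'D0']
print(turned[4])  # ['E0', 'F0', 'G0', 'H0']
```

In this model the latency is n-1 clocks (15 for n = 16), matching the figure in the post, and a second block streams in directly behind the first without any gap.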
Another example is the LP2996 which we use on our development boards. John Adair Enterpoint Ltd. Jim Granville wrote: > Austin Lesea wrote: > > > Jim, > > > > DDR regulator? I must have missed this new term. > > > > Do you have an example?| > > Sure, Go to Linear or Maxim's web sites, and search for DDR regulator. > These target the Vtt terminations on DDR memory busses, and they can > source and sink current. > > -jgArticle: 108559
John_H wrote: > SRLs are a win for a transpose or rotate where the function size is > almost 1/4 of a more traditional approach. I've been seeing this "SRL" term used a lot, what does it mean? :) I just did a google search for "srl xilinx" and got some useful info, and so I created a Wikipedia page on it since one didn't seem to exist. http://en.wikipedia.org/wiki/Shift_Register_Look_Up_Table Anyone reading this feel free to expand on it. -Dave -- David Ashley http://www.xdr.com/dash Embedded linux, device drivers, system architectureArticle: 108560
Hi everyone! I have a uncertainty concerning clock source using the Virtex 4 RocketIOs. I would like to use them in reduced latency mode "Full PCS Bypass". In the user guide it says for this mode the RXUSRCLK has to be derived internally from RXUSRCLK2 (clocking the interface to the fabric) through internal dividers. In the 4 byte interface mode I plan to use the ratio between RXUSRCLK and RXUSRCLK2 is 1:1 though. This makes me wonder, if it is possible, to source RXUSRCLK2 externally (this isn't explicitly stated anywhere in the guide). This is an absolute must for my design as there is no way to recover a clock from the incoming data signal. Maybe someone can help me clarify this problem. Cheers, MichaelArticle: 108561
Antti schrieb: > John_H schrieb: > > > Antti wrote: > > > New low cost families other than from Xilinx are known to be coming > > > this autumn (Cyclone-3, MAX3, LatticeXP2) but there is no advance info > > > an Spartan-4 yet, is there a hope at all that there will be modern low > > > cost family from Xilinx too? > > > > > > Spartan-3 is 'not for new designs' as there is no price roadmap for it, > > > Spartan-3E only has small members, eg not replacement for S3 > > > > > > so we have vacuum in the place of Spartan-4 ! > > > > > > I wonder if that vacuum will be filled with Cyclone-3 or is Spartan-4 > > > coming this autumn? > > > > > > Antti > > > > I don't know about positioning but > > > > http://tinyurl.com/fvup6 > > > > (or > > http://www.xilinx.com/xlnx/xil_ans_display.jsp?iLanguageID=1&iCountryID=1&getPagePath=23856) > > Thanks John, > its only visible from 8.2 SP2 > > but here is quick link to Spartan3A libraries guide :) > http://toolbox.xilinx.com/docsan/xilinx82/books/docs/s3adl/s3adl.pdf > > it looks like the ICAP and therefore self-reconfiguration is added to > Spartan-3A, but it still lacks the SPI or NOR flash configuration that > is available on S3e. > > those the Spartan3A may be the replacement part for Spartan-3 meaning > that Spartan-4 is possible even further away from being available, I > was > expecting Spartan-4 announcement of prelim info in 6 months from now, > but guess we have to wait more. > > actually no, the Spartan3e is partial downgrade from s3e ? > only xc3s50a, xc3s200a, xc3s400a, xc3s700a, xc3s1400a > so no large Spartan3a devices either :( > > is the Spartan-3 family really the last big low cost Xilinx FPGA ?? > just a small correction: Spartan-3A do support SPI and NOR flash configuration modes, and there are some power saving features added: SUSPEND/AWAKE pins! I wonder why is the Spartan-3A suppport already in the ISE when there is no public information available, specially the power saving features could be interesting! 
AnttiArticle: 108562
In the company I'm working for we're using DSPs and FPGAs to develop motion controllers. Since we're doing all the position, speed, torque and current control stuff in software, we're in need of powerful devices. We're considering to get rid of the DSP and change to a SoC design in future. Therefore I would be interested if anyone can suggest an affordable SoC development board with an ambedded (digital signal) processor to evaluate the possiblities of SoC. I'm especially interested in Altera boards as we're already using them in our old designs. What FPGA should we choose? Cyclone, Cyclone II, Stratix? What embedded processor fits best to motion control needs? NIOS, ARM, PowerPC? I'm looking forward to your suggestions. TIA. Markus -- Markus Fuchs - http://www.yeahware.comArticle: 108563
John Williams schrieb: > Antti wrote: > > > a binary demo for Linux isnt much interesting or useful - everybody is > > waiting when does PetaLogix finally release the PetaLinux, but so far > > there has been to release date information announced by PetaLogix? Can > > we assume that the actual PetaLinux release date is coming closer also, > > or is PetaLogix still holding back information about possible release > > date? > > Antti, a binary demo may not be useful to you, but it's obviously of some value > to the people who've been asking for it, and the numerous people who've > downloaded it in the just last 12 hours. > > One reason binary demos are valuable is for people who want a quick evaluation, > a proof of concept, or even just be able to show their boss that indeed Linux on > an FPGA works, and might make sense for their project. They don't want to take > the time to learn how to build it themselves, they just want the "5 minute > demo", and that's what this is about. > > One of the goals of PetaLinux, of course, is to enable people to build the > 5-minute demo themselves, as well as provide an environment for major FPGA-based > embedded Linux development. This is a lofty goal, which is why it's taking a > while to get it developed and documented to a state where we are happy to > release it. > > > I see the 'binary demo' is still based on EDK 8.1 tools - so it can not > > fully support the MicroBlaze version 5, I wonder why hasnt PetaLogix > > used EDK 8.2 tools? To what I know PetaLogix had early access to EDK > > 8.2 (and GNU code?) and those would have been in the position to use > > the latest GCC toolchain. > > Yes, we do have early access to the 8.2 tools, and indeed the current demo is > based on 8.1. Why? Because as a small organisation operating out of a > university research group we have limited resources which we must manage > carefully. > > We also have paying clients who expect us to deliver what we have promised. 
If > this means that PetaLinux, and other nice-to-have features that we will be > giving aware free to the community (including you) must sometimes take a > back-seat, then I can make no apologies for that fact. > > Regards, > > John Hi John, ok, I see while you are busy with paying clients you have no longer interest to work on mb-uclinux improvement. Good point. That is possible the reason why Xilinx did choose lynuxworks to deliver the microblaze 2.6.x port. For those who want to use GPL tools to compile MicroBlaze 5.0 applications on WinXP platform here is the cygwin compiled MicroBlaze toolchain from EDK 8.2 release. http://www.xilant.com/downloads/mb_gnu_8_2.zip I have only tested it to succesfully compile MicroBlaze u-boot, so the toolchain is working at least. Antti LukatsArticle: 108564
<james7uw@yahoo.ca> wrote in message news:1158116145.025583.289450@h48g2000cwc.googlegroups.com... >> >> > I have a lot >> > of identical ROMs that I use to do parallel processing; those were >> > being removed in the synthesis and translate step due to not having >> > clocks on them. >> I don't doubt what you say but I also don't quite understand why ROMs >> would be 'removed' either. Maybe all you meant is that is that you >> couldn't find specific entities in the post-map VHDL that equated to >> the various 'ROMs' that you instantiated in the original code....but >> that's OK, a ROM is simply an array of constants, I would expect those >> to get rolled right into the logic. I can see where targetting a >> particular family might have to use logic blocks instead of embedded >> memory to implement what your code says (but could use embedded memory >> if you chose to implement a clocked ROM) but that doesn't mean that >> that the original unclocked ROM is not synthesizable at all. > > The explanation I received was that without a clock, they > were being interpreted as asynchronous RAMs and were > optimized away. Well whatever is 'optomizing' them away has a bug in it then if the output is now 'different' because of that optomization. Like I said, an asynch ROM is simply a table of constants. Synthesis tools are very good at optomizing constants (as they should be). It wouldn't surprise me at all that... - You wouldn't be able to 'find' the ROM after mapping to a particular part because the result of those constants has been integrated into whatever downstream logic that the ROM was feeding. - That the implementation might (probably) use more logic resources and none of the internal memory if the targetted part requires a clock in order to be able to map it into one of those internal memories. In any case, the overall function has not changed it should simulate the same. If not, then a simple test case and a service request to Xilinx might be in order. 
> Further explanation was not given to me.
> That was happening at the Translate step,
> which was the previous step to the mapping step, and is fixed.

Not sure I would call it 'fixed' (unless what was 'broken' was just the ability to use internal memory which as mentioned above is not really a functional issue but one of trying to properly use internal resources to implement a given function). Anyway, moving on.

>>
>> 'Redundant' in this context generally means that the fitter found that
>> you have two equations that are logically equivalent. An example...
>>
>> d <= a or b or c;
>> h <= e or f or g;
>> ....
>> a <= e;
>> b <= f;
>> c <= g;
>>
>> The signal 'h' is redundant since it is logically equivalent to 'd'
>> since, although the signals appear to be different for calculating 'h',
>> from a logic perspective they are identical because of the 'a<= e....'
>> assignments.
>
> Does the mapper really do this?

Yes, as it should. Remember, 'logic' does not care about propagation delays, and from the standpoint of transforming the source code into an implementation these things can legally be combined as redundant since they (in this case 'd' and 'h') perform exactly the same function. You wouldn't be able to tell from the outside which is 'd' and which is 'h' by wiggling the inputs 'e', 'f' or 'g'.

Another simple example is

  x <= not(y0);
  y0 <= not(y1);
  y1 <= not(y2);
  y2 <= not(y);

which is equivalent to

  x <= not(not(not(not(y))));

which is equivalent to

  x <= y;

which when implemented in an FPGA would not even use a single logic resource. Whatever logic in the original source that needed 'x' or 'y' would get the same signal.

> Is this what Xilinx means by
> "Redundant Blocks" of logic at the mapping stage?

I believe so, but haven't had the need to dig any deeper.
Article: 108565
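Both examples in this post can be confirmed by exhaustive evaluation. The sketch below is an illustration in Python with function names of its own choosing, not something a mapper actually runs. It evaluates 'd' and 'h' for every input combination and pushes both values of 'y' through the four inverters.

```python
from itertools import product

def d_and_h(e, f, g):
    a, b, c = e, f, g      # a <= e; b <= f; c <= g;
    d = a or b or c        # d <= a or b or c;
    h = e or f or g        # h <= e or f or g;
    return d, h

def x_from_chain(y):
    y2 = not y             # y2 <= not(y);
    y1 = not y2            # y1 <= not(y2);
    y0 = not y1            # y0 <= not(y1);
    return not y0          # x  <= not(y0);

same_dh = all(d == h for d, h in
              (d_and_h(*v) for v in product([False, True], repeat=3)))
chain_is_wire = all(x_from_chain(y) == y for y in (False, True))
print(same_dh, chain_is_wire)  # True True
```

Because 'd' and 'h' agree on all eight input combinations, keeping only one of them changes nothing observable, and the inverter chain collapses to a plain wire.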
David Ashley wrote: > John_H wrote: > > SRLs are a win for a transpose or rotate where the function size is > > almost 1/4 of a more traditional approach. > > I've been seeing this "SRL" term used a lot, what does it mean? > > :) I just did a google search for "srl xilinx" and got some useful > info, and so I created a Wikipedia page on it since one didn't seem > to exist. > > http://en.wikipedia.org/wiki/Shift_Register_Look_Up_Table > > Anyone reading this feel free to expand on it. > > -Dave How do you get the wikipedia corrected? I clicked the link on the SRL page to the FPGA page and then on through to the partial re-configuration page at... http://en.wikipedia.org/wiki/Partial_re-configuration This page says "In current versions of software, Xilinx supports partial reconfiguration on Spartan 3...". I am pretty certain that this is not supported in Spartan 3. I have requested that this be supported in Spartan 3 since they came out and I still have not seen it appear. Am I wrong, or is the wikipedia wrong? 
From: "John Adair" <removethisthenleavejea@replacewithcompanyname.co.uk>
Subject: Re: SoC Development Board
Date: Wed, 13 Sep 2006 11:18:19 +0100

Markus

Not Altera, as yet, but we have a range of Xilinx based boards with a high degree of flexibility and ability to add custom things. Have look here http://www.enterpoint.co.uk/boardproducts.html and see if anything there takes your interest.

Commercial bit - If you don't find what you need we specialise in doing derivatives quickly too. The best example, in the public domain of what we do in fast turn, is our MINI-CAN board that had a design cycle, manufacture, 5 days of bench test and boards delivered to the customer in 18 calendar days.

John Adair
Enterpoint Ltd. - Home of Broaddown2 with added Virtex-4 Solution.
http://www.enterpoint.co.uk

"Markus Fuchs" <markus@yeahware.com> wrote in message news:ee8ji4$ctj$02$1@news.t-online.com...
> In the company I'm working for we're using DSPs and FPGAs to develop > motion controllers.
Since we're doing all the position, speed, torque and > current control stuff in software, we're in need of powerful devices. > > We're considering to get rid of the DSP and change to a SoC design in > future. Therefore I would be interested if anyone can suggest an > affordable SoC development board with an ambedded (digital signal) > processor to evaluate the possiblities of SoC. I'm especially interested > in Altera boards as we're already using them in our old designs. What FPGA > should we choose? Cyclone, Cyclone II, Stratix? What embedded processor > fits best to motion control needs? NIOS, ARM, PowerPC? > > I'm looking forward to your suggestions. TIA. > > Markus > > -- > Markus Fuchs - http://www.yeahware.com >Article: 108566
Hi,

after developing the connection of my custom peripheral to the OPB bus, I tried to download my design to the FPGA. The custom design including the bus needs 87% of my FPGA (Virtex2Pro30 896-7). The device summary is as follows:

===========================================================
Device utilization summary:
---------------------------
Selected Device : 2vp30ff896-7

 Number of Slices:            12131 out of 13696   88%
 Number of Slice Flip Flops:  14472 out of 27392   52%
 Number of 4 input LUTs:      15191 out of 27392   55%
 Number of IOs:                 109
 Number of bonded IOBs:          96 out of   556   17%
 Number of BRAMs:                78 out of   136   57%
 Number of MULT18X18s:          136 out of   136  100%
 Number of GCLKs:                 1 out of    16    6%
===========================================================

In the next step, I tried to download the design. In my first try, the system consisted of the following parts: In my last try, it consists of the following parts:

http://www.student.uni-oldenburg.de/peter.kampmann/deuI.jpg
http://www.student.uni-oldenburg.de/peter.kampmann/report_DEUI.pdf

The synthesis of the design aborts with the following message:

ERROR:Place:665 - The design has 106 block-RAM components of which 4 block-RAM components require the adjacent multiplier site to remain empty. This is because certain input pins of adjacent block-RAM and multiplier sites share routing resources. In addition, the design has 136 multiplier components. Therefore, the design would require a total of 140 multiplier sites on the device. The current device has only 136 multiplier sites.

After that, I removed the RS232 from the design and tried again. Finally, I moved all components from the PLB to the OPB bus, where possible, which gives:

http://www.student.uni-oldenburg.de/peter.kampmann/deuIII.jpg
http://www.student.uni-oldenburg.de/peter.kampmann/report_DEU.pdf

And the following error:

ERROR:Place:665 - The design has 84 block-RAM components of which 2 block-RAM components require the adjacent multiplier site to remain empty. This is because certain input pins of adjacent block-RAM and multiplier sites share routing resources. In addition, the design has 136 multiplier components. Therefore, the design would require a total of 138 multiplier sites on the device. The current device has only 136 multiplier sites.

Has anybody experienced the same problem? Does anyone have a solution for that, without building a smaller design? The FPGA has 136 multipliers and 136 Block RAMs, does that mean you cannot use all multipliers when you design a complete system with PowerPCs etc?
Article: 108567
Peter Kampmann wrote:
> Hi,
>
> after Developing connection my custom peripheral to the OPB-Bus, I
> tried to download my Design to the FPGA.
> The custom Design including the Bus needs 87% of my FPGA (Virtex2Pro30
> 896-7).
> The device summary is as follows:

> Has anybody experienced the same problem? Does anyone have a solution
> for that, without building a smaller design? The FPGA has 136
> multipliers and 136 Block RAMs, does that mean you cannot use all
> multipliers when you design a complete system with PowerPCs etc?

If your design uses 100% of the multipliers and some other IP requires a BRAM placement that needs the adjacent multiplier site to stay empty, then it won't fit.

Maybe there is a way to relax the placement with some trick. Try to create a design where you are using 0 BRAMs and 100% of the multipliers, and see if that gets mapped without problems.

I think if you are not using OCM BRAMs, the PPC design should not require any BRAMs at all, so all multipliers should be usable. Of course, if that is not the case and the use of the PPC instantly reduces the number of usable multipliers, then this should be documented by Xilinx somehow.

Antti
Article: 108568
Markus Fuchs wrote:
> In the company I'm working for we're using DSPs and FPGAs to develop motion
> controllers. Since we're doing all the position, speed, torque and current
> control stuff in software, we're in need of powerful devices.
>
> We're considering getting rid of the DSP and changing to a SoC design in
> future. Therefore I would be interested if anyone can suggest an affordable
> SoC development board with an embedded (digital signal) processor to
> evaluate the possibilities of SoC. I'm especially interested in Altera boards
> as we're already using them in our old designs. What FPGA should we choose?
> Cyclone, Cyclone II, Stratix? What embedded processor fits best to motion
> control needs? NIOS, ARM, PowerPC?
>
> I'm looking forward to your suggestions. TIA.

I think you are asking about moving the DSP into the FPGA - but we have no info on important details like:

** Code size and data size of the present DSP
** Speed and numeric ability of the present DSP
** Does that DSP have FLASH or RAM?
** Does this have instant-on or watchdog requirements?
** Do you expect to execute code from FPGA BRAM, from off-chip SRAM, or from serial FLASH?
** ADCs and other non-digital peripherals included on the present DSP
** Mix of core motion SW vs human interface code?
** PCB layer count you expect
** Design lifetime of the result

32-bit uCs are now quite widespread and cheap, so a possible split of design resources would be to pull the DSP motion code into the FPGA (where it runs in BRAM) and move the human and interface code into a Flash uC.

-jg
Article: 108569
karunesh.ind@gmail.com wrote:
> i am preparing for an interview and i want the answer how a Barrel shifter
> can be used to optimize C code on an ARM processor.
>
> i have learnt that a Barrel shifter is: A digital circuit that can shift
> a data word by any number of bits in a single cycle. It is implemented
> as a sequence of multiplexors: the output of one MUX is connected to
> the input of the next MUX in a way that depends on the shift distance.
> The number of multiplexors required is log2(n), where n is the
> computer's register size.
>

There are many uses.

The most basic application could be merging multiple data bit fields into one word. For example, suppose we need to put value A in the upper half word of a register and value B in the lower half word. If we know that both values are less than 16 bits wide and unsigned, we can write

    C = (A << 16) + B;

It could then be compiled into

    ADD Rc, Rb, Ra, LSL #16

Only one instruction is needed for the add and the shift.

The second common usage is when accessing an array. For example, A is set to the array base address, B is the array index, and the array elements are words. When reading an element from the array, we can then use

    LDR Rc, [Ra, Rb, LSL #2]

Only one instruction is needed for the memory read, the address addition, and the shift by 2 (word size).

For questions about ARM processors, the newsgroups
 - comp.sys.arm
 - comp.arch.embedded
are more suitable.

Joseph
Article: 108570
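The log2(n) multiplexer structure from the quoted question, and the two uses above, can be modelled in a few lines. This Python sketch is only an illustration under stated assumptions: a 32-bit register width, immediate shift distances of 0 to 31, and made-up helper names (`barrel_shift_left`, `pack_halfwords`, `element_address`). Each loop iteration stands for one MUX stage that either passes the word through or shifts it by a power of two.

```python
WIDTH = 32                 # assumed register width
MASK = (1 << WIDTH) - 1    # keeps results inside one register


def barrel_shift_left(value, distance):
    """Logical shift left built from log2(WIDTH) mux stages."""
    if not 0 <= distance < WIDTH:
        raise ValueError("shift distance must be in 0..WIDTH-1")
    stages = WIDTH.bit_length() - 1          # 5 stages for a 32-bit word
    for stage in range(stages):
        # Stage k shifts by 2**k when bit k of the distance is set.
        if (distance >> stage) & 1:
            value = (value << (1 << stage)) & MASK
    return value


def pack_halfwords(a, b):
    """Model of C = (A << 16) + B for unsigned inputs below 2**16."""
    if not (0 <= a < (1 << 16) and 0 <= b < (1 << 16)):
        raise ValueError("both inputs must fit in 16 unsigned bits")
    return barrel_shift_left(a, 16) + b


def element_address(base, index):
    """Address formed by LDR Rc, [Ra, Rb, LSL #2] for word-sized elements."""
    return (base + barrel_shift_left(index, 2)) & MASK


print(hex(pack_halfwords(0x1234, 0xABCD)))
print(hex(element_address(0x1000, 3)))
```

The point of the model is that the shift costs no extra step: it is folded into the same operation as the add or the address calculation, which is what the single ADD and LDR instructions above express.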
On 12 Sep 2006 08:34:10 -0700, james7uw@yahoo.ca wrote:

>> What you need to do is to simulate the post-map VHDL file and trace it back
>> to why output signal 'x' at time t is set to '0' but when you use your
>> original code it is '1'. Use the sim results from using your original code
>> as your guide for what 'should' happen and the post-map VHDL simulation for
>> what is actually happening and debug the problem.
>
>I agree that finding out what is going on is the best
>approach. Do you have any debugging tips other than comparing
>the simulation results in detail and seeing what logic calculations
>must be getting removed?

One tip: instantiate both the behavioural and the post-map modules in your testbench, and run them in parallel. You can assert on differences in the outputs, and trace internal signals in the wave window (to the extent that you can still recognize internal signals).

Possibly also set breakpoints on differences in internal signals which ought to be the same.

- Brian
Article: 108571
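The idea of running both versions side by side and stopping at the first divergence can be sketched outside the simulator as well. The Python below is a toy model, not VHDL: `behavioural_model` and `post_map_model` are hypothetical stand-ins for the two instantiated modules, and the second one contains a deliberately planted fault so that the comparison has something to find.

```python
def behavioural_model(a, b):
    """Hypothetical reference design: output is a AND (NOT b)."""
    return a & (1 - b)


def post_map_model(a, b):
    """Hypothetical faulty netlist: the inverter on b has gone missing."""
    return a & b


def first_mismatch(reference, candidate, stimulus):
    """Apply the same stimulus to both models, step by step.

    Returns (time, inputs, expected, got) for the first differing output,
    or None if the outputs agree for the whole stimulus.
    """
    for t, inputs in enumerate(stimulus):
        expected = reference(*inputs)
        got = candidate(*inputs)
        if expected != got:
            return t, inputs, expected, got
    return None


stimulus = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(first_mismatch(behavioural_model, post_map_model, stimulus))
```

In a real testbench the same comparison would be an assertion on the two sets of output signals; knowing the first time step at which they differ narrows down which internal signals to trace.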
On 12 Sep 2006 07:45:19 -0700, "kits59@gmail.com" <kits59@gmail.com> wrote:

>
>Brian Drummond wrote:
>> On 11 Sep 2006 12:58:57 -0700, "kits59@gmail.com" <kits59@gmail.com>
>> wrote:
>>
>> Something else must be the problem : check clocks, resets, is your bRAM
>> mapped to cover the boot address (FFFFFFFC for the PPC405), there are no
>> "Warning: unbound component" messages when ModelSim loads the design
>> etc?
>>
>> - Brian
>
>The funny thing that I should probably state is that the entire system
>was simulating perfectly fine under EDK 7.2. The bRAM is mapped
>correctly so that the starting values are where they expect them to be.

Ah. I haven't tried porting a design to 8.x from 7.x yet, but had bad experiences running 6.x projects under 7.1.

Did you use the "import earlier version" tool when moving to 8.1? Did it ask you if you wanted to upgrade any cores, or report that it had to, because some of the earlier cores were no longer supported?

Sometimes the updated cores are incompatible with the earlier ones, and the innocuous "update" breaks the design in hard-to-find ways.

I confess I never DID get to the bottom of one such "upgrade" on a demonstration app; I didn't find out exactly how the upgrade had broken it - it was easier to simply ask the vendor to supply a 7.1 version!

If I HAD to fix it, I'd try using the original cores, or (if not possible) I'd check the logs of the first build in 8.1, and check all port connections, and see if the register definitions (and address maps) had changed between versions...

I wonder if Xilinx have fixed the "import from earlier versions" problems with EDK 8.1? Anyone have experience of this transition?

- Brian
Article: 108572
I have a Memec mini-module board. I've loaded and run the reference design and I am able to boot all the way into the Linux prompt. However, my design doesn't need Ethernet, so I deleted it. Now it doesn't even run the code in the initial BRAM successfully. I've traced the code and it seems to end up at the _exit crt function. (BTW, is there a way to get the C symbols in XMD?)

Any ideas why removing the Ethernet would clobber the system to the point that even the UART doesn't work?

Thanks,
Clark
Article: 108573
David Ashley wrote:
> John_H wrote:
>
> I don't really follow how the circuit works. I mean, before you can
> output P0 you would have had to read in every single row from A to O,
> that's a lot of data you need to store. Perhaps on the order of a
> 384 element shift register?
>
> -Dave
>

Yes, but since the input is serial and we only take one output at a time, the SRL16s let us collapse the shift register into LUT resources, giving a 16:1 savings. Since the data is input in row raster form, it can naturally be done by shifting each row into a series of SRL16s. Then the readout is down the columns, so you read one sample out of each row's shift register, advancing the shift register after each read.
Article: 108574
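The write-by-row, read-by-column scheme described above can be checked with a small behavioural model. This Python sketch makes several simplifying assumptions: a `deque` stands in for each row's 16-deep shift register, one 16x16 block is written completely before it is read, and the overlap of reading one block while the next arrives (which a continuous hardware design has to handle) is ignored. The function name is made up for illustration.

```python
from collections import deque

N = 16  # block is N x N; each row is held in one 16-deep shift register


def transpose_with_row_shift_registers(samples):
    """Write N*N samples in raster order, read them back column by column."""
    if len(samples) != N * N:
        raise ValueError("expected exactly N*N samples")
    rows = [deque() for _ in range(N)]
    # Write phase: sample k belongs to row k // N and is shifted into it.
    for k, sample in enumerate(samples):
        rows[k // N].append(sample)
    # Read phase: take the oldest sample of each row in turn; removing it
    # is the "advance the shift register after each read" step.
    out = []
    for _ in range(N):
        for row in rows:
            out.append(row.popleft())
    return out


# Labels A0..Af, B0..Bf, ..., P0..Pf in arrival order.
labels = [r + c for r in "ABCDEFGHIJKLMNOP" for c in "0123456789abcdef"]
result = transpose_with_row_shift_registers(labels)
print(result[:18])
```

The first sixteen outputs are A0 through P0, followed by A1, B1 and so on, which matches the output order given earlier in the thread.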
On 13 Sep 2006 03:29:29 -0700, "Peter Kampmann" <peter.kampmann@googlemail.com> wrote:

>Hi,

>ERROR:Place:665 - The design has 106 block-RAM components of which 4
>block-RAM components require the adjacent
> multiplier site to remain empty. This is because certain input pins
>of adjacent block-RAM and multiplier sites share
> routing ressources.

The hint is here... only 4 blockRAMs require the multiplier site empty.

>Has anybody experienced the same problem? Does anyone have a solution
>for that, without building a smaller design? The FPGA has 136
>multipliers and 136 Block RAMs, does that mean you cannot use all
>multipliers when you design a complete system with PowerPCs etc?

I have seen this for Spartan-3, and didn't realise it also applied to V2Pro.

There are _some_ shared connections between a BRAM and a multiplier. Not all uses of the BRAM require those shared connections; indeed, most don't.

Specifically, using both ports at the fullest width (32 or 36 bits per port) is a problem. If you can identify a way to use deeper but narrower BRAM blocks (only 18 bits wide, for example) for four of your BRAMs, you are OK.

If not, I notice your LUT and FF usage are both below 60%. Therefore, if you can move 4 of your multipliers into LUT fabric, you are also OK. (For example, if you are using a few of the 18*18 mults as 8*8 mults, these are ideal candidates.)

    ATTRIBUTE mult_style : STRING;
    ATTRIBUTE mult_style of <mylabel_1> is "block";
    ATTRIBUTE mult_style of <mylabel_2> is "lut";

might be useful...

- Brian
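The arithmetic behind the two Place:665 reports in this thread can be written down directly. The Python sketch below only restates the numbers in the error messages; the function names are made up for illustration, and the assumption is that each block RAM flagged by the placer costs exactly one adjacent multiplier site.

```python
def multiplier_sites_needed(multipliers, blocking_brams):
    """Sites used by multipliers plus sites that must stay empty."""
    return multipliers + blocking_brams


def multipliers_to_move(multipliers, blocking_brams, sites_available):
    """Multipliers that must leave the MULT18X18 sites for the design to fit."""
    needed = multiplier_sites_needed(multipliers, blocking_brams)
    return max(0, needed - sites_available)


# The two reports quoted in the thread: 4 and then 2 blocking block RAMs,
# each time with 136 multipliers on a device that has 136 sites.
for blocking in (4, 2):
    print(blocking,
          multiplier_sites_needed(136, blocking),
          multipliers_to_move(136, blocking, 136))
```

Either narrowing the flagged block RAMs (so they stop blocking a site) or moving that many multipliers into LUT fabric brings the shortfall to zero.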