Messages from 108550

Article: 108550
Subject: Re: xilinx bram instantation template in vhdl?
From: "rickman" <gnuarm@gmail.com>
Date: 12 Sep 2006 19:31:42 -0700

Ben Jackson wrote:
> On 2006-09-11, David Ashley <dash@nowhere.net.dont.email.me> wrote:
> > I want to create a 1152 by 6 bit rom and I want to use
> > a bram. It can be clocked or not clocked, but I'd prefer
> > not clocked. Can someone point me to a template?
>
> There's a whole PDF of them called "xst.pdf" which you can google.

I was not aware of this document.  It looks very useful, but it seems
to be a bit out of date.  It does not mention a number of newer
families including Spartan 3.  I guess the information applies as
appropriate depending on the feature.

Will Xilinx be updating this document anytime soon?

Article: 108551
Subject: Re: fastest FPGA
From: John_H <newsgroup@johnhandwork.com>
Date: Wed, 13 Sep 2006 02:32:56 GMT

David Ashley wrote:
> John_H wrote:
>> A 2-D example using fixed length SRLs that comes to my mind is a 90 degree 
>> pixel rotation.
>>
>> If you have a 16x16 array of vectors that come in in the order
>>
>> A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 Aa Ab Ac Ad Ae Af
>> B0 B1 B2 B3 B4 ...
>> C0 C1 ...
>>   .
>>   .
>>   .
>> P0 P1 P2 P3 ...
>>
>> And want to send them back out rotated 90 degrees so the order is
>>
>> A0 B0 C0 D0 E0 F0 G0 H0 I0 J0 K0 L0 M0 N0 O0 P0
>> A1 B1 C1 D1 E1 ...
>> A2 B2 ...
> 
> Just a nitpick but wouldn't this be a transpose? You'd need to
> invert in X or Y to get a 90 degree rotation.
> 
> -Dave

If Sally comes into the room followed by Barbara then Sheila and finally 
Carol but exiting the room is four pairs of shoes followed by four 
nicely folded outfits followed by a basket of lingerie and finally four 
unclad women racing after their departed belongings, is it just a 
transposition?  Things got very rearranged in the process.

In the example above the A values enter first followed by the B values 
and so on.  When they exit the rotator scheme, they exit as the zero 
label values followed by the 1 label values and so on.  The transpose is 
a 90 degree rotation of 16x16 blocks within a 256 element grid.  To get 
this to run continuously with simple registers would require 384 
registers.  When the resource usage can be nearly quartered, isn't it 
something to consider?

The issue at hand was data reordering.  The rotation is a simple reorder 
but in a way that isn't easy to parallelize at high speeds without 
throwing a huge number of resources at the problem when the information 
is available in a serial fashion.

Article: 108552
Subject: Re: Xilinx ISE ver 8.2.02i is optimizing away and removing "redundant" logic - help!
From: james7uw@yahoo.ca
Date: 12 Sep 2006 19:55:45 -0700

Here's a handy link to this whole thread, provided by Google:
http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/6d594b2ab04beb4b/e39055a323c18cd6#e39055a323c18cd6

KJ wrote:
> james..@yahoo.ca wrote:
> > The "redundant" and "unused" logic terms I am copying from the mapper
> > report and Xilinx documentation. The mapper report (see my
> > first post) says "redundant" logic is being removed, not "unused
> > logic".
> > From my reading of the Xilinx manuals I understand that "unused logic"
> > means logic that is not connected to anything, so it can be removed
> > (this latter is not what is happening to me).
> > However, I haven't found anything in the manuals that explains what
> > "redundant logic" is or how to write the code to avoid it.
> A simple example of the 'redundant' logic that I was asking about,
> something one might put in to avoid race conditions, is the
> following code, which implements a transparent latch (by the way, do
> not implement this in real code in an FPGA).
> Q <= (en and D)          -- #1
>     or (not(en) and Q)    -- #2
>     or (D and Q);         -- #3
>
> The point is that #3 is a redundant logic term and any synthesis tool
> will be able to recognize this and remove it.  If you remember how to
> do Karnaugh maps, this example is also easy enough to see for
> yourself.  If you don't know about Karnaugh maps, just take my word for
> it that #1 and #2 are 'logically' all you need.  Term #3 is something
> that you would need to put into any actual implementation because
> #1 and #2 alone, although logically complete, have a race
> condition when 'en' is switching.

Yes, I'm familiar with Karnaugh maps and I understand the point.
Remember, I am past synthesis and my problem is in the mapper,
going from .NGD (Native Generic Database) to .NCD (Native Circuit
Description) files. Does this redundant logic removal process you
just described happen at this stage? Remember, the "Redundant"
terminology is Xilinx's, not mine, and it is being invoked by the
mapper. I am just wondering what Xilinx means by "Redundant
Blocks" (sic) of logic. This terminology can be seen in the section
from the mapper that I included with my first post.
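
The logical redundancy of a hazard-covering term in the latch example quoted above can be checked by brute force. The following is only a minimal Python sketch: the function names are made up for illustration, and the covering term is assumed to be the consensus of terms #1 and #2, i.e. (D and Q). It says nothing about which tool stage (synthesis or mapper) actually performs the removal.

```python
from itertools import product

def latch_min(en, d, q):
    # Terms #1 and #2 only: the logically minimal latch equation.
    return (en and d) or ((not en) and q)

def latch_covered(en, d, q):
    # Same equation plus the consensus term (d and q), which holds the
    # output steady while 'en' switches with d = q = 1.
    return (en and d) or ((not en) and q) or (d and q)

def equivalent(f, g, n_inputs):
    # Compare two Boolean functions over every input combination.
    return all(bool(f(*v)) == bool(g(*v))
               for v in product([False, True], repeat=n_inputs))

print(equivalent(latch_min, latch_covered, 3))  # True
```

Because the two functions have identical truth tables, a pure logic optimizer is free to drop the extra term, which is exactly why it has to be protected if it is wanted for glitch suppression.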

>
> > I have a lot
> > of identical ROMs that I use to do parallel processing; those were
> > being removed in the synthesis and translate step due to not having
> > clocks on them.
> I don't doubt what you say but I also don't quite understand why ROMs
> would be 'removed' either.  Maybe all you meant is that you
> couldn't find specific entities in the post-map VHDL that equated to
> the various 'ROMs' that you instantiated in the original code....but
> that's OK, a ROM is simply an array of constants, I would expect those
> to get rolled right into the logic.  I can see where targeting a
> particular family might have to use logic blocks instead of embedded
> memory to implement what your code says (but could use embedded memory
> if you chose to implement a clocked ROM) but that doesn't mean
> that the original unclocked ROM is not synthesizable at all.

The explanation I received was that without a clock, they
were being interpreted as asynchronous RAMs and were
optimized away. Further explanation was not given to me.
That was happening at the Translate step,
which was the previous step to the mapping step, and is fixed.

>
> > So my mind is pretty much a blank as to what is
> > meant by "redundant" logic, other than the common meaning that it
> > is repetitive -- but it isn't really, of course, because I'm using them
> > simultaneously for different data.
> 'Redundant' in this context generally means that the fitter found that
> you have two equations that are logically equivalent.  An example...
>
> d <= a or b or c;
> h <= e or f or g;
> ....
> a <= e;
> b <= f;
> c <= g;
>
> The signal 'h' is redundant since it is logically equivalent to 'd'
> since, although the signals appear to be different for calculating 'h',
> from a logic perspective they are identical because of the 'a<= e....'
> assignments.

Does the mapper really do this? Is this what Xilinx means by
"Redundant Blocks" of logic at the mapping stage?

Thanks again,

Best regards,
-James

Article: 108553
Subject: Re: Xilinx ISE ver 8.2.02i is optimizing away and removing "redundant" logic - help!
From: james7uw@yahoo.ca
Date: 12 Sep 2006 20:03:05 -0700

Here's a handy link to this whole thread, provided by Google:
http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/6d594b2ab04beb4b/e39055a323c18cd6#e39055a323c18cd6

David Ashley wrote:
> James,
>
> Maybe there are switches to the synthesizer that would allow
> turning off the optimization?

Yes, but absolutely nothing is for turning off optimization of
"Redundant Blocks" (sic)* of logic; everything is for turning off
removal of "Unused" logic. The mapper -u option, the "keep" constraint
and the "save" constraint, are all for preventing removal of "Unused"
logic, not "Redundant Blocks" (sic)* of logic. It's enough to make me
tear my hair out. Anyway, as you can read from the other posts,
doing that is a kludge and at best a debugging step to identify
the problem area, not the real way I want to solve the problem.

*See mapper report in my first post in this thread.

> I would tend to agree that looking for bugs in the toolchain might
> not be the best way to work through this.
>
> I haven't been following this thread all along, but one thing occurs
> to me. I'm new to VHDL and have settled in to an approach where
> I make little incremental changes, then immediately test and verify
> something didn't break. That way I can go back and the source of
> the problem is obvious, because there is only a little bit of code to
> examine.
>
> In your case it's like maybe the sequence is
> working, change code
> working, change code
> working, change code
> working, change code
> broken, change code  <-- it broke here but you didn't discover it
> broken, change code
> broken, change code
> broken, change code
> broken   <--- you're here
>
> It's just a theory. But I've seen this sort of thing before. The
> most recent change didn't cause the problem and in fact
> couldn't have caused the problem, but it's not working.
> Therefore the tools must be broken. Really the problem
> occurred earlier...

I think I may very well have to try that, building up my
project piece by piece.

>
> Sorry to intrude...
> -Dave

Not at all. I'm grateful for your input.

Best regards,
-James

Article: 108554
Subject: Re: fastest FPGA
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: Tue, 12 Sep 2006 20:29:35 -0700

John_H wrote:
> If Sally comes into the room followed by Barbara then Sheila and finally
> Carol but exiting the room is four pairs of shoes followed by four
> nicely folded outfits followed by a basket of lingerie and finally four
> unclad women racing after their departed belongings, is it just a
> transposition?  Things got very rearranged in the process.

Transpose - it's a term from linear algebra, at least that's what I'm
thinking of. A[i][j] becomes A[j][i] for 0 <= i < N, 0 <= j < N.
It's a reflection over the 45 degree line from 0,0 to N,N.
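
The difference between the two terms can be shown on a small grid. This is a minimal Python sketch with made-up helper names and a 3x3 grid standing in for the 16x16 case; it only illustrates the terminology, not any hardware.

```python
def transpose(a):
    # A[i][j] becomes A[j][i].
    return [list(col) for col in zip(*a)]

def rotate_cw(a):
    # 90 degree clockwise rotation: reverse the row order, then transpose.
    return [list(col) for col in zip(*a[::-1])]

a = [["A0", "A1", "A2"],
     ["B0", "B1", "B2"],
     ["C0", "C1", "C2"]]

print(transpose(a)[0])   # ['A0', 'B0', 'C0']
print(rotate_cw(a)[0])   # ['C0', 'B0', 'A0']
```

The rotation is the transpose with each output row reversed, which is the "invert in X or Y" step.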

> In the example above the A values enter first followed by the B values
> and so on.  When they exit the rotator scheme, they exit as the zero
> label values followed by the 1 label values and so on.  The transpose is
> a 90 degree rotation of 16x16 blocks within a 256 element grid.  To get
> this to run continuously with simple registers would require 384
> registers.  When the resource usage can be nearly quartered, isn't it
> something to consider?

I don't know if we're discussing the same thing. The way your data
goes from input to output is a transpose, not a rotation. I'm just
complaining about the terminology. BTW I'm not making this up :).

> The issue at hand was data reordering.  The rotation is a simple reorder
> but in a way that isn't easy to parallelize at high speeds without
> throwing a huge number of resources at the problem when the information
> is available in a serial fashion.

I don't really follow how the circuit works. I mean, before you can
output P0 you would have had to read in every single row from A to O,
that's a lot of data you need to store. Perhaps on the order of a
384 element shift register?

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Article: 108555
Subject: Re: FPGA timing
From: dkarchmer@gmail.com
Date: 12 Sep 2006 20:46:06 -0700

skyworld wrote:
> Hi Ray & Kolja,
> thanks for your reply. The advice is very helpful, but the question is
> that the code is for ASIC design and is frozen. I just migrate the code
> to FPGA to check its function. So do you have any suggestion on how to
> setup constraints? something like DC do in ASIC design? thanks
>

If you are using Quartus II V6.0 Full Edition, you may want to take a
look at the new TimeQuest Timing Analyzer. Assuming your ASIC had an
SDC file with timing constraints (e.g. from Design Compiler), you should be
able to use it if you only map the signal names (e.g. ports, pins and
cells) to the Quartus names. You may want to check the following link:

http://www.altera.com/support/software/quartus2/timequest/tq-spt-index.html

If you don't have access to V6.0 Full Edition, you can still use the
Classic Timing Analyzer and do a lot of the same constraints, only you
will have to learn a different constraint format and some of the
differences between the Classic Timing Analyzer and a Timing Analyzer
like PrimeTime. For more help on the Classic Timing Analyzer, check:

http://www.altera.com/support/software/quartus2/timing/sof-qts-timing.html

Hope this helps.

-David Karchmer
 Altera

Article: 108556
Subject: Re: Spartan-3: 5V -> 2.5V level shifting
From: "Serebr" <serebr77@gmail.com>
Date: 12 Sep 2006 21:08:18 -0700

jidan1@hotmail.com wrote:
> 1) Can the dedicated config. pins also be made 5V tolerant through a series
> resistor although they are powered from 2.5V? (I calculated this and I
> came to Rser=220OHM)

Be cautious when working with the CCLK configuration pin. In our experience,
with a +3.3V CMOS driver the series resistor on that pin should be no more
than 100 ohm; otherwise the configuration clock CCLK doesn't work.
I highly recommend reading XAPP453 "The 3.3V Configuration of Spartan-3
FPGAs": http://direct.xilinx.com/bvdocs/appnotes/xapp453.pdf

It seems to me that 100 ohm or less is required to get a proper "zero" level
at the CCLK pin (at least it looks so on an oscilloscope). If that is
true, Rser=220OHM can lead to failure.
For level shifting I recommend simple gate logic, for example
SN74LVC3G17.

Article: 108557
Subject: Re: fastest FPGA
From: John_H <newsgroup@johnhandwork.com>
Date: Wed, 13 Sep 2006 04:54:36 GMT

David Ashley wrote:
> Transpose - it's a term from linear algebra, at least that's what I'm
> thinking of. A[i][j] becomes A[j][i] for 0 <= i < N, 0 <= j < N.
> It's a reflection over the 45 degree line from 0,0 to N,N.

My apologies.  I took "just the transpose" to be along the lines of a 
bit swizzle.  The transpose as you properly describe moves the top and 
bottom edges of a region to the right and left and the right and left 
edges to the top and bottom.  Rotation does the same thing, just 
reflected across another axis, changing the way the SRLs are arranged.

As with a rotate, a "simple" transpose is resource intensive especially 
if the desire is to maintain the transpose or rotation.  This is why the 
SRLs can significantly help out.

<snip>

>> The issue at hand was data reordering.  The rotation is a simple reorder
>> but in a way that isn't easy to parallelize at high speeds without
>> throwing a huge number of resources at the problem when the information
>> is available in a serial fashion.
> 
> I don't really follow how the circuit works. I mean, before you can
> output P0 you would have had to read in every single row from A to O,
> that's a lot of data you need to store. Perhaps on the order of a
> 384 element shift register?
> 
> -Dave

The rotate/transpose uses an input SRL "triangle," increasing SRL delays 
from the earliest bit to leave per word (0-length SRL or direct connect) 
to the latest bit to leave (15-length SRL).  The barrel shifter 
transposes the input SRL outputs to an output SRL triangle.  The 
earliest bit to leave from the first word goes directly from the input 
to the longest output delay (15-length SRL) so it will match up with the 
shortest output delay (0-length SRL or direct connect) that takes the 
last word's earliest bit directly; the first bit from the 16th word 
shows up at the same time as the first bit of the first word.  The 
latency from the start of the 16x16 square to the start of the output is 
15 clocks plus any pipeline stages (such as in the barrel shifter). 
When one block ejects from the mechanism, the next block loads.
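
One reading of the mechanism described above (input delay triangle, rotator, output delay triangle) can be simulated in software. This is a minimal Python sketch under stated assumptions, not the actual circuit: it assumes input lane c is delayed by c clocks, the rotator steps by one position per clock, and output lane j is delayed by n-1-j clocks. The function name `corner_turn` and the deque-based delay lines are illustrative stand-ins for the SRLs and barrel shifter.

```python
from collections import deque

def corner_turn(rows, n):
    """Stream n-wide words through delay triangle -> rotator -> delay triangle."""
    # Input triangle: lane c is delayed by c clocks (0 .. n-1).
    in_delay = [deque([None] * c) for c in range(n)]
    # Output triangle: lane j is delayed by n-1-j clocks (n-1 .. 0).
    out_delay = [deque([None] * (n - 1 - j)) for j in range(n)]
    out_words = []
    for t in range(len(rows) + n - 1):          # extra clocks flush the pipeline
        word = rows[t] if t < len(rows) else [None] * n
        skewed = []
        for c in range(n):
            in_delay[c].append(word[c])
            skewed.append(in_delay[c].popleft())
        # Rotator: at clock t, output lane j takes input lane (t - j) mod n.
        rotated = [skewed[(t - j) % n] for j in range(n)]
        out = []
        for j in range(n):
            out_delay[j].append(rotated[j])
            out.append(out_delay[j].popleft())
        out_words.append(out)
    return out_words[n - 1:]                    # the first n-1 clocks are latency

n = 4
block = [[f"{chr(65 + r)}{c}" for c in range(n)] for r in range(n)]
print(corner_turn(block, n)[0])   # ['A0', 'B0', 'C0', 'D0']
```

In this model the first output word appears n-1 clocks after the first input word (15 clocks for n = 16), and blocks can be streamed back to back without gaps.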

A "simple" transpose or rotate that maintains the pipeline would require 
a large number of parallel-in, serial-out shift registers which must be 
implemented as discrete registers.  384 registers for the selective 
load/global shift approach.

The same mechanism takes less than 100 LUTs to accomplish the same goal 
with the same speed capability.

SRLs are a win for a transpose or rotate where the function size is 
almost 1/4 of a more traditional approach.

- John_H

Article: 108558
Subject: Re: Spartan-3: 5V -> 2.5V level shifting
From: "John Adair" <g1@enterpoint.co.uk>
Date: 12 Sep 2006 23:12:17 -0700

Another example is the LP2996 which we use on our development boards.

John Adair
Enterpoint Ltd.

Jim Granville wrote:
> Austin Lesea wrote:
>
> > Jim,
> >
> > DDR regulator?  I must have missed this new term.
> >
> > Do you have an example?
>
> Sure, Go to Linear or Maxim's web sites, and search for DDR regulator.
> These target the Vtt terminations on DDR memory busses, and they can
> source and sink current.
> 
> -jg

Article: 108559
Subject: Re: fastest FPGA
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: Tue, 12 Sep 2006 23:33:05 -0700

John_H wrote:
> SRLs are a win for a transpose or rotate where the function size is
> almost 1/4 of a more traditional approach.

I've been seeing this "SRL" term used a lot, what does it mean?

:) I just did a google search for "srl xilinx" and got some useful
info, and so I created a Wikipedia page on it since one didn't seem
to exist.

http://en.wikipedia.org/wiki/Shift_Register_Look_Up_Table

Anyone reading this feel free to expand on it.

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Article: 108560
Subject: Clock Source in Low Latency Mode RocketIO
From: "MNiegl" <Michael.Niegl@cern.ch>
Date: 13 Sep 2006 00:46:48 -0700

Hi everyone!

I have an uncertainty concerning the clock source when using the Virtex 4
RocketIOs. I would like to use them in the reduced latency mode "Full PCS
Bypass". The user guide says that for this mode RXUSRCLK has to be
derived internally from RXUSRCLK2 (which clocks the interface to the
fabric) through internal dividers. In the 4 byte interface mode I plan
to use, though, the ratio between RXUSRCLK and RXUSRCLK2 is 1:1. This
makes me wonder whether it is possible to source RXUSRCLK2 externally
(this isn't explicitly stated anywhere in the guide). This is an
absolute must for my design, as there is no way to recover a clock from
the incoming data signal.

Maybe someone can help me clarify this problem.

Cheers,
Michael

Article: 108561
Subject: Re: Spartan-4 ?
From: "Antti" <Antti.Lukats@xilant.com>
Date: 13 Sep 2006 02:32:42 -0700

Antti schrieb:

> John_H schrieb:
>
> > Antti wrote:
> > > New low cost families other than from Xilinx are known to be coming
> > > this autumn (Cyclone-3, MAX3, LatticeXP2) but there is no advance info
> > > on Spartan-4 yet; is there any hope at all that there will be a modern low
> > > cost family from Xilinx too?
> > >
> > > Spartan-3 is 'not for new designs' as there is no price roadmap for it,
> > > Spartan-3E only has small members, eg not replacement for S3
> > >
> > > so we have vacuum in the place of Spartan-4 !
> > >
> > > I wonder if that vacuum will be filled with Cyclone-3 or is Spartan-4
> > > coming this autumn?
> > >
> > > Antti
> >
> > I don't know about positioning but
> >
> > http://tinyurl.com/fvup6
> >
> > (or
> > http://www.xilinx.com/xlnx/xil_ans_display.jsp?iLanguageID=1&iCountryID=1&getPagePath=23856)
>
> Thanks John,
> its only visible from 8.2 SP2
>
> but here is quick link to Spartan3A libraries guide :)
> http://toolbox.xilinx.com/docsan/xilinx82/books/docs/s3adl/s3adl.pdf
>
> it looks like the ICAP and therefore self-reconfiguration is added to
> Spartan-3A, but it still lacks the SPI or NOR flash configuration that
> is available on S3e.
>
> so the Spartan3A may be the replacement part for Spartan-3, meaning
> that Spartan-4 is possibly even further away from being available. I
> was expecting a Spartan-4 announcement of prelim info in 6 months from now,
> but guess we have to wait more.
>
> actually no, the Spartan3a is a partial downgrade from s3e ?
> only xc3s50a, xc3s200a, xc3s400a, xc3s700a, xc3s1400a
> so no large Spartan3a devices either :(
>
> is the Spartan-3 family really the last big low cost Xilinx FPGA ??
>

just a small correction: Spartan-3A does support SPI and NOR flash
configuration modes, and there are some power saving features added:
SUSPEND/AWAKE pins!

I wonder why the Spartan-3A support is already in the ISE when there
is no public information available; especially the power saving features
could be interesting!

Antti

Article: 108562
Subject: SoC Development Board
From: "Markus Fuchs" <markus@yeahware.com>
Date: Wed, 13 Sep 2006 11:37:59 +0200

In the company I'm working for we're using DSPs and FPGAs to develop motion 
controllers. Since we're doing all the position, speed, torque and current 
control stuff in software, we're in need of powerful devices.

We're considering getting rid of the DSP and changing to a SoC design in 
future. Therefore I would be interested if anyone can suggest an affordable 
SoC development board with an embedded (digital signal) processor to 
evaluate the possibilities of SoC. I'm especially interested in Altera boards 
as we're already using them in our old designs. What FPGA should we choose? 
Cyclone, Cyclone II, Stratix? What embedded processor fits best to motion 
control needs? NIOS, ARM, PowerPC?

I'm looking forward to your suggestions. TIA.

Markus

-- 
Markus Fuchs - http://www.yeahware.com

Article: 108563
Subject: Re: uclinux on spartan-3e starter kit
From: "Antti" <Antti.Lukats@xilant.com>
Date: 13 Sep 2006 02:43:05 -0700


John Williams schrieb:

> Antti wrote:
>
> > a binary demo for Linux isn't very interesting or useful - everybody is
> > waiting for PetaLogix to finally release PetaLinux, but so far
> > no release date information has been announced by PetaLogix. Can
> > we assume that the actual PetaLinux release date is coming closer also,
> > or is PetaLogix still holding back information about a possible release
> > date?
>
> Antti, a binary demo may not be useful to you, but it's obviously of some value
> to the people who've been asking for it, and the numerous people who've
> downloaded it in just the last 12 hours.
>
> One reason binary demos are valuable is for people who want a quick evaluation,
> a proof of concept, or even just be able to show their boss that indeed Linux on
> an FPGA works, and might make sense for their project.  They don't want to take
> the time to learn how to build it themselves, they just want the "5 minute
> demo", and that's what this is about.
>
> One of the goals of PetaLinux, of course, is to enable people to build the
> 5-minute demo themselves, as well as provide an environment for major FPGA-based
> embedded Linux development. This is a lofty goal, which is why it's taking a
> while to get it developed and documented to a state where we are happy to
> release it.
>
> > I see the 'binary demo' is still based on EDK 8.1 tools - so it can not
> > fully support the MicroBlaze version 5, I wonder why PetaLogix hasn't
> > used EDK 8.2 tools? As far as I know PetaLogix had early access to EDK
> > 8.2 (and GNU code?) and so would have been in a position to use
> > the latest GCC toolchain.
>
> Yes, we do have early access to the 8.2 tools, and indeed the current demo is
> based on 8.1.  Why?  Because as a small organisation operating out of a
> university research group we have limited resources which we must manage
> carefully.
>
> We also have paying clients who expect us to deliver what we have promised.  If
> this means that PetaLinux, and other nice-to-have features that we will be
> giving away free to the community (including you) must sometimes take a
> back-seat, then I can make no apologies for that fact.
>
> Regards,
>
> John

Hi John,

ok, I see that while you are busy with paying clients you no longer have
interest in working on mb-uclinux improvement. Good point. That is
possibly the reason why Xilinx chose LynuxWorks to deliver the
MicroBlaze 2.6.x port.

For those who want to use GPL tools to compile MicroBlaze 5.0
applications on WinXP platform here is the cygwin compiled MicroBlaze
toolchain from EDK 8.2 release.

http://www.xilant.com/downloads/mb_gnu_8_2.zip

I have only tested it to successfully compile MicroBlaze u-boot, so the
toolchain is working at least.

Antti Lukats

Article: 108564
Subject: Re: Xilinx ISE ver 8.2.02i is optimizing away and removing "redundant" logic - help!
From: "KJ" <kkjennings@sbcglobal.net>
Date: Wed, 13 Sep 2006 09:56:02 GMT


<james7uw@yahoo.ca> wrote in message 
news:1158116145.025583.289450@h48g2000cwc.googlegroups.com...
>>
>> > I have a lot
>> > of identical ROMs that I use to do parallel processing; those were
>> > being removed in the synthesis and translate step due to not having
>> > clocks on them.
>> I don't doubt what you say but I also don't quite understand why ROMs
>> would be 'removed' either.  Maybe all you meant is that you
>> couldn't find specific entities in the post-map VHDL that equated to
>> the various 'ROMs' that you instantiated in the original code....but
>> that's OK, a ROM is simply an array of constants, I would expect those
>> to get rolled right into the logic.  I can see where targeting a
>> particular family might have to use logic blocks instead of embedded
>> memory to implement what your code says (but could use embedded memory
>> if you chose to implement a clocked ROM) but that doesn't mean
>> that the original unclocked ROM is not synthesizable at all.
>
> The explanation I received was that without a clock, they
> were being interpreted as asynchronous RAMs and were
> optimized away.
Well, whatever is 'optimizing' them away has a bug in it then if the output 
is now 'different' because of that optimization.  Like I said, an asynch ROM 
is simply a table of constants.  Synthesis tools are very good at optimizing 
constants (as they should be).  It wouldn't surprise me at all that...
- You wouldn't be able to 'find' the ROM after mapping to a particular part 
because the result of those constants has been integrated into whatever 
downstream logic the ROM was feeding.
- The implementation might (probably would) use more logic resources and none 
of the internal memory if the targeted part requires a clock in order to be 
able to map it into one of those internal memories.

In any case, the overall function has not changed; it should simulate the 
same.  If not, then a simple test case and a service request to Xilinx might 
be in order.

> Further explanation was not given to me.
> That was happening at the Translate step,
> which was the previous step to the mapping step, and is fixed.
Not sure I would call it 'fixed' (unless what was 'broken' was just the 
ability to use internal memory which as mentioned above is not really a 
functional issue but one of trying to properly use internal resources to 
implement a given function).  Any way, moving on.

>>
>> 'Redundant' in this context generally means that the fitter found that
>> you have two equations that are logically equivalent.  An example...
>>
>> d <= a or b or c;
>> h <= e or f or g;
>> ....
>> a <= e;
>> b <= f;
>> c <= g;
>>
>> The signal 'h' is redundant since it is logically equivalent to 'd'
>> since, although the signals appear to be different for calculating 'h',
>> from a logic perspective they are identical because of the 'a<= e....'
>> assignments.
>
> Does the mapper really do this?
Yes, as it should.  Remember, 'logic' does not care about propagation delays, and 
from the standpoint of transforming the source code into an implementation 
these things can legally be combined as redundant since they (in this case 
'd' and 'h') perform exactly the same function.  You wouldn't be able to 
tell from the outside which is 'd' and which is 'h' by wiggling the inputs 
'e', 'f' or 'g'.  Another simple example is
x <= not(y0);
y0 <= not(y1);
y1 <= not(y2);
y2 <= not(y);

which is equivalent to x <= not(not(not(not(y))));
which is equivalent to x <= y;
which when implemented in an FPGA would not even use a single logic 
resource.  Whatever logic in the original source that needed 'x' or 'y' 
would get the same signal.
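
Both equivalences can be confirmed by enumerating every input combination. This is a minimal Python sketch that models the VHDL signals as plain Booleans; the function names are made up for illustration, and it shows only the logical equivalence, not what any particular mapper does with it.

```python
from itertools import product

def d_out(e, f, g):
    a, b, c = e, f, g          # a <= e; b <= f; c <= g;
    return a or b or c         # d <= a or b or c;

def h_out(e, f, g):
    return e or f or g         # h <= e or f or g;

def x_out(y):
    y2 = not y                 # y2 <= not(y);
    y1 = not y2                # y1 <= not(y2);
    y0 = not y1                # y0 <= not(y1);
    return not y0              # x  <= not(y0);

inputs = list(product([False, True], repeat=3))
print(all(d_out(*v) == h_out(*v) for v in inputs))   # True
print(all(x_out(y) == y for y in (False, True)))     # True
```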

> Is this what Xilinx means by
> "Redundant Blocks" of logic at the mapping stage?
I believe so, but haven't had the need to dig any deeper.

Article: 108565
Subject: Re: fastest FPGA
From: "rickman" <gnuarm@gmail.com>
Date: 13 Sep 2006 03:05:09 -0700

David Ashley wrote:
> John_H wrote:
> > SRLs are a win for a transpose or rotate where the function size is
> > almost 1/4 of a more traditional approach.
>
> I've been seeing this "SRL" term used a lot, what does it mean?
>
> :) I just did a google search for "srl xilinx" and got some useful
> info, and so I created a Wikipedia page on it since one didn't seem
> to exist.
>
> http://en.wikipedia.org/wiki/Shift_Register_Look_Up_Table
>
> Anyone reading this feel free to expand on it.
>
> -Dave

How do you get the wikipedia corrected?  I clicked the link on the SRL
page to the FPGA page and then on through to the partial
re-configuration page at...

http://en.wikipedia.org/wiki/Partial_re-configuration

This page says "In current versions of software, Xilinx supports
partial reconfiguration on Spartan 3...".  I am pretty certain that
this is not supported in Spartan 3.  I have requested that this be
supported in Spartan 3 since they came out and I still have not seen it
appear.  

Am I wrong, or is the wikipedia wrong?


Subject: Re: SoC Development Board
From: "John Adair" <removethisthenleavejea@replacewithcompanyname.co.uk>
Date: Wed, 13 Sep 2006 11:18:19 +0100

Markus

Not Altera, as yet, but we have a range of Xilinx based boards with a high 
degree of flexibility and ability to add custom things. Have a look here 
http://www.enterpoint.co.uk/boardproducts.html and see if anything there 
takes your interest. Commercial bit - If you don't find what you need we 
specialise in doing derivatives quickly too. The best example, in the public 
domain of what we do in fast turn, is our MINI-CAN board that had a design 
cycle, manufacture, 5 days of bench test and boards delivered to the 
customer in 18 calendar days.

John Adair
Enterpoint Ltd. - Home of Broaddown2 with added Virtex-4 Solution.
http://www.enterpoint.co.uk


"Markus Fuchs" <markus@yeahware.com> wrote in message 
news:ee8ji4$ctj$02$1@news.t-online.com...
> In the company I'm working for we're using DSPs and FPGAs to develop 
> motion controllers. Since we're doing all the position, speed, torque and 
> current control stuff in software, we're in need of powerful devices.
>
> We're considering getting rid of the DSP and changing to a SoC design in 
> future. Therefore I would be interested if anyone can suggest an 
> affordable SoC development board with an embedded (digital signal) 
> processor to evaluate the possibilities of SoC. I'm especially interested 
> in Altera boards as we're already using them in our old designs. What FPGA 
> should we choose? Cyclone, Cyclone II, Stratix? What embedded processor 
> fits best to motion control needs? NIOS, ARM, PowerPC?
>
> I'm looking forward to your suggestions. TIA.
>
> Markus
>
> -- 
> Markus Fuchs - http://www.yeahware.com
>

Article: 108566
Subject: Xilinx Platform Studio, build up System: "block-RAM components require the adjacent multiplier"
From: "Peter Kampmann" <peter.kampmann@googlemail.com>
Date: 13 Sep 2006 03:29:29 -0700
Links: << >> << T >> << A >>

Hi,

After developing the connection of my custom peripheral to the OPB bus, I
tried to download my design to the FPGA.
The custom design including the bus needs 87% of my FPGA (Virtex2Pro30
896-7).
The device summary is as follows:

===========================================================

Device utilization summary:
---------------------------

Selected Device : 2vp30ff896-7

 Number of Slices:                   12131  out of  13696    88%
 Number of Slice Flip Flops:         14472  out of  27392    52%
 Number of 4 input LUTs:             15191  out of  27392    55%
 Number of IOs:                        109
 Number of bonded IOBs:                 96  out of    556    17%
 Number of BRAMs:                       78  out of    136    57%
 Number of MULT18X18s:                 136  out of    136   100%
 Number of GCLKs:                        1  out of     16     6%


===========================================================

In the next step, I tried to download the design.

In my first try, the system consisted of the following parts:

In my last try, it consists of the following parts:

http://www.student.uni-oldenburg.de/peter.kampmann/deuI.jpg
http://www.student.uni-oldenburg.de/peter.kampmann/report_DEUI.pdf


The synthesis of the design aborts with the following message:

ERROR:Place:665 - The design has 106 block-RAM components of which 4
block-RAM components require the adjacent
   multiplier site  to remain empty. This is because certain input pins
of adjacent block-RAM and multiplier sites share
   routing ressources. In addition, the design has 136 multiplier
components. Therefore, the design would require a
   total of 140 multiplier sites on the device. The current device has
only 136 multiplier sites.

After that, I removed the RS232 from the design and tried again.

Finally, I moved all components from the PLB to the OPB Bus, where
possible, that gives:

http://www.student.uni-oldenburg.de/peter.kampmann/deuIII.jpg
http://www.student.uni-oldenburg.de/peter.kampmann/report_DEU.pdf

And the following error:

ERROR:Place:665 - The design has 84 block-RAM components of which 2
block-RAM components require the adjacent multiplier
   site  to remain empty. This is because certain input pins of
adjacent block-RAM and multiplier sites share routing
   ressources. In addition, the design has 136 multiplier components.
Therefore, the design would require a total of 138
   multiplier sites on the device. The current device has only 136
multiplier sites.

Has anybody experienced the same problem? Does anyone have a solution
for that, without building a smaller design? The FPGA has 136
multipliers and 136 Block RAMs, does that mean you cannot use all
multipliers when you design a complete system with PowerPCs etc?

Article: 108567
Subject: Re: Xilinx Platform Studio, build up System: "block-RAM components require the adjacent multiplier"
From: "Antti" <Antti.Lukats@xilant.com>
Date: 13 Sep 2006 03:41:19 -0700
Links: << >> << T >> << A >>

Peter Kampmann schrieb:

> Hi,
>
> After developing the connection of my custom peripheral to the OPB bus, I
> tried to download my design to the FPGA.
> The custom design including the bus needs 87% of my FPGA (Virtex2Pro30
> 896-7).
> The device summary is as follows:

> Has anybody experienced the same problem? Does anyone have a solution
> for that, without building a smaller design? The FPGA has 136
> multipliers and 136 Block RAMs, does that mean you cannot use all
> multipliers when you design a complete system with PowerPCs etc?

if your design uses 100% of the multipliers and some other IP requires
a BRAM placement that needs the adjacent multiplier to be empty, then it
won't fit.

maybe there is a way to relax the placement with some trick, try to
create a design where you are using 0 BRAMs and 100% multipliers, see
if that gets mapped without problems.

I think if you are not using OCM BRAMs the PPC design should not
require any BRAMs at all, so all multipliers should be usable. of
course, if that is not the case and the use of PPC instantly reduces the
number of usable multipliers, then this should be documented by Xilinx
somehow

Antti

Article: 108568
Subject: Re: SoC Development Board
From: Jim Granville <no.spam@designtools.maps.co.nz>
Date: Wed, 13 Sep 2006 22:59:03 +1200
Links: << >> << T >> << A >>

Markus Fuchs wrote:
> In the company I'm working for we're using DSPs and FPGAs to develop motion 
> controllers. Since we're doing all the position, speed, torque and current 
> control stuff in software, we're in need of powerful devices.
> 
> We're considering getting rid of the DSP and changing to a SoC design in 
> future. Therefore I would be interested if anyone can suggest an affordable 
> SoC development board with an embedded (digital signal) processor to 
> evaluate the possibilities of SoC. I'm especially interested in Altera boards 
> as we're already using them in our old designs. What FPGA should we choose? 
> Cyclone, Cyclone II, Stratix? What embedded processor fits best to motion 
> control needs? NIOS, ARM, PowerPC?
> 
> I'm looking forward to your suggestions. TIA.

I think you are asking about moving the DSP into the FPGA - but we
have no info on important details like:
** Code size and Data Size of present DSP
** Speed and Numeric ability of present DSP
** Does that DSP have FLASH or RAM
** Does this have instant-on, or Watchdog requirements
** Do you expect to execute code from FPGA BRAM, or Off Chip SRAM,
or from Serial FLASH ?
** ADCs and other non digital peripherals included on present DSP
** Mix of core motion Sw, vs Human interface code ?
** PCB layer count you expect
** Design life time of the result

32 bit uC are now quite widespread and cheap, so a possible
split of design resource would be to pull the DSP-motion code into the 
FPGA, (where it runs in BRAM) and move the Human and Interface code into 
a Flash uC

-jg

Article: 108569
Subject: Re: use of Barrel shifter IN ARM TDMI 9
From: Joseph <joseph.yiu@somewhere-in-arm.com>
Date: Wed, 13 Sep 2006 12:54:25 +0100
Links: << >> << T >> << A >>

karunesh.ind@gmail.com wrote:
> i am preparing for intervew and i want the answer how Barrel shifter
> can be used to optimize of C code at ARM processor.
> 
> i have learnt that Barrel shifter is :A  digital circuit that can shift
> a data word by any number of bits in a single cycle. It is implemented
> as a sequence of multiplexors: the output of one MUX is connected to
> the input of the next MUX in a way that depends on the shift distance.
> The number of multiplexors required is log2(n), where n is the
> computer's register size.
> 

There are many uses:
The most basic application could be merging of multiple data bit fields 
into one word.  For example, we need to put value A in the upper half word
of a register and value B in the lower half word, and if we know that both
values are unsigned and fit in 16 bits, we can write

   C = (A << 16) + B;

It could then be compiled into
   ADD	Rc, Rb, Ra, LSL #16

Only one instruction is needed for the add and the shift.
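
As a rough illustration of the half-word merge in C (a sketch only; the function names pack_halfwords, unpack_hi and unpack_lo are made up for this example, and whether a given compiler really emits the single ADD with LSL #16 depends on the compiler and its options):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: put hi in bits 31..16 and lo in bits 15..0.
 * Both inputs must be unsigned values that fit in 16 bits, otherwise
 * the two fields would overlap after the shift. */
uint32_t pack_halfwords(uint32_t hi, uint32_t lo)
{
    assert(hi <= 0xFFFFu && lo <= 0xFFFFu);
    return (hi << 16) + lo;   /* candidate for ADD Rc, Rb, Ra, LSL #16 */
}

/* Hypothetical helpers that recover the two fields again. */
uint32_t unpack_hi(uint32_t word) { return word >> 16; }
uint32_t unpack_lo(uint32_t word) { return word & 0xFFFFu; }
```

Because the fields do not overlap, the addition here behaves the same as a bitwise OR.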

The second common usage is when accessing an array.  For example, A is set 
to the array base address, B is the array index, and the array elements are 
words.  When reading an element from the array, we can then use

   LDR  Rc, [Ra, Rb, LSL #2]

Only one instruction is needed for the memory read, addition for address 
and the shift by 2 (word size).
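
In C terms, the array read above corresponds to something like the following sketch. The names load_word and word_address are invented for illustration, and the explicit address calculation assumes the usual flat, byte-addressed memory model with 4-byte words:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the plain C form of the array read. A compiler
 * targeting ARM may turn this into LDR Rc, [Ra, Rb, LSL #2]. */
uint32_t load_word(const uint32_t *base, uint32_t index)
{
    assert(base != NULL);
    return base[index];
}

/* The same address computed by hand: base + (index << 2).
 * Assumes a flat byte-addressed memory and sizeof(uint32_t) == 4. */
uintptr_t word_address(const uint32_t *base, uint32_t index)
{
    return (uintptr_t)base + ((uintptr_t)index << 2);
}
```

The shift by 2 is simply the multiplication by the element size that the C indexing operator performs implicitly.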

For questions about ARM processors, the newsgroups
- comp.sys.arm
- comp.arch.embedded
are more suitable.

Joseph

Article: 108570
Subject: Re: Xilinx ISE ver 8.2.02i is optimizing away and removing "redundant" logic - help!
From: Brian Drummond <brian_drummond@btconnect.com>
Date: Wed, 13 Sep 2006 12:54:49 +0100
Links: << >> << T >> << A >>

On 12 Sep 2006 08:34:10 -0700, james7uw@yahoo.ca wrote:

>> What you need to do is to simulate the post-map VHDL file and trace it back
>> to why output signal 'x' at time t is set to '0' but when you use your
>> original code it is '1'.  Use the sim results from using your original code
>> as your guide for what 'should' happen and the post-map VHDL simulation for
>> what is actually happening and debug the problem.
>
>I agree that finding out what is going on is the best
>approach. Do you have any debugging tips other than comparing
>the simulation results in detail and seeing what logic calculations
>must be getting removed?

One tip: instantiate both behavioural and post-map modules in your
testbench, and run them in parallel. You can assert on differences in
the outputs, and trace internal signals in the wave window (to the
extent that you can still recognize internal signals). 

Possibly also set breakpoints on differences in internal signals which
ought to be the same.

- Brian

Article: 108571
Subject: Re: Simulating EDK 8.1i System using ModelSim 6.1e
From: Brian Drummond <brian_drummond@btconnect.com>
Date: Wed, 13 Sep 2006 13:07:39 +0100
Links: << >> << T >> << A >>

On 12 Sep 2006 07:45:19 -0700, "kits59@gmail.com" <kits59@gmail.com>
wrote:

>
>Brian Drummond wrote:
>> On 11 Sep 2006 12:58:57 -0700, "kits59@gmail.com" <kits59@gmail.com>
>> wrote:
>>

>> Something else must be the problem : check clocks, resets, is your bRAM
>> mapped to cover the boot address (FFFFFFFC for the PPC405), there are no
>> "Warning: unbound component" messages when ModelSim loads the design
>> etc?
>>
>> - Brian
>
>The funny thing that I should probably state is that the entire system
>was simulating perfectly fine under EDK 7.2.  The bRAM is mapped
>correctly so that the starting values are where they expect them to be.

Ah.

I haven't tried porting a design to 8.x from 7.x yet, but had bad
experiences running 6.x projects under 7.1.

Did you use the "import earlier version" tool when moving to 8.1?
Did it ask you if you wanted to upgrade any cores, or report that it had
to, because some of the earlier cores were no longer supported?
Sometimes the updated cores are incompatible with the earlier ones, and
the innocuous "update" breaks the design in hard-to-find ways.

I confess I never DID get to the bottom of one such "upgrade" on a
demonstration app, I didn't find out exactly how the upgrade had broken
it - it was easier to simply ask the vendor to supply a 7.1 version!

If I HAD to fix it, I'd try using the original cores, or (if not
possible) I'd check the logs of the first build in 8.1, and check all
port connections, and see if the register definitions (and address maps)
had changed between versions...

I wonder if Xilinx have fixed the "import from earlier versions"
problems with EDK 8.1? Anyone have experience of this transition?

- Brian

Article: 108572
Subject: removing Ethernet_MAC kills mini-module project
From: "Anonymous" <someone@microsoft.com>
Date: Wed, 13 Sep 2006 12:20:44 GMT
Links: << >> << T >> << A >>

I have a memec mini-module board. I've loaded and run the reference design
and I am able to boot all the way into the Linux prompt. However, my design
doesn't need ethernet so I deleted it. Now it doesn't even run the code in
the initial BRAM successfully. I've traced the code and it seems to end up
at the _exit crt function. (BTW, is there a way to get the c symbols in
XMD?)

Any ideas why removing the ethernet would clobber the system to the point
that even the uart doesn't work?

Thanks,
Clark

Article: 108573
Subject: Re: fastest FPGA
From: Ray Andraka <ray@andraka.com>
Date: Wed, 13 Sep 2006 08:22:01 -0400
Links: << >> << T >> << A >>

David Ashley wrote:
> John_H wrote:

> 
> I don't really follow how the circuit works. I mean, before you can
> output P0 you would have had to read in every single row from A to O,
> that's a lot of data you need to store. Perhaps on the order of a
> 384 element shift register?
> 
> -Dave
> 

Yes, but since the input is serial and we only take one output at a 
time, the SRL16s let us collapse the shift register into LUT resources, 
giving a 16:1 savings.  Since the data is input in row raster form, it 
can naturally be done by shifting each row into a series of SRL16s. 
Then the read out is down columns, so you read one sample out of each 
row's shift register, advancing the shift register after each read.
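
A small behavioural model in C may make the data flow easier to follow. This is only a sketch under simplifying assumptions: one 16-deep shift register per row, the whole block written before any read starts (no overlap of input and output), and names such as srl16_t and corner_turn invented for the example.

```c
#include <assert.h>

#define DIM 16  /* 16x16 block, one 16-deep shift register per row */

/* Software stand-in for an SRL16: taps[0] is the newest sample. */
typedef struct { int taps[DIM]; } srl16_t;

/* Shift one sample in; the oldest sample falls off the end. */
void srl16_shift(srl16_t *s, int in)
{
    for (int i = DIM - 1; i > 0; i--)
        s->taps[i] = s->taps[i - 1];
    s->taps[0] = in;
}

/* Read the tap selected by addr without shifting. */
int srl16_read(const srl16_t *s, unsigned addr)
{
    assert(addr < DIM);
    return s->taps[addr];
}

/* Write the block in row raster order, then read it out down columns. */
void corner_turn(const int in[DIM * DIM], int out[DIM * DIM])
{
    srl16_t rows[DIM] = {{{0}}};

    /* Input phase: each row is shifted into its own register. */
    for (int r = 0; r < DIM; r++)
        for (int c = 0; c < DIM; c++)
            srl16_shift(&rows[r], in[r * DIM + c]);

    /* Output phase: take the oldest sample from each row in turn,
     * advancing that row's register after the read. */
    for (int c = 0; c < DIM; c++)
        for (int r = 0; r < DIM; r++) {
            out[c * DIM + r] = srl16_read(&rows[r], DIM - 1);
            srl16_shift(&rows[r], 0);
        }
}
```

With the input numbered in raster order, the output comes back as A0 B0 C0 ... P0, A1 B1 ... as in the example earlier in the thread.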

Article: 108574
Subject: Re: Xilinx Platform Studio, build up System: "block-RAM components require the adjacent multiplier"
From: Brian Drummond <brian_drummond@btconnect.com>
Date: Wed, 13 Sep 2006 13:28:35 +0100
Links: << >> << T >> << A >>

On 13 Sep 2006 03:29:29 -0700, "Peter Kampmann"
<peter.kampmann@googlemail.com> wrote:

>Hi,

>ERROR:Place:665 - The design has 106 block-RAM components of which 4
>block-RAM components require the adjacent
>   multiplier site  to remain empty. This is because certain input pins
>of adjacent block-RAM and multiplier sites share
>   routing ressources. 

The hint is here... only 4 blockRAMs require the multiplier site empty.

>Has anybody experienced the same problem? Does anyone have a solution
>for that, without building a smaller design? The FPGA has 136
>multipliers and 136 Block RAMs, does that mean you cannot use all
>multipliers when you design a complete system with PowerPCs etc?

I have seen this for Spartan-3, and didn't realise it also applied to
V2Pro. There are _some_ shared connections between a BRAM and a
multiplier. Not all uses of the BRAM require those shared connections;
indeed, most don't. Specifically, using both ports at the fullest width
(32 or 36 bits per port) is a problem. 

If you can identify a way to use deeper but narrower BRAM blocks (only
18 bits wide for example) for four of your BRAMs, you are OK.

If not, I notice your LUT and FF usage are both below 60%. Therefore, if
you can move 4 of your multipliers into LUT fabric, you are also OK.
(For example, if you are using a few of the 18*18 mults as 8*8 mults,
these are an ideal candidate)

ATTRIBUTE mult_style : STRING;
ATTRIBUTE mult_style of <mylabel_1> : SIGNAL is "block";
ATTRIBUTE mult_style of <mylabel_2> : SIGNAL is "lut";

might be useful...
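
The site arithmetic behind the error message can be sketched in C as below. The function names are hypothetical and the model only restates what the placer reports: every multiplier needs a site, and every BRAM that blocks its neighbour costs one more.

```c
#include <assert.h>

/* Multiplier sites the placer needs: one per multiplier used, plus one
 * per block RAM whose adjacent multiplier site must stay empty. */
int mult_sites_required(int mults_used, int brams_blocking)
{
    assert(mults_used >= 0 && brams_blocking >= 0);
    return mults_used + brams_blocking;
}

/* How many multipliers would have to move into LUT fabric (or how many
 * BRAMs would have to be narrowed) before the design fits. */
int mults_to_move_to_luts(int mults_used, int brams_blocking,
                          int sites_available)
{
    int deficit = mult_sites_required(mults_used, brams_blocking)
                  - sites_available;
    return deficit > 0 ? deficit : 0;
}
```

For the first error quoted above this gives 136 + 4 = 140 sites against 136 available, so four multipliers (or four wide BRAMs) need to change.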

- Brian
