Messages from 41650

Article: 41650
Subject: Re: pipelined correlation block on Virtex2000?
From: thor_spambox@yahoo.com (Jonas Thor)
Date: 4 Apr 2002 09:36:17 -0800

maxedman3503@yahoo.com (Max Edmand) wrote in message news:<3a30996f.0203291324.18cb9ba2@posting.google.com>...
> Hello all,
> 
> I'm trying to design a block to perform correlation
> on two vectors: A and B. (A and B each have two elements:
> A=(a1, a2) and B=(b1, b2) 
> The correlation between A and B should be calculated like
> this: 
> 
> corr = [(a1 * b1) + (a2 * b2)] / [sqrt(a1^2 + a2 ^2) * sqrt(b1^2 + b2^2)]

I think the CORDIC algorithm can be useful here. 

Since sqrt(a^2 + b^2) is the magnitude of vector [a, b] you could use
the CORDIC algorithm to compute the vector magnitude and save a lot of
resources.
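A minimal Python sketch of that idea, assuming integer inputs, 16 iterations and 16 fractional bits (none of these figures come from the post; the function names are illustrative). CORDIC in vectoring mode rotates (x, y) onto the x axis using only shifts, adds and sign tests, and the final x is the magnitude multiplied by a fixed gain K of about 1.6468. A hardware version would fold 1/K (or 1/K^2 for the product of two magnitudes) into a single constant rather than dividing at run time.

```python
import math


def cordic_magnitude(a: int, b: int, iterations: int = 16, frac_bits: int = 16) -> float:
    """Approximate sqrt(a^2 + b^2) with CORDIC in vectoring mode.

    The loop uses only shifts, adds and sign tests, which is what makes
    the method cheap in FPGA fabric.
    """
    # The magnitude ignores signs, so fold the vector into quadrant I.
    # Fixed-point scaling keeps the right shifts from discarding every bit.
    x = abs(a) << frac_bits
    y = abs(b) << frac_bits
    for i in range(iterations):
        # Rotate by +/- atan(2^-i) so that y is driven toward zero.
        if y > 0:
            x, y = x + (y >> i), y - (x >> i)
        else:
            x, y = x - (y >> i), y + (x >> i)
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); the total
    # gain K (about 1.6468) is a constant, divided out once at the end.
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(iterations))
    return x / (gain * (1 << frac_bits))


def correlation(a1: int, a2: int, b1: int, b2: int) -> float:
    """corr = (a1*b1 + a2*b2) / (|A| * |B|), both magnitudes from CORDIC.

    Assumes neither vector is all zeros (the denominator would be zero).
    """
    return (a1 * b1 + a2 * b2) / (cordic_magnitude(a1, a2) * cordic_magnitude(b1, b2))
```

For example, `cordic_magnitude(3, 4)` should land very close to 5, and `correlation(3, 4, 4, 3)` close to 24/25. More iterations buy more accuracy at the cost of more pipeline stages.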

Check Mr. Andraka's website for more information:

http://www.fpga-guru.com/cordic.htm 

/ Jonas Thor

thor(at)sm.luth.se
replace (at) with @

Article: 41651
Subject: Re: Schematic Stuff
From: Eric Crabill <eric.crabill@xilinx.com>
Date: Thu, 04 Apr 2002 10:14:21 -0800


Hi,

What follows below is my personal opinion.

I think Viewdraw by Viewlogic (Innoveda) is the best
schematic entry tool I have ever used.  I've used
Viewdraw 6.1 on Solaris and also Workview Office 7.5
on Windows NT.

This tool does have a bit of a learning curve.  After
using it for a while, though, it now seems natural and
the keyboard shortcuts and command line really let you
fly through your work...

Hope that helps,
Eric

Christopher Saunter wrote:
> 
> Greetings All,
> 
> I have always found it more natural to work with schematics than with an HDL
> (although learning VHDL is proving very useful in some areas...)
> 
> I have used the Aldec schematic capture program from Xilinx Foundation 3/4
> and ECS from Webpack.
> 
> Thus far, I am somewhat underwhelmed by these tools - I have always felt
> that a good tool (eg text editor, ide etc.) should allow you to work about
> as fast as you can enter data, and this is just not the case with the
> schematic capture tools I have used.
> 
> So my question is:  Does anyone know of a powerful, flexible schematic
> editor with decent (preferably configurable) key bindings, rock-like
> stability, and a nice, highly intuitive user interface, that is fast and a
> pleasure to use, etc.?
> 
> One that uses an HDL description of each schematic behind the scenes, ECS
> style, is probably a plus.
> 
> Or should I just be grateful I'm not directly entering netlists... ;-)
> 
> Cheers,
> Chris Saunter

Article: 41652
Subject: Re: Signals pollution.
From: Peter Alfke <peter.alfke@xilinx.com>
Date: Thu, 04 Apr 2002 10:23:01 -0800

The way you describe it, you have crosstalk between (almost) adjacent pins.
That is not too uncommon, and depends on the output impedance of the affected
pin.
I suggest you check this by testing the output with an external pull-down
resistor of 100 Ohm to 1 kilohm.
If the output impedance is around 20 Ohm, you should not have this crosstalk.
Assuming at most 5 pF of capacitance between the pins, the time constant
would be 100 ps, invisible on most 'scopes.
If you find the output impedance to be 20 kilohm or more, then I am not
surprised by the crosstalk, since the coupling time constant could be 0.1
microsecond.
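A quick check of the arithmetic above, treating each case as a simple RC with the 5 pF worst-case coupling capacitance from the post; the helper name is illustrative.

```python
def time_constant(resistance_ohms: float, capacitance_farads: float) -> float:
    """RC time constant in seconds."""
    return resistance_ohms * capacitance_farads


COUPLING_C = 5e-12  # worst-case pin-to-pin capacitance assumed in the post

# 20 ohm: an actively driven output; 20 kohm: a pin with only a weak pull-up.
for r in (20.0, 20e3):
    tau = time_constant(r, COUPLING_C)
    print(f"R = {r:8.0f} ohm -> tau = {tau * 1e12:10.1f} ps ({tau * 1e6:.4f} us)")
```

The 1000x difference in impedance maps directly onto a 1000x longer time constant, which is why the weakly pulled pin shows visible coupling while the driven one does not.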

Moral: A pin with no active or low-impedance pull-up is easily affected by
adjacent signals. This has nothing to do with FPGA technology; it's a package
issue, and is usually irrelevant. Who cares about a signal that has no driver
attached to it?

Peter Alfke
======================================
Frank Zampa wrote:

>
>
> The ripple I saw is on the positive logic level, but I can't say
> that it is "only" on the positive level, because the low level of the
> signal with the ripple lasts less than one period of the ripple,
> so I can't see the 0.8 V PP on the low level.
> I think that the signal is not LVTTL, because the Spartan I use does
> not use those levels. On the signal without pollution the high logic
> level is about 4.1-4.3 Volts.
>
> Frank.

Article: 41653
Subject: Re: Monostable multivibrator
From: Mike Treseler <mike.treseler@flukenetworks.com>
Date: Thu, 04 Apr 2002 10:25:35 -0800

JB wrote:
 
> I guess that to implement a monostable multivibrator using a Xilinx FPGA
> should be pretty common.
> 
> Maybe somebody can provide me with a hint or an example?


Could use a shifter, an edge detector, and a counter.
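One way to read that suggestion, sketched as a cycle-by-cycle Python model rather than HDL. The three-stage shift register, the choice to restart the pulse on each new rising edge, and the names are assumptions, not part of the original reply.

```python
def one_shot(samples, width):
    """Clocked one-shot: shift register + rising-edge detector + counter.

    samples: the trigger input sampled once per clock (0/1).
    width:   output pulse length in clock cycles (>= 1).
    Returns the output level for each clock cycle.
    """
    shreg = [0, 0, 0]  # shreg[0] is the newest sample
    count = 0
    out = []
    for s in samples:
        # Shift register: stages 0 and 1 synchronize the input; stages 1 and 2
        # hold the "current" and "previous" values used for edge detection.
        shreg = [s] + shreg[:2]
        rising_edge = shreg[1] == 1 and shreg[2] == 0
        if rising_edge:
            count = width  # start (or restart) the pulse
        elif count > 0:
            count -= 1
        out.append(1 if count > 0 else 0)
    return out
```

Because only the rising edge loads the counter, holding the input high does not stretch the output: the pulse is always `width` clocks long, delayed by the two synchronizer stages.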

 -- Mike Treseler

Article: 41654
Subject: Re: powerpc in virtex2pro
From: Lasse Langwadt Christensen <langwadt@ieee.org>
Date: Thu, 04 Apr 2002 11:26:11 -0700

Peter Alfke wrote:
> 
> "Cyrille de Brébisson" wrote:
> 
> > In our design we are using an ARM CPU. My question is:
> > Can we put an ARM in the virtex 2 pro?
> > Where can I find/buy an ARM CPU core source (or precompiled) file to program
> > in my FPGA?
> >
> 
> Cyrille,
> the answer to both your questions is: No.
> The PowerPC in Virtex-II Pro is a "hard" implementation, packing the
> microprocessor with its caches and MMU into the smallest possible silicon
> area, <4 square millimeters.
> What you seem to be looking for is a "soft" implementation, using the
> programmable logic "fabric".
> That solution is impractical for something as complex as PowerPC or even ARM.
> It would take up an unreasonable portion of a large chip, and achieve mediocre
> performance at best.
> Xilinx offers a soft microprocessor, called MicroBlaze, especially tuned for
> efficient implementation in the Virtex architecture. It is not as fast and
> capable as PowerPC, but uses only ~900 slices.
> "Half the size and twice the speed of NIOS" is the Xilinx slogan. Please, no
> flames...
> 
> Peter Alfke, Xilinx Applications

you can definitely put an ARM in an FPGA. On the last project I worked
on, I did an ASIC prototype of a SoC with an ARM7-TDMI-S in a VirtexE;
right now I'm working on something similar but in a Virtex2, so
hopefully more of the clock gating in the design can work in the
prototype.

Size and performance will not be like a hard implementation, but for a
prototype that doesn't really matter as long as the performance is
enough and the design fits a chip you can buy. And if you need to,
there are things that could be changed to better fit an FPGA, so
performance could be increased, but for a prototype you don't want to
do that unless you have to.

But anyways, buying the source code for an ARM will probably cost you an
arm ;) and a leg,

-Lasse
-- Lasse Langwadt Christensen, 
-- Aalborg, Danmark

Article: 41655
Subject: Re: powerpc in virtex2pro
From: "Falk Brunner" <Falk.Brunner@gmx.de>
Date: Thu, 4 Apr 2002 20:47:01 +0200

"Peter Alfke" <peter.alfke@xilinx.com> wrote in message
news:3CAB7A34.F16B1AC4@xilinx.com...
> No surprise, and an excellent argument for on-chip microprocessors running
> out of on-chip caches and BlockRAM, and having good connectivity to the
> FPGA fabric.
> Let me stop here, before I get into my Virtex-II Pro with PowerPC pitch... :-)

Peter Alfke, always on duty!
SCNR... ;-)

--
Regards
Falk

Article: 41656
Subject: Re: Monostable multivibrator
From: Keith R. Williams <krw@btv.ibm.com>
Date: Thu, 4 Apr 2002 13:54:14 -0500

In article <E30r8.139983$u77.31687100@news02.optonline.net>,
jbonill1@optonline.net says...
> I am new to the FPGA business.
> 
> I guess that to implement a monostable multivibrator using a Xilinx FPGA
> should be pretty common.

Do you have a clock?  What parameters? Do you want retriggerable or
non-retriggerable?  Level-sensitive trigger?  Edge?

> 
> Maybe somebody can provide me with a hint or an example?

Hint 1: don't think about doing this with a resistor/capacitor or,
&Deity; forbid, a chain of inverters for timing.

Hint 2:  If you have a clock with a period less than your time-delay, 
think down counter.  Set the counter on whatever trigger you want. 
Block triggers you don't.  Flip on trigger, flop on count = 0.
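Keith's down-counter scheme as a Python model, assuming the trigger has already been synchronized to the clock and arrives as one-cycle pulses. The `retriggerable` flag (a hypothetical name) picks between reloading the counter on a new trigger and blocking that trigger while the pulse is still active.

```python
def down_counter_one_shot(triggers, delay, retriggerable=False):
    """triggers: per-clock trigger pulses (0/1), already synchronized.
    delay:    pulse length in clock periods.
    Returns the output level for each clock cycle.
    """
    count = 0
    out = []
    for trig in triggers:
        if trig and (count == 0 or retriggerable):
            count = delay           # load the counter: output "flips" on
        elif count > 0:
            count -= 1              # otherwise count down every clock
        out.append(int(count > 0))  # output "flops" off when count reaches 0
    return out
```

With a second trigger arriving mid-pulse, the non-retriggerable version ignores it and ends on schedule, while the retriggerable version restarts the full delay from that point.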

----
  Keith

Article: 41657
Subject: Re: powerpc in virtex2pro
From: "Falk Brunner" <Falk.Brunner@gmx.de>
Date: Thu, 4 Apr 2002 20:57:04 +0200


"Austin Lesea" <austin.lesea@xilinx.com> wrote in message
news:3CAC80DE.40E4D721@xilinx.com...
> If 405ppc are everywhere, you may dedicate them to tasks that seem horribly
> inefficient if you continue to think in terms of the one big expensive monster
> processor.

110% agreed!!!!!!!

This "one big CPU for all tasks" is the ancient approach of those Intel guys.
I remember a day, not too long ago, when Intel saw the future of the
personal computer as just a big RAM and a CPU, doing everything in
software. :-0
Hey guys, see those graphics controllers nowadays? See how many transistors
they have? See how many OPS they do?
Yes?
So go home and cry.
;-)

--
Regards
Falk

Article: 41658
Subject: Re: powerpc in virtex2pro
From: nweaver@CSUA.Berkeley.EDU (Nicholas Weaver)
Date: Thu, 4 Apr 2002 19:04:22 +0000 (UTC)

Additional view from a computer architect type:

In article <3CAC80DE.40E4D721@xilinx.com>,
Austin Lesea  <austin.lesea@xilinx.com> wrote:
>> 1) How come there isn't a dedicated DDR interface on the chip. I've never
>> seen a PPC application that didn't require DRAM, a dedicated interface
>> would be cheaper and higher performing than using valuable CLBs to build
>> a soft interface. (If I'm mistaken about the lack of a dedicated DDR
>> interface please let me know, I didn't see any mention of one when I read
>> the spec).
>
>DDR is built out of the DDR FF in the IOB's and logic in the FPGA.  DDR isn't
>the only standard, and customers have many other applications.  DDR is neat,
>but too specific.

There is also a design philosophy (which I can agree with for some
uses, but not for others here) that only the minimally useful set should
be implemented, because that is the cheapest and usable by the most
people.

A dedicated DDR SDRAM interface would be very nice, but that would
consume a couple mm^2 of silicon, which is only usable by those who
are going to plunk down a DDR interface, on a specific set of pins.

>> 2) I don't see the need for putting four processors on a die. In almost
>> all cases a single 405 should be adequate, in a few cases you could make
>> good use of two but I don't think that you would ever need four. There
>> should have been a wider choice of parts with a single 405 core.
>
>We just don't know how customers will use all of this power.  If 405ppc's are
>'free', you can use one executing out of internal cache to handle the "error
>404", and another running off internal cache to monitor QOS, etc.
>
>When electric motors were very expensive, a machine shop had one, and leather
>belts to every tool station.  When fractional horsepower motors became
>inexpensive and ubiquitous, they were used everywhere, with no thought.
>
>If 405ppc are everywhere, you may dedicate them to tasks that seem horribly
>inefficient if you continue to think in terms of the one big expensive monster
>processor.

And processors these days, for a simple core, are INCREDIBLY cheap,
especially this one:

It has no memory (those are the BlockRAMs), only the register file,
datapath, and control logic.  

Even in synthesis, discounting the register file and caches, a 5-stage
SPARC uP core takes 1.3mm x 0.85mm in a 0.18um process.  The caches, out
of 4 1024x32b memories, are almost as big as the core itself!
http://www.eecg.toronto.edu/~pagiamt/research/leon.html

So in the area of about 8-10 Virtex 2 BlockRAMs (1024x18b memories),
you can fit a SYNTHESIZED SPARC core (without a hardware
multiplier/divider or MMU).  I suspect that the Virtex 2 PPC core is
even smaller, with most of the actual area going to the interfacing
of the core to everything else.

I'd love to get my hands on an XC2VP4 or larger die or die photo, just
to verify these hunches about area in more detail.

But according to the datasheet, the XC2VP2 uses 4 columns, 4 high, of
BlockRAMs, with the top and bottom of the center columns replaced by
the RocketIO transceivers, so a pitch of 4 CLB rows per BlockRAM.

The XC2VP4 uses 4 columns, 10 high (it's a 40x22 instead of a 16x22
array) and has 28 BlockRAMs, so the PPC core replaces 8 BlockRAMs
and 128 CLBs (~500 slices) of logic.  This is pretty CHEAP!
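The BlockRAM bookkeeping above, written out as a small calculation. It assumes one BlockRAM per 4 CLB rows, 4 BlockRAM columns, 4 sites lost to RocketIO on both parts, the XC2VP2's 12 BlockRAMs (implied by the post: 16 sites minus 4) and 4 slices per Virtex-II CLB; the names are illustrative.

```python
CLB_ROWS_PER_BRAM = 4  # one BlockRAM spans 4 CLB rows
BRAM_COLUMNS = 4       # both parts have 4 BlockRAM columns
ROCKETIO_SITES = 4     # top and bottom of the two center columns
SLICES_PER_CLB = 4     # a Virtex-II CLB contains 4 slices


def bram_sites(clb_rows: int) -> int:
    """BlockRAM sites a full grid would have for a given CLB array height."""
    return BRAM_COLUMNS * (clb_rows // CLB_ROWS_PER_BRAM)


def sites_taken_by_ppc(clb_rows: int, brams_in_datasheet: int) -> int:
    """Sites still unaccounted for after the RocketIO cut-outs."""
    return bram_sites(clb_rows) - ROCKETIO_SITES - brams_in_datasheet


xc2vp2 = sites_taken_by_ppc(clb_rows=16, brams_in_datasheet=12)  # no PowerPC
xc2vp4 = sites_taken_by_ppc(clb_rows=40, brams_in_datasheet=28)  # one PowerPC
ppc_slices = 128 * SLICES_PER_CLB
```

This gives 0 missing sites on the XC2VP2, 8 on the XC2VP4, and 512 slices for the 128 CLBs, which is the "about 500" in the text.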

If you have a function that isn't very time critical (e.g., one which
takes a fair path length but doesn't need to be pipelined every cycle),
and you can replace just 128 CLBs by using the processor core, you've
won, bigtime.  So my assumption here is that the 8 BlockRAMs of area are
replaced with the uP core, with the rest going to a heck of a lot of
interface logic.

>> 4) On chip Flash RAM would be useful. An embedded PPC is going to require
>> some Flash. Also it would be nice if the serial Flash RAM were on chip,
>> I bet every one is sick of the extra part that most Xilinx designs
>> require.
>
>Flash requires a process that is usually two years behind the leading
>process.  To do a flash capable FPGA would be to be obsolete on day 1 of the
>introduction.  Not very exciting.

The only way I could conceive of there being Flash on the die is some
fancy packaging, e.g., a smaller chip-up flash die bonded to internal
pads on a larger chip-down part.  And do you REALLY want to spend an
extra $20 just to reduce your part count from 2 to 1, and save 16-30
external pins?

>> 6) This is a Virtex II issue, not just a Virtex II Pro issue. How about
>> offering versions of the Virtex II without the on board multipliers. The
>> multipliers make sense for DSP applications but they are a waste of money
>> and power for everything else. In my 12 years doing Xilinx designs I have
>> never needed a multiplier. I've frequently needed a CAM so I wouldn't
>> mind a few CAMs on board, but I'd rather have a cheaper part without the
>> multipliers.
>
>Well, they take up a tiny amount of area, so the cost savings are washed out
>completely by having to make two parts, with lower volumes in each.

And, as Ray Andraka has pointed out, a multiplier makes a great
shifter as well.  A variable shift is surprisingly expensive in FPGA
fabric (it takes a lot of muxes), yet it is a surprisingly common
operation.

An 18x18 multiplier can implement an 18-bit variable rotation with
just 18 LUTs worth of logic to decode the shift amount, an
additional 18 LUTs worth of logic if you want to make it a left
shift/rotate, or an additional 36 LUTs worth if you want to make a
variable left/right shift.
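A behavioral Python model of the multiplier-as-shifter trick. The shift amount is decoded to the one-hot value 2**s and multiplied in; the low half of the product is the left shift, the high half holds the bits pushed out the top, and OR-ing the halves gives a rotation. The model treats the multiplier as unsigned; the Virtex-II MULT18X18 is a signed two's-complement multiplier, so a real design has to account for that (for example by using 17 usable bits). The LUT counts from the post are not modeled, and all names are illustrative.

```python
WIDTH = 18
MASK = (1 << WIDTH) - 1


def decode(shift: int) -> int:
    """One-hot decode of the shift amount, i.e. the value 2**shift."""
    if not 0 <= shift < WIDTH:
        raise ValueError("shift out of range")
    return 1 << shift


def shift_left(x: int, shift: int) -> int:
    """Low half of x * 2**shift is x << shift, truncated to WIDTH bits."""
    return ((x & MASK) * decode(shift)) & MASK


def rotate_left(x: int, shift: int) -> int:
    """OR the high half (bits pushed out the top) back into the low half."""
    product = (x & MASK) * decode(shift)  # up to 2*WIDTH bits wide
    return (product & MASK) | (product >> WIDTH)


def shift_right(x: int, shift: int) -> int:
    """High half of x * 2**(WIDTH - shift) is the logical shift x >> shift."""
    if shift == 0:
        return x & MASK  # 2**WIDTH does not fit through the decoder
    return ((x & MASK) * decode(WIDTH - shift)) >> WIDTH


def rotate_right(x: int, shift: int) -> int:
    """A right rotation is a left rotation by the complementary amount."""
    return rotate_left(x, (WIDTH - shift) % WIDTH)
```

All four operations share the same multiplier; only the decoder input and which half of the product is used change, which is why the extra cost is a handful of LUTs rather than a full barrel shifter.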

The multiplier blocks are an example of something which IS very
common.
-- 
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

Article: 41659
Subject: Re: powerpc in virtex2pro
From: nweaver@CSUA.Berkeley.EDU (Nicholas Weaver)
Date: Thu, 4 Apr 2002 19:09:15 +0000 (UTC)

In article <3CAC9AC3.C181A79D@ieee.org>,
Lasse Langwadt Christensen  <langwadt@ieee.org> wrote:
>But anyways, buying the source code for an ARM will probably cost you an
>arm ;) and a leg,

While SPARC is free.  :)
http://www.gaisler.com/leon.html
-- 
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

Article: 41660
Subject: Re: hand placement
From: nweaver@CSUA.Berkeley.EDU (Nicholas Weaver)
Date: Thu, 4 Apr 2002 19:14:25 +0000 (UTC)

In article <a8i7t9$c1b$1@newsreader.mailgate.org>,
Kevin Brace  <ihatespam84kevinbraceusenet@ihatespam84hotmail.com> wrote:
>        I have seen one Xilinx employee in this newsgroup saying that
>automatic P&R is getting better, so low level tools like floorplanner or
>FPGA Editor are getting less important.

Every time Xilinx/Altera/Tool people say this, I have to laugh.

There is so much low-hanging fruit in datapath recognition, which the
tools fail MISERABLY to recognize.  A simple first-order pass that lines
up the datapath can be a big win.
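A toy Python illustration of why lining up the datapath pays off, using Manhattan distance as a stand-in for routing delay. The bit-sliced model (bit b of stage s feeds bit b of stage s+1) and all names are hypothetical, not taken from any vendor tool.

```python
import random


def bus_wirelength(place, stages, width):
    """Sum of Manhattan distances from bit b of stage s to bit b of stage s+1."""
    total = 0
    for s in range(stages - 1):
        for b in range(width):
            (x0, y0), (x1, y1) = place[(s, b)], place[(s + 1, b)]
            total += abs(x1 - x0) + abs(y1 - y0)
    return total


def aligned_placement(stages, width):
    """Stage s goes in column s, bit b in row b: the datapath is lined up."""
    return {(s, b): (s, b) for s in range(stages) for b in range(width)}


def scattered_placement(stages, width, seed=0):
    """The same cells on the same grid of sites, but in random order."""
    cells = [(s, b) for s in range(stages) for b in range(width)]
    sites = list(cells)  # identical grid of sites
    random.Random(seed).shuffle(sites)
    return dict(zip(cells, sites))
```

In the aligned placement every stage-to-stage connection has length 1, which is the minimum possible on distinct sites, so no other arrangement of the same cells can do better by this measure.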
-- 
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

Article: 41661
Subject: Re: powerpc in virtex2pro
From: nweaver@CSUA.Berkeley.EDU (Nicholas Weaver)
Date: Thu, 4 Apr 2002 19:40:46 +0000 (UTC)

In article <a8i8kh$sh28a$2@ID-84877.news.dfncis.de>,
Falk Brunner <Falk.Brunner@gmx.de> wrote:
>110% acknowledge!!!!!!!
>
>This "one big CPU for all task" is the ancient approach of those Intel guys.
>I remember a day, not too long ago, where Intel saw the future of the
>personel computer with just a big RAM and a CPU, doing everything just in
>software. :-0
>Hey guys, see those grafic controllers nowadays? See how many transistor
>they have? See how much OPS they do?
>Yes?

Pfah.  Big bloated pieces of silicon.  :)

It has ALWAYS been that several small processors are more "efficient"
than one big processor, and it has always been a matter of
programmability.

A classic example is the Intel IXP1200 network processor: it consists
of a single ARM core and 6 small RISC-like cores, with context switch
on event (memory miss).  A really powerful architecture if you can
program it, and small too.  Excluding the numerous interfaces (SDRAM,
PCI, IXP bus, etc.), it ends up in the ~$10 silicon range.

There is a lot of space still left in architectures with such
performance that are also easier to program.

Remember, an 8x8mm die, in a wafer level package, can buy you >200
pins [1], 10+ 32b Gops/second, in the sub $10/chip range. [2]

[1] albeit at a 0.5mm pitch.  Then again, 200 pins any other way is
easily going to add another $4-5 to the chip cost.  So it is a
tradeoff: higher board cost, lower part cost and area.
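A rough check of the pin count, assuming a full square area array of balls at 0.5 mm pitch covering the whole 8x8 mm die (the post does not describe the ball layout). For contrast, the same pitch used only around the edge gives far fewer pins. Function names are illustrative.

```python
def area_array_pins(die_mm: float, pitch_mm: float) -> int:
    """Balls in a full square grid covering the die."""
    per_side = int(die_mm // pitch_mm)
    return per_side * per_side


def perimeter_pins(die_mm: float, pitch_mm: float) -> int:
    """Balls in the outermost ring of the same grid only."""
    per_side = int(die_mm // pitch_mm)
    inner = max(per_side - 2, 0)
    return per_side * per_side - inner * inner
```

Under these assumptions an 8 mm die at 0.5 mm pitch has 16 balls per side: 256 in a full array (comfortably ">200") versus 60 in a single peripheral ring.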
-- 
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

Article: 41662
Subject: Re: powerpc in virtex2pro
From: Ken McElvain <ken@synplicity.com>
Date: Thu, 04 Apr 2002 19:55:24 GMT



Lasse Langwadt Christensen wrote:


> you can definitely put an ARM in an FPGA. On the last project I worked
> on, I did an ASIC prototype of a SoC with an ARM7-TDMI-S in a VirtexE;
> right now I'm working on something similar but in a Virtex2, so
> hopefully more of the clock gating in the design can work in the
> prototype.


Clock gating for an asic design can be automatically converted to
enables in Certify with no source code changes.  This covers
flops, latches, memories (inferred or instantiated).

Ken McElvain CTO
Synplicity, Inc.



Article: 41663
Subject: Re: powerpc in virtex2pro
From: Peter Alfke <peter.alfke@xilinx.com>
Date: Thu, 04 Apr 2002 11:55:45 -0800

Austin answered the specific questions very well.
Please allow me to add some philosophical comments:

We are in the business of providing programmable solutions, but there is always
a temptation to add dedicated circuitry because it is smaller and faster and may
consume less power. We have to make agonizing choices, because any
specialization detracts from the universality, and any one of the special
circuits we add burdens each chip and must be paid for by every user, while it
may help only certain users or applications.

Over the years we have added global clocks, carry logic, BlockRAM, clock
management, lots of I/O standards, on-chip termination resistors, multipliers,
triple-DES decryption, and now also PowerPC and 3-gigabit SerDes dedicated
circuitry. Every one of these additions was made after carefully evaluating the
trade-offs between the dedicated area (cost) vs general usefulness. And we are
happy with our choices.

There is a long list of potential candidates that were rejected (I was in favor
of adding a dedicated PCI interface to the XC4000, which luckily was
rejected).

Some of our competitors have populated a graveyard (or at least a retirement
community) of commercially unsuccessful attempts to add excessive or poorly
executed specialization to programmable logic, and IMHO Excalibur with its
glued-on ARM and Mercury with its limited-speed incomplete dedicated clock
recovery may be headed in the same direction.

Whenever you add something costly, you should do it right, and not leave the
job half completed!

Xilinx is obviously also adding dedicated circuitry, but only after very careful
consideration of the technical and economical trade-offs.
And it looks like we have been right in our choices so far.
But keep the suggestions coming.
We are listening!

Peter Alfke

Article: 41664
Subject: Re: hand placement
From: Kevin Brace <ihatespam84kevinbraceusenet@ihatespam84hotmail.com>
Date: Thu, 04 Apr 2002 12:53:46 -0800

        I have seen one Xilinx employee in this newsgroup saying that
automatic P&R is getting better, so low-level tools like Floorplanner or
FPGA Editor are getting less important.
That may be true to some extent, but automatic P&R is still bad enough
that, when I have to reduce the setup time (Tsu) of my PCI IP core, I
still have to rely on Floorplanner.
In theory, I can route my design many times to improve the timing, but
typically the improvement stops after about the 10th routing run, and
after that, things don't improve at all.
What I discovered by spending lots of time routing my design
multiple times is that the problem with Xilinx's and Altera's P&R tools
is that they don't place the timing-critical LUTs and FFs in the
right place, or don't pack related LUTs and FFs within the same CLB (in
Xilinx) or LAB (in Altera).
Because the timing-critical LUTs and FFs are placed physically so far
away from their destinations (typically FFs), routing multiple times
just won't save the design: the path will inevitably have a large
routing delay.
That's when the designer has to force the placement to specific
locations by using a floorplanner.
        If you are using ISE WebPACK, click on "View Floorplanner" after
you P&R your design.
Use the UCF flow if you are using Floorplanner for the first time.
You should download the Xilinx Floorplanner manual before trying it, but
it only explains how the tool works, and it doesn't include anything
like a tutorial.
There is no tutorial available from Xilinx (I asked about one several
months ago, but no one replied. It turns out Xilinx doesn't really have
such a tutorial.), but if you are going to use Floorplanner and target
Virtex-architecture FPGAs, including Spartan-II, keep all related LUTs
within a CLB, because the routing delay within a CLB is small.
Getting out of a CLB costs a lot in terms of routing delay, but the
delay to a horizontally adjacent CLB is still fairly small.
Another obvious piece of advice: keep signal paths as short as possible,
because the greater the distance, the greater the routing delay.
        Also, weren't you looking for a low cost PCI card?
Insight Electronics recently released an upgraded version of the
Spartan-II PCI card, and the new one is a little more expensive ($225)
than the older one, but it has a bigger chip, and has more stuff on the
card.

http://www.insight.na.memec.com/cgi-bin/bvutf8/memec/scripts/local/mc_loc_b.jsp?Div=INSIGHT&Reg=AMERICAS&Country=UNITED_STATES&Lang=EN&EDOID=187428




Kevin Brace (In general, don't respond to me directly, and respond
within the newsgroup.)



Jimmy Zhang wrote:
> 
> Just keep hearing about this hand placement thing, don't know how it
> is done in reality. Does someone actually use their hands to do the
> placement as opposed to CAD based P&R. Any hints?
> 

Article: 41665
Subject: Re: hand placement
From: "Steve Casselman" <sc.nospam@vcc.com>
Date: Thu, 04 Apr 2002 21:14:15 GMT

The placer works by starting with a random placement. Then it takes the nets
(usually in alphabetical order) and estimates the wire distance to all the
pins each net is connected to. This is the "cost" of each net. Then it takes
two components (LUTs or flops) and swaps them. If the cost is lower, it keeps
the swap; otherwise it doesn't. The placement can be really improved just by
hand-placing a few LUTs or flops. The placed components act like "attractors"
for the rest of the components connected to them. I had a chance to look at
the old PPR code. I was able to speed up the cost function by 9.8x by putting
the function in hardware.
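A minimal Python sketch of the greedy swap loop described above. Half-perimeter wirelength is used as the per-net distance estimate (a common choice, assumed here rather than taken from the PPR code), the whole cost is recomputed after each swap for simplicity, and hand-placed cells are modeled as fixed positions that never move. Simulated-annealing placers also accept some uphill swaps; this sketch keeps only improvements, as in the description. All names are illustrative.

```python
import random


def hpwl(net, pos):
    """Half-perimeter wirelength: a common estimate of a net's wire length."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def greedy_swap_place(cells, nets, grid_w, grid_h, fixed=None, iters=5000, seed=1):
    """Random initial placement, then keep only swaps that lower the total cost.

    fixed maps hand-placed cells to (x, y) sites; they never move, so they act
    as anchors that pull the cells connected to them into their neighborhood.
    Returns (positions, starting cost, final cost).
    """
    rng = random.Random(seed)
    fixed = dict(fixed or {})
    taken = set(fixed.values())
    free_sites = [(x, y) for x in range(grid_w) for y in range(grid_h)
                  if (x, y) not in taken]
    movable = [c for c in cells if c not in fixed]
    if len(free_sites) < len(movable):
        raise ValueError("not enough free sites for the movable cells")
    rng.shuffle(free_sites)
    pos = dict(fixed)
    pos.update(zip(movable, free_sites))  # random initial placement

    def total_cost():
        return sum(hpwl(net, pos) for net in nets)

    start = cost = total_cost()
    for _ in range(iters):
        a, b = rng.sample(movable, 2)          # pick two movable cells
        pos[a], pos[b] = pos[b], pos[a]        # trial swap
        new_cost = total_cost()
        if new_cost < cost:
            cost = new_cost                    # keep an improving swap
        else:
            pos[a], pos[b] = pos[b], pos[a]    # undo the swap
    return pos, start, cost
```

A production placer would update only the nets touching the two swapped cells instead of recomputing everything, which is exactly the cost-function evaluation that dominates run time.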

Steve

"Nicholas Weaver" <nweaver@CSUA.Berkeley.EDU> wrote in message
news:a8i8mh$n57$1@agate.berkeley.edu...
> In article <a8i7t9$c1b$1@newsreader.mailgate.org>,
> Kevin Brace  <ihatespam84kevinbraceusenet@ihatespam84hotmail.com> wrote:
> >        I have seen one Xilinx employee in this newsgroup saying that
> >automatic P&R is getting better, so low level tools like floorplanner or
> >FPGA Editor are getting less important.
>
> Everytime Xilinx/Altera/Tool people say this, I have to laugh.
>
> There is so much low hanging fruit in datapath recognition, which the
> tools fail MISERABLY to recognise.  A simple first order pass, align
> up the datapath, can be such a win.
> --
> Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

Article: 41666
Subject: Re: powerpc in virtex2pro
From: Austin Lesea <austin.lesea@xilinx.com>
Date: Thu, 04 Apr 2002 13:26:47 -0800

Nicholas,

Just one minor point:  the 405ppc has its own caches (16K for data, and 16K for
instructions) so you can execute quite a bit right out of that without ever using a
BRAM.

Austin

Nicholas Weaver wrote:

> And processors these days, for a simple core, are INCREDIBLY cheap,
> especially this one:
>
> It has no memory (those are the BlockRAMs), only the register file,
> datapath, and control logic.

Article: 41667
Subject: Re: powerpc in virtex2pro
From: "Steve Casselman" <sc.nospam@vcc.com>
Date: Thu, 04 Apr 2002 22:00:49 GMT

I have to disagree that a part with dedicated pins is a net loss for Xilinx.
For example my patent http://www.delphion.com/details?pn=US06178494__
suggests that it might be useful to have a part that can be inserted into a
pre-existing socket. For example if there were a part that fit into the
second slot in a Pentium system there is a good chance you could sell
millions and millions of them.

Steve


"Peter Alfke" <peter.alfke@xilinx.com> wrote in message
news:3CACAFC2.140DBBFD@xilinx.com...
> Austin answered the specific questions very well.
> Please allow me to add some philosophical comments:
>
> We are in the business of providing programmable solutions, but there is always
> a temptation to add dedicated circuitry because it is smaller and faster and may
> consume less power. We have to make agonizing choices, because any
> specialization detracts from the universality, and any one of the special
> circuits we add burdens each chip and must be paid for by every user, while it
> may help only certain users or applications.
>

Article: 41668
Subject: Re: powerpc in virtex2pro
From: nweaver@CSUA.Berkeley.EDU (Nicholas Weaver)
Date: Thu, 4 Apr 2002 22:05:49 +0000 (UTC)

In article <3CACC517.AC390FDE@xilinx.com>,
Austin Lesea  <austin.lesea@xilinx.com> wrote:
>Nicholas,
>
>Just one minor point:  the 405ppc has its own caches (16K for data, and 16K for
>instructions) so you can execute quite a bit right out of that without ever using a
>BRAM.

OK.  That makes even more sense (I shoulda noticed something was
wrong), because otherwise it would take a HELL of a lot of interface
logic to occupy 128 CLBs worth of logic.

In any case, the assertion is:  A uP is small.  Including a fair
number of them in a large FPGA is rather low cost.

-- 
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

Article: 41669
Subject: Re: hand placement
From: Peter Alfke <peter.alfke@xilinx.com>
Date: Thu, 04 Apr 2002 14:12:38 -0800

I argued ten years ago, and I am still convinced:

The human brain is better than any computer at recognizing the underlying
structure (and can thus drive some basic hand placement).

But a computer is much better at the tedious job of routing.
That's why routers have become very good, but the placer is still the problem
child. And a bad placement is very difficult to remedy later.
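
To make "placement quality" concrete, here is a generic placer cost function in Python: the half-perimeter wirelength (HPWL) of each net's bounding box, summed over all nets. This is a textbook metric chosen for illustration. It is an assumption, not a description of what ppr or par actually compute, and the grid coordinates and cell names are hypothetical.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, Tuple[int, int]]   # cell name -> (x, y) site on a grid

def hpwl(net: List[str], place: Placement) -> int:
    """Half-perimeter of the bounding box enclosing every pin of one net."""
    xs = [place[c][0] for c in net]
    ys = [place[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def placement_cost(nets: List[List[str]], place: Placement) -> int:
    """Total estimated wirelength; lower usually means easier, faster routing."""
    return sum(hpwl(net, place) for net in nets)

nets = [["a", "b"], ["b", "c"], ["a", "c"]]
scattered = {"a": (0, 0), "b": (9, 9), "c": (0, 9)}
clustered = {"a": (0, 0), "b": (1, 0), "c": (0, 1)}
print(placement_cost(nets, scattered))  # 36
print(placement_cost(nets, clustered))  # 4
```

The same three connections cost nine times more wire when the cells are scattered. No router can recover that, which is the point about a bad placement being hard to remedy later. An annealing-style placer evaluates a function like this for every candidate move, so it runs a very large number of times per placement.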

Peter Alfke

Article: 41670
Subject: Re: powerpc in virtex2pro
From: nweaver@CSUA.Berkeley.EDU (Nicholas Weaver)
Date: Thu, 4 Apr 2002 22:16:53 +0000 (UTC)

In article <k24r8.1447$Jl4.914143265@newssvr13.news.prodigy.com>,
Steve Casselman <sc.nospam@vcc.com> wrote:
>I have to disagree that a part with dedicated pins is a net loss for Xilinx.
>For example my patent http://www.delphion.com/details?pn=US06178494__
>suggests that it might be useful to have a part that can be inserted into a
>pre-existing socket. For example if there were a part that fit into the
>second slot in a Pentium system there is a good chance you could sell
>millions and millions of them.

However, the only consistent dedicated pins NEEDED are power and
ground.  Otherwise, thanks to the joys of reconfiguration, as long as the
reconfigurable logic is fast enough, you can match the interface.

Also, any dedicated circuitry is much harder to test, as it adds
irregularities which need to be tested.
-- 
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

Article: 41671
Subject: Re: hand placement
From: "Tim" <tim@rockylogic.com.nooospam.com>
Date: Thu, 4 Apr 2002 23:58:18 +0100

Steve Casselman wrote:
>                                     I had a chance to look at the old
> ppr code. I was able to speed the cost function by 9.8x by putting the
> function in hardware.

Sounds interesting.  What did you do?

Article: 41672
Subject: Re: ACEX maximal clock...
From: kayrock66@yahoo.com (Jay)
Date: 4 Apr 2002 15:03:45 -0800

I think you're in the ballpark at 80MHz.  Why don't you just design
the circuit and let Max-Plus/Quartus tell you the answer to your
query.  However, the low density/high speed design you are describing
might be better suited to a CPLD architecture.

Regards,
Jay

"Sławomir Balon" <antyspam.bsl@post.pl> wrote in message news:<a8gvvm$8q8$1@news.tpi.pl>...
> >It depends on what you're trying to do with your clock, need to supply
> >more detail...
> 
> OK, I'm planning to use it for acquiring data from two 8-bit flash ADCs
> (AD9057), both clocked at 80 MHz but with the clocks shifted 180 degrees in phase
> (effectively 160 MHz). Will an APEX -3 be fast enough to work with, or should I
> use a -2 device? (The data will be stored in a fast 16-bit SRAM.)
> 
> regards
> Slawek
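
Here is a minimal Python sketch of the data path described above. It assumes each 80 MHz cycle yields one sample from ADC A and one from ADC B (B taken half a period later), packed side by side into one 16-bit SRAM word with A in the high byte. The byte ordering and the function names are hypothetical. The arithmetic shows why a 16-bit SRAM fits this design: packing two 8-bit samples per word keeps the SRAM write rate at 80 MHz instead of 160 MHz.

```python
ADC_RATE_HZ = 80_000_000      # each AD9057 clocked at 80 MHz
BITS_PER_SAMPLE = 8
SRAM_WIDTH = 16

def pack_pairs(samples_a, samples_b):
    """Pack one sample from each ADC into a 16-bit word (A high byte, B low byte)."""
    return [((a & 0xFF) << 8) | (b & 0xFF) for a, b in zip(samples_a, samples_b)]

def unpack_interleaved(words):
    """Rebuild the time-ordered 160 MS/s stream: A first, then B half a period later."""
    stream = []
    for w in words:
        stream.append(w >> 8)     # ADC A sample
        stream.append(w & 0xFF)   # ADC B sample
    return stream

effective_rate = 2 * ADC_RATE_HZ                                    # 160 MS/s combined
sram_words_per_s = effective_rate * BITS_PER_SAMPLE // SRAM_WIDTH   # 80 M writes/s
```

With this arrangement, the FPGA only has to capture each ADC at 80 MHz on its own clock phase and perform one SRAM write per 80 MHz cycle.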

Article: 41673
Subject: Re: hand placement
From: "Steve Casselman" <sc.nospam@vcc.com>
Date: Thu, 04 Apr 2002 23:39:55 GMT

I took the cost function, put it in hardware, and ran the database past it
several times. The cost function accounted for 30% of the placer
performance. That part of the placer took about 1/3 of an XC4010. From my
analysis I concluded that ppr could be sped up by 10x and would take about
50K gates. This holds to the normal 90/10 rule. Of course Xilinx was moving
over to par at the time, and they concluded that they didn't need the
speedup. After spending a lot of time with the code I'm convinced that P&R
is a sure bet for acceleration. Now with the PPC and Virtex II I'm sure that
overall speedups of 8-10x would be pretty straightforward. I estimate
about 2 man-years of work and a design with 4-8 gig on board would do it.
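
These figures can be sanity-checked with Amdahl's law, which is an outside model applied here rather than something stated in the post. Assume "30% of the placer performance" means 30% of run time. If a fraction p of run time is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). So a 9.8x faster cost function alone gives only about 1.37x overall. A 10x overall gain needs at least 90% of the work accelerated, which matches the 90/10 rule.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when fraction p of the run time is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

def fraction_needed(target: float, s: float) -> float:
    """Fraction of run time that must be accelerated by s (> 1) to reach `target` overall."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / s)

# Cost function alone: assumed 30% of run time, 9.8x faster in hardware.
print(round(amdahl_speedup(0.30, 9.8), 2))             # 1.37
# 10x overall, even with an arbitrarily fast accelerator:
print(round(fraction_needed(10.0, float("inf")), 2))   # 0.9
```
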

Steve

"Tim" <tim@rockylogic.com.nooospam.com> wrote in message
news:1017961909.26884.0.nnrp-01.9e9832fa@news.demon.co.uk...
> Steve Casselman wrote:
> >                                     I had a chance to look at the old
> > ppr code. I was able to speed the cost function by 9.8x by putting the
> > function in hardware.
>
> Sounds interesting.  What did you do?
>
>
>
>

Article: 41674
Subject: Re: hand placement
From: Ray Andraka <ray@andraka.com>
Date: Fri, 05 Apr 2002 00:25:58 GMT

We've heard the "place and route is good enough you don't need to do floorplanning unless you are doing the 1% designs from hell" line for as far back
as I can remember from Xilinx.  Fact is, floorplanning seems to be getting larger gains, not smaller, with the new devices.  I typically see 50-70%
performance improvement over an automatic placement.

Routing multiple times without running placement is not going to get much in the way of performance gains.  The router does a pretty decent job if the
placement is good, and can't do much to salvage a poor placement.

Xilinx, as a company, promotes not using the floorplanner probably to avoid a feeling that the devices are more difficult to design in (which is not
the case, in fact the ability to improve performance and density through floorplanning is a big plus).  Floorplanning has always been the closet case,
and from the looks of it will continue to be.  Therefore you get poor documentation, lots of bugs, and very low priority on getting the bugs fixed
compared with the rest of the software package.  Until floorplanning becomes a mainstream design event, I doubt it will ever be anything more than the
poor cousin no one will admit to having.  Unfortunately, the mainstream doesn't use it because a) they are told they don't need it*, that it is only
there for the FAEs to get you out of trouble in special cases, b) they don't know the benefits because those are not told to them and the tool is not
easy to learn without doing it a lot (and living with numerous bugs), and c) even if someone convinces them to use it, the documentation is next to
useless as far as learning how to floorplan.  Part of the problem is that floorplanning is sort of like putting together a puzzle with many acceptable
solutions.  Some people have the knack for it and some don't, and if you don't, you will probably never acquire it.

Kevin Brace wrote:

>         I have seen one Xilinx employee in this newsgroup saying that
> automatic P&R is getting better, so low-level tools like the floorplanner or
> FPGA Editor are becoming less important.
> That can be true to some extent, but still, automatic P&R is so bad
> that, when I have to reduce setup time (Tsu) of my PCI IP core, I still
> have to rely on floorplanner.
>

and so on...


--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759
