
Messages from 90725

Article: 90725
Subject: Re: MAC Architectures
From: Tim Wescott <tim@seemywebsite.com>
Date: Wed, 19 Oct 2005 10:42:41 -0700
Pramod Subramanyan wrote:

> Tim Wescott wrote:
> 
>>Jeorg's question on sci.electronics.design for an under $2 DSP chip got
>>me to thinking:
>>
>>How are 1-cycle multipliers implemented in silicon?  My understanding is
>>that when you go buy a DSP chip a good part of the real estate is taken
>>up by the multiplier, and this is a good part of the reason that DSPs
>>cost so much.  I can't see it being a big gawdaful batch of
>>combinatorial logic that has the multiply rippling through 16 32-bit
>>adders, so I assume there's a big table look up involved, but that's as
>>far as my knowledge extends.
>>
> 
> 
> There's no lookup table. It's just a BIG cascade of ANDs. This might
> help:
> 
> http://www2.ele.ufes.br/~ailson/digital2/cld/chapter5/chapter05.doc5.html
> 
Interesting.  So that's what they actually do in practice, just copy a 
page out of a textbook?  Wouldn't the stages of adders really cause a 
speed hit?  To have your signal ripple through so many stages would 
require you to slow your clock way down from what it could be otherwise 
-- it seems an odd way to build a chip whose purpose in life is to be 
really fast while doing a MAC.
> 
>>Yet the reason that you go shell out all the $$ for a DSP chip is to get
>>a 1-cycle MAC that you have to bury in a few (or several) tens of cycles
>>worth of housekeeping code to set up the pointers, counters, modes &c --
>>so you never get to multiply numbers in one cycle, really.
>>
>>How much less silicon would you use if an n-bit multiplier were
>>implemented as an n-stage pipelined device?  If I wanted to implement a
>>128-tap FIR filter and could live with 160 ticks instead of 140 would
>>the chip be much smaller?
>>
> 
> I think this would lead to lousy performance on small loops - such as
> those found in JPEG encoding.
> 
Good point.  Yes it would, unless you used some fancy pipelining to keep 
the throughput up (which would probably require a fancy optimizer to let 
humans write fast code).
> 
>>Or is the space consumed by the separate data spaces and buses needed to
>>move all the data to and from the MAC?  If you pipelined the multiplier
>>_and_ made it a two- or three- cycle MAC (to allow time to shove data
>>around) could you reduce the chip cost much?  Would the amount of area
>>savings you get allow you to push the clock up enough to still do audio
>>applications for less money?
> 
> 
> Quite a lot of the chip cost depends on the design complexity and the
> amount of time and money spent in R&D, not to mention the quantity of
> chips the company hopes to sell, so it's not a directly proportional
> relation between cost and size of chip. If you're trying to save money,
> you could try using a fast general purpose microcontroller instead of a
> DSP.
> 
Yet DSP chips cost tons of money, which disappoints Jeorg who designs 
for high-volume customers who are _very_ price sensitive.  The question 
was more a hypothetical "what would Atmel do if Atmel wanted to compete 
with the dsPIC" than "should I have a custom chip designed for my 
10-a-year production cycle".
> 
>>Obviously any answers will be useless unless somebody wants to run out
>>and start a chip company, but I'm still curious about it.
>>
> 


-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Article: 90726
Subject: Re: MAC Architectures
From: Bob Monsen <rcsurname@comcast.net>
Date: Wed, 19 Oct 2005 11:02:08 -0700
On Wed, 19 Oct 2005 10:42:41 -0700, Tim Wescott wrote:

> Yet DSP chips cost tons of money, which disappoints Jeorg who designs 
> for high-volume customers who are _very_ price sensitive.

Actually, I believe that prices have little to do with cost, particularly
in high volume, low material cost items like ICs. This is true until the
item has become a commodity, where anybody can make it. At that point,
market factors start to bring the prices down. Until that point, pricing
is more closely related to the cost of what the item replaces.

In the case of DSP chips, the replacement is a traditional microprocessor,
with its fast external memory, PC design and debug time, etc.

So, cutting down on the silicon area won't help prices; it'll just
increase the profits of the chipmakers. What helps prices is stiff, fair
competition, and lots of it. So, chip makers try to differentiate their
designs, making it hard to 'jump ship' and head off in new directions,
thus keeping a particular group of users a 'captive audience'. Once
standardization sets in, they are doomed to compete.

---
Regards,
  Bob Monsen

Let us grant that the pursuit of mathematics is a divine madness of the 
human spirit.
- Alfred North Whitehead


Article: 90727
Subject: Re: which is Low power FPGA?
From: Jim Granville <no.spam@designtools.co.nz>
Date: Thu, 20 Oct 2005 07:58:05 +1300
himassk wrote:
> I couldn't make out the best low-power FPGA among the Spartan3, Cyclone
> II and Lattice FPGAs.
> 
> Regards,
> Himassk

  You need to work it out for your application:
using the vendors' mA (static) and mA/MHz numbers, and also determine
whether Typical or Worst case matters most to you.
  Actual measurements will probably be needed as reality checks.

  Not all these numbers are clearly documented, and with everyone 
claiming to be lowest power, your task will not be easy.

  In particular, look for what they do NOT say.

example: I see Lattice makes big claims for 4000Z static Icc over
CoolRunner-II, but is very quiet on mA/MHz.
  Perhaps that number does not stack up as well?

-jg




Article: 90728
Subject: Re: Implementation of 1024 point FFT in Actel FPGA
From: Thomas Womack <twomack@chiark.greenend.org.uk>
Date: 19 Oct 2005 20:09:27 +0100 (BST)
In article <1129715750.010881.22720@g47g2000cwa.googlegroups.com>,
cisivakumar <cisivakumar@gmail.com> wrote:
>Hai,
>
>   I want to do the main project as Implementation of 1024 point FFT in
>Actel FPGA.I have to find a new frequency identification algorithm
>other than Fast Fourier Transform.Please give valuable notes,codes and
>suggestions for successully completing this project.

Mr Sivakumar,

I'm not quite sure what stage in your studies you're at -- I'm going
to guess this is a project for the last year of a university degree in
electronic engineering.  Does the university give a list of titles for
possible projects and ask you to pick one?

I think your career prospects would be more improved by taking courses
in how to write formal English than by finishing this project.  What
you've written comes across as very terse and rather rude; you make no
indication that you've put work into the project yourself before
asking people to help you.  "Please give" is a perfunctory request; a
more humble form like "could anyone offer me" might well make people
more keen to reply.

I _think_ that any method for identifying frequencies has to be
equivalent to the Fourier transform, basically by the definition of
'frequency'.  You could use the definition of frequency as
repetitiveness, and compute autocorrelations (pick the value of a to
maximise sum f(N) f(N+a)) but the efficient way of computing
autocorrelations is via the FFT again.
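
For what it's worth, that autocorrelation idea can be sketched in a few
lines of Python (purely illustrative; the 16 kHz sample rate and 1 kHz
test tone are made-up values, and a real implementation would compute the
autocorrelation via the FFT, as noted):

```python
import math

def autocorr_period(samples, min_lag, max_lag):
    """Pick the lag a that maximises sum f(N) * f(N + a) -- the
    autocorrelation definition of repetitiveness given above."""
    best_lag, best_score = min_lag, float("-inf")
    for a in range(min_lag, max_lag + 1):
        score = sum(samples[n] * samples[n + a]
                    for n in range(len(samples) - a))
        if score > best_score:
            best_lag, best_score = a, score
    return best_lag

fs, f0 = 16000, 1000   # sample rate and test tone (made-up values)
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(1024)]
period = autocorr_period(x, 2, 100)
print(fs / period)     # 1000.0 -- recovers the 1 kHz tone
```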

Tom


Article: 90729
Subject: Re: MAC Architectures
From: David Brown <david@westcontrol.removethisbit.com>
Date: 19 Oct 2005 21:45:04 +0200
Tim Wescott wrote:
> Jeorg's question on sci.electronics.design for an under $2 DSP chip got 
> me to thinking:
> 
> How are 1-cycle multipliers implemented in silicon?  My understanding is 
> that when you go buy a DSP chip a good part of the real estate is taken 
> up by the multiplier, and this is a good part of the reason that DSPs 
> cost so much.  I can't see it being a big gawdaful batch of 
> combinatorial logic that has the multiply rippling through 16 32-bit 
> adders, so I assume there's a big table look up involved, but that's as 
> far as my knowledge extends.
> 

Single-cycle multipliers in small microcontrollers are frequently 8x8, 
which is obviously much easier.  The chip mentioned, the msp430, does 
16x16, but it is not actually single-cycle (as far as I remember).  The 
other big difference compared to expensive DSPs is the speed - it is a 
lot easier to do 16x16 in a single cycle at 8 MHz (the top speed of the 
current msp430's) than at a few hundred MHz (for expensive DSPs).

Article: 90730
Subject: Re: which is Low power FPGA?
From: "Peter Alfke" <peter@xilinx.com>
Date: 19 Oct 2005 14:03:32 -0700
When you are interested in low power, you must analyze static and
dynamic power separately.
Static power (when the chip is powered up, but not being clocked) used
to be very low, but is significant in the newer smaller-geometry
technology. Static = leakage current depends strongly on temperature.
Make sure you look at both 25 degree and 85 degree values.
Some manufacturers conveniently underreport by mentioning 25 degree
values only :-(

Dynamic power comes from the charging and discharging of capacitances,
and therefore depends on the clock rate of all the various nodes inside
the chip and the I/O. It is proportional to clock frequency, and
increases with the square of the supply voltage (the current increases
linearly, the power obviously with the square of Vcc).
Newer technology reduces internal capacitances, and thus uses less power
at the same functionality and speed.
But functionality as well as clock rate often increase, and power
might therefore go up anyhow.
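
A back-of-the-envelope sketch of that scaling in Python (the capacitance,
voltage, and frequency numbers are invented purely to illustrate the
square law):

```python
def dynamic_power(c_eff: float, vcc: float, f_clk: float) -> float:
    """Dynamic power from charging/discharging capacitance: P = C * Vcc^2 * f."""
    return c_eff * vcc ** 2 * f_clk

p_high = dynamic_power(c_eff=1e-9, vcc=2.5, f_clk=100e6)  # 1 nF, 2.5 V, 100 MHz
p_low  = dynamic_power(c_eff=1e-9, vcc=1.2, f_clk=100e6)  # same design at 1.2 V
print(p_high / p_low)   # ~4.34: power scales with the square of Vcc
```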

Peter Alfke, Xilinx Applications


Article: 90731
Subject: Re: Best Async FIFO Implementation
From: "raul" <raulizahi@gmail.com>
Date: 19 Oct 2005 14:04:38 -0700
For simulation, are the Xilinx FIFO models any faster than before?
Just recently I had to write fully-synchronous FIFO models to
accelerate the simulations and achieved 100X (one hundred times)
improvement.

RAUL


Article: 90732
Subject: Re: MAC Architectures
From: Bevan Weiss <kaizen__@NOSPAMhotmail.com>
Date: Thu, 20 Oct 2005 10:08:56 +1300
Tim Wescott wrote:
> Pramod Subramanyan wrote:
> 
>> Tim Wescott wrote:
>>
>>> Jeorg's question on sci.electronics.design for an under $2 DSP chip got
>>> me to thinking:
>>>
>>> How are 1-cycle multipliers implemented in silicon?  My understanding is
>>> that when you go buy a DSP chip a good part of the real estate is taken
>>> up by the multiplier, and this is a good part of the reason that DSPs
>>> cost so much.  I can't see it being a big gawdaful batch of
>>> combinatorial logic that has the multiply rippling through 16 32-bit
>>> adders, so I assume there's a big table look up involved, but that's as
>>> far as my knowledge extends.
>>>
>>
>>
>> There's no lookup table. It's just a BIG cascade of ANDs. This might
>> help:
>>
>> http://www2.ele.ufes.br/~ailson/digital2/cld/chapter5/chapter05.doc5.html
>>
> Interesting.  So that's what they actually do in practice, just copy a 
> page out of a textbook?  Wouldn't the stages of adders really cause a 
> speed hit?  To have your signal ripple through so many stages would 
> require you to slow your clock way down from what it could be otherwise 
>> -- it seems an odd way to build a chip whose purpose in life is to be 
> really fast while doing a MAC.

It's much, much harder than just copying a page out of a textbook. 
There are small optimizations that depend strongly on data distributions, 
etc.  Even before the designer can begin laying out the multiplier, 
which is pretty much the hardest part, they have to work out whether it 
has the characteristics required.

As an example, I recently designed a 4bit*4bit multiplier as a class 
project.  It's much harder to do than many people realise, and its 
complexity grows rapidly (in most cases) with the input bit width.

Sometimes it may be as simple as laying down a standard multiplier block 
(from one of many IP libraries around) however in most DSPs this will be 
the critical timing path for single cycle operation and so must be hand 
modified to produce acceptable path delays, then assessed under all 
conditions.

Certainly not a lookup table; that would indeed be simply copying from a 
book, and would also require (2^(2*N))*N/4 bytes of storage.  For 
anything but small N this would be enormous, and not very efficient in 
terms of chip real estate.
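
A small Python sketch makes both points concrete (illustrative only; real
single-cycle multipliers use carry-save adder trees and Booth recoding
rather than this naive shift-and-add loop):

```python
def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Multiply two n-bit unsigned ints the way an array multiplier does:
    each bit of b ANDs with a to form a partial product, and the shifted
    partial products are summed."""
    a &= (1 << n) - 1
    b &= (1 << n) - 1
    product = 0
    for i in range(n):             # one row of AND gates per bit of b
        if (b >> i) & 1:           # partial product a AND b[i] ...
            product += a << i      # ... shifted into place and accumulated
    return product

def lut_bytes(n: int) -> int:
    """Storage for a full lookup-table multiplier:
    2^(2n) entries of 2n bits each, i.e. (2^(2n))*n/4 bytes."""
    return (2 ** (2 * n)) * n // 4

assert array_multiply(9, 13) == 117
print(lut_bytes(4))    # 256 bytes -- feasible for 4x4
print(lut_bytes(16))   # 17179869184 bytes (16 GiB) -- hopeless for 16x16
```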

As an aside, the other members of my class implemented their multipliers 
in a pipeline configuration, whilst I did mine in a completely 
parallel configuration (with a ripple adder, as high speed wasn't a design 
consideration).  This means that the others had 2/3/4-cycle latencies whilst 
mine was a single cycle.  The trade-off is that the upper frequency of 
mine was more limited than theirs due to the increased path delays.

Getting single-cycle high-speed multipliers is a very challenging 
prospect, and one on which much research is still ongoing.

You should have a go at making up a simple 3bit*3bit multiplier using 
single transistors on a PCB sometime; it's quite similar to the layout 
flow used in IC design.

Article: 90733
Subject: Re: MAC Architectures
From: "Peter Alfke" <peter@xilinx.com>
Date: 19 Oct 2005 14:14:31 -0700
Newer FPGAs have lots of fast 18 x 18 multipliers.
The humble XC4VSX25 has, among other goodies, 128 such multipliers
running at max 500MHz single-cycle rate.
The mid-range SX35 has 192, and the top SX55 has 512 such fast 18 x 18
multipliers each with its associated 48-bit accumulator structure.  We
invite you to keep that kind of arithmetic performance busy...  No
wonder these FPGAs can outperform sophisticated and expensive DSP
chips.

Peter Alfke, Xilinx


Article: 90734
Subject: Re: MAC Architectures
From: langwadt@ieee.org
Date: 19 Oct 2005 14:17:19 -0700

Tim Wescott skrev:

> Pramod Subramanyan wrote:
>
snip
> >
> > http://www2.ele.ufes.br/~ailson/digital2/cld/chapter5/chapter05.doc5.html
> >
> Interesting.  So that's what they actually do in practice, just copy a
> page out of a textbook?  Wouldn't the stages of adders really cause a
> speed hit?  To have your signal ripple through so many stages would
> require you to slow your clock way down from what it could be otherwise

afair the delay for the straightforward N*N-bit parallel multiplier is
only around double the delay of an N-bit adder, i.e. the longest path in
the multiplier is lsb to msb plus top to bottom

> -- it seems an odd way to build a chip who's purpose in life is to be
> really fast while doing a MAC.

I think it's more likely that they look at different options and find
the smallest that is fast enough ;)

have a look at http://www.andraka.com/multipli.htm

> >
snip

> Yet DSP chips cost tons of money, which disappoints Jeorg who designs
> for high-volume customers who are _very_ price sensitive.  The question
> was more a hypothetical "what would Atmel do if Atmel wanted to compete
> with the dsPIC" than "should I have a custom chip designed for my
> 10-a-year production cycle".

I'm not sure the size of the multiplier makes a big difference; my
guess is that if you look at the die you would see that most of it is
memory.

What price are you looking for? How much memory? How fast?

Not that I will build you one, but I'm curious :)

-Lasse


Article: 90735
Subject: Re: which is Low power FPGA?
From: Ray Andraka <ray@andraka.com>
Date: Wed, 19 Oct 2005 17:31:57 -0400


None of them is, really; however, you can run the later ones at a 
considerably reduced voltage to achieve dramatic power savings at the 
cost of slower operation.

-- 
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759



Article: 90736
Subject: Re: Implementation of 1024 point FFT in Actel FPGA
From: Jon Elson <jmelson@artsci.wustl.edu>
Date: Wed, 19 Oct 2005 17:01:06 -0500


cisivakumar wrote:

>Hai,
>
>   I want to do the main project as Implementation of 1024 point FFT in
>Actel FPGA.I have to find a new frequency identification algorithm
>other than Fast Fourier Transform.Please give valuable notes,codes and
>suggestions for successully completing this project.
>  
>
You might look up Hadamard transforms.  I never was able to find any
references to the Van Geen transform, but there was an article in about 1965
in the ARRL magazine by a John (I think) Van Geen who built an amazingly
simple frequency detection analyzer using simple digital circuits.  His ADC
was taken to the degenerate form of 1 bit (a comparator).  I think some
later "audio spectrum analyzer" modules used his technique, so it ought to
be written up somewhere.

Jon


Article: 90737
Subject: Re: Best Async FIFO Implementation
From: "Peter Alfke" <peter@xilinx.com>
Date: 19 Oct 2005 15:34:19 -0700
Simulating asynchronous clocking must be very difficult and time
consuming (I dare not use the word "impossible" for fear of being
flamed). How do you cover all clock phase relationships, down to the
femtosecond level?  Synchronizers operate with that kind of timing
resolution.
Peter Alfke, speaking for himself.


Article: 90738
Subject: Re: MAC Architectures
From: Eric Jacobsen <eric.jacobsen@ieee.org>
Date: Wed, 19 Oct 2005 15:56:30 -0700
On Wed, 19 Oct 2005 09:25:12 -0700, Tim Wescott <tim@seemywebsite.com>
wrote:

>Jeorg's question on sci.electronics.design for an under $2 DSP chip got 
>me to thinking:
>
>How are 1-cycle multipliers implemented in silicon?  My understanding is 
>that when you go buy a DSP chip a good part of the real estate is taken 
>up by the multiplier, and this is a good part of the reason that DSPs 
>cost so much.  I can't see it being a big gawdaful batch of 
>combinatorial logic that has the multiply rippling through 16 32-bit 
>adders, so I assume there's a big table look up involved, but that's as 
>far as my knowledge extends.
>
>Yet the reason that you go shell out all the $$ for a DSP chip is to get 
>a 1-cycle MAC that you have to bury in a few (or several) tens of cycles 
>worth of housekeeping code to set up the pointers, counters, modes &c -- 
>so you never get to multiply numbers in one cycle, really.
>
>How much less silicon would you use if an n-bit multiplier were 
>implemented as an n-stage pipelined device?  If I wanted to implement a 
>128-tap FIR filter and could live with 160 ticks instead of 140 would 
>the chip be much smaller?
>
>Or is the space consumed by the separate data spaces and buses needed to 
>move all the data to and from the MAC?  If you pipelined the multiplier 
>_and_ made it a two- or three- cycle MAC (to allow time to shove data 
>around) could you reduce the chip cost much?  Would the amount of area 
>savings you get allow you to push the clock up enough to still do audio 
>applications for less money?
>
>Obviously any answers will be useless unless somebody wants to run out 
>and start a chip company, but I'm still curious about it.

A while back when I was doing such things Wallace Trees and Booth
Multipliers were all the rage.   Doing a search on those turned up Ray
Andraka's page (no big surprise :)) which has a really good discussion
on alternatives.

Since then things have gotten even smaller and faster and, as someone
else pointed out, the FPGA companies now find it prudent to splatter
large numbers of very fast single-cycle multipliers around their parts
just because they can (and because they know people will use them).
I've no clue what they're doing there, but efficient single-cycle
multipliers have been around for a long time in various flavors.   I'm
sure they're not all the same.

Eric Jacobsen
Minister of Algorithms, Intel Corp.
My opinions may not be Intel's opinions.
http://www.ericjacobsen.org

Article: 90739
Subject: Re: Modelsim XE, what's the latest version?
From: bybell@rocketmail.com
Date: 19 Oct 2005 19:00:20 -0700
|I must give the warning that gtkwave loads the whole vcd into memory
|and then some, so if you're opening 300MB VCD files, you really need
|to have at least 1 GB of memory (maybe more).  I know I have 768MB
|and it thrashed until I killed it.

FYI, with gtkwave you can use the converter tools in order to keep from
loading the whole VCD file into memory.  vcd2lxt, vcd2lxt2, or vcd2vzt
should do the trick.  Large files will load on small machines then as
they're only brought in as needed.

-t


Article: 90740
Subject: Re: using i2c core
From: "CMOS" <manusha@millenniumit.com>
Date: 19 Oct 2005 20:58:51 -0700
If I do that there will be an output (the output of the input buffer)
without any connections.  (The IO buffer I'm talking about is made up of
one OBUF and one IBUF.)
Is there a way to post images in this forum, so that I can post the
diagram?
CMOS


Article: 90741
Subject: Re: MAC Architectures
From: "Jon Harris" <jon99_harris7@hotmail.com>
Date: Thu, 20 Oct 2005 04:08:28 GMT
"Tim Wescott" <tim@seemywebsite.com> wrote in message 
news:ELSdnRzy6pLQ7svenZ2dnUVZ_sydnZ2d@web-ster.com...
> Jeorg's question on sci.electronics.design for an under $2 DSP chip got me to 
> thinking:
>
> How are 1-cycle multipliers implemented in silicon?  My understanding is that 
> when you go buy a DSP chip a good part of the real estate is taken up by the 
> multiplier, and this is a good part of the reason that DSPs cost so much.  I 
> can't see it being a big gawdaful batch of combinatorial logic that has the 
> multiply rippling through 16 32-bit adders, so I assume there's a big table 
> look up involved, but that's as far as my knowledge extends.

In addition to the single-cycle MAC, there is also all the structure needed to 
keep that MAC busy, i.e. dual busses with single-cycle access to memory.

> Yet the reason that you go shell out all the $$ for a DSP chip is to get a 
> 1-cycle MAC that you have to bury in a few (or several) tens of cycles worth 
> of housekeeping code to set up the pointers, counters, modes &c -- 
> so you never get to multiply numbers in one cycle, really.

True, a fast MAC is most useful when you are doing a bunch of them in a row, 
like for example in a FIR filter.  But a fast multiply is pretty darn useful 
too, and often can be taken advantage of 1 or 2 at a time.

BTW, on a SHARC, the set-up is basically 3 cycles: set-up 2 pointers (typically 
one for data, one for coefs) and initialize your loop counter.  You may need to 
pre-fetch the data too, so that could add another cycle, or could be built into 
the main loop.  Not too bad, though.  It doesn't take too long of a filter, 
matrix, vector op, etc. to start paying dividends.
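
A toy cycle-count model of that amortization (the numbers are
illustrative, loosely based on the 3-cycle setup described above, not a
cycle-accurate SHARC model):

```python
def fir_cycles(taps: int, setup: int = 3, pipeline_latency: int = 0) -> int:
    """Rough cycle count for a FIR on a single-MAC DSP: a fixed setup cost
    (pointers and loop counter), plus one MAC per tap, plus whatever extra
    latency a pipelined multiplier adds before the first result appears."""
    return setup + pipeline_latency + taps

print(fir_cycles(128))                      # 131: single-cycle MAC
print(fir_cycles(128, pipeline_latency=3))  # 134: pipelined MAC, well amortised
```

For long filters the setup and pipeline latency all but vanish into the
per-tap cost, which is the point being made above.
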
An idiosyncratic feature of the SHARC is that for fixed point, there is a true 
single-cycle MAC, whereas for floating point, you have a parallel multiply/add 
instruction, and you can't use the result of the multiply in the add.  At first 
glance it seems quite restrictive, but in practice it just means one more stage 
of pipelining before you can rip off those single-cycle FIRs.  I'm guessing a 
single-cycle 40-bit floating-point MAC would have been the slowest instruction 
on the chip by a mile, and would have forced a much slower max clock rate.

> How much less silicon would you use if an n-bit multiplier were implemented as 
> an n-stage pipelined device?  If I wanted to implement a 128-tap FIR filter 
> and could live with 160 ticks instead of 140 would the chip be much smaller?

I don't think it would save that much total real estate. I'm basing this on the 
fact that Analog Devices started including two complete multipliers and ALUs in 
all their new SHARCs.  Or maybe with die shrinks, silicon area is not such a big 
deal?  When you can take advantage of the second unit, you can get FIRs in 1/2 
cycle per tap, or parallel operation on a second data stream "for free"--nice!

> Or is the space consumed by the separate data spaces and buses needed to move 
> all the data to and from the MAC?  If you pipelined the multiplier _and_ made 
> it a two- or three- cycle MAC (to allow time to shove data around) could you 
> reduce the chip cost much?  Would the amount of area savings you get allow you 
> to push the clock up enough to still do audio applications for less money?

My guess is that going to a 2-stage multiplier would not get you anywhere 
near twice the clock frequency.  On most DSPs, it's not just the MAC but 
every other instruction is also single cycle, so unless you can get them all 
to run 2x, you can't double the clock rate.  The MAC is probably the slowest 
path, but I would guess there are a lot of other "close seconds".  I've often 
wanted to see data from a DSP manufacturer on the speed of different 
instructions, just out of curiosity.  It would also allow you to determine if 
you could reliably overclock a chip based on what instructions your application 
used. 



Article: 90742
Subject: Re: Best Async FIFO Implementation
From: "Alex Shot" <alexshot@gmail.com>
Date: 20 Oct 2005 00:07:05 -0700
Peter,
Let me correct you. The FIFO depth is defined as 2047. The memory depth
is 2048. If the full indication goes high with a delay of 1 write clock, the
new word is stored into the extra cell (say, cell 2048). No data is lost.
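
A toy model of that behaviour (illustrative Python, not a model of any
actual Xilinx FIFO):

```python
class FifoModel:
    """Toy model of the scheme described above: 2048 words of memory,
    an advertised FIFO depth of 2047, and a full flag that asserts at
    2047 words, so one 'late' write still lands in the spare cell."""
    MEM_DEPTH = 2048   # physical memory
    DEPTH = 2047       # advertised FIFO depth

    def __init__(self):
        self.mem = []

    @property
    def full(self):
        return len(self.mem) >= self.DEPTH

    def write(self, word):
        # even if the writer reacts to 'full' one clock late, at most
        # MEM_DEPTH words are ever resident, so nothing is overwritten
        if len(self.mem) < self.MEM_DEPTH:
            self.mem.append(word)

fifo = FifoModel()
for i in range(2047):
    fifo.write(i)
assert fifo.full               # flag up at the advertised depth
fifo.write(2047)               # one late write goes into the spare cell
assert len(fifo.mem) == 2048   # ...and no data was lost
```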


Article: 90743
Subject: to write the driver for my own ip core
From: Athena <lnzhao@emails.bjut.edu.cn>
Date: Thu, 20 Oct 2005 00:41:28 -0700
Hi All,

I have written an IP core which is used for implementing an algorithm. Now I have to write the driver for my IP core, which mainly transfers the data between the IP core and the PLB bus.

But I don't know how to write a driver for an IP core. I don't have any related materials.

Who has experience of writing drivers for their own IP core, or has some documents about it?

Please help me.

Thank you very much.

Article: 90744
Subject: Re: How to speed up the critical path (Xilinx)
From: "Eric DELAGE" <delage.eric@gmail.com>
Date: 20 Oct 2005 01:09:21 -0700
> I would be happy about some suggestions on how I could start to make my
> design faster.

How much faster? Logic optimization tricks won't gain you more
than ... let's say 10% (provided that you already use retiming
techniques and that they give the best possible result).

Eric


Article: 90745
Subject: Re: which is Low power FPGA?
From: "jerzy.gbur@gmail.com" <jerzy.gbur@gmail.com>
Date: 20 Oct 2005 01:19:13 -0700
OK, you're right, you don't have to scream :)
But himassk asked about S3L, and I gave him an answer about it.

regards.
Jerzy Gbur


Article: 90746
Subject: Re: to write the driver for my own ip core
From: =?ISO-8859-1?Q?Johan_Bernsp=E5ng?= <xjohbex@xfoix.se>
Date: Thu, 20 Oct 2005 10:25:31 +0200
Athena wrote:
> Hi All,
> 
> I have written an IP core which is used for implementing an algorithm. Now I have to write the driver for my IP core, which mainly transfers the data between the IP core and the PLB bus.
> 
> But I don't know how to write a driver for an IP core. I don't have any related materials.
> 
> Who has experience of writing drivers for their own IP core, or has some documents about it?
> 
> Please help me.
> 
> Thank you very much.

Well, since you are going to write drivers for cores connected to the 
PLB I suspect that you do have the EDK. There is a 'Device Driver 
Programming Guide' in the Processor IP Reference Guide which gives you 
the basic knowledge. You also have quite a few drivers in the 
./EDK/sw/XilinxProcessorIPLib/drivers catalog of your EDK installation. 
I've used them as examples while learning to write device drivers 
for EDK.  The included drivers in the EDK also give you an idea of how 
to write the .mdd and .tcl files.

Good luck.

cheers!

-- 
-----------------------------------------------
Johan Bernspång, xjohbex@xfoix.se
Research engineer

Swedish Defence Research Agency - FOI
Division of Command & Control Systems
Department of Electronic Warfare Systems

www.foi.se

Please remove the x's in the email address if
replying to me personally.
-----------------------------------------------

Article: 90747
Subject: Re: Rosetta Results
From: Martin Thompson <martin.j.thompson@trw.com>
Date: 20 Oct 2005 11:46:06 +0100
Austin Lesea <austin@xilinx.com> writes:

> Martin,
> 
> http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?sSecondaryNavPick=&iLanguageID=1&multPartNum=1&category=&sTechX_ID=al_v4rse&sGlobalNavPick=&BV_SessionID=@@@@1217042584.1129733086@@@@&BV_EngineID=cccgaddfmiggfdlcefeceihdffhdfkf.0
> 
> is the long URL.
> 

Thanks
<snip>
> 
> Given your affiliation (TRW), I imagine you know who is your
> Aerospace/Defense Xilinx FAE, and can contact him regarding this
> subject.
> 

Well, we are an automotive-only company now, so I have no direct
Aero/Defense links most of the time, but we always get to the right
people when we have questions to ask :-)

<snip>
> 
> To answer your specific question:  what about the D FF in the CLB?
> What is its Failure Rate in Time compared to the configuration bits?
> 
> There are 200,000 DFF in our largest part (XC4VLX200).  the failure
> rate of these is .2 FIT (ie 1 FIT/Mb).  That is .2 failures in 1
> billion hours for the largest device (at sea level).  The DFF is a
> very large, well loaded structure as it is programmable to be:  a
> latch, a D flip flop, asynchronous reset, synchronous reset, with
> other options as well for load, preset, and capture.
> 
> Compared to the 6.1 FIT/million bits of configuration memory (as of
> today's readout) for the mean time to a functional failure, times the
> number of config bits (in Mbit), the DFF upset rate is many times less.
> 

OK, that's good to hear.  Presumably the DFF rate is what I need to
compare with an ASIC (as they don't have configuration latches)?

> We also are recording the upset rate in the BRAM.
> 

That'll be interesting!

> In Virtex 4, we have FRAME_ECC for detecting and correcting
> configuration bit errors, and BRAM_ECC for detecting and correcting
> BRAM errors (hard IP built into every device).
> 
> Regardless, for the highest level of reliability, we suggest using our
> XTMR tool which automatically converts your design to a TMR version
> optimized for the best reliability using the FPGA resources (the XTMR
> tool understands how to TMR a design in our FPGA -- not something
> obvious how to do).  In addition to the XTMR, we also suggest use of
> the FRAME_ and BRAM_ ECC features so that you are actively "scrubbing"
> configuration bits so they get fixed if they flip, and the same for
> the BRAM.  The above basically describes how our FPGAs are being used
> now in aerospace applications.
> 

Our application is automotive, therefore will be something like
Spartan-3E.  We will have to use cleverness to avoid spending too much
extra money on silicon - I don't think we can TMR the whole lot...
we'll be speaking to your experts!
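At the logic level, the TMR Austin describes boils down to 2-of-3 majority voting across three copies of the design, so a single upset copy is outvoted. A minimal C sketch of the voting function (illustration only -- XTMR triplicates the logic *and* the voters in hardware, with feedback paths, which is the non-obvious part):

```c
#include <stdint.h>

/* Bitwise 2-of-3 majority vote: each output bit takes the value held
   by at least two of the three redundant copies, so one corrupted
   copy cannot affect the result. */
static uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c) {
    return (a & b) | (a & c) | (b & c);
}
```

For partial TMR on a cost-sensitive part like Spartan-3E, the usual trade-off is to triplicate only the state-holding and safety-critical paths and rely on scrubbing for the rest.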

Anyway, that's design work for later on...  Thanks for the info.

Cheers,
Martin

-- 
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.trw.com/conekt  
   

Article: 90748
Subject: Re: how to connect my IP-Core to Microblaze in EDK and ISE with IPIF
From: "moleo" <md_kabany@msn.com>
Date: Thu, 20 Oct 2005 08:38:29 -0400
Links: << >>  << T >>  << A >>
hello,
I have the same problem; if you found a solution, please let me know.

Greeting
moleo


Article: 90749
Subject: Re: to write the driver for my own ip core
From: Eli Hughes <emh203@psu.edu>
Date: Thu, 20 Oct 2005 09:01:41 -0400
Links: << >>  << T >>  << A >>
Hmm.....


Is your question that you don't know how to format your C functions so 
they conform to the Xilinx driver 'standard'

or....

Is your problem that you don't know how to access your hardware at a 
low-level?

-Eli



Johan Bernspång wrote:
> Athena wrote:
> 
>> Hi All,
>>
>> I have written an IP core which is used for implementing an algorithm. 
>> Now I have to write the driver for my IP core which mainly transfers 
>> the data between the ip core and the plb bus.
>>
>> But I don't know how to write a driver for an IP CORE. I don't have 
>> any related materials.
>>
>> Who has experience of writing drivers for their own IP core, or 
>> has some documents about it?
>>
>> Please help me.
>>
>> Thank you very much.
> 
> 
> Well, since you are going to write drivers for cores connected to the 
> PLB I suspect that you do have the EDK. There is a 'Device Driver 
> Programming Guide' in the Processor IP Reference Guide which gives you 
> the basic knowledge. You also have quite a few drivers in the 
> ./EDK/sw/XilinxProcessorIPLib/drivers directory of your EDK installation. 
> I've used them as examples when learning to write device drivers 
> for EDK. The included drivers in the EDK also give you an idea of how 
> to write the .mdd and .tcl files.
> 
> Good luck.
> 
> cheers!
> 
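At the lowest level, EDK-style drivers like the ones Johan points to are mostly a base address plus register offsets and volatile read/write helpers. A hedged sketch of that pattern -- every name and offset here is hypothetical, made up for illustration, not from any Xilinx core:

```c
#include <stdint.h>

/* Hypothetical register map for an imaginary "myip" PLB core. */
#define MYIP_CTRL_OFFSET   0x00u  /* control register (hypothetical) */
#define MYIP_STATUS_OFFSET 0x04u  /* status register  (hypothetical) */
#define MYIP_DATA_OFFSET   0x08u  /* data register    (hypothetical) */

/* Write a 32-bit value to a memory-mapped register. */
static void myip_write_reg(uintptr_t base, uintptr_t off, uint32_t val) {
    *(volatile uint32_t *)(base + off) = val;
}

/* Read a 32-bit value from a memory-mapped register. */
static uint32_t myip_read_reg(uintptr_t base, uintptr_t off) {
    return *(volatile uint32_t *)(base + off);
}

/* Example driver entry point: enable the core (bit 0, hypothetical). */
static void myip_start(uintptr_t base) {
    myip_write_reg(base, MYIP_CTRL_OFFSET, 0x1u);
}
```

The real Xilinx drivers wrap exactly this kind of accessor in the layer-0/layer-1 structure the Device Driver Programming Guide describes, with the .mdd and .tcl files telling Libgen how to configure and build them.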


