Jezwold wrote:
> GHz-speed FPGAs with sub-ns delays are seriously expensive, but as you say
> there must be a market, otherwise they wouldn't make them. I've just never
> come across anyone who used them to implement a general purpose CPU

Thank you, I'm learning. (don't worry, I'll shut up soon!!)

Article: 81101
Hi dave,

> I am probably opening a can of worms, but why are FPGAs so slow?

Because they're general-purpose logic embedded in a sea of wires.

> The CPU cores at www.opencore.org represent an ever growing number of
> excellent and very practical, but slow processor implementations. What I
> mean is 240-500MHz FPGAs, when market CPUs are in the 3GHz range.

These CPUs make use of the same kind of technology, but the difference is that all the silicon is designed for one single purpose. A CPU is a CPU, and that's all it can be. An FPGA can be a CPU, a bingo machine, an audio effects processor, a medical imager, you name it.

> Surely there must be 1 or 2 GHz FPGAs available with sub-nanosecond
> gate switch/propagation times.

The switching time of a Stratix II logic element is about 250ps (it depends on which input is being used and what output path is chosen), which would theoretically allow such speeds, and I'm sure that if I write an oscillator for one logic cell, I will get an oscillating signal in the GHz range. Depending on which routing structure I use, I will get either a 2GHz (internal feedback path) or a 1GHz (local routing) signal.

> Or possibly it is a verilog, vhdl or synthesis problem with the designs?

Nope. ASIC synthesis is indeed more fine-grained than FPGA synthesis. If an equation needs a NAND, in an ASIC a NAND is placed, and dedicated wires are laid out to connect the NAND to its inputs and outputs. In an FPGA, a connection is made to a design element, and this element is then configured to function as a NAND. A single FPGA 'design' element is capable of performing much more complex functions than NANDs alone, though.

The output of the design element then goes into a big multiplexer. This multiplexer connects the output of the design element to a variety of routing structures. These routing structures, in turn, connect to other multiplexers, which connect either to other routing structures, or to the inputs of other design elements.
As you can see, not only are the design elements in an FPGA general-purpose, but so is the interconnect. So, in short, the silicon in a modern FPGA is indeed high-performance, but due to its general-purpose nature, it can never be as fast as dedicated ASIC logic.

> Should I just use mass manufactured high speed CPUs and relegate the
> other discrete logic to CPLD/FPGAs??

Horses for courses. An 'industry CPU', as you call it (I'm thinking AMD or Intel), requires a lot of support chips to function properly - it thrives in an environment that conforms to a (large) number of conditions. An FPGA can basically be plonked into any situation that requires some sort of digital function - and a CPU is just one of the many functions that can be integrated in that little square with legs. It will never be as good as dedicated silicon, but in many cases, dedicated silicon just isn't there.

> Lastly, how fast is NIOSII?

Pretty fast. It's a 32-bit RISC softcore with a 1/5/6-stage pipeline that runs at ~150MHz in a Stratix II and at ~80MHz in a Cyclone - but Your Mileage May Vary, depending on custom instructions for the ALU, other logic getting in the way, speed grades, fill factor etc. Note that if you have a parallelizable algorithm, you can spread the load over multiple NIOSen in the same FPGA...

Best regards,

Ben

Not from the marketing department ;-)

Article: 81102
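Ben's 2GHz/1GHz oscillator figures follow directly from the element delay: a ring of inverting logic oscillates with a period of twice its total loop delay. A quick Python sanity check - the 250ps figure is from the post above, and the assumption that the local-routing path roughly doubles the loop delay (to ~500ps) is mine, chosen only because it reproduces the 1GHz figure:

```python
def ring_osc_freq_ghz(loop_delay_ps):
    """Frequency of an inverting feedback loop with total propagation
    delay loop_delay_ps: the signal must traverse the loop twice
    (one rise + one fall) per period."""
    period_ps = 2 * loop_delay_ps
    return 1000.0 / period_ps

print(ring_osc_freq_ghz(250))  # internal feedback, ~250ps loop -> 2.0 GHz
print(ring_osc_freq_ghz(500))  # assumed local-routing loop      -> 1.0 GHz
```

The same relation explains why adding even one extra routing hop to the loop halves the oscillation frequency.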
dave wrote:
> I am probably opening a can of worms, but why are FPGAs so slow?
> The CPU cores at www.opencore.org represent an ever growing number of
> excellent and very practical, but slow processor implementations. What I
> mean is 240-500MHz FPGAs, when market CPUs are in the 3GHz range.

Your 3GHz CPU can do a few adds or multiplies at 3GHz. An FPGA can do hundreds or thousands of adds or multiplies at a few hundred MHz. Which is faster?

-- glen

Article: 81103
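Glen's point is easy to put numbers on. A back-of-the-envelope sketch - the unit counts below are illustrative assumptions, not measurements of any particular device:

```python
# Aggregate throughput = clock rate * number of parallel units.
cpu_ops  = 3.0e9 * 2     # 3 GHz CPU, assume ~2 adds/multiplies per cycle
fpga_ops = 300e6 * 100   # 300 MHz FPGA, assume 100 parallel adders/multipliers

print(fpga_ops / cpu_ops)  # -> 5.0: the FPGA wins despite a 10x slower clock
```

With thousands of units instead of a hundred, the gap widens accordingly - which is the whole argument for doing DSP-style workloads in an FPGA.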
"morpheus" <saurster@gmail.com> wrote in message news:1111094017.939450.52810@o13g2000cwo.googlegroups.com...
> Hi All,
> If anyone knows of a bit rounding algorithm, please forward the
> information to me. I am trying to round off 24 bits to 12 bits.
> Thanks
> MORPHEUS
> p.s. the 24 bits is the result of an addition between two 24 bit
> numbers. I need to round off the result and feed it to a 12-bit DAC.
> THNX

It depends on what you're doing with the output of the DAC. Under some circumstances, it helps to take the "dropped" twelve bits, delay them by a clock, then add them in again. This has the effect of making the quantizing noise higher frequency, and often less objectionable. This is known as error feedback. For plain rounding, just add half an output lsb, then truncate.

Article: 81104
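The error-feedback trick described above can be modeled in a few lines. This is a behavioral sketch in Python; the function name and the choice of unsigned 24-bit inputs are assumptions for illustration, not part of the post:

```python
def round_with_error_feedback(samples_24bit):
    """Quantize 24-bit samples to 12 bits; the 12 dropped LSBs are
    delayed one clock and added back into the next sample, pushing the
    quantization noise up in frequency (first-order noise shaping)."""
    out = []
    err = 0
    for s in samples_24bit:
        acc = s + err           # add back the previous sample's dropped bits
        out.append(acc >> 12)   # keep the top 12 bits for the DAC
        err = acc & 0xFFF       # the "dropped" twelve bits, delayed a clock
    return out

# A constant input of half an output LSB dithers between DAC codes 0 and 1,
# so the long-run average preserves the sub-LSB information:
print(round_with_error_feedback([0x000800] * 4))  # -> [0, 1, 0, 1]
```

Because the dropped bits are never discarded, the accumulated output error stays bounded by one output LSB no matter how long the sequence runs.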
Thank you. Ben. Ben Twijnstra wrote: > Hi dave, > > >>I am probably opening a can of worms, but why are FPGAs so slow? > > > Because they're general-purpose logic embedded in a sea of wires. > > >>The CPU cores at www.opencore.org represent an ever growing number of >>excellent and very practical, but slow processor implementations. What I >>mean is 240-500Mhz FPGAs, when market CPUs are in the 3Ghz range. > > > These CPUs make use of the same kind of technology, but the difference is > that all the silicon is designed for one single purpose. A CPU is a CPU, > and that's all it can be. An FPGA can be a CPU, a bingo machine, an audio > effects processor, a medical imager, you name it. > > >>Surely there must be 1 or 2 Ghz FPGAs available with sub nano second >>gate switch/propagation times. > > > The switching time of a Stratix II logic element is about 250ps (depends on > which input is being used and what output path is chosen), which would > theoretically allow such speeds, and I'm sure that if I write an oscillator > for one logic cell, that I will get an oscillating signal in the GHz range. > Depending on which routing structure I use I will either get a 2GHz > (internal feedback path) or a 1GHz (local routing) signal. > > >>Or possibly it is a verilog, vhdl or synthesis problem with the designs? > > > Nope. ASIC synthesis is indeed more fine-grained than FPGA synthesis. If an > equation needs a NAND, in an ASIC, a NAND is placed, and dedicated wires > are laid out to connect the NAND to its inputs and outputs.. In an FPGA, a > connection is made to a design element, and this element is then configured > to function as a NAND. A single FPGA 'design' element is capable of > performing much more complex functions than NANDs alone though. > > The output of the design element then goes into a big multiplexer. This > multiplexer connects the output of the design element to a variety of > routing structures. 
These routing structures, in turn connect to other > multiplexers, which connect to either other routing structures etc, or to > the inputs of other design elements. > > As you can see, not only the design elements in an FPGA are general purpose, > but so is the interconnect. > > So, in short, the silicon in a modern an FPGA is indeed high-performance, > but due to its general-purpose nature, it can never be as fast as dedicated > ASIC logic. > > >>Should I just use mass manufactured high speed CPUs and relegate the >>other discrete logic to CPLD/FPGAs?? > > > Horses for courses. An 'industry CPU', as you call it (I'm thinking AMD or > Intel), requires a lot of support chips to properly function - it thrives > in an environment that conforms to a (large) number of conditions. An FPGA > can basically be plonked into any situation that requires some sort of > digital function - and a CPU is just one of the many functions that can be > integrated in that little square with legs. It will never be as good as > dedicated silicon, but in many cases, dedicated silicon just isn't there. > > >>Lastly, how fast is NIOSII? > > > Pretty fast. It's a 32bit RISC softcore with a 1/5/6-stage pipeline that > runs at ~150MHz in a Stratix II and at ~80MHz in a Cyclone - but Your > Mileage May Vary, depending on custom instructions for the ALU, other logic > getting in the way, speed grades, fill factor etc etc. Note that if you > have a parallelizable algorithm, you can spread the load over multiple > NIOSen in the same FPGA... > > Best regards, > > > Ben > > Not from the marketing department ;-) >Article: 81105
glen herrmannsfeldt wrote: > dave wrote: > >> I am probably opening a can of worms, but why are FPGAs so slow? > > >> The CPU cores at www.opencore.org represent an ever growing number of >> excellent and very practical, but slow processor implementations. What >> I mean is 240-500Mhz FPGAs, when market CPUs are in the 3Ghz range. > > > Your 3GHz CPU can do a few adds or multiplies at 3GHz. > > An FPGA can do hundreds or thousands of adds, or multiplies > at a few hundred MHz. Which is faster? > > -- glen > Ok.... Thank you.Article: 81106
"dave" <dave@dave.dave> wrote in message news:d1cs40$ja9$1@news6.svr.pol.co.uk...
<snip>
> Contrary to what you may think there is a market for GHz speed flexible
> FPGAs. But hey, what do I know, I am just a HDL newbie.
<snip>

I completely agree. There is a market for GHz speed FPGAs. There's also a market for terahertz speed processors. And a market for safe, $1500 cars.

Article: 81107
Couple of things.

One factor affecting the speed of the circuit is the process. Some of the FPGAs like Virtex-II are fabricated in 0.13um technology and are far behind the present CPU technology. However, the latest FPGAs from Xilinx and Altera are 90nm (same as the P4).

I think the reason for the performance difference for these guys is, as you suggested, because of the tools. ASICs like CPUs are carefully optimized for area, timing, power and so on at every level. With FPGAs, this is taken care of by the synthesis and PAR tools: all you do is code the design and let the synthesizer and PAR tools do their best possible job. I also think this process is further complicated by the inherent design of the FPGAs.

Article: 81108
On Thu, 17 Mar 2005 15:00:59 +0800, Sea Squid top-posted:
> I found PP is unable to drive such LEDs, which need 20mA, but what is the
> converter chip I shall order?
> Thanks

ULN2803 - Eight darlingtons in a DIP
http://www.st.com/stonline/books/ascii/docs/1536.htm

Of course, you'll need a separate supply - there is no reliable +5V Vcc at the LPT port.

Good Luck!
Rich

> "Sea Squid" <Sea.Squid@hotmail.com> wrote in message
> news:423928c3@news.starhub.net.sg...
>> I want to experiment with the parallel port with eight LEDs tied to a cut
>> parallel port cable, then send instructions with Visual Basic to create
>> some patterns. Is there any danger to my laptop?
>> Thanks.

Article: 81109
morpheus wrote:
> Hi All,
> If anyone knows of a bit rounding algorithm, please forward the
> information to me. I am trying to round off 24 bits to 12 bits.
> Thanks
> MORPHEUS
> p.s. the 24 bits is the result of an addition between two 24 bit
> numbers. I need to round off the result and feed it to a 12-bit DAC.
> THNX

Depends on your tolerance to quantization noise and bias.

The simplest approach is to simply truncate the 12 lower bits. This results in an error between 0 and +1 lsb, which introduces a bias of 1/2 LSB.

The bias can be reduced using simple rounding: add 1/2 of the retained LSB weight before truncating. In your case, you'd add 0x0800 (a '1' in the top discarded bit) to the 24 bit value before truncating off the 12 LSBs. This reduces the bias to 1/2 of the lsb weight of the 24 bit word, but does not totally eliminate it. A bias remains because 0.5 (0x800 in the lowest 12 bits) always rounds up; 0.5 is equidistant from 0 and 1, so it introduces a small bias.

The bias can be eliminated by modifying the rounding algorithm to either round toward or away from zero, or round towards even or odd. Note that with simple rounding, +0.5 rounds up to 1 and -0.5 rounds up to 0. With round-toward-zero or round-away-from-zero, the direction of rounding is modified when the value is negative, so that +0.5 rounds up to 1 and -0.5 rounds down to -1 (or both round to 0 if the sense is reversed). This is called symmetric rounding. It can be accomplished by adding 0.5 minus one LSB (in your case 0x7FF) and then connecting the most significant bit (the sign bit) to the carry in of the adder (invert the carry in to reverse the sense). If the carry-in is 0, then 0.5 is rounded down, and if it is 1 then 0.5 is rounded up. Symmetric rounding uses the sign to reverse the direction of rounding based on the sign of the input.
Rounding to even or odd is similar, except the value of the input bit corresponding to the LSB of the rounded output is used as the carry in, so that rounding of n.5 is always to the even (or to the odd) value. Round to even/odd is useful when you are rounding as part of an arithmetic process before complete results are available, because it is relatively easy to pre-compute the LSB value.

If quantization noise is an issue, then you can improve noise performance with feedback of the error introduced by rounding. That is a bit more complicated, so I'll save discussion of it for another time.

In summary:

input   truncation   simple   symmetric   rnd to even
-2.7        -3         -3        -3           -3
-2.5        -3         -2        -3           -2
-2.3        -3         -2        -2           -2
-1.5        -2         -1        -2           -2
-0.5        -1          0        -1            0
 0.5         0          1         1            0
 1.5         1          2         2            2
 2.5         2          3         3            2

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

Article: 81110
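The four behaviors Ray tabulates can be modeled directly. A Python sketch, operating on real numbers rather than fixed-point words purely to mirror the summary table (in hardware each of these is the add-then-truncate construction described in the post):

```python
import math

def truncate(x):
    """Drop the fraction bits of a two's-complement word:
    always rounds toward negative infinity."""
    return math.floor(x)

def simple_round(x):
    """Add half an output LSB, then truncate: 0.5 always rounds up."""
    return math.floor(x + 0.5)

def symmetric_round(x):
    """Round halves away from zero, using the sign to steer the
    direction (the carry-in trick from the post)."""
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

def round_to_even(x):
    """Exact halves go to the nearest even value; everything else
    rounds normally, so the half-step bias cancels on average."""
    f = math.floor(x)
    if x - f == 0.5:
        return f if f % 2 == 0 else f + 1
    return math.floor(x + 0.5)

# Reproduce the -2.5 row of the summary table:
modes = (truncate, simple_round, symmetric_round, round_to_even)
print([fn(-2.5) for fn in modes])  # -> [-3, -2, -3, -2]
```

Running all eight input values through these four functions reproduces the table row for row.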
Hi, I'm a poor student who purchased the 6.3 version of the EDK a few months back. Up till now I have been using the evaluation version of ISE 6.3 (too expensive for me). I have been unable to use the WebPack version of ISE 6.3 since "all" I have is a Virtex4 LX25. I had been planning to switch to the ISE 7.1 WebPack (which supports my device) but I am unable to get it working with my EDK 6.3 installation. I get the following error:

$XILINX does not point to an iSE 6.3 installation

Isn't it possible to use the ISE 7.1 WebPack with EDK 6.3?

thanks

Article: 81111
Hi dave,
> Jezwold wrote:
>> I quite agree with John_H its a mistake to compare FPGA functionality
>> and CPU functionality, they are just fundamentally different things. I
>> also think its a mistake to implement a CPU in a FPGA but I'll prolly
>> get flamed for saying that.
>
> Honest question, why is it a mistake?

Embedded CPUs are 'hot' at the moment. In 1997 I implemented a PIC controller in an Altera Flex FPGA as a proof-of-concept. The implementation ran at three times the speed of the PIC, but cost $150, versus the $8 of the PIC. Nowadays, FPGA gate pricing has come down to levels that make implementing a CPU on an FPGA economically viable. A NIOS II will easily fit into half an Altera EP1C3 FPGA costing around $12 (in low volume), leaving the other half for more specialized logic.

Then again, a PIC12/16/18 is available with lots of nice peripherals for prices around $6 or lower, so if you just want to use a CPU with some standard peripherals, then please just get a PIC (or a Cypress PSoC - they have a reprogrammable analog array as well!!). If there's some nonstandard digital function you need to build, and there happens to be a CPU on the board you're building as well, _then_ you may want to look at whether you can stuff the CPU in an unused corner of your FPGA. Otherwise, just go dedicated.

Best regards,

Ben

Article: 81112
dave wrote:
> I am probably opening a can of worms, but why are FPGAs so slow?

Dynamic versus static hardware configuration. A road from your home to your work place can be a lot faster if you do not require it to be used by thousands of other people who use the same road for different routes. You can get along without traffic lights, cross roads, and so on. It's the same for FPGAs: they have switches where an ASIC has wires.

So an FPGA will have a slower CPU than a CPU ASIC, and it will have a slower FIR filter than a FIR ASIC, but the FPGA can do both, which the other two can't. For that reason the FPGA is often the faster part: it is a faster CPU than the FIR ASIC, and a faster FIR filter than the CPU ASIC.

> The CPU cores at www.opencore.org represent an ever growing number of
> excellent and very practical, but slow processor implementations. What I
> mean is 240-500MHz FPGAs, when market CPUs are in the 3GHz range.

Clock speed is not everything. These people have a 90MHz FPGA ray tracing hardware that beats a P4-3GHz by a factor of four: http://www.saarcor.de/

> Surely there must be 1 or 2 GHz FPGAs available with sub-nanosecond
> gate switch/propagation times.

The typical carry chain delay of an FPGA is 50ps.

Kolja Sulimma

Article: 81113
Herb T wrote:
> Folks,
> I am trying to learn how to program Spartan II (XC2S100-5PQ208C) and
> Spartan 3 (XC3S400-4PQ208C) Xilinx FPGAs. I looked at the data sheets
> for these parts, and the more I do the more mystified I get.

You don't need to read the chip's data sheet in order to know how to program it. All you need to know is the I/O pin assignment on your FPGA board.

> Based on these descriptions, about how long does it take to write
> simple VHDL programs that work, or become fluent enough to know a good
> design from a chip fryer?

Start simple. Start by building a 2-input AND gate. That shouldn't take more than a few minutes. The tool should have the feature to connect the input and output of your AND gate to any I/O available in the FPGA chip. Digilent (an FPGA board vendor, www.digilentinc.com) has a video tutorial on programming their FPGA board. You can also look at www.engr.sjsu.edu/crabill - look at lab1. The author has a good tutorial on programming an FPGA board. He used Verilog, but the procedure for VHDL should be similar.

Good Luck!
Hendra

Article: 81114
KCL wrote:
> and what is the price of this board??

-1- TRND1 - Tornado board
Special Introductory price: 295 €uros VAT excl. + shipping

-2- TEK5 - Tornado Education Kit, 5 x boards + Tuition material
Special Introductory price: 1 480 €uros VAT excl. + shipping

Thanks for mentioning the wrong links, this Web page should have been updated anyway... my fault. The real site is either http://www.alse-fr.com (French) or http://www.alse-fr.com/english (guess :-). You'll find all the contact information there.

Both 1 & 2 ship with all you need, including the design software and ready-to-use VHDL and Verilog example(s), scripts etc... You still need a PC running Win2k or XP though (we haven't ported our USB programmer to Linux yet). And to get serious, a VHDL (or Verilog) simulator is indeed welcome.

A beginner should have his first FPGA synthesized and running in less than 15 minutes after unpacking the board. Everything from HDL code to bitstream download and board running requires only _one_ single command: a double-click on make.bat.

Teachers should like our ready-to-use Tuition material. Hobbyists should like the conditioned I/Os and RC servo. Experts should like the advanced features like fast ADC, Smart Card, I2C-PS2 links, USB transfers up to 1 Mbytes/s, etc...

Documents and Tuition materials are available in English or in French. Education Kit solutions are in VHDL only for now; they will be done in Verilog upon demand. Board examples are in both Verilog and VHDL.

Best regards,
Bert

Article: 81115
sam wrote:
> Couple of things
>
> One factor affecting the speed of the circuit is the process. Some of
> the FPGAs like Virtex-II are fabricated in .13um technology and are
> far behind the present CPU technology.
>
> However, the latest FPGAs from Xilinx and Altera are 90nm (same as P4).
> I think the reason for the performance difference for these guys is, as
> you suggested, because of the tools. ASICs like CPUs are carefully
> optimized for area, timing, power and so on at every level. With FPGAs,
> this is taken care of by the synthesis and PAR tools. All you do is
> code the design and let the synthesizer and PAR tools do their best
> possible job. I also think this process is getting further complicated
> by the inherent design of the FPGAs.

Anyway, it is strange that a PPC405 will only do ~500MHz in 90nm in 2005 while a P4 in 90nm does >3GHz (a factor of 6!!!). It suggests that the FPGA silicon is far less optimized than a CPU's would be, since the process used for modern FPGAs equals that of the P4. Of course it all depends on the maximum levels of logic between two clock edges, expressed in FO4 delays: the fewer the levels, the tougher the design. A P4 is not a "tough" design in this perspective, and the PPC405 is not either, so the question remains why... Maybe the P4 is designed transistor by transistor, while a PPC405 in a V4 is only synthesized by some less efficient synthesis tools?? For sure it gives hope for the next generation of FPGAs.

Roel

Article: 81116
Hi, I am trying to figure out how to evaluate the speed of Distributed Arithmetic architectures for FIR filter design. I was referring to the paper "A Guide to Field Programmable Gate Arrays for Application Specific Digital Signal Processing Performance" by G.R. Goslin at Xilinx. The paper says that by using a parallel distributed arithmetic architecture (where more bits of the inputs are processed at the same time), greater sampling speed (number of samples per second) can be achieved compared to serial distributed arithmetic.

I am confused about this. According to me, if you pipeline the design, you can achieve the sampling speed you want. I can see how, using more resources, you can achieve shorter latency, but the sampling speed should not be affected. There is probably something I am missing here. If any of you are familiar with this field and know the answer, please let me know.

Thank you,
Anup

Article: 81117
"anup" <anuphosh@yahoo.com> writes:
> I am confused about this. According to me, if you pipeline the design,
> you can achieve the sampling speed you want. I can see how using more
> resources, you can achieve shorter latency, but the sampling speed
> should not be affected. There is probably something I am missing here.

Adding pipelining is usually done to reduce the cycle time (increasing the clock rate) while also increasing the latency.

Suppose you wanted to build a floating-point multiply-and-add unit. Perhaps if you make it fully combinatorial, it takes 100 ns for each cycle. You can process ten million samples per second, and the latency is 100 ns.

Suppose instead you break it up into a pipeline with a combinatorial multiplier, a pipeline register, and a combinatorial adder. Suppose the multiplier and adder each take 60 ns separately, and the pipeline register setup and clock-to-output-valid time adds 10 ns. Now your latency (full operation time) is 130 ns, which is longer. But your sample clock can be as fast as 70 ns, so you can process over fourteen million samples per second.

Now suppose you internally pipeline the actual adder and the multiplier. Perhaps each has three stages that take 20 ns each, and you still have 10 ns of delays for combined setup and clock-to-output-valid time of the pipeline registers. Now you have a latency of 170 ns, which is longer yet. But your sample clock can now be 30 ns, so you can process over thirty million samples per second.

Note that the times used in this example are probably not representative of real times for any actual system.

Eric

Article: 81118
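Eric's three cases follow one accounting rule: the cycle time is the slowest stage plus one register overhead, while the latency is the sum of all stages plus one register overhead per pipeline boundary. A small Python sketch of that accounting (the function name is just illustrative):

```python
def pipeline_timing(stage_ns, reg_ns=10):
    """Return (latency_ns, cycle_ns, msamples_per_s) for a pipeline whose
    combinatorial stages take stage_ns each; reg_ns is the register
    setup + clock-to-output overhead paid at each pipeline boundary."""
    n = len(stage_ns)
    latency = sum(stage_ns) + (n - 1) * reg_ns   # one overhead per boundary
    cycle = max(stage_ns) + (reg_ns if n > 1 else 0)
    return latency, cycle, 1000.0 / cycle

print(pipeline_timing([100]))      # -> (100, 100, 10.0)    combinatorial
print(pipeline_timing([60, 60]))   # -> (130, 70, ~14.3)    two stages
print(pipeline_timing([20] * 6))   # -> (170, 30, ~33.3)    six stages
```

Each extra register stage makes the latency worse and the throughput better, which is exactly the latency/sampling-speed distinction anup was asking about.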
Hi, I was wondering if anyone has tried to use a STAPL file generated from the Xilinx iMPACT software to program a XCF02S using Altera's Jam software or any other third-party programming software. I believe the iMPACT software generates incorrect data streams that are used to program the device. The thing is, the ACA data format used to program generates the correct bit stream, but the Hex data format seems to be scrambled. I verified this by reading back the program written to the device and comparing it to the *.mcs file generated by iMPACT. When the two files are compared, locations that were programmed with the hex format were scrambled; those that used the ACA format were fine. Any comments?

thanks
Dave Colson

Article: 81119
Roel wrote:
> Anyway, it is strange that a PPC405 will only do ~500MHz in 90nm in 2005
> while a P4 in 90nm does >3GHz (a factor of 6!!!),

No, it is not. Even outside FPGAs the market share of slow processors is a lot larger than that of fast processors. For every 3GHz P4 there are dozens of 200MHz RISCs and hundreds of 10MHz MCUs sold. Why should Xilinx go for the exotic niche market that desktop PCs are from a processor builder's view?

> it suggests that the FPGA
> silicon is far less optimized than a CPU's would be,

Both are optimized for different optimization goals. The PPC405 is by far smaller than a P4 and uses a lot less power. Here is a recent processor that the makers of the P4 consider highly optimized. It runs at up to 520MHz: http://www.intel.com/design/embeddedpca/products/pxa270/techdocs.htm

Two P4 cores would burn more than a hundred watts. They would also need large caches and many I/O pins to access external memory quickly enough. (There is an empirical law in computer architecture that memory scales with performance.)

> the process used for
> modern FPGAs equals that of the P4.

No, it used to be that DRAMs used the most advanced technology first; now that switches more and more to FPGAs. CPUs usually adopt the technology many months later.

Kolja Sulimma

Article: 81120
Hi Eric, thank you for your response. I guess I did not frame my question properly. Your answer actually strengthens my doubt. According to what you said, by using more pipeline stages, you can increase the sampling speed. But I read this article where they use more resources to make use of the parallelism in the application (FIR filters) and reduce the latency - and they claimed that the sampling speed is also increased. My doubt is that, even in the original design (with fewer resources), you can achieve higher sampling speed by pipelining the design.

One relation I see between "more resources" and "sampling speed" is that you need fewer pipeline stages to achieve higher sampling speed. For example, consider this computation:

Y = Y + A[i]*X[i];

Assume that I have only 2 adders. I can implement this as:

Y1 = A[i]*X[i] + A[i+1]*X[i+1];
Y  = Y + Y1;

Assume that it takes 10 ns to do (A[i]*X[i] + A[i+1]*X[i+1]). Therefore, without pipelining the adder, I can achieve sampling of 200 million samples per second. Now suppose I have 4 adders; I can do:

Y1 = A[i]*X[i] + A[i+1]*X[i+1];
Y2 = A[i+2]*X[i+2] + A[i+3]*X[i+3];
Y' = Y1 + Y2;
Y  = Y + Y';

Now I can achieve a sampling of 400 million samples per second. The original system (with 2 adders) can also achieve the same sampling speed if the adder is pipelined. I guess that if there is a limit on the amount of pipelining that you can do, adding more resources is the way to increase sampling speed. (I am trying to answer my own question here.)

Anyways, thanks for your help.
-Anup

Eric Smith wrote:
> "anup" <anuphosh@yahoo.com> writes:
> > I am confused about this. According to me, if you pipeline the design,
> > you can achieve the sampling speed you want. I can see how using more
> > resources, you can achieve shorter latency, but the sampling speed
> > should not be affected. There is probably something I am missing here.
> > Adding pipelining is usually done to reduce the cycle time (increasing > the clock rate) while also increasing the latency. > > Suppose you wanted to build a floating-point multiply-and-add > unit. Perhaps if you make it fully combinatorial, it takes > 100 ns for each cycle. You can process ten million samples per > second, and the latency is 100 ns. > > Suppose instead you break it up into a pipeline with a combinatorial > multiplier, a pipeline register, and a combinatorial adder. Suppose the > multiplier and adder each take 60 ns separately, and the pipeline > register setup and clock-to-output-valid time adds 10 ns. Now your > latency (full operatin time) is 130 ns, which is longer. But your > sample clock can be as fast as 70 ns, so you can process over fourteen > million samples per second. > > Now suppose you internally pipeline the actual adder and the multiplier. > Perhaps each have three stages that take 20 ns each, and you still have > 10 ns of delays for combined setup and clock-to-output-valid time of the > pipeline registers. Now you have a latency of 170 ns, which is longer > yet. But your sample clock can now be 30 ns, so you can process over > thirty million samples per second. > > Note that the times used in this example are probably not representative > of real times for any actual system. > > EricArticle: 81121
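Anup's 200/400 Msps arithmetic comes down to one line: sampling rate is the number of input samples absorbed per cycle divided by the cycle time. A quick Python check of his two cases (the function name is just for illustration):

```python
def sample_rate_msps(samples_per_cycle, cycle_ns):
    """Throughput in million samples/s: inputs consumed each clock,
    divided by the cycle time set by the unpipelined adder tree."""
    return samples_per_cycle * 1000.0 / cycle_ns

print(sample_rate_msps(2, 10))  # 2 adders, 10 ns cycle -> 200.0 Msps
print(sample_rate_msps(4, 10))  # 4 adders, same cycle  -> 400.0 Msps
```

Both knobs work: doubling the parallel resources doubles samples_per_cycle, while pipelining the adder shrinks cycle_ns - which is why the same target rate can be reached either way.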
Roel wrote:
> sam wrote:
>> Couple of things
>>
>> One factor affecting the speed of the circuit is the process. Some of
>> the FPGAs like Virtex-II are fabricated in .13um technology and are
>> far behind the present CPU technology.
>>
>> However, the latest FPGAs from Xilinx and Altera are 90nm (same as P4).
>> I think the reason for the performance difference for these guys is, as
>> you suggested, because of the tools. ASICs like CPUs are carefully
>> optimized for area, timing, power and so on at every level. This is
>> taken care of by the synthesis and PAR tools while designing the
>> circuits using FPGAs. All you do is code the design and let the
>> synthesizer and PAR tools do their best possible job. I also think this
>> process is getting further complicated by the inherent design of the
>> FPGAs.
>
> Anyway, it is strange that a PPC405 will only do ~500MHz in 90nm in 2005
> while a P4 in 90nm does >3GHz (a factor of 6!!!). It suggests that the
> FPGA silicon is far less optimized than a CPU's would be; the process
> used for modern FPGAs equals that of the P4. Of course it all depends on
> the maximum levels of logic between two clock edges, expressed in FO4
> delays: the fewer the levels, the tougher the design. A P4 is not a
> "tough" design in this perspective, and the PPC405 is not either, so the
> question remains why... Maybe the P4 is designed transistor by
> transistor, while a PPC405 in a V4 is only synthesized by some less
> efficient synthesis tools?? For sure it gives hope for the next
> generation of FPGAs.

You need to be careful to compare BUS speeds rather than CLK speeds. CLK speed can refer to how fast a single node in the chip toggles (== marketing fluff), and on that basis the FPGAs could be pitched as 10GHz devices :) since the top-end ones can do 10GHz comms.... Once you work with bus-bandwidth numbers, the differences greatly reduce, and bus bandwidth is determined as much by the memory devices as it is by the CPU/FPGA process.

FPGAs have more general I/Os (and can easily make a bus wider), whilst PCs try to save pins and can focus the I/O purely on DDR memory.

-jg

Article: 81122
Use a series resistor of at least 3.3 kohm to keep the current under 1 mA. Most LEDs will give out enough light at this current to be visible.

Glenn.

Sea Squid wrote:
> I want to experiment with the parallel port with eight LEDs tied to
> a cut parallel port cable, then send instructions with Visual Basic
> to create some patterns. Is there any danger to my laptop?
> Thanks.

Article: 81123
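The 3.3k figure is consistent with simple Ohm's-law sizing. A sketch in Python - the 5 V port-high level and ~2 V LED forward drop are assumptions for illustration, not values from the post:

```python
def min_series_resistance(v_drive, v_led, i_max):
    """Smallest series resistor that keeps the LED current at or
    below i_max: the resistor drops whatever the LED doesn't."""
    return (v_drive - v_led) / i_max

# Assumed 5 V parallel-port high, ~2 V red-LED forward drop, 1 mA limit:
print(min_series_resistance(5.0, 2.0, 1e-3))  # ~3000 ohms
```

3.3 kohm, the next common E12 value above 3 kohm, then keeps the current safely under 1 mA even if the drive voltage runs a little high.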
"morpheus" <saurster@gmail.com> wrote in message news:1111094017.939450.52810@o13g2000cwo.googlegroups.com...
> Hi All,
> If anyone knows of a bit rounding algorithm, please forward the
> information to me. I am trying to round off 24 bits to 12 bits.
> Thanks
> MORPHEUS
> p.s. the 24 bits is the result of an addition between two 24 bit
> numbers. I need to round off the result and feed it to a 12-bit DAC.
> THNX

MORPHEUS, I know you are asking specifically about rounding, but I am a little concerned about the addition of two 24 bit numbers with a 24 bit result. You need a 25 bit result. Do you have overflow issues here?

Doug

Article: 81124
Thank you Jim. I was aware that data2mem takes in a FULL bitstream of my compiled design and outputs an updated FULL bitstream of the design. Since I am using a Virtex 6000, the time required to configure the FPGA becomes intolerable. I am able to write automation scripts to employ the "small bit manipulation" trick to compare two bitstreams and get a differential partial bitstream, but I am concerned whether this is the right approach. Besides that, is it possible to automate iMPACT to configure the FPGA, for example 1 configuration per 5 minutes, since I intend to do an exhaustive test of all the 1000 input vectors?

"Jim Wu" <nospam@nospam.com> wrote in message news:d1btiq$18g1@cliff.xsj.xilinx.com...
> Check the "data2mem" program installed with ISE.
>
> HTH,
> Jim
> jimwu88NOOOSPAM@yahoo.com (remove capital letters)
> http://www.geocities.com/jimwu88/chips
>
> "Sea Squid" <Sea.Squid@hotmail.com> wrote in message
> news:4238e536$1@news.starhub.net.sg...
> > I made use of two 1K*10B single port RAMs generated with coregen
> > which is modified to contain my test vector, and P&R with that. However,
> > I have one thousand test vector files in plain text to send to the FPGA
> > one at a time.
> >
> > I am wondering about whether I can write a perl script to manipulate the
> > bitstream and generate an *incremental* bitstream so that I can avoid
> > running ISE for one thousand times? Where can I find such information?
> >
> > Thanks.