John_H wrote:
>> http://dwb.unl.edu/Teacher/NSF/C01/C01Links/www.ualberta.ca/~bderksen/windowpane.html
>
> Thanks for the link. The information presents some concepts with glass
> deformation that I wasn't familiar with, but the comment "This author
> believes that the correct explanation lies in the process by which window
> panes were manufactured at that time", without support for why the thicker
> end would always be installed on the bottom, leaves me with the issue still
> open. I'll nudge it more toward "maybe."
>
> Apologies to those disturbed by how off topic this got.

Well, someone please go turn one upside down, measure it with a micrometer, and make a note of it, so at least our grandkids will be able to settle this. (Make sure it's not near where said grandkids are likely to be playing ball in the years before they gain an appreciation for experimental science.)
Article: 101926

Symon,

You have to be careful if you have differing thermal coefficients of expansion (if you epoxy it to the pcb). If the part heats up, and the pcb heats up, you would like the coefficients of thermal expansion to all be the same, so you do not shear the solder bumps off the pcb.

It is a much bigger problem with flip chip, to match all the coefficients and mount the die to the substrate such that it will tolerate many years of thermal cycling without cracking of the solder bumps. I suspect this is a wire bond, epoxy/plastic molding compound type of application, where the FPGA package is of similar material to the pcb, so thermal stresses should be minimal.

Why not use an RF link to send the data from sensors in the wheel to somewhere else? How the heck does one power a pcb in a wheel? Sounds like there are a ton of problems to solve. How do you communicate with the wheel? "Yo, wheel..."?

Austin

Symon wrote:
> "Peter Alfke" <peter@xilinx.com> wrote in message
> news:1147110712.154203.47590@i39g2000cwa.googlegroups.com...
>
> > Let's remember that the g-forces have a direction (outward), and it is
> > up to the pc-board designer to take advantage of this.
> > Peter Alfke
>
> Peter,
> You raise an interesting point. I wonder if the yield stress limit is the
> same for compression as for expansion?
> Sod it, just epoxy the damn thing to the board! :-)
> Cheers, Syms.
Article: 101927

Piotr Wyderski wrote:
> JJ wrote:
>
> > I have fantastic disbelief about that 6 ops/clock except in very
> > specific circumstances, perhaps in a video codec using MMX/SSE etc where
> > those units really do the equiv of many tiny integer codes per cycle on
> > 4 or more parallel 8 bit DSP values.
>
> John, of course it is about peak performance, reachable with great effort.

Of course; I don't think we differ much in opinion on the matter. But I prefer to stick to avg throughputs available with C codes. In summary, I think any HW acceleration is justified when it is pretty much busy all the time, embedded, or at least can shrink very significantly the time spent waiting to complete, but few opportunities are going to get done, I fear, since the software experts are far from having the know-how to do this in HW. For many apps where an FPGA might barely be considered, one might also look at the GPUs or the PhysX chip, or maybe wait for ClearSpeed to get on board (esp for flops), so the FPGA will be the least visible option.

> But the existence of every accelerator is explained only when even that
> peak performance is not enough. Otherwise you simply could write better
> code at no additional hardware cost. I know that in most cases the CPU
> sleeps because of lack of load or stalls because of a cache miss, but it
> is completely different song...
>
> > Now that's looking pretty much like what FPGA DSP can do pretty trivially
> > except for the clock ratio 2GHz v 150MHz.
>
> Yes, in my case a Cyclone @ 65MHz (130MHz internally + SDR interface,
> 260 MHz at the critical path with timesharing) is enough. But it is a
> specialized waveforming device, not a generic-purpose computer. As a
> processor, it could reach 180MHz and then stabilize -- not an impressive
> value today, not to mention that it contains no cache, as BRAMs are too
> precious a resource to be wasted that way.

The BRAMs are what define the opportunity: 500-odd BRAMs all whacking data at say 300MHz, dual ported, is orders more bandwidth than any commodity cpu will ever see, so if they can be used independently, FPGAs win hands down. I suspect a lot of poorly executed software-to-hardware conversion combines too many BRAMs into a single large and relatively very expensive SRAM, which gives all the points back to cpus. That is also the problem with soft core cpus: to be useful you want lots of cache, but merging BRAMs into useful-size caches throws all their individual bandwidth away. That's why I propose using RLDRAM, as it allows FPGA cpus to use 1 BRAM each and share RLDRAM bandwidth over many threads, with full associativity of memory lines using a hashed MMU structure, IPT sort of.

> > A while back, Tom's Hardware did a comparison of 3GHz P4s v the P100 1st
> > pentium and all the in-betweens, and the plot was basically linear
>
> Interesting. In fact I don't care about P4, as its architecture is one
> big mistake, but linear speedup would be a shame for a Pentium 3...

Tom's IIRC didn't have AMD in the lineup; must have been 1-2 yrs ago. The P4 end of the curve was still linear, but the tests are IMO bogus as they push linear memory tests rather than the random test I use. I hate when people talk of bandwidth for blasting GB of contiguous large data around and completely ignore pushing millions of tiny blocks around.

> > benchmark performance, it also used perhaps 100x the transistor count
>
> Northwood has 55 million, the old Pentium had 4.5 million.

100x is overstating it a bit, I admit, but the turn to multi cores puts cpus back on the same path as FPGAs: Moore's law for quantity rather than raw clock speed, which keeps the arguments for and against relatively constant.

> > as well and that is all due to the Memory Wall and the necessity to
> > avoid at all costs accessing DRAM.
>
> Yes, that is true. 144 MiB of caches of a POWER5 does help.
> A 1.5GHz POWER5 is as fast as a 3.2GHz Pentium 4 (measured
> on a large memory-hungry application). But you can buy many P4s
> at the price of a single POWER5 MQM.
>
> > Try running a random number generator say R250 which can generate a new
> > rand number every 3ns on an XP2400 (9 ops IIRC). Now use that no. to
> > address a table >> 4MB. All of a sudden my 12Gops Athlon is running at
> > 3MHz, ie every memory access takes 300ns
>
> Man, what 4MiB... ;-) Our application's working set is 200--600MiB. That's
> the PITA! :-/

Actually I ran that test from 32k, doubling until I got to my ram limit of 640MB (no swapping) on a 1GB system, and the speed reduction is sort of a staircase log. At 32K obviously no real slowdown; the step bumps indicate the memory system gradually failing -- L1, L2, TLB. After 16M the drop to 300ns can't get any worse, since the L2 and TLBs have long failed, having so very little associativity. But then again it all depends on temporal locality: how much work gets done per cache line refill, and is all the effort of the cache transfer thrown away every time (trees), or only some of the time (code)?

In the RLDRAM approach I use, the Virtex-2 Pro would effectively see 3ns raw memory issue rates for fully random accesses, but the true latency of 20ns is well hidden, and the issue rate is reduced probably 2x to allow for rehashing and bank collisions. Still, a 6ns issue rate v 300ns for fully random access is something to crow about. Of course the technology would work even better on a full custom cpu. The OS never really gets involved to fix up TLBs since there aren't any; the MMU does the rehash work. The 2 big penalties are that tagging adds 20% to memory cost (1 tag every 32 bytes) and that, with hashing, the store should be left <80% full, but memory is cheap, bandwidth isn't.

> > So on an FPGA cpu, without OoO, no Branch prediction, and with tiny
> > caches, I would expect to see only about .6 to .8 ops/cycle and
> > without caches
>
> In a soft DSP processor it would be much less, as there is much vector
> processing, which omits (or at least should) the funny caches built of
> BRAMs.

DSP has highly predictable data structures and high locality, not much tree walking, so SDRAM bandwidth can be better used directly; still, code should be cached.

> > I have no experience with the Opterons yet, I have heard they might be
> > 10x faster than my old 1GHz TB but I remain skeptical based on past
> > experience.
>
> I like the Cell approach -- no cache => no cache misses => tremendous
> performance. But there are only 256KiB of local memory, so it is
> restricted to specialized tasks.

I suspect Cell will get used to accelerate as many apps as FPGAs or more, but it is so manually cached. I can't say I like it myself: so much theoretical peak, but how to get at it? I much prefer the Niagara approach to cpu design, if only the memory was done the same way.

> Best regards
> Piotr Wyderski

regards

John Jakson
transputer guy
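JJ's bandwidth point is easiest to see in HDL. Below is a minimal sketch, not taken from his design (the module and port names are invented here), of a dual-port RAM coded so that synthesis infers one block RAM per instance; keeping instances separate like this preserves each BRAM's private two-port bandwidth instead of merging them into one large, slower SRAM:

    // Sketch only: one independently addressed dual-port RAM.
    // Each instance should infer a single BRAM (e.g. 512 x 18),
    // keeping its own two ports of bandwidth.
    module dp_bram #(
        parameter AW = 9,    // address width: 512 entries
        parameter DW = 18    // data width: one 18-bit BRAM slice
    )(
        input               clk,
        // Port A: read/write
        input               we_a,
        input  [AW-1:0]     addr_a,
        input  [DW-1:0]     din_a,
        output reg [DW-1:0] dout_a,
        // Port B: read only
        input  [AW-1:0]     addr_b,
        output reg [DW-1:0] dout_b
    );
        reg [DW-1:0] mem [0:(1<<AW)-1];

        always @(posedge clk) begin
            if (we_a) mem[addr_a] <= din_a;
            dout_a <= mem[addr_a];
        end

        always @(posedge clk)
            dout_b <= mem[addr_b];
    endmodule

Hundreds of these, each clocked and addressed independently, is the aggregate bandwidth JJ is comparing against a commodity cpu's memory bus.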
Article: 101928

Andreas Ehliar wrote:
> On 2006-05-07, JJ <johnjakson@gmail.com> wrote:
> > I would say that if we were to see PCIe on chip, even if on a higher $
> > part, we would quickly see a lot more co-pro board activity, even just
> > plain vanilla PC boards.
>
> You might be interested in knowing that Lattice is doing just that in
> some of their LatticeSC parts. On the other hand, you are somewhat
> limited in the kinds of application you are going to accelerate since
> LatticeSC does not have embedded multipliers IIRC. (Lattice is
> targetting communication solutions such as line cards that rarely need
> high performance multiplication in LatticeSC.)
>
> /Andreas

Yeah, I have been following Lattice more closely recently; it will take me some time to evaluate their specs more fully. I may get more interested if they have a free-use tool chain I can redo my work with.

Does anyone have PCIe on chip though?

John Jakson
transputer guy
Article: 101929

On a sunny day (Mon, 08 May 2006 11:11:38 -0700) it happened Austin Lesea
<austin@xilinx.com> wrote in <e3o1kq$rf35@xco-news.xilinx.com>:

> Why not use an RF link to send the data from sensors in the wheel to
> somewhere else?

You do not want the cellphone to jam your brakes....

> How the heck does one power a pcb in a wheel?

Rotary transformer (the classic solution, which can also be used for data transfer): basically 2 halves of a pot core.

One could also make a mechanical generator with some mass; every time the speed changes it would turn, rotating a magnet, like those old self-winding mechanical watches.

I'd go with the transformer.
Article: 101930

Piotr Wyderski wrote:
> Andreas Ehliar wrote:
>
> > One interesting application for most of the people on this
> > newsgroup would be synthesis, place & route and HDL simulation.
> > My guess would be that these applications could be heavily
> > accelerated by FPGA:s.
>
> A car is not the best tool to make other cars.
> It's not a bees & butterflies story. :-) Same with FPGAs.

Well, xyz auto workers do eat their own, usually subsidised by the employer.

I disagree. In a situation where FPGAs developed relatively slowly and P/R jobs took many hours, there would be a good opportunity to use FPGAs for just such a job. But then again, FPGAs and the software are evolving too fast, and P/R jobs in my case have gone from 8-30 hrs a few years ago to a few minutes today, so the incentive has gone. If I were paying $250K like the ASIC guys do for this and that, a hardware coprocessor might look quite cheap, and the EDA software is much more independent of the foundries. DAC usually has a few hardware copro vendors, most of them based on FPGAs. At one time some of those were even done in full custom silicon; that was really eating your own.

> > My second guess that it is far from trivial to actually do this :)
>
> And who actually would need that?

I would be rather amazed if in a few years my 8 core Ulteron x86 chip was still running EDA tools on 1 core.

> Best regards
> Piotr Wyderski

John Jakson
transputer guy
"Symon" <symon_brewer@hotmail.com> wrote in message news:445f8769$0$15784$14726298@news.sunsite.dk... > > Sod it, just epoxy the damn thing to the board! :-) > Cheers, Syms. Epoxy adds mass. Does the improved tolerance to shear stress (assuming lateral g-force here) outweigh the effects of added mass? (I like the term "outmass" versus "outweigh")Article: 101932
Article: 101932

Thomas Womack wrote:
> As you might imagine, I would be ecstatic to see a few wide
> multipliers appearing in FPGAs - a 64x64->128 unit isn't _that_ large
> an IP block

Hi Tom. No, a 64x64->128 integer multiplier isn't that large at all. Here is the device utilization report for mine:

Device utilization summary:
---------------------------
Selected Device: 3s500epq208-4
  Number of Slices:            557 out of 4656   11%
  Number of Slice Flip Flops:  370 out of 9312    3%
  Number of 4 input LUTs:      867 out of 9312    9%
  Number of bonded IOBs:        18 out of  158   11%
  Number of GCLKs:               1 out of   24    4%

Keep in mind though that my primary concern at present is minimizing LUT (gate) count, not speed, so this multiplier requires N clock cycles to multiply two N bit numbers together, yielding a 2N length result.

If you'd like, I'll be happy to send you a copy of the Verilog source code for my multiplier and the combination multiplier/modulo module (which is only slightly larger than the multiplier module). It's 110 lines of Verilog.

Regards,
Ron
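To make the N-cycle trade-off concrete, here is a minimal sketch of the textbook iterative shift-add multiplier that Ron's description implies. To be clear, this is not Ron's 110-line module; the ports and control here are invented. The point is the structure: one WIDTH-bit adder reused for WIDTH clocks, which is why the LUT count stays low:

    // Sketch of a textbook N-cycle shift-add multiplier (not Ron's code).
    // One conditional add per clock, so LUT cost is dominated by a
    // single WIDTH-bit adder; result is ready after WIDTH cycles.
    module shift_add_mult #(parameter WIDTH = 64) (
        input                    clk,
        input                    start,   // pulse high to load operands
        input  [WIDTH-1:0]       a,       // multiplicand
        input  [WIDTH-1:0]       b,       // multiplier
        output reg               done,
        output reg [2*WIDTH-1:0] product
    );
        reg [WIDTH-1:0] mcand;
        reg [7:0]       count;   // wide enough for WIDTH up to 255

        // Add the multiplicand into the high half when the current
        // multiplier bit (the product LSB) is set; keep the carry.
        wire [WIDTH:0] sum = product[2*WIDTH-1:WIDTH]
                           + (product[0] ? mcand : {WIDTH{1'b0}});

        always @(posedge clk) begin
            if (start) begin
                mcand   <= a;
                product <= {{WIDTH{1'b0}}, b};  // low half holds multiplier
                count   <= WIDTH;
                done    <= 1'b0;
            end else if (count != 0) begin
                product <= {sum, product[WIDTH-1:1]};  // add, then shift right
                count   <= count - 1'b1;
                done    <= (count == 1);
            end
        end
    endmodule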
Article: 101933

This news is interesting:

http://www.eet.com/news/latest/showArticle.jhtml?articleID=187200783&pgno=2

You do need a hype filter when reading this, and many claims are extrapolation-gone-wrong, but the base idea already exists in ring osc designs inside FPGAs now.

Seems (with the right tools) you could extend this inside an FPGA by creating a large physical ring (long routes) with the sprinkled buffers. The physical delays would reduce the process variations in the clock, and you get the phase taps 'for free' - but the tools _will_ need to co-operate :)

We have done this inside CPLDs, and get approx 1.3ns granularity. With FPGAs the buffer delays are much lower, and the routing can be made to dominate.

Sounds like a project for Antti :)

-jg
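For anyone wanting to experiment, the "phase taps for free" idea looks roughly like the sketch below. A hedged illustration only (invented module and names): in simulation these assigns have zero delay, the real tap spacing comes entirely from the physical buffers and routes, and tool-specific keep/placement constraints (a generic attribute is shown) are essential to stop the chain from being collapsed:

    // Sketch: a kept buffer chain whose taps are sampled on the next
    // clock edge, yielding a thermometer code of when sig_in arrived.
    // Real tap delays come from placement/routing, not from this source.
    module tapped_delay #(parameter TAPS = 16) (
        input                 clk,
        input                 sig_in,
        output reg [TAPS-1:0] taps_q
    );
        (* keep = "true" *) wire [TAPS-1:0] chain;

        assign chain[0] = sig_in;
        genvar i;
        generate
            for (i = 1; i < TAPS; i = i + 1) begin : dly
                // Each hop is one buffer/route delay on silicon.
                assign chain[i] = chain[i-1];
            end
        endgenerate

        always @(posedge clk)
            taps_q <= chain;   // sub-cycle arrival time, thermometer coded
    endmodule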
Article: 101934

Symon wrote:
> To be pedantic, 400g is a pretty substantial _acceleration_ on a mass!
>
> The upshot is, if you want to put an unclamped FPGA in a tyre, I suggest
> your FPGA has big balls. (With apologies to AC/DC!)

Or, "Lots of Balls" .. :)

-jg
Article: 101935

Anyone know a UK stockist for the Xilinx/Digilent S3 board? (Or anyone have a spare one - I only want the PCB.)

The Xilinx website seems to have discontinued it & only lists Cedar as UK disti, & their website has minimal info, and Avnet only list their own boards as far as I can see.

Digilent want $32 to ship USPS (which would cost $9 in a GP flat-rate envelope).
Article: 101936

Do you have a counter (without clocks)?
Article: 101937

frank wrote:
> uh, guy - why the hell u wanna brute force rsa with an fpga.
> there r quite better (faster and cheaper) methods to do so.

Example please? RSA-640 was solved with a distributed network of something like 80 Opterons doing sieving. I wouldn't call those "cheap."

> hope u calculated the throughput and the years/centurys of trying.

That's one of the shortcomings of ECM that Tom touched on earlier. Unlike traditional factorization methods, ECM doesn't even guarantee any result at all! Because of that, I had to have two status LEDs: one to indicate completion, and another to indicate whether or not a solution was found. The average throughput rate will hopefully be blazingly fast at about one or two bits per day. ;-)

There is no input to the FPGA because the number to be factored is hard coded into the FPGA (although I could easily read it from an external device if needed), and the factor (if found) will be displayed on the board's LCD display, so the only thing connected to the board during operation is power.

Because of the probabilistic nature of ECM, to the best of my knowledge no one has ever been able to calculate how long ECM would require on average for a particular factorization. I wonder if Tom Womack has investigated this in his work with ECM?

Ron
Article: 101938

Ron, it's amazing how nice and patient you can be when you want to...

Greetings
Peter Alfke
Article: 101939

Thomas Womack wrote:
> In article <1147022814.787257.294510@i39g2000cwa.googlegroups.com>,
> Peter Alfke <alfke@sbcglobal.net> wrote:
>
> > Ron wrote:
> > > So to multiply two 704 bit numbers
> > > together (depending upon how it's implemented of course) would require
> > > roughly sixty 64-bit multiplies and a bunch of adds. ...
> >
> > If I remember right, 704 is 11 times 64, so the multiplication would
> > take 121 of those 64-bit multipliers, not "roughly sixty"...
>
> It depends on precise details of the implementation, and you have to
> write moderately ugly code because the x86 multiply instruction
> produces its outputs in fixed registers, but if you apply

If you use the IMUL instruction (signed multiply), you free yourself from that restriction, but you have to make certain that your product will fit in 32 bits and that your values stay in their restricted range (no overflow).
Article: 101940

A counter has a clock by definition. The clock is the signal you are counting. It either comes in from the outside, or you can generate an internal clock by means of a string of buffers plus one inverter, connected back to the input (a ring oscillator).

The frequency stability is bad, + or - 50%, but in some cases (like this one) nobody cares. Anything between 1 kHz and 100 MHz would do the job.

Peter Alfke
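A minimal sketch of what Peter describes, with invented names. Caveats: the keep attribute syntax is tool-dependent, the #1 delays exist only so the loop oscillates in simulation (the real period is set by the physical buffer and routing delays), and enable must start low so the loop initializes out of the unknown state in a 4-state simulator:

    // Sketch: ring oscillator (odd number of inversions) clocking a counter.
    module ring_osc_counter #(
        parameter STAGES = 5,     // buffers after the single inverting stage
        parameter CBITS  = 16
    )(
        input              enable,   // drive low first, then raise to run
        output [CBITS-1:0] count
    );
        (* keep = "true" *) wire [STAGES:0] chain;

        // One inverting stage gated by enable, then a buffer string.
        assign #1 chain[0] = enable & ~chain[STAGES];

        genvar i;
        generate
            for (i = 0; i < STAGES; i = i + 1) begin : buf_chain
                assign #1 chain[i+1] = chain[i];
            end
        endgenerate

        reg [CBITS-1:0] cnt = 0;
        always @(posedge chain[STAGES])
            cnt <= cnt + 1'b1;

        assign count = cnt;
    endmodule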
Article: 101941

Xilinx gives complete data on thermal resistance with and without heatsink and airflow.

We cannot give blanket data for power consumption, because it depends on the aggregate frequency-times-capacitance product of every node (assuming the same Vcc for all of them). That is a problem shared by all programmable devices, but not shared by ASICs and ASSPs, like microprocessors. They usually operate under fairly well-specified internal conditions. FPGAs do not.

"Worst case" would be a shift register running at max frequency. Such a design is not only unrealistic, but would most likely overheat even with the best heatsink. But if you reduce the frequency, you can easily test this "worst-case" design. Just do not overdo the frequency...

Peter Alfke
Article: 101942

Just for fun, here are the figures for a bus-width of 704 bits and 1024 bits.

Device utilization summary: (704 bit bus-width)
---------------------------
Selected Device: 3s500epq208-4
  Number of Slices:           2592 out of 4656   55%
  Number of Slice Flip Flops: 1779 out of 9312   19%
  Number of 4 input LUTs:     4176 out of 9312   44%
  Number of bonded IOBs:        18 out of  158   11%
  Number of GCLKs:               1 out of   24    4%

Device utilization summary: (1024 bit bus-width)
---------------------------
Selected Device: 3s500epq208-4
  Number of Slices:           2975 out of 4656   63%
  Number of Slice Flip Flops: 2099 out of 9312   22%
  Number of 4 input LUTs:     4896 out of 9312   52%
  Number of bonded IOBs:        18 out of  158   11%
  Number of GCLKs:               1 out of   24    4%

The amazing thing is that the slice and LUT counts seem to increase *less* than the bus-width (i.e., the size of the numbers it can multiply) increases. I've taken pains to ensure the optimizer isn't optimizing something away that it shouldn't, so as far as I know these numbers are correct.

The synthesizer reports a maximum frequency of 58MHz for the 64 bit design, 16MHz for the 704 bit design, and 12MHz for the 1024 bit design "as is", without any tweaking to improve the timing, so it should take about 1.1 microseconds to multiply two 64 bit numbers together, and 85 microseconds to multiply two 1024 bit numbers together.

Ron
Article: 101943

On Mon, 08 May 2006 16:07:18 -0700, Ron <News5@spamex.com> wrote:

> Just for fun, here are the figures for a bus-width of 704 bits and 1024
> bits.
>
> [utilization summaries snipped]
>
> The synthesizer reports a maximum frequency of 58MHz for the 64 bit
> design, 16MHz for the 704 bit design, and 12MHz for the 1024 bit design
> "as is", without any tweaking to improve the timing, so it should take
> about 1.1 microseconds to multiply two 64 bit numbers together, and 85
> microseconds to multiply two 1024 bit numbers together.

Presumably you could do it rather quicker using the S3's multiplier blocks.....
Article: 101944

On Tue, 09 May 2006 07:56:10 +1200, Jim Granville <no.spam@designtools.co.nz> wrote:

> This news is interesting:
>
> http://www.eet.com/news/latest/showArticle.jhtml?articleID=187200783&pgno=2
>
> [snip]
>
> Sounds like a project for Antti :)
>
> -jg

Just a silly thought - how about using a very long async delay path as a memory device, like the mercury delay-line memories of olden times? Not useful, but maybe an interesting exercise for those with too much time on their hands....
Article: 101945

In article <e3mq62$k21$2@news.lysator.liu.se>, Andreas Ehliar <ehliar@lysator.liu.se> wrote:
> On 2006-05-06, Piotr Wyderski
> <wyderski@mothers.against.spam-ii.uni.wroc.pl> wrote:
> > What could it accelerate? Modern PCs are quite fast beasts...
> > If you couldn't speed things up by a factor of, say, 300%, your
> > device would be useless. Modest improvements by several tens
> > of percents can be neglected -- Moore's law constantly works
> > for you. FPGAs are good for special-purpose tasks, but there
> > are not many such tasks in the realm of PCs.
>
> One interesting application for most of the people on this
> newsgroup would be synthesis, place & route and HDL simulation.
> My guess would be that these applications could be heavily
> accelerated by FPGA:s. My second guess that it is far from trivial
> to actually do this :)

Certainly on the simulation side of things, various companies like Ikos (are they still around?) have been doing stuff like this for years. To some extent this is what ChipScope and Synplicity's Identify are doing, only using more of a logic analyzer metaphor: breakpoints are set and triggered through JTAG.

As far as synthesis itself and P&R go, I would think that these could be accelerated in a highly parallel architecture like an FPGA. There are lots of algorithms that could be sped up in an FPGA - someone earlier in the thread said that the set of algorithms that could benefit from the parallelism available in FPGAs was small, but I suspect it's actually quite large.

Phil
Article: 101946

In article <e3nvv7$h30$1@atlantis.news.tpi.pl>, Piotr Wyderski <wyderskiREMOVE@ii.uni.wroc.pl> wrote:
> Andreas Ehliar wrote:
>
> > One interesting application for most of the people on this
> > newsgroup would be synthesis, place & route and HDL simulation.
> > My guess would be that these applications could be heavily
> > accelerated by FPGA:s.
>
> A car is not the best tool to make other cars.
> It's not a bees & butterflies story. :-) Same with FPGAs.

Err... well, cars aren't exactly reprogrammable for many different purposes, though, are they?

> > My second guess that it is far from trivial to actually do this :)
>
> And who actually would need that?

Possibly you? What if we could decrease your wait for P&R from hours to minutes? I suspect you'd find that interesting, no?

Phil
Article: 101947

Mike Harrison wrote:
> Presumably you could do it rather quicker using the S3's multiplier blocks.....

Good point, but then I'd be tied to a particular FPGA. The multipliers are very impressive, however. If I ever get my design to fit on something, then I can start taking advantage of things like the built-in multipliers to speed things up.

Let's see: 18x18->36 bits in less than 5 ns. For a 1024 bit multiply, it would take roughly 1,624 eighteen-bit multiplies and a bunch of multi-precision additions, which translates into around 8 microseconds per 1024 bit word! Very impressive indeed.

Ron
Article: 101948

P.S. Before someone catches my error: yes, indeed you could run some of these multiplies in parallel to cut the timing even more. The datasheet says the Spartan-3E devices have between 4 and 36 dedicated multiplier blocks per device, so depending on how many there are on the FPGA, the 8 microseconds I mentioned earlier could be cut to between 1/4 and 1/36 of that - roughly 222ns for 1024 bits!!! I will definitely have to look into this at some point. It would be great if a multiprecision package for the multipliers were already available in Verilog.
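Such a package would be built around exactly this limb decomposition. As a hedged illustration (an invented module, not an existing package, sized at 32x32 to keep it short), here is how four of the embedded multiplier blocks compose one wider product; each 16x16 unsigned partial product fits comfortably in a signed 18x18 block, and a 1024-bit multiplier would tile the same pattern across all of its limbs:

    // Sketch only: 32x32 -> 64 unsigned multiply built from four 16x16
    // partial products (one embedded 18x18 block each). Two-stage
    // pipeline: p is valid two clocks after a and b are presented.
    module mult32x32 (
        input             clk,
        input      [31:0] a, b,
        output reg [63:0] p
    );
        reg [31:0] pp_ll, pp_lh, pp_hl, pp_hh;

        always @(posedge clk) begin
            // Stage 1: the four partial products.
            pp_ll <= a[15:0]  * b[15:0];
            pp_lh <= a[15:0]  * b[31:16];
            pp_hl <= a[31:16] * b[15:0];
            pp_hh <= a[31:16] * b[31:16];

            // Stage 2: align and accumulate.
            // p = (pp_hh << 32) + (pp_lh << 16) + (pp_hl << 16) + pp_ll
            p <= {pp_hh, pp_ll}
               + ({32'b0, pp_lh} << 16)
               + ({32'b0, pp_hl} << 16);
        end
    endmodule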
Alan, We were having issues with the EDK profiling tools for ppc405 also. And we were/are using 7.1. Now I am a little anxious to get back to the lab and see if an upgrade to 8.1 makes things go more smoothly. Thanks for your posts. Oh, and in response to one of your questions: lots of folks are using the ppc405, but I am not sure how many are using the Xilinx profiling tools (for reasons you have already discovered!). Joey