Jim Granville wrote:
> Sounds impressive.
> You have seen the AS Assembler, and the Mico8 from Lattice ?

Yes, I am very much aware of Mico8 and I have used AS in several projects in the past. I know that it supports PicoBlaze (and Mico8 now). But what I want to do now is a small version of a language like HLA or terse for PicoBlaze. Something simple and readable that is easy to modify like the current KCAsm (hey, adding the mul and add/sub instructions took less than one minute. ;o)

Here is what sarKCAsm looks like at the moment (currently a JavaCC implementation, but I am swapping to ANTLR now because it has better support for trees).

---8<---

s0 = $ca           ; load
s1 = s0 + $fe      ; same as s1 = s0, s1 += $fe
func($be, $ef)     ; function call, s0 = $be, s1 = $ef

s3 = 16

loop:
    func(s0, s1)
    s0 == $55      ; compare
    done Z?        ; conditional jump
    s3 -= 1
    done Z?
    loop           ; unconditional jump

done:
    done

func(s0: s0, s1):  ; result + clobber list
    s0 <- $0       ; read from port 0
    s0 ^= s1       ; xor
    s1 << C        ; sla
    #              ; return

--->8---

> FWIR the Mico8 is very similar to PicoBlaze ( as expected, both are
> tiny FPGA targeted CPUs ), but I think with a larger jump and call reach
> (but simpler RET options).
> If you are loading on features, the call-lengths might need attention ?

For now the limits of the PicoBlaze model have been within my needs (IIRC, Mico8 has the same 10-bit jumps/calls as PB3 and it is very isomorphic to it). My main drive to create PacoBlaze was to get the most versatile processor that I could use as a peripheral controller in my projects (eg motor control, bus controller, PWM generator, audio co-processor, specifically in the JBRD of my Javabotics project, http://bleyer.org/javabotics/). It isn't difficult to extend the memory model of PicoBlaze using PacoBlaze, though.

> Have you tried targeting this to a lattice device ?

Not yet. I plan to synthesize the core using different tools that I may have access to, but that is not in my list of priorities.

Cheers.

--
                 /"Naturally, there's got to be some
PabloBleyerKocik / limit, for I don't expect to live
pablo            / forever, but I do intend to hang on
@bleyer.org      / as long as possible." -- Isaac Asimov

Article: 99151
I've got a Dini DN8000K10 in hand that seems to work quite well and has the features you were looking for.

Article: 99152
Hello Ivan. Which bus macro is it better to use for assured correct coupling? Probably a hard macro? I would be very thankful if you could post something like a *.nmc or *.xdl file separately. Thanks in advance.

Article: 99153
Hi, I created my test IP core (ANDgate) in ISE and imported it into XPS to connect it to the MicroBlaze. I am using the Spartan 3 starter kit. I defined my core to have only one register, 32 bits wide: slv_reg0. This means I have to put my ANDgate result (X AND Y = Z) into slv_reg0: slv_reg0 <= Z. Then I tried to read this value from the MicroBlaze processor, but I couldn't read it. My result is always 0x00000000. Can someone help me with this problem? Mogogo

Article: 99154
Isaac Bosompem wrote:
> Allan Herriman wrote:
>> On 20 Mar 2006 07:41:37 -0800, "Isaac Bosompem" <x86asm@gmail.com>
>> wrote:
>>
>>> John_H wrote:
>>>
>>>> Isaac Bosompem wrote:
>>>>
>>>>> Hi Ray and Peter,
>>>>>
>>>>> I am sorry for hijacking your thread Roger, but I think my question is
>>>>> relevant.
>>>>>
>>>>> I was thinking of using about 8 FIR (bandpass) filters in parallel to
>>>>> create a graphic equalizer. Now I know there are some phase problems
>>>>> with this method but it seems to me like a very logical way to go about
>>>>> this problem. I was wondering if you guys know of any better methods?
>>>>>
>>>>> I also was thinking of using 16 taps each.
>>>>>
>>>>> 320 FF's is not a lot actually. My XC3S200 (which is probably dirt
>>>>> cheap) has almost 4000 FF's. Enough for your filter network and much
>>>>> more.
>>>>>
>>>>> -Isaac
>>>>
>>>> Another great advantage of FPGA FIRs: most of the time the FIRs are
>>>> symmetric, which allows half the taps (half the multipliers) to implement
>>>> the full FIR by adding t-d to t+d before multiplying by the common
>>>> coefficient; the implementation is more elegant.
>>>
>>> Hi John,
>>>
>>> I cannot see what you mean? Can you offer a quick example?
>>
>> An FIR filter is implemented as a dot product of a constant vector and
>> a vector made up of the input samples delayed,
>>
>> y[n] = c[0].x[n] + c[1].x[n-1] + ... c[m-1].x[n-m+1]
>>
>> for an m tap filter.
>>
>> The c[n] are the coefficients. If the filter has a linear phase
>> response, the coefficients are symmetric, so
>> c[0] = c[m-1], c[1] = c[m-2], etc.
>>
>> We can group the expression for y[n] as follows:
>>
>> y[n] = c[0].(x[n] + x[n-m+1]) + c[1].(x[n-1] + x[n-m+2]) + ...
>>
>> This has (roughly) halved the number of multipliers.
>> I say roughly, because m is often odd.
>>
>> Regards,
>> Allan
>
> Ahh, I see, thanks.
>
> Do you guys know of a good filter design software? I have an old one
> for DOS, but it is quite difficult to use (also I do not have access to
> MATLAB).
>
> -Isaac

One of the tools I use is ScopeFIR, which is a FIR filter design tool. http://www.iowegian.com For the money, it is a good value.

Article: 99155
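Allan's symmetric-coefficient folding can be sketched in a few lines of Python. This is only a behavioral model for checking the arithmetic - in hardware the pre-adders and multipliers are parallel logic, and the function names here are invented for illustration:

```python
def fir_direct(coeffs, x):
    """Plain dot-product FIR: one multiply per tap."""
    m = len(coeffs)
    return [sum(coeffs[k] * x[n - k] for k in range(m) if n - k >= 0)
            for n in range(len(x))]

def fir_folded(coeffs, x):
    """Symmetric FIR: pre-add mirrored delay taps (t-d and t+d), then
    multiply by the shared coefficient, halving the multiplier count."""
    m = len(coeffs)
    assert list(coeffs) == list(coeffs)[::-1], "folding needs symmetric taps"
    half = m // 2
    out = []
    for n in range(len(x)):
        def d(k):  # delayed sample, zero before the start of the stream
            return x[n - k] if n - k >= 0 else 0
        acc = sum(coeffs[k] * (d(k) + d(m - 1 - k)) for k in range(half))
        if m % 2:  # odd tap count: the centre tap has no mirror partner
            acc += coeffs[half] * d(half)
        out.append(acc)
    return out
```

For a 16-tap symmetric filter this is 8 multipliers instead of 16, which is the saving John_H describes; with an odd tap count the unpaired centre tap keeps one extra multiplier (Allan's "roughly halved").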
I don't take issue with anything Tim stated, but I will add a few comments.

I think that the added complexity of floating point in an FPGA will probably be enough to rule it out. Fixed point implementations are often better than floating point implementations. This comparison tends to be true when the result of a multiplication is twice the width of the inputs in the fixed point case and when a floating point result is the same size as its inputs. This is usually the case in a DSP processor. This also assumes that you use a filter structure that takes advantage of the long result.

Most IIR filters are constructed as cascaded biquads (and sometimes one first order section). The choice of the biquad structure has a significant impact on performance. If we restrict our choices to one of the direct forms, then usually the direct form I (DF I) structure is best for fixed point implementations. This assumes that we have a double wide accumulator. If this is not the case, the DF I is not a particularly good structure. Floating point implementations are usually implemented as DF II or the slightly better transposed DF II.

You can also improve the performance of a fixed point DF I by adding error shaping. This is relatively cheap from a resource point of view in this structure. As Tim pointed out, you have to pay attention to scaling with fixed point implementations.

Like every design problem, you need to examine the performance requirements carefully. I would look at the pole-zero placement on the unit circle. If you need a high Q filter at some low frequency compared to the sampling rate, the math precision is going to be critical. The poles might not be on the unit circle, but they will be very close. If the precision is poor, the filter is likely to blow up. In other situations, just about anything will work.

Here is a good link describing biquad structures: http://www.earlevel.com/Digital%20Audio/Biquads.html

--
Al Clark
Danville Signal Processing, Inc.
--------------------------------------------------------------------
Purveyors of Fine DSP Hardware and other Cool Stuff
Available at http://www.danvillesignal.com

Tim Wescott <tim@seemywebsite.com> wrote in
news:FNadnRoXwvTbsYLZRVn-qQ@web-ster.com:

> Tim Wescott wrote:
>
>> Roger Bourne wrote:
>>
>>> Hello all,
>>>
>>> Concerning digital filters, particularly IIR filters, is there a
>>> preferred approach to implementation - are fixed-point calculations
>>> preferred over floating-point? I would be tempted to say yes.
>>> But my google search results leave me baffled, for it seems that
>>> floating-point computations can be just as fast as fixed-point.
>>> Furthermore, assuming that fixed-point IS the preferred choice, the
>>> following question crops up:
>>> If the input to the digital filter is 8 bits wide and the
>>> coefficients are 16 bits wide, then it would stand to reason that the
>>> products between the coefficients and the digital filter
>>> intermediate data values will be 24 bits wide. However, when this
>>> 24-bit value is to get back in the delay element network (which is
>>> only 8 bits wide), some (understatement) resolution will be lost. How
>>> is this resolution loss dealt with so it will not lead to an erroneous
>>> filter?
>>> -Roger
>>>
>> This is a simple question with a long answer.
>>
>> Floating point calculations are always easier to code than
>> fixed-point, if for no other reason than you don't have to scale your
>> results to fit the format.
>>
>> On a Pentium in 'normal' mode floating point is just about as fast as
>> fixed point math; with the overhead of scaling floating point is
>> probably faster -- but I suspect that fixed point is faster in MMX
>> mode (someone will have to tell me). On a 'floating point' DSP chip
>> you can also expect floating point to be as fast as fixed.
>>
>> On many, many cost effective processors -- including CISC, RISC, and
>> fixed-point DSP chips -- fixed point math is significantly faster
>> than floating point. If you don't have a ton of money and/or if your
>> system needs to be small or power-efficient, fixed point is mandatory.
>>
>> In addition to cost constraints, floating point representations use
>> up a significant number of bits for the exponent. For most filtering
>> applications these are wasted bits. For many calculations using
>> 16-bit input data the difference between 32 significant bits and 25
>> significant bits is the difference between meeting specifications and
>> not.
>>
>> For _any_ digital filtering application you should know how the data
>> path size affects the calculation. Even though I've been doing this
>> for a long time I don't trust my intuition -- I always do the
>> analysis, and sometimes I'm still surprised.
>>
>> In general for an IIR filter you _must_ use significantly more bits
>> for the intermediate data than the incoming data. Just how much
>> depends on the filtering you're trying to do -- for a 1st-order
>> filter you usually need to do better than the fraction of the sampling
>> rate you're trying to filter; for a 2nd-order filter you need to go
>> down to that fraction squared*. So if you're trying to implement a
>> 1st-order low-pass filter with a cutoff at 1/16th of the sample rate
>> you need to carry more than four extra bits; if you wanted to use a
>> 2nd-order filter you'd need to carry more than 8 extra bits.
>>
>> Usually my knee-jerk reaction to filtering is to either use
>> double-precision floating point or to use 32-bit fixed point in 1r31
>> format. There are some less critical applications where one can use
>> single-precision floating point or 16-bit fractional numbers to
>> advantage, but they are rare.
>>
>> * There are some special filter topologies that avoid this, but if
>> you're going to use a direct-form filter out of a book you need
>> fraction^2.
>>
> Oops -- thought I was responding on the dsp newsgroup.
>
> Everything I said is valid, but if you're contemplating doing this on
> an FPGA the impact of floating point vs. fixed is in logic area and
> speed (which is why fast floating point chips are big, hot and
> expensive). Implementing an IEEE compliant floating point engine takes
> a heck of a lot of logic, mostly to handle the exceptions. Even if
> you're willing to give up compliance for the sake of speed you still
> have some significant extra steps you need to take with the data to
> deal with that pesky exponent. I'm sure there are various forms of
> floating point IP out there that you could try on for size to get a
> comparison with fixed-point math.

Article: 99156
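Al's point about DF I with a double-wide accumulator can be illustrated with a small behavioral model (Python rather than HDL; the Q14 coefficient / Q15 signal formats and the coefficient values are arbitrary choices for the sketch). The products accumulate at full width and are quantized back to the state width only once per output sample, which is exactly why DF I behaves well in fixed point:

```python
Q = 14                      # coefficient fraction bits (Q14 in 16-bit words)

def to_q(v):
    """Quantize a float coefficient to Q14."""
    return int(round(v * (1 << Q)))

def biquad_df1_float(b, a, x):
    """Reference DF I biquad: y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0]*xn + b[1]*x1 + b[2]*x2 - a[0]*y1 - a[1]*y2
        y.append(yn)
        x2, x1, y2, y1 = x1, xn, y1, yn
    return y

def biquad_df1_fixed(bq, aq, x_q15):
    """Fixed-point DF I: signals in Q15, coefficients in Q14.  The
    accumulator holds full Q29 products (the 'double wide accumulator');
    rounding back to Q15 happens once per sample, at the output."""
    y, x1, x2, y1, y2 = [], 0, 0, 0, 0
    for xn in x_q15:
        acc = bq[0]*xn + bq[1]*x1 + bq[2]*x2 - aq[0]*y1 - aq[1]*y2
        yn = (acc + (1 << (Q - 1))) >> Q     # round Q29 -> Q15, one quantizer
        y.append(yn)
        x2, x1, y2, y1 = x1, xn, y1, yn
    return y
```

With a DF II structure the same word lengths would quantize the state variables themselves, and for high-Q, low-frequency poles (Al's critical case) that quantization noise gets amplified by the feedback.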
fpga_toys@yahoo.com wrote:
> Seems that it can be completely transparent with very very modest
> effort. The parts all have non-volatile storage for configuration. If
> the defect list is stored with the bitstream, then the installation
> process to that storage just needs to read the defect list out before
> erasing it, and merge the defect list into the new bit stream as the part
> is linked (place and routed) for that system.
> With a system level design based on design for test and design for
> defect management, the costs are ALWAYS in favor of defect management,
> as it increases yields at the mfg and extends the life in the field by
> making the system tolerant of intermittents that escape ATE and of life
> induced failures like migration effects.

Which reconfigurable FPGAs would those be with the non-volatile bitstreams? I'm not aware of any. Posts like these really make me wonder whether you've really done any actual FPGA design. They instead indicate to me that perhaps it has been all back of the envelope concept stage stuff with little if any carry through to a completed design (which is fine, but it has to be at least tempered somewhat with actual experience garnered from those who have been there). In particular, your concerns about power dissipation being stated on the data sheet, your claims of high performance using HLLs without getting into hardware description, your complaints about tool licensing while not seeming to understand the existing tool flow very well, the handwaving in the current discussion you are doing to convince us that defect mapping is economically viable for FPGAs, and now this assertion that all the parts have non-volatile storage sure make it sound like you don't have the hands-on experience with FPGAs you'd like us to believe you have.

>> 5) Timing closure has to be considered when re-spinning an FPGA
>> bitstream to avoid defects. In dense high performance designs, it may
>> be difficult to meet timing in a good part, much less one that has to
>> allow for any route to be moved to a less direct routing.
>
> In RC that is not a problem ... it's handled by design. For embedded
> designs, that is a different problem.

What are you doing differently in the RC design then? From my perspective, the only ways to tolerate changes in the PAR solution and still make timing are to either leave a considerable amount of excess performance margin (ie, not running the parts at the high performance/high density corner), or spend an inordinate amount of time looking for a suitable PAR solution for each defect map, regardless of how coarse the map might be. From your previous posts regarding open tools and use of HLLs, I suspect it is more on the leaving-lots-of-performance-on-the-table side of things. In my own experience, the advantage offered by FPGAs is rapidly eroded when you don't take advantage of the available performance.

However, you also had a thread a while back where you were overly concerned about thermal management of FPGAs, claiming that your RC designs could potentially trigger a mini China syndrome event in your box. If you are leaving enough margin in the design so that it is tolerant to fortuitous routing changes to work around unique defects, then I sincerely doubt you are going to run into the runaway thermal problems you were concerned with. I've got a number of very full designs in modern parts (V2P, V4) clocked at 250-400 MHz that function well within the thermal spec with at most a passive heatsink and modest airflow. Virtually none of those designs would tolerate a quick reroute to avoid a defect on a critical route path without going through an extensive reroute of signals in that region, and that is assuming there were the necessary hooks in the tools to mark routes as 'do not use' (I am not aware of any hooks like that for routing, only for placement).
Still, I'd like to hear what you have to say. If nothing else, it has sparked an interesting conversation. Having done some work in the RC area, and having done a large number of FPGA designs over the last decade (my 12 year old business is exclusively FPGA design, with a heavy emphasis on high performance DSP applications), most of which are pushing the performance envelope of the FPGAs, I am understandably very skeptical about your chances of achieving all your stated goals, even if you did get everything you've complained about not having so far. Show me that my intuition is wrong.

Article: 99157
In article <1142888577.488377.237030@t31g2000cwb.googlegroups.com>,
"Pablo Bleyer Kocik" <pablobleyer@hotmail.com> wrote:

> Hello people.
>
> As I announced some days ago, I updated the PacoBlaze3 core
> [http://bleyer.org/pacoblaze/], now with a wide ALU that supports an 8x8
> multiply instruction ('mul') and 16-bit add/sub operations ('addw',
> 'addwcy', 'subw', 'subwcy'). The new extension core is called
> PacoBlaze3M. It could be useful for performing small DSP functions and math
> subroutines when there is a spare hardware multiplier block.

Cool, though I have not had time to even get 2.0 running yet.
( life got in the way of fun stuff )

Article: 99158
Pablo Bleyer Kocik wrote:
> Jim Granville wrote:
>
>> Sounds impressive.
>> You have seen the AS Assembler, and the Mico8 from Lattice ?
>
> Yes, I am very much aware of Mico8 and I have used AS in several
> projects in the past. I know that it supports PicoBlaze (and Mico8
> now). But what I want to do now is a small version of a language like
> HLA or terse for PicoBlaze.

I realised that; - just checking you knew of them :)

> Something simple and readable that is easy
> to modify like the current KCAsm (hey, adding the mul and add/sub
> instructions took less than one minute. ;o)

Good targets.

> Here is what sarKCAsm is currently looking like (currently a JavaCC
> implementation, but I am swapping to ANTLR now because it has better
> support for trees).
> ---8<---
>
> s0 = $ca ; load
> s1 = s0 + $fe ; same as s1 = s0, s1 += $fe
> func($be, $ef) ; function call, s0 = $be, s1 = $ef
>
> s3 = 16
>
> loop:
> func(s0, s1)
> s0 == $55 ; compare
> done Z? ; conditional jump
> s3 -= 1
> done Z?
> loop ; unconditional jump
>
> done:
> done
>
> func(s0: s0, s1): ; result + clobber list
> s0 <- $0 ; read from port 0
> s0 ^= s1 ; xor
> s1 << C ; sla
> # ; return

Will you also do boolean (Flag) functions ?

General comments: ( feel free to ignore... )

The expression clarity makes good sense, and I also like languages that can accept flexible constants: viz $55 or 0x55 or 55H, or 2#01010101 or 16#55, or 2#01_0101_01. I've also seen XOR AND OR NOT etc keywords supported, as well as the terse C equivalents ( which are a real throwback to when source size mattered ).

but I'm not sure about labels in the left-most code column - that makes code harder to scan and indent etc, and not as clear in a syntax highlighted editor.... ie If you have to add a comment, then the language is probably not clear enough....

# for return ? => why? - why not return, or RET, or IFnZ RET
label then condition ? => most languages are IF_Z THEN or if_nZ DestAddr
Label for Loop jmp ? => REPEAT Label, or LOOP label

If a 12yr old kid can read the source, and not need a raft of prior knowledge, then that's a good test of any language :)

-jg

Article: 99159
Pablo Bleyer Kocik wrote:
> For now the limits of the PicoBlaze model have been within my needs
> (IIRC, mico8 has the same 10-bit jumps/calls as PB3 and it is very
> isomorphic to it).

I think I recall the Mico8 had more obvious expansion space in the opcodes - but either way, this is the sort of expansion that is nice to allow for early on. With more smarts, users _are_ going to need larger address space :)

The assembler should accept either size, and warn on the smaller/larger ceiling, based on a target/build family define.

-jg

Article: 99160
On Mon, 20 Mar 2006 17:06:45 GMT, "John_H" <johnhandwork@mail.com> wrote:

> "Allan Herriman" <allanherriman@hotmail.com> wrote in message
> news:tumt12l1jq2cnppqn4e9ki3iiceni5d5bb@4ax.com...
>> On Mon, 20 Mar 2006 15:52:41 -0000, "Symon" <symon_brewer@hotmail.com>
>> wrote:
>> <snip>
>>> Well, from a 200MHz clock, you can get exactly 1uHz if you make the
>>> accumulator overflow at 200,000,000,000,000. i.e.
>>>
>>> accum <= (accum + freq) mod 2E14;
>>>
>>> The accumulator doesn't have to saturate at a power of two.
>>
>> Depends on your definition of "regular" DDS I suppose...
>>
>> The logic to implement the pipelined mod 2e14 operation will probably
>> be a lot harder than simply making a regular binary phase accumulator
>> a few bits wider. Still, if the requirement is for a step size of
>> *exactly* 1uHz, then the mod operation is needed.
> <snip>
>
> The logic is extremely simple:
> When the MSbit changes, rather than adding PhaseInc add
> ((2^48)-(2e+14))/2+PhaseInc. The dual-increment value is very easy to
> support. I added the /2 in there for "whenever" the MSbit changes rather
> than just tracking the high-to-low transition.

Of course! I was thinking that the decision to add the extra (2^48-2e14) would have to take place prior to the register, but now I realise that it can be pipelined, which makes it possible to get it to run at 200MHz.

This may complicate downstream processing, e.g. use of CORDIC to generate a sinusoid.

Regards,
Allan

Article: 99161
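John_H's dual-increment trick can be checked with a quick behavioral model (Python, not HDL; the constants follow the 48-bit / 2e14 numbers in the thread). Adding the extra ((2^48)-2e14)/2 on each MSbit toggle means the correction is applied twice per output cycle, so the accumulator effectively wraps modulo 2e14 and the average output frequency is fclk*PhaseInc/2e14:

```python
W = 48                              # accumulator width in bits
MASK = (1 << W) - 1
MOD = 200_000_000_000_000           # desired modulus, 2e14
K = ((1 << W) - MOD) // 2           # extra increment on each MSB toggle

def dds_msb_rises(phase_inc, n_clocks):
    """Count MSB rising edges (output cycles) of the dual-increment
    phase accumulator over n_clocks clock periods."""
    acc, prev_msb, rises = 0, 0, 0
    for _ in range(n_clocks):
        acc = (acc + phase_inc) & MASK
        if (acc >> (W - 1)) != prev_msb:
            acc = (acc + K) & MASK  # MSbit changed: apply the correction
        msb = acc >> (W - 1)
        if msb and not prev_msb:
            rises += 1
        prev_msb = msb
    return rises
```

At fclk = 200 MHz the tuning resolution is 200e6/2e14 = exactly 1 uHz, which is Symon's point about not wrapping at a power of two. The phase jump at each correction is what Allan means by complicating downstream processing such as a CORDIC sine generator.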
Ray Andraka wrote:
> fpga_toys@yahoo.com wrote:
>
>> Seems that it can be completely transparent with very very modest
>> effort. The parts all have non-volatile storage for configuration. If
>> the defect list is stored with the bitstream, then the installation
>> process to that storage just needs to read the defect list out before
>> erasing it, merge the defect list into the new bit stream, as the part
>> is linked (place and routed) for that system.
>> With a system level design based on design for test, and design for
>> defect management, the costs are ALWAYS in favor of defect management
>> as it increases yields at the mfg, and extends the life in the field by
>> making the system tolerant of intermittents that escape ATE and life
>> induced failures like migration effects.
>
> Which reconfigurable FPGAs would those be with the non-volatile
> bitstreams?

I think John meant storing the info in the ConfigFlashMemory. Thus the read-erase-replace steps. ... but you STILL have to get this info into the FIRST design somehow....

-jg

Article: 99162
His 'complete disregard for manners' consisted of 6 abbreviations. And while I agree his English was thoughtless, I think trashing him the way everyone has is even more thoughtless. Instead of calling his actions deplorable, and ridiculing him with bad imitations, perhaps it would have been better manners to politely ask him to rephrase his question, and help him only once his English was satisfactory.

I'm a student as well, which may be why his post didn't offend me as it has some. But if he was out to 'butcher' the English language as has been accused, he could have done far far worse (simply refer to the bad imitations). I don't claim to know much about languages, but these abbreviations that have got everyone worked up are becoming more and more mainstream. And when one spends a lot of time informally communicating on the internet, it is understandable that speaking in this fashion wouldn't immediately register as offensive. Given the effort the OP went to in constructing his post (not a simple "help plx!!!11!one"), a gentle request for him to fix his English would have been sufficient IMO.

Article: 99163
Ray Andraka wrote:
> Which reconfigurable FPGAs would those be with the non-volatile
> bitstreams? I'm not aware of any.

What are XC18V04's? Magic ROMs? What are the platform flash parts? Magic ROMs? They are CERTAINLY non-volatile every time I've checked. In fact, non-volatile includes disks, optical, and just about any other medium that doesn't go poof when you turn the power off.

> and now this assertion that all the parts
> have non-volatile storage sure makes it sound like you don't have the
> hands on experience with FPGAs you'd like us to believe you have.

Ok Wizard God of FPGA's ... just how do you configure your FPGA's without having some form of non-volatile storage handy? Whatever the configuration bit stream source is, if it is reprogrammable ... IE ignore 17xx proms ... you can store the defect list. UNDERSTAND? Now, the insults are NOT -- I REPEAT NOT -- being civil.

> What are you doing different in the RC design then?

With RC there is an operating system, complete with disk based filesystem. The intent is to do fast (VERY FAST) place and route on the fly.

> From my
> perspective, the only ways to tolerate changes in
> the PAR solution and still make timing are to either be leaving a
> considerable amount of excess performance margin (ie, not running the
> parts at the high performance/high density corner), or spending an
> inordinate amount of time looking for a suitable PAR solution for each
> defect map, regardless of how coarse the map might be.

You are finally getting warm. Several times in this forum I have discussed what I call "clock binning", where the FPGA accel board has several fixed clocks arranged as integer powers. The dynamic runtime linker (very fast place and route) places, routes, and assigns the next slowest clock that matches the code block just linked. The concept is to use the fastest available clock at which the code block meets timing, NOT to change the clocks to fix the code.

> From your previous posts regarding open tools and use of HLLs, I
> suspect it is more on the leaving lots of performance on the table side
> of things.

Certainly ... it may not be hardware optimized to the picosecond. Some will be, but that is a different problem. Shall we discuss every project you have done in 12 years as though it was the SAME problem with identical requirements? I think not. So why do you for me?

> In my own experience, the advantage offered by FPGAs is
> rapidly eroded when you don't take advantage of the available
> performance.

The performance gains are measured against single threaded CPU's with serial memory systems. The performance gains are high degrees of parallelism with the FPGA. Giving up a little of the best case performance is NOT a problem. AND if it was, for a large dedicated application, then by all means use traditional PAR and fit the best case clock to the code body.

> If you are leaving enough margin in the design so that it is
> tolerant to fortuitous routing changes to work around unique defects,
> then I sincerely doubt you are going to run into the runaway thermal
> problems you were concerned with.

This is a completely different problem set than that particular question was addressing. That problem case was about hand packed serial-parallel MACs doing Red-Black ordered simulations with kernel sizes between 80-200 LUT's, tiled in tight, running at best case clock rate. 97% active logic. VERY high transition rates. About the only thing worse would be purposefully toggling everything.

A COMPLETELY DIFFERENT PROBLEM is compiling arbitrary C code and executing it with a compile, link, and go strategy. An example is a student iteratively testing a piece of code in an edit, compile and run sequence. In that case, getting the netlist bound to a reasonable set of LUTs quickly and running the test is much more important than extracting the last bit of performance from it. Like it or not .... that is what we mean by using the FPGA to EXECUTE netlists. We are not designing highly optimized hardware. The FPGA is simply a CPU -- a very parallel CPU.

> Show me that my intuition is wrong.

First you have taken and merged several different concepts, as though they were somehow the same problem .... from various posting topics over the last several months. Surely we can distort anything you might want to present by taking your posts out of context and arguing them in the worst possible combination against you. Let's try - ONE topic, one discussion.

Seems that you have made up your mind. As you have been openly insulting and mocking ... have a good day. When you are really interested, maybe we can have a respectful discussion. You are pretty clueless today.

Article: 99164
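The "clock binning" scheme described above - several fixed board clocks, with the runtime linker assigning the fastest one the just-linked block meets timing at - reduces to a one-line table lookup. A minimal sketch; the clock table and function name are invented for illustration:

```python
# Hypothetical board clock table, fastest first, arranged as integer
# powers of two of a base clock (values invented for illustration).
BOARD_CLOCKS_MHZ = [200, 100, 50, 25]

def bin_clock(fmax_mhz):
    """Return the fastest fixed board clock at which a code block with
    post-PAR Fmax of fmax_mhz still meets timing; None if even the
    slowest bin is too fast for it."""
    for clk in BOARD_CLOCKS_MHZ:
        if clk <= fmax_mhz:
            return clk
    return None
```

A block that closes timing at 137 MHz would be bound to the 100 MHz bin: the clock is fitted to the code, rather than re-running PAR until the code fits a fixed clock.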
Jim Granville wrote:
> Ray Andraka wrote:
>> fpga_toys@yahoo.com wrote:
>>> The parts all have non-volatile storage for configuration.
>
> I think John meant storing the info in the ConfigFlashMemory.
> Thus the read-erase-replace steps.
> ... but you STILL have to get this info into the FIRST design somehow....

Thanks Jim ... that is EXACTLY what I did say. It doesn't matter if the configuration storage is on an 18V04, platform flash card, or a disk drive.

Article: 99165
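The read-erase-merge-replace flow being argued over could be modeled roughly as below. Everything here is hypothetical - the class and function names are invented, and the "place and route" is a toy that merely skips resources on the defect list - but it shows the ordering John describes: read the defect list out of the config storage first, link the new design against it, then write both back:

```python
class ConfigFlash:
    """Toy stand-in for the config PROM / platform flash: holds a
    bitstream image plus the per-device defect list stored alongside it
    (all names hypothetical)."""
    def __init__(self, defects):
        self.defects = set(defects)   # routing-resource ids known bad
        self.image = None

def place_and_route(netlist, avoid):
    """Toy 'PAR': give each net the lowest-numbered route not in avoid.
    A real router would re-time the result; see Ray's timing objection."""
    routes, used = {}, set(avoid)
    for net in netlist:
        r = 0
        while r in used:
            r += 1
        used.add(r)
        routes[net] = r
    return routes

def install(flash, netlist):
    """Read defect list out BEFORE erasing, link around it, then
    re-write image and defect list together."""
    defects = set(flash.defects)
    image = place_and_route(netlist, avoid=defects)
    flash.image, flash.defects = image, defects
    return image
```

Jim's caveat still stands: something has to seed the defect list into the first image, whether that is vendor test data or in-system self-test.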
Hi,

What simulation tool will enable me to read internal signals/registers?

---
leaf

Article: 99166
John, last time I checked, FPGAs did not get delivered from Xilinx with the config prom. Sure, you can store a defect map on the config prom, or on your disk drive, or battery backed sram or whatever, but the point is that the defect map has to get into your system somehow. Earlier in this thread you were asking/begging Xilinx to provide the defect map, even if just to one of 16 quadrants for each non-zero-defect part delivered. That leads to the administration nightmare I was talking about.

In the absence of a defect map provided by Xilinx (which you were lobbying hard for a few days ago), the only other option is for the end user to run a large set of test configurations on each device while in system to map the defects. Writing that set of test configurations requires a knowledge of the device at a detail that is not available publicly, or getting ahold of the Xilinx test configurations and expanding on them to obtain fault isolation. I'm not sure you realize the number of routing permutations that need to be run just to get fault coverage of all the routing, switchboxes, LUTs, etc in the device, much less achieve fault isolation. Your posts regarding that seem to support this observation.

> With RC there is an operating system, complete with disk based
> filesystem. The intent is to do fast (VERY FAST) place and route on the
> fly.

Now see, that is the fly in the ointment. The piece that is missing is the "very fast place and route". There is and has been a lot of research into improving place and route, but the fact of the matter is that making the FPGA compete favorably against a microprocessor is going to require a time to completion that is orders of magnitude faster than what we have now, without giving up much in the way of performance.
Sure, I can slow a clock down (by bin steps or using a programmable clock) to match the clock to the timing analysis for the current design, but that doesn't help you much for many real-world problems where you have a set time to complete the task. (Yes, I know that many RC apps are not explicitly time constrained, but they do have to finish enough ahead of other approaches to make them economically justifiable.)

Remember also that the RC FPGA starts out with a sizable handicap against a microprocessor: the time to load a configuration, plus, if the configuration is generated on the fly, the time to perform place and route. Once that hurdle is crossed, you still need enough of a performance boost over the microprocessor to amortize that set-up cost over the processing interval to come out ahead. Obviously, you gain from the parallelism in the FPGA, but if you don't also mind the performance angle, it is quite easy to wind up with designs that can only be clocked at a few tens of MHz, and often that use up so much area that you don't have room for enough parallelism to make up for the much lower clock rate.

So that puts the dynamically configured RC in a box, where problems that aren't repetitive and complex enough to overcome the PAR and configuration times are better done on a microprocessor, and problems that take long enough to make the PAR time insignificant may be better served by a more optimized design than what has been discussed - and we're talking not only about PAR results, but also architecturally optimizing the design to get the highest clock rates and density. In my experience, FPGAs can do roughly 100x the performance of similar generation microprocessors, give or take an order of magnitude depending on the exact application and provided the FPGA design is done well. It is very easy to lose the advantage by sub-optimal design.
If I had a dollar for every time I've gotten remarks that 100x performance is not possible, or that so-and-so did an FPGA design expecting only 10x and it turned out slower than a microprocessor because it wouldn't meet timing, etc., I'd be retired. I guess I owe you an apology for merging your separate projects. I was under the impression (and glancing back over your posts I can still interpret them this way) that these different topics were all addressing facets of the same RC project. I assumed (apparently erroneously) that this was all towards the same RC system. I also apologize for the insults, as I didn't mean to insult you or mock you; rather, I was trying to point out that, taking all your posts together, I thought you were trying to hit all the corners of the design space at once, and at the same time do it on the cheap with defect ridden parts. I am still not convinced you aren't trying to hit everything at once... you know that old good, fast, cheap, pick any two thing. Rereading my post, I see that I let my tone get out of hand, and for that I ask your forgiveness. In any event, truly dynamic RC remains a tough nut to crack because of the PAR and configuration time issues. By adding the desire to use defect ridden parts, you are only making an already tough job much harder. I respectfully suggest you try first to get the system together using perfect FPGAs, as I believe you will find you already have an enormous task in front of you between the HLL to gates, the need for fast PAR, partitioning the problem over multiple FPGAs and between FPGAs and software, making a usable user interface and libraries, etc., without exponentially compounding the problem by throwing defect tolerance into the mix. Baby steps are necessary to get through something as complex as this.Article: 99167
Ray Andraka wrote: <snip> > In my experience, FPGAs can > do roughly 100x the performance of similar generation microprocessors, > give or take an order of magnitude depending on the exact application > and provided the FPGA design is done well. It is very easy to lose the > advantage by sub-optimal design. If I had a dollar for every time I've > gotten remarks that 100x performance is not possible, or that so and so > did an FPGA design expecting only 10x and it turned out slower than a > microprocessor because it wouldn't meet timing etc, I'd be retired. How does an FPGA compare with something like the Cell processor? I'd have thought that for reconfig computing, something like an array of CELLs, with FPGA bridge fabric, would be a more productive target for RC. FPGAs are great at distributed fabric, but not that good at memory bandwidth, especially at bandwidth/$. DSP tasks can target FPGAs OK, because the datasets are relatively small. Wasn't it Seymour Cray who found that IO and memory bandwidths were the key, not the raw CPU grunt? -jgArticle: 99168
Allan - why not use a down-counter and load it with 2e14, rather than doing a pipelined 2e14 decode?Article: 99169
In article <1142889836.685207.307560@g10g2000cwb.googlegroups.com>, <burn.sir@gmail.com> wrote: >Göran Bilski wrote: >> My bible on CPU design is "Computer Architecture, A Quantitative Approach". >> >> I never stop reading it. >> >> Göran Bilski > > >Hello Göran and Ziggy, and thanks for your replies. > >Göran: I have the book right here on my desk and it is great. However, >I was looking for something more hands on. You know, more code & >algorithms and less statistics :) Something like a grad level textbook. I just bought "CPU DESIGN - Answers to Frequently Asked Questions" by Chandra M.R. Thimmannagari (ISBN 0-387-23799-2). It's kind of a dense quick reference guide to every interesting topic in modern CPU design. It covers everything from architecture to Verilog to circuit design to improving timing to test-benches to physical layout. I like the low level details. For example, "9. Describe with an example a Picker Logic associated with an Issue Queue in an Out-Of-Order Processor?" Between this and the previous thread about multi-write port memory, I'm ready to write my superscalar PIC :-) -- /* jhallen@world.std.com (192.74.137.5) */ /* Joseph H. Allen */ int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0) +r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2 ]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}Article: 99170
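For readers curious about the picker-logic question quoted above, here is an illustrative sketch (not the book's answer, and not any real processor's implementation): a picker selects, from an issue queue, the oldest instruction whose operands are ready. Real hardware does this with single-cycle priority logic; the loop below only models the selection function.

```python
# Hypothetical model of an issue-queue "picker": choose the oldest ready
# entry. Each queue slot holds (age, ready); lower age = older instruction.

def pick_oldest_ready(queue):
    """Return the slot index of the oldest ready instruction, or None
    if nothing in the queue is ready to issue this cycle."""
    best = None
    for slot, (age, ready) in enumerate(queue):
        if ready and (best is None or age < queue[best][0]):
            best = slot
    return best

# Slot 1 holds the oldest instruction but its operands aren't ready,
# so the picker chooses slot 2 (age 1) over slot 0 (age 3).
q = [(3, True), (0, False), (1, True)]
print(pick_oldest_ready(q))  # -> 2
```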
On 20 Mar 2006 21:35:38 -0800, "PeterC" <peter@geckoaudio.com> wrote: > >Allan - why not use a down-counter and load it with 2e14, rather than >doing a pipelined 2e14 decode? It's not a counter, it's an accumulator. With a down counting accumulator, one would need to decode the underflow and then load with 2e14 minus the current frequency input value. I think this has the same complexity as the up counting accumulator. Besides, the OP stated that exact 1uHz resolution was not required, so it's cheaper and simpler to avoid decoding altogether and use the full 48 bit binary range. I agree with your sentiments though - it is sometimes easier to implement a down counter than an up counter. Regards, AllanArticle: 99171
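Allan's point can be sketched in a few lines: rolling over at an exact non-power-of-two modulus like 2e14 needs a wide magnitude compare (the "decode"), while letting the accumulator run over the full 48-bit binary range makes the adder's carry-out the rollover flag for free. The model below is illustrative only; the moduli match the thread's numbers but the increment is made up.

```python
# Minimal model of the two accumulator styles discussed above.

MOD_EXACT = 2 * 10**14   # exact modulus: needs a wide magnitude compare
MOD_BIN   = 1 << 48      # full 48-bit binary range: carry-out is the tick

def tick_exact(acc, inc):
    """Add, then decode rollover against the non-power-of-two modulus."""
    acc += inc
    over = acc >= MOD_EXACT          # this compare is the costly decode
    return (acc - MOD_EXACT if over else acc), over

def tick_binary(acc, inc):
    """Add modulo 2^48; the rollover falls out of the addition itself."""
    acc += inc
    return acc & (MOD_BIN - 1), acc >= MOD_BIN

# One step just past each rollover point behaves identically,
# but only the binary version gets its overflow flag for free.
acc, over = tick_exact(MOD_EXACT - 1, 2)
assert over and acc == 1
acc, over = tick_binary(MOD_BIN - 1, 2)
assert over and acc == 1
```

The price of the binary version is that the output period is a power-of-two fraction of the clock rather than an exact decimal one, which is why Allan notes it only works because exact 1 uHz resolution was not required.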
fpga_toys@yahoo.com wrote: > > Emily Postnews ... this is sarcasm: > > Q: Another poster can't spell worth a damn. What should I post? > > A: Post a followup pointing out all the original author's spelling and > grammar mistakes. You were almost certainly the only one to notice > them, genius that you are, so not only will others be intrigued at your > spelling flame, but they'll get to read such fine entertainments rather > than any actual addressing of the facts or issues in the message. Mr. John Bass, As this is a thinly veiled reference to my last post, I am going to treat it as directed at me, personally. My response is almost certainly ill-advised, as I fear you are going to demonstrate to me and a lot of other witnesses very shortly. Einstein's definition of insanity applies to me directly, for as I write, I do so with the expectation of a different result than all of those that have come before me by acknowledging, and not agreeing with, you. Your post was actually almost funny. You got the format of the question and answer right, so either you know the reference, or you allowed Google to be your friend and got bored before actually learning anything. Unfortunately, it once again demonstrates that you don't quite understand. You were able to focus in on a detail and understand it well enough to make the joke, but you have in the post, like many of your other posts, completely missed the bigger picture. The bigger picture is this: you are a guest in this community. If you actually understood the underlying concept about which you made your joke, you would have been behaving much better leading up to this. As a guest in any community, proper etiquette is that which the community defines, not that which the guest does. Mark Twain's "Innocents Abroad" is my reading suggestion for you this week. It is the first account of what many today have come to call the "ugly American". 
(as I am American, I am entitled to go here, flame away) The amusement comes from the characters going to far away lands and expecting it to be like where they came from and complaining about it not being so. Like many fellow Americans that I have experienced abroad, your behavior in this group reminds me very much of this book. You come here, as a guest in this foreign land, and you expect our community to bow to your demands. We as a community ask that people make an effort to write in proper English, making a best effort to use proper spelling and grammar, be considerate of others, keep commercial postings to a minimum, etc. We have invested (many far more than I) in keeping this the community that we have wanted it to become. This community existed for ten years before you got here. You proudly profess that you like to play the role of devil's advocate in your posts. But you seem to lack sufficient understanding of the underlying concepts to do this effectively, i.e. your argument that insisting on formal spelling and grammar (as best as any poster is able) is exclusionary to those who don't have English as a first language. Slang and idiomatic language is what comes last before fluency is achieved. If anything, insisting on the minimization of slang and abbreviations greatly helps those who might have to use dictionaries and other translation aids. Try putting your "techno speak", as you like to call it, through Babelfish, and see what kind of crap comes out the other side. Many of us in this forum know each other personally. My company, Ray's, and a number of others' that participate here are all in the Alliance partnership with Xilinx. The cumulative experience here is huge. We will defend our turf from those that do not show it the respect we ask for. Ray's posts are always informative and well thought through. Last time I spoke with Ray on the telephone I was teasing him about having free time on his hands, that he must have been between projects. 
He asked how I knew. I replied, simple, whenever you have the time for a handful of posts in an afternoon, I know it is a safe time to call and say hello. The funny thing about this is when I see that you have the time to post almost 300 times in this group in 60 days, I have to wonder if you have any idea just how much time it is going to take you to complete your groundbreaking work in reconfigurable computing. If you did, you would be spending this time actually working on this, or whatever you actually get paid to do, rather than trying to convince us just how smart and benevolent you are. Not only do you lack the bigger picture understanding with much of the subject material that you chose to write on, you also seem to lack an understanding of who you are actually communicating with. So when you, as you did this afternoon, rip someone like Ray a new one, it sends an even more significant message: you not only don't have enough respect for this forum, you don't know enough about the subject matter to know who some of the strongest minds in the field are. I have read as you have been given sensible information by many in this group that reflects years of practical experience with FPGAs, silicon fabrication and test, partial reconfiguration and place and route, and stood back in utter disbelief as you couldn't be bothered to digest this information and allow it to sculpt your view in the least. You simply defend your original view, or one that is simply contrary to the masses, and keep arguing. Mr. John Bass, fpga_toys, you are what is wrong with our community. Welcome to my witch trial. This forum existed for ten years before you got here, and it will exist for many more after you get bored and leave to torment another community. 
Actually, as you seem to be a rather intelligent individual that might be able to become a member of this community, I respectfully ask that you drop the devil's advocate role and leave it for those that have spent more time in our community and more than a few days in our industry. You might actually discover that what you want to do with reconfigurable computing is possible in ways that you have not imagined. You have been so busy going off on the ways that this is not possible given the current framework of the tools, the licensing, the evil vendors that focus on embedded markets, blah, blah, blah... you have missed the bigger picture that what you want to do is a whole lot more similar to the current use of the devices than you realize. Xilinx wants to sell chips. Xilinx has all kinds of programs that help outside companies that are consumers of Xilinx chips or offer companion products or services that help sell its chips. Xilinx refers to the three hundred or so companies in the Alliance partnership as the "xilinx eco-system". Xilinx has a venture fund that was created to fund this eco-system, and it is huge. At least a half dozen presidents of companies in this eco-system, and at least as many influential people at Xilinx, frequent this group. Just a suggestion: it will probably require less time as the devil's advocate, and less time evangelizing, to actually get any positive attention from those that could help you if they wanted to. This post is the extent of the help that I am likely to offer you, as you have royally pissed me off. Maybe some of the others will eventually be more charitable. Regards, Erik. --- Erik Widding President Birger Engineering, Inc. (mail) 100 Boylston St #1070; Boston, MA 02116 (voice) 617.695.9233 x207 (fax) 617.695.9234 (web) http://www.birger.com .Article: 99172
Ray Andraka wrote: > John, last time I checked, FPGAs did not get delivered from Xilinx with > the config prom. Sure, you can store a defect map on the config prom, > or on your disk drive, or battery backed sram or whatever, but the point > is that defect map has to get into your system somehow. Earlier in this > thread you were asking/begging Xilinx to provide the defect map, even if > just to one of 16 quadrants for each non-zero-defect part delivered. > That leads to the administration nightmare I was talking about. Since NOTHING exists today, I've offered several IDEAS, including the board mfg taking responsibility for the testing and including it to the end user .... as well as being able to do the testing at the end user using a variety of options including triple redundancy and scrubbing. Multiple ideas have been presented to provide options and room for discussion. Maybe you missed that. Not discussed was a proposal that the FPGA vendor could provide maybe subquadrant level defect bin sorting .... which could be transmitted via markings on the package, or by order selection, or even by using 4 balls on the package to specify the subquadrant. For someone interested in finding solutions, there is generally the intellectual capacity to connect the dots and finish a proposal with alternate ideas. For someone being obstructionist, there is no end to the objections that can be raised. > I'm not sure you realize > the number of routing permutations that need to be run just to get fault > coverage of all the routing, switchboxes, LUTs, etc in the device, and > much less achieve fault isolation. Your posts regarding that seem to > support this observation. I'm not sure that you understand: where there is a will, it certainly can and will be done. After all, when it comes to routers for FPGA's there are many independent implementations .... it's not a Christ delivered on the mount technology for software guys to do these things. 
> > With RC there is an operating system, complete with disk based > > filesystem. The intent is to do fast (VERY FAST) place and route on the > > fly. > > > > but the fact of the matter is > that in order to get performance that will make the FPGA compete > favorably against a microprocessor is going to require a fast time to > completion that is orders of magnitude faster than what we have now > without giving up much in the way of performance. Ray, the problem is that you clearly have lost sight that sometimes the expensive and critical resource to optimize for is people. Sometimes it's the machine. > I know that may RC apps are not explicitly time > constrained, but they do have to finish enough ahead of other approaches > to make them economically justifiable). Ray .... stop lecturing ... I understand, and you are worried about YOUR problems here, and clearly lack the mind reading ability to understand everything from where I am coming or going. There are a set of problems, very similar to DSP filters, which are VERY parallel and scale very nicely in FPGA's. For those problems, FPGA's are a couple orders of magnitude faster. Others, that are truly sequential with limited parallelism, are much better done on a traditional ISA. It's useful to mate an FPGA system with a complementary traditional CPU. This is true in each of the prototypes I built in the first couple years of my research. More recently I've also looked at FPGA centric designs for a different class of problems. > Remember also, that the RC FPGA > starts out with a sizable handicap against a microprocessor with the > time to load a configuration, plus if the configuration is generated on > the fly the time to perform place and route. Once that hurdle is > crossed, you still need enough of a performance boost over the > microprocessor to amortize that set-up cost over the processing interval > to come out ahead. 
> Obviously, you gain from the parallelism in the > FPGA, but if you don't also mind the performance angle, it is quite easy > to wind up with designs that can only be clocked at a few tens of MHz, > and often that use up so much area that you don't have room for enough > parallelism to make up for the much lower clock rate. So? What's the point .... most of these applications run for hours, even days. I would like a future generation FPGA that has parallel, memory-like access to the configuration space with high bandwidth ... that is not today, and I've said so. You are lecturing again, totally clueless about the issues I've considered over the last 5 years, the architectures I've explored, the applications I find interesting, or even what I have long term intent for. There are a lot of things I will not discuss without a purchase order and under NDA. > So that puts the > dynamically configured RC in a box, where problems that aren't > repetitive and complex enough to overcome the PAR and configuration > times are better done on a microprocessor, and problems that take long > enough to make the PAR time insignificant may be better served by a more > optimized design than what has been discussed, and we're talking not > only about PAR results, but also architecturally optimizing the design > to get the highest clock rates and density. So, what's your point? Don't think I've gone down that path? .... there is a big reason I want ADB and the related interfaces that were done for JHDLBits and several other university projects. Your obsession with "highest clock rates" leaves you totally blind to other tradeoffs. 
> If I had a dollar for every time I've > gotten remarks that 100x performance is not possible, or that so and so > did an FPGA design expecting only 10x and it turned out slower than a > microprocessor because it wouldn't meet timing etc, I'd be retired. With hand layout, I've done certain very small test kernels which, replicated to fill a dozen 2V6000's, pull three orders of magnitude over the reference SMP cluster for some important applications I wish to target ... you don't get to a design that can reach petaflops by being conservative, which is my goal. I've used live tests on the Dini boards to confirm the basic processing rate and data transfers between packages, for a number of benchmarks and test kernels, and they seem to scale at this point. I've also done similar numbers with a 2V6000 array. Later this year my goal is to get a few hundred LX200's, and see if the scaling predictions are where I expect. So, I agree, or I wouldn't be doing this. > Rereading my post, I > see that I let my tone get out of hand, and for that I ask your forgiveness. Accepted. And I do have nearly six different competitive market requirements to either fill concurrently, or with overlapping solutions. It is six projects at a time at this point, and will later settle into several clearly defined roles/solutions. 
I've spent 35 years knocking off man-year-plus software projects by myself in 3-4 months, and 5-8 man-year projects with a small team of 5-7 in similar time frames with a VERY strong KISS discipline. I see defect parts as a gold mine that brings volumes up, and prices down, to make RC systems very competitive for general work, as well as highly optimized work where they will shine big time. I'm used to designing for defect management ... in disks, in memories, and do not see this as ANY concern. > I respectfully suggest you try first to get the system together > using perfect FPGAs, I've built several, and have several more ready to fab. > as I believe you will find you already have an > enormous task in front of you between the HLL to gates, FpgaC I've been using just over 2-1/2 years, even with its current faults which impact density by between 2-20%. Enough to know where it needs to go, and have that road map in place. There is a slowly growing user base and developer group for that project. The project will mature during 2006, and in some ways I've yet to talk about. > the need for fast PAR, This is a deal breaker, and why I've put my head up after a couple years and started pushing when JHDLBits with ADB was not released. There is similar code in several other sources that will take more work. I've a good handle on that. > partitioning the problem over multiple FPGAs and between FPGAs > and software, making a usable user interface and libraries etc, without > exponentially compounding the problem by throwing defect tolerance into > the mix. Baby steps are necessary to get through something as complex > as this. I've done systems level design for 35 years ... operating systems, drivers, diagnostics, hardware design, and large applications. I do everything with baby steps and KISS, but by tackling the tough problems as early in a design as possible for risk management. Again ... 
defect management may be scary to you because of how it impacts YOUR projects; in this project it is NOT a problem. Reserving defect resources is very similar to having the same resource already allocated. OK?Article: 99173
Jim Granville wrote: > FPGAs are great at distributed fabric, but not that good at memory > bandwidth, especially at bandwidth/$. A traditional multiprocessor shares one or more moderately wide DRAM systems, which are inherently sequential from a performance perspective, even when shared/interleaved. Caches for some applications create N memory systems, but can also become an even worse bottleneck. The basic building block with FPGAs is lots of 16x1 memories with an FF ... with FULL PARALLELISM. The trick is to avoid serialization with FSM's, and bulk memories (such as BRAM and external memories) which are serial. DSP and numerical applications are very similar data flow problems. Ditto for certain classes of streaming problems, including wire speed network servers.Article: 99174
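The bandwidth claim above is just multiplication, and a sketch makes the magnitude concrete. The device size, clock rates, and DRAM width below are illustrative assumptions, not figures for any particular FPGA or memory part.

```python
# Back-of-envelope arithmetic for the distributed-memory argument:
# thousands of independent 16x1 LUT RAMs, each accessed every cycle,
# versus one shared DRAM port that serializes every access.
# All numbers are assumptions chosen for illustration.

lut_rams   = 10_000          # assumed 16x1 distributed memories in use
fabric_clk = 100e6           # assumed 100 MHz fabric clock
fabric_bw  = lut_rams * 1 * fabric_clk   # bits/s, all ports in parallel

dram_width = 64              # one shared 64-bit DRAM interface (assumed)
dram_clk   = 200e6
dram_bw    = dram_width * dram_clk       # bits/s, strictly sequential

print(f"aggregate fabric advantage: {fabric_bw / dram_bw:.0f}x")
```

The ratio only holds for problems that can actually be spread across all those tiny memories, which is why the post singles out data-flow-style DSP and streaming workloads.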
I'm currently using the Xilinx ML310 development board. Does anybody know how to generate an ACE file for designs with dual PPC405 cores? Thank you.