Messages from 66550

Article: 66550
Subject: Re: Comparator and minimum value address
From: Ralf Hildebrandt <Ralf-Hildebrandt@gmx.de>
Date: Sun, 22 Feb 2004 14:24:16 +0100

sunil wrote:

>        I have 16 values. I have to compare those values and I have to
> get the minimum value and its address. I am comparing all 16 values
> with 8 comparators, then the 8 output values with 4 comparators, and so
> on......

-> This leads to combinational logic (remember: every comparator is a 
subtractor).

>     At last I am getting the minimum value, but how can I get the address
> of that value?

Every time you compare two values A and B, set a bit that indicates 
whether A or B is bigger. Do this for every comparator. The result is a 
tree of comparison bits that can be traced back to the address of the 
minimum value.

Last of all: Think about a serialized approach (compare only two values 
and do it step by step). This will result in a much smaller circuit, but 
will lead to slower computing.

Ralf

Article: 66551
Subject: Re: Comparator and minimum value address
From: "valentin tihomirov" <valentin_NOSPAM_NOWORMS@abelectron.com>
Date: Sun, 22 Feb 2004 16:59:01 +0200

Do you know how associative memory, i.e. get address/value by key, works?
You have a mux that chooses between val1 and val2 based on their values
(btw, which do you select if they are equal? Both is not possible); in other
words, the value is selected by the comparator's output. You should have a
second, parallel (address) channel controlled by the same comparator. In
associative memory you have two channels: the value and an operation-status
(success) bit. BTW, this is not a VHDL issue.
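
A minimal sketch of the two-channel idea, written as a Python behavioural
model rather than HDL. The function name min_with_address and the tie rule
(keep the lower address when two values are equal) are assumptions made for
illustration; the thread leaves the tie case open.

```python
def min_with_address(values):
    """Tournament (tree) minimum with the address carried in a parallel channel."""
    if not values:
        raise ValueError("need at least one value")
    # Each entry is (value, address); the address is the second channel.
    level = [(v, addr) for addr, v in enumerate(values)]
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level) - 1, 2):
            a, b = level[k], level[k + 1]
            # One comparator per pair; its output steers both the value mux
            # and the address mux. On a tie the left operand wins (assumption).
            select_b = b[0] < a[0]
            nxt.append(b if select_b else a)
        if len(level) % 2:
            nxt.append(level[-1])  # odd entry passes through unchanged
        level = nxt
    return level[0]


data = [12, 7, 9, 3, 15, 3, 8, 10, 11, 6, 14, 5, 13, 4, 2, 2]
value, address = min_with_address(data)
```

For 16 inputs this uses 8 + 4 + 2 + 1 = 15 comparators in 4 levels, matching
the 8/4/... arrangement described in the original question.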

Article: 66552
Subject: Re: Spartan 3 - avaliable in small quantities?
From: johnjakson@yahoo.com (john jakson)
Date: 22 Feb 2004 07:14:16 -0800

rickman wrote:

> I don't know that the Spartan 3 parts are a major step forward in
> FPGAs.  From what I can see, the main difference is the elimination of
> the huge startup currents on power up.  The marketing claim is that
> these will be much cheaper parts because of the small die.  But so far,
> I don't think anyone has seen the results of this.  
> 
> -- 

I don't know about that; Xilinx initially set expectations low except
on price. I heard Microblaze only ran at 85MHz on it compared to
120MHz or more on bigger Virtex.

But on a cpu project I am working on I am seeing synth reports of
311MHz on sp3-5 with the latest speed file vs. 320MHz for v2pro-8, and
the -7s seem to be the same speed as sp3-5 IIRC. The sp2-4s are way down
at 120MHz. Seems as if it's v2 made dirt cheap (if and when we get
them) with a small cut in speed. Also the LUT counts are similar to
sp2 but the blockrams are 4x bigger.

I can still port back to sp2(e) with almost the same floorplan but with
much smaller ram instances, although lots of 4Ks could still be more
useful than an equivalent number of 16/18Ks, but the speed cut would hurt.

For an oldtime VLSI guy, I couldn't imagine getting such performance
on an ASIC flow without 100x the design resources.

johnjakson_usa_com

Article: 66553
Subject: Re: Spartan 3 - avaliable in small quantities?
From: "B. Joshua Rosen" <bjrosen@polybus.com>
Date: Sun, 22 Feb 2004 11:31:35 -0500

On Sun, 22 Feb 2004 07:14:16 -0800, john jakson wrote:

> rickman wrote:
> 
>> I don't know that the Spartan 3 parts are a major step forward in
>> FPGAs.  From what I can see, the main difference is the elimination of
>> the huge startup currents on power up.  The marketing claim is that
>> these will be much cheaper parts because of the small die.  But so far,
>> I don't think anyone has seen the results of this.  
>> 
>> -- 
> 
> 
> I don't know about that, Xilinx initially set expectations low except
> on price. I heard Microblaze only ran at 85MHz on it compared to
> 120MHz or more on bigger Virtex.
> 
> But on a cpu project I am working on I am seeing synth reports of
> 311MHz on sp3-5 with the latest speed file v 320Mhz for v2pro-8 and
> the -7s seem to be same speed as sp3-5 IIRC. The sp2-4s are way down
> to 120MHz. Seems as if its v2 made dirt cheap (if and when we get
> them) with a small cut in speed. Also the LUT counts are similar to
> sp2 but the blockrams are 4x bigger.
> 
> I can still port back to sp2(e) with almost same floor plan but with
> much smaller ram instances although lots of 4ks could still be more
> useful than equiv no of 16/18Ks but the speed cut would hurt.
> 
> For an oldtime VLSI guy, I couldn't imagine getting such performance
> on an ASIC flow without 100x the design resources.
> 
> johnjakson_usa_com

John,

I'd check your report files closely if I were you. If you are seeing
311MHz on a Spartan 3 something is very wrong. I suspect that your
synthesizer discarded most of your design. My experience with Spartan
XC3S400-4s is that they are much slower than Virtex2Ps (-5 is the V2P that
I'm comparing it to). I'm able to get the Spartan 3s to meet 140MHz timing
but that is with very few logic levels between pipeline stages. I'm sure
that with lots of floorplanning it would be possible to push it higher
than that but certainly not to 300MHz, especially not on something as
complex as a CPU.

Article: 66554
Subject: Re: Spartan 3 - avaliable in small quantities?
From: rickman <spamgoeshere4@yahoo.com>
Date: Sun, 22 Feb 2004 11:32:59 -0500

john jakson wrote:
> 
> rickman wrote:
> 
> > I don't know that the Spartan 3 parts are a major step forward in
> > FPGAs.  From what I can see, the main difference is the elimination of
> > the huge startup currents on power up.  The marketing claim is that
> > these will be much cheaper parts because of the small die.  But so far,
> > I don't think anyone has seen the results of this.
> >
> > --
> 
> I don't know about that, Xilinx initially set expectations low except
> on price. I heard Microblaze only ran at 85MHz on it compared to
> 120MHz or more on bigger Virtex.
> 
> But on a cpu project I am working on I am seeing synth reports of
> 311MHz on sp3-5 with the latest speed file v 320Mhz for v2pro-8 and
> the -7s seem to be same speed as sp3-5 IIRC. The sp2-4s are way down
> to 120MHz. Seems as if its v2 made dirt cheap (if and when we get
> them) with a small cut in speed. Also the LUT counts are similar to
> sp2 but the blockrams are 4x bigger.

I don't know why you would not expect the XC3S parts to be faster than
the XC2S parts.  Certainly going with a 2x reduction in feature size (or
close to it) *should* give you a huge increase in speed.  In fact, they
should outrun everything Xilinx makes given the feature size.  But they
cut a lot of corners to make the parts cheap so they don't follow the
curve.  So far, I have not seen the prices beat the older Spartan parts
either.  Sure, they are an improvement, but in this industry,
improvement is normal and part of the game.  But the XC3S parts seem to
be just the next new chip, not anything really special.  

If the XC3S parts were both faster than the Virtex line and cheaper than
the older Spartan line, *that* would be something to crow about.  But
they are *neither* at the moment.  They are just the standard improved
line that combines both (more or less).  

> I can still port back to sp2(e) with almost same floor plan but with
> much smaller ram instances although lots of 4ks could still be more
> useful than equiv no of 16/18Ks but the speed cut would hurt.
> 
> For an oldtime VLSI guy, I couldn't imagine getting such performance
> on an ASIC flow without 100x the design resources.

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 66555
Subject: Re: Comparator and minimum value address
From: "valentin tihomirov" <valentin_NOSPAM_NOWORMS@abelectron.com>
Date: Sun, 22 Feb 2004 19:35:35 +0200


> -> This leads to combinational logic (remember: every comparator is a
> subtractor).
How do you perform calculations without using logic?


> Last of all: Think about a serialized approach (compare only two values
> and do it step by step). This will result in a much smaller circuit, but
> will lead to slower computing.
x:= a + b + c + d   -> x := ((a + b) + c) + d
means N adders and N adder delays

The original idea
x := (a+b) + (c+d)
requires N adders but takes only log2(N) adder delays, and is therefore
considered better. This is what optimizers should do.
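
A small Python sketch that performs both groupings and counts the adder
delays on the longest path. The function names are made up for
illustration. Strictly, both forms use N-1 two-input adders for N operands,
which the post rounds to N; only the depth differs.

```python
def chain_sum(xs):
    """((a + b) + c) + d ...: every adder waits for the previous one."""
    total, depth = xs[0], 0
    for x in xs[1:]:
        total, depth = total + x, depth + 1
    return total, depth


def tree_sum(xs):
    """(a + b) + (c + d) ...: pairs are added in parallel, level by level."""
    level = [(x, 0) for x in xs]  # (partial sum, adder delays so far)
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level) - 1, 2):
            (s0, d0), (s1, d1) = level[k], level[k + 1]
            nxt.append((s0 + s1, max(d0, d1) + 1))
        if len(level) % 2:
            nxt.append(level[-1])  # odd operand waits for the next level
        level = nxt
    return level[0]


operands = list(range(1, 17))          # 16 operands
chain_total, chain_depth = chain_sum(operands)
tree_total, tree_depth = tree_sum(operands)
```

With 16 operands the chain has 15 adder delays in series and the tree has 4,
while both produce the same sum.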

Article: 66556
Subject: TCP offload fpga core
From: nahum_barnea@yahoo.com (Nahum Barnea)
Date: 22 Feb 2004 09:40:30 -0800

Hi.

I am looking for a TCP offload engine core to be implemented on an FPGA.
Does anyone know of such a commercial core?

ThankX,
NAHUM

Article: 66557
Subject: Altera ACEX chip wide reset
From: rickman <spamgoeshere4@yahoo.com>
Date: Sun, 22 Feb 2004 14:52:02 -0500

I am trying to decide if I should use a chip wide reset on an Altera
ACEX 1K part.  In reading how this works, it appears that the FFs are
actually only able to be reset or "loaded with a '1'" using an async
signal.  So if I am not using a signal to preset a FF in my design, but
just want the power on/chip wide reset state to be a '1', how would I
code that in VHDL?  Is this like the Xilinx tools where you code in a
chip wide reset and then drive it with a special module?  Or do I
explicitly drive it from the chip wide reset pin and the tool figures it
out?  

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 66558
Subject: Re: Comparator and minimum value address
From: Ralf Hildebrandt <Ralf-Hildebrandt@gmx.de>
Date: Sun, 22 Feb 2004 21:10:17 +0100

valentin tihomirov wrote:

>>-> This leads to combinational logic (remember: every comparator is a
>>subtractor).
> 
> How do you perform calculations without using logic?

Maybe I was misunderstood.
Combinational logic is "just a bunch of gates" without any memory element.
Sequential logic includes memory elements (latches, flip-flops, RAM...).

>>Last of all: Think about a serialized approach (compare only two values
>>and do it step by step). This will result in a much smaller circuit, but
>>will lead to slower computing.
> 
> x:= a + b + c + d   -> x := ((a + b) + c) + d
> means N adders and N adder delays
> 
> The original idea
> x := (a+b) + (c+d)
> requires N adders and takes log2(N) adder delays; therefore, is considered
> better. This is what optimizers should do.

But what about using just *one* adder to compare two values step by 
step? It takes N steps to compute the result and some registers are 
needed to store some information (which operand was bigger during the 
last comparison plus a simple state machine), but it could be much 
smaller than using N adders.

-> Which way is best (parallel or serial) depends on the constraints. 
That is why I said "think about".

Ralf
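
A minimal Python sketch of the serial approach described above: one shared
comparator, a register pair holding the best value and its address, and one
value examined per step. The name serial_min and the step accounting (the
first step loads the registers) are assumptions for illustration.

```python
def serial_min(values):
    """Scan the values one per step, reusing a single comparator."""
    if not values:
        raise ValueError("need at least one value")
    best_val, best_addr = values[0], 0   # registers loaded on the first step
    steps = 1
    for addr in range(1, len(values)):
        if values[addr] < best_val:      # the single shared comparator
            best_val, best_addr = values[addr], addr
        steps += 1                       # one value per step
    return best_val, best_addr, steps


data = [12, 7, 9, 3, 15, 3, 8, 10, 11, 6, 14, 5, 13, 4, 2, 2]
result = serial_min(data)
```

Sixteen values take sixteen steps here, against four comparator levels in
the fully parallel tree; which is better depends on the area and speed
constraints.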

Article: 66559
Subject: ModelSim, Virtex DCM, and clk0 phase problem
From: dbraunstein@comcast.net (Dan Braunstein)
Date: 22 Feb 2004 12:48:01 -0800

Has anyone experienced the clk0 being 180 deg out of phase with the
DCM input clock during simulation (wave view) in ModelSim? I have clk0
going to clkfb through a bufg, just like what is described in the V-II
Platform Handbook (jellybean simple implementation), but after lock,
clk0 is 180 out of phase, so I do not get the wave that is shown in
the modelsim wave view in the Handbook.

My fix is to run clk180 to clkfb. Then clk0 is in phase with clkin.
But this makes no sense to me. I have not implemented on chip to see
what really happens.

As an aside, my input clock is 27MHz, but I need 13.5 MHz for my logic
with various phase relationships. Any advice? What are the
implications of running various phased 27MHz clocks into flops to get
various phased 13.5 MHz clocks?

Thanks. I am new to all this, but have to say the DCM/simulation bit
has been infinitely frustrating.

Danny

Article: 66560
Subject: Re: Dual-stack (Forth) processors
From: Bernd Paysan <bernd.paysan@gmx.de>
Date: Sun, 22 Feb 2004 22:33:43 +0100

Martin Euredjian wrote:
> Regarding using C with an FPGA.  Implementing something as simple as
> an 8051 core opens the door to using a large number of C compilers,
> tools, libraries and capable programmers to write your control code.

I wonder how someone can call the 8051 "simple". On my last task, we had
an 8051 (customer choice), which took about 3000LEs on an Altera FPGA
(I think that's even more than the 32 bit NIOS core takes ;-). Being
less than happy about that, I developed a simple Forth processor in a
few days (see www.b16-cpu.de), which did fit into about 600LEs. This is
a 16 bit processor, and much faster than the 8051 (but too late to
convince our customers). Recently, I stripped out a few features that were
not absolutely necessary, such as fast divide and add, and fast mem-mem copy,
and now I'm at about 300LEs with the simplified version (for the
current project, where the customer wanted a "state machine").

I'm also working on a GCC backend, but it looks like GCC 3.5's SSA tree
representation will make generating stack code much simpler than going
through the machine description approach.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Article: 66561
(removed)

Article: 66562
Subject: Re: Spartan 3 - avaliable in small quantities?
From: johnjakson@yahoo.com (john jakson)
Date: 22 Feb 2004 16:27:27 -0800

"B. Joshua Rosen" <bjrosen@polybus.com> wrote in message news:<pan.2004.02.22.16.31.34.651568@polybus.com>...
> On Sun, 22 Feb 2004 07:14:16 -0800, john jakson wrote:
> 
> > rickman wrote:
> > 
> >> I don't know that the Spartan 3 parts are a major step forward in
> >> FPGAs.  From what I can see, the main difference is the elimination of
> >> the huge startup currents on power up.  The marketing claim is that
> >> these will be much cheaper parts because of the small die.  But so far,
> >> I don't think anyone has seen the results of this.  
> >> 
> >> -- 
> > 
> > 
> > I don't know about that, Xilinx initially set expectations low except
> > on price. I heard Microblaze only ran at 85MHz on it compared to
> > 120MHz or more on bigger Virtex.
> > 
> > But on a cpu project I am working on I am seeing synth reports of
> > 311MHz on sp3-5 with the latest speed file v 320Mhz for v2pro-8 and
> > the -7s seem to be same speed as sp3-5 IIRC. The sp2-4s are way down
> > to 120MHz. Seems as if its v2 made dirt cheap (if and when we get
> > them) with a small cut in speed. Also the LUT counts are similar to
> > sp2 but the blockrams are 4x bigger.
> > 
> > I can still port back to sp2(e) with almost same floor plan but with
> > much smaller ram instances although lots of 4ks could still be more
> > useful than equiv no of 16/18Ks but the speed cut would hurt.
> > 
> > For an oldtime VLSI guy, I couldn't imagine getting such performance
> > on an ASIC flow without 100x the design resources.
> > 
> > johnjakson_usa_com
> 
> John,
> 
> I'd check your report files closely if I were you. If you are seeing
> 311MHz on a Spartan 3 something is very wrong. I suspect that your
> synthesizer discarded most of your design. My experience with Spartan
> XC3S400-4s is that they are much slower than Virtex2Ps (-5 is the V2P that
> I'm comparing it to). I'm able to get the Spartan 3s to meet 140MHz timing
> but that is with very few logic levels between pipeline stages. I'm sure
> that with lots of floorplanning it would be possible to push it higher
> than that but certainly not to 300MHz, especially not on something as
> complex as a CPU.

Hi Rick

I know what you are saying. When I first presented my paper cpu
architecture to XST, the situation looked hopeless. I backed off and
built a number of test projects that only included 1 object that was
pushed to the max bringing all IOs to the pads. The synth reports are
then crystal clear even for someone with little exp of the tool
before. I also look at the layout and placement to see if it looks
kosher. It did. From that I had a feel for what each Xilinx device

Article: 66563
Subject: Help with Xilinx EDK 6.1
From: mahim+google@cs.cmu.edu (Mahim Mishra)
Date: 22 Feb 2004 17:27:25 -0800

I am trying to generate a configuration bitstream for a small design
using Xilinx's Embedded Development Kit. The design has a small
PowerPC component and a small FPGA component. When I run bitgen on the
FPGA component, it crashes silently without producing any error
messages. I have tried doing this using the EDK GUI front-end (Xilinx
Platform Studio) as well as from the command line, with the same
result. Bitgen works fine if I try to compile a design that does not
use the embedded PowerPC core.

Has anyone seen this before? Does anyone know how I can make bitgen
tell me more about why it is unhappy? All I have is the crash log that
XP generated and offers to send to Microsoft, and that is not telling
me anything about what may be wrong. I have tried to search on
xilinx.com as well as through google, but not found anything. I am a
complete newbie to Xilinx tools, so right now I am completely lost.

Here is some more information about my setup: 

I am using an xc2vp20-ff1152 chip mounted on a Xilinx AFX prototyping
board. I am using ISE 6.1.03i and EDK 6.1.2, on Windows XP.

The EDK front-end runs bitgen with the command:

bitgen -w -f bitgen.ut system

Bitgen produces this output before crashing:

<snip>
Release 6.1.03i - Bitgen G.26
Copyright (c) 1995-2003 Xilinx, Inc.  All rights reserved.

Loading device database for application Bitgen from file "system.ncd".
   "system" is an NCD, version 2.38, device xc2vp20, package ff1152, speed -6
Loading device for application Bitgen from file '2vp20.nph' in environment C:/Xilinx.
Opened constraints file system.pcf.

Sun Feb 22 19:28:24 2004
</snip>

Here is the bitgen.ut file:

<snip>
-g ConfigRate:4
-g CclkPin:PULLUP
-g TdoPin:PULLNONE
-g M1Pin:PULLDOWN
-g DonePin:PULLUP
-g DriveDone:No
-g StartUpClk:JTAGCLK
-g DONE_cycle:4
-g GTS_cycle:5
-g M0Pin:PULLUP
-g M2Pin:PULLUP
-g ProgPin:PULLUP
-g TckPin:PULLUP
-g TdiPin:PULLUP
-g TmsPin:PULLUP
-g DonePipe:No
-g GWE_cycle:6
-g LCK_cycle:NoWait
-g Security:NONE
-m
-g Persist:No
</snip>

I would very much appreciate any pointers anyone could give me about
how to diagnose my problem.

Thanks,
Mahim

Article: 66564
Subject: Re: Dual-stack (Forth) processors
From: "Martin Euredjian" <0_0_0_0_@pacbell.net>
Date: Mon, 23 Feb 2004 01:38:49 GMT

Bernd Paysan wrote:

> I wonder how someone can call the 8051 "simple". On my last task, we had
> an 8051 (customer choice), which took about 3000LEs on an Altera FPGA
> (I think that's even more than the 32 bit NIOS core takes ;-). Being
> less than happy about that, I developed a simple Forth processor in a
> few days (see www.b16-cpu.de), which did fit into about 600LEs.

I can buy a core today and have any 8051 code running in it tomorrow.
That's simple.

Even better.  If I have a design that uses an external 8051 and need to
reduce BOM cost (and have FPGA resources) I can buy a core and fold the
processor and peripherals into the FPGA with little if any changes to the
8051 source code.  Again, that's simple.  And, BTW, I have exactly that
situation in one of our designs right now.

Simple isn't always smaller, faster, cheaper, fewer LEs, etc.  "Simpler" is
defined by the application and the circumstances at hand.

Now, to address your 600 vs. 3000 LE comparison.  Well, of course, if you
have a constraint that does not allow you the luxury of a 3000 LE processor
you have to look elsewhere.  This might mean adopting something like a small
Forth CPU implementation (such as yours), a small state machine or simply
moving the processor off-chip.  I mean, these days, for two bucks you can
put a tiny 25+ MHz (Cygnal and others) processor on a board.

So, yes, context is important, of course.  In your context your choice made
perfect sense.  No question about it.

Now, in these days of 6 million gate FPGAs it might be OK to trade device
utilization for time to market, flexibility, portability or other
parameters.  Of course, each project is different.  Each company is
different.  Each designer is different and each circumstance is different.
You have to keep an open mind.  That's all.

The other interesting choice today is V2 Pro.  You might, for example, want
to use one to take advantage of the high-speed serial I/O capabilities and,
as a result, have free PowerPC processors ready to blast away.

BTW, your b16 looks to be very useful and compact.

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Martin Euredjian

To send private email:
0_0_0_0_@pacbell.net
where
"0_0_0_0_"  =  "martineu"

Article: 66565
Subject: Re: Spartan 3 - avaliable in small quantities?
From: johnjakson@yahoo.com (john jakson)
Date: 22 Feb 2004 18:45:24 -0800

Joshua replied:

> > johnjakson_usa_com
> 
> John,
> 
> I'd check your report files closely if I were you. If you are seeing
> 311MHz on a Spartan 3 something is very wrong. I suspect that your
> synthesizer discarded most of your design. My experience with Spartan
> XC3S400-4s is that they are much slower than Virtex2Ps (-5 is the V2P that
> I'm comparing it to). I'm able to get the Spartan 3s to meet 140MHz timing
> but that is with very few logic levels between pipeline stages. I'm sure
> that with lots of floorplanning it would be possible to push it higher
> than that but certainly not to 300MHz, especially not on something as
> complex as a CPU.

Hi Joshua, Rick

Hopefully 4th time lucky; my girls are helping me way too much. With
Google I don't know what happened for several hours; I am sure a
couple of half posts are in front. Apologies. Long reply warning.

I know what you are saying. My 1st paper cpu arch when presented to
XST gives me little clue where to start. I always used to work on
ASICs in teams where I write Verilog & C models and someone else (far
less speed/area motivated) bangs the FPGA tool. With Virtex800 exp
only at <30MHz I never had that great an expectation to start with, I
always had way too much logic in each pipeline but we only needed
30MHz. There was no time to explore speediac style and reduce logic as
it was ASIC prototyping.

Ray Andraka's work on super-pipelining everything DSP left me wondering
if a cpu could also go as fast. Usually not so because there are way
too many random blocks of logic covering many adjacent pipelines. This
is why MicroBlaze is stuck in the 120MHz zone, I could probably guess
(reverse engineer) the code used for the datapath if I really studied
the ISA.

But the Alpha chip and of course now the x86s are also deeply
superpipelined but more complex than can fit in any FPGA (or maybe
not). Now I am free to explore the boundaries and see what can be done
on a clean sheet at max freq.

I am also following very late after Philip's and Jan's work on FPGA cpus
from the 4000 days, but even Jan got 30MHz on 4000s a long time ago.
Since I am coming from a cpu & DSP background, I wanted Alpha speed but
on a better architecture for parallel programming, i.e. a modern Transputer.

I built a number of test projects that only included 1 instance of a real
pipelined blockram, or adders of varying widths, and so on. I also
play through the device type list and try sp2s through to v2pro with
varying speed grades and even different packages since the reports
only take 20s for such simple models. The last speed file posted by
Austin made a huge difference, bringing sp3 close enough to v2pro that
the difference is marginal; only -8 pulled ahead another 5%. The sp2s
remain at the lower end of 100-200MHz, which is what I expect for these
simple pipes.

I always study the report and generate the layout. Everything looks
kosher but the layout always looks haphazard. So I learn to use the
floorplanner and write C code to make the .ucf file for FF placement.
On occasion a stupid typo would whip up the speed to 700MHz or
something, and voila most of the top level would be missing but then
the report usually says as much in bright red or yellow. I only allow
a few yellow marks for known issues beyond my control like the unused
parity bits of blockram instance. Any more than that requires
immediate fixing.

Now that I have my expectations set right I know that a Blockram can
cycle at around 320MHz on various sp3 -5 devices. In fact the ds99.pdf
IIRC says as much. A 32b plain adder is 250MHz; that needs pipelining
work to get to 300MHz plus. I ended up with a 12,10,10.msb width
3-stage 32b add. I really wanted to do a faster 2-stage carry-select
design but XST always seems to hack it into something less. Trivial
things like generating CVNZ flags become trouble at that speed; I end
up piping that as well, since you can only do 3 LUT layers of logic or
a 12b registered add or 12b logic fn() or a BRam cycle and ZERO
combinations of these.
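
A rough Python sketch of how a 32b add can be split into 12, 10 and 10 bit
slices with the carry handed from one stage to the next. It assumes the
12-bit slice is the least significant one (reading "12,10,10.msb" that way
is a guess), and it models only the arithmetic of the split, not the
register timing or three-cycle latency of a real pipeline. The names are
made up for illustration.

```python
WIDTHS = (12, 10, 10)  # slice widths, LSB slice first (assumption)


def sliced_add32(a, b):
    """Add two 32-bit values slice by slice, as three pipeline stages would."""
    result, carry, shift = 0, 0, 0
    for width in WIDTHS:
        mask = (1 << width) - 1
        # Each stage adds only its own slice plus the carry from the
        # previous stage, so no stage needs an adder wider than 12 bits.
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (s & mask) << shift   # slice sum held for this stage
        carry = s >> width              # carry-out passed to the next stage
        shift += width
    return result, carry                # 32-bit sum and final carry-out


total, carry_out = sliced_add32(0xFFFFFFFF, 0x00000001)
```

The point of the split is that the longest carry chain in any one clock
period is 12 bits instead of 32, at the cost of extra registers and latency.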

This is only possible because the cpu design is 4 way hyperthreaded
with 1 nice hazard path, so that all the datapath pipes are as
decoupled as they would be in any DSP engine. Only the instruction
decode has some local coupling but again it has no wide adds or big
rams so it's looking doable and it is also Nway threaded. I have more
work to do but I never add more logic in series with my critical
blocks. If I get to 4 LUT/mux levels I immediately drop out of warp
speed back to 250MHz or even way less and that makes the other stuff
that is fully pipelined redundant. Any time my speed drops below
311MHz, I know I just added a 4th LUT level, track it down and redo it
till it's 3 or less. This usually requires working on that module in
isolation, keeping its speed as much as possible over my target.
Further I cannot allow any module to have unregistered IOs, however
painful that is, without tracking that at a global level. The 3 levels
of LUT logic are almost always in one place inside a module between 2
pipes. The Verilog code is a mix of structural & RTL style, assigns
for wiring and always @ for the FFing.

This is really the same deal as with the fastest VLSI cpus that are
limited to 10 levels of low fanout gate level logic. Seymour was doing
this in ECL 40yrs ago. A LUT counts as 3 levels of gate logic, so 3 LUT
levels is close enough to 10 gates.
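
The arithmetic behind those numbers, as a small Python sketch. The function
names are made up for illustration, and the per-level figure is only an
upper bound, since the clock period also has to cover routing, clock-to-out
and setup time.

```python
def period_ns(f_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / f_mhz


def per_level_budget_ns(f_mhz, lut_levels):
    """Period divided evenly across the LUT levels (an upper bound per level)."""
    return period_ns(f_mhz) / lut_levels


def gate_equivalents(lut_levels, gates_per_lut=3):
    """Gate levels, using the post's rule of thumb of 3 gate levels per LUT."""
    return lut_levels * gates_per_lut


target = period_ns(311)                  # about 3.2 ns per cycle
budget3 = per_level_budget_ns(311, 3)    # about 1.07 ns per LUT level
budget4 = per_level_budget_ns(311, 4)    # about 0.80 ns per LUT level
gates = gate_equivalents(3)              # 9, i.e. roughly 10 gate levels
```

Going from 3 to 4 LUT levels at the same clock cuts the time available per
level by a quarter, which is consistent with the speed falling off sharply
when a fourth level creeps in.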

I will report on the work as it gets closer to live results. I know I
can download to an sp2e dev board for about 200MHz or way less but by
the time the cpu C & Verilog models can run code and I have the lcc
compiler done, gee I might have a sp3 -5 dev board to play with. The
intended market is licensing to high end users for embedded & par
computing. I am even tempted to max the datapath to 64b as it only
adds 3-4 pipestages and not much to the control.

The LUT count is still below 500 and is mostly going to control; a
64b Alpha path would balance it more to computing, but that's another
story. My only concern is how much power 1 cpu <800 LUTs or FFs will
dump. I use 2 BRams per cpu instance, so I am just about to lose having
2 in an sp 50. The bigger sp's though are more on the LUT side.

Regards all

johnjakson_usa_com

Article: 66566
Subject: Re: Spartan 3 - avaliable in small quantities?
From: Jim Granville <no.spam@designtools.co.nz>
Date: Mon, 23 Feb 2004 20:11:04 +1300

john jakson wrote:
<interesting stuff snipped>
> 
> If I get to 4 LUT/mux levels I immediately drop out of warp
> speed back to 250MHz or even way less and that makes the other stuff
> that is fully pipelined redundant. Any time my speed drops below
> 311MHz, I know I just added a 4th LUT level, track it down and redo it
> till its 3 or less. This usually requires working on that module in
> isolation, keeping its speed as much as possible over my target.
> Further I can not allow any module to have unregistered IOs however
> painful that is with out tracking that at a global level. The 3 levels
> of LUT logic is almost always in one place inside a module between 2
> pipes. The Verilog code is a mix of structural & RTL style, assigns
> for wiring and always @ for the FFing.
> 
> This is really the same deal with the fastest VLSI cpus that are
> limited to 10 levels of low fanout gate level logic. Seymour was doing
> this in ECL 40yrs ago. A LUT counts as 3 levels of gate logic so close
> enough 10gates.
> 
> I will report on the work as it gets closer to live results. 

  Sounds to me like something you could negotiate
a job at Xilinx doing :)

  Their marketing dept would just LOVE to boast about 300+ MHz
CPU cores, even if that is 'very peaky'. (after all, so are the
alternatives)

  Key question is what code size is this working from ?

-jg

Article: 66567
Subject: Re: EDK 6.1 vs 3.2 and OPB Bus resets
From: Sean Durkin <23@iis.42.de>
Date: Mon, 23 Feb 2004 08:12:20 +0100

Carlos Villalpando wrote:
> The new system has a PLB->OPB bridge and on the OPB bridge is 3 OPB UART 
> Lites and the custom OPB peripheral.  The PPC is running a simple "hello 
> world" type program.  The program runs fine without the custom core 
> connected.  With the core connected and not accessed, the OPB bus resets 
> after any of the UARTS tries to spit out more than 16 characters at a time.
Not sure if this will help in your case, but I've had some problems with 
the combination "UART Lite" and "Custom IP Core" as well. The bus didn't 
reset, but the system just stopped working altogether after 16 bytes 
were sent over UART. In my case the solution was to explicitly set the 
C_MIR_BASEADDR and C_MIR_HIGHADDR parameters for my IP-core, after that 
it all worked fine. The strange thing is that I need to set this even if 
I turn off the MIR completely.

I don't know if this will help you, but trying won't hurt.

BTW, I'm using EDK3.2 and ISE5.2, but since your problem sounds so 
similar to mine I thought I'd respond anyway.


-- 
Sean Durkin
Fraunhofer Institute for Integrated Circuits (IIS)
Am Wolfsmantel 33, 91058 Erlangen, Germany
http://www.iis.fraunhofer.de

mailto:23@iis.42.de
([23 , 42] <=> [durkinsn , fraunhofer])

Article: 66568
Subject: Barrel shifter synthesis in QuartusII
From: ALuPin@web.de (ALuPin)
Date: 22 Feb 2004 23:58:58 -0800

Hi,

I tried to compile the code presented some days ago in this newsgroup.
I use Altera QuartusII v3.0 SP2

and got the following warning:

Warning: VHDL Subtype or Type Declaration warning at
numeric_std.vhd(878):
subtype or type has null range Switching left and right bound of
range.

What does that mean?


Apart from that I get the Info
"No valid register-to-register paths exist for clock Clk"

What is going wrong with the timing calculation?

Rgds


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;         

entity barrelshifter is
port( Quantity : in  unsigned(31 downto 0);
      Amount   : in  unsigned(4 downto 0);
      Reset    : in  std_logic;
      Clk      : in  std_logic;
      Output   : out std_logic_vector(31 downto 0)
     );
end barrelshifter;

architecture ro_lft of barrelshifter is
signal rotated   : std_logic_vector(31 downto 0);
signal rotate_by : unsigned(4 downto 0);

begin
rotate_by <= Amount;

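-- Reset is used asynchronously, so it belongs in the sensitivity list.
process (Clk, Reset)
begin
  if Reset='1' then
     rotated <= (others => '0');
  elsif clk = '1' and clk'event then
     -- shift_right fills with zeros; numeric_std's rotate_left/rotate_right
     -- would be needed if a true rotate is intended.
     rotated <= std_logic_vector(shift_right(Quantity, to_integer(rotate_by)));
  end if;
end process;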

Output <= rotated;

end ro_lft;
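
For checking simulation results, a small reference model can help. The sketch below is Python rather than VHDL and only mirrors the documented behaviour of numeric_std's shift_right and rotate_left on 32-bit unsigned values. It models both because the architecture is named ro_lft while the code calls shift_right. The names WIDTH and MASK are made up for this example.

```python
WIDTH = 32                 # width of Quantity in the entity above
MASK = (1 << WIDTH) - 1    # keeps results within 32 bits


def shift_right(value, amount):
    """Logical right shift: vacated upper bits are filled with zeros."""
    return (value & MASK) >> amount


def rotate_left(value, amount):
    """Rotate left: bits leaving the top re-enter at the bottom."""
    value &= MASK
    amount %= WIDTH
    return ((value << amount) | (value >> (WIDTH - amount))) & MASK
```

With Quantity = 0x80000001 and Amount = 1, the shift drops the low bit and gives 0x40000000, while a left rotation would give 0x00000003.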

Article: 66569
Subject: Re: altera, xilinx susceptible to power transients?
From: visualfor@yahoo.com (Naveed)
Date: 23 Feb 2004 00:17:51 -0800

"Jeff" <koebrich@sbcglobal.net> wrote in message news:<pijZb.11786$PY.278@newssvr26.news.prodigy.com>...
> Just wondering...how susceptible are these RAM based FPGA devices to power
> supply transients, brownouts, etc?  I am looking on Altera's website and
> have not found much yet.
> 
> Thanks,
> 
> Jeff

I have been using a 1.5V Cyclone.  A Cyclone will reset some of its FFs
on a very fast transient on the 1.5V rails.  I am talking about the
voltage dipping well below 1V for more than 5-10nsec, multiple times.
I have never seen it go into reconfiguration.

If transients are the problem, avoid using low voltage ICs, go for 2.5V
or even 3.3V cores.

Naveed

Article: 66570
Subject: Re: Help with Xilinx EDK 6.1
From: Sean Durkin <23@iis.42.de>
Date: Mon, 23 Feb 2004 09:40:20 +0100

Mahim Mishra wrote:
> I am trying to generate a configuration bitstream for a small design
> using Xilinx's Embedded Development Kit. The design has a small
> PowerPC component and a small FPGA component. When I run bitgen on the
> FPGA component, it crashes silently without producing any error
> messages. I have tried doing this using the EDK GUI front-end (Xilinx
> Platform Studio) as well as from the command line, with the same
> result. Bitgen works fine if I try to compile a design that does not
> use the embedded PowerPC core.
Have you tried opening the .NCD file bitgen tries to convert in FPGA 
Editor? My guess is it's the underlying .NCD that is corrupt. In that 
case, FPGA Editor will crash if you try to open it...

John Williams described something like this not long ago. I too have 
encountered corrupt .NCD files from time to time, but I can't put my 
finger on it, i.e. I can't reproduce it or narrow down where it comes from.

Have a look at the logfiles, especially from the par stage, and see if 
there's anything unusual there. In my case par seems to finish without 
errors, but generates a corrupt .NCD...

If you can reproduce this reliably, maybe you could open a WebCase with 
Xilinx. Since you're now the third person with the same problem, I'm 
beginning to believe it's not just me being too stupid to use the tools. :)

-- 
Sean Durkin
Fraunhofer Institute for Integrated Circuits (IIS)
Am Wolfsmantel 33, 91058 Erlangen, Germany
http://www.iis.fraunhofer.de

mailto:23@iis.42.de
([23 , 42] <=> [durkinsn , fraunhofer])

Article: 66571
(removed)

Article: 66572
Subject: Re: Dhrystone figures - Was: Microblaze instruction timings
From: Goran Bilski <goran@xilinx.com>
Date: Mon, 23 Feb 2004 11:22:36 +0100

Hi,

Sorry, I sent the answers as HTML only, so I am resending this as text only.

See below.


Jon Beniston wrote:

>Goran Bilski <goran@xilinx.com> wrote in message news:<c12r85$l611@cliff.xsj.xilinx.com>...
>
>>Hi,
>>
>>Multicycle instructions always take multiple cycles.
>>This is due to the pipeline of MicroBlaze.
>>MicroBlaze has only 3 pipeline stages: Instruction Fetch (IF), Operand Fetch 
>>(OF) and Execution Stage (EX).
>
>Thanks for the explanation.
>
>>The current MicroBlaze is a good tradeoff between area and performance.
>
>Sure.
>
>>The 950 LUT figure includes the basic features, without caches or debug.
>>The caches are quite cheap on LUTs, around 50 LUTs for the instruction cache.
>>The cost is that BRAM is needed to handle the caches.
>
>Does "basic features" include the h/w divider? I've been trying to
>reproduce the quoted Dhrystone figures on the simulator, and only get
>0.63 MIPS/MHz without it. If I add it, I can get 0.77.
>
To get 0.8 MIPS/MHz, you need to enable the HW divider. The size of the 
HW divider is around 60-80 LUTs.
I can't remember exactly, but the implementation is a basic 
shift-compare design which only needs a compare block and a shift block. 
The divide takes 35 clock cycles: 2 clock cycles to set up the 
operands, 32 clock cycles for the division and 1 clock cycle for writing 
the result.
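
As a rough illustration only (this is not the MicroBlaze RTL), a shift-compare divider can be modelled in a few lines of Python. The sketch assumes unsigned 32-bit operands and one quotient bit per clock. The names shift_compare_divide, divide_latency, SETUP_CYCLES and WRITEBACK_CYCLES are made up for this example.

```python
SETUP_CYCLES = 2       # cycles to set up the operands (figure from the post)
WRITEBACK_CYCLES = 1   # cycle to write the result (figure from the post)


def shift_compare_divide(dividend, divisor, width=32):
    """Unsigned division producing one quotient bit per iteration."""
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    for bit in range(width - 1, -1, -1):
        # Shift block: bring the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Compare block: subtract only if the divisor fits.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


def divide_latency(width=32):
    """Total cycles: operand setup + one cycle per bit + result write."""
    return SETUP_CYCLES + width + WRITEBACK_CYCLES
```

For a 32-bit divide, divide_latency() adds up to the 35 cycles quoted above.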

>It seems strange that on the Web page
>(http://www.xilinx.com/ipcenter/processor_central/microblaze/performance.htm),
>the Spartan 3 is rated at 0.8 and the Spartan II is rated at 0.65, yet
>they are both listed as requiring the same number of logic cells. I
>would presume that either the performance figure for the Spartan II is
>too low, or the number of logic cells required by the Spartan 3 and
>Virtex II's to achieve the quoted figure is actually higher.
>
The difference is that S3 and VII have embedded multipliers, so MicroBlaze 
has a HW multiplier there, while the S2 has no embedded multipliers, so 
multiplication is done in SW (which takes many more clock cycles).
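
To give a feel for why a software multiply costs more cycles, here is a generic shift-and-add multiply in Python. It is only an assumption about the general technique. The actual library routine used for MicroBlaze on the S2 is not described here and may differ. The name software_multiply is made up for this example.

```python
def software_multiply(a, b, width=32):
    """Shift-and-add multiply; returns (product mod 2**width, loop iterations)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    iterations = 0
    while b:
        # Add the shifted multiplicand whenever the current multiplier bit is set.
        if b & 1:
            product = (product + a) & mask
        a = (a << 1) & mask
        b >>= 1
        iterations += 1
    return product, iterations
```

Each loop iteration is several instructions, and there can be up to 32 iterations per multiply.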

>Incidentally, I've been trying to get the Dhrystone numbers for NIOS
>as well. Can anybody clarify if their instruction set simulator is
>cycle accurate? If it is, the figures appear to be 0.64 for a 32-bit
>implementation and 0.15 for a 16-bit implementation, but I have a
>feeling that this should be lower.
>
>Cheers,
>JonB

Article: 66573
Subject: erasing a MAX device
From: Rene Tschaggelar <none@none.net>
Date: Mon, 23 Feb 2004 11:26:13 +0100

A feature forgotten by the developers of the MaxPlus2
software is the ability to just erase a (MAX) device.
I am having trouble with some other hardware which
should have defined output pins going to defined input pins
on a MAX3128; it leads to this device not being initialized.
This leaves the MAX3128 with open inputs that draw too much power.

There wasn't any problem before the MAX3128 was programmed,
and I therefore wish to erase the MAX3128. Just erase.

Rene
-- 
Ing.Buero R.Tschaggelar - http://www.ibrtses.com
& commercial newsgroups - http://www.talkto.net

Article: 66574
Subject: Inova Semiconductor Gigastar Link between two FPGAs
From: pjtwomey@hotmail.com (Patrick Twomey)
Date: 23 Feb 2004 02:57:16 -0800

I am trying to send live video data (a data rate of 160 Mbps) over a
10m cable between two FPGAs. I am using a pair of Gigastar Piggyback
boards with Transmitter and Receiver (ING_TRC) to achieve this. Has
anyone used these boards for this?
Also, the transmitter and receiver run on a 33 MHz clock, allowing a
data rate of up to 1.32 Gbps. I use this clock as the read clock of
an asynchronous FIFO (the write clock of the FIFO is 13.5 MHz and is
generated on the FPGA). Does this 33 MHz external clock have to go
through a global clock buffer, or will a standard IBUF do?
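
For reference, the figures above can be cross-checked with a little arithmetic. This Python sketch uses only the numbers quoted in this post. The constant and function names are made up for the example, and it says nothing about how the Gigastar parts frame the data internally.

```python
LINK_CLOCK_HZ = 33e6      # transmitter/receiver clock (FIFO read clock)
LINK_RATE_BPS = 1.32e9    # maximum link data rate
VIDEO_RATE_BPS = 160e6    # live video payload


def bits_per_clock(rate_bps, clock_hz):
    """Average number of bits that must move on every clock edge."""
    return rate_bps / clock_hz


# Bits carried per 33 MHz clock at the full link rate.
link_bits_per_clock = bits_per_clock(LINK_RATE_BPS, LINK_CLOCK_HZ)

# Fraction of the link capacity the video stream actually needs.
link_utilisation = VIDEO_RATE_BPS / LINK_RATE_BPS

# Average payload bits the read side must drain per 33 MHz clock.
video_bits_per_read_clock = bits_per_clock(VIDEO_RATE_BPS, LINK_CLOCK_HZ)
```

The video stream needs only about an eighth of the link capacity, so the FIFO read side will be idle most of the time.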
