In article <Fq31J5.57K@world.std.com>, Joseph H Allen <jhallen@world.std.com> wrote: >In article <38ab3f43.13235782@nntp.best.com>, >Bob Perlman <bobperl@best_no_spam_thanks.com> wrote: >>On Wed, 16 Feb 2000 13:44:04 -0800, Peter Alfke <peter@xilinx.com> >>wrote: > >>>The classical solution to this old problem is to utilize the input flip-flop with >>>its input delay, but configured as a latch, and hold it permanently transparent. > >>>Peter Alfke > >>I tried the latch trick some years ago. It worked, but one >>significant complication was that PPR (this was back in '92-'93) kept >>trying to help by optimizing out the always-transparent latch. I >>don't know if M2.1 has the same problem. > >Actually I think it's map which is doing it. It has the same problem in >M2.1, no matter how many KEEP and NOREDUCE attributes you add to the input >or gate (however I was able to get the Leonardo Verilog synthesizer to keep it >with the 'dont_touch' attribute: //exemplar attribute delay dont_touch >true). > >Then I realized that you can just attach the latch gate to the clock for the >same effect, which is by far the easiest solution: > > ild_1 delay(.Q(original_input), .D(new_input), .G(clk)); > > always @(clk) > begin > ... state machine which uses original_input ... > end Note however that this should only be done on signals which actually have hold time issues (or more exactly, only to signals with a setup time lower than 1/2 the clock period), since it adds a huge delay to the input (while clk is high). It would be better to tie .G low, but I don't know how to prevent map from optimizing the latch out. You can use the fpga editor, but it's a pain. -- /* jhallen@world.std.com (192.74.137.5) */ /* Joseph H. Allen */ int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0) +r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2 ]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}Article: 20676
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 10th Great Lakes Symposium on VLSI Design March 2-4, 2000, Chicago, Illinois, U.S.A. http://www.glsvlsi.com =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Dear Colleague, On behalf of the organizing committee for the 10th Great Lakes Symposium on VLSI, I would like to draw your attention to this year's very exciting program. Please refer to the new symposium web-page at URL: http://www.glsvlsi.com for the advance program, registration and hotel information. Prospective participants are encouraged to register and make their travel arrangements at their earliest convenience. Please disseminate this announcement further among your colleagues. We look forward to seeing you at the Symposium. Regards, Amir H. Farrahi Publicity Chair =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 10th Great Lakes Symposium on VLSI Design March 2-4, 2000, Chicago, Illinois, U.S.A. http://www.glsvlsi.com =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Sent via Deja.com http://www.deja.com/ Before you buy.Article: 20677
George wrote: > > Hi Folks, > > I am using Foundation 2.1i-SP4 to target XC4k series. I am having > difficulties with 'RLOC_RANGE' property. I am using schematic design entry. > I attach a property 'RLOC_RANGE=Rr1Cc1:Rr2Cc2' on a symbol if I wanted it to > be placed between Rows r1 and r2, and Columns c1 and c2. However, when I map > the design, I do not get what I expect. This directive seems to be totally > ignored. > Has anybody out there used this property? How did it work for you? > > I appreciate your help. I'm not sure if this is the problem, but remember that RLOC constraints only control relative locations, not absolute. If you are trying to get absolute positioning, use the constraint: LOC = Rr1Cc1 : Rr2Cc2 Or apply an RLOC_ORIGIN constraint. Another concern is getting the RLOC sets right, there are a number of ways to control which elements are part of which RLOC sets. The automatic hierarchy-based processing may not be doing what you want, I can't tell without knowing more about your schematic. -Steve GrossArticle: 20678
Matt Billenstein wrote: > All, I'm interested in purchasing a prototyping board based on a Xilinx FPGA > and I have about $200 to spend. I've looked a little at the boards at > www.xess.com so far. Does anyone have any recommendations? > > thx in advance, > > m > > Matt Billenstein > REMOVEhttp://w3.one.net/~mbillens/ > REMOVEmbillens@one.net I think the XESS board is the best for this price. Additionally, XESS provides excellent support and help, and they have a very good web site with a lot of useful tutorials. -- =================================================================== Sergio A. Cuenca Asensi Dept. Tecnologia Informatica y Computacion (TIC) Escuela Politecnica Superior, Campus de San Vicente Universidad de Alicante Ap. Correos 99, E-03080 ALICANTE ESPAÑA (SPAIN) email : sergio@dtic.ua.es Phone : +34 96 590 39 34 Fax : +34 96 590 39 02 ===================================================================Article: 20679
Hello Joseph, I was tired and didn't realize what I wrote to you before. One way to reduce the hold time is by "advancing the phase of the clock" using a roboclock or any other clock manager. Joseph H Allen wrote: > The M2.1i software now reports hold times on input pads in the data sheet > timing report file, and, of course, I have some significant (up to 2.5 ns) > hold times relative to the system clock. > > This does not happen when using the IOB flip flop, with its delay line. It > does happen when there is small amount of logic between the input and the > first flip flop (so that the IOB flip flop can not be used), and when both > are placed together in a CLB near the pad. > > What is the best (easy+automatic) way to eliminate these hold times? Has > anyone else noticed this? > > -- > /* jhallen@world.std.com (192.74.137.5) */ /* Joseph H. Allen */ > int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0) > +r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2 > ]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);} -- Hernan Javier Saab Western Area Applications Engineer Email: HERNAN@synplicity.com Direct Phone: 408-215-6139 Main Phone: 408-215-6000 Pager: 888-712-5803 FAX: 408-990-0295 Synplicity, Inc. 935 Stewart Drive Sunnyvale, CA 94086 USA Internet <http://www.synplicity.com >Article: 20680
jlamorie@engsoc.carleton.ca wrote: > In article <38AAEE78.9B969870@xess.com>, > Dave Vanden Bout <devb@xess.com> wrote: > > You can take a look at http://www.xess.com/fndmake.pdf. This document > > shows you how to implement Xilinx projects using a makefile and batch > > mode processing. You can store the makefiles and VHDL files in a CVS > > tree and recall them to regenerate your project bit files. > > > > The makefile described in the document is a bit simple, but you can > > probably modify it to make it smarter. > > This is absolutely wonderful!! I'll try it out today, and try to figure > out how to include the source files for FSM, schematics and logiblox. > > I know some of these tools dump out VHDL, so you can just include the VHDL files from those in the makefile. -- Dave Vanden Bout, FPGA Product Manager, XESS Corp. devb@xess.com http://www.xess.comArticle: 20681
Hi Peter, Can you tell me, with a Virtex 1000E, using the STARTUP-VIRTEX, GSR net, and your clever little LUT flip-flop, what is the maximum propagation delay from GSR to the flip-flop outputs? I.e., can I use GSR with a 74MHz system clock? Thanks, Mark. On Wed, 16 Feb 2000 09:19:31 -0800, Peter Alfke <peter@xilinx.com> wrote: > > >Rick Filipkiewicz wrote: > >> Looking at the Virtex data sheet there's a timing parameter for the GSR->IOB/CLB FF >> outputs given. For a -4 part its 12.5nsec. The question is whether this includes GSR >> routing. If it doesn't its got to be the slowest async reset since LS TTL. > >Of course it includes the max routing delay. >But it's a max delay, and some flip-flops are closer to the source and have a much >shorter delay. So this delay ( different from all other delays in the data sheet) has >an enormous spread, you really should assume anywhere between almost zero to the max >value. That's what causes the problems that Ray and I discussed before. > >Peter Alfke > >Article: 20682
hi jeff, that's not possible because I use the synopsys behavioral-compiler. but - we could fix the problem which seems to be a language-problem between coregen-edif-writer and design-compiler-edif-reader. fjz001@email.mot.com wrote: > > Mark, > > Why not directly instantiate the RAMB4_S16_S16 in your HDL? In this > case, Coregen just adds a layer of unnecessary complexity. > > Jeff > > In article <88c4df$4r2@news.Informatik.Uni-Oldenburg.DE>, > "Mark Hillers" <Mark.Hillers@Informatik.Uni-Oldenburg.DE> wrote: > > Hello, > > > > i think i have found a bug in xilinx-tool coregen 2.1i. > > it happens when creating single-port-blockrams with words larger than 16 > > bit. > > > > the resulting ".edn"-file (for synopsys) uses one RAMB4_S16_S16 > > component where the > > lower 16 bit of the 24-bit-word are mapped to port A (DOA[15:0]) and the > > upper 8 bit are mapped to port B (DOA[15:8]). > > But - and here comes the bug - the address of the desired word is simply > > mapped to both address-ports (ADDRA and ADDRB (8 bit wide)) the > > following way: > > > > ADDRA(4 downto 0) = myaddress(4 downto 0) > > ADDRA(7 downto 5) = "000" > > ADDRB(4 downto 0) = myaddress(4 downto 0) > > ADDRB(7 downto 5) = "000" > > > > The problem is now that always both ports load the same address and with > > it the same data. The Result is an output which has the form CDABCD > > where A,B,C,D are hex-ciphers. > > > > In application-note XAPP130 (V1.1) is a solution to this problem. The > > mapping of the address-ports should be: > > > > ADDRA(4 downto 0) = myaddress(4 downto 0) > > ADDRA(7 downto 5) = "000" > > ADDRB(4 downto 0) = myaddress(4 downto 0) > > ADDRB(7 downto 5) = "100" > > > > Now I am looking for a simple patch. The simplest would be a new version > > of coregen because i am not good in writing ".edn"-files :-(. > > > > greetings > > mark > > > > Sent via Deja.com http://www.deja.com/ > Before you buy.Article: 20683
Nova Engineering has two different Altera FPGA development boards: Constellation and Constellation-E. See <http://www.nova- eng.com/constellation.html> Both are very similar, but the "Constellation-E" adds a USB interface and uses the newer 10KE FPGAs from Altera <http://www.altera.com/html/products/f10ke.html>. The Constellation-E can utilize up to an EPF10K200S. For other boards see http://www.optimagic.com/boards Michael Rauf Nova Engineering, Inc. 1.800.341.NOVA (6682) 1.513.860.3456 1.513.860.3535 (fax) mailto:mrauf@nova-eng.com http://www.nova-eng.com 5 Circle Freeway Drive Cincinnati, Ohio, USA 45246 In article <38A1C9F2.9BDDD89B@dtic.ua.es>, "Sergio A. Cuenca Asensi" <sergio@dtic.ua.es> wrote: > Hello all, > I'm looking for a reconfigurable board (PCI, ISA) to develop REAL image > processing projects. > Any ideas? > > -- > =================================================================== > Sergio A. Cuenca Asensi > Dept. Tecnologia Informatica y Computacion (TIC) > Escuela Politecnica Superior, Campus de San Vicente > Universidad de Alicante > Ap. Correos 99, E-03080 ALICANTE > ESPAÑA (SPAIN) > email : sergio@dtic.ua.es > Phone : +34 96 590 39 34 > Fax : +34 96 590 39 02 > =================================================================== > > Sent via Deja.com http://www.deja.com/ Before you buy.Article: 20684
Take a look at Xtensa www.tensilica.com -dave In article <3898D606.6BA00B3E@cmt.co.il>, Irit <irit@cmt.co.il> wrote: >Hello, >I am looking for a CPU core which can be placed in an FPGA. It should >have the following features: > >1. 32-bit registers and ALU, integer only; don't need multiply or >divide. > >2. Fast enough to run at 66 MHz on a Virtex or Apex FPGA. > >3. Not too big (a single instance should fit in less than 100K system >gates, whatever it means). > >4. Possible to have more than one instance in a single chip; not locked >to specific cells or I/O pins. > >5. Must have code development tools (assembler, linker, debugger) >available; C compiler is nice-to-have but not mandatory. > >6. Preferably synthesizable VHDL or Verilog; if available as netlist or >routed block, must have a VHDL simulation model. > >7. Can be converted later to an ASIC cell. > >I would greatly appreciate any pointers; after all replies (if any) have >been sent, I will post a summary in the relevant NGs. > >Please send replies to my email (assaf_sarfati@yahoo.com) as well as >posting them; I suspect my NG server either loses posts or deletes them >after a few minutes. > > Thanks in Advance > Assaf Sarfati > >Article: 20685
Another possibility is the LEON VHDL SPARC compatible processor. It is in VHDL, at http://www.estec.esa.nl/wsmwww/leon/ . It doesn't QUITE meet your timing requirements, but they claim a synthesised performance of 45 MHz on an XCV300E-8 integer only, SPARC compatible, 32 bit memory bus, Icache, dcache, etc. Results in synthesis are 5300 LUTs on a Virtex 300E. It wouldn't surprise me if hand laying out the datapath would improve things (the John Henry FPGA panel discussion showed some impressive results for hand laying out datapaths). It is under the LGPL, so you can use this in a commercial product. -- Nicholas C. Weaver nweaver@cs.berkeley.eduArticle: 20686
If you have a spare I/O pin, configure it with a pullup and then connect this to the strobe input of the latch. That way, the backend tools can't optimize away your transparent latch without changing your design. Using a spare I/O pin to keep the optimizers from being too smart is a technique I use on occasion. Although it's not the prettiest solution, it does have the benefit of being quick, and works across various synthesis tools vendors as well as different revisions of the place-and-route software. Urb Bob Perlman wrote: > > On Wed, 16 Feb 2000 13:44:04 -0800, Peter Alfke <peter@xilinx.com> > wrote: > > >The classical solution to this old problem is to utilize the input flip-flop with > >its input delay, but configured as a latch, and hold it permanently transparent. > > > >Peter Alfke > > I tried the latch trick some years ago. It worked, but one > significant complication was that PPR (this was back in '92-'93) kept > trying to help by optimizing out the always-transparent latch. I > don't know if M2.1 has the same problem. > > If you try this approach, go into FPGA editor after the route and > confirm that the latch(es) didn't disappear. > > Finally, thanks to Joseph for posting the issue. I've been looking > for the simple, automatic solution for a long time. It's easy to say, > "Always go through the IOB FF," but in practice there are those > situations where the additional latency isn't tolerable. > > Good luck, > Bob Perlman > > ----------------------------------------------------- > Bob Perlman > Cambrian Design Works > Digital Design, Signal Integrity > http://www.best.com/~bobperl/cdw.htm > Send e-mail replies to best<dot>com, username bobperl > -----------------------------------------------------Article: 20687
That's well and good as long as you have the budget. Last time I checked, those roboclock thingies cost as much as the FPGA. Hernan, weren't you at Lattice before? Hernan Saab wrote: > Hello Josheph, > > I was tired and didnt realize what I wrote to you before. > One way to reduce the hold time is by "advancing the phase of the clock" using > a roboclock or any other clock manager. > > Hernan Javier Saab > Western Area Applications Engineer > Email: HERNAN@synplicity.com > Direct Phone: 408-215-6139 > Main Phone: 408-215-6000 > Pager: 888-712-5803 > FAX: 408-990-0295 > Synplicity, Inc. > 935 Stewart Drive > Sunnyvale, CA 94086 USA > Internet <http://www.synplicity.com > -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 20688
It depends on your requirements as well as the relative cost of the memories _for your design_. Typical block RAMs are only on the order of 4K bits, so one memory only makes a 4x5 multiplier (9 address bits to 8 data bits). Not too impressive. If you don't need the memory for something else in the design, you can use the memories for partial products, but be aware that it may actually be slower than a pipelined multiplier implemented in CLBs. "Keith Jasinski, Jr." wrote: > The new FPGAs that have RAM that can be configured/initialized like a ROM > are touting the ability to use the RAM as a single clock cycle multiplier by > using it as a look-up table. Maybe that might be your answer. > > -- > Keith F. Jasinski, Jr. > kfjasins@execpc.com > Pradeep Rao <pradeeprao@planetmail.com> wrote in message > news:88fc5e$b84$1@news.vsnl.net.in... > > Hi, > > > > Which would be the best implementation of a multiplier in VHDL > > (synthesisable) in terms of speed/area? > > I know of the array implementation and the register configuration using a single > > adder. Are there any other better ones ? > > Thanks in anticipation, > > > > Pradeep Rao > > > > > > > > -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 20689
You still should be able to instantiate the RAMB4 primitives as a black box, no? Mark Hillers wrote: > hi jeff, > > that's not possible bacause i use the synopsys behavioral-compiler. > but - we could fix the problem which seems to be a language-problem > between coregen-edif-writer and design-compiler-edif-reader. > > fjz001@email.mot.com wrote: > > > > Mark, > > > > Why not directly instantiate the RAMB4_S16_S16 in your HDL? In this > > case, Coregen just adds a layer of unnecessary complexity. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 20690
I'd pick a Xilinx board for image processing rather than Altera. I think you'll find the altera architecture considerably more limiting, as it is not as adept as the xilinx architectures for arithmetic or delay queues (needed in the imaging filters for example). See my previous posts regarding altera vs xilinx for signal processing applications. mrauf@nova-eng.com wrote: > Nova Engineering has two different Altera FPGA development boards: > Constellation and Constellation-E. See <http://www.nova- > eng.com/constellation.html> > > Both are very similar, but the "Constellation-E" adds a USB interface > and uses the newer 10KE FPGAs from Altera > <http://www.altera.com/html/products/f10ke.html>. The Constellation-E > can utilize up to an EPF10K200S. > > For other boards see http://www.optimagic.com/boards > > Michael Rauf > Nova Engineering, Inc. > 1.800.341.NOVA (6682) > 1.513.860.3456 > 1.513.860.3535 (fax) > mailto:mrauf@nova-eng.com > http://www.nova-eng.com > 5 Circle Freeway Drive > Cincinnati, Ohio, USA 45246 > > In article <38A1C9F2.9BDDD89B@dtic.ua.es>, > "Sergio A. Cuenca Asensi" <sergio@dtic.ua.es> wrote: > > Hello all, > > I'm looking for a reconfigurable board (PCI, ISA) to develop REAL > image > > processing projects. > > Any ideas? > > > > -- > > =================================================================== > > Sergio A. Cuenca Asensi > > Dept. Tecnologia Informatica y Computacion (TIC) > > Escuela Politecnica Superior, Campus de San Vicente > > Universidad de Alicante > > Ap. Correos 99, E-03080 ALICANTE > > ESPAÑA (SPAIN) > > email : sergio@dtic.ua.es > > Phone : +34 96 590 39 34 > > Fax : +34 96 590 39 02 > > =================================================================== > > > > > > Sent via Deja.com http://www.deja.com/ > Before you buy. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 20691
Ray Andraka (randraka@ids.net) wrote: : Wallace trees are not generally the fastest multipliers in FPGAs. See the If you pipeline them they generally are. However, it depends on how you define speed. If you are referring to the clocking rate, then a fully pipelined Wallace tree multiplier will provide the best results - over vector and array based techniques. However, Wallace trees require a large amount of device resource to do so (CLB count). If you are interested in pipelined structures and associated clocking rates, be prepared to experience an area/time tradeoff for multiplication implementations. That is, the faster you wish to clock the implementation, the more area you will have to use. If you are interested in the functional density of the implementation, I'd say that vector based approaches (which add partial products in parallel - using fast carry logic) provide best utilisation results. -- Mathew WojkoArticle: 20692
The app note is based on a system with an 8051 that has non-volatile storage, rather than just RAM. It could be adapted to your application, though usually you want to store the configuration for the FPGA in some non-volatile storage rather than just RAM. You would re-work the part that reads the data from the EPROM and read it from your in-memory data structures. The JTAG programmer is not used. The programming method is JTAG. The "meat" of the app note is the control of the FPGA programming in JTAG mode using a microcontroller. JTAG programming requires you to send data and control flow in very specific bit lengths and sequences. The majority of the C-code in the app note is the implementation of this and does not need to be modified. Many people are using JTAG on their systems to program FPGAs and CPLDs today. You can chain them up and program many devices using JTAG as well. Hope that helps, -r <elynum@my-deja.com> wrote in message news:88d7j9$h9g$1@nnrp1.deja.com... > I read the app note. It didn't mention about how to program the fpga > with an 8051 with just a buffer and ram. Can you do that? Can you do > it without using the jtag programmer? It was kind of confusing and the > C code was fairly long. It seems it would be faster to program with > just an eeprom. > > n article <2X5q4.52$mjh.185876992@news.frii.net>, > "rodger" <brownsco@frii.com> wrote: > > Try this: > > > > http://www.xilinx.com/xapp/xapp058.pdf > > > > The App Note is titled: > > > > Xilinx In-System Programming Using an Embedded Microcontroller - > XAPP058, > > v2.0 (06/99) > > > > It will get you started. The programming mode is JTAG and the > included code > > example is for a 8051, with minor modifications needed for other > > architectures. > > > > -r > > > > <elynum@my-deja.com> wrote in message > news:881ajg$c3l$1@nnrp1.deja.com... > > > How would I go about programming 2 xilinx fpga's on a single board? > > > Would I need 2 separate EEPROM chips(ATMEl) or just one? 
How would > > > I go about doing it with a microprocessor 8051 or 860? What would I > > > need to do this? > > > > > > > > > Sent via Deja.com http://www.deja.com/ > > > Before you buy. > > > > > > > Sent via Deja.com http://www.deja.com/ > Before you buy.Article: 20693
Try the $150 Atmel starter kit at <http://www.kanda.com>. A bit more on the board than the Xess ones. -John Matt Billenstein wrote: > > All, I'm interested in purchasing a prototyping board based on a Xilinx FPGA > and I have about $200 to spend. I've looked a little at the boards at > www.xess.com so far. Does anyone have any recommendations? > > thx in advance, > > m > > Matt Billenstein > REMOVEhttp://w3.one.net/~mbillens/ > REMOVEmbillens@one.netArticle: 20694
Mathew Wojko wrote: > Ray Andraka (randraka@ids.net) wrote: > : Wallace trees are not generally the fastest multipliers in FPGAs. See the > > If you pipeline them they generally are. > No, they are not. A wallace tree produces a sum vector and a carry vector. Those have to be added together to obtain the full sum. The tree portion of the wallace tree can be clocked quite fast if it is pipelined. However, that final adder determines the maximum clock rate of the multiplier. In an ASIC, that adder can be made quite a bit faster than a ripple carry adder using any of a number of fast adder schemes. A row ripple array multiplier can be made about as fast as a wallace tree if it is rearranged into a tree and all the adders in the tree are the same as the fast adder used to combine the sum and carry vectors from the wallace tree. That, of course, comes at a considerable cost in area. The real advantage to the wallace tree is that it allows you to use cheap full adders for the array, and only one copy of the expensive fast adder. It comes at a cost of a very complicated routing pattern. Now fade to the FPGA. The fast carry chain logic in modern FPGAs is a highly optimized dedicated path that is about an order of magnitude faster than logic implemented in the LUT logic and connected via the general routing resources. That fact makes it extremely difficult to improve upon the performance of the carry chain ripple carry adder. This non-homogenous mix of logic means that the cheap ripple carry adder is about as fast as you're gonna get in the FPGA (short of pipelining the carry) for word widths up to around 24-32 bits. The result is a wallace tree buys you nothing in terms of area, and in fact is twice as big as a row-ripple tree because the ripple carry adders use one LUT per bit (the carry is in dedicated logic in xilinx or splits the lut in altera) where the full adders in the wallace tree need two luts per bit (one for sum, one for carry). 
The larger area costs clock cycle time since the routing in FPGAs has substantial delay. Now pipelining will get back the performance (requires a register immediately in front of the final adder for best clock speed), but the fact of the matter is you are still limited by the speed of that final adder. So a wallace tree gets you at best, the same performance as a row-ripple tree with double the area (more if you use partial product techniques at the front layer). This is why a wallace tree multiplier is not appropriate for an FPGA. That said, the column route delay penalty in Altera 10K devices does make a wallace tree a little more attractive for pipelined trees that cannot fit in one row. The reason for that is the clock period is limited by the delay from the output register on one level of the tree through the carry chain to the msb output register of the next level. If the levels cross a row boundary, there is a significant delay hit which will reduce the clock frequency unless additional registers are added ahead of and in the same row of the carry chain. If the tree extends across several rows, several layers of pipeline registers are needed if the tree is all ripple carry adds. A wallace tree can reduce the hit, but again at the expense of a considerable amount of area...and that is only true for trees that extend across more than two rows. You get the same clock cycle performance in less area by simply adding the extra pipeline registers instead of doing a wallace tree, but at the expense of a little clock latency. Note that this is a special case. The other special case occurs in FPGAs without carry chains, where in order to get an advantage by using a wallace tree, your final adder should use a fast carry scheme. > However, it depends on how you define speed. If you are referring to the > clocking rate, then a fully pipelined Wallace tree multiplier will provide > the best results - over vector and array based techniques. 
However, > Wallace trees require a large amount of device resource to do so (CLB count). > > If you are interested in pipelined structures and associated clocking > rates, be prepared to experience an area/time tradeoff for multiplication > implementations. That is, the faster you wish to clock the implementation, > the more area you will have to use. > > If you are interested in the functional density of the implementation, > I'd say that vector based approaches (which add partial products in parallel > - using fast carry logic) provide best utilisation results. For FPGAs with fast carry chains, these partial product techniques also provide the fastest multipliers short of pipelining the carries. > > > -- > Mathew Wojko -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 20695
Was the movement limited to within a CLB? If not check the placement report to see what and why it happened. Bob Perlman wrote: > Hi - > > I don't know how many of you use the Xilinx M2.1 floorplanner. If you > do, I have a question for you. > > Yesterday I used the floorplanner to place portions of a > schematic-based XCS30XL design, and managed to go from a design that > failed route after 1-1/2 hours (didn't complete route and didn't meet > timing on the routed nets) to a design that routed and met all timing > constraints in 40 minutes. So, I'm happy with the results, but was > puzzled by the fact that the Xilinx tools moved some of the cells that > I'd placed. Any RPMs that I placed stayed put, but cells that I'd > moved individually into the placement window were sometimes in new > places after routing. You could see that the place and route tools > had kept the cells more or less where I'd placed them, but moved some > cells around. > > Is this expected behavior when using the floorplanner? If so, what's > to keep I/O pin assignments from moving? > > Thanks, > Bob Perlman > > > ----------------------------------------------------- > Bob Perlman > Cambrian Design Works > Digital Design, Signal Integrity > http://www.best.com/~bobperl/cdw.htm > Send e-mail replies to best<dot>com, username bobperl > ----------------------------------------------------- -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 20696
Yes, you can significantly accelerate this with an FPGA if I am reading this right. I think the hash function is basically a PN sequence, which you can undo using finite field arithmetic. It more or less reduces to a shift register and xor gates, which can be quite compactly realized in xilinx FPGAs (using the clb ram feature), or a bit less compactly in Altera parts. In current FPGAs, you can do a PN generate at bit rates well over 200MHz. At 64 bits per, that is over 3 million/sec with a single copy, and you have room in an FPGA for lots of these. It will take more logic to distill the results than to do the hashes. The big numbers in your polynomial look a little suspect. I would expect the upper exponents to be more like 64 and 59? Neill Clift wrote: > Could anyone give me an idea if the following would be possible to do with > one of the many > programable logic device I see mentioned here? > > VMS hashes user passwords using a polynomial over Zp. p = 2^64-59 and the > polynomial > looks like this: > > f(x) = x ^16777213 + A * x ^16777153 + B * x ^3 + C * x^2 + D * x + E (mod > p) > > On say a 600Mhz PIII I can evaluate 0.4 million values of this polynomial / > sec. > > Noting that 2 is a primitive root mod p we can make a search of the whole > space much faster by calculating like this: > > f(0), f(1), f(2), f(4), ... f(2^r),..., f(2^(p-2) > > We can use each term calculated for f(2^r) to calculate the terms for > f(2^(r+1)) just by multiplying > by the constants 2^16777213, 2^16777153, 8,4,2 and 1 (mod p). > > So to search then entire 64 bit space of the problem involves doing / point: > > 2 x 64 bit multiplies mod p where the multiplier is a constant with no > special structure and 3 > small constant multiplies (that can even be converted to additions) followed > by 5 additions (all > mod p). > > Once again on a 600Mhz PIII I can do something like 4 million points / sec. > > Is this the kind of problem thats easily done with an FPGA etc? 
I would need > to be able to fit > many of these on a single device to divide the problem space up. > > What current devices are available that would be best suited to such a task? > Are they affordable > by someone who just wants to play about like this? > Thanks. > Neill. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 20697
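Ray's claim that this maps to simple hardware can be sanity-checked in software first. Below is a minimal Python sketch of the incremental stepping scheme Neill describes: walk x through the powers of the primitive root 2 mod p, updating each monomial term by a constant multiply instead of re-evaluating the huge exponentials. The coefficients A..E are made-up placeholders for illustration, not the real VMS hash constants.

```python
# Sketch (illustrative only) of the incremental polynomial search.
p = 2**64 - 59                      # prime modulus from the post
A, B, C, D, E = 3, 5, 7, 11, 13     # hypothetical coefficients

# Stepping x from 2^r to 2^(r+1) multiplies each monomial x^k by 2^k,
# so precompute the two large per-step multipliers mod p.
m1 = pow(2, 16777213, p)
m2 = pow(2, 16777153, p)

def search(target, steps):
    """Scan f(2^r) for r = 0..steps-1; return x = 2^r mod p on a match."""
    t1 = t2 = t3 = t4 = t5 = 1      # monomial terms at x = 2^0 = 1
    for r in range(steps):
        if (t1 + A*t2 + B*t3 + C*t4 + D*t5 + E) % p == target:
            return pow(2, r, p)
        t1 = (t1 * m1) % p          # x^16777213 term
        t2 = (t2 * m2) % p          # x^16777153 term
        t3 = (t3 * 8) % p           # x^3 term
        t4 = (t4 * 4) % p           # x^2 term
        t5 = (t5 * 2) % p           # x term
    return None
```

Each step costs two large fixed-coefficient multiplies and three cheap shift-like multiplies mod p, matching the operation count Neill gives; in an FPGA each of those becomes a constant-coefficient modular multiplier.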
For devices, depends on how many you want to do at once. Even the small FPGAs can handle several of the PN generators. Perhaps a good starting point would be the XESS board, which is available with a 4000 series xilinx, software and a student manual for about $200. Ray Andraka wrote: > Yes, you can significantly accelerate this with an FPGA if I am reading this > right. I think the hash function is basically a PN sequence, which you can undo > using finite field arithmetic. It more or less reduces to a shift register and > xor gates, which can be quite compactly realized in xilinx FPGAs (using the clb > ram feature), or a bit less compactly in Altera parts. In current FPGAs, you > can do a PN generate at bit rates well over 200MHz. At 64 bits per, that is > over 3 million/sec with a single copy, and you have room in an FPGA for lots of > these. It will take more logic to distill the results than to do the hashes. > The big numbers in your polynomial look a little suspect. I would expect the > upper exponents to be more like 64 and 59? > > Neill Clift wrote: > > > Could anyone give me an idea if the following would be possible to do with > > one of the many > > programable logic device I see mentioned here? > > > > VMS hashes user passwords using a polynomial over Zp. p = 2^64-59 and the > > polynomial > > looks like this: > > > > f(x) = x ^16777213 + A * x ^16777153 + B * x ^3 + C * x^2 + D * x + E (mod > > p) > > > > On say a 600Mhz PIII I can evaluate 0.4 million values of this polynomial / > > sec. > > > > Noting that 2 is a primitive root mod p we can make a search of the whole > > space much faster by calculating like this: > > > > f(0), f(1), f(2), f(4), ... f(2^r),..., f(2^(p-2) > > > > We can use each term calculated for f(2^r) to calculate the terms for > > f(2^(r+1)) just by multiplying > > by the constants 2^16777213, 2^16777153, 8,4,2 and 1 (mod p). 
> > > > So to search then entire 64 bit space of the problem involves doing / point: > > > > 2 x 64 bit multiplies mod p where the multiplier is a constant with no > > special structure and 3 > > small constant multiplies (that can even be converted to additions) followed > > by 5 additions (all > > mod p). > > > > Once again on a 600Mhz PIII I can do something like 4 million points / sec. > > > > Is this the kind of problem thats easily done with an FPGA etc? I would need > > to be able to fit > > many of these on a single device to divide the problem space up. > > > > What current devices are available that would be best suited to such a task? > > Are they affordable > > by someone who just wants to play about like this? > > Thanks. > > Neill. > > -- > -Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email randraka@ids.net > http://users.ids.net/~randraka -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 20698
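For anyone who hasn't built one, the "shift register and xor gates" structure Ray keeps referring to is a linear feedback shift register. Here is a minimal Galois-form sketch in Python; the 16-bit tap mask is a common textbook example, not anything taken from VMS or the posts above.

```python
# Minimal Galois-form LFSR, sketching the PN-generator structure.
TAPS = 0xB400  # example 16-bit tap mask (taps at bits 16, 14, 13, 11)

def lfsr_step(state):
    """Advance the 16-bit Galois LFSR by one bit: shift, then XOR taps."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state

def pn_bits(seed, n):
    """Emit n bits of the PN sequence starting from a nonzero seed."""
    out, s = [], seed
    for _ in range(n):
        out.append(s & 1)
        s = lfsr_step(s)
    return out
```

In hardware each step is one register clock plus a few XORs, which is why FPGAs run these at hundreds of MHz.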
What are you planning to do with the board? The best choice of devices really depends on the application. John Rible wrote: > Try the $150 Atmel starter kit at <http://www.kanda.com>. A bit more on the > board than the Xess ones. > > -John > > Matt Billenstein wrote: > > > > All, I'm interested in purchasing a prototyping board based on a Xilinx FPGA > > and I have about $200 to spend. I've looked a little at the boards at > > www.xess.com so far. Does anyone have any recommendations? > > > > thx in advance, > > > > m > > > > Matt Billenstein > > REMOVEhttp://w3.one.net/~mbillens/ > > REMOVEmbillens@one.net -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randraka
Article: 20699
In article <38ACBDDF.CA310F0A@ids.net> you wrote: : Mathew Wojko wrote: : > Ray Andraka (randraka@ids.net) wrote: : > : Wallace trees are not generally the fastest multipliers in FPGAs. See the : > : > If you pipeline them they generally are. : > : No, they are not. A wallace tree produces a sum vector and a carry vector. : Those have to be added together to obtain the full sum. : However, that final adder determines the maximum clock rate of the multiplier. Precisely. The Wallace tree is a carry-save architecture. When pipelined, carry values only ever propagate one bit position within each stage of processing (no carry propagation latencies are experienced). Thus fast clocking rates for this 'tree part' of the multiplier can be achieved. However, when combining the carry and sum vectors, you do not want to compromise the performance obtained thus far from the 'tree part' of the multiplier. A simple ripple adder implemented using fast carry logic will not yield the same performance as achieved by the wallace tree, so overall performance will suffer. : Now fade to the FPGA. The fast carry chain logic in modern FPGAs is a highly : optimized dedicated path that is about an order of magnitude faster than logic : implemented in the LUT logic and connected via the general routing resources. : That fact makes it extremely difficult to improve upon the performance of the : carry chain ripple carry adder. This is the point that I don't necessarily agree on. I agree that you cannot improve on the performance of a ripple carry adder: using the fast-carry logic provides unparalleled results for that implementation. However, there exist other addition techniques that will provide better pipeline performance when implemented on an FPGA. The trick is not to ripple or propagate the carry over great lengths between successive pipeline stages. 
: This non-homogenous mix of logic means that the : cheap ripple carry adder is about as fast as you're gonna get in the FPGA (short : of pipelining the carry) for word widths up to around 24-32 bits. Exactly. If you pipeline the carry then you can achieve performance matching that of the wallace tree. Remember that the Wallace tree pipelines the carry result at every stage of processing; that's why it's called a carry-save technique. Why you would want to use a carry ripple adder after expending the extra logic to implement a Wallace tree to reduce partial products is beyond me. : The result is : a wallace tree buys you nothing in terms of area, and in fact is twice as big as : a a row-ripple tree because the ripple carry adders use one LUT per bit (the : carry is in dedicated logic in xilinx or splits the lut in altera) where the full : adders in the wallace tree need two luts per bit (one for sum, one for carry). I agree that the wallace tree requires more area than the row-ripple tree. As you have pointed out, that's because you do not pipeline the carry values in a row-ripple tree (what I call vector-based computation), whereas in the wallace tree you do. As such, the wallace tree *does* give you added performance for the area. The clocking speed is substantially faster since carry values only propagate one bit position between pipeline stages rather than up to 2n bits as in the row-ripple technique. : The larger area costs clock cycle time since the routing in FPGAs has substantial : delay. Now pipelining will get back the performance (requires a register : immediately in front of the final adder for best clock speed), but the fact of : the matter is you are still limited by the speed of that final adder. But that's my point. Why include a carry ripple adder at the final stage? It is the obvious performance-limiting factor. By using carry-lookahead techniques you can obtain better performance results than the carry ripple adder. 
This holds regardless of the carry ripple adder implemented by the fast-carry logic. : So a wallace tree gets you at best, the same performance as a row-ripple tree with : double the area (more if you use partial product techniques at the front layer). : This is why a wallace tree multiplier is not appropriate for an FPGA. Sorry, but I disagree. A wallace tree multiplier is appropriate for an FPGA *if* you use the appropriate adder to combine the sum and carry results. The BCLA adder is a perfect addition technique to combine with the wallace tree. Using this (implemented correctly), the pipeline latency at every stage of processing will only be from one 4-input LUT output to a register. Thus this technique matches well to both ALTERA and Xilinx FPGA architectures. : That said, the column route delay penalty in Altera 10K devices does make a : wallace tree a little more attractive for pipelined trees that cannot fit in one : row. The reason for that is the clock period is limited by the delay from the : output register on one level of the tree through the carry chain to the msb : output register of the next level. If the levels cross a row boundary, there is : a significant delay hit which will reduce the clock frequency unless additional : registers are added ahead of and in the same row of the carry chain. If the tree : extends across several rows, several layers of pipeline registers are needed if : the tree is all ripple carry adds. A wallace tree can reduce the hit, but again : at the expense of a considerable amount of area...and that is only true for trees : that extend across more than two rows. You get the same clock cycle performance : in less area by simply adding the extra pipeline registers instead of doing a : wallace tree, but at the expense of a little clock latency. Note that this is a : special case. 
: The other special case occurs in FPGAs without carry chains, where : in order to get an advantage by using a wallace tree, your final adder should use : a fast carry scheme. : > However, it depends on how you define speed. If you are referring to the : > clocking rate, then a fully pipelined Wallace tree multiplier will provide : > the best results - over vector and array based techniques. However, : > Wallace trees require a large amount of device resource to do so (CLB count). : > : > If you are interested in pipelined structures and associated clocking : > rates, be prepared to experience an area/time tradeoff for multiplication : > implementations. Thats is, the faster you wish to clock the implementation, : > the more area you will have to use. : > : > If you are interested in the functional density of the implementation, : > I'd say that vector based approaches (which add partial products in parallel : > - using fast carry logic) provide best utilisation results. : For FPGAs with fast carry chains, these partial product techniques also provide : the fastest multipliers short of pipelining the carries. : > : > : > -- : > Mathew Wojko : -- : -Ray Andraka, P.E. : President, the Andraka Consulting Group, Inc. : 401/884-7930 Fax 401/884-7950 : email randraka@ids.net : http://users.ids.net/~randraka -- Mathew Wojko
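The carry-save idea both posters are arguing over can be shown in a few lines. This Python sketch (illustrative only, not anyone's FPGA implementation) reduces operands 3-to-2 with bitwise full adders, so carries are saved rather than propagated, until one final carry-propagate addition remains:

```python
# 3:2 compressor: a full adder at every bit position, carries saved.
def csa(a, b, c):
    s = a ^ b ^ c                               # per-bit sum, no ripple
    carry = ((a & b) | (a & c) | (b & c)) << 1  # saved carries, shifted left
    return s, carry                             # invariant: s + carry == a + b + c

# Wallace-style reduction: compress 3-to-2 until two vectors remain,
# then do the single carry-propagate add that limits the clock rate.
def wallace_sum(operands):
    ops = list(operands)
    while len(ops) > 2:
        s, carry = csa(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, carry]
    return sum(ops)                             # the final adder in dispute
```

In Ray's argument that final add is the cheap fast-carry ripple adder; in Mathew's it would instead be a pipelined carry-lookahead stage so no stage ripples a carry across the full word.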