On Fri, 02 Jul 1999 10:21:16 -0700, Peter Alfke <peter@xilinx.com> wrote: >I forgot them completely, sorry. >So, Lucent and Atmel are now the only remaining companies that handle their PLD >business as a sideline, deriving most of their revenues from other product lines. All >other PLD companies are "pure players". Considering that Lucent Technologies had something like a $23 BILLION revenue in 1997 (as I recall) it would be a little difficult for even the combined revenue of Xilinx, Altera and everybody else in the quaint little PLD industry to amount to anything more than a "sideline" of that size of revenue! However, as I'm sure Peter knows, the key in this game is MARGIN, right? Most PLD "players" manage to scrape by on margins that silicon foundries and vendors of standard products can only dream of. Cheers Stuart For Email remove "NOSPAM" from the addressArticle: 17151
> > Steven Casselman <sc@vcc.com> wrote in article > > <376FF120.6A53D036@vcc.com>... > > > A PCI target on an computer using an Intel PCI > > > bridge can expect 80MBytes/sec on transfers > > > going to a board and about 10-12MBytes/sec > > > comming from a board. > > > > > > These numbers vary. > > > > Do you have more specific data? As you said, these numbers can be all over > > the place, so including a bit more about your 'results' would certainly be > > helpful... > > > > What was the size of the transfer? > > Sustained or burst? > > From where to where? > > What chip set? > > What CPU? > > > > Using the CPU to do the transfer, you might see that for a single (whose > > transfer size would be limited by the x86 instruction set) transfer, but > > certainly not sustained. > > > > Austin > > The program takes an word (32-bits) buffer( 256 to 4K bytes on > word boundaries) then the CPU transfers this to board. Just > WinTell boxes. You have to write a little assembly > program (posted long ago) and call that for each of the transfers. Not quite what I was looking for. You said you have seen 80M writing from the CPU to a PCI target. I would like to know the specifics under which you saw 80M/sec. I have never seen anything close to that, except for a single CPU 'sized' burst transfer. Your read numbers seem more in line. AustinArticle: 17152
ronak@hclt.com wrote: > Hi, > > Does anybody know how to use Virtex Block ram in Verilog > HDL based Synopsys synthesis flow. The datasheets and app > notes gives only info about architecture and symbols but > not about how to put it in HDL based flow. > > Thanks in advance > -ronak > > Sent via Deja.com http://www.deja.com/ > Share what you know. Learn what you don't. Hello. If you have the symbols, you can just instantiate them and create the macro with Coregen. If Synopsys can recognize it, another solution is to write the Verilog source code. It is a classic memory with the appropriate depth and I/O. kunze@fbh_berlin.deArticle: 17153
As an experiment, I am trying to prototype SUN's picoJava-II processor (sans caches and FPU) on a Virtex 1000. However, PAR has already been working on the problem for 73h on a 300Mhz UltraSPARC-II machine and appears to be stuck after placement and the detection/disabling of circuit loops. I am very much willing to continue running PAR, but at the moment, it is not clear that anything useful is happening at all. The process has grown to over 700MB (no problem, this is a 1GB RAM machine) and does not perform any system calls (checked with truss). Should I be more patient, or is the tool just spinning its wheels? Thanks, Andreas Koch -- -- -- -- -- -- -- -- BEGIN PAR OUTPUT -- -- -- -- -- -- -- PAR: Xilinx Place And Route M1.5.25. Copyright (c) 1995-1998 Xilinx, Inc. All rights reserved. Fri Jul 2 15:44:16 1999 par -w -n 5 -l 5 -c 2 -d 2 picojava picojava.dir Constraints file: picojava.pcf Loading device database for application par from file "picojava.ncd". "cpu" is an NCD, version 2.27, device xcv1000, package bg560, speed -6 Loading device for application par from file 'v1000.nph' in environment /cad/xilinx. Device speed data version: x1_0.80 1.81 Advanced. Writing design to file "/var/tmp/xil_AAAa00117". Device utilization summary: Number of External GCLKIOBs 1 out of 4 25% Number of External IOBs 171 out of 404 42% Number of SLICEs 9056 out of 12288 73% Number of GCLKs 1 out of 4 25% Number of TBUFs 96 out of 12544 1% Overall effort level (-ol): 5 (set by user) Placer effort level (-pl): 5 (default) Placer cost table entry (-t): 1 Router effort level (-rl): 5 (default) Timing method (-kpaths|-dfs): -kpaths (default) Starting initial Timing Analysis. REAL time: 1 mins 11 secs 10607 circuit loops found and disabled. Finished initial Timing Analysis. REAL time: 15 mins 28 secs Starting initial Placement phase. REAL time: 15 mins 31 secs Finished initial Placement phase. REAL time: 17 mins 43 secs Writing design to file "picojava.dir/5_5_1.ncd". Starting the placer. 
REAL time: 17 mins 54 secs Placer score = 15545624 Placer score = 16969702 Placer score = 15249491 Placer score = 14640617 Placer score = 14266472 Placer score = 14042333 Placer score = 13540962 Placer score = 12933388 Placer score = 12478199 Placer score = 11958896 Placer score = 11519577 Placer score = 11223801 Placer score = 10854976 Placer score = 10546610 Placer score = 10270522 Placer score = 9925386 Placer score = 9577246 Placer score = 9242148 Placer score = 8931156 Placer score = 8628203 Placer score = 8308718 Placer score = 8056360 Placer score = 7820600 Placer score = 7567580 Placer score = 7221818 Placer score = 7067358 Placer score = 6841047 Placer score = 6652239 Placer score = 6496303 Placer score = 6298430 Placer score = 6121752 Placer score = 5995660 Placer score = 5840426 Placer score = 5681994 Placer score = 5575427 Placer score = 5440834 Placer score = 5331105 Placer score = 5236587 Placer score = 5139598 Placer score = 4988285 Placer score = 4919016 Placer score = 4832865 Placer score = 4731888 Placer score = 4639983 Placer score = 4552282 Placer score = 4491128 Placer score = 4413904 Placer score = 4336166 Placer score = 4263983 Placer score = 4204713 Placer score = 4142192 Placer score = 4072188 Placer score = 4019799 Placer score = 3959659 Placer score = 3925500 Placer score = 3867834 Placer score = 3820891 Placer score = 3778140 Placer score = 3735613 Placer score = 3698179 Placer score = 3650582 Placer score = 3617989 Placer score = 3590920 Placer score = 3552244 Placer score = 3527766 Placer score = 3490500 Placer score = 3462919 Placer score = 3445960 Placer score = 3370117 Placer score = 3304339 Placer score = 3255391 Placer score = 3218705 Placer score = 3194370 Placer score = 3175918 Placer score = 3164362 Placer score = 3156328 Placer score = 3151332 Placer score = 3148281 Placer score = 3146661 Placer completed in real time: 4 hrs 44 mins 7 secs Writing design to file "picojava.dir/5_5_1.ncd". Starting Optimizing Placer. REAL time: 4 hrs 44 mins 24 secs Optimizing . Swapped 471 comps. Xilinx Placer [1] 3138123 REAL time: 4 hrs 45 mins 23 secs Optimizing . Swapped 42 comps. Xilinx Placer [2] 3137363 REAL time: 4 hrs 46 mins 18 secs Optimizing . Swapped 10 comps. Xilinx Placer [3] 3137216 REAL time: 4 hrs 47 mins 12 secs Finished Optimizing Placer. REAL time: 4 hrs 47 mins 12 secs Writing design to file "picojava.dir/5_5_1.ncd". Starting IO Improvement. REAL time: 4 hrs 47 mins 30 secs Placer score = 3088022 Finished IO Improvement. REAL time: 4 hrs 47 mins 32 secs Total REAL time to Placer completion: 4 hrs 48 mins Total CPU time to Placer completion: 4 hrs 47 mins 22 secs 10607 circuit loops found and disabled. -- Andreas Koch Email : koch@eis.cs.tu-bs.de Technische Universit"at Braunschweig Phone : x49-531-391-2384 Abteilung Entwurf integrierter Schaltungen Phax : x49-531-391-5840 Gaussstr. 11, D-38106 Braunschweig, Germany * PGP key available on request *Article: 17154
What quantity do you need for your order, and which country are you in? Lewis
Karim LIMAM wrote in message <7lfnhf$oa1$1@arcturus.ciril.fr>... >Hi, > >I'm looking for the prices of the Altera Flex 10K (10K40 .. 10K130E). Has >somebody an idea ? > >Thanks. > > >kerim el imem > >Article: 17155
Ouch, I stand corrected! ----------------------------------------------------------- Steven K. Knapp OptiMagic, Inc. -- "Great Designs Happen 'OptiMagic'-ally" E-mail: sknapp@optimagic.com Web: http://www.optimagic.com ----------------------------------------------------------- Barry Gershenfeld wrote in message <377BD95B.468D@centercomm.com>... >Steven K. Knapp wrote: >> >> There are several companies that provide free or low-cost software for FPGAs >> and CPLDs. Check out The Programmable Logic Jump Station at >> http://www.optimagic.com/lowcost.shmtl. The site also has links to > ^^^ > shtml > >Moral: NEVER type a URL--paste it! > >:-) Barry >Article: 17156
Operand size is one problem. You need a 24 x 24 bit mantissa multiplier to do single precision FP multiplication (53 x 53 bits for double precision). In addition/subtraction you need to align the operands, add them and then normalize. This operation takes up a large number of logic levels, much larger than integer addition. FPUs are usually done in full custom designs. Even standard cell is not that performance/area efficient for a full IEEE compliant FPU. Roland Paterson-Jones <rolandpj@bigfoot.com> wrote: >Hi > >It has been variously stated that fpga's are no good for floating point >operations. Why? As I see it, floating point operations are typically >just shifted integer operations. Is the bit-width problematic? > >Thanks for any help/opinion >Roland muzo Verilog, ASIC/FPGA and NT Driver Development Consulting (remove nospam from email)Article: 17157
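To make the align/add/normalize sequence described above concrete, here is a minimal Verilog sketch of a toy floating-point adder with a 5-bit exponent and a 10-bit stored mantissa. It assumes positive operands only, truncates instead of rounding, and ignores zero, exponent overflow and denormals; the module and signal names are illustrative, not taken from any posted design.

// Toy floating point adder: 5-bit exponent, 10-bit stored mantissa with a
// hidden leading 1.  Positive operands only, truncation instead of rounding,
// no zero/overflow/denormal handling -- just the align/add/normalize flow.
module fp16_add_sketch (
    input  wire [14:0] a,    // {exp[4:0], mant[9:0]}
    input  wire [14:0] b,
    output reg  [14:0] sum
);
    wire [4:0]  ea = a[14:10], eb = b[14:10];
    wire [10:0] ma = {1'b1, a[9:0]};   // restore the hidden bit
    wire [10:0] mb = {1'b1, b[9:0]};

    // 1) compare exponents and swap so 'big' has the larger exponent
    wire        a_ge_b = (ea >= eb);
    wire [4:0]  e_big  = a_ge_b ? ea : eb;
    wire [4:0]  e_diff = a_ge_b ? (ea - eb) : (eb - ea);
    wire [10:0] m_big  = a_ge_b ? ma : mb;
    wire [10:0] m_sml  = a_ge_b ? mb : ma;

    // 2) align the smaller mantissa (this shift is one of the barrel shifters)
    wire [10:0] m_aligned = m_sml >> e_diff;

    // 3) add the 11-bit mantissas -> at most 12 bits
    wire [11:0] m_sum = m_big + m_aligned;

    // 4) normalize: same-sign addition overflows by at most one position
    always @* begin
        if (m_sum[11])
            sum = {e_big + 5'd1, m_sum[10:1]};  // shift right, bump exponent
        else
            sum = {e_big, m_sum[9:0]};
    end
endmodule

The alignment shift (m_sml >> e_diff) is one of the barrel shifters discussed later in the thread; handling signed operands would add a subtract path and a much wider normalization shifter after it.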
Hi All, The web newsgroup interface at http://www.dacafe.com/newsgroups is the best way I've found to access newsgroups. It lets me view every available newsgroup, see and listen to binaries, and it's really easy to use. Try it out. You'll find that it will save you time and effort. Best regards, David HellerArticle: 17158
Andreas Koch wrote: > As an experiment, I am trying to prototype SUN's picoJava-II processor > (sans caches and FPU) on a Virtex 1000. However, PAR has already been > working on the problem for 73h on a 300Mhz UltraSPARC-II machine and > appears to be stuck after placement and the detection/disabling of > circuit loops. > > I am very much willing to continue running PAR, but at the moment, it > is not clear that anything useful is happening at all. The process has > grown to over 700MB (no problem, this is a 1GB RAM machine) and > does not perform any system calls (checked with truss). > > Should I be more patient, or is the tool just spinning its wheels? > > Thanks, > Andreas Koch > > > 10607 circuit loops found and disabled. > -- > Andreas Koch Email : koch@eis.cs.tu-bs.de > Technische Universit"at Braunschweig Phone : x49-531-391-2384 > Abteilung Entwurf integrierter Schaltungen Phax : x49-531-391-5840 > Gaussstr. 11, D-38106 Braunschweig, Germany * PGP key available on request * Hello, I would stop it and try to understand why part of the design is being disabled. In any case, do you think the result will be usable with that logic disabled? Hope this helps, Michel Le Mer Gerpi sa (Xilinx Xpert) 3, rue du Bosphore Alma city 35000 Rennes (France) (02 99 51 17 18) http://www.xilinx.com/company/consultants/partdatabase/europedatabase/gerpi.htmArticle: 17159
Hi all, I need some help with a Xilinx FPGA (Spartan). I notice that when I modify one circuit module in the FPGA, other parts of the same chip are affected too. After examining it carefully, I found that the cause is a clock-derived signal that picks up an unexpectedly large delay from its routing path. For example, a global clock drives a divider or counter, and an output of that divider or counter is used to directly trigger or switch other function modules; this signal normally has an extremely large delay, and what's more, the delay varies with each new implementation (because the routing path changes), so in the end it produces undesirable results. This was driving me crazy. If I also connect this signal to a global clock buffer, the problem is solved, but the global resources are limited. So can somebody give some useful advice? If I change to VHDL instead of the schematic editor, can the problem be solved? I appreciate your kind help! Sent via Deja.com http://www.deja.com/ Share what you know. Learn what you don't.Article: 17160
Hello, Does anyone have recommendations for finding benchmark circuits for FPGAs - preferably in VHDL? Thanks in advance. email: csoolan@dso.org.sgArticle: 17161
I think your problem is that you are not using dedicated resources for the second clock. You have to locate all your clocks on PRI or SEC buffers (with appropriate synthesis attributes). You can also minimize the number of clocks by using clock enable inputs. Hope this helps. lingleq@my-deja.com wrote: > Hi, all > > I need some help about the Xilink FPGA. > > I notice that when I modify a circuit module in a FPGA (Sparten), the other > part in the same chip is subject to change too. After the carefully > examining, I found that it is because some clock driven signal cause a > unexpected large delay which is produced by different route path. For > example, a global clock is chosen to drive a divider or a counter, and an > output of this divider or counter is used to directly trigger or switch the > other function modules, this signal normally cause an extremely large delay, > what's more, the delay is varying with each new implantation (cause the > change of the route path), so at last it produces some undesirable result. > I was annoyed to put fire in the circuit design. If I also connect this > signal to the global clock, the problem can be solved. But the global > resource is limited. So can somebody give some useful advice? If I change to > vhdl language not the schematic editor , can the problem be solved? > > I appreciate your kindness help! > > Sent via Deja.com http://www.deja.com/ > Share what you know. Learn what you don't. -- Ilia Oussorov, Robert Bosch GmbH, FV/FLI (fliser6@fli.sh.bosch.de)Article: 17162
It sounds like you are using logic outputs as clocks elsewhere in the design. FPGAs are best designed as synchronous circuits with just one or a very small number of clocks. This means using synchronous counters instead of ripple counters for example. Using this design technique, all of the clocked logic is clocked by a common clock. The individual flip-flops can be controlled with the clock enable at the CLB or in the CLB logic. In cases where delays are still critical, it may be necessary to floorplan the design. Floorplanning means manually directing the placement of CLBs on the die using either the RLOC attributes and FMAP and HMAP primitives, the constraints file, or the graphical floorplanner tool. Floorplanning will pretty much eliminate timing variations each time the tool is run. Changing to VHDL will only complicate the problem, as VHDL intentionally insulates the user from the FPGAs structure. lingleq@my-deja.com wrote: > Hi, all > > I need some help about the Xilink FPGA. > > I notice that when I modify a circuit module in a FPGA (Sparten), the other > part in the same chip is subject to change too. After the carefully > examining, I found that it is because some clock driven signal cause a > unexpected large delay which is produced by different route path. For > example, a global clock is chosen to drive a divider or a counter, and an > output of this divider or counter is used to directly trigger or switch the > other function modules, this signal normally cause an extremely large delay, > what's more, the delay is varying with each new implantation (cause the > change of the route path), so at last it produces some undesirable result. > I was annoyed to put fire in the circuit design. If I also connect this > signal to the global clock, the problem can be solved. But the global > resource is limited. So can somebody give some useful advice? If I change to > vhdl language not the schematic editor , can the problem be solved? > > I appreciate your kindness help! > > Sent via Deja.com http://www.deja.com/ > Share what you know. Learn what you don't. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 17163
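As a minimal sketch of the single-clock style recommended in the post above: keep the divider and the downstream logic on the same global clock and use the divider's terminal count as a clock enable rather than as a derived clock. The names below are illustrative, and the divide-by-16 ratio is an arbitrary example.

// Everything runs on the single global clock 'clk'; the divider's terminal
// count becomes a one-cycle-wide clock enable instead of a derived clock.
module ce_divider_sketch (
    input  wire       clk,
    input  wire       rst,
    output reg  [7:0] slow_count   // advances once every 16 clk cycles
);
    reg [3:0] div;
    wire      tick = (div == 4'd15);   // terminal count, used as clock enable

    always @(posedge clk) begin
        if (rst)
            div <= 4'd0;
        else
            div <= div + 4'd1;
    end

    always @(posedge clk) begin        // same clock as the divider
        if (rst)
            slow_count <= 8'd0;
        else if (tick)                 // clock enable, not a derived clock
            slow_count <= slow_count + 8'd1;
    end
endmodule

Because slow_count is clocked by clk and only gated by tick, routing delay on tick eats into setup margin instead of creating clock skew, so the behaviour no longer shifts from one place-and-route run to the next.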
David Heller wrote: > > Hi All, > > The web newsgroup interface at http://www.dacafe.com/newsgroups > is the best way I've found to access newsgroups. > > It lets me view every available newsgroup, see and listen to > binaries, and its really easy to use. > > Try it out. You'll find that it will save you time and effort. > > Best regards, > > David Heller Gee, lots of advertisements. Why not just point my web browser to the ISP's mail server? With Netscape I can sort by thread. Sometimes I use Deja News for searches. -- Steve Nordhauser Embedded Systems Manager Phone: (518) 283-7500 InterScience, Inc. Fax: (518) 283-7502 105 Jordan Road email: nords@intersci.com Troy, NY 12180 web: http://www.intersci.com "Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. ClarkeArticle: 17164
Article: 17165
While we are on the FP in FPGA discussion... What algorithms exist for floating point counters? I would like to impliment a 10 bit mantissa, 5 bit exponent FP counter that could incriment in roughly 8 clock cycles. I have very little interest in using a full FP adder for the obvious reasons. -Trevor Landon landont@ttc.com Jan Gray <jsgray@acm.org.nospam> wrote in message news:DZrf3.21$qH2.1013@paloalto-snr1.gtei.net... > Roland Paterson-Jones wrote in message <377DC508.D5F1D048@bigfoot.com>... > >It has been variously stated that fpga's are no good for floating point > >operations. Why? As I see it, floating point operations are typically > >just shifted integer operations. Is the bit-width problematic? > > For 16-bit floats with (say) 10 bit mantissas, FPGAs should be *great* for > floating point. Indeed problems start with the wider bit-widths. The > area-expensive (and worse than linear scaling) FP components are the barrel > shifters needed for pre-add mantissa operand alignment and post-add > normalization in the FP adder, and of course the FP multiplier array. > > The FCCM papers on this subject include: > > Ligon et al, A Re-evaluation of the practicality of floating-point > operations on FPGAs, FCCM 1998 > > Louca et al, Implementation of IEEE single precision floating point addition > and multiplication on FPGAs, FCCM 1996 > > Shirazi et al, Quantitative analysis of floating point arithmetic on FPGA > based custom computing machines, FCCM 1995 > > and the neat Leong paper on avoiding the problem entirely, > > Leong et al, Automating floating to fixed point translation and its > application to post-rendering 3D warping, FCCM 1999 > > > See the Ligon paper for a nice presentation of speed-area tradeoffs of > various implementation choices. Ligon estimates their single-precision FP > adder resource use at between 563 and 629 LUTs -- 36-40% of a XC4020E. Note > this group used a synthesis tool; a hand-mapped design could be smaller. > > Put another way, that single precision FP adder is almost twice the area of > a pipelined 32-bit RISC datapath. Ouch. > > > The rest of this article explores ideas for slower-but-smaller FP adders. > > The two FP add barrel shifters are the problem. They each need many LUTs > and much interconnect. For example, a w-bit-wide barrel shifter is often > implemented as lg w stages of w-bit 2-1 muxes, optionally pipelined. > > Example 1: single-precision in << s, w=24 > m0 = s[0] ? in[22:0] << 1 : in; > m1 = s[1] ? m0[21:0] << 2: m0; > m2 = s[2] ? m1[19:0] << 4 : m1; > m3 = s[3] ? m2[15:0] << 8 : m2; // 16 wires 8 high > out = s[4] ? m3[7:0] << 16 : m3; // 8 wires 16 high > ---- > 5*24 2-1 muxes = 120 LUTs > > Example 2: double-precision in << s, w=53 > m0 = s[0] ? in[51:0] << 1 : in; > m1 = s[1] ? m0[50:0] << 2: m0; > m2 = s[2] ? m1[48:0] << 4 : m1; > m3 = s[3] ? m2[44:0] << 8 : m2; // 45 wires 8 high > m4 = s[4] ? m3[36:0] << 16 : m3; // 37 wires 16 high > out = s[5] ? m4[20:0] << 32 : m4; // 21 wires 32 high > ---- > 6*53 2-1 muxes = 318 LUTs > > In a horizontally oriented datapath, the last few mux stages have many > vertical wires, each many LUTs high. This is more vertical interconnect > than is available in one column of LUTs/CLBs, so the actual area can be much > worse than the LUT count indicates. > > > BUT we can of course avoid the barrel shifters, and do FP > denormalization/renormalization iteratively. > > Idea #1: Replace the barrel shifters with early-out iterative shifters. 
For > example, build a registered 4-1 mux: w = mux(in, w<<1, w<<3, w<<7). Then an > arbitrary 24-bit shift can be done in 5 cycles or less in ~1/3 of the area. > For double precision, make it something like w = mux(in, w<<1, w<<4, w<<12), > giving an arbitrary 53-bit shift in 8 cycles. > > > Idea #2: (half baked and sketchy) Do FP addition in a bit- or nibble-serial > fashion. > > To add A+B, you > > 1) compare exponents A.exp and B.exp; > 2) serialize A.mant and B.mant, LSB first; > 3) swap (using 2 2-1 muxes) lsb-serial(A.mant) and lsb-serial(B.mant) if > A.exp < B.exp > 4) delay lsb-serial(A.mant) in a w-bit FIFO for abs(A.exp-B.exp) cycles; > 5) bit-serial-add delay(lsb-serial(A.mant)) + lsb-serial(B.mant) for w > cycles > 6) collect in a "sum.mant" shift register > 7) shift up to w-1 cycles (until result mantissa is normalized). > > It may be that steps 4 and 6 are quite cheap, using Virtex 4-LUTs in shift > register mode -- they're variable tap, right? > > It is interesting to consider eliminating steps 2, 6, and 7, by keeping your > FP mantissa values in the serialized representation between operations, > counting clocks since last sum-1-bit seen, and then normalizing (exponent > adjustment only) and aligning *both* operands (via swap/delay) on input to > the next FP operation. A big chained data computation might exploit many > serially interconnected serial FP adders and serial FP multipliers... > > Is this approach better (throughput/area) than a traditional pipelined > word-oriented FP datapath? Probably not, I don't know. But if your FP > needs are modest (Mflops not 100 Mflops) this approach should permit quite > compact FP hardware. > > (Philip Freidin and I discussed this at FCCM99. Thanks Philip.) > > Jan Gray > > >Article: 17166
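As one possible reading of Idea #1 in the quoted post above (the registered 4-1 mux stepping by 1, 3 or 7), here is a small Verilog sketch of an early-out iterative left shifter for a 24-bit mantissa; the control scheme and all names are assumptions made for illustration, not code from the post.

// Iterative left shifter: shifts 'mant' left by 'amount' using repeated
// steps of 7, 3 or 1 through a single registered 4-1 mux, trading cycles
// (at most 5 for a 24-bit operand) for much less area than a barrel shifter.
module iter_shift_left (
    input  wire        clk,
    input  wire        load,          // load new operand and shift amount
    input  wire [23:0] din,
    input  wire [4:0]  amount,        // 0..23
    output reg  [23:0] mant,
    output wire        done
);
    reg [4:0] remaining;
    assign done = (remaining == 5'd0);

    always @(posedge clk) begin
        if (load) begin
            mant      <= din;
            remaining <= amount;
        end else if (remaining >= 5'd7) begin
            mant      <= mant << 7;
            remaining <= remaining - 5'd7;
        end else if (remaining >= 5'd3) begin
            mant      <= mant << 3;
            remaining <= remaining - 5'd3;
        end else if (remaining != 5'd0) begin
            mant      <= mant << 1;
            remaining <= remaining - 5'd1;
        end
    end
endmodule

Greedy selection of the largest usable step finishes any shift of 0 to 23 places in at most five clocks, matching the "5 cycles or less" figure, while needing only one mux level instead of the five cascaded stages of a full barrel shifter.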
Trevor Landon wrote: > > While we are on the FP in FPGA discussion... > > What algorithms exist for floating point counters? I would like to > impliment a 10 bit mantissa, 5 bit exponent FP counter that could incriment > in roughly 8 clock cycles. > > I have very little interest in using a full FP adder for the obvious > reasons. I don't know that a counter is compatible with a floating point format. If you have a 10 bit mantissa, once you reach a count of 1023 how do you continue to increment by one? At a value of 1024 (or 2048 if you use a hidden bit I guess) the lsb value is 2. So incrementing by one will not change the value of the counter. If the increment value is an arbitrary size, then you need an adder. So maybe I don't understand what you are trying to do. -- Rick Collins rick.collins@XYarius.com remove the XY to email me. Arius - A Signal Processing Solutions Company Specializing in DSP and FPGA design Arius 4 King Ave Frederick, MD 21701-3110 301-682-7772 Voice 301-682-7666 FAX Internet URL http://www.arius.comArticle: 17167
Check out app note 130, http://www.xilinx.com/xapp/xapp130.pdf

module MYMEM (CLK, WE, ADDR, DIN, DOUT);
input CLK, WE;
input [8:0] ADDR;
input [7:0] DIN;
output [7:0] DOUT;
wire logic0, logic1;
//synopsys dc_script_begin
//set_attribute ram0 INIT_00 "0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF" -type string
//set_attribute ram0 INIT_01 "FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210" -type string
//synopsys dc_script_end
assign logic0 = 1'b0;
assign logic1 = 1'b1;
RAMB4_S8 ram0 (.WE(WE), .EN(logic1), .RST(logic0), .CLK(CLK), .ADDR(ADDR), .DI(DIN), .DO(DOUT));
//synopsys translate_off
defparam ram0.INIT_00 = 256'h0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF;
defparam ram0.INIT_01 = 256'hFEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210;
//synopsys translate_on
endmodule

Le mer Michel wrote: > > ronak@hclt.com wrote: > > > Hi, > > > > Does anybody know how to use Virtex Block ram in Verilog > > HDL based Synopsys synthesis flow. The datasheets and app > > notes gives only info about architecture and symbols but > > not about how to put it in HDL based flow. > > > > Thanks in advance > > -ronak > > > > Sent via Deja.com http://www.deja.com/ > > Share what you know. Learn what you don't. > > Hello > > If you have the symbols, you can just instatiate them and create the > macro with Coregen. > > If Synopssys can recognize it, an other solution is to write the Verilog > source code. It is a classic memory with appropriated depth and I/O. > > kunze@fbh_berlin.de
-- Paulo Dutra (paulo@xilinx.com), Xilinx, hotline@xilinx.com, 2100 Logic Drive, San Jose, California 95124-3450 USA, (800) 255-7778 Article: 17168
> > > Not quite what I was looking for. You said you have seen 80M writing from > the CPU to a PCI target. I would like to know the specifics under which > you saw 80M/sec. I have never seen anything close to that, except for a > single CPU 'sized' burst transfer. Your read numbers seem more in line. > > Austin You have to use the assembly code to get the Intel chip set to aggregate the writes, otherwise you will see something more in the range of 20-40 meg/sec.

// word is unsigned int
void PCICore::write(word addr, word data, word count)
{
    // line things up: compute the mapped bus address for the transfer
    word busAddr = ((addr << 2) | _offset) + _memBase;
    word *dptr = &data;
    __asm {
        push edi
        push ecx
        push esi
        mov esi, dptr      // source: the data buffer
        mov edi, busAddr   // destination: the mapped board address
        mov ecx, count     // number of 32-bit words
        cld
        rep movsd          // burst the words out back-to-back
        pop esi
        pop ecx
        pop edi
    }
} // end write

-- Steve Casselman, President Virtual Computer Corporation http://www.vcc.comArticle: 17169
For FPGAs with carry chains, you'll find it extremely hard to beat the performance of a ripple carry adder for widths to around 32 bits. For multiplication in FPGAs, you might look at the summary I have on my website under the DSP in FPGAs page. -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 17170
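A quick illustration of the point above, with an arbitrary 32-bit width: simply describing the add behaviourally lets the synthesizer map it onto the dedicated carry chain, which is the ripple-carry adder that is so hard to beat at these widths.

// A registered 32-bit adder.  On carry-chain FPGAs (XC4000X, Virtex, etc.)
// synthesis maps the '+' onto the dedicated carry logic, usually faster and
// smaller than a hand-built carry-lookahead or carry-select adder here.
module rca32 (
    input  wire        clk,
    input  wire [31:0] a,
    input  wire [31:0] b,
    output reg  [32:0] sum    // extra bit captures the carry out
);
    always @(posedge clk)
        sum <= a + b;         // infers the ripple-carry chain
endmodule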
I think the counter has to be fixed point with as many bits as is required to represent the maximum count desired. If it were floating point, you'd need a complementary counter underneath it to resolve increment values below the precision of the mantissa as the count increases. If you need a floating point output from the counter, you would use a fixed point counter of the appropriate width followed by a normalizing barrel shifter. Trevor Landon wrote: > While we are on the FP in FPGA discussion... > > What algorithms exist for floating point counters? I would like to > impliment a 10 bit mantissa, 5 bit exponent FP counter that could incriment > in roughly 8 clock cycles. > > I have very little interest in using a full FP adder for the obvious > reasons. > > -Trevor Landon > landont@ttc.com > > Jan Gray <jsgray@acm.org.nospam> wrote in message > news:DZrf3.21$qH2.1013@paloalto-snr1.gtei.net... > > Roland Paterson-Jones wrote in message <377DC508.D5F1D048@bigfoot.com>... > > >It has been variously stated that fpga's are no good for floating point > > >operations. Why? As I see it, floating point operations are typically > > >just shifted integer operations. Is the bit-width problematic? > > > > For 16-bit floats with (say) 10 bit mantissas, FPGAs should be *great* for > > floating point. Indeed problems start with the wider bit-widths. The > > area-expensive (and worse than linear scaling) FP components are the > barrel > > shifters needed for pre-add mantissa operand alignment and post-add > > normalization in the FP adder, and of course the FP multiplier array. > > > > The FCCM papers on this subject include: > > > > Ligon et al, A Re-evaluation of the practicality of floating-point > > operations on FPGAs, FCCM 1998 > > > > Louca et al, Implementation of IEEE single precision floating point > addition > > and multiplication on FPGAs, FCCM 1996 > > > > Shirazi et al, Quantitative analysis of floating point arithmetic on FPGA > > based custom computing machines, FCCM 1995 > > > > and the neat Leong paper on avoiding the problem entirely, > > > > Leong et al, Automating floating to fixed point translation and its > > application to post-rendering 3D warping, FCCM 1999 > > > > > > See the Ligon paper for a nice presentation of speed-area tradeoffs of > > various implementation choices. Ligon estimates their single-precision FP > > adder resource use at between 563 and 629 LUTs -- 36-40% of a XC4020E. > Note > > this group used a synthesis tool; a hand-mapped design could be smaller. > > > > Put another way, that single precision FP adder is almost twice the area > of > > a pipelined 32-bit RISC datapath. Ouch. > > > > > > The rest of this article explores ideas for slower-but-smaller FP adders. > > > > The two FP add barrel shifters are the problem. They each need many LUTs > > and much interconnect. For example, a w-bit-wide barrel shifter is often > > implemented as lg w stages of w-bit 2-1 muxes, optionally pipelined. > > > > Example 1: single-precision in << s, w=24 > > m0 = s[0] ? in[22:0] << 1 : in; > > m1 = s[1] ? m0[21:0] << 2: m0; > > m2 = s[2] ? m1[19:0] << 4 : m1; > > m3 = s[3] ? m2[15:0] << 8 : m2; // 16 wires 8 high > > out = s[4] ? m3[7:0] << 16 : m3; // 8 wires 16 high > > ---- > > 5*24 2-1 muxes = 120 LUTs > > > > Example 2: double-precision in << s, w=53 > > m0 = s[0] ? in[51:0] << 1 : in; > > m1 = s[1] ? m0[50:0] << 2: m0; > > m2 = s[2] ? m1[48:0] << 4 : m1; > > m3 = s[3] ? m2[44:0] << 8 : m2; // 45 wires 8 high > > m4 = s[4] ? 
m3[36:0] << 16 : m3; // 37 wires 16 high > > out = s[5] ? m4[20:0] << 32 : m4; // 21 wires 32 high > > ---- > > 6*53 2-1 muxes = 318 LUTs > > > > In a horizontally oriented datapath, the last few mux stages have many > > vertical wires, each many LUTs high. This is more vertical interconnect > > than is available in one column of LUTs/CLBs, so the actual area can be > much > > worse than the LUT count indicates. > > > > > > BUT we can of course avoid the barrel shifters, and do FP > > denormalization/renormalization iteratively. > > > > Idea #1: Replace the barrel shifters with early-out iterative shifters. > For > > example, build a registered 4-1 mux: w = mux(in, w<<1, w<<3, w<<7). Then > an > > arbitrary 24-bit shift can be done in 5 cycles or less in ~1/3 of the > area. > > For double precision, make it something like w = mux(in, w<<1, w<<4, > w<<12), > > giving an arbitrary 53-bit shift in 8 cycles. > > > > > > Idea #2: (half baked and sketchy) Do FP addition in a bit- or > nibble-serial > > fashion. > > > > To add A+B, you > > > > 1) compare exponents A.exp and B.exp; > > 2) serialize A.mant and B.mant, LSB first; > > 3) swap (using 2 2-1 muxes) lsb-serial(A.mant) and lsb-serial(B.mant) if > > A.exp < B.exp > > 4) delay lsb-serial(A.mant) in a w-bit FIFO for abs(A.exp-B.exp) cycles; > > 5) bit-serial-add delay(lsb-serial(A.mant)) + lsb-serial(B.mant) for w > > cycles > > 6) collect in a "sum.mant" shift register > > 7) shift up to w-1 cycles (until result mantissa is normalized). > > > > It may be that steps 4 and 6 are quite cheap, using Virtex 4-LUTs in shift > > register mode -- they're variable tap, right? > > > > It is interesting to consider eliminating steps 2, 6, and 7, by keeping > your > > FP mantissa values in the serialized representation between operations, > > counting clocks since last sum-1-bit seen, and then normalizing (exponent > > adjustment only) and aligning *both* operands (via swap/delay) on input to > > the next FP operation. A big chained data computation might exploit many > > serially interconnected serial FP adders and serial FP multipliers... > > > > Is this approach better (throughput/area) than a traditional pipelined > > word-oriented FP datapath? Probably not, I don't know. But if your FP > > needs are modest (Mflops not 100 Mflops) this approach should permit quite > > compact FP hardware. > > > > (Philip Freidin and I discussed this at FCCM99. Thanks Philip.) > > > > Jan Gray > > > > > > -- -Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email randraka@ids.net http://users.ids.net/~randrakaArticle: 17171
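A rough Verilog sketch of the arrangement suggested above: a plain fixed-point counter wide enough for the maximum count, followed by a combinational normalizer (priority encoder plus shifter) that presents the value as a 5-bit exponent and 10-bit mantissa. The 26-bit counter width, the absence of a hidden bit, and truncation instead of rounding are all assumptions made just for the sketch.

// Fixed-point counter plus combinational normalization into a simple
// {5-bit exponent, 10-bit mantissa} float (value ~= mant * 2^exp).
module fp_counter_sketch (
    input  wire       clk,
    input  wire       rst,
    input  wire       inc,
    output reg  [4:0] exp,
    output reg  [9:0] mant
);
    reg [25:0] count;
    reg [4:0]  msb;                       // index of the highest set bit
    integer    i;

    always @(posedge clk) begin
        if (rst)      count <= 26'd0;
        else if (inc) count <= count + 26'd1;
    end

    // priority encode the leading one, then take the top 10 bits below it
    always @* begin
        msb = 5'd0;
        for (i = 0; i < 26; i = i + 1)
            if (count[i]) msb = i;
        if (msb < 5'd10) begin            // value fits in the mantissa directly
            exp  = 5'd0;
            mant = count[9:0];
        end else begin
            exp  = msb - 5'd9;
            mant = count >> (msb - 5'd9); // truncating normalize
        end
    end
endmodule

As pointed out earlier in the thread, single increments stop being visible in the float output once the count outgrows the 10-bit mantissa; only the underlying fixed-point counter keeps the full resolution.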
Tom Kean wrote: > > > > > > I would argue with that. Its more like modern reconfigurable computers > > > were not possible before CMOS technology got to a certain level of > > > capability. Ross Freeman's genius was building a team that could turn > > > his idea into an industry and hitting the market just when process > > > technology made the overhead of reconfiguration economically viable. > > > > > > Tom. > > > > I certainly agree that Ross put together a great team. > > But by that logic Jules Verne invented interplanetary > > flight but did not have the team to make it happen. > > I don't think thats a fair analogy: the guys in the 60's and 70's were > engineers > not science fiction writers and some of them built working > systems. They published literaly hundreds of technical papers. They > had configuration memory controlling programmable function units and > switches. Their big problem was that their configuration memory was > shift registers > built from logic gates and their multiplexers were also built from logic > gates > and you did not get that many logic gates on a chip at that time. > > Tom. You have a point. Lots of great work has been done by lots of great engineers. The line of where something is new or not is very thin in places. Maybe you'll agree to this statement "Modern reconfigurable computing could not begin until Ross Freeman invented the FPGA." With the thought that "Modern" distinguishes the current type (FPGA based) of reconfigurable computers from projects that came before the FPGA. -- Steve Casselman, President Virtual Computer Corporation http://www.vcc.comArticle: 17172
I believe the posted example code of division by a 2^N constant can not be mapped to a right shift of N bits due to the use of a negative range in the integer operand. I don't have my VHDL references at hand (and I haven't ever used integer types in code for synthesis), but a quick check of the code with ModelTech shows VHDL signed integer division to be implemented using a "symmetrical around zero" rather than a "floored to -infinity" signed divide algorithm; as a result, the upper bits of the "q" counter do not match the output port "sigma" for negative "q" values.

                    |- symmetrical -|   |---- floored ----|
                          q/8                  q/8
   int.     7 bit     int.    4 bit       int.    4 bit
     q        q       sigma   sigma       sigma   sigma
 ------------------------------------------------------------
     9    0001_001      1     0001          1     0001
     8    0001_000      1     0001          1     0001
     7    0000_111      0     0000          0     0000
     .        .         .       .           .       .
     1    0000_001      0     0000          0     0000
     0    0000_000      0     0000          0     0000
    -1    1111_111      0     0000         -1     1111
     .        .         .       .           .       .
    -7    1111_001      0     0000         -1     1111
    -8    1111_000     -1     1111         -1     1111
    -9    1110_111     -1     1111         -2     1110

I'm not sure why Synplify has a go at building the extra divider hardware for the version with the '8' on the same line vs. dying with errors when the '8' is defined as a generic; in any case, the synthesized hardware is probably not what you would be expecting: the seven bit counter is followed by some sign dependent offset adder logic to produce the four output bits. Changing the range of the integers to positive values ("range 0 to 2*span-1") eliminates the Synplify error message when using the generic, and builds the 'expected' seven bit counter with the upper four bits directly connected to "sigma" port. Brian Davis In article <7ljcg9$nq5$1@nnrp1.deja.com>, ehiebert@my-deja.com wrote: > I spoke to Synplicity about this exact problem. I was trying to code a > div, and here is what they said. > > "Division is only supported for compile time constants that can be > guarenteed to be a power of 2." > > The key here is compile time. In order to do a division of vectors, I > had to write my own divide unit. The problem is, Synplify 5.1.4 has > problems synthesizing my divide unit. It is optimizing out the divisor > register that I have coded. no word back from Synplicity on this one > yet, except to say this is a new bug..... REALLY?!?!? > > Eldon. > > In article <376577f7.2542275@news.u-net.com>, > jonathan@oxfordbromley.u-net.com wrote: > > Synplify version 5.1.1 (free with Actel Desktop) can't cope with the > > following code. A very old version of Synplify (2.5) processes it > > just fine; FPGA Express has no problem. The compiler blows up > > on the line indicated, but in fact the culprit is a couple of > > lines later: if I make the divisor a hard- coded constant instead > > of the generic, all is well. Before I whinge to Synplicity, has > > anyone come across this one? BTW, I know that the generic 'span' > > has to be a power of 2; in the real thing there's an assert to > > test that. This example intentionally lobotomised.
> >
> > library ieee;
> > use ieee.std_logic_1164.all;
> >
> > entity accum is
> > generic (span: natural := 8);
> > port (
> > clk, rst, UpNotDown: in std_logic;
> > sigma: out integer range -span to span-1
> > );
> > end accum;
> >
> > architecture counter of accum is
> > -- compiler blows up at the following line:
> > signal q: integer range -span*span to span*span-1;
> > begin
> > sigma <= q/span; -- change to "q/8" and all is OK
> > count_proc: process (clk, rst)
> > begin
> > if rst='1' then
> > q <= 0;
> > elsif rising_edge(clk) then
> > if UpNotDown='1' then
> > q <= q+1;
> > else
> > q <= q-1;
> > end if;
> > end if;
> > end process;
> > end counter;
> >
> > Jonathan Bromley
> >
> >
>
> Sent via Deja.com http://www.deja.com/
> Share what you know. Learn what you don't.
>
Sent via Deja.com http://www.deja.com/ Share what you know. Learn what you don't.Article: 17173
If I can use it, how do I write it? Thanks!Article: 17174
The 4000XLA IOB contains a "tristate register" that can disable the tristate output buffer (OBUFT). Opening a 4000XLA design in EPIC and looking into any IO block shows this register. In EPIC it is called "TRIFF". But if I instantiate a primitive element TRIFF in an XNF file, then 'ngdbuild' does not recognize it. Xilinx Support didn't even know anything about this register. How can I use the new tristate flip-flop in the 4000XLA IOB?