Hi there,

My design has been in place-and-route for a long time and is still not routed... Here are the resource consumption figures and the P&R log. How can I make it run faster? It is a simple single-clock design with 118 multipliers.

Selected Device : 2v6000bf957-6

Number of Slices:            19984 out of 33792   59%
Number of Slice Flip Flops:  11278 out of 67584   16%
Number of 4 input LUTs:      31877 out of 67584   47%
Number of bonded IOBs:          93 out of   684   13%
Number of BRAMs:                27 out of   144   18%
Number of MULT18X18s:          122 out of   144   84%
Number of GCLKs:                 1 out of    16    6%

Phase 1: 141110 unrouted; REAL time: 21 mins 14 secs
Phase 2: 117379 unrouted; REAL time: 28 mins 10 secs
...
Intermediate status: 265 unrouted; REAL time: 3 days 8 hrs 16 mins 53 secs
Intermediate status: 265 unrouted; REAL time: 3 days 8 hrs 47 mins 21 secs
Intermediate status: 260 unrouted; REAL time: 3 days 9 hrs 18 mins 1 secs
Intermediate status: 259 unrouted; REAL time: 3 days 9 hrs 48 mins 37 secs
Intermediate status: 257 unrouted; REAL time: 3 days 10 hrs 19 mins 6 secs
Intermediate status: 246 unrouted; REAL time: 3 days 10 hrs 50 mins 16 secs
Intermediate status: 254 unrouted; REAL time: 3 days 11 hrs 20 mins 55 secs
Intermediate status: 242 unrouted; REAL time: 3 days 11 hrs 51 mins 34 secs
Intermediate status: 244 unrouted; REAL time: 3 days 12 hrs 22 mins 6 secs
Intermediate status: 255 unrouted; REAL time: 3 days 12 hrs 52 mins 50 secs
Intermediate status: 250 unrouted; REAL time: 3 days 13 hrs 23 mins 34 secs
Intermediate status: 248 unrouted; REAL time: 3 days 13 hrs 54 mins 39 secs
Intermediate status: 239 unrouted; REAL time: 3 days 14 hrs 25 mins 21 secs
Intermediate status: 239 unrouted; REAL time: 3 days 14 hrs 55 mins 51 secs
Intermediate status: 233 unrouted; REAL time: 3 days 15 hrs 26 mins 22 secs
Intermediate status: 238 unrouted; REAL time: 3 days 15 hrs 56 mins 50 secs
Intermediate status: 228 unrouted; REAL time: 3 days 16 hrs 27 mins 19 secs
Intermediate status: 236 unrouted; REAL time: 3 days 16 hrs 57 mins 46 secs
Intermediate status: 237 unrouted; REAL time: 3 days 17 hrs 28 mins 14 secs

Article: 67001
Hello,

I'm using the Nios Development Kit (general purpose, APEX). Is it outdated? It seems hard to find references; everything I find talks about Cyclone and Stratix. I could not even find the software development tutorial for APEX. I'm new to this tool suite. Does anyone have good suggestions for shortening the time to get hands-on? Thanks!

Chi

Article: 67002
Are you sure you aren't using 32/36-bit-wide BRAMs with co-located multipliers? A 32- or 36-bit-wide BRAM shares data lines with one of the multiplier multiplicand inputs. It could be that some of your BRAMs are 32/36 bit wide and there are not enough locations to put them all in places where the adjacent multiplier is not used. If the quantity of each indicates a fit is possible, then you may have to resort to floorplanning the multipliers and BRAMs, as the placer doesn't seem to do so well with either.

Kelvin wrote:

> Hi there,
>
> My design has been in place-and-route for a long time and is still not routed...
>
> Here are the resource consumption figures and the P&R log. How can I make it run faster?
> It is a simple single-clock design with 118 multipliers.
>
> Selected Device : 2v6000bf957-6
>
> Number of Slices:            19984 out of 33792   59%
> Number of Slice Flip Flops:  11278 out of 67584   16%
> Number of 4 input LUTs:      31877 out of 67584   47%
> Number of bonded IOBs:          93 out of   684   13%
> Number of BRAMs:                27 out of   144   18%
> Number of MULT18X18s:          122 out of   144   84%
> Number of GCLKs:                 1 out of    16    6%
>
> Phase 1: 141110 unrouted; REAL time: 21 mins 14 secs
> Phase 2: 117379 unrouted; REAL time: 28 mins 10 secs
> <snip>

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930  Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759
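If it does come to floorplanning, LOC constraints are the usual mechanism. Below is a minimal, hypothetical sketch: the module, instance names, site coordinates and data widths are all made up, and these constraints would normally live in the UCF (e.g. INST "u_bram" LOC = "RAMB16_X0Y7";), but XST meta-comments of the kind used later in this thread also carry them. The intent is simply to pin the 36-bit-wide BRAMs to sites whose neighbouring multiplier the design leaves unused, so the two never fight over the shared data lines described above. MULT18X18 and RAMB16_S36 are the Xilinx library primitives.

module mac_slice (
    input  wire        clk,
    input  wire        we,
    input  wire [8:0]  addr,
    input  wire [17:0] a,
    input  wire [17:0] b,
    output wire [35:0] p,
    output wire [31:0] dout
);
    // Lock the multiplier to a hand-picked site...
    // synthesis attribute LOC of u_mult is "MULT18X18_X0Y2"
    MULT18X18 u_mult (
        .A(a),
        .B(b),
        .P(p)
    );

    // ...and lock the 36-bit-wide BRAM to a site whose neighbouring
    // multiplier is not used by the design.
    // synthesis attribute LOC of u_bram is "RAMB16_X0Y7"
    RAMB16_S36 u_bram (
        .CLK (clk),
        .EN  (1'b1),
        .SSR (1'b0),
        .WE  (we),
        .ADDR(addr),
        .DI  (p[31:0]),
        .DIP (4'h0),
        .DO  (dout),
        .DOP ()
    );
endmodule

With a hundred-plus multipliers the same pattern would be repeated (or scripted) per instance; the point is only to take the wide-BRAM/multiplier pairing decision away from the placer.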
Article: 67003

Hi Jim,

> I think the OP was referring to the wider datapaths.
> I don't know the cycle-level details of the AMD or Intel 64 bit,
> but an obvious and simple speed gain can come from a wider HW fetch
> (even running < 64 bit opcodes) and then a simple check if the next
> opcode / next data value is in that block.

Yes, wider memory interfaces/cache data lines can help, but as you say, this is independent of op-code size. If I recall correctly, AMD and Intel processors already fetch 64-bit blocks, but this may have been increased. The latest m/b chipsets for both families of processors use dual-channel DDR (128 bits wide), so I would not be surprised if they've increased the size of fetches.

As vendors introduce 64-bit capable processors (such as the Opteron), they often also enhance various aspects of the CPU architecture in ways that help both 32- and 64-bit code. And while the 64-bitness of x86-64 may not matter much for speed, the doubling of the register file etc. could result in faster performance.

It's every computer engineer's dream to be a processor architect, isn't it? :-)

Regards,

- Paul

Article: 67004
Here are some things to check.

(a) Check you are not running out of real memory. If you are paging to disk then run times will get very large. You can check the memory usage in "TASK MANAGER" if you are running NT4, WIN2K or XP.

(b) Check that you are not overconstrained or simply can never meet the timing. Check this by running "TIMING ANALYSER" at the map stage.

You may consider using incremental design. Have a look on the Xilinx website for info. You may get some pointers from our current TechTips http://www.enterpoint.co.uk/techitips.html but it is more aimed at incremental synthesis.

You may also wish to consider some floorplanning to help the tools on their way.

John Adair
Enterpoint Ltd.
http://www.enterpoint.co.uk

This message is the personal opinion of the sender and not necessarily that of Enterpoint Ltd. Readers should make their own evaluation of the facts. No responsibility for error or inaccuracy is accepted.

"Kelvin" <kelvin8157@hotmail.com> wrote in message news:4045c71c$1@news.starhub.net.sg...
> Hi there,
>
> My design has been in place-and-route for a long time and is still not routed...
>
> Here are the resource consumption figures and the P&R log. How can I make it run faster?
> It is a simple single-clock design with 118 multipliers.
>
> Selected Device : 2v6000bf957-6
>
> Number of Slices:            19984 out of 33792   59%
> Number of Slice Flip Flops:  11278 out of 67584   16%
> Number of 4 input LUTs:      31877 out of 67584   47%
> Number of bonded IOBs:          93 out of   684   13%
> Number of BRAMs:                27 out of   144   18%
> Number of MULT18X18s:          122 out of   144   84%
> Number of GCLKs:                 1 out of    16    6%
>
> Phase 1: 141110 unrouted; REAL time: 21 mins 14 secs
> Phase 2: 117379 unrouted; REAL time: 28 mins 10 secs
> <snip>

Article: 67005
On Tue, 2 Mar 2004 19:57:58 -0600, Kenneth Land wrote:

> Seems to be a common misconception that 64bits just increases the amount of
> addressable memory.

The only common misconception is that swapping in a 64-bit processor in a desktop PC will lead to a large performance increase. It doesn't. (Other than any gain from a higher clock speed, of course.) Like to make a guess as to the extra overhead in a 64-bit version of current OSs, btw?

> More importantly for most applications is that twice
> the data is moved or operated on per clock cycle.

Data is only data if it's meaningful. The use of 64-bit arithmetic variables is comparatively rare in most applications. Certain scientific and CAD packages do make heavy use of 64-bit floats, but I doubt that's the case here (and high-end processors tend to use 80-bit data paths around the FPU anyway). There's not a lot to be gained from accessing memory in 64-bit chunks if you're only interested in 32 of them (there is an effect on cache hits with vectors, but it's not measurably worthwhile in practice). There will be some effect on prefetch, but it depends on the state of the L1 and L2 caches and the instruction pipeline(s) themselves. Tests I've seen suggest an increase of memory bandwidth efficiency of only around 1-2% at best.

If you want a 64-bitter to really earn its corn, use it in something like a database server with 64GB of RAM and a multi-TB disk farm. Give the poor thing something *meaningful* to do with the extra 32 bits. You'd still need 64-bit software though.

--
Max

Article: 67006
On Tue, 2 Mar 2004 20:05:37 -0600, Kenneth Land wrote:

> On the disk speed issue I have one data point. I upgraded my 1GHz PIII-M
> laptop drive from a slow 4200 RPM to the fastest 7200 RPM available (for
> laptops) and my Nios system build went from about 16 min. to about 15 min.
> Not worth the pain and expense of swapping the drive.

Not in a low-spec machine like that, no. The options in a laptop are limited, and there's no way to increase the disk controller bandwidth. But the effect on a powerful workstation of installing a RAID with a high-bandwidth controller and drives such as U-320 SCSI can have a dramatic impact. As always though, it depends on the application.

> On memory, I upgraded the memory in my 3.2 GHz P4 from 512 to 1GB and there
> was no noticeable difference until I set the memory from 333MHz to 400MHz
> dual channel. Then my system build went from 5 min. to 4 min. - 20%.

That doesn't mean a lot. You only need to add more memory if you're running out of it ;o)

--
Max

Article: 67007
Max <mtj2@btopenworld.com> writes:

> If you want a 64-bitter to really earn its corn, use it in something
> like a database server with 64GB of RAM and a multi-TB disk farm. Give

Or running synthesis, place & route, static timing analysis etc. on an ASIC design requiring 6GB RAM.

Petter

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 67008
"Paul Leventis (at home)" wrote: > > Hi Jim, > > > I think the OP was refering to the wider datapaths. > > I don't know the cycle level details of the AMD or Intel 64 bit > > but an obvious and simple speed gain can come from a wider HW fetch. > > (even running < 64 bit opcodes ) and then a simple check if the next > > opcode / next data value is in that block. > > Yes, wider memory interfaces/cache data lines can help, but as you say, this > is independent of op-code size. If I recall correctly, AMD and Intel > processors already fetch 64-bit blocks, but this may have been increased. > The latest m/b chipsets for both families of processors use dual-channel DDR > (128-bits wide) and so I would not be surprised if they've increased the > size of fetches. > > As vendors introduce 64-bit capable processors (such as Opteron), they often > also enhance various aspects of the CPU architecture in ways that help both > 32- and 64-bit code. And while the 64-bitness of x86-64 may not matter much > for speed, the doubling of the register files etc. could result in faster > performance. > > It's every computer engineers dream to be a processor architect, isn't it? > :-) We can all speculate about the relative merits of processor enhancements, but these machines are very complex and the only real way to tell what helps is to try it. Since we are not all ancient Greeks philosophizing in our armchairs, it would be a good idea to pick a design and to run it on a few different workstations, hopefully including an AMD64. I have always been surprised that the FPGA vendors don't put some effort into evaluating platforms and releasing the results. I know this can be a bit of a can of worms, but every time I look at buying a new machine, the first question I research is how fast it will run the FPGA design software. Then I am often trying to speculate on my own since I don't have much info to go on. I seem to recall that there at least used to be some available info on how much memory was needed to optimize run time as a function of part size. But I haven't seen new info on that in quite a while. -- Rick "rickman" Collins rick.collins@XYarius.com Ignore the reply address. To email me use the above address with the XY removed. Arius - A Signal Processing Solutions Company Specializing in DSP and FPGA design URL http://www.arius.com 4 King Ave 301-682-7772 Voice Frederick, MD 21701-3110 301-682-7666 FAXArticle: 67009
"rickman" <spamgoeshere4@yahoo.com> wrote in message news:404603D6.AA20F818@yahoo.com... > We can all speculate about the relative merits of processor > enhancements, but these machines are very complex and the only real way > to tell what helps is to try it. Since we are not all ancient Greeks > philosophizing in our armchairs, it would be a good idea to pick a > design and to run it on a few different workstations, hopefully > including an AMD64. > > I have always been surprised that the FPGA vendors don't put some effort > into evaluating platforms and releasing the results. I had assumed that had happened already. Silly me. Perhaps we'll just buy an AMD machine and see what it does, but I thought somebody might have tried that already. Anybody know how solid the Quartus II 4.0 Linux port is? I can't get an answer out of Altera.Article: 67010
The introduction to the following paper by Li and Hauck might help with your high-level understanding of the configuration architecture:

http://www.ee.washington.edu/people/faculty/hauck/publications/VirtexCompressJ.pdf

There are 48 frames in a column, with the size of a frame dependent on the number of (CLB) rows in the device. I'm not aware of any documentation on the mapping between individual resources (e.g. a LUT's content) and the configuration bitstream. As suggested by the previous poster, delving into JBits is probably your best option.

Irwin.

Article: 67011
On Wed, 03 Mar 2004 04:47:19 GMT, Paul Leventis (at home) wrote:

> Provided the peak memory consumption of Quartus for the compilation in
> question is less than the amount of physical memory in the system,
> increasing the amount of memory will not help compile time. For non-trivial
> designs, a Quartus compile will be most heavily influenced by CPU speed, and
> then by memory sub-system speed -- disk speed will have little influence.

I suspected that might be the case, but I wasn't quite sure. I'm more used to programming-language tools that use library files extensively, where a fast disk system (or a big ramdisk) can give very worthwhile speed gains.

Is there any possibility of making Quartus multi-threaded? That strikes me as the most likely way to get a dramatic performance increase, though I know it's not always easy to achieve with heuristic apps.

> CAD tools process a lot of data. I don't know if a Xeon (bigger cache) is
> much faster than a normal P4 (smaller cache), but I wouldn't be surprised if
> this were the case for the same reason that a Xeon processor is supposedly
> better for server applications -- bigger cache helps applications whose data
> set doesn't fit into the cache.

While the extra cache is important in itself, much of the performance gain of the Xeon is also due to the greater degree of parallelism and deeper prefetch lookahead, thus making better use of memory bandwidth throughout.

--
Max

Article: 67012
Max <mtj2@btopenworld.com> writes:

> Is there any possibility of making Quartus multi-threaded? That
> strikes me as the most likely way to get a dramatic performance
> increase, though I know it's not always easy to achieve with heuristic
> apps.

I would like to see synthesis and place-and-route tools I could run on a cluster of cheap PCs. I would be happy with less-than-linear speedups, e.g. using a 16-node cluster to get an 8x speedup.

Petter

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 67013
On 03 Mar 2004 18:46:34 +0100, Petter Gustad wrote:

> I would like to see synthesis and place-and-route tools I could
> run on a cluster of cheap PCs. I would be happy with less-than-linear
> speedups, e.g. using a 16-node cluster to get an 8x speedup.

I doubt you'd get anywhere near. Trying to implement those algorithms efficiently on the sort of loosely-coupled architecture you propose would be nigh-on impossible. It's not easy on a single SMP box, but it's doable.

A quad Xeon (8 x CPU) box would cost less than four single decent-spec machines anyway.

--
Max

Article: 67014
Max wrote:
>
> On 03 Mar 2004 18:46:34 +0100, Petter Gustad wrote:
>
> > I would like to see synthesis and place-and-route tools I could
> > run on a cluster of cheap PCs. I would be happy with less-than-linear
> > speedups, e.g. using a 16-node cluster to get an 8x speedup.
>
> I doubt you'd get anywhere near. Trying to implement those algorithms
> efficiently on the sort of loosely-coupled architecture you propose
> would be nigh-on impossible. It's not easy on a single SMP box, but
> it's doable.
>
> A quad Xeon (8 x CPU) box would cost less than four single decent-spec
> machines anyway.

Not if the four machines are sitting around all night running screen savers.

--
Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design
URL http://www.arius.com
4 King Ave                        301-682-7772 Voice
Frederick, MD 21701-3110          301-682-7666 FAX

Article: 67015
rickman wrote:
>
> I have always been surprised that the FPGA vendors don't put some effort
> into evaluating platforms and releasing the results.

Would seem a very good idea.

On this topic, I see Intel released a new Xeon with 3GHz and 4MB (!) cache, and they claim 25% faster. Of course, you pay - $3692 (Qty column not given) :)

The PR claims this is the last release before Intel adds 64-bit extensions...

Article: 67016
Max <mtj2@btopenworld.com> writes:

> On 03 Mar 2004 18:46:34 +0100, Petter Gustad wrote:
>
> > I would like to see synthesis and place-and-route tools I could
> > run on a cluster of cheap PCs. I would be happy with less-than-linear
> > speedups, e.g. using a 16-node cluster to get an 8x speedup.
>
> I doubt you'd get anywhere near. Trying to implement those algorithms
> efficiently on the sort of loosely-coupled architecture you propose
> would be nigh-on impossible. It's not easy on a single SMP box, but
> it's doable.

I disagree. Synthesis as well as P&R involves exploring many alternatives, sorting and exploring by some underestimate of expense/delay (typically using an A* search algorithm or similar). This can be done in parallel. The datasets can be copied to each node and there will be very little information which has to be exchanged over the interconnect.

Of course there is not much to gain if your P&R takes 1 minute, but for larger designs and/or more accurate wire delay models (e.g. non-linear delay modelling and physical synthesis) the benefit will be larger. This has been implemented in some ASIC tools already.

Actually, Xilinx has been doing some very simple parallel processing in ISE (on Solaris and now Linux) for a long time. Multiple iterations of "par" can run in parallel on multiple hosts, and then you pick the best result. This is, of course, extremely coarse-grained compared to what I indicated above.

Petter

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 67017
Hi there.

Did you look at the power supply? During configuration the FPGA draws a lot of power (can be > 500 mA!!). If the PSU cannot deliver this, you have a configuration problem.

Success,
ron proveniers

"Dimitris Kontodimopoulos" <dkonto@isd.gr> schreef in bericht news:1609ee5e.0403010148.1b9c61df@posting.google.com...
> Hello there
>
> I'm having serious problems configuring my FPGA using EPC2. We have
> designed the circuit exactly as stated in the Altera datasheet and
> even played around with the pullups and buffering that's recommended.
> To be more specific, we have a board with a FLEX10KE
> (EPF10K200SBC356-1) and an EPC2LC20 for in-system configuration. We
> also have provided for direct ByteBlaster configuration using a
> connector (using the same path towards the FPGA and selecting between
> them through enabling/disabling a buffer). Finally, we have a JTAG
> connector by which we can configure the FPGA directly using the SOF
> file generated by Quartus - pls read below.
>
> Anyway, what we're seeing is: the EPC2 gets programmed OK but then the
> problems start. When I turn the system off and on again to initiate
> configuration, the nCONFIG pin comes out of reset and so does nSTATUS,
> but only for a very small amount of time. During this time DCLK is
> enabled and DATA transfers configuration data, as normal. Then nSTATUS
> goes low again and the configuration is interrupted, as you would
> expect. There is nothing in the circuit that could pull this pin low -
> it is a point-to-point connection between FPGA and EPC2. It seems
> however that the EPC2 goes back into reset state, hence pulling its OE
> pin low. From that point onwards these signals go crazy, i.e. they
> randomly go high or low, so the FPGA never gets configured. I tried
> using external pullups whilst disabling the internal ones through
> Quartus, but there was no change.
>
> One last point is that so far I've been configuring the FPGA through a
> direct JTAG connection using the SOF file - this works fine. Does this
> perhaps confuse the device, i.e. how does it know whether it should be
> programmed through JTAG or EPC2? Do I need to set something there?
> Finally, I'm using the POF file to program the EPC2 - I'm assuming
> this is correct?? Please give me some feedback because I'm really
> stuck with this. Any tips would be much welcome. Thanks in advance

Article: 67018
Max <mtj2@btopenworld.com> writes:

> A quad Xeon (8 x CPU) box would cost less than four single
> decent-spec machines anyway.

My experience is the opposite. I've heard from users in the high-performance computing industry that the most cost-efficient systems are clusters of dual-CPU nodes (assuming your application will run efficiently on a cluster).

A 4-CPU Xeon system like a Dell PowerEdge 6650 with 4x Xeon, 3.0GHz and 4GB RAM costs $28,070. A single PowerEdge 750 (1U server) with a 3.4GHz P4 (higher clock frequency, but smaller cache) and 1GB RAM costs $3,165.

8-CPU Xeon SMPs (Profusion architecture) are very expensive. A Proliant 8500 costs $100,000+ if memory serves me right.

Petter

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 67019
Hello,

I would like to know about the different finite field multipliers used for doing finite field multiplication, specifically ones that work in polynomial basis, and that are suited to FPGA implementation. Any help with references to the design of these multipliers, and if possible a reference comparing the different multipliers, would be greatly appreciated.

Thanx.
OP.
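For readers finding this thread later: one of the simplest polynomial-basis structures, and one that maps well onto FPGA LUTs, is the bit-serial (MSB-first) multiplier. The sketch below is illustrative only, not from the original poster or any particular paper; the field GF(2^8), the reduction polynomial x^8 + x^4 + x^3 + x + 1, and all module/port names are assumptions.

// Bit-serial polynomial-basis multiplier over GF(2^M), one bit of b per cycle.
module gf2m_mult #(
    parameter         M    = 8,
    parameter [M-1:0] POLY = 8'h1B   // low M bits of x^8 + x^4 + x^3 + x + 1
) (
    input  wire         clk,
    input  wire         start,       // load operands and begin an M-cycle multiply
    input  wire [M-1:0] a,
    input  wire [M-1:0] b,
    output reg  [M-1:0] p,           // product a*b mod POLY, valid when done = 1
    output reg          done
);
    reg [M-1:0] a_r, b_r;
    reg [7:0]   cnt;                 // cycle counter, wide enough for typical M

    always @(posedge clk) begin
        if (start) begin
            a_r  <= a;
            b_r  <= b;
            p    <= {M{1'b0}};
            cnt  <= M;
            done <= 1'b0;
        end else if (cnt != 0) begin
            // MSB-first Horner step: p = p*x mod POLY, then add a if this bit of b is set
            p    <= ({p[M-2:0], 1'b0} ^ (p[M-1] ? POLY : {M{1'b0}}))
                    ^ (b_r[M-1] ? a_r : {M{1'b0}});
            b_r  <= {b_r[M-2:0], 1'b0};
            cnt  <= cnt - 1;
            done <= (cnt == 1);
        end
    end
endmodule

This version costs one multiply every M clock cycles; the usual trade-off is against digit-serial or fully combinational multipliers, which finish sooner at the price of more LUTs.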
Article: 67020

On 03 Mar 2004 22:58:46 +0100, Petter Gustad wrote:

> A 4-CPU Xeon system like a Dell PowerEdge 6650 with 4x Xeon, 3.0GHz
> and 4GB RAM costs $28,070. A single PowerEdge 750 (1U server) with
> a 3.4GHz P4 (higher clock frequency, but smaller cache) and 1GB RAM
> costs $3,165.

The hyperthreaded Xeons run as two processors, so a quad-Xeon board appears to an HT-aware OS as an 8-CPU system.

Why pay for all the extra high-end hardware in a top-end server if you don't need it? When I was last looking at building systems like this, about 18 months or so ago, a quad-Xeon mobo from Supermicro was <$2000, and the processors were around $450 apiece.

--
Max

Article: 67021
Max <mtj2@btopenworld.com> writes:

> On 03 Mar 2004 22:58:46 +0100, Petter Gustad wrote:
>
> > A 4-CPU Xeon system like a Dell PowerEdge 6650 with 4x Xeon, 3.0GHz
> > and 4GB RAM costs $28,070. A single PowerEdge 750 (1U server) with
> > a 3.4GHz P4 (higher clock frequency, but smaller cache) and 1GB RAM
> > costs $3,165.
>
> The hyperthreaded Xeons run as two processors, so a quad-Xeon board
> appears to an HT-aware OS as an 8-CPU system.

Then you would call a system with a single P4 with HyperThreading a dual-processor system as well? That would be a little "unfair" when comparing to a full dual-core CPU like the rumored UltraSparc-IV.

> Why pay for all the extra high-end hardware in a top-end server if you
> don't need it? When I was last looking at building systems like this,

My point was that you usually get lots of extra high-end hardware when you buy large SMP systems, especially when you need to go beyond 4-way. Also, it's usually cheaper to get 4x4GB RAM rather than 16GB RAM for a single MB (unless you have a large enough number of DIMM slots).

> about 18 months or so ago, a quad-Xeon mobo from Supermicro was
> <$2000, and the processors were around $450 apiece.

This is pretty good; I was not aware of the low cost of the Supermicro MB. You would end up at close to $4000, e.g. in the same ballpark as buying 4 P4 systems. So if the application performs better on the SMP than on the cluster, I would definitely go with the SMP.

Petter

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 67022
Rick,

The abort happens asynchronously (it doesn't need to see a CCLK rising edge to take place). To avoid this, CS must be deasserted first. The data pins will be asynchronously driven by the FPGA when WRITE_B changes to high, but the status word (cfgerr_b, dalign, rip, in_abort_b, 4'b1111) is clocked out on data pins [7:0] by CCLK.

Regards,
Wei

rickman wrote:
> I am looking at the data sheet for the Spartan 3 parts trying to figure
> out how to configure them. It seems like it is the same as most of the
> other families, but there is one note that I don't completely
> understand. Section 3, page 12, has the following text...
>
> Figure 5: Waveforms for Master and Slave Parallel Configuration
> Notes:
> 1. In a given CCLK cycle, when RDWR_B transitions High or Low while
> holding CS_B Low, the next rising edge on the CCLK pin will abort
> configuration.
>
> This is not exactly the same as XAPP176 describing the Spartan II
> configuration, page 14...
>
> While CS is High, the Slave Parallel interface does not expect any data
> and ignores all CCLK transitions. However, WRITE must continue to be
> asserted while CS is asserted. If WRITE is High during a positive CCLK
> transition while CS is asserted, the FPGA aborts the operation.
>
> In the first case it sounds as if the abort condition is created by CS-
> being low and an edge on the RDWR- signal followed by a rising edge on
> CCLK (without making it clear if this also has to be during CS- low).
>
> In the second case, it is just the state of the two signals, sampled at
> the rising edge of CCLK, which will create an abort.
>
> If I am trying to use the CS-, WR- and IO signals from an MCU to control
> this, the difference between these two descriptions is significant. Am
> I making this more difficult than it is? Can I connect the signals as
> shown below and make this work ok?
>
>   MCU    FPGA       Write              NO
>   ---    ----       Byte               Write
>   CS-    RD_WR-     ----_______------______---
>   WR-    CCLK       -----_____--------____----
>   IO     CS-        -_____________------------
>
> The other thing I am not clear about is how to use these same signals
> after configuration. It looks like I have to set "persist" to off if I
> want to put these signals on the MCU bus after config in order to have a
> bus interface to the chip. But if I want to perform partial
> reconfiguration, I think I have to have "persist" set to on, no? Does
> this mean I will have to double up on all these signals, one for
> (re)configuration and one for operation?
>
> I seem to recall that the Lucent chips allowed you to use the MCU
> interface after configuration. Do the Xilinx chips have that as well?
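In other words, the safe ordering is: drive RDWR_B low before asserting CS_B, keep it low for the whole burst, and only raise it again after CS_B has been deasserted, so RDWR_B never changes while CS_B is low. A simulation-only sketch of that ordering follows; it is an illustration of the sequencing discussed above, not a real controller, and the signal names, delays and the cfg_byte array are placeholders. The actual setup/hold and maximum CCLK figures must come from the Spartan-3 data sheet.

module selectmap_write_order;
    reg        cclk   = 0;
    reg        cs_b   = 1;
    reg        rdwr_b = 1;
    reg [7:0]  d;
    reg [7:0]  cfg_byte [0:1023];   // hypothetical configuration data
                                    // (load it, e.g. with $readmemh, before use)
    integer    i;

    initial begin
        rdwr_b = 0;                 // 1. WRITE low *before* the device is selected
        #20 cs_b = 0;               // 2. select the FPGA
        for (i = 0; i < 1024; i = i + 1) begin
            d = cfg_byte[i];
            #10 cclk = 1;           // byte is sampled on this rising edge
            #10 cclk = 0;
        end
        #20 cs_b   = 1;             // 3. deselect first...
        #20 rdwr_b = 1;             // 4. ...only then is it safe to change WRITE
    end
endmodule

Because RDWR_B only toggles while CS_B is high, the abort condition quoted from the Spartan-3 note above can never be triggered by this sequence.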
Article: 67023

On Tue, 2 Mar 2004 13:34:26 +0800, "Peng Cong" <pc_dragon@sohu.com> wrote:

> Why do you need the following 2 attributes?
> // synthesis attribute keep of e is "true"
> // synthesis attribute keep of f is "true"
> e and f are not flip-flops.
>
> Removing them should be OK.

Remove them and watch XST merge the two flip-flops. (I know; I tried it.) I suggest you read the XST documentation.

Regards,
Allan.
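Since the original code is not quoted in this thread, here is a reconstructed sketch of the kind of situation being described; only the names e and f come from the quoted attributes, everything else is assumed. Two registers are deliberately loaded with the same data (for example to split fanout), and the "keep" attributes stop XST from optimizing the duplicate away.

// Deliberate register duplication; without the keep attributes XST is
// free to collapse e and f back into a single flip-flop.
// (Reconstructed example: module, clock and data names are made up.)
module dup_reg (
    input  wire clk,
    input  wire d,
    output wire e_out,
    output wire f_out
);
    reg e, f;
    // synthesis attribute keep of e is "true"
    // synthesis attribute keep of f is "true"

    always @(posedge clk) begin
        e <= d;   // copy driving one region of the chip
        f <= d;   // second copy driving another region
    end

    assign e_out = e;
    assign f_out = f;
endmodule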
Article: 67024

I was just trying to be helpful by sharing my experience. We're only interested in speeding up Quartus builds in this thread, and some have been suggesting more memory (32 GB in some instances) and faster drives. I've done both in two different machines, and the biggest improvement came from tweaking the memory subsystem, not from adding more memory above 512MB or a faster drive.

The 7200 RPM drive is very much faster, as can be seen from the much faster boot times. It didn't mean much for Quartus builds though. It seems Quartus needs (for my Nios system) a fast CPU with at least several hundred MBs of well-tuned memory.

I don't write slow image processing algorithms, and I use as many wires as the system can provide. If it's an 8-bit CPU then I use 8-bit optimizations; if it's 32-bit then 32-bit optimizations. Haven't tried 64-bit yet, but I plan to. Can't imagine any developer worth their salt that wouldn't.

Ken

"Max" <mtj2@btopenworld.com> wrote in message news:5vtb40lq1kmtcfqefbhdr69ei29kpq6h60@4ax.com...
> On Tue, 2 Mar 2004 20:05:37 -0600, Kenneth Land wrote:
>
> > On the disk speed issue I have one data point. I upgraded my 1GHz PIII-M
> > laptop drive from a slow 4200 RPM to the fastest 7200 RPM available (for
> > laptops) and my Nios system build went from about 16 min. to about 15 min.
> > Not worth the pain and expense of swapping the drive.
>
> Not in a low-spec machine like that, no. The options in a laptop are
> limited, and there's no way to increase the disk controller bandwidth.
> But the effect on a powerful workstation of installing a RAID with a
> high-bandwidth controller and drives such as U-320 SCSI can have a
> dramatic impact. As always though, it depends on the application.
>
> > On memory, I upgraded the memory in my 3.2 GHz P4 from 512 to 1GB and there
> > was no noticeable difference until I set the memory from 333MHz to 400MHz
> > dual channel. Then my system build went from 5 min. to 4 min. - 20%.
>
> That doesn't mean a lot. You only need to add more memory if you're
> running out of it ;o)
>
> --
> Max