Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
chen.songwei@mail.zte.com.cn (Apollo) wrote in message news:<913ddf38.0303212038.77b299ac@posting.google.com>... > 1>(1-0.001953125) > 2>(1-0.001953125)^6 > 3>(1-0.001953125)^1/2 Start off by going to school and learn to use English to express yourself.Article: 53776
Shridhar, The FLEX 10KE devices are 2.5V core voltage with I/O at either 2.5V or 3.3V. When running I/O at 3.3V, the inputs are 5V tolerant. -Pete- Shridhar Patil <patilshridhar@indiatimes.com> wrote in message news:7ca82cdb.0303212322.492facaa@posting.google.com... > hi everybody, > > please can anybody tell me whether Altera FLEX10K100E is only a 3.0V device? > i am bit confused as somewhere i have read that it's a multivoltage device. > > will it work on 5.0V? > > regards, > shridhar.Article: 53777
On Sat, 22 Mar 2003 23:32:08 +0800, "Jian Ju" <01901790r@polyu.edu.hk> wrote: >Hi, all > >I defined a bidir port d[7..0] to send commands and receive data in Maxplus >II. However, it does not work, the help messages tell me that I should use >tri to feed the bidir. I added tri (buffer[7..0] : tri) before d[] with >buffer[].oe=vcc and d[]=buffer[].out, but problems are still exist. > >Any suggestions? Thank you. > >Jian > Your problem is with "buffer[].oe=vcc". If you want bi-directional communication, you can't tie off the OE. You have to decide when you are receiving and disable OE, and turn it on when you're transmitting. So you need another signal to control the OE. Also don't forget to have a dead-cycle between receive and transmit states in your protocol so that there is no time when both sides could be driving the bus. Muzaffer Kal http://www.dspia.com ASIC/FPGA design/verification consulting specializing in DSP algorithm implementationsArticle: 53778
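The OE scheme in the reply above (drive only when transmitting, with a dead cycle at the turnaround) can be sketched behaviorally. Below is an illustrative Python model, not AHDL or MaxPlus II output; the resolution function and the schedule are our own invention:

```python
# Behavioral sketch of a shared bidirectional bus: each side drives only
# while its OE is asserted, and the protocol inserts a dead (floating)
# cycle between direction changes so both OEs are never high at once.

def bus_value(a_oe, a_out, b_oe, b_out):
    """Resolve the shared bus; both sides driving at once is contention."""
    if a_oe and b_oe:
        raise RuntimeError("bus contention: both drivers enabled")
    if a_oe:
        return a_out
    if b_oe:
        return b_out
    return None  # bus floating (high-Z)

# Side A transmits for 3 cycles, one dead cycle, then side B for 3 cycles.
schedule = [(1, 0)] * 3 + [(0, 0)] + [(0, 1)] * 3

trace = [bus_value(a_oe, 0xA5, b_oe, 0x5A) for a_oe, b_oe in schedule]
print(trace)  # [165, 165, 165, None, 90, 90, 90]
```

Without the `(0, 0)` dead cycle, any skew between the two sides' notion of the turnaround instant could briefly enable both drivers; the one floating cycle is the margin that prevents that.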
> Glen Herrmannsfeldt wrote: > > > FFT is usually done in floating point, though that isn't required. We > > would need to know more about the real goal to answer the question. If the > > goal is to speed up a program that uses an FFT subroutine then it would need > > to be done in a similar data representation. Both floating point addition > > and multiplication are hard to do fast in FPGA's, and division is even > > worse. > > > "Ray Andraka" <ray@andraka.com> wrote in message > news:3E7BD24E.59887520@andraka.com... > That may be true for software implementations of the FFT, but not necessarily > for hardware. Depending on the size of the FFT, floating point may not buy > anything. For smaller FFT's it is much more economical to work with wider > fixed point to get the needed dynamic range than it is to do floating point. > For larger FFT's, the FFT is generally accomplished using small FFTs combined > using the mixed radix algorithm. In these cases, it often makes sense to do the > small FFTs in fixed point and then normalize and adjust the exponent between > passes. Often a block floating point scheme is sufficient, in which case the > common part of the exponent is stripped off before denormalizing the data before > each path. That common part of the exponent is then used to scale the final > result. I agree that for specific cases fixed point, even many more bits than most often used in software, would be best. I was considering the case of a hardware coprocessor, maybe on a PCI card, that would accelerate the FFT part of an existing program. A Fortran or C callable routine would be written that would replace the normal call, pass the data to the hardware, and then return the results. One would expect very similar results to the software implementation, and so would likely require a similar representation. The OP didn't state the requirement very well, so it is hard to know which would be better. -- glenArticle: 53779
Thanks Jon. I was going to suggest he read this on how to ask a question on the newsgroup if you want an answer: http://users.erols.com/jyavins/procfaq.htm For the bit of info he's presented here, it could just as well be a 3 entry table lookup. john jakson wrote: > chen.songwei@mail.zte.com.cn (Apollo) wrote in message news:<913ddf38.0303212038.77b299ac@posting.google.com>... > > 1>(1-0.001953125) > > 2>(1-0.001953125)^6 > > 3>(1-0.001953125)^1/2 > > Start off by going to school and learn to use English to express yourself. -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 53780
Even in that case it would probably be desirable to at least move away from IEEE floats inside the co processor, as they are a pain to deal with in hardware. But you are correct, without knowing the bounds of the problem it is very difficult to recommend an optimal course of action. Glen Herrmannsfeldt wrote: > > Glen Herrmannsfeldt wrote: > > > > > FFT is usually done in floating point, though that isn't required. We > > > would need to know more about the real goal to answer the question. If > the > > > goal is to speed up a program that uses an FFT subroutine then it would > need > > > to be done in a similar data representation. Both floating point > addition > > > and multiplication are hard to do fast in FPGA's, and division is even > > > worse. > > > > > > "Ray Andraka" <ray@andraka.com> wrote in message > > news:3E7BD24E.59887520@andraka.com... > > That may be true for software implementations of the FFT, but not > necessarily > > for hardware. Depending on the size of the FFT, floating point may not buy > > anything. For smaller FFT's it is much more economical to work with > wider > > fixed point to get the needed dynamic range than it is to do floating > point. > > For larger FFT's, the FFT is generally accomplished using small FFTs > combined > > using the mixed radix algorithm. In these cases, it often makes sense to > do the > > small FFTs in fixed point and then normalize and adjust the exponent > between > > passes. Often a block floating point scheme is sufficient, in which case > the > > common part of the exponent is stripped off before denormalizing the data > before > > each path. That common part of the exponent is then used to scale the > final > > result. > > I agree that for specific cases fixed point, even many more bits than most > often used in software, would be best. > > I was considering the case of a hardware coprocessor, maybe on a PCI card, > that would accelerate the FFT part of an existing program. 
A Fortran or C > callable routine would be written that would replace the normal call, pass > the data to the hardware, and then return the results. One would expect > very similar results to the software implementation, and so would likely > require a similar representation. The OP didn't state the requirement very > well, so it is hard to know which would be better. > > -- glen -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 53781
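The block floating point scheme described above (strip the exponent common to the whole block, run the pass in fixed point, then use the common exponent to scale the final result) can be illustrated numerically. This is a minimal Python sketch; the 15-bit fraction width and the function names are our assumptions, and the FFT pass itself is omitted:

```python
import math

def block_normalize(block, frac_bits=15):
    """Scale a block so its largest magnitude fits the fixed-point range,
    returning integer samples plus the common (block) exponent."""
    peak = max(abs(x) for x in block)
    exp = math.frexp(peak)[1]          # exponent such that peak < 2**exp
    scale = 2 ** (frac_bits - exp)
    fixed = [round(x * scale) for x in block]
    return fixed, exp - frac_bits      # value = fixed * 2**block_exp

def block_denormalize(fixed, block_exp):
    """Fold the common exponent back in after the fixed-point pass."""
    return [x * (2.0 ** block_exp) for x in fixed]

data = [0.001, -0.0025, 0.0007, 0.0019]
fixed, bexp = block_normalize(data)     # fixed-point FFT pass would go here
restored = block_denormalize(fixed, bexp)
# restored agrees with data to the fixed-point rounding precision
```

The point is that a single exponent serves the entire block, so the per-sample butterfly datapath stays pure fixed point; only one small adjustment is needed between passes.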
Hi, I have a simple question regarding the synthesizability of the following expression Data[index+1] ? I was under the impression that the expression is not synthesizable, but somehow I was told that there is a switch I can flip so FPGA compiler can synthesize it into gates. Can someone explain how FPGA compiler does so? Thanks, JimmyArticle: 53782
David wrote: > I'm using Xilinx ISE webpack 5.1i with modelsim XE starter and I'm trying to > use Xpower. I first synthesise my design and do the place and route process. > Then I click on my testbench .vhd file and run the 'post place and route > simulation'. I chose the option to 'automatically generate vcd file'. > Modelsim starts and the simulation begins. However, it takes forever to > simulate a small amount of time. I think this is due because my design > exceeds the Modelsim starter maximum gate count. Anyway, after a day or so, > I have a small vcd file (100k). Consider running modelsim on your testbench instancing your hdl code instead of the netlist. This will run ten times faster and you don't have to run place+route to debug each logical error. Once the logical errors are gone, and your testbench coverage is high enough, you might have more luck with "xpower". -- Mike TreselerArticle: 53783
Sorry, sir: in fact, it is: 1>(1-2^-n) 2>(1-2^-n)^6 3>(1-2^-n)^1/3Article: 53784
attribute PRESERVE_SIGNAL: boolean; attribute PRESERVE_SIGNAL of osc_res: signal is TRUE; attribute OPT : string; attribute OPT OF osc_res: signal is "KEEP"; works with lattice CPLDs .... On Fri, 21 Mar 2003 13:53:47 -0800, "Eduardo Wenzel Brićo" <briao@inf.pucrs.br> wrote: >Hi >I need to synthesis a HDL Design, but some components are removed and >optimized when Leonardo Spectrum are running. How do I proceed (attributes >in VHDL or Leonardo commands) to running Leonardo Spectrum without optimization? >Eduardo Wenzel Brićo >Catholic University of Rio Grande do Sul State - PUCRS >Porto Alegre city >BrazilArticle: 53785
Dear experts, I have a problem with a legacy project done using Orcad 7 and Foundation 3.1(?). I have no previous experience with Orcad. My schematic entry experience was using Mentor Graphics DA back in 1998. I open (and save) the schematics with Orcad 9.2, generate the top level edif file, and read into Xilinx ISE 4.2 (target Spartan). Some of the libraries (ngo) files are missing, so I created them using logiblox. However, when I translate (ngdbuild) the design, it complained of pin mismatch/not found. Apparently, the Orcad parts have pins named differently from those generated by logiblox. I had to change the Orcad parts' pin names and regenerate the top level edif to get by this error. My question is: is there a CAE library provided by Xilinx for Orcad schematic entry? If so, I could replace all the Xilinx parts in the schematic at once, instead of changing part by part. Thanks for your advice in advance. Regards, LC liat-chuan_kang@agilent.comArticle: 53786
Thanks for the response. To sum up my understanding of what you say about using FPGAs as coprocessors on PCI cards for my particular problem: 1) Fixed point math is the way to go considering speed and complexity. This is probably not any practical limitation for me - I will try it on my C-code. 2) The PCI bus is a fundamental bottle-neck in the system. In order to see any speed increase, I will have to process each data point into at least 50 beams. This is also not a problem - a typical large image contains on the order of 1000 or more beams. Given that we choose fixed point and avoid the PCI-bus problem, there is a clear possibility for speed increase. Michael S questioned - do I really want FPGAs for this kind of work, and not DSPs or SIMD CPUs (SMP/NUMA). True. What I want is a machine that is ~10 times faster than a single CPU 3GHz P4. My program takes days in matlab and hours in optimized C (called from Matlab), but I want it to run in minutes. My choices/understanding up to now have been: 1) Buy a SMP machine with enough CPUs (i.e. an Itanium 2 with 8 cpus) This is (at least up to now) a very expensive choice when it comes to price for the machine. However, it is very simple to get almost full speed out of the machine using multithreading (openMP). 2) Buy a stack of 1U 2CPU Xeons (i.e. 4 x 2 3GHz Xeons). This is one order of magnitude cheaper than choice 1 - and not particularly slower, but slightly more complicated to program, since each machine has its own operating system. 3) Buy a PCI card with a stack of DSPs and put it into a 2CPU Xeon. My understanding (please correct me if I am wrong) has been that there is not much to get from a TI C67 compared to a 3GHz P4 - I'll probably look into this one again. 4) Buy a PCI card with an "optimal" mix of DSPs, FPGAs and I/O-modules. Using FPGAs is still a good candidate, so I will pursue this. -RoyArticle: 53787
The difference is Engineering and Marketing "Graeme" <graeme@spamoff.fsnet.co.uk> wrote in message news:3e78dfa1$0$59846$65c69314@mercury.nildram.net... > "geeko" <jibin@ushustech.com> wrote in message > news:b59525$26k0i5$1@ID-159027.news.dfncis.de... > > hi all > > What is the difference between the Typical gates and Maximum System gates > > specifications > > For Altera EPXA1 these ratings are 100K and 263K respectively what may be > > the available gates for programming > > regards > > geeko > > > > The other groups that you've cross-posted to should certainly have the info > you need. As would Altera if you ask them! > > >Article: 53788
roy hansen <royhansen@removethis_norway.online.no> wrote in message news:<s5hfa.43818$Rc7.645391@news2.e.nsc.no>... > Thanks for the response. > . . . . > > Using FPGAs is still a good candidate, so I will pursue this. > > > -Roy There is one other solution possible but it seems highly undeveloped as yet. Clearspeed, PicoChip and no doubt several others are offering massive cpu cnt chips. In PicoChip's case 460x160MHz which if ever made available on a general purpose board with massive IO support might suggest 24x faster than 3GHz P4 on simple integer freq comparison. However they are all chasing base station solutions for now so this is for the future! BOPs also comes to mind, but I just found out they ceased operations, a common fate for massive par cpu companies. Maybe XScale bears looking at, can't think of any others. So FPGAs look like your best bet. JJArticle: 53789
chensw@dacafe.com (Apollo) wrote in message news:<cd10137b.0303221829.4cb9e0c2@posting.google.com>... > sorry!,sir: > > in fact,it is: > 1>(1-2^-n) > 2>(1-2^-n)^6 > 3>(1-2^-n)^1/3 So 0.001953125 was 1/512 So 1-0.001953125 is 0.1111,1111,1000 Well 2^-n is just a binary decoder or in Verilog 1.00000 >> n, And 1-(2^-n) is a mask generator for n 1's below binary pt So n=0,1,2,3,4... => 0.0, 0.10, 0.110, 0.1110, 0.11110,... The ^ operator in Verilog is similar to C, an xor operator. Verilog is not good for Fortran style math with exponents and general operations on fractions and requires basic understanding of binary fraction math. ^ on powers of 2 is easy as that can be << or >> for unsigned values ^ on other numbers esp fractions requires real study or some design effort .999 ^ 6 only makes for a slightly smaller number so the 2> relation can only be more true .999 ^ 1/2 can only make it closer to 1 so the value in 3> is still less than 1 raising any fractional value to any positive power will still leave a value <1 Sounds like a nonsense assignment, trick question. You should ask real Verilog questions in comp.lang.verilog I wish student email address domains would reflect their true background or just say upfront I have a question for my homework that I am too lazy to figure out myself. JJArticle: 53790
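The arithmetic in the reply above is easy to check numerically: 0.001953125 is exactly 2^-9 = 1/512, and 1-2^-n is a binary fraction of n ones below the point. A small Python check (`ones_mask` is our illustrative helper, not a Verilog construct):

```python
# 0.001953125 is exactly 2**-9 (i.e. 1/512), so 1 - 0.001953125 is the
# nine-ones binary fraction 0.111111111.
assert 0.001953125 == 2 ** -9 == 1 / 512

def ones_mask(n, width=12):
    """(1 - 2**-n) as a fixed-point integer with `width` fractional bits:
    n ones below the binary point, padded with zeros."""
    return ((1 << n) - 1) << (width - n)

print(format(ones_mask(9), "012b"))  # 111111111000, i.e. 0.111111111000

# ones_mask(n)/2**12 reproduces 1 - 2**-n exactly for every n up to 12.
for n in range(1, 13):
    assert 1 - 2 ** -n == ones_mask(n) / 2 ** 12
```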
This is what I did first. My design is working and ready to be programmed. I'd like to know how much power it will require so I have to run a place+route simulation in order to import the .vcd into 'Xpower' (or else I won't have all my nodes in the vcd). I guess the error message I get is due to a too small testbench coverage. I'd like to be sure of this before acquiring a full version of modelsim. Thanks David Mike Treseler <tres@fluke.com> wrote in message > Consider running modelsim on your testbench instancing your hdl code > instead of the netlist. This will run ten times faster and you > don't have to run place+route to debug each logical error. > Once the logical errors are gone, and your testbench coverage > is high enough, you might have more luck with "xpower". > > -- Mike TreselerArticle: 53791
I am currently using ModelSim 5.6e PE on a PC licensed from a dongle. Unfortunately, this will only allow one instance of ModelSim to simulate at a time. This is currently proving a problem as I am running some big sims. I cannot do any other sims on small blocks for the few hours that my big sims are running. A year or so ago I tried using ModelSim XE along with PE on the same PC, but never managed to get them both to work at the same time. Has anyone tried this recently with any success? Many Thanks, Ken Morrow, Morrow Electronics Limited.Article: 53792
How many units do you plan to produce? What is the target time to market? Sometimes it sounds like you are just having fun... BTW, C67 is the wrong DSP to compare with FPGA. Compare apples with apples (fixed-point FPGA with fixed-point DSP). C64 is the DSP to compare against. x8 600MHz C64 boards are not uncommon.Article: 53793
Hi, Does anyone have a pointer to a website/company which offers CDMA 2000 system IP cores ? I'm specifically looking for a rake receiver IP core. I have been looking around on the web and have not been very successful as yet. But I do remember seeing a few of these available some time back. Just can't find those sites anymore. Thanks, PrashantArticle: 53794
Hello! I'm planning to buy ChipScope Pro Tools to expedite my debugging process. I just would like to ask what hardware tools (cables, etc) I need to buy to be able to use ChipScope effectively. Please recommend the hardware tools you have used with your ChipScope and if you know the current price (and vendor), please let me know. Thank you. -ronArticle: 53795
Has anyone experienced this problem? What would be the best way to fix this apart from the undesirable approach of converting my std_logic_2d port to a group of std_logic_vectors?Article: 53796
Hi Ron - You should go for Chipscope Pro + Parallel Cable IV. If you already have the ChipScope Pro, you should buy PC-IV for $95 (from Xilinx, available online or through your local distributor). Chipscope Pro makes use of the available BlockRAM in the FPGA for storing the samples. However, the BRAM size in any FPGA would be limited. If you want to achieve deep sample storage for some very complex design debugging, while keeping your BRAMs free for your application, Agilent Trace Port analyzer allows you to do that. AFAIR Agilent Trace port analyzer costs ~$7K, and works with ChipScope Pro. A link to this Agilent tool is available from http://www.xilinx.com/ise/verification/chipscope_pro.htm --Neeraj "ron" <rathanon99@yahoo.com> wrote in message news:c661162.0303231920.aaff439@posting.google.com... > Hello! > > I'm planning to but Chipscope Pro Tools to expedite my debugging > process. I just would like to ask what hardware tools (cables, etc) do > I need to buy to be able to use Chipscope effectively. Please > recommend the hardware tools you have used with your chipscope and if > you know the current price (and vendor), please let me know. Thank > you. > > -ronArticle: 53797
"Jimmy Zhang" <crackeur@attbi.com> wrote in message news:<hW4fa.204098$qi4.98083@rwcrnsc54>... > Hi, > > I have a simple question regarding to the synthesizability of the > following expression > Data[index+1] ? > > I was under the impression that the expression is not synthesizable, but > somehow I was told that there is > a switch I can flip so FPGA compiler can synthesize it into gates. > > Can someone explain how FPGA compiler does so? > Thanks, > Jimmy The logic is as follows: assuming that index is some sort of register, the synthesizer will create an incrementer from it; the incrementer output will be used to drive the Data multiplexer's Select inputs. If index is a counter, the incrementer may also be fed back to the register. If index is wide or the data multiplexer is big, the combinatorial path may be pretty slow; if possible, add a register between (index + 1) and the Data multiplexer.Article: 53798
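The structure described in the reply above — an incrementer on `index` driving the select inputs of a multiplexer over Data — can be modeled behaviorally. A Python sketch (the 3-bit index width and the data values are illustrative; actual results depend on the synthesis tool):

```python
# Behavioral model of what the synthesizer builds for Data[index+1]:
# an incrementer on `index` feeds the multiplexer's Select inputs.
# The wrap at the index width mirrors a fixed-width hardware incrementer.

INDEX_BITS = 3
DATA = [10, 11, 12, 13, 14, 15, 16, 17]   # Data[0..7]

def mux_read(index):
    select = (index + 1) % (1 << INDEX_BITS)   # incrementer, wraps at width
    return DATA[select]                        # multiplexer

print(mux_read(0))  # 11  (Data[1])
print(mux_read(7))  # 10  (wraps around to Data[0])
```

The model also makes the timing concern visible: the read passes through the incrementer *and* the multiplexer in one combinatorial path, which is why the reply suggests registering (index + 1) when index is wide or Data is large.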
Hi, What is the difference between static and active partial reconfiguration? Xilinx application notes say the remaining part of the design does still work in active reconfiguration mode. But what happens to the remaining part in static reconfiguration mode? Is the global clock stopped? Are the IOBs disabled during reconfiguration? Anything else? Thanks, RainerArticle: 53799
roy hansen wrote: > > Thanks for the response. > > To sum up my understanding of what you say about using > FPGAs as coprocessors on PCI cards for my particular problem: > > 1) Fixed point math is the way to go considering speed and complexity. > This is probably not any practical limitation for me - I will try > it on my C-code. > > 2) The PCI bus is a fundamental bottle-neck in the system. In order to > see any speed increase, I will have to process each data point into > at least 50 beams. This is also not a problem - a typical large image > contains on the order of 1000 or more beams. > > Given that we choose fixed point and avoid the PCI-bus problem, there > is a clear possibility for speed increase. > > Michael S questioned - do I really want FPGAs for this kind of work, > and not DSPs or SIMD CPUs (SMP/NUMA). True. What I want is a machine > that is ~10 times faster than a single CPU 3GHz P4. My program takes > days in matlab and hours in optimized C (called from Matlab), but I want > it to run in minutes. My choices/understanding up to now have been: > > 1) Buy a SMP machine with enough CPUs (i.e. an Itanium 2 with 8 cpus) > This is (at least up to now) a very expensive choice when it comes to > price for the machine. However, it is very simple to get almost full > speed out of the machine using multithreading (openMP). > 2) Buy a stack of 1U 2CPU Xeons (i.e. 4 x 2 3GHz Xeons). This is > one order of magnitude cheaper than choice 1 - and not particularly > slower, but slightly more complicated to program, since each machine > has its own operating system. There are solutions to this problem. http://www.lanl.gov/projects/pink/ "BProc itself is a set of kernel modifications to Linux that provide a single process space across the entire cluster. Jobs running on nodes of the cluster are visible (via ps and the like) on the master and they are also controlable (via standard UNIX signals) from the master. 
" > 3) Buy a PCI card with a stack of DSPs and put it into a 2CPU Xeon. > My understanding (please correct me if I am wrong) has been that there > is not much to get from a TI C67 compared to a 3GHz P4 - I'll probably > look into this one again. > 4) Buy a PCI card with an "optimal" mix of DSPs, FPGAs and I/O-modules. > > Using FPGAs is still a good candidate, so I will pursue this. > > Why not FPGAs with a C interface? (based on information in Swedish) * Own C dialect - need to modify your source (parallelize?) * Compiles to FPGA (Xilinx). * User never sees FPGA. http://www.flowcomputing.com/technology.shtml /RogerL -- Roger Larsson Skellefteå Sweden