I have done successful designs which laid memory out in the application to avoid having hardware refresh. One such machine just laid the instruction and data fetches already executed by the clock/timer routine out in a straight line. To make this work, you must put low order address lines on the row portion of the address multiplexor. With memories that have a burst mode this may not always be the highest performance option. CPUs with caches can also cause problems. But in general for small CPUs, this isn't a problem. You may find that the client is willing to take a slightly slower memory, with slightly higher software service latencies, or you may find they won't accept this tradeoff at all. Just depends how cost sensitive the design is. If the cost of a slightly larger and faster fpga isn't a budget stopper, it's probably best not to do this, as it can cause other problems if not careful .... like memory randomly disappearing because some software bug occurred.
Article: 91676
"Subhasri Krishnan" <subhasri.krishnan@gmail.com> wrote > if I read faster than I need to refresh, then I can avoid > refresh altogether. i.e if the refresh period is 64ms and if i access > the data every, say, 20ms then I don't have to refresh. Please tell me > if this is true or if I am getting confused. Asserting RAS causes a row of capacitors to have their charge topped up. If they are above the voltage sense threshold then the have at least some charge, and they are given a full charge. Asserting CAS causes one capacitor to be connected to a column line, and this either drives charge in/out for writing, or senses it for a read. Capacitors are not completely discharged when read, of course. From the points above you can deduce what is going to happen and what needs to be done. So long as every row gets strobed at least once every 64 ms, every capacitor is refreshed. It does not matter if this is done by a refresh cycle (RAS only), or a read/write cycles (RAS then CAS). The original IBM PC had an interrupt routine to do a series of DRAM accesses to refresh the DRAM. It had no DRAM controller at all. If you can arrange your system software so that every row is accessed every refresh period, that should do the trick. If you are doing a non-PC embedded system, the CPU may be running code from ROM most of the time. You could try refreshing the rows with those cycles: i.e. the DRAM gets a RAS during DRAM _and_ ROM cycles, but CAS only for RAM access. The row address will be whatever the CPU address bus is driven to, so obviously you have to make the ROM cycles cover every row. For this reason, it is easier to do if you use the least-significant address bits for the row address. Note that the number of accesses needed is the square root of the DRAM chip size. I don't think refreshing 64K DRAM chips is too bad (256 accesses), but you might not like doing a 16Mbit DRAM chip (2048 accesses).Article: 91677
You should be aware that every DRAM is different in the number of rows that must be accessed and the max period between accesses/refresh.
Article: 91678
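A minimal VHDL sketch of the distributed refresh pacing discussed in the last two posts. The row count (4096), 12-bit row address, and divide value (roughly 64 ms / 4096 rows at an assumed 100 MHz clock) are illustrative, not taken from any particular part's datasheet; a real controller would also arbitrate these requests against normal accesses, or suppress them for rows already hit by ordinary reads/writes:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity refresh_pacer is
  generic (
    -- 64 ms / 4096 rows at 100 MHz is roughly 1562 clocks per row
    REFRESH_DIVIDE : integer := 1562
  );
  port (
    clk         : in  std_logic;
    rst         : in  std_logic;
    refresh_req : out std_logic;               -- pulse: issue a RAS-only cycle
    refresh_row : out unsigned(11 downto 0)    -- 12 bits covers 4096 rows
  );
end refresh_pacer;

architecture rtl of refresh_pacer is
  signal divide_cnt : integer range 0 to REFRESH_DIVIDE - 1 := 0;
  signal row_cnt    : unsigned(11 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      refresh_req <= '0';
      if rst = '1' then
        divide_cnt <= 0;
        row_cnt    <= (others => '0');
      elsif divide_cnt = REFRESH_DIVIDE - 1 then
        divide_cnt  <= 0;
        refresh_req <= '1';            -- time to strobe the next row
        row_cnt     <= row_cnt + 1;    -- wraps naturally after row 4095
      else
        divide_cnt <= divide_cnt + 1;
      end if;
    end if;
  end process;

  refresh_row <= row_cnt;
end rtl;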
So if it's a 64Mb chip (4096 accesses and 64ms between accesses) and I can do some kind of serial reading, then it's better to skip the refresh? I am looking to push the SDRAM to the limit and to get the highest bandwidth. Is there anything other than bank interleaving and getting rid of refresh that can be done to maximize performance? This is my first controller and any suggestion is greatly appreciated.
Article: 91679
motty wrote:
> Mike--
>
> Seems you are telling me to sample the data on the rising edge. This
> is the same clock that the external part is seeing. The external part
> changes data on the rising edge. I can't be sure that data is valid
> then.

The external part can't change its outputs immediately with the rising edge of the clock -- there's always some clock-to-out time. RTFDS. While you're at it, add in some prop delay between the external device and the FPGA. Capturing the "previous" data on the rising edge of the clock is basically how all synchronous systems work.

-a
Article: 91680
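A minimal sketch of what that capture looks like in VHDL; the entity and signal names are illustrative. The external part's clock-to-out plus board propagation delay, rather than zero, is the launch delay that must fit within the clock period minus the capture register's setup time:

-- Capture data launched by the external device on the shared clock.
-- The value registered here is the "previous" data word, i.e. the one
-- the external part drove after the prior rising edge.
library ieee;
use ieee.std_logic_1164.all;

entity ext_capture is
  port (
    clk      : in  std_logic;                     -- shared system clock
    ext_data : in  std_logic_vector(7 downto 0);  -- from the external part
    data_q   : out std_logic_vector(7 downto 0)
  );
end ext_capture;

architecture rtl of ext_capture is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      data_q <= ext_data;  -- clock-to-out + trace delay must be < Tclk - Tsetup
    end if;
  end process;
end rtl;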
You might look at other mfgrs' devices, as timings and setup for multibank accesses can make a huge difference if concurrent reads/writes are to the same device.
Article: 91681
air_bits@yahoo.com wrote:
> A number of the various papers fail to search out the best space time
> tradeoffs. Mistakes like doing 64bit floating point multipliers the hard
> way in an fpga, or doing an FFT/IFFT as wide parallel which isn't
> always the best space time tradeoff.
>
> There are MANY other architectures that can be developed to optimize
> the performance of a particular application to FPGA, beside brute force
> implementation of wide RISC/CISC processor core elements here.
> Frequently bit serial will yield a higher clocking rate (as it doesn't need
> a long carry chain), and doesn't need extra logic for partial sums or
> carry lookahead, so it also delivers more functional units per part, but
> at the cost of latency which can frequently be hidden with the faster
> clock rate and high function density per part. It can also remove
> memory as a staging area for wide parallel functional units, and thus
> remove a serialization imposed by the solution's architecture.
>
> Bit serial operations using Xilinx LUT fifo's can be expensive in both
> power and clock rate reductions, but that is not the only way to use
> LUTs for bit serial memory. Consider using some greycode counters
> and using the LUT's simply as 16x1 rams instead ... faster and less
> dynamic power.
>
> There are lots of ways to get unexpected performance from FPGAs,
> but not by doing it the worst possible way.
>
> Be creative. $30M US of FPGAs and memories can easily build a
> 1-10 Petaflop super computer that would smoke existing RISC/CISC
> designs ... we just don't have good software tools and compilers to
> run applications on these machines, or have developed enough
> programming talent used to getting good/excellent performance
> from these devices.
>
> There are a few dozen better ideas about how to make FPGAs
> as we know them today, into the processor chip of tomorrow,
> but that is another discussion.
>
> Consider distributed arithmetic made FPGA's popular for high
> performance integer applications, and it's not even a basic type
> available from any of the common compilers or HDL's. Consider
> the space time performance of three variable floating point multiply
> accumulate (MAC) algorithms using this approach for large matrix
> operations.
>
> Consider this approach for doing high end energy/force/weather
> simulations using a traditional Red/Black interleave as you would
> use for these applications under MPI. 3, 6, 9, 12 variable MAC's
> are a piece of cake with distributed arithmetic, and highly space
> time efficient. The core algorithms of many of these simulations
> are little more than MAC's, frequently with constants, or near
> constants that seldom need to be changed.
>
> Consider for many applications the dynamic range needed during
> most of the simulation is very limited, allowing systems to be
> built with FP on both ends of the run, and scaled integers in the
> middle of the run, even simplifying the hardware and improving the
> space time fit even more.
>
> The big advantage to FPGAs is breaking the serialization that
> memory creates in RISC/CISC architectures. Memoryless
> computing using pipelined distributed arithmetic is the ultimate
> speedup for many applications, including a lot of computer
> vision and pattern recognition applications.
>
> So read the papers carefully, and consider if there might not be
> a better architecture to solve the problem. If so, take the numbers
> and conclusions presented with a grain of salt.
It can't quite be 'memoryless', but I understand your point; I'm waiting for stacked die FPGAs that have fast/wide memory interfaces to Mbytes of fast xRAM... There is quite a speed/Icc cost to driving all the pin buffers/pcb traces in more normal memories.

Meanwhile, I see more 'opening' of the Cell processor, which could revise some of these FPGA/CPU benchmarks. The Cell might even make a half-decent FPGA simulation engine, for development?

-jg
Article: 91682
g.wall wrote:
> has anyone in the dig. design and reconfig. computing community looked
> seriously at open source hardware design libraries, working toward a
> hardware paradigm similar to that in the open source software community?

Problem 1.

There are ten times as many software designers as digital hardware designers. The average software guy is much better at setting up repositories, web sites and running regression tests than the average hardware guy. The average hardware guy knows enough HDL to get by and maybe enough C language to turn on a circuit board. Standard software development processes like source control and code reuse are much less evolved in the hardware area.

Problem 2.

The average software designer couldn't describe two gates and a flip flop in vhdl or verilog.

-- Mike Treseler
Article: 91683
Hi, Davy -

You may want to browse a number of papers on my web page for coding guidelines and coding styles related to multi-clock design and asynchronous FIFO design. At the web page:

www.sunburst-design.com/papers

Look for the San Jose SNUG 2001 paper: Synthesis and Scripting Techniques for Designing Multi-Asynchronous Clock Designs

Look for the San Jose SNUG 2002 paper: Simulation and Synthesis Techniques for Asynchronous FIFO Design

Look for the second San Jose SNUG 2002 paper (co-authored with Peter Alfke of Xilinx): Simulation and Synthesis Techniques for Asynchronous FIFO Design with Asynchronous Pointer Comparisons

Peter likes the second FIFO style better but the asynchronous nature of the design does not lend itself well to timing analysis and DFT. I prefer the more synchronous style of the first FIFO paper. I hope to have another FIFO paper on my web page soon that uses Peter's clever quadrant-based full-empty detection with a more synchronous coding style.

We spend hours covering multi-clock and Async FIFO design in my Advanced Verilog Class. These are non-trivial topics that are poorly covered in undergraduate training. I have had engineers email me to tell me that their manager told them to run all clock-crossing signals through a pair of flip-flops and everything should work! WRONG!

Regards - Cliff Cummings
Verilog & SystemVerilog Guru
www.sunburst-design.com
Article: 91684
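For reference, the "pair of flip-flops" mentioned above is the single-bit synchronizer sketched below (VHDL, names illustrative). It is reasonable for an individual level signal crossing clock domains, but it does not make a multi-bit bus or a binary FIFO pointer safe, which is exactly why the Gray-code pointer techniques in those papers exist:

-- Two-stage synchronizer for a single asynchronous level signal.
-- Each bit of a multi-bit value would be synchronized independently,
-- so related bits can arrive in different destination clock cycles;
-- that is why this is NOT sufficient for buses or binary FIFO pointers.
library ieee;
use ieee.std_logic_1164.all;

entity sync_2ff is
  port (
    clk_dst : in  std_logic;   -- destination clock domain
    d_async : in  std_logic;   -- signal from the other clock domain
    q_sync  : out std_logic
  );
end sync_2ff;

architecture rtl of sync_2ff is
  signal meta : std_logic := '0';  -- first stage, may go metastable
  signal sync : std_logic := '0';  -- second stage, assumed settled
begin
  process(clk_dst)
  begin
    if rising_edge(clk_dst) then
      meta <= d_async;
      sync <= meta;
    end if;
  end process;

  q_sync <= sync;
end rtl;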
>Meanwhile, I see more 'opening' of the Cell processor

The Cell processor architecture does have some interesting uses, and strong memory bandwidth, which delivers better than impressive performance for its target markets. Architecturally its strengths are also some of its worst weaknesses for building high end machines that would scale well for applications which assume distributed memory. The Cell processor is a next generation CPU to continue Moore's Law.

The FPGAs which follow to target the same high performance computing market will also come with application specific cores and multiple memory interfaces to kick butt in the same markets. These FPGAs with the same die size and production volumes will have the same cost. The large FPGAs today which have similar die sizes are produced in lower volumes at a higher cost, which currently skews the cost effectiveness equation toward traditional CPUs.

Missing are good compiler tools and libraries to even the playing field. Cell will suffer some from that too.
Article: 91685
>Problem 2.
> The average software designer couldn't describe
> two gates and flip flop in vhdl or verilog.

does that even matter for "reconfig. computing"?
Article: 91686
Should have noted that the FpgaC project is still looking for additional developers, and the long term results of this project are still very open to change. It would be great to be able to build a comprehensive set of libraries that allow typical MPI and posix-threaded applications to build and dynamically load/run on multiple FPGA platforms. And to mature the compiler to handle a full traditional C syntax transparently.

I personally would like to see it handle distributed arithmetic transparently, so that it handles the data pipelining of high performance applications well using data flow like strategies. But that is open to the team as a whole, with inputs from the user community.
Article: 91687
air_bits@yahoo.com wrote:
>>Problem 2.
>>The average software designer couldn't describe
>>two gates and flip flop in vhdl or verilog.
>
> does that even matter for "reconfig. computing"?

The OP asked about open source hardware design libraries, not reconfig. computing.

-- Mike Treseler
Article: 91688
Mike Treseler <mike_treseler@comcast.net> writes:
> Problem 2.
>
> The average software designer couldn't describe
> two gates and flip flop in vhdl or verilog.

Problem 3.

The average software designer couldn't describe two gates and a flip-flop in C (or any other programming language), but would instead describe something that synthesizes to a large collection of gates and flip-flops.
Article: 91689
Subhasri krishnan wrote:
>Hey all,
>I am designing (trying to design) an sdram controller (for a PC133
>module) to work as fast as it is possible and as I understand from the
>datasheet, if I read faster than I need to refresh, then I can avoid
>refresh altogether. i.e if the refresh period is 64ms and if i access
>the data every, say, 20ms then I don't have to refresh. Please tell me
>if this is true or if I am getting confused.
>Thanks in Advance.

This is true provided you access every single row, well at least every row you have data in, within the refresh time. This can be used to advantage in video frame buffers, for example, as long as the frame time does not exceed the refresh time. So yes, it can be useful. It doesn't save a lot of memory bandwidth or time, but it can substantially simplify the DRAM controller in your design.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759
Article: 91690
Hi Anthony,

Data bus signals' tri-state FF not being included in an IOB is for timing reasons. Even when Address Stepping is used, those FFs still do rely on several unregistered signals, and when larger devices are used, the unregistered signals (i.e., FRAME#, IRDY#, etc.) will have to travel a longer distance, thus making it harder to meet the PCI's stringent setup time requirement (3ns for 66MHz PCI and 7ns for 33MHz PCI). Instead, by not including a tri-state FF in IOBs, it allows the tri-state FFs to be placed near the unregistered signals, making it easier to meet setup time. Once those unregistered signals go through LUTs and get captured by a FF, they become registered, and once registered, the registered signal has much more timing margin (15ns for 66MHz PCI and 30ns for 33MHz PCI).

Kevin Brace

Anthony Ellis wrote:
> Hi Kevin,
>
> I can't figure out your explanation. Even if you wanted to step (in clock cycles) the IO enable could still be in the IOB! Using an internal FF, with defined placement and routing, gives control of skew within the same cycle - if you wanted it!
>
> Anthony.

--
Brace Design Solutions
Xilinx (TM) LogiCORE (TM) PCI compatible BDS XPCI PCI IP core available for as little as $100 for non-commercial, non-profit, personal use.
http://www.bracedesignsolutions.com

Xilinx and LogiCORE are registered trademarks of Xilinx, Inc.
Article: 91691
>Problem 3.
>
>The average software designer couldn't describe two gates
>and a flip-flop in C (or any other programming language), but
>would instead describe something that synthesizes to a large
>collection of gates and flip-flops.

in TMCC/FpgaC (and Celoxica, and a number of other C HDL like tools) what you just asked for is pretty easy, and comments otherwise are pretty egocentric bigotry that just isn't justified.

int 1 a,b,c,d;     // four 1-bit values, possibly mapped to input pins
int 1 singlebit;   // describes a single register, possibly an output pin

singlebit = (a&b) | (c&d);   // combinatorial sum of products for ab+cd

I can train most kids older than about 6-10 to understand this process and the steps to produce it. It doesn't take an EE degree to understand or implement. So beating your chest here is pretty childish, at best.
Article: 91692
There is a small setup overhead for the main, but for example this certainly does NOT synthesize "to a large collection of gates and flip-flops" as you so errantly assert cluelessly:

main()
{
    int a:1, b:1, c:1, d:1;
    #pragma inputport (a);
    #pragma inputport (b);
    #pragma inputport (c);
    #pragma inputport (d);

    int sum_of_products:1;
    #pragma outputport (sum_of_products);

    while(1) {
        sum_of_products = (a&b) | (c&d);
    }
}

Produces the following default output (fpgac -S example.c) as example.xnf:

LCANET, 4
PWR, 1, VCC
PWR, 0, GND
PROG, fpgac, 4.1, "Thu Nov 10 19:42:27 2005"
PART, xcv2000ebg560-8
SYM, CLK-AA, BUFGS
PIN, I, I, CLKin
PIN, O, O, CLK
END
SYM, FFin-0_1_0Running, INV
PIN, I, I, 0_1_0Zero
PIN, O, O, FFin-0_1_0Running
END
SYM, 0_1_0Running, DFF
PIN, D, I, FFin-0_1_0Running
PIN, C, I, CLK
PIN, CE, I, VCC
PIN, Q, O, 0_1_0Running
END
SYM, FFin-0_1_0Zero, BUF
PIN, I, I, 0_1_0Zero
PIN, O, O, FFin-0_1_0Zero
END
SYM, 0_1_0Zero, DFF
PIN, D, I, FFin-0_1_0Zero
PIN, C, I, CLK
PIN, CE, I, VCC
PIN, Q, O, 0_1_0Zero
END
SYM, 0_4__a, IBUF
PIN, I, I, a
PIN, O, O, 0_4__a
END
EXT, a, I
SYM, 0_4__b, IBUF
PIN, I, I, b
PIN, O, O, 0_4__b
END
EXT, b, I
SYM, 0_4__c, IBUF
PIN, I, I, c
PIN, O, O, 0_4__c
END
EXT, c, I
SYM, 0_4__d, IBUF
PIN, I, I, d
PIN, O, O, 0_4__d
END
EXT, d, I
SYM, 0_10__sum_of_products-OBUF, OBUF
PIN, I, I, 0_10__sum_of_products
PIN, O, O, sum_of_products
END
EXT, sum_of_products, O
SYM, FFin-0_10__sum_of_products, BUF
PIN, I, I, T0_15L49_0_10__sum_of_products
PIN, O, O, FFin-0_10__sum_of_products
END
SYM, 0_10__sum_of_products, DFF
PIN, D, I, FFin-0_10__sum_of_products
PIN, C, I, CLK
PIN, CE, I, 0_13_L21looptop
PIN, Q, O, 0_10__sum_of_products
END
SYM, FFin-0_13_L21looptop, EQN, EQN=((~I1)+(I0))
PIN, I1, I, 0_1_0Running
PIN, I0, I, 0_13_L21looptop
PIN, O, O, FFin-0_13_L21looptop
END
SYM, 0_13_L21looptop, DFF
PIN, D, I, FFin-0_13_L21looptop
PIN, C, I, CLK
PIN, CE, I, VCC
PIN, Q, O, 0_13_L21looptop
END
SYM, SYMT0_15L49_0_10__sum_of_products, EQN, EQN=((I0*I1)+(I2*I3))
PIN, I3, I, 0_4__a
PIN, I2, I, 0_4__b
PIN, I1, I, 0_4__c
PIN, I0, I, 0_4__d
PIN, O, O, T0_15L49_0_10__sum_of_products
END
EOF
Article: 91693
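For comparison, a roughly equivalent registered sum of products (the "two gates and a flip-flop" from the earlier posts) written in VHDL; this sketch is illustrative and not part of the original FpgaC example:

-- Registered sum of products: (a and b) or (c and d), captured on clk.
-- Two AND gates, one OR gate, one flip-flop.
library ieee;
use ieee.std_logic_1164.all;

entity sop_reg is
  port (
    clk             : in  std_logic;
    a, b, c, d      : in  std_logic;
    sum_of_products : out std_logic
  );
end sop_reg;

architecture rtl of sop_reg is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      sum_of_products <= (a and b) or (c and d);
    end if;
  end process;
end rtl;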
Go back and read the first line of the first post, and you will clearly see the author included reconfigurable computing in the discussion.
Article: 91694
Eric Smith wrote:
> Mike Treseler <mike_treseler@comcast.net> writes:
>
>>Problem 2.
>>
>>The average software designer couldn't describe
>>two gates and flip flop in vhdl or verilog.
>
> Problem 3.
>
> The average software designer couldn't describe two gates
> and a flip-flop in C (or any other programming language), but
> would instead describe something that synthesizes to a large
> collection of gates and flip-flops.

3b, Without realising it.

-jg
Article: 91695
john wrote:
>
> It is for a bidirectional signal: input is registered into IOB, output is also registered there,
> but the duplicated tristate_enable registers don't want to go inside the OLOGIC (Virtex 4).
> Each of them is not that far, but not into the IOB!
>

Last time I tried this with XST 6.3 / Spartan-3, I had to try a few coding variants before all the data registers and tristate controls were properly stuffed into the IOBs from non-structural HDL code.

Below are some simplified (hand edited, uncompiled!!) code snippets from a S3 eval kit RAM test that I posted last fall; for the whole thing see:

ftp://members.aol.com/fpgastuff/ram_test.zip

Code Snippets:

<ports>

ram_addr : out   std_logic_vector(17 downto 0);
ram_dat  : inout std_logic_vector(15 downto 0);

<signals>

--
-- internal ram signals
--
signal addr        : std_logic_vector(17 downto 0);
signal din         : std_logic_vector(15 downto 0);
signal ram_dat_reg : std_logic_vector(15 downto 0);
signal wdat_oe_l   : std_logic;

--
-- IOB attribute needed to replicate tristate enable FFs in each IOB
--
attribute iob of wdat_oe_l : signal is "true";

<code>

--
-- output data bus tristate
--
-- XST seems to want tristates coded like this to push both
-- the tristate control register and the data register into IOB
-- ( had previously been coded as clocked tristate assignment )
--
ram_dat <= ram_dat_reg when wdat_oe_l = '0' else ( others => 'Z' );

--
-- registered RAM I/O
--
process(clk)
begin
  if rising_edge(clk) then

    --
    -- IOB registers
    --
    ram_dat_reg <= tdat(15 downto 0);
    ram_addr    <= taddr;

    --
    -- registered tristate control signal
    -- coded this way, with IOB attribute on wdat_oe_l, so
    -- XST will replicate tristate control and push into IOBs
    --
    if (done_p1 = '0') and ( read_write_p1 = '0') then
      wdat_oe_l <= '0';
    else
      wdat_oe_l <= '1';
    end if;

    --
    -- register input data
    --
    din <= ram_dat;

  end if;
end process;
Article: 91696
> 3b, Without realising it.

The interesting point in this process is that the tools are evolving to hide design issues that are seldom a worry for typical cases like reconfigurable computing on FPGA compute engines.

Every programmer decides how big each and every variable should be. For a machine that has a few gigabytes of memory and a 64bit native word size, using a 64bit variable may be either free, or faster, as it may not take an extra step to sign extend the memory on register load. When programmers move to smaller processors, they quickly learn that when programming a PIC micro, 64bit word sizes just don't work well.

When programmers encounter FPGA compute engines the same processes quickly come into play, and a short mentoring of the newbies to size variables by the bit, or be careful and use char, int, long, and long long properly, isn't that difficult, or even unexpected. If the fpga is a single toy sized fpga, it's no different than programming a PIC micro, as resources are tight, and the programmer will adapt. If the fpga system is 4,096 tightly interconnected XC4VLX200's and the application isn't particularly large, I suspect the programmer writing applications for this fpga based super computer will not have to worry about fit. If they are fine tuning the bread and butter simulations at places like Sandia Labs, I suspect the programmers will have more than enough experience and skill to size variables properly and be very much in tune with space time tradeoffs for applications far more complex than even a typical programmer would consider.

It's reconfigurable computing projects where libraries of designs become very useful, particularly for SoC designs that used to be an EE design task and are rapidly becoming mainstreamed, so that software engineers are the most likely target as the market continues to mature and expand. There will be some dinos that stand in the tar pits admiring the bits as the sun sets on that segment of their employment history.
Article: 91697
vssumesh wrote:
> just one question (not directly related to the topic)... if i write A =
> C + D in the verilog and choose optimize for speed will the tool
> generate the CLA adder ???

Try it and see. Synthesis only guarantees to match a netlist to your code.

-- Mike Treseler
Article: 91698
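A VHDL version of the experiment suggested above, as an illustrative sketch; whether the tool builds a ripple, carry-chain, or carry-lookahead structure depends on the synthesizer, the target architecture, and the optimization goal, not on anything visible in the source:

-- A registered adder; the netlist structure (ripple, carry chain, CLA)
-- is chosen by the synthesis tool for the given speed/area goals.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add_reg is
  port (
    clk  : in  std_logic;
    c, d : in  unsigned(15 downto 0);
    a    : out unsigned(16 downto 0)
  );
end add_reg;

architecture rtl of add_reg is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      a <= resize(c, 17) + resize(d, 17);  -- keep the carry out
    end if;
  end process;
end rtl;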
air_bits@yahoo.com writes:
> There is a small setup overhead for the main, but for example
> this certainly does NOT synthesize "to a large collection of
> gates and flip-flops" as you so errantly assert cluelessly:
>
> main()
> {
>     int a:1, b:1, c:1, d:1;
>     #pragma inputport (a);
>     #pragma inputport (b);
>     #pragma inputport (c);
>     #pragma inputport (d);
>
>     int sum_of_products:1;
>     #pragma outputport (sum_of_products);
>
>     while(1) {
>         sum_of_products = (a&b) | (c&d);
>     }
> }

Why should a C programmer expect that to synthesize any flip-flops at all? It looks purely combinatorial. How would you write it if you did NOT want a flip-flop, but only a combinatorial output?

Anyhow, I wasn't suggesting that the language couldn't represent a few gates and a flip-flop. My point is that C programmers don't think in those terms, so anything they write is likely to result in really inefficient hardware designs.

For example, typical C code for a discrete cosine transform can be found here:

http://www.bath.ac.uk/elec-eng/pages/sipg/resource/c/fastdct.c

But I suspect that code will synthesize to something at least an order of magnitude larger and an order of magnitude slower than a typical HDL implementation. That doesn't mean that you couldn't write a DCT in C that would synthesize to something efficient; it just means that a normal C programmer *wouldn't* do that. You'd have to train the C programmer to be a hardware designer first, and by the time you've done that there's little point to using C as the HDL, since the whole point of using C as an HDL was to take advantage of the near-infinite pool of C programmers.

Eric
Article: 91699
I wrote:
> Problem 3.
> The average software designer couldn't describe two gates
> and a flip-flop in C (or any other programming language), but
> would instead describe something that synthesizes to a large
> collection of gates and flip-flops.

Jim Granville <no.spam@designtools.co.nz> writes:
> 3b, Without realising it.

Exactly so. It's perhaps less commonly seen in C, since C *only* has low-level constructs, but the vast majority of C++ and Java programmers seem to have no conception of what the compiler is likely to emit for the programming constructs they use.

A former coworker once tried to write C++ code to talk to Dallas one-wire devices. He spent days trying to debug it before someone took pity on him and pointed out that by the time the constructor for one of his objects executed, the entire transaction had timed out.

Eric