Messages from 132500

Article: 132500
Subject: Re: Are FPGAs headed toward a coarse granularity?
From: Jim Granville <no.spam@designtools.maps.co.nz>
Date: Thu, 29 May 2008 17:59:56 +1200

rickman wrote:
<snip>
> So, are coarse grained architectures the way of FPGA... opps FPxA
> devices in the near future?  Will the lowly LUT and FF be pushed into
> the dark corners of the die in coming years?  I think it is not a
> matter of if, just a matter of when and I think the when is soon!

The main drive seems to be MHz, as hard IP is always faster than
Soft Logic.

Pretty much all FPGAs now have 'DSP blocks', and those
blocks get ever more complex. Some have GHz links hardwired.

The latest Altera device uses more of this 'Hard IP', but the
soft-logic speeds have not increased much.

There will always be LUT/FF areas, as they handle the
state engines etc., but perhaps the next iteration will be wide-path bus
routing.

-jg

Article: 132501
Subject: Re: HDL - simulation vs synthesis
From: backhus <nix@nirgends.xyz>
Date: Thu, 29 May 2008 08:09:23 +0200

Hi Jared,
Synthesis tools like ISE XST try to optimize the design as well as possible.
If you leave the output of a flip-flop open, it is of no further use to the
design and the flip-flop is deleted. The gates that drove the former
flip-flop's input then have open outputs and are deleted as well. That goes on
and on until the input from your switch is reached.
This behavior may produce only info messages rather than warnings, so read
those as well.
Anyway, if you have an idea of what elements your design should contain
and the synthesis result is suspiciously small, you should check the
reports for deleted flip-flops and combinational circuits.
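
The trimming described above can be pictured as a backward sweep from the
output pins: anything that cannot influence an output is dropped. The Python
model below is only a sketch of that idea, not how XST is implemented; the
netlist, the cell names and the function live_cells are all made up for
illustration.

```python
# Toy netlist: each cell maps to the list of cells or inputs driving it.
# All names here are hypothetical.
netlist = {
    "led_out": ["ff_a"],          # output pin driven by flip-flop ff_a
    "ff_a": ["and1"],
    "and1": ["switch", "ff_a"],
    "ff_b": ["or1"],              # ff_b's Q output goes nowhere
    "or1": ["switch", "button"],
}
outputs = ["led_out"]


def live_cells(netlist, outputs):
    """Return every cell or input that can influence an output pin."""
    live, todo = set(), list(outputs)
    while todo:
        cell = todo.pop()
        if cell in live:
            continue
        live.add(cell)
        # Primary inputs have no entry in the netlist, hence the default.
        todo.extend(netlist.get(cell, []))
    return live


kept = live_cells(netlist, outputs)
removed = sorted(set(netlist) - kept)
print("removed:", removed)
```

In this toy case ff_b and the gate feeding it disappear, and the "button"
input ends up unused, which is the kind of result worth looking for in the
synthesis report.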

A similar thing happens when inputs have a fixed connection to high or
low. The synthesis tool computes the optimized logic, and the result
may be constant outputs, so the design is reduced to some fixed
connections to high and low at the outputs.

Sometimes whole designs vanish this way. :-)

Have a nice synthesis
   Eilert


jared.pierce@gmail.com wrote:
> Thanks for the help guys.  I made several changes and the problem
> finally went away.  It works!  I still don't know why it wasn't
> working so I plan to back up what I have and insert one test problem
> at a time to see what happens(of the problems I fixed).  I still don't
> get why the RTL schematic would bother to include a DFF where the Q
> wasn't connected to anything.  It would have been just as good not to
> include it.  Any hints on that one?

Article: 132502
Subject: Re: Sequentially syncrhronous
From: "MikeWhy" <boat042-nospam@yahoo.com>
Date: Thu, 29 May 2008 01:52:17 -0500

"rickman" <gnuarm@gmail.com> wrote in message 
news:5d1e4ddd-969f-47a4-9959-b017f9fd8c0f@m3g2000hsc.googlegroups.com...
> On May 28, 12:10 pm, KJ <kkjenni...@sbcglobal.net> wrote:
>> On May 28, 11:24 am, Brian Philofsky
>>
>> <brian.philofsky@no_xilinx_span.com> wrote:
>> > KJ wrote:
>> > > On May 28, 6:50 am, "MikeWhy" <boat042-nos...@yahoo.com> wrote:
>>
>> > This very well could be a timing issue but another possible cause of
>> > this could be the writing of simulatable code but not synthesizable
>> > code.  Looking at the code, I see the signals a and b in the 
>> > sensitivity
>> > list of the process which could cause the else statement to get
>> > evaluated asynchronously in simulation however for synthesis, the
>> > sensitivity list is likely ignored (generally with a warning) and thus
>> > processed differently.
>>
>> Having 'a' and 'b' in the sensitivity list is not the problem.  The
>> structure of the code is
>> process(...)
>> begin
>> if (rst = '1') then
>>    ...assignements here
>> elsif (clk'event and clk = '1') then
>>    ...assignements here
>> end if;
>> end process;
>>
>> There are no assignments to anything outside of the if statement.
>> That if statement is of the form for flops with async presets/resets.
>> Having 'extra' signals in the sensitivity list will not result in any
>> difference between simulation and synthesis.  If it does, then contact
>> the synthesis tool provider and submit a bug report.
>
> Not only is the a and b not a problem, the a in the sensitivity list
> is *required* for proper simulation unless the simulator has the
> smarts to fix problems with the sensitivity list.

XST complained that they were missing from the sensitivity list and warned 
that it was proceeding as though they were there. So I added them.

> Notice that there are assignments in the if (rst ='1') clause that
> have a on the right hand side.  Since this is not in the clocked
> portion of the process, this is a concurrent assignment and the
> process must run when either rst or a change.

Aha. Thanks. That was lucid enough to understand.
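
To make the point about the sensitivity list concrete, here is a small
event-driven model in Python. It is a sketch under assumptions: the function
simulate, the signal set and the toggling "clocked logic" are hypothetical
stand-ins, not the code from this thread. The process body runs only when a
signal in the sensitivity list changes, and reset loads q from the signal a
rather than from a constant.

```python
def simulate(events, sensitivity):
    """Event-driven model of one VHDL-style process.

    events: list of (signal_name, new_value) changes, applied in order.
    The process body runs only if the changed signal is in `sensitivity`.
    """
    sig = {"rst": 0, "clk": 0, "a": 0, "q": 0}
    for name, value in events:
        rising_clk = name == "clk" and sig["clk"] == 0 and value == 1
        sig[name] = value
        if name not in sensitivity:
            continue                 # process stays asleep
        if sig["rst"] == 1:
            sig["q"] = sig["a"]      # reset loads a signal, not a constant
        elif rising_clk:
            sig["q"] = 1 - sig["q"]  # stand-in for the clocked logic
    return sig["q"]


# Hold reset high, then change `a` while still in reset.
events = [("rst", 1), ("a", 1)]
print(simulate(events, {"clk", "rst"}))       # process misses the change on a
print(simulate(events, {"clk", "rst", "a"}))  # process follows a during reset
```

With a missing from the list, the simulated q keeps the stale value, while
the synthesized hardware would follow a for as long as reset is asserted.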

>> > >> Also, is there a way to tell XST to not treat reset as a clock? I 
>> > >> haven't
>> > >> fully read up on configuration, having spent way too much time on 
>> > >> this
>> > >> little time waster.
>>
>> > I imagine you are referring to XST using a global buffer for the reset
>> > signal.  In general this should not cause any issues and many times can
>> > be the right thing to do but if you want to go to prevent that 
>> > behavior,
>> > tell XST you want an IBUF on the reset signal by adding the following
>> > attribute:

I missed the import of this the first time through. I'll make note of it for 
the future. It added the clock buffer even when I used a GPIO line, not just 
the sys_rst_pin which has special attributes in the config file.

> If the reset signal is being run on a global clock line, it is because
> your code does not allow the GSR to be used (a global net dedicated to
> the reset function).  You are resetting to a signal value instead of a
> fixed value, so the set/reset signals have to be brought out to the
> routing matrix to accommodate that.  It is actually preferred to use
> the clock nets for such a reset since your reset likely has a high
> fanout and it will not be able to meet a fast timing spec any other
> way.  This also saves a lot of routing resources if you have spare
> clock lines.
>
>
>> > >> Last, .... is this really worth pursuing? I've been programming for 
>> > >> 25
>> > >> years, and know that the greatest leassons come after the greatest 
>> > >> pain. But
>> > >> there's also good pain, and just senseless injury. Is this not a 
>> > >> suitable
>> > >> first Zen parable to contemplate? I'm goaded forward by the belief 
>> > >> that
>> > >> there's a good lesson on synchronous systems lurking as the 
>> > >> punchline.
>>
>> > > The punchline might be static timing analysis.  Signals don't just
>> > > 'happen' when you want them to, you need to guarantee by design that
>> > > they arrive at the proper time relative to the clock.
>
> Can you say what your clock rate is?  For the most part, speeds of
> below 25 MHz are pretty easy to meet.  Speeds of 100 MHz are a lot
> tougher.  Speeds in the middle depend on the logic and the density.
> As you get above about 70% or 80% full, it gets harder to meet faster
> timing.

125 MHz. One of the timing reports said the circuit should be able to reach
300+ MHz. Alas, it wasn't the circuit I intended.

I understand race conditions as they apply to multithreaded software. I 
imagine the root problem I have isn't too very different here. Slowing down 
one side to avoid contention isn't the same as preventing it.

Reading the simulation traces was very educational. Debounce is a ff. 
Upstream logic gates the clock enable to cause it to flop state. I can see 
how the brief strobe can be missed if the actual gate timing differs from 
the sim by just a tiny amount. About all I can say at this point is I 
learned 10 times more this way than if it had all just worked when I typed 
it in.

> The one problem that most newbies have, especially if they come from a
> software orientation, is thinking of HDL as software.  HDL stands for
> Hardware Description Language and that is how it works.  It describes
> hardware.  If you try to write it like software, you most likely won't
> care for the hardware that results, if it produces hardware at all.
> Every construct I write I picture in terms of the hardware it will
> generate as well as the behavior it has.

I took time away from the keyboard tonight to re-read code from Pong Chu's 
newbie VHDL book, to find where I had gone astray. There's light at the end 
of this tunnel. I absolutely have it backwards and inside out on how to 
protect the synchronous states. It is, as you say, a software habit that 
doesn't apply.

Article: 132503
Subject: Re: Virtex 2 with PLB_v34 and EDK 10.1
From: Markus <none@nowhere.org>
Date: Thu, 29 May 2008 08:56:15 +0200


There was a similar discussion about device support on this list, for EDK 9.2:

http://groups.google.de/group/comp.arch.fpga/browse_frm/thread/d91828e0a0528747/fdb2b5b939566021?hl=de&lnk=st&q=edk+9.2+virtex+2+group%3Acomp.arch.fpga#fdb2b5b939566021

As I recall, Virtex-II devices were supposed to be properly supported again
in version 10.1, which is obviously not the case.

If you want to hack your EDK, change the *.mpd file: add VirtexII to the 
supported families and see what happens then. (And tell us)

-Markus

rmeiche wrote:
> Hi,
> 
> I'm trying to build a system for a XC2V6000 FPGA. The problem I have
> is that I have to implement a PLB. But the PLB shipped with the EDK
> 10.1 is the PLB_v46 which doesn't support the virtex II, only V2Pro
> and V4.
> 
> At the datasheets on the xilinx website I found that the PLB_v34
> should support the V2  (see this link:  http://www.xilinx.com/products/ipcenter/plb_v34.htm
> ).
> 
> But there exist two versions of the datasheet. One on the xilinx
> website and one at the pcore directory. I got the pcore from the EDK
> 8.2 which includes the PLB_v34 in version 1.01a. The datasheet says
> that this version only supports V2Pro and V4.
> 
> I just took that core and copied it to the pcore directory of my EDK
> 10.1 project and added it to the system. I ignored the warnings and
> started the build process but this was aborted with the reason that
> the Virtex2 isn't supported.
> 
> 
> 
> Does that mean that the datasheet on the xilinx website is wrong? Did
> I something wrong?
> 
> 
> 
> Has anyone tried to implement a PLB on a Virtex II system?
> 
> 
> 
> Thanks.

Article: 132504
Subject: Re: Sequentially syncrhronous
From: "MikeWhy" <boat042-nospam@yahoo.com>
Date: Thu, 29 May 2008 02:11:30 -0500

"rickman" <gnuarm@gmail.com> wrote in message 
news:1e1cedff-2d7c-4d5a-b477-ee48b163eaba@p25g2000hsf.googlegroups.com...
> Now that I have looked at it, this seems so frigging simple that I am
> going to try to write it on the fly!

:) I've said that more than once, on just this alone.

I didn't give much thought to actually taking best advantage of the encoder 
resolution. The encoder is a hand knob with detents. Reading just one 
transition per detent serves this exercise better, to demonstrate proper 
debouncing. However, I can see room for a third try, but likely not for a 
few weeks or months. At which point, it should be entirely trivial. ;)


>
> right_way : process (clk, rst) begin
>  if (rst = '1') then
>    old_a <= '0';
...

Thanks to all who wrote with their thoughts.

Article: 132505
Subject: Re: Ph.D Student
From: Pablo <pbantunez@gmail.com>
Date: Thu, 29 May 2008 00:20:54 -0700 (PDT)

I have just included my article in Spain.

Thanks for your reply.

PS: I work at the University of Extremadura in Spain.

Article: 132506
Subject: Xilinx Clock Doubler
From: Grant Stockly <grant@stockly.com>
Date: Thu, 29 May 2008 01:23:49 -0700 (PDT)

I have a 10MHz clock but needed a 20MHz clock speed.  I used two
asynchronous clear flip flops with a series of buffers to add delay to
the signal.

Is this a bad practice?  Will it fail with time or temperature?  It
works fine on a PCB, but I am concerned!  It does exactly what I want,
increment the counter on both rising and falling edges.

http://www.stockly.com/images4/080529-Clock_Doubler.jpg

Above is a link to a picture of the Xilinx schematic.

Thanks

Grant

Article: 132507
Subject: RGB video panel
From: Ankit <ankitanand1986@gmail.com>
Date: Thu, 29 May 2008 01:28:49 -0700 (PDT)

Hi everyone, I am trying to display video on RGB LEDs. The LEDs
would be connected to the FPGA. What I would like to know is
which video format can be displayed most easily on the LEDs. Waiting for
your replies.


Regards
Ankit Anand

Article: 132508
Subject: Re: Xilinx Clock Doubler
From: Grant Stockly <grant@stockly.com>
Date: Thu, 29 May 2008 01:29:03 -0700 (PDT)

If this is NOT recommended, then would 2 two bit counters (one with an
inverted clock) and a 4 bit adder be the best solution?  I'd like to
keep my clock at 10MHz.

Article: 132509
Subject: Re: Xilinx Clock Doubler
From: "Symon" <symon_brewer@hotmail.com>
Date: Thu, 29 May 2008 09:46:20 +0100

"Grant Stockly" <grant@stockly.com> wrote in message 
news:f7276c42-5e33-4b6e-96e1-d16afeae08cb@p25g2000pri.googlegroups.com...
> If this is NOT recommended, then would 2 two bit counters (one with an
> inverted clock) and a 4 bit adder be the best solution?

That's almost certainly not the 'best' solution. A better solution is to use 
a DCM to double the frequency. At 10MHz input frequency, you'll need to use 
its CLKFX output.

> I'd like to keep my clock at 10MHz.

No you wouldn't. You'd like to keep your logic clock _enabled_ at 10MHz, but 
clocked by your newly DCMed 20MHz clock.
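
As a rough behavioral illustration of the clock-enable idea (a Python sketch,
not a description of the DCM or of any real design; the function run and its
toggling enable are assumptions made for the example), the logic sees every
20 MHz edge but only updates on the edges where the enable is asserted:

```python
def run(cycles_20mhz):
    """Count with a 20 MHz clock while enabling the logic on alternate cycles."""
    enable = False
    count = 0
    for _ in range(cycles_20mhz):   # one iteration per rising 20 MHz edge
        enable = not enable         # toggles every edge -> 10 MHz enable rate
        if enable:
            count += 1              # enabled register updates
    return count


print(run(20))  # 20 fast-clock edges give 10 enabled updates
```

Everything stays in one clock domain, so the timing tools only have a single
20 MHz clock to analyze.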

HTH., Syms.

p.s. Designing with schematics? How quaint! ;-)

Article: 132510
Subject: FIR in FPGA
From: fazulu deen <fazulu.vlsi@gmail.com>
Date: Thu, 29 May 2008 04:25:50 -0700 (PDT)

Hi,

1. Can I set the clock frequency of the FIR filter to any frequency I
want, as long as it is well above the sample rate?
  For example: Fc = 3.5 kHz
               Fs = 8 kHz
  Can I clock it at any value > 8 kHz, say 1 MHz (within the max
clock for the target device)?

2. Can a direct-form non-symmetric filter structure support
symmetric coefficients? Is the response computed by the non-symmetric
structure the same as that of the symmetric filter structure?
  I know that, in terms of resource utilization, the symmetric structure
needs more adders in exchange for fewer multipliers.

3. In addition to the impulse test (the basic test to check FIR filter
operation before implementing it in the FPGA), the step test and the sine
wave test, what other tests must be performed in the time domain
to check that the filter works properly
before applying an arbitrary input to it?

regards,
faz

Article: 132511
Subject: Re: asic gate count
From: Brian Drummond <brian_drummond@btconnect.com>
Date: Thu, 29 May 2008 12:27:59 +0100

On Wed, 28 May 2008 11:24:03 -0700 (PDT), "vijayant.rutgers@gmail.com"
<vijayant.rutgers@gmail.com> wrote:

>I have a design on FPGA that is ready. However, we need to have some
>mapping from fpga design to asic. I know that this will not be
>accurate. But accuracy is not our concern right now. We just need
>upper bound.  Also, we are also looking for some IP Core for ASIC so
>that we can rough estimate.
>
>Regards,
>Vijayant
>
One approach is to run it through the Xilinx tools and review the map
report (.mrp file). If you take this approach, I suggest eliminating
memory blocks (PPC if used) and DSP/multiplier blocks and re-running, to
understand how much of the gate count comes from these blocks.

- Brian

Article: 132512
Subject: Re: Xilinx LogicCore Direct Instantiation
From: Brian Drummond <brian_drummond@btconnect.com>
Date: Thu, 29 May 2008 12:32:35 +0100

On Wed, 28 May 2008 19:45:53 -0400, krw <krw@att.bizzzzzzzzzz> wrote:

>In article <OqSdnacY_bKfqqHVnZ2dnUVZ_tvinZ2d@lmi.net>, 
>rgaddi@technologyhighland.com says...
>> krw wrote:
>> 
>> Any reason to not just infer the comparator?  VHDL generics make this 
>> sort of thing a breeze.
>
>Two things...  First, I want to walk before running.  I also need to 
>manually instantiate BRAMs and it would be nice to ditch the GUI 
>altogether.  It would make managing mu libraries through various 
>core releases much simpler.  
>
>Perhaps it's no longer true, but I found that the LogicCore devices  
>were better optimized than the ones that were inferred from HDL.  

IMO it is good practice to infer first, while bearing this statement in
mind. Then you have a portable design which may be largely good enough;
you only need to pay attention to instantiation where the inferred
design fails size or timing (which does still happen sometimes)

Sounds as if the comparators are good enough now.

- Brian

Article: 132513
Subject: Re: error when 'generating simulation hdl files' in xilinx xps
From: Brian Drummond <brian_drummond@btconnect.com>
Date: Thu, 29 May 2008 12:48:10 +0100

On Wed, 28 May 2008 04:44:40 -0700 (PDT), "fatfpga@googlemail.com"
<fatfpga@googlemail.com> wrote:

>hi,
>
>does anyone know how to solve this error when selecting 'generate
>simulation hdl files' in xps (xilinx edk 9.1):

>Running Data2Mem with the following command:
>data2mem -bm system_sim.bmm  -bd
>/pl/hardware/user-platforms/MySystemV5/fs-boot/executable.elf tag
>microblaze_0 -u  -o u tmpucf.ucf
>ERROR:MDT - Ucf2Vhdl Conversion Generated Errors.

What does this command do on its own? (from a shell)
Find out why that isn't working, fix it and try again.

- Brian

Article: 132514
Subject: Re: RGB video panel
From: Brian Drummond <brian_drummond@btconnect.com>
Date: Thu, 29 May 2008 12:54:16 +0100

On Thu, 29 May 2008 01:28:49 -0700 (PDT), Ankit
<ankitanand1986@gmail.com> wrote:

>Hi everyone,i am trying to display a video on a rgb leds.The leds
>would be connected to the fpga what i was concerned about is that
>which video format can be displayed easily on the leds..Waiting for
>your replies..

Unless you have a very large budget for LEDs, you might want to use
Baird Televisor format.

HTH,

- Brian

Article: 132515
Subject: Re: Are FPGAs headed toward a coarse granularity?
From: Kolja Sulimma <ksulimma@googlemail.com>
Date: Thu, 29 May 2008 04:56:04 -0700 (PDT)

On 29 May, 07:59, Jim Granville <no.s...@designtools.maps.co.nz>
wrote:
> rickman wrote:
>
> <snip>
>
> > So, are coarse grained architectures the way of FPGA... opps FPxA
> > devices in the near future?  Will the lowly LUT and FF be pushed into
> > the dark corners of the die in coming years?  I think it is not a
> > matter of if, just a matter of when and I think the when is soon!
>
> The main drive seems to be MHz, as hard IP is always faster than
> Soft Logic.

Special purpose hardware is always faster than general purpose
hardware, except in the general case ;-)

Coarser granularity makes the implementation of what you are building
a lot more efficient, but at the same time it is less likely to match
the desires of the designer.

Take the DSP block as an example (let's forget the multiplier for now,
as this uses an additional advantage: the existence of very clever
hardware structures for multipliers).
The muxes and adders use far fewer configuration bits and low-level
muxes, because 18 to 48 elements are always configured together to
implement the same function, and the data lines always run in parallel;
they can't be permuted as they could be in the FPGA fabric. This is a
huge gain for a 48+18 bit accumulator.
But if you need 49 bits you lose a factor of two immediately. (The
fabric implementation grows by only about 2%, the DSP48 implementation by
100%)
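
The 2% versus 100% figures can be checked with a few lines of arithmetic.
This Python sketch assumes that fabric cost scales linearly with bit width
and that a wider accumulator simply needs a whole extra 48-bit block; the
helper dsp_blocks is a hypothetical name for the example.

```python
import math


def dsp_blocks(width, block_bits=48):
    """Number of fixed-width blocks needed for an accumulator of `width` bits."""
    return math.ceil(width / block_bits)


for width in (48, 49):
    print(width, "bits ->", dsp_blocks(width), "block(s)")

# Fabric: one extra bit of carry chain on top of 48 (assumed linear cost).
fabric_growth = (49 - 48) / 48
# Hard blocks: going from one block to two.
dsp_growth = (dsp_blocks(49) - dsp_blocks(48)) / dsp_blocks(48)
print(f"fabric: +{fabric_growth:.1%}, DSP blocks: +{dsp_growth:.0%}")
```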

André DeHon analyzed this in a chapter of his PhD thesis many years
ago:
http://www.seas.upenn.edu/~andre/abstracts/dehon_phd.html
There are graphs showing the efficiency as a function of application
word length and hardware granularity.

It should be noted that in FPGAs both delay and area are dominated by
the routing resources. Therefore it is mainly the granularity of the
routing that should be optimized.

No design has millions of gates of random logic. Large designs are
dominated by arithmetic function blocks. Therefore it is likely that
an FPGA with a granularity of 2, for example, will have a much better
efficiency than current FPGAs. For random control logic half of the
LUTs would remain unused, but for datapaths the utilization would
approach 100% and the device could save as much as 75% of the switches
and configuration bits.

This is old knowledge for FPGA architecture folks, but there are two
strong arguments against it:

1.)
It is hard to quantify routing utilization, but the competitors'
marketing will immediately target the lower LUT utilization as a
disadvantage. (But hey, if a LUT costs 75% less, who cares if I can
only use 80% of the LUTs? Especially if the clock frequency is
better?)

2.)
Granularity 1 FPGAs make use of the huge knowledge about ASIC EDA
algorithms. For higher granularities you need to redevelop most of the
software toolflow from scratch.
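
The trade-off in the first argument (a LUT that costs 75% less but of which
only 80% can be used) comes down to a one-line calculation. The sketch below
uses the figures from the text; treating today's LUT as fully used is an
assumption made only to have a baseline, and cost_per_used_lut is a
hypothetical helper.

```python
def cost_per_used_lut(relative_lut_cost, utilization):
    """Silicon cost of each LUT actually used, relative to a baseline LUT."""
    return relative_lut_cost / utilization


baseline = cost_per_used_lut(1.00, 1.00)   # assumed: today's LUT, fully used
coarse = cost_per_used_lut(0.25, 0.80)     # 75% cheaper, only 80% usable
print(coarse / baseline)
```

Under these assumptions each used LUT still costs less than a third of the
baseline, even though the utilization number looks worse.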

There is a small FPGA vendor that has high speed global routing with
10bit granularity. Maybe this is a start. The area savings are
marginal, as most of the switches are in local routing, but the speed
improvement for long connections is significant.

Kolja Sulimma

Article: 132516
Subject: Re: FIR in FPGA
From: Jonathan Bromley <jonathan.bromley@MYCOMPANY.com>
Date: Thu, 29 May 2008 13:10:38 +0100

On Thu, 29 May 2008 04:25:50 -0700, fazulu deen wrote:

Let me try to respond with some questions of my own.

>1.Can i set  the clock frequency of the FIR filter at any frequency i
>want... but pretty much higher than the sample rate?
>  for example :Fc=3.5khz
>                    Fs=8khz
>         can i clock as any value >8khz say 1Mhz(considering the max
>clock for target device)

Are you familiar with the concept of "clock enable"?

>2.Whether direct form non-symmetric filter structure can support
>symmetric coefficients??

I assume you understand the phrases you have used in the 
question.  Can you please explain what you find
hard or non-obvious about your question?

>whether the response computed in non-symmetric
>structure is same as  symmetric filter structure??

Have you considered the effects of numeric overflow
in your filter?

>  I knew resource utlization wise symmetric need more adder at the
>cost of multpliers..

Are you sure it needs more adders than an equivalent canonical
implementation?  What leads you to believe that?

>3.In addition to impulse test(basic test to check FIR filter
>operation before implementing to FPGA),step test,sine wave test.What
>are other test that has to be  compulsorily performed in time domain
>to check the proper working
>filter operation before giving any arbitary input to the filter??

Do you understand the concept of linearity?  Can you think of
anything that might make your filter non-linear?  (Clue:
my previous question about overflow).  Do you trust your
adder, multiplier and register building blocks?

~~~~~~~~~~~~~~~

If you are simply reproducing homework problems, please
do us the courtesy of trying to solve them yourself before
asking for help.  If these are real problems of understanding,
then please give us a clue about what you already know and
what you find difficult.
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

Article: 132517
Subject: Re: FIR in FPGA
From: fazulu deen <fazulu.vlsi@gmail.com>
Date: Thu, 29 May 2008 06:04:36 -0700 (PDT)

Hi,

> Are you familiar with the concept of "clock enable"?
Yes, I am. I was asking what maximum clock frequency I can set during
implementation for the example I gave.

> Can you please explain what you find hard or non-obvious about your
> question?
I mean the structural difference, and the coefficient support, between
symmetric and non-symmetric structures.

> Have you considered the effects of numeric overflow in your filter?
I will consider that when checking the response (during testing).

> Are you sure it needs more adders than an equivalent canonical
> implementation?  What leads you to believe that?
Have you ever seen the symmetric and non-symmetric structures
before? Once you see them, you will believe it too.

> Do you understand the concept of linearity?  Can you think of
> anything that might make your filter non-linear?  (Clue:
> my previous question about overflow).  Do you trust your adder,
> multiplier and register building blocks?
Yes, I do. The critical path might make it non-linear.

> If these are real problems of understanding, then please give us a
> clue about what you already know and
> what you find difficult.
My problems are stated as questions, with a few comments about my
current progress, in order to get answers from the group.

Regards,
faz

On May 29, 5:10 pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Thu, 29 May 2008 04:25:50 -0700, fazulu deen wrote:
>
> Let me try to respond with some questions of my own.
>
> >1.Can i set  the clock frequency of the FIR filter at any frequency i
> >want... but pretty much higher than the sample rate?
> >  for example :Fc=3.5khz
> >                    Fs=8khz
> >         can i clock as any value >8khz say 1Mhz(considering the max
> >clock for target device)
>
> Are you familiar with the concept of "clock enable"?
>
> >2.Whether direct form non-symmetric filter structure can support
> >symmetric coefficients??
>
> I assume you understand the phrases you have used in the
> question.  Can you please explain what you find
> hard or non-obvious about your question?
>
> >whether the response computed in non-symmetric
> >structure is same as  symmetric filter structure??
>
> Have you considered the effects of numeric overflow
> in your filter?
>
> >  I knew resource utlization wise symmetric need more adder at the
> >cost of multpliers..
>
> Are you sure it needs more adders than an equivalent canonical
> implementation?  What leads you to believe that?
>
> >3.In addition to impulse test(basic test to check FIR filter
> >operation before implementing to FPGA),step test,sine wave test.What
> >are other test that has to be  compulsorily performed in time domain
> >to check the proper working
> >filter operation before giving any arbitary input to the filter??
>
> Do you understand the concept of linearity?  Can you think of
> anything that might make your filter non-linear?  (Clue:
> my previous question about overflow).  Do you trust your
> adder, multiplier and register building blocks?
>
> ~~~~~~~~~~~~~~~
>
> If you are simply reproducing homework problems, please
> do us the courtesy of trying to solve them yourself before
> asking for help.  If these are real problems of understanding,
> then please give us a clue about what you already know and
> what you find difficult.
> --
> Jonathan Bromley, Consultant
>
> DOULOS - Developing Design Know-how
> VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services
>
> Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
> jonathan.brom...@MYCOMPANY.com
> http://www.MYCOMPANY.com
>
> The contents of this message may contain personal views which
> are not the views of Doulos Ltd., unless specifically stated.

Article: 132518
Subject: Re: Are FPGAs headed toward a coarse granularity?
From: David Brown <david@westcontrol.removethisbit.com>
Date: Thu, 29 May 2008 15:55:53 +0200

Kolja Sulimma wrote:
> On 29 May, 07:59, Jim Granville <no.s...@designtools.maps.co.nz>
> wrote:
>> rickman wrote:
>>
>> <snip>
>>
>>> So, are coarse grained architectures the way of FPGA... opps FPxA
>>> devices in the near future?  Will the lowly LUT and FF be pushed into
>>> the dark corners of the die in coming years?  I think it is not a
>>> matter of if, just a matter of when and I think the when is soon!
>> The main drive seems to be MHz, as hard IP is always faster than
>> Soft Logic.
> 
> Special purpose hardware is allways faster than general purpose
> hardware, except in the general case ;-)
> 
> Coarses granularity makes the implementation of what you are building
> a lot more efficient, but at the same time it is less likely to match
> the desires of the designer.
> 
> Take the DSP block as an example (lets forget the multiplier for now,
> as this uses an additional advantage: The existence of very clever
> hardware structures for multipliers)
> The muxes and adders use a lot less configuration bits and low level
> muxes as always 18 to 48 elements are configured to implement the same
> functions, and the data lines always run in parallel, can't be
> permuted as they could be in the FPGA fabric. This is a huge gain for
> a 48+18 bit accumulator.
> But, if you need 49 bits you lose a factor of two immediately. (The
> fabric implementation grows only by a 2%, the DSP48 implementation by
> 100%)
> 
> André DeHon analyzed this in in a chapter of his PhD thesis many years
> ago:
> http://www.seas.upenn.edu/~andre/abstracts/dehon_phd.html
> There are graphs showing the efficiency as a function of application
> word length and hardware granularity.
> 
> It should be noted that in FPGAs both delay and area are dominated by
> the routing ressources. Therefore mainly the granularity of the
> routing should be optimized.
> 
> No design has millions of gates of random logic. Large designs are
> dominated by arithmetic function blocks. Therefore it is likely that
> an FPGA with a granularity of 2 for example will have a much better
> efficiency than current FPGAs. For random control logic half of the
> LUTs would remain unused, but for datapathes the utilization would
> approach 100% and the device coud save as much as 75% of the switches
> and configuration bits.
> 
> This is old knowledge for FPGA architecure folks, but there are two
> strong arguments against it:
> 
> 1.)
> It is hard to quantify routing utilization, but the competitors
> marketing will immediately target the lower LUT utilization as a
> disadvantage. (But hey, if a LUT costs 75% less, who cares if I can
> only use 80% of the LUTS? Especially if the clock frequency is
> better?)
> 
> 2)
> Granularity 1 FPGAs make use of the huge knowledge about ASIC EDA
> algorithms. For higher granularities you need to redevelopemost of the
> software toolflow from scratch.
> 
> There is a small FPGA vendor that has high speed global routing with
> 10bit granularity. Maybe this is a start. The area savings are
> marginal, as most of the switches are in local routing, but the speed
> improvement for long connections is significant.
> 
> Kolja Sulimma
> 

Forgive my possible ignorance here (my fairly limited FPGA experience is
only with smaller Cyclones and PLDs, not big devices), but isn't
"granularity 2" pretty much what the Stratix II, III (and IV, when it's 
available) have in their "adaptive logic module" ?  And as far as I can 
see from the following recent white paper, this is exactly what Altera 
are saying - using the ALM they get much more into a Stratix with 
roughly the same number of logic elements / slices / LUTs / flip-flops 
than into a Virtex.  Obviously all such marketing information must be 
taken with a large handful of salt.

<http://www.altera.com/products/devices/stratix-fpgas/stratix-iii/overview/architecture/performance/st3-opencores.html>

Article: 132519
Subject: Re: FIR in FPGA
From: Jonathan Bromley <jonathan.bromley@MYCOMPANY.com>
Date: Thu, 29 May 2008 15:21:50 +0100

On Thu, 29 May 2008 06:04:36 PDT, fazulu deen wrote:

>Are you familiar with the concept of "clock enable"?
>yes i do..I asked during implementation wat is max clock frequency can
>i set for the example pointed out by me...

FIR filters can (in almost all applications) be pipelined
as deeply as necessary, so the upper limit on clock frequency
is much the same as you would get for any other logic in
the same technology.  All that's needed is to enable all
the FIR's registers for one clock cycle on each sample 
(i.e. at the appropriate sample rate).  Not hard.
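
As a rough behavioural sketch of the clock-enable idea (Python rather than HDL, and not pipelined; the function name and parameters are made up for illustration), the delay line below is clocked at the fast system rate but only updates on a one-cycle enable strobe per sample:

```python
def run_fir_with_enable(coeffs, samples, clocks_per_sample):
    """Model a FIR clocked at the fast rate, updating only when enabled.

    Returns the registered output value seen on every fast clock cycle.
    """
    delay_line = [0] * len(coeffs)   # the FIR's sample registers
    y = 0                            # registered filter output
    outputs = []
    sample_iter = iter(samples)
    for cycle in range(len(samples) * clocks_per_sample):
        # One-cycle enable strobe at the sample rate.
        enable = (cycle % clocks_per_sample == 0)
        if enable:
            # Shift in the new sample and recompute the output.
            delay_line = [next(sample_iter)] + delay_line[:-1]
            y = sum(c * d for c, d in zip(coeffs, delay_line))
        # When enable is low, every register simply holds its value.
        outputs.append(y)
    return outputs
```

With coefficients [1, 2, 3], an impulse input and four clocks per sample, the output holds each value for four fast clock cycles before moving on.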

I asked...
>>Are you sure it [symmetric FIR structure]
>>needs more adders than an equivalent canonical
>>implementation?  What leads you to believe that?
>
>Have you ever seen the symmetric and non symmetric structure structure
>before once u see u will also believe in it..

I know how to build an N-tap non-symmetric FIR using
N multipliers and N-1 adders.  And I know how to build
a symmetric or antisymmetric N-tap FIR using ceil(N/2)
multipliers and N-1 adders (some are subtractors, if it's
antisymmetric).  So I don't see why you think you need 
extra adders in the symmetric/antisymmetric case.  Do you
know some additional tricks that I don't?
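
The operation counts above can be checked with a small sketch (plain Python, one output sample at a time; the function names are illustrative only). The folded form adds each pair of samples that share a coefficient before multiplying, so it trades multipliers for pre-adders and the adder total stays at N-1:

```python
from math import ceil

def fir_direct(coeffs, window):
    """Canonical form. Returns (y, multiplier_count, adder_count)."""
    products = [c * x for c, x in zip(coeffs, window)]
    return sum(products), len(products), len(products) - 1

def fir_symmetric(coeffs, window):
    """Folded form, assuming coeffs[i] == coeffs[N-1-i]."""
    n = len(coeffs)
    half = n // 2
    mults = adds = 0
    products = []
    for i in range(half):
        pre_sum = window[i] + window[n - 1 - i]   # pre-adder for the pair
        adds += 1
        products.append(coeffs[i] * pre_sum)      # one multiplier per pair
        mults += 1
    if n % 2:                                     # centre tap has no partner
        products.append(coeffs[half] * window[half])
        mults += 1
    adds += len(products) - 1                     # final summing chain
    return sum(products), mults, adds

def expected_symmetric_counts(n):
    """Counts quoted in the post: ceil(N/2) multipliers, N-1 adders."""
    return ceil(n / 2), n - 1
```

For an antisymmetric filter the pre-adders would become subtractors, but the counts are the same.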

>Do you understand the concept of linearity?  Can you think of
>anything that might make your filter non-linear?  (Clue:
>my previous question about overflow).  Do you trust your adder,
>multiplier and register building blocks?
>yes i do..critical path might make it non-linear....

My point is this: if it's truly linear, and it gives the 
correct impulse response, then it's correct; no further
testing is needed.  However, nonlinearity could easily 
be introduced by...
 - buggy multiplier or adder blocks
 - arithmetic overflow
 - improperly connected input bits that were not exercised
   by the impulse test
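
To illustrate the overflow case, here is a minimal sketch (Python, with an assumed two's-complement accumulator width; the names are hypothetical) of a filter that passes a unit-impulse test but misbehaves on a larger input because the accumulator wraps:

```python
def wrap_signed(value, bits):
    """Wrap an integer into a two's-complement range of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def fir(coeffs, samples, acc_bits=None):
    """Direct-form FIR; if acc_bits is given, the accumulator wraps."""
    history = [0] * len(coeffs)
    out = []
    for x in samples:
        history = [x] + history[:-1]
        acc = sum(c * h for c, h in zip(coeffs, history))
        out.append(acc if acc_bits is None else wrap_signed(acc, acc_bits))
    return out

coeffs = [3, 5, 3]
impulse_out = fir(coeffs, [1, 0, 0, 0], acc_bits=8)   # looks correct
step_ideal = fir(coeffs, [20] * 4)                    # unlimited precision
step_wrapped = fir(coeffs, [20] * 4, acc_bits=8)      # 8-bit accumulator
```

Here the impulse response matches the coefficients exactly, yet the step response of the 8-bit version differs from the ideal one as soon as the sum exceeds 127. A unit impulse also only toggles the least significant input bit, so it says nothing about the wiring of the other bits.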

>if these are real problems of understanding, then please give us a
>clue about what you already know and
>what you find difficult.

>my problems are mentioned as questions and few comments about the
>currents progress to get the answers from the group..

I'm none the wiser.
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

Article: 132520
Subject: Re: Are FPGAs headed toward a coarse granularity?
From: Kolja Sulimma <ksulimma@googlemail.com>
Date: Thu, 29 May 2008 08:51:58 -0700 (PDT)

On 29 Mai, 15:55, David Brown <da...@westcontrol.removethisbit.com>
wrote:
> Kolja Sulimma wrote:
> > On 29 Mai, 07:59, Jim Granville <no.s...@designtools.maps.co.nz>
> > wrote:
> >> rickman wrote:
>
> >> <snip>
>
> >>> So, are coarse grained architectures the way of FPGA... opps FPxA
> >>> devices in the near future?  Will the lowly LUT and FF be pushed into
> >>> the dark corners of the die in coming years?  I think it is not a
> >>> matter of if, just a matter of when and I think the when is soon!
> >> The main drive seems to be MHz, as hard IP is always faster than
> >> Soft Logic.
>
> > Special purpose hardware is allways faster than general purpose
> > hardware, except in the general case ;-)
>
> > Coarses granularity makes the implementation of what you are building
> > a lot more efficient, but at the same time it is less likely to match
> > the desires of the designer.
>
> > Take the DSP block as an example (lets forget the multiplier for now,
> > as this uses an additional advantage: The existence of very clever
> > hardware structures for multipliers)
> > The muxes and adders use a lot less configuration bits and low level
> > muxes as always 18 to 48 elements are configured to implement the same
> > functions, and the data lines always run in parallel, can't be
> > permuted as they could be in the FPGA fabric. This is a huge gain for
> > a 48+18 bit accumulator.
> > But, if you need 49 bits you lose a factor of two immediately. (The
> > fabric implementation grows only by a 2%, the DSP48 implementation by
> > 100%)
>
> > André DeHon analyzed this in in a chapter of his PhD thesis many years
> > ago:
> >http://www.seas.upenn.edu/~andre/abstracts/dehon_phd.html
> > There are graphs showing the efficiency as a function of application
> > word length and hardware granularity.
>
> > It should be noted that in FPGAs both delay and area are dominated by
> > the routing ressources. Therefore mainly the granularity of the
> > routing should be optimized.
>
> > No design has millions of gates of random logic. Large designs are
> > dominated by arithmetic function blocks. Therefore it is likely that
> > an FPGA with a granularity of 2 for example will have a much better
> > efficiency than current FPGAs. For random control logic half of the
> > LUTs would remain unused, but for datapathes the utilization would
> > approach 100% and the device coud save as much as 75% of the switches
> > and configuration bits.
>
> > This is old knowledge for FPGA architecure folks, but there are two
> > strong arguments against it:
>
> > 1.)
> > It is hard to quantify routing utilization, but the competitors
> > marketing will immediately target the lower LUT utilization as a
> > disadvantage. (But hey, if a LUT costs 75% less, who cares if I can
> > only use 80% of the LUTS? Especially if the clock frequency is
> > better?)
>
> > 2)
> > Granularity 1 FPGAs make use of the huge knowledge about ASIC EDA
> > algorithms. For higher granularities you need to redevelopemost of the
> > software toolflow from scratch.
>
> > There is a small FPGA vendor that has high speed global routing with
> > 10bit granularity. Maybe this is a start. The area savings are
> > marginal, as most of the switches are in local routing, but the speed
> > improvement for long connections is significant.
>
> > Kolja Sulimma
>
> Forgive my possible ignorance here (my fairly limited fpga experience is
> only with smaller Cyclones and PLDs, not big devices), but isn't
> "granularity 2" pretty much what the Stratix II, III (and IV, when it's
> available) have in their "adaptive logic module" ?  And as far as I can
> see from the following recent white paper, this is exactly what Altera
> are saying - using the ALM they get much more into a Stratix with
> roughly the same number of logic elements / slices / LUTs / flip-flops
> than into a Virtex.  Obviously all such marketing information must be
> taken with a large handful of salt.

No. What I was saying is that with granularity two you get slightly
less logic into the same number of LUTs, at greatly reduced cost.

Altera's (probably correct) claim is that, because they are more
flexible in how the inputs to a LUT pair can be routed, you can better
utilize the LUTs. This added flexibility probably increases the area
cost for the input routing significantly.
Granularity 2 would mean that a pair of elements (most importantly
routing switches) share a configuration. Each output of a LUT can only
reach half of the inputs that it could reach in a granularity 1 FPGA
(or must take a detour). Useful logic per LUT would go down (because
some LUTs can't be used), while useful logic per chip area would go up
(because each LUT with its associated routing resources would get
less expensive).
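
A back-of-the-envelope sketch of that trade-off, using the illustrative figures from the quoted post (a LUT that costs 75% less, of which only 80% can be used). The baseline of a fully utilized granularity 1 device is an idealizing assumption, and the function name is made up:

```python
def area_per_useful_lut(relative_lut_cost, utilization):
    """Relative chip area spent per LUT that actually carries logic."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return relative_lut_cost / utilization

# Granularity 1, idealized: full-cost LUTs, all of them usable.
baseline = area_per_useful_lut(1.00, 1.00)

# Granularity 2, figures from the quoted post: 75% cheaper, 80% usable.
coarse = area_per_useful_lut(0.25, 0.80)

# How much more useful logic fits into the same area.
gain = baseline / coarse
```

Under these assumptions each useful LUT costs about 0.31 of the baseline area, i.e. roughly 3.2 times more useful logic per unit area despite the lower LUT utilization.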

Altera is doing the opposite: paying extra area for added flexibility.
This achieves two goals:
a) It sounds better for marketing, because chip area is kept secret
anyway and sales prices are interpreted creatively. LUT count and
utilization, OTOH, are easily measured.
b) The device is easier to use because you can accurately estimate
whether your design will fit into the device. This is valuable and
might be worth the price.
I do not know much about Altera, but I know that starting with
Virtex-4 Xilinx decided to spend a lot of extra area on routing to make
the delays more predictable. This helps the XST software people and
the users. But another design would have a better cost/performance
ratio.

Kolja Sulimma

Article: 132521
Subject: Re: Xilinx Clock Doubler
From: Peter Alfke <peter@xilinx.com>
Date: Thu, 29 May 2008 09:58:13 -0700 (PDT)

On May 29, 1:23 am, Grant Stockly <gr...@stockly.com> wrote:
> I have a 10MHz clock but needed a 20MHz clock speed.  I used two
> asynchronous clear flip flops with a series of buffers to add delay to
> the signal.
>
> Is this a bad practice?  Will it fail with time or temperature?  It
> works fine on a PCB, but I am concerned!  It does exactly what I want,
> increment the counter on both rising and falling edges.
>
> http://www.stockly.com/images4/080529-Clock_Doubler.jpg
>
> Above is a link to a picture the Xilinx schematic.
>
> Thanks
>
> Grant

Grant, years ago I published a reliable clock doubler circuit, part of
the "six easy pieces" that seem to be lost.
In words:
Run your 10 MHz clock through a 2-input XOR.
Generate a toggling flip-flop by feeding Q back through an inverting
LUT to the D input.
Route the signal driving D also to the second XOR input.
Use the XOR output to clock the flip-flop, and also use it as your 20
MHz clock.

Disadvantage: If your 10 MHz clock doesn't have a 50/50 duty cycle, your
20 MHz clock will have frequency modulation.
And the High (or Low, depending on XOR or XNOR) time of your 20 MHz
clock will be short, but you can lengthen it by adding delay to the
Q-to-D path. Anyhow, it's self-adaptive to the device speed. Use this
trick only when no PLL or DLL is available.
Peter Alfke
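
The circuit described in words above can be sanity-checked with a crude discrete-time model. This is only a sketch under simplifying assumptions: time is in arbitrary steps, the whole Q-to-D path is lumped into one delay, and the function name and parameters are invented for illustration.

```python
def simulate_doubler(low=10, high=10, feedback_delay=2, n_steps=80):
    """Return the time steps at which the XOR output (doubled clock) rises.

    low/high are the input clock's low and high times in steps;
    feedback_delay models the Q -> inverting LUT -> D path and should be
    at least 1 and shorter than both low and high.
    """
    q = 1                                # flip-flop output
    d_pipe = [1 - q] * feedback_delay    # delayed NOT q; feeds D and the XOR
    prev_x = 0
    rising_edges = []
    for t in range(n_steps):
        clk = 1 if (t % (low + high)) >= low else 0
        d = d_pipe[0]
        x = clk ^ d                      # XOR output: clocks the FF, is the output
        if x and not prev_x:
            rising_edges.append(t)
            q = d                        # flip-flop captures D on the rising edge
        d_pipe = d_pipe[1:] + [1 - q]    # the new NOT q arrives after the delay
        prev_x = x
    return rising_edges
```

With a 50/50 input the output rises once per input edge at even spacing, and the pulse width equals the feedback delay. Calling it with low=14, high=6 gives alternating gaps of 6 and 14 steps, which is the frequency modulation mentioned above.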

Article: 132522
Subject: Re: Are FPGAs headed toward a coarse granularity?
From: Peter Alfke <peter@xilinx.com>
Date: Thu, 29 May 2008 10:28:17 -0700 (PDT)

On May 29, 8:51 am, Kolja Sulimma <ksuli...@googlemail.com> wrote:
> On 29 Mai, 15:55, David Brown <da...@westcontrol.removethisbit.com>
> wrote:
>
>
>
> > Kolja Sulimma wrote:
> > > On 29 Mai, 07:59, Jim Granville <no.s...@designtools.maps.co.nz>
> > > wrote:
> > >> rickman wrote:
>
> > >> <snip>
>
> > >>> So, are coarse grained architectures the way of FPGA... opps FPxA
> > >>> devices in the near future?  Will the lowly LUT and FF be pushed into
> > >>> the dark corners of the die in coming years?  I think it is not a
> > >>> matter of if, just a matter of when and I think the when is soon!
> > >> The main drive seems to be MHz, as hard IP is always faster than
> > >> Soft Logic.
>
> > > Special purpose hardware is allways faster than general purpose
> > > hardware, except in the general case ;-)
>
> > > Coarses granularity makes the implementation of what you are building
> > > a lot more efficient, but at the same time it is less likely to match
> > > the desires of the designer.
>
> > > Take the DSP block as an example (lets forget the multiplier for now,
> > > as this uses an additional advantage: The existence of very clever
> > > hardware structures for multipliers)
> > > The muxes and adders use a lot less configuration bits and low level
> > > muxes as always 18 to 48 elements are configured to implement the same
> > > functions, and the data lines always run in parallel, can't be
> > > permuted as they could be in the FPGA fabric. This is a huge gain for
> > > a 48+18 bit accumulator.
> > > But, if you need 49 bits you lose a factor of two immediately. (The
> > > fabric implementation grows only by a 2%, the DSP48 implementation by
> > > 100%)
>
> > > André DeHon analyzed this in in a chapter of his PhD thesis many years
> > > ago:
> > >http://www.seas.upenn.edu/~andre/abstracts/dehon_phd.html
> > > There are graphs showing the efficiency as a function of application
> > > word length and hardware granularity.
>
> > > It should be noted that in FPGAs both delay and area are dominated by
> > > the routing ressources. Therefore mainly the granularity of the
> > > routing should be optimized.
>
> > > No design has millions of gates of random logic. Large designs are
> > > dominated by arithmetic function blocks. Therefore it is likely that
> > > an FPGA with a granularity of 2 for example will have a much better
> > > efficiency than current FPGAs. For random control logic half of the
> > > LUTs would remain unused, but for datapathes the utilization would
> > > approach 100% and the device coud save as much as 75% of the switches
> > > and configuration bits.
>
> > > This is old knowledge for FPGA architecure folks, but there are two
> > > strong arguments against it:
>
> > > 1.)
> > > It is hard to quantify routing utilization, but the competitors
> > > marketing will immediately target the lower LUT utilization as a
> > > disadvantage. (But hey, if a LUT costs 75% less, who cares if I can
> > > only use 80% of the LUTS? Especially if the clock frequency is
> > > better?)
>
> > > 2)
> > > Granularity 1 FPGAs make use of the huge knowledge about ASIC EDA
> > > algorithms. For higher granularities you need to redevelopemost of the
> > > software toolflow from scratch.
>
> > > There is a small FPGA vendor that has high speed global routing with
> > > 10bit granularity. Maybe this is a start. The area savings are
> > > marginal, as most of the switches are in local routing, but the speed
> > > improvement for long connections is significant.
>
> > > Kolja Sulimma
>
> > Forgive my possible ignorance here (my fairly limited fpga experience is
> > only with smaller Cyclones and PLDs, not big devices), but isn't
> > "granularity 2" pretty much what the Stratix II, III (and IV, when it's
> > available) have in their "adaptive logic module" ?  And as far as I can
> > see from the following recent white paper, this is exactly what Altera
> > are saying - using the ALM they get much more into a Stratix with
> > roughly the same number of logic elements / slices / LUTs / flip-flops
> > than into a Virtex.  Obviously all such marketing information must be
> > taken with a large handful of salt.
>
> No. What I was saying is, that with granularity two, you get slightly
> less logic into the same number of LUTs, at greatly reduced costs.
>
> Alteras (probably correct) claim is, that because they are more
> flexible how the inputs to a LUT pair can be routed you can better
> utilize the LUTs. This added flexibility probably increases the area
> cost for the input routing significantly.
> Granularity 2 would mean that a pair of elements (most importantly
> routing switches) share a configuration. Each output of a LUT can only
> reach half of the inputs that it could reach in a granularity 1 FPGA
> (or must take a detour). Useful logic per LUT would go down (Because
> soem LUTs can't be used), useful logic per chip area would go up
> (Because each LUT with its associated routing ressources would get
> less expensive).
>
> Altera is doing the opposite: Paying extra area for added flexibility.
> It is achieving two goals by this:
> a) It sounds better for marketing, because chip area is kept secret
> anyway and sales prices are interpreted creatively. LUT count and
> utilization OTOH are easily measured.
> b) The device is easier to use because you can accurately estimate
> whether your design will fit into the device. This is valueable and
> might be worth the price.
> I do not know much about Altera, but I know that starting with
> Virtex-4 Xilinx decided to spent a lot extra area for routing to make
> the delays more predictable. This helps the XST software people and
> the users. But another design would have a better cost/performace
> ratio.
>
> Kolja Sulimma

Let me add my 2 cents worth here, as a personal opinion (not official
Xilinx position):
In the distant past, each process generation gave us smaller and thus
cheaper die, and higher speed, while leakage current was a non-issue.
From now on, the next process generation will still give us smaller
size, and eventually lower cost, but hardly any raw speed improvement.
And leakage current is the big concern...
Speed improvement will predominantly come from architectural
(granularity) changes. That's why Virtex-5 quadrupled the logic size
of the LUTs (from 16 bits to 64 bits) to pack logic more tightly, and
to reduce routing.
That's also why we added many hard-coded functions, multipliers, ALUs,
FIFOs, SerDes in each I/O, PCI Express, Ethernet, and multi-gigabit
transceivers in all Virtex-5 LXT/SXT/FXT devices. In the 'FXT
subfamily we also include one or two hard-coded PPC microprocessors
with attached crosspoint and DMA.
So we are increasing efficiency and speed and reducing power not only
in the general-purpose fabric, but more importantly through larger
hard-coded blocks. But we always make sure that our FPGAs remain
general-purpose devices.
The art of engineering is forever a compromise between conflicting
demands...
Peter Alfke
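
The 16-bit and 64-bit LUT sizes mentioned above correspond to 4-input and 6-input LUTs, since a k-input LUT is just a 2^k-entry truth table. A minimal sketch (Python, with illustrative helper names) of that relationship:

```python
def lut_config_bits(num_inputs):
    """Number of truth-table bits stored by a LUT with num_inputs inputs."""
    return 2 ** num_inputs

def lut_eval(truth_table, inputs):
    """Look up the output bit; inputs[0] is the least significant address bit."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return truth_table[index]

# A 6-input AND gate needs all 64 bits: only the last entry is 1.
and6 = [0] * (lut_config_bits(6) - 1) + [1]
```

A function of six inputs that needed several 4-input LUTs plus the routing between them now fits into one LUT, which is where the reduced routing comes from.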

Article: 132523
Subject: Re: using EXP connector of Spartan 3a board
From: bish <bisheshkh@gmail.com>
Date: Thu, 29 May 2008 10:37:51 -0700 (PDT)

On May 28, 4:07 am, Bryan <bryan.fletc...@avnet.com> wrote:
> On May 27, 12:42 am, "MikeWhy" <boat042-nos...@yahoo.com> wrote:
>
>
>
> > "bish" <bishes...@gmail.com> wrote in message
>
> >news:5df586c0-0126-48cc-9ff1-ee382f505221@u6g2000prc.googlegroups.com...
>
> > > We have just bought a new Spartan 3a 1800a dsp board of Xilinx. We
> > > needed i/o pins to control motors and use various sensors and camera.
> > > The board contains EXP expansion slots, ( somewhere I found it is
> > > called QTE connector?).
>
> > > We are confused as how to easily connect our sensors like optical
> > > encoder, camera and output for our motor drivers using these EXP
> > > slots?
> > > Somewhere we found that we need to use QSE connector but we are not
> > > clear about it. We need a low cost solution !!!!! Can we find the
> > > connectors to match with EXP slot at one end and have simple wires at
> > > the other end??
>
> > The S3ADSP starter kit board comes with Samtec QTE connectors. The
> > corresponding connectors are series their QSE. The user guide documents the
> > specific type. Samtec was nice enough to send a couple of the required
> > connectors as samples. These are SMT components; you'll have to build a
> > board to bring out the required signals. They also sell QSE terminated
> > cables, although I doubt these are cheap. They are shielded differential
> > pairs. You might look at daughterboards sold for the S3ADSP3400 kits. I
> > don't have experience with the 3400 board. Its accessories might or might
> > not fit the QTE connectors on the 1800 board.
>
> The Avnet EXP Proto module brings the I/Os out to headers. www.em.avnet.com/exp-prototype

Thanks for the link to the EXP Proto module.
But it's a bit expensive for us, having just bought the board. We don't
need a complete protoboard. It would be OK if we could just get the I/O of
the FPGA from the EXP connector, either simply as wires or as headers.

Any

Article: 132524
Subject: Re: Xilinx Clock Doubler
From: Eric Smith <eric@brouhaha.com>
Date: Thu, 29 May 2008 11:52:11 -0700

Peter Alfke wrote:
> Grant, years ago I published a reliable clock doubler circuit, part of
> the "six easy pieces" that seem to be lost.

I repeat my request that the Xilinx marketing and/or web people put all
the old stuff that they unceremoniously removed back into an archive
section of the web or FTP site.

The "six easy pieces" article is exactly the sort of thing that I was
worried would be lost.  :-(

Just because application notes and white papers are old does NOT mean
that they aren't of any use to Xilinx customers.

Eric
