Messages from 45450

Article: 45450
Subject: Re: Translate the design from FPGA to Custom IC
From: B__ S_______ <B___S_____@nonoonnoo.com>
Date: Wed, 24 Jul 2002 05:11:39 GMT

> Actually, I work in an IC design company. My boss wants to develop a low-end
> DSP chip. However, we have little experience in this.
> We think that one of the important building blocks is a small 16x16
> single-cycle multiplier.
> I wrote simple Verilog and synthesised it with the Xilinx WebPACK tools. It
> seems to work.
> Assuming it works, I want to open some output files to see what "circuit"
> is synthesised, because I will design a DSP chip. But I do not know which
> output files contain the gate-level netlist of the synthesised design.

It has been a while since I used Xilinx webpack.  To support gate-level
simulations, webpack can write the synthesized netlist to a standard
Verilog or VHDL file.  (To run a Verilog simulation, all you need are the
included Xilinx primitives library and your gate-level netlist.)
   
> I guess that the Verilog code will be synthesised according to the
> synthesis tool's library. Am I correct? Can I force the synthesis tool to
> synthesise the Verilog code without using a library? (I mean so that the
> design is synthesised at gate level ... AND, OR, XOR ...)

Webpack will only target Xilinx's FPGA parts, which means it'll always
target some kind of Xilinx primitives library. (That's mostly the LUT4
cell primitive.)  Someone correct me if I'm wrong.  If your ASIC-vendor
truly offers a 'FPGA->ASIC conversion flow', they surely will accept the
Xilinx netlist 'as is.'  The ASIC-vendor will worry about the logical
remapping between FPGA-library and ASIC-library.

> Then, can I see the netlist at gate level so that I can study the
> design synthesised by the synthesis tool?

Why do you care what it looks like?  You can export the netlist to
a structural Verilog (ASCII text) file.  This is usually done if you
want to run the netlist through Verilog simulations.  If you want a
'graphical view' breaking down the netlist into intelligible AND, OR,
NOT functions, then I don't know.  Someone else needs to answer this
question.

> You say that:
> >To make sure the synthesized design was synthesized correctly,
> >do a gate-level simulation of the synthesized design.
> >You should be able to run the same testbench code you used for an RTL
> >simulation.
>
> I do not really understand, because I am a beginner in IC design.
> What is the meaning of gate-level simulation? With what kind of tools?
> Modelsim? Xilinx? Or other?

Forgive me if I'm treating you like a novice.  Let's start from the
beginning.  "RTL-simulation" - your source RTL code (that's the Verilog
code you used to synthesize your FPGA-DSP circuit) is instantiated in
a top-level 'testbench file.'  Then you have some waveform 'stimulus',
i.e. you set some inputs on your DSP-model, advance the 'clock'
waveform, then look at the DSP-model's outputs!

A gate-level simulation works the same way.  The difference is the
device under test -- instead of the RTL-source code, here you
instantiate the synthesized-netlist in your testbench.  Once again, you
drive the netlist's inputs, advance the clock, then check the outputs.
If the netlist is functionally identical to the RTL-source, then the
outputs should agree 100%.  If they don't agree, you get to do some
"fun" debugging!

The Verilog-simulator isn't part of webpack.  You have to buy that
separately. (I think Modeltech is popular for this kind of thing.)
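
A minimal sketch of the kind of testbench described above.  The module
name mult16x16, its ports, and the one-cycle registered output are
assumptions made for illustration, not anything taken from the original
design.  For an RTL run, compile the small RTL module shown here; for a
gate-level run, compile the exported netlist (defining the same module
name and ports) together with the vendor's primitive simulation library
instead, and leave the testbench untouched.

```verilog
`timescale 1ns / 1ps

// Hypothetical RTL device under test: a registered 16x16 multiplier.
// Swap this module for the synthesized netlist to run at gate level.
module mult16x16 (
    input  wire        clk,
    input  wire [15:0] a,
    input  wire [15:0] b,
    output reg  [31:0] p
);
    always @(posedge clk)
        p <= a * b;
endmodule

// The testbench is the same for RTL and gate-level simulation.
module tb_mult16x16;
    reg         clk;
    reg  [15:0] a, b;
    wire [31:0] p;
    integer     i, errors;

    mult16x16 dut (.clk(clk), .a(a), .b(b), .p(p));

    initial clk = 1'b0;
    always #5 clk = ~clk;            // free-running clock, 10 ns period

    initial begin
        errors = 0;
        a = 0;
        b = 0;
        for (i = 0; i < 100; i = i + 1) begin
            @(negedge clk);          // drive inputs away from the active edge
            a = $random;
            b = $random;
            @(negedge clk);          // one rising edge has now captured a*b
            if (p !== a * b) begin
                errors = errors + 1;
                $display("MISMATCH: %0d * %0d gave %0d", a, b, p);
            end
        end
        $display("Done, %0d mismatches", errors);
        $finish;
    end
endmodule
```

Because the check is written against the expected product rather than
against a saved waveform, the same pass/fail result applies to both the
RTL and the netlist.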

Article: 45451
Subject: Re: delay pipes in verilog for spartan IIe?
From: "Kevin Neilson" <kevin-neilson@removethistextattbi.com>
Date: Wed, 24 Jul 2002 05:18:14 GMT

I'm not quite sure if that works.  I think if you are writing to port A, the
output on port A isn't the data that was in that address, but rather the
data that you are writing into the A input.  I think it gets routed directly
through.  Check the timing diagrams in the datasheet.  I think the blockRAM
on Virtex2 allows you to circumvent this mode, but not V-E/Spartan-IIe.
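
A behavioral sketch of the write-through port behavior described above,
for reasoning about the pipe in simulation.  The module name and the
AW/DW parameters are hypothetical, and this is a plain simulation model
of one port, not the Xilinx primitive itself.

```verilog
// One RAM port whose output shows the data being written
// ("write-first"), as opposed to the old contents of the address.
module ram_write_first #(
    parameter AW = 8,                 // address width (assumed)
    parameter DW = 16                 // data width (assumed)
) (
    input  wire          clk,
    input  wire          we,
    input  wire [AW-1:0] addr,
    input  wire [DW-1:0] din,
    output reg  [DW-1:0] dout
);
    reg [DW-1:0] mem [0:(1<<AW)-1];

    always @(posedge clk) begin
        if (we) begin
            mem[addr] <= din;
            dout      <= din;         // new data is routed to the output
        end else begin
            dout      <= mem[addr];   // ordinary registered read
        end
    end
endmodule
```

With we held high on every clock, dout is simply din delayed by one
clock no matter how the address cycles, which is why reading and
writing through the same port would not give an n-deep delay.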

"John_H" <johnhandwork@mail.com> wrote in message
news:3D3DFC80.43C0EF93@mail.com...
> I like the BlockRAM idea.  But - to get 32 bit mode - use the same read and
> write address (cycle through n addresses) and use port A as the upper 16 bits
> and port B as the lower 16 bits.  The output data is the registered version of
> the memory at that address so there aren't any timing problems.  The memory can
> look like a 32 wide by 128 deep in this mode.  Only 12 memories needed which
> would fit in a Spartan-IIE 150!  Lots of extra logic to develop a "pong" game
> as well.
>
> I tried compiling an array of 384x64 regs in Synplify and I quit after 8
> minutes of compiling.  Synplify usually goes much faster!  The SRL16s with
> arrays of instantiations would probably work much faster in Synplify but I don't
> know if Webpack supports the "arrays of instances" which - I believe - was
> Verilog 1995, just not supported by many.
>
> The BlockRAM approach would work out so very nice.
>
>
> Kevin Neilson wrote:
>
> > John,
> > You definitely don't want to use flops for this.  (I don't think there are
> > even enough; you need 24500 flops and that part only has about 6000 I
> > think.)  You could use the SRLs, which are 16-bit shift registers.  You
> > would need about 1530 of these, and the part has about 6000 of these.
> > Better yet would be to use the dual-port blockRAMs.  You could use 24 of
> > these in a 16wide x 256deep configuration.  Then you just set the write and
> > read addresses 16 apart to get a 16-deep pipe.  If your clock is slow
> > enough, you split up the RAM and use only six RAMs.  This part has about 32
> > blockRAMs I think.
> > -Kevin
> >
> > "John Hovell" <jhovell@yahoo.com> wrote in message
> > news:9402973.0207231559.2030b94a@posting.google.com...
> > > Hello all --
> > >
> > > I am trying to implement a delay pipe that is 384 bits wide and 64
> > > bits long in Verilog.
> > >
> > > I was trying to build one out of fairly simple D-flops, but my design
> > > has been "synthesizing" in Xilinx Web Pack for nearly 2 hours now, so
> > > I think I must have done something wrong.
> > >
> > > Is there an efficient or correct way to implement such a pipe on a
> > > 300K gate Spartan IIe?  I think the size should be OK since a Spartan
> > > IIe 300K gate could theoretically make a 98kbit distributed memory...
> > > but maybe I am missing something here.
> > >
> > > TiA for any help, pointers, etc.
> > >
> > > Cheers,
> > > John
>

Article: 45452
Subject: Re: How to implement efficient wide word comparator?
From: John_H <johnhandwork@mail.com>
Date: Tue, 23 Jul 2002 22:41:20 -0700

You're trying to @(posedge clk) increment the counter and provide a
comparison value on... the new value?  The old value?  In the telecom
stuff I worked with, there were typically frame counters to track the
bytes and provide gates for various operations.  If you only need one
gate signal, things are too simple.  If you need a separate gate for
each of 20 bit positions, it's a little tougher but your speeds should
be extreme with a little care.

If you're doing an equality compare for each gate, there are two ways to
do it, with a tree or a carry chain.  I'll be playing with my first
Virtex-II in a week or two but I've heard the carry chains aren't as
effective as they were in the Virtex-E parts but they should still
provide excellent results.

A 14 bit *constant* equality compare in a tree would require 3.5 LUTs
for the first level of comparison and another LUT to assemble all those
together.  Since there are 4 slices (8 LUTs) in one Virtex-II CLB, this
should scream!  If it's a variable equality compare, the 7 LUTs feeding
2 LUTs feeding 1 final LUT isn't as clean but you should still get great
speed.  One of the key factors is that the *registered* count value
needs to be compared to a constant or a *registered* comparison value.

The carry chain is probably better for a 14 bit equality compare since
the 7 LUTs can cascade into one carry chain.  If you want to do a 98 bit
equality compare, you could assemble the 7 bit carry chains into a
series of (horizontal) cascade ORs (if that's what they're called - I
won't look it up now).

The point is, things should scream in either format compared to the
speeds you're getting.

Check out your logic and routing delays to see how your timing goes from
source register to destination.  Ask yourself if some of the stages can
be pipelined.  One of the beautiful things about counters is that they
increment predictably!  (Unless they decrement)

You could assemble a huge comparison tree and register each level to
attain outrageous pipelined speeds.  Look at your requirements and
figure out what you can back into a previous pipeline stage.  Very good
things should come together with nice design work.

An example of a counter with a single compare output (apologies if
you're VHDL):

always @(posedge clk)
  if( count == max_count ) count <= 0 + ena;
  else                     count <= count + ena;
assign out_gate = (count == max_count);

The structure above isn't very efficient because a wide compare is
needed in the logic while it isn't needed in the design.  The logic may
not synthesize into a simple counter, either, requiring two stages of
logic for the counter to add to the compare.

You could use a registered compare of

  out_gate <= (count == max_count - 1) & ena;

which (in the always block) has the gate go active when you want it.

But you could do better by resetting your counter with a different
value:

always @(posedge clock)
  if( out_gate )  {out_gate,count} <= {1'b0,-max_count} + ena;
  else            {out_gate,count} <= count + ena;

Note that the gate is now synchronous and there is NO compare required. 
(Apologies that things look a little strange... the constant "max_count"
should be dimensioned the same as the "count" vector so the out_gate
initializes properly false)
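
A self-contained sketch of the compare-free structure above, with the
reload constant dimensioned to the counter width as the note suggests.
The module name, the WIDTH and MAX_COUNT parameters, and the initial
block are assumptions added for illustration; if your synthesis tool
ignores initial blocks, a reset would be needed instead.

```verilog
// Counter whose gate output is the carry out of the increment,
// so no equality comparator is needed.
module gate_counter #(
    parameter WIDTH     = 14,
    parameter MAX_COUNT = 1000        // assumed 1 <= MAX_COUNT < 2**WIDTH
) (
    input  wire clock,
    input  wire ena,
    output reg  out_gate
);
    // Two's complement of MAX_COUNT, truncated to the counter width.
    localparam [WIDTH-1:0] RELOAD = -MAX_COUNT;

    reg [WIDTH-1:0] count;

    // Assumed power-up state: begin with a reload pending.
    initial {out_gate, count} = {1'b1, {WIDTH{1'b0}}};

    always @(posedge clock)
        if (out_gate) {out_gate, count} <= {1'b0, RELOAD} + ena;
        else          {out_gate, count} <= {1'b0, count}  + ena;
endmodule
```

With ena held high, out_gate pulses for one clock every MAX_COUNT
clocks: the counter starts from RELOAD + 1 and the carry out appears
when the count rolls over to zero.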

The structure can be made "synthesis friendly" to use one level of
synthesized logic (if it doesn't already) by using an equation that's
more friendly to the Xilinx carry chain configuration:

always @(posedge clock)
  {out_gate,count} <= (out_gate ? {1'b0,-max_count} : count) + ena;

The conditional operator works in place of the if/else construct and
"fits" in the carry structure.

Many things to do.  Happy coding!

- John_H

Sniper Daryl wrote:
> 
> Here,
> 
>    I am Daryl and I have to trouble you. :-)
> 
>    When I design a chip used for optical network, a lot of effort must
> be made to increase the clock speed and reduce the chip resource cost.
> In a timing interface module, there is a counter with 14-bit width to
> provide timing to the outgoing frame. So, a comparator used to compare
> the counter word with a series of registers set by the controller.
> I've noticed that the slice cost increases seriously and the maximum
> clock speed decreases a lot when the counter and the comparator get
> wider.
> 
>    Troubled with it, I firstly tried a wider counter(14-bit) and a
> narrower comparator(4-bit) and got 20MHz upgrade of speed and more
> than 20 slices saving. Then, a 4-bit counter and 14-bit comparator
> with a result of 10MHz upgrade and about 10 slices saving. So, I think
> the critical factor is the wide comparator. This is proved by studying
> the report and schematics from the synthesis tools(FCII3.6.1 and
> Synplify Pro with Amplify).
> 
>    To improve the performance, I tried using the CoreGen tool to
> generate a comparator core. But after implementation, the result is no
> better than from my own code.
>
>    The synthesis tool I used is FCII 3.6.1, the device is
> Virtex-II 1000, implemented with ISE 4.2 SP3. Here are the results of my
> trials (each also includes other logic):
>
>    Counter   Comparator   Slices   FFs   LUTs   Max clock
>    14-bit    14-bit         63      36    105     95 MHz
>     4-bit    14-bit         50      26     85    115 MHz
>    14-bit     4-bit         41      26     62    127 MHz
> 
>    Would you give me some advice about it from your experience? Or
> some resource to study?
> 
> 
> 
> Thanks in advance for you time!
> 
> Daryl

Article: 45453
Subject: Re: delay pipes in verilog for spartan IIe?
From: "John Hovell" <jhovell@yahoo.com>
Date: Tue, 23 Jul 2002 23:20:25 -0700

Kevin --

Thanks very much for the suggestion!  SRL's sound like a great bet.

As many others have very helpfully pointed out, for a simple delay pipe
Block RAM's are an even better choice, but I am already using all 16 block
RAM's for another part of my design :-(... and my delay pipe isn't quite a
simple delay pipe -- specifically I need to read a few values in the middle.

I'm hunting around right now for the instantiation syntax for an SRL on the
net... Is there a primitive that I can call so one of these is inferred?  It
seems the PRNG (Xilinx app 211) just uses some fancy compiler ifdef
statements to get the right piece of hardware inferred.  I'm sure I can find
this info on the 'net so I don't want to bother anyone with simple
questions... however if someone feels compelled to clue me in, I certainly
won't mind ;-).

The only reason SRL's might not work is that I need to read *some* values in
the delay pipe (i.e. the first column and a sort of diagonal row through the
first half of it:  Total bits: 384*2 = 768).  Hopefully I can read values
that are in the shift registers....
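
For reference, a sketch of the coding style that synthesis tools
commonly map onto a LUT shift register: a 16-bit shift register with a
clock enable, no reset, and a single addressable tap.  The module and
port names are hypothetical, and whether a given tool version actually
infers an SRL from this (rather than 16 flops) is an assumption to
check in the synthesis report.

```verilog
// One bit lane of a delay pipe, written so it can map to a
// 16-bit LUT shift register with clock enable.
module srl16_inferred (
    input  wire       clk,
    input  wire       ce,
    input  wire [3:0] addr,   // tap select: q is d delayed by addr+1 clocks
    input  wire       d,
    output wire       q
);
    reg [15:0] sr;

    // No reset: a reset on every stage would force ordinary flops.
    always @(posedge clk)
        if (ce)
            sr <= {sr[14:0], d};

    // Only one stage is read out at a time.
    assign q = sr[addr];
endmodule
```

Since only one stage per shift register is visible at a time, reading
several points in the middle of the pipe would mean splitting the chain
into shorter registers at each point that has to be observed.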

Thanks everyone for your help!

Cheers,
John


"Kevin Neilson" <kevin-neilson@removethistextattbi.com> wrote in message
news:sPm%8.142580$uw.86229@rwcrnsc51.ops.asp.att.net...
> John,
> You definitely don't want to use flops for this.  (I don't think there are
> even enough; you need 24500 flops and that part only has about 6000 I
> think.)  You could use the SRLs, which are 16-bit shift registers.  You
> would need about 1530 of these, and the part has about 6000 of these.
> Better yet would be to use the dual-port blockRAMs.  You could use 24 of
> these in a 16wide x 256deep configuration.  Then you just set the write and
> read addresses 16 apart to get a 16-deep pipe.  If your clock is slow
> enough, you split up the RAM and use only six RAMs.  This part has about 32
> blockRAMs I think.
> -Kevin
>
> "John Hovell" <jhovell@yahoo.com> wrote in message
> news:9402973.0207231559.2030b94a@posting.google.com...
> > Hello all --
> >
> > I am trying to implement a delay pipe that is 384 bits wide and 64
> > bits long in Verilog.
> >
> > I was trying to build one out of fairly simple D-flops, but my design
> > has been "synthesizing" in Xilinx Web Pack for nearly 2 hours now, so
> > I think I must have done something wrong.
> >
> > Is there an efficient or correct way to implement such a pipe on a
> > 300K gate Spartan IIe?  I think the size should be OK since a Spartan
> > IIe 300K gate could theoretically make a 98kbit distributed memory...
> > but maybe I am missing something here.
> >
> > TiA for any help, pointers, etc.
> >
> > Cheers,
> > John
>
>

Article: 45454
Subject: Re: How to implement efficient wide word comparator?
From: "Giuseppe³" <gziggio.pleasedontsendmeanything@tin.it>
Date: Wed, 24 Jul 2002 08:33:52 +0200

I'm not sure I understand your problem well, but to divide a clock source
you can try using the LUT RAM.
You have to configure the LUT as a shift register and use it as follows:

U_SRL16_1 : SRL16
        -- synopsys translate_off
        generic map (INIT => X"0001")
        -- synopsys translate_on
        port map (
            D   => srl_out,
            CLK => CLK25M,
            A0  => one,                   -- 16 division
            A1  => one,
            A2  => one,
            A3  => one,
            Q   => srl_out);

To increase the clock division you can use more LUTs in sequence.

This uses only one slice.

Hope this is useful

Regards
Giuseppe


"Sniper Daryl" <e-engineer@eastday.com> ha scritto nel messaggio
news:289dc5a9.0207232014.5ca2f487@posting.google.com...

> In a timing interface module, there is a counter with 14-bit width to
> provide timing to the outgoing frame. So, a comparator used to compare
> the counter word with a series of registers set by the controller.
> I've noticed that the slice cost increases seriously and the maximum
> clock speed decreases a lot when the counter and the comparator get
> wider.
>
<cut>

Article: 45455
Subject: Re: Translate the design from FPGA to Custom IC
From: "Reala" <manfield.chow@scoreconcept.com>
Date: Wed, 24 Jul 2002 14:42:49 +0800

Dear B S,

Thank you for your help.
Your answer is very detailed. It taught me a lot about ASIC design.

In your post, you say:
> Webpack will only target Xilinx's FPGA parts, which means it'll always
> target some kind of Xilinx primitives library. (That's mostly the LUT4
> cell primitive.)  Someone correct me if I'm wrong  If your ASIC-vendor
> truly offers a 'FPGA->ASIC conversion flow', they surely will accept the
> Xilinx netlist 'as is.'  The ASIC-vendor will worry about the logical
> remapping between FPGA-library and ASIC-library.

If the netlist includes LUT4 cells, then how do I change them into a "circuit"
when I implement this in an ASIC? Is that done by the ASIC vendor? You say that
the ASIC vendor will worry about the remapping? So, what is the normal
development flow for an ASIC starting from Verilog?
As I am a beginner in IC design, I am not sure about this.

Thanks again ^_^

Reala

"B__ S_______" <B___S_____@nonoonnoo.com> wrote in message
news:3D3E3756.20AA602B@nonoonnoo.com...
> > Actually, I work in an IC design company. My boss wants to develop a
> > low-end DSP chip. However, we have little experience in this.
> > We think that one of the important building blocks is a small 16x16
> > single-cycle multiplier.
> > I wrote simple Verilog and synthesised it with the Xilinx WebPACK tools.
> > It seems to work.
> > Assuming it works, I want to open some output files to see what "circuit"
> > is synthesised, because I will design a DSP chip. But I do not know which
> > output files contain the gate-level netlist of the synthesised design.
>
> It has been a while since I used Xilinx webpack.  To support gate-level
> simulations, webpack can write the synthesized netlist to a standard
> Verilog
> or VHDL file.  (To run a Verilog simulation, all you need are the
> included Xilinx primitives library and your gate-level netlist.)
>
> > I guess that the Verilog code will be synthesised according to the
> > synthesis tool's library. Am I correct? Can I force the synthesis tool to
> > synthesise the Verilog code without using a library? (I mean so that the
> > design is synthesised at gate level ... AND, OR, XOR ...)
>
> Webpack will only target Xilinx's FPGA parts, which means it'll always
> target some kind of Xilinx primitives library. (That's mostly the LUT4
> cell primitive.)  Someone correct me if I'm wrong  If your ASIC-vendor
> truly offers a 'FPGA->ASIC conversion flow', they surely will accept the
> Xilinx netlist 'as is.'  The ASIC-vendor will worry about the logical
> remapping between FPGA-library and ASIC-library.
>
> >  Then, can i see the netlist in  gate level such that I can study the
> > design synthesised by the synsthesis tool?
>
> Why do you care what it looks like?  You can export the netlist to
> a structural Verilog (ASCII text) file.  This is usually done if you
> want to run the netlist through Verilog simulations.  If you want a
> 'graphical view' breaking down the netlist into intelligible AND, OR,
> NOT
> functions, then I don't know.  Someone else needs to answer this
> question.
>
> > You say that:
> > >To make sure the synthesized design was synthesized correctly,
> > >do a gate-level simulation of the synthesized design.
> > >You should be able to run the same testbench code you used for an RTL
> > simulation.
> >
> > I am not really understand because I am a beginner of IC design.
> > what is the meaning of gate-level simulation? by what kind of tools?
> > Modelsim? Xilinx? or other?
>
> Forgive me if I'm treating you like a novice.  Let's start from the
> beginning.  "RTL-simulation" - your source RTL code (that's the Verilog
> code you used to synthesize your FPGA-DSP circuit) is instantiated in
> a top-level 'testbench file.'  Then you have some waveform 'stimulus',
> i.e. you set some inputs on your DSP-model, advance the 'clock'
> waveform, then look at the DSP-model's outputs!
>
> A gate-level simulation works the same way.  The difference is the
> device under test -- instead of the RTL-source code, here you
> instantiate
> the synthesized-netlist in your testbench.  Once again, you drive the
> netlist's inputs, advance the clock, then check the outputs.  If
> the netlist is functionally identical to the RTL-source, then the
> outputs
> should agree 100%.  If they don't agree, you get to some "fun"
> debugging!
>
> The Verilog-simulator isn't part of webpack.  You have to buy that
> separately. (I think Modeltech is popular for this kind of thing.)

Article: 45456
Subject: Field Programmable SoC's
From: Michael Cozza <Michael.Cozza@verizon.net>
Date: Wed, 24 Jul 2002 06:46:04 GMT

I'm looking at the possibility of doing a new design with a programmable (or "configurable") System-on-chip (SoC) device, which is basically a CPU, memory, FPGA and some peripheral devices on a single chip.

(If this is the wrong newsgroup for this question, I apologize and ask for direction.)

So far, I've located the Cypress PSoC, Atmel FPSLIC and Triscend E5 - all with 8-bit MCU's on board.

Can anyone who has used any of these chips comment on their performance, reliability, ease of use, etc.?  How about the development systems?

 From what I've seen so far, I like the Atmel parts but I'm leaning toward the Cypress because of lower development system cost.

Thanks,
Mike

Article: 45457
Subject: Field Programmable SoC's
From: jaideep@sasken.com (jaideep)
Date: 23 Jul 2002 23:56:31 -0700

This is a reply to Michael Cozza posting.

Try STRATIX series of programmable SOCs from Altera.
Regards.
Jaideep

Article: 45458
Subject: Re: 32-bit PCI Target core
From: "BROTO Laurent" <lbroto@free.fr>
Date: Wed, 24 Jul 2002 09:08:07 +0200

Try http://www.opencores.org/projects/pci/ .
You'll find an open source PCI IP core there.
I succeeded in compiling it for a Spartan-II 2S200 and a Spartan-II 2S150.

Regards,

BROTO Laurent

"Jeff Reeve" <teamreeve@attbi.com> a écrit dans le message news:
_jq%8.642847$cQ3.104066@sccrnsc01...
> I'm looking for a synthesizable 32-bit 33MHz PCI Target only design to be
> placed into a FPGA or large CPLD. Minimal implementation is fine. Does
> anybody know if such a thing is available in VHDL or Verilog and is open
> sourced? I seem to recall Xilinx publishing a target only design quite some
> time ago but I can no longer find it on their web site.
>
> Any help is much appreciated!
> Jeff
>
>

Article: 45459
Subject: Re: RLOC Origin problems in ISE4.2sp3?
From: Stephan Neuhold <stephan.neuhold@xilinx.com>
Date: Wed, 24 Jul 2002 08:37:23 +0100

Hi Ray,

I have just tested the RLOC_ORIGIN attribute in 4.2.03i and it works for
me. I was using XST and Synplify. I agree I just put the attribute onto
FFs but it worked. I know that the RLOC_RANGE attribute only gets picked
up via the ucf file but the RLOC_ORIGIN should work.

Regards,
Stephan

Ray Andraka wrote:

> Fellow experts:
>
> I'm hoping someone has fallen on this one and found a workable solution.
> So far the Xilinx hotline has been unusually unresponsive in handling this
> case (they've had it since 7/12 and haven't even acknowledged whether
> or not they see the problem with the test case I sent).  Seems the hotline
> is not as responsive and helpful as it once was, which is a shame.
> Anyway, this is the case:
>
> RLOC_ORIGIN being ignored by placer. The macro shows up in the
> correct position in the floorplanner but with the G and Y elements
> missing (that is a separate issue which is still an open case). The RLOC
> origin is being ignored by the place and route, so the macro is not landing
> in the specified position. It is critical I be able to specify the RLOC
> origins in this design (2V6000, 200 MHz, high utilization). Is this a known
> problem? (I don't see it in the answers database.) I tried adjusting the
> RLOC origin by single slice steps as suggested in answer record 12192
> to no avail.
>
> The RPMs are not created under FPGA editor, rather they are RLOC'd in the
> VHDL source. That gets me relative locations for each BEL in the design
> which in turn give me a macro that I should be able to place or let the
> tools place. I am trying to put an RLOC_ORIGIN on it by adding an
> RLOC_ORIGIN attribute to the UCF file. The syntax is correct, and indeed
> the part of the RPM that is not decimated by the floorplanner shows up in
> the floorplanner editable window in the correct position (there is a
> previously reported bug in the floorplanner that prevents the RPM from
> showing correctly unless you first go through auto PAR, then constrain from
> placement, then unbind and bind the RPM). However, when the design is run
> through the PAR, the macro is placed at some location other than that
> indicated by the RLOC_ORIGIN.
>
> Thanks in advance for any info you may have seen on this problem.
>
> --
> --Ray Andraka, P.E.
> President, the Andraka Consulting Group, Inc.
> 401/884-7930     Fax 401/884-7950
> email ray@andraka.com
> http://www.andraka.com
>
>  "They that give up essential liberty to obtain a little
>   temporary safety deserve neither liberty nor safety."
>                                           -Benjamin Franklin, 1759

Article: 45460
Subject: Re: Field Programmable SoC's
From: Uwe Bonnes <bon@elektron.ikp.physik.tu-darmstadt.de>
Date: Wed, 24 Jul 2002 08:05:33 +0000 (UTC)

Michael Cozza <Michael.Cozza@verizon.net> wrote:
: I'm looking at the possibility of doing a new design with a programmable
: (or "configurable") System-on-chip (SoC) device, which is basically a CPU,
: memory, FPGA and some peripheral devices on a single chip. 
  
: (If this is the wrong newsgroup for this question, I apologize and ask for
:direction.) 

: So far, I've located the Cypress PSoC, Atmel FPSLIC and Triscend E5 - all
: with 8-bit MCU's on board. 

: Can anyone who has used any of these chips comment on their performance,
: reliability, ease of use, etc.?  How about the development systems? 

:  From what I've seen so far, I like the Atmel parts but I'm leaning toward
: the Cypress because of lower development system cost. 

There are several synthesizable CPU cores available from the different
manufacturers. Whether you can use them depends on your processing power needs.

Bye
-- 
Uwe Bonnes                bon@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

Article: 45461
Subject: Re: Editing constraints in WebPack
From: Walter Dvorak <walter.dvorak@remove.gmx.at>
Date: 24 Jul 2002 10:55:23 +0200

"Børge Strand" <borge.strand.remove.if.not.spamming@sintef.no> wrote:
> errors when parsing the .ucf-file. What is the recommended way to maintain
> constraints?

	Use your favorite text editor and edit the .ucf file directly. 

	(e.g. UltraEdit/Win32 is quite a fine thing)

WD
--

Article: 45462
Subject: Re: Field Programmable SoC's
From: Jim Granville <jim.granville@designtools.co.nz>
Date: Wed, 24 Jul 2002 22:45:46 +1200

Michael Cozza wrote:
> 
> I'm looking at the possibility of doing a new design with a programmable (or "configurable") System-on-chip (SoC) device, which is basically a CPU, memory, FPGA and some peripheral devices on a single chip.
> 
> (If this is the wrong newsgroup for this question, I apologize and ask for direction.)
> 
> So far, I've located the Cypress PSoC, Atmel FPSLIC and Triscend E5 - all with 8-bit MCU's on board.
> 
> Can anyone who has used any of these chips comment on their performance, reliability, ease of use, etc.?  How about the development systems?
> 
>  From what I've seen so far, I like the Atmel parts but I'm leaning toward the Cypress because of lower development system cost.
> 

You should give more info on what you want the SoC device to do, both
core and peripherals.

Cypress has the lowest 'configurable' quotient, but it does have Analog
capability that the others lack, and is more a Flash-uC with Quasi-Smart
peripherals.  If that level of peripheral is enough for your task, this
will be the cheapest solution.
Remember this family has only ONE variant per package, so don't over-run
the code budget.

The E5 and FPSLIC are FPGAs with a RAM based uC alongside (no analog).
Both need a Boot loader, so are actually two chip solutions [So2C]
- as such, you should include a Std FLASH uC + CPLD on the PSoC map.
This two chip pathway offers multi sourcing, and much more chance to
better fit the uC and Logic to the task at hand, and the consequences of
missing the code budget are not so drop-dead.

- jg

Article: 45463
Subject: Re: Clock-gating in Virtex-E parts
From: ed.moore@snellwilcox.com (Edward Moore)
Date: 24 Jul 2002 03:50:09 -0700

You can gate the clock using the enable pin on the Virtex-E global clock buffer.

You would need to create a hard macro in Fpga Editor containing a
GCLKBUF with the I, O and CE i/o's as external pins. I suggest driving
the CE pin from a FF clocked off the falling edge of the ungated clock.

Beware : there seems to be a bug with the GCLKBUF primitive due to swapped
configuration bits; the workaround is to set the CEMUX option to '1' not CE.

I have tried this. It works.
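
A behavioral sketch of why the enable is best registered on the falling
edge.  This is only a simulation model with hypothetical names; the AND
gate stands in for the clock buffer's enable and is not meant to be
synthesized into fabric logic.

```verilog
// Glitch-free clock gating model: the enable can only change
// while clk is low, so gclk never sees a truncated high pulse.
module gated_clock_model (
    input  wire clk,
    input  wire enable,
    output wire gclk
);
    reg enable_q;

    initial enable_q = 1'b0;          // assumed start-up state: clock gated off

    // Capture the enable on the falling edge of the ungated clock.
    always @(negedge clk)
        enable_q <= enable;

    // Stand-in for the clock buffer with its enable input.
    assign gclk = clk & enable_q;
endmodule
```

If enable_q were instead updated on the rising edge, it could change
while clk is high and produce a short runt pulse on gclk.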

--
Edward Moore


Jason Crawford <jace@cisco.com> wrote in message news:<3D3BAC69.A03017A@cisco.com>...
> Hi,
> 
> Apart from using clock-enables, does anyone know of any
> way to use clock-gating in Virtex-E parts? 
> 
> We have a design that is partially written for an ASIC 
> target and expects to see a gated clock. Rather than have 
> to get the designers to pore through the code and add
> clock enables to all flip flops (I can hear teeth gnashing
> already) I am hoping against hope that someone has an 
> alternate answer to this rather difficult problem.
> 
> yours in hope,
> Jason.

Article: 45464
Subject: Re: How's the FPGA design job market near you??
From: Ray Andraka <ray@andraka.com>
Date: Wed, 24 Jul 2002 12:22:14 GMT

The key to consulting is to offer more value than the cost of using your
services.  Generally, that means developing an expertise that is difficult
or too expensive to cultivate in-house.  A good consultant can offer a
breadth of experience that just isn't available anywhere else.


B__ S_______ wrote:

> My impression working in Southern California (so I have a small view of
> the EE world!) is US companies prefer to keep project development
> in-house.  If an outside vendor already has a finished, working IP-block,
> the company will consider buying it.  In essence trading money for time
> (the idea being they can 'buy' the IP block and just drop it in their
> current project.)
>
> Otherwise, the company will have to invest money AND time (since the
> contractor's development-cycle would be non 0-day.)  In that case,
> for a little additional expenditure, the company can keep the
> development expertise in-house, so why pay someone else to learn
> your project?
>
> I know there are some success stories when it comes to design
> services companies.  But given the size of the fabless semiconductor
> industry, wouldn't you expect there to be *more* design services
> consulting?

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 45465
Subject: Re: RLOC Origin problems in ISE4.2sp3?
From: Ray Andraka <ray@andraka.com>
Date: Wed, 24 Jul 2002 14:39:34 GMT

I tried the RLOC origin on a smaller macro in the design, and it works fine
in that case.  The failing macro has tiles which come from a different edif
netlist.  It is a fairly large macro as well (24x80 slices).  The size
and/or the fact that its netlist comes from nested edif files may be
contributing factors.  The odd thing is that the RLOC_ORIGIN is recognized
in the editable view of the floorplanner (which pulls its info from the ucf
and ngd files), but is being ignored in place and route.

Stephan Neuhold wrote:

> Hi Ray,
>
> I have just tested the RLOC_ORIGIN attribute in 4.2.03i and it works for
> me. I was using XST and Synplify. I agree I just put the attribute onto
> FFs but it worked. I know that the RLOC_RANGE attribute only gets picked
> up via the ucf file but the RLOC_ORIGIN should work.
>
> Regards,
> Stephan
>
> Ray Andraka wrote:
>
> > Fellow experts:
> >
> > I'm hoping someone has fallen on this one and found a workable solution.
> > So far the Xilinx hotline has been unusually unresponsive in handling this
> > case (they've had it since 7/12 and haven't even acknowledged whether
> > or not they see the problem with the test case I sent).  Seems the hotline
> > is not as responsive and helpful as it once was, which is a shame.
> > Anyway, this is the case:
> >
> > RLOC_ORIGIN being ignored by placer. The macro shows up in the
> > correct position in the floorplanner but with the G and Y elements
> > missing (that is a separate issue which is still an open case). The RLOC
> > origin is being ignored by the place and route, so the macro is not landing
> > in the specified position. It is critical I be able to specify the RLOC
> > origins in this design (2V6000, 200 MHz, high utilization). Is this a known
> > problem? (I don't see it in the answers data base.) I tried adjusting the
> > RLOC origin by single slice steps as suggested in answer record 12192
> > to no avail.
> >
> > The RPMs are not created under FPGA editor, rather they are RLOC'd in the
> > VHDL source. That gets me relative locations for each BEL in the design
> > which in turn give me a macro that I should be able to place or let the
> > tools place. I am trying to put an RLOC_ORIGIN on it by adding an
> > RLOC_ORIGIN attribute to the UCF file. The syntax is correct, and indeed
> > the part of the RPM that is not decimated by the floorplanner shows up in
> > the floorplanner editable window in the correct position (there is a
> > previously reported bug in the floorplanner that prevents the RPM from
> > showing correctly unless you first go through auto PAR, then constrain from
> > placement, then unbind and bind the RPM). However, when the design is run
> > through the PAR, the macro is placed at some location other than that
> > indicated by the RLOC_ORIGIN.
> >
> > Thanks in advance for any info you may have seen on this problem.
> >

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 45466
Subject: Re: Xilinx NGDBuild -sd option in Project Navigator?
From: "D Brown" <dbrown123@shaw.ca>
Date: Wed, 24 Jul 2002 09:29:30 -0600

I found it. It's under Implement Design->Properties, then on the Translate
Properties Tab, it's called Macro Search Path. Hmm, I wish the Project
Navigator documentation correlated better with the command line interface
documents. Oh well.
Dave
"D Brown" <dbrown123@shaw.ca> wrote in message
news:ahhr1c$dfn$1@pallas.novatel.ca...
> Is there any way to specify the -sd (search directory) option for NGDBuild
> from the Project Navigator for Xilinx 4.2i software? Or am I limited to only
> being able to do this at the command line?
> Thanks,
> Dave
>
>
>

Article: 45467
Subject: Re: delay pipes in verilog for spartan IIe?
From: John_H <johnhandwork@mail.com>
Date: Wed, 24 Jul 2002 16:26:45 GMT

Thanks, Kevin, I stand corrected.  Identical read and write addresses pass the
input data to the output according to the libraries guide,

   http://toolbox.xilinx.com/docsan/xilinx4/data/docs/lib/dsgnelpr30.html

Since the Spartan-IIE 300 only has 16 block rams,

  http://www.xilinx.com/partinfo/ds077_1.pdf

the way to fully implement the pipeline would be to use bunches and bunches of
SRL16 elements.  (Half could be SRLs and half be memory but this gets weird)

I'd recommend putting together a small submodule that's 4 cascaded SRL elements
for the 64 bit delay (or make it less to account for input/output registers, for
instance) and instantiate that module 384 times.  The arrays of instances are
wonderful to do this;  if you don't have that capability, another module
instantiating 16 of those modules could, itself, be instantiated 24 times for a
total of 384 delay elements.
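
To make the 4 x SRL16 = 64 clock arithmetic concrete, here is a small behavioural sketch in Python rather than HDL. The class names `Srl16` and `DelayLine` are invented for illustration, and the model only counts clock edges; it says nothing about real timing or about how the Xilinx primitive is instantiated.

```python
class Srl16:
    """Behavioural model of a 16-bit shift-register LUT with a selectable tap."""

    def __init__(self, addr=15):
        self.bits = [0] * 16
        self.addr = addr          # tap address 0..15 gives a delay of addr + 1 clocks

    @property
    def q(self):
        return self.bits[self.addr]

    def clock(self, d):
        # Shift the new bit in at position 0 and drop the oldest bit
        self.bits = [d] + self.bits[:-1]


class DelayLine:
    """A cascade of SRL16 models, all clocked on the same edge."""

    def __init__(self, stages=4):
        self.stages = [Srl16() for _ in range(stages)]

    def clock(self, d):
        """Apply one clock edge and return the output seen just before that edge."""
        outs = [s.q for s in self.stages]   # sample every output before anything shifts
        inputs = [d] + outs[:-1]            # each stage is fed by the previous stage
        for stage, x in zip(self.stages, inputs):
            stage.clock(x)
        return outs[-1]


# Push a single 1 through the line and note when it comes out
line = DelayLine(stages=4)
outputs = [line.clock(1 if n == 0 else 0) for n in range(80)]
delay = outputs.index(1)

# 16 sub-modules per wrapper, 24 wrappers, 4 SRL16s per delay element
delay_elements = 16 * 24
srl16_count = delay_elements * 4
```

Under these assumptions the pulse appears 64 clocks after it went in, and the full 384-bit pipe costs 1536 SRL16s before any input or output registers are counted.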

- John_H

Kevin Neilson wrote:

> I'm not quite sure if that works.  I think if you are writing to port A, the
> output on port A isn't the data that was in that address, but rather the
> data that you are writing into the A input.  I think it gets routed directly
> through.  Check the timing diagrams in the datasheet.  I think the blockRAM
> on Virtex2 allows you to circumvent this mode, but not V-E/Spartan-IIe.

Article: 45468
Subject: Re: delay pipes in verilog for spartan IIe?
From: John_H <johnhandwork@mail.com>
Date: Wed, 24 Jul 2002 16:35:58 GMT

Go to http://toolbox.xilinx.com/docsan/xilinx4/manuals.htm and look at the
"Libraries Guide" which has all the primitives listed.  You can use an SRL16
with or without enable.

If you need to tap off a fixed diagonal, the task is pretty straightforward but
the coding (3 levels of module hierarchy) isn't as clean.  Do you know about
parameterized modules?

You can have a chain of 4 SRL16s to get your 64 bit delay.  To "tap" an element
in the middle at a fixed address, you can either daisy-chain two shorter SRL16s
together (you can program them for delays of 1 to 16, inclusive) or you can tap
off the feed between the 16 long delays and feed an SRL16 in parallel with the
fixed chain.  You can hard code the address or select which of those 16 taps you
want dynamically if necessary (but the bits that make the selection could have a
HUGE fanout!  384 bits?!).
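
One way to picture the "side SRL16 fed from the point between two 16-long delays" option is as a mapping from the wanted tap delay to a feed point and an address. The Python sketch below is only an illustration: `tap_location` and `tapped_delay` are hypothetical helper names, and delays are counted in whole clocks with no extra pipeline registers.

```python
def tap_location(delay):
    """Map a tap delay of 1..64 clocks onto (full 16-deep stages, SRL16 address).

    The first number is how many complete SRL16s of the fixed chain the data
    passes through before it feeds the side SRL16; the second is the address
    to hard-code on that side SRL16 (address a gives a delay of a + 1 clocks).
    """
    if not 1 <= delay <= 64:
        raise ValueError("delay must be between 1 and 64 clocks")
    return (delay - 1) // 16, (delay - 1) % 16


def tapped_delay(samples, delay):
    """Reference behaviour of a tap: the input seen 'delay' clocks late, zeros before."""
    return ([0] * delay + list(samples))[:len(samples)]


# Example: a tap 40 clocks into the 64-clock pipe
stages, addr = tap_location(40)
total = stages * 16 + addr + 1      # sanity check that the pieces add back up
```

For a 40-clock tap this gives two full stages (32 clocks) plus a side SRL16 at address 7 (8 more clocks). A diagonal through the array would then be one such pair per column, each with a different hard-coded address.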

John Hovell wrote:
<excerpt>

> I'm hunting around right now for the instantiation syntax for SRLs on the
> net... Is there a primitive that I can call so one of these is inferred?  It
> seems the PRNG (Xilinx app 211) just uses some fancy compiler ifdef
> statements to get the right piece of hardware inferred.  I'm sure I can find
> this info on the 'net so I don't want to bother anyone with simple
> questions... however if someone feels compelled to clue me in, I certainly
> won't mind ;-).
>
> The only reason SRL's might not work is that I need to read *some* values in
> the delay pipe (i.e. the first column and a sort of diagonal row through the
> first half of it:  Total bits: 384*2 = 768).  Hopefully I can read values
> that are in the shift registers....

Article: 45469
Subject: FPGA prototyping boards
From: prashantj@usa.net (Prashant)
Date: 24 Jul 2002 10:54:22 -0700

Hi,
I'm looking to prototype my design on an evaluation board. But I have
some questions.
Hardware requirements : One Apex20KE 1500E or an equivalent Xilinx
device

1. What evaluation or prototyping boards would the experienced
recommend ?

2. I was looking at Altera's DSP development board. How would you
write the output from the FPGA into a file on a PC, for this or any
other board ?

3. Is it possible to give inputs real time from a PC to an FPGA on a
dev board ? I wouldn't think so, but let me know if the technology
allows.

Thanks,
Prashant

PS : Prototype board companies feel free to contact me at the email
address : pjain@tensorcomm.com

Article: 45470
Subject: Wind River Diab Xilinx Edition
From: big_kap@yahoo.com (Kaplan)
Date: 24 Jul 2002 12:26:38 -0700

What exactly does the Wind River Diab XE compile down to? Does it
target the Virtex II-Pro core PowerPC, or does it actually manipulate
the FPGA fabric?
Any help understanding this will be most appreciated.

Thanks!

Article: 45471
Subject: Re: I want to buy 4 Xilinx FPGA
From: Kevin Brace <killspam4kevinbraceusenet@killspam4hotmail.com>
Date: Wed, 24 Jul 2002 14:39:28 -0500



Erik wrote:
> 
> Hi Kevin,
> 
> 
> I looked first at the table in the chip description PDF,
>  "Table 2: Performance for Common Circuit Functions".
> In this table you can find some differences between the FPGA families.
> 
> I hope these tables are correct.
> 


        Where did you download such a PDF file?



> 
> Why not? I will check the real voltage levels on the signal lines with an
> oscilloscope, and if I see the highest level is 3.3V I can use the Virtex-E.
> 
> Bye
> Erik


        It's up to you if you want to risk a Virtex-E.
I would rather live within the specification than risk a PCI card with
a Virtex-E (worth at least several hundred dollars).


Kevin Brace (In general, don't respond to me directly, and respond
within the newsgroup.)

Article: 45472
Subject: Re: delay pipes in verilog for spartan IIe?
From: Ray Andraka <ray@andraka.com>
Date: Wed, 24 Jul 2002 19:49:38 GMT

My 2 cents worth on this thread:

First, some tools such as synplify will infer the SRL16's, and will even put a
register at the output, which improves the clock to Q considerably.  What it does
not do well is adding a register between each SRL16 in the chain.  Personally, I
prefer to instantiate them using the SRL16 primitive in the unisim library.  That
way I can put the registers in between, where they belong, and if I want I can add RLOCs
as well as non-zero initial values.  Set the timingcheckson generic to false (it
defaults to true) to avoid problems in functional simulation, and put the generics
inside a syn_translate pragma to avoid possible problems with inference as a black
box.  Only the output of a shift register or of the flip-flop following it (if you put
them in there, which I advise) is visible.  If you need to get somewhere in between,
then you'll need to adjust your delays to get to the tap you desire.  VirtexII has a
nice feature that adds an always available output out of the last tap useful for
cascade chains.   If you dynamically control the shift length, you'll probably want
to split up and duplicate the address drivers.

If you were not using the block RAMs for something else, you could use them in 16
bit wide mode as a delay queue by using one port for read and one for write.  The
read address and write address have to be offset for it to work correctly.
Depending on your data rate frequency, you may also be able to run the BRAM on a 2x
or even 4x clock in order to get 2 accesses per clock in order to double the
available width.  At 1x, and with a 64 deep queue, you can only use 1/4 of the
memory per block RAM, so a 4x memory clock would be ideal provided it does not
exceed the capabilities of the BRAM.
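
The offset-address trick can be sketched as a circular buffer. This Python model is a cycle-level illustration only: the class name `BramDelay` is invented, the 256-word depth assumes a 4096-bit block RAM in 16-bit-wide mode, and the synchronous read register is not modelled, so the absolute latency in real hardware may differ by a clock depending on how you count it.

```python
class BramDelay:
    """Dual-port memory used as a delay queue: one port writes, the other reads."""

    def __init__(self, delay=64, depth=256):
        # The addresses must never coincide, which rules out delay == depth
        if not 1 <= delay < depth:
            raise ValueError("delay must be at least 1 and less than the memory depth")
        self.mem = [0] * depth
        self.depth = depth
        self.rd = 0
        self.wr = delay            # write address leads the read address by 'delay'

    def clock(self, din):
        dout = self.mem[self.rd]   # read port
        self.mem[self.wr] = din    # write port, always a different address
        self.rd = (self.rd + 1) % self.depth
        self.wr = (self.wr + 1) % self.depth
        return dout


queue = BramDelay(delay=64, depth=256)
outs = [queue.clock(n + 1) for n in range(200)]

# Fraction of one block RAM that a single 64-deep queue occupies
fraction_used = 64 / 256
```

With a 64-deep queue only a quarter of the 256 words are ever live, which is where the 4x clock idea comes from: four time-multiplexed lanes of 64 words each would fill the block exactly.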

John_H wrote:

> Go to http://toolbox.xilinx.com/docsan/xilinx4/manuals.htm and look at the
> "Libraries Guide" which has all the primitives listed.  You can use an SRL16
> with or without enable.
>
> If you need to tap off a fixed diagonal, the task is pretty straightforward but
> the coding (3 levels of module hierarchy) isn't as clean.  Do you know about
> parameterized modules?
>
> You can have a chain of 4 SRL16s to get your 64 bit delay.  To "tap" an element
> in the middle at a fixed address, you can either daisy-chain two shorter SRL16s
> together (you can program them for delays of 1 to 16, inclusive) or you can tap
> off the feed between the 16 long delays and feed an SRL16 in parallel with the
> fixed chain.  You can hard code the address or select which of those 16 taps you
> want dynamically if necessary (but the bits that make the selection could have a
> HUGE fanout!  384 bits?!).
>
> John Hovell wrote:
> <excerpt>
>
> > I'm hunting around right now for the instantiation syntax for SRLs on the
> > net... Is there a primitive that I can call so one of these is inferred?  It
> > seems the PRNG (Xilinx app 211) just uses some fancy compiler ifdef
> > statements to get the right piece of hardware inferred.  I'm sure I can find
> > this info on the 'net so I don't want to bother anyone with simple
> > questions... however if someone feels compelled to clue me in, I certainly
> > won't mind ;-).
> >
> > The only reason SRL's might not work is that I need to read *some* values in
> > the delay pipe (i.e. the first column and a sort of diagonal row through the
> > first half of it:  Total bits: 384*2 = 768).  Hopefully I can read values
> > that are in the shift registers....

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 45473
Subject: Re: delay pipes in verilog for spartan IIe?
From: "Kevin Neilson" <kevin-neilson@removethistextattbi.com>
Date: Wed, 24 Jul 2002 20:05:59 GMT

John,
I don't know about the other synthesizers, but if you are using Synplify you
can write a single 'for' loop that will infer all the SRLs.

You could still use the BRAMs if you can run them at twice the data rate and
write half the bus on one cycle and the other half on the next.
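
A quick back-of-envelope check of the block RAM count, written as Python. It assumes the figures quoted in this thread (384-bit bus, 16 block RAMs on the part, 64-deep delay) plus 16-bit-wide ports and 256 words per block RAM; the helper names are made up for illustration.

```python
import math


def brams_needed(bus_bits=384, port_bits=16, clock_multiple=1):
    """Block RAMs needed when each one carries 'clock_multiple' slices of the bus per data clock."""
    return math.ceil(bus_bits / (port_bits * clock_multiple))


def words_used(delay=64, clock_multiple=1):
    """Words occupied in each block RAM: one 'delay'-deep region per time-multiplexed slice."""
    return delay * clock_multiple


available = 16      # block RAMs on the part, as quoted earlier in the thread
summary = {m: (brams_needed(clock_multiple=m), words_used(clock_multiple=m))
           for m in (1, 2, 4)}
```

On these numbers a 1x clock needs 24 block RAMs, more than the 16 available, while 2x needs 12 and 4x needs 6, with the 4x case filling all 256 words of each block.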
-Kevin

"John_H" <johnhandwork@mail.com> wrote in message
news:3D3ED543.95AE563B@mail.com...
> Thanks, Kevin, I stand corrected.  Identical read and write addresses pass the
> input data to the output according to the libraries guide,
>
>    http://toolbox.xilinx.com/docsan/xilinx4/data/docs/lib/dsgnelpr30.html
>
> Since the Spartan-IIE 300 only has 16 block rams,
>
>   http://www.xilinx.com/partinfo/ds077_1.pdf
>
> the way to fully implement the pipeline would be to use bunches and bunches of
> SRL16 elements.  (Half could be SRLs and half be memory but this gets weird)
>
> I'd recommend putting together a small submodule that's 4 cascaded SRL elements
> for the 64 bit delay (or make it less to account for input/output registers, for
> instance) and instantiate that module 384 times.  The arrays of instances are
> wonderful to do this;  if you don't have that capability, another module
> instantiating 16 of those modules could, itself, be instantiated 24 times for a
> total of 384 delay elements.
>
> - John_H
>
>
> Kevin Neilson wrote:
>
> > I'm not quite sure if that works.  I think if you are writing to port A, the
> > output on port A isn't the data that was in that address, but rather the
> > data that you are writing into the A input.  I think it gets routed directly
> > through.  Check the timing diagrams in the datasheet.  I think the blockRAM
> > on Virtex2 allows you to circumvent this mode, but not V-E/Spartan-IIe.
>

Article: 45474
Subject: Re: 32-bit PCI Target core
From: Kevin Brace <killspam4kevinbraceusenet@killspam4hotmail.com>
Date: Wed, 24 Jul 2002 15:29:49 -0500



Jeff Reeve wrote:
> 
> I'm looking for a synthesizable 32-bit 33MHz PCI Target only design to be
> placed into an FPGA or large CPLD. Minimal implementation is fine. Does
> anybody know if such a thing is available in VHDL or Verilog and is open
> sourced? I seem to recall Xilinx publishing a target only design quite some
> time ago but I can no longer find it on their web site.
> 
> Any help is much appreciated!
> Jeff


        This is what you are probably talking about.

ftp://ftp.xilinx.com/pub/applications/pci/
ftp://ftp.xilinx.com/pub/applications/pci/00_index.htm


For some reason, a Verilog version of the reference design is missing,
but if you want it I can e-mail it to you (a kind, long-time Xilinx
user sent it to me).
I believe Lattice Semiconductor and Quicklogic also have their own
PCI reference designs (I know the Lattice one is written in Verilog, but
I am not sure about the Quicklogic one).
        However, here is a caveat of using reference designs offered by
device manufacturers.
Even if the design is written in a device independent form (Uses generic
Verilog or VHDL statements, and no vendor specific primitives.), when
using reference designs offered by device manufacturers, you are often
legally required to use the reference designs on their devices.
        Opencores.org also has a free PCI IP core, but it is a lot more
complex (Supports initiator and target transfers.) than any of the above
mentioned reference designs, so I feel like you will likely have a hard
time modifying it to suit your own needs.
        When modifying a PCI interface, PCI specification Appendix B's
state machine examples and the following article may be helpful.

http://www.eedesign.com/editorial/1995/fpgafeature9502.html



Kevin Brace (In general, don't respond to me directly, and respond
within the newsgroup.)
