
Messages from 155725

Article: 155725
Subject: Problem in Xilinx xapp1052 DMA PCIE custom flow
From: hima.aj@gmail.com
Date: Thu, 22 Aug 2013 10:55:07 -0700 (PDT)
Hi,

I am using an ML505 XC5VLX110T board for PCIe core implementation. To
implement the xapp1052 DMA design for my Endpoint Block Plus core, I am
following a custom flow. I copied my *.xst, *.xcf and *.scr files into the
xst directory and the corresponding *.ucf file into the ucf directory.
While running the script "xilperl implement_dma.pl", I see the error:

ERROR:Xst: 1585 - Cannot open file 'endpoint_blk_plus_v1_14.xcf'. Please
make sure that the file exists and that you have read permission for it.

The file already exists in the xst directory. How do I resolve this issue
in the non-GUI implementation flow?

Appreciate any help.

Thanks...

Article: 155726
Subject: Re: Cascaded floating-point reduction?
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Thu, 22 Aug 2013 18:15:40 +0000 (UTC)
jonesandy@comcast.net wrote:
> On Wednesday, August 21, 2013 6:55:15 PM UTC-5, Mark Curry wrote:
>> Not in any synthesizer I know. Floating point types aren't
>> handled at all, much less operation like multiplication on them.
>> I wouldn't expect them to do so *EVER*. Too much overhead,
>> and too little of a customer base would need/want it.
 

(snip)
> "When one of the following constructs is encountered, 
> compilation continues, but will subsequently error out if 
> logic must be generated for the construct."

Most of the time, you want internal pipelining on the floating
point operations. There is nowhere to specify that with the
usual arithmetic operators, but it is easy if you reference
a module to do it.
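
One way to do what glen describes is to wrap the operation in a component
that carries its own pipeline registers, so the latency is explicit. The
entity name, generic and ports below are purely illustrative, not from any
vendor library:

```vhdl
-- Illustrative only: a pipelined floating-point multiply instantiated
-- as a module, instead of the inline "*" operator (which gives no way
-- to request internal pipeline stages).
fp_mult_inst : ENTITY work.fp_mult_pipelined
  GENERIC MAP (STAGES => 4)        -- internal register stages (assumed)
  PORT MAP (
    clk => clk,
    a   => a,                      -- operand A
    b   => b,                      -- operand B
    p   => product);               -- result, valid STAGES cycles later
```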

-- glen


Article: 155727
Subject: Re: Cascaded floating-point reduction?
From: jonesandy@comcast.net
Date: Fri, 23 Aug 2013 09:58:20 -0700 (PDT)
On Thursday, August 22, 2013 1:15:40 PM UTC-5, glen herrmannsfeldt wrote:
>  Most of the time, you want internal pipelining on the floating point
> operations. There is nowhere to specify that with the usual arithmetic
> operators, but it is easy if you reference a module to do it.

Most of the time you will need the extra pipelining if you want to infer
built-in multipliers.

This is where retiming and pipelining synthesis optimizations come in
handy. If you follow up (and/or precede) the expression assignment with a
few extra clock cycles of latency (pipeline register stages), the
synthesis tool can distribute the HW across the extra clock cycles
automatically.
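
A minimal sketch of the latency padding Andy describes, assuming the
synthesis tool's retiming/pipelining optimization is enabled; the signal
names are illustrative:

```vhdl
-- Register the product, then add two "empty" latency stages.  With
-- retiming enabled, the tool can push multiplier logic forward into
-- prod_r1 and prod_r2, spreading the path over three clock cycles.
pipeline : PROCESS (clk)
BEGIN
  IF rising_edge(clk) THEN
    prod    <= a * b;     -- wide multiply, registered once
    prod_r1 <= prod;      -- extra latency stage 1
    prod_r2 <= prod_r1;   -- extra latency stage 2
  END IF;
END PROCESS pipeline;
```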

Whether synthesis can do it as well as you can manually, I don't know.
But if it is good enough to work, does it really need to be as good as
you could have done manually? I'd rather have the maintainability of the
mathematical expression, if it will work.

Andy

Article: 155728
Subject: Re: Cascaded floating-point reduction?
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 23 Aug 2013 17:34:35 +0000 (UTC)
jonesandy@comcast.net wrote:

(snip regarding pipelining)

> Most of the time you will need the extra pipelining if you want 
> to infer built-in multipliers.
 
> This is where retiming and pipelining synthesis optimizations 
> come in handy. If you follow up (and/or precede) the expression 
> assignment with a few extra clock cycles of latency 
> (pipeline register stages), the synthesis tool can distribute 
> the HW across the extra clock cycles automatically. 

Which tools do that? That sounds pretty useful. 

As I am not the OP, the things that I try to do are different.
One that I have wondered about is the ability to add extra register
stages to speed up the critical path. I work on very long, fixed-point
pipelines, so usually there are, somewhere along the way, some very long
routes which limit the speed. If I could put registers in them, it could
run a lot faster.

> Whether synthesis can do it as well as you can manually, 
> I don't know. But if it is good enough to work, does it 
> really need to be as good as you could have done manually? 
> I'd rather have the maintainability of the mathematical 
> expression, if it will work.

Well, for really large problems every ns counts. For 5% 
difference, maybe I wouldn't worry about it, but 20% or 30%
is worth working for.

-- glen

Article: 155729
Subject: Re: Cascaded floating-point reduction?
From: gtwrek@sonic.net (Mark Curry)
Date: Fri, 23 Aug 2013 21:24:01 +0000 (UTC)
In article <kv86fb$h37$1@speranza.aioe.org>,
glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:
>jonesandy@comcast.net wrote:
>
>(snip regarding pipelining)
>
>> Most of the time you will need the extra pipelining if you want 
>> to infer built-in multipliers.
> 
>> This is where retiming and pipelining synthesis optimizations 
>> come in handy. If you follow up (and/or precede) the expression 
>> assignment with a few extra clock cycles of latency 
>> (pipeline register stages), the synthesis tool can distribute 
>> the HW across the extra clock cycles automatically. 
>
>Which tools do that? That sounds pretty useful. 

In Xilinx XST, the switch you're looking for is:
-register_balancing yes

I now leave it on by default - it rarely makes things worse.  It 
seems to help - I notice in the log file it does move Flops forward
and backward through the combinational logic in an attempt to better
balance the pipeline paths.  How well it does the job - I've not dug
in that deep.
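
For reference, a minimal XST command file using this switch might look
like the following; the project, output and part names are placeholders,
and the available options vary by ISE version:

```
run
-ifn top.prj
-ofn top
-p xc5vlx110t-1-ff1136
-top top
-opt_mode Speed
-register_balancing yes
```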

>As I am not the OP, the things that I try to do are different.
>One that I have wondered about is the ability to add extra register
>stages to speed up the critical path. I work on very long, fixed point
>pipelines, so usually there is at some point some very long routes
>which limit the speed. If I could put registers in them, it could
>run a lot faster.

Sounds just like what the tool is targeting.  If you have access
to it, I'd suggest giving it a shot.

Regards,

Mark




Article: 155730
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: rickman <gnuarm@gmail.com>
Date: Fri, 23 Aug 2013 17:31:35 -0400
On 8/22/2013 12:09 PM, jonesandy@comcast.net wrote:
> Rick,
>
> You might add MicroSemi Igloo2 to your list as well. A SmartFusion2 without the ARM, but still has the other hard-silicon interfaces.

Yes, I've taken a look at the MicroSemi parts, but they suffer the same 
problems as the others.  The favorable 100 pin QFP is not used in the 
Igloo2 line and the Igloo line is rather long in the tooth.  It also is 
hard to get now, I see very little inventory at Digikey and a lead time 
query says "Unable to get lead time".

If the prices on the SmartFusion lines weren't so high I'd consider 
them, but they are pretty high for this design and they only come in 
rather large packages.

-- 

Rick

Article: 155731
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: jg <j.m.granville@gmail.com>
Date: Fri, 23 Aug 2013 15:34:40 -0700 (PDT)
On Saturday, August 24, 2013 9:31:35 AM UTC+12, rickman wrote:
> The favorable 100 pin QFP is not used in the 
> Igloo2 line and the Igloo line is rather long in the tooth. 

I've noticed a significant trend from Asia in microcontrollers: offering
a choice of package pitch, in particular 64-pin 0.8 mm in the same plastic body as a QFP100.

This allows higher yield PCB design rules.

It would be great if the FPGA/CPLD vendors grasped this.
-jg

Article: 155732
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: rickman <gnuarm@gmail.com>
Date: Fri, 23 Aug 2013 21:06:37 -0400
On 8/23/2013 6:34 PM, jg wrote:
> On Saturday, August 24, 2013 9:31:35 AM UTC+12, rickman wrote:
>> The favorable 100 pin QFP is not used in the
>> Igloo2 line and the Igloo line is rather long in the tooth.
>
> I've noticed a significant trend from Asia, in Microcontrollers to offer
> a choice of package-pitch, in particular, 64pin 0.8mm == same plastic as qfp100.
>
> This allows higher yield PCB design rules.
>
> It would be great if the FPGA/CPLD vendors grasped this.

I used to rail against the FPGA vendors' decisions in packaging.  I find 
it very inconvenient at a minimum.  But these days I have learned to do 
the Zen thing and reabsorb my dissatisfaction so as to turn it to an 
advantage.  I'm not sure what that means exactly, but I've given up 
trying to think of FPGAs as an MCU substitute.

I suppose the markets are different enough that FPGAs just can't be 
produced economically in as wide a range of packaging.  I think that is 
what Austin Leesa used to say, that it just costs too much to provide 
the parts in a lot of packages.  Their real market is at the high end. 
Like the big CPU makers, it is all about raising the ASP, Average 
Selling Price.  Which means they don't spend a lot of time wooing the 
small part users like us.

I don't really need the 0.8 mm pitch.  My current design uses the 0.5 mm 
pitch part with 6/6 design rules and no one other than Sunstone has any 
trouble making the board.  Is there any real advantage to using a 0.8 mm 
pitch QFP?  I would be very interested in using QFNs.  They seem to be 
the same package as the QFPs but without the leads.  Of course that 
presents its own problem.  BGAs and QFNs don't like it when the board 
flexes and my board is long and skinny.  It tends to get pried off of 
the mother board and flexed in the process.  That can cause lead-less 
parts to separate from the board.

As long as we are wishing for stuff, I'd really love to see a smallish 
MCU mated to a smallish FPGA.  I think Atmel made one called the FPSLIC 
with an AVR and FPGA inside.  It never took off and I was never brave 
enough to design one in expecting it to be dropped at any time.  I just 
looked and it appears the FPSLIC is finally EOL. lol

-- 

Rick

Article: 155733
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: Uwe Bonnes <bon@hertz.ikp.physik.tu-darmstadt.de>
Date: Sat, 24 Aug 2013 10:02:23 +0000 (UTC)
jg <j.m.granville@gmail.com> wrote:
> On Saturday, August 24, 2013 9:31:35 AM UTC+12, rickman wrote:
> > The favorable 100 pin QFP is not used in the 
> > Igloo2 line and the Igloo line is rather long in the tooth. 

> I've noticed a significant trend from Asia, in Microcontrollers to offer
> a choice of package-pitch, in particular, 64pin 0.8mm == same plastic 
> as qfp100.

> This allows higher yield PCB design rules.

But that moves the decoupling caps farther away from the chip and so
worsens simultaneous-switching noise.

But it is also great for prototyping and debugging...

> It would be great if the FPGA/CPLD vendors grasped this.

-- 
Uwe Bonnes                bon@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

Article: 155734
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Sat, 24 Aug 2013 11:25:38 +0100
On 24/08/13 02:06, rickman wrote:
> As long as we are wishing for stuff, I'd really love to see a smallish MCU mated to a smallish FPGA.

Isn't the presumption that you will use one of their soft-core processors?


Article: 155735
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: rickman <gnuarm@gmail.com>
Date: Sat, 24 Aug 2013 12:37:23 -0400
On 8/24/2013 6:25 AM, Tom Gardner wrote:
> On 24/08/13 02:06, rickman wrote:
>> As long as we are wishing for stuff, I'd really love to see a smallish
>> MCU mated to a smallish FPGA.
>
> Isn't the presumption that you will use one of their soft-core processors?

There is more to an MCU than the processor core.  Memory is a 
significant issue.  Trying to use the block memory for the processor 
creates a lot of routing congestion and ties up those resources for 
other uses.  Adding something like the AVR or MSP430 to an FPGA would be 
a lot more efficient than building it out of the "fabric".  But yes, 
that is an approach for sure.

I think the real issue for me is that they just don't put very much FPGA 
into small packages usable without very fine board geometries.  I did my 
original design with a 3 kLUT part which was the largest I could get in 
a 100 pin QFP.  Over 5 years later, it is *still* the largest part I can 
get in that package, hence my problem... lol  Man, I would kill (or 
kiloLUT) to get a 6 kLUT part in a QFN package with just ONE row of 
pins.  They always seem to use QFN packages with two or more rows of 
pins which require 3/3 mil space/trace design rules.

One of my potential solutions is to use a similar, but smaller part from 
the XO2 line (assuming I stay with Lattice after this) and roll my own 
processor to handle the slow logic.  It will be a lot more work than 
just porting the HDL to a new chip though.  I think rolling my own will 
likely produce a smaller and more efficient design.  The 32 bit designs 
wouldn't even fit on the chip.  I'm not sure how large the 8 bit designs 
are in LUTs.  I'm thinking a 16 or 18 bit CPU would be a good data size. 
  Or if the LUTs are really tight, a 4 or 5 bit processor might save on 
real estate.  Maybe 8 or 9 bits is a good compromise depending on the 
LUT count.

-- 

Rick

Article: 155736
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: rickman <gnuarm@gmail.com>
Date: Sat, 24 Aug 2013 12:46:59 -0400
Links: << >>  << T >>  << A >>
On 8/24/2013 6:02 AM, Uwe Bonnes wrote:
> jg<j.m.granville@gmail.com>  wrote:
>> On Saturday, August 24, 2013 9:31:35 AM UTC+12, rickman wrote:
>>> The favorable 100 pin QFP is not used in the
>>> Igloo2 line and the Igloo line is rather long in the tooth.
>
>> I've noticed a significant trend from Asia, in Microcontrollers to offer
>> a choice of package-pitch, in particular, 64pin 0.8mm == same plastic
>> as qfp100.
>
>> This allows higher yield PCB design rules.
>
> But that gets the decouplig caps farer away from the chip  and so worsens
> simultanious switching.

Two comments.  He is describing two packages with the same body size and 
so the caps would be the same distance from the chip.  But also, when 
you use power and ground planes with effective coupling, the distance of 
the cap from the chip is nearly moot.  The power planes act as a 
transmission line providing the current until the wave reaches the 
capacitor.  Transmission lines are your friend.

-- 

Rick

Article: 155737
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: Gabor <gabor@szakacs.org>
Date: Sat, 24 Aug 2013 13:14:43 -0400
On 8/24/2013 12:37 PM, rickman wrote:
> On 8/24/2013 6:25 AM, Tom Gardner wrote:
>> On 24/08/13 02:06, rickman wrote:
>>> As long as we are wishing for stuff, I'd really love to see a smallish
>>> MCU mated to a smallish FPGA.
>>
>> Isn't the presumption that you will use one of their soft-core
>> processors?
>
> There is more to an MCU than the processor core.  Memory is a
> significant issue.  Trying to use the block memory for the processor
> creates a lot of routing congestion and ties up those resources for
> other uses.  Adding something like the AVR or MSP430 to an FPGA would be
> a lot more efficient than building it out of the "fabric".  But yes,
> that is an approach for sure.
>
> I think the real issue for me is that they just don't put very much FPGA
> into small packages usable without very fine board geometries.  I did my
> original design with a 3 kLUT part which was the largest I could get in
> a 100 pin QFP.  Over 5 years later, it is *still* the largest part I can
> get in that package, hence my problem... lol  Man, I would kill (or
> kiloLUT) to get a 6 kLUT part in a QFN package with just ONE row of
> pins.  They always seem to use QFN packages with two or more rows of
> pins which require 3/3 mil space/trace design rules.
>
> One of my potential solutions is to use a similar, but smaller part from
> the XO2 line (assuming I stay with Lattice after this) and roll my own
> processor to handle the slow logic.  It will be a lot more work than
> just porting the HDL to a new chip though.  I think rolling my own will
> likely produce a smaller and more efficient design.  The 32 bit designs
> wouldn't even fit on the chip.  I'm not sure how large the 8 bit designs
> are in LUTs.  I'm thinking a 16 or 18 bit CPU would be a good data size.
>   Or if the LUTs are really tight, a 4 or 5 bit processor might save on
> real estate.  Maybe 8 or 9 bits is a good compromise depending on the
> LUT count.
>
I would think the Lattice mico8 would be pretty small, and it should be
easy enough to try that out.  Rolling your own would be more fun,
though...

-- 
Gabor

Article: 155738
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: rickman <gnuarm@gmail.com>
Date: Sat, 24 Aug 2013 18:51:57 -0400
On 8/24/2013 1:14 PM, Gabor wrote:
> On 8/24/2013 12:37 PM, rickman wrote:
>>
>> One of my potential solutions is to use a similar, but smaller part from
>> the XO2 line (assuming I stay with Lattice after this) and roll my own
>> processor to handle the slow logic. It will be a lot more work than
>> just porting the HDL to a new chip though. I think rolling my own will
>> likely produce a smaller and more efficient design. The 32 bit designs
>> wouldn't even fit on the chip. I'm not sure how large the 8 bit designs
>> are in LUTs. I'm thinking a 16 or 18 bit CPU would be a good data size.
>> Or if the LUTs are really tight, a 4 or 5 bit processor might save on
>> real estate. Maybe 8 or 9 bits is a good compromise depending on the
>> LUT count.
>>
> I would think the Lattice mico8 would be pretty small, and it should be
> easy enough to try that out. Rolling your own would be more fun,
> though...

I've already done the roll your own thing.  Stack processors can be 
pretty small and they are what I prefer to use, but a register machine 
is ok too.  I don't know how big the Mico8 is in terms of LUTs.  My 
last design was 16 bits and about 600 LUTs and that included some 
flotsam that isn't required in a final design.  By contrast the 
microBlaze I know is multiple kLUTs, but that is a 32 bit design.  The 
Xilinx picoBlaze 8 bit design is rather small (I don't recall the 
number) but it instantiates LUTs and FFs and so is not portable.  There 
may be a home grown version of the picoBlaze.

-- 

Rick

Article: 155739
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: Allan Herriman <allanherriman@hotmail.com>
Date: 25 Aug 2013 00:27:42 GMT
On Sat, 24 Aug 2013 18:51:57 -0400, rickman wrote:

[snip]
> The
> Xilinx picoBlaze 8 bit design is rather small (I don't recall the
> number) but it instantiates LUTs and FFs and so is not portable.  There
> may be a home grown version of the picoBlaze.

There's "Pacoblaze" - a behavioural Verilog reimplementation of Picoblaze 
that should be fairly portable.

The last time I checked (in 2006?) by simulating Pacoblaze vs Picoblaze 
back-to-back, it wasn't cycle-accurate for interrupts.  However, I do 
notice that a 2007 bug fix is described as "Bug corrections on stack and 
interrupt" so perhaps it now is an exact clone.

http://bleyer.org/pacoblaze/

You'll be restricted to assembly language of course.  If your source is 
in C (or some other mid-level language) then you should look to a 
different micro core.

Regards,
Allan

Article: 155740
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: rickman <gnuarm@gmail.com>
Date: Sat, 24 Aug 2013 20:35:56 -0400
On 8/24/2013 8:27 PM, Allan Herriman wrote:
> On Sat, 24 Aug 2013 18:51:57 -0400, rickman wrote:
>
> [snip]
>> The
>> Xilinx picoBlaze 8 bit design is rather small (I don't recall the
>> number) but it instantiates LUTs and FFs and so is not portable.  There
>> may be a home grown version of the picoBlaze.
>
> There's "Pacoblaze" - a behavioural Verilog reimplementation of Picoblaze
> that should be fairly portable.
>
> The last time I checked (in 2006?) by simulating Pacoblaze vs Picoblaze
> back-to-back, it wasn't cycle-accurate for interrupts.  However, I do
> notice that a 2007 bug fix is described as "Bug corrections on stack and
> interrupt" so perhaps it now is an exact clone.
>
> http://bleyer.org/pacoblaze/
>
> You'll be restricted to assembly language of course.  If your source is
> in C (or some other mid-level language) then you should look to a
> different micro core.

My source is in VHDL.  I would be porting sections of VHDL that don't 
need to run fast to software.  So either the Pacoblaze or the Mico8 
would do.  I don't think you can use the Altera CPUs on anyone else's 
chips.  But still, I would most likely use a stack processor, especially 
if I could find one that is programmable in Forth.  Actually, I'm fine 
with the assembly language as long as it's a stack machine.  Very simple 
instructions and very simple to implement.

-- 

Rick

Article: 155741
Subject: Synthesis and mapping of ALU
From: chthon <jurgen.defurne@gmail.com>
Date: Sun, 25 Aug 2013 07:03:48 -0700 (PDT)
Dear all,

I have the following ALU code as part of a data path:

  -- purpose: simple ALU
  -- type   : combinational
  -- inputs : a, b, op_sel
  -- outputs: y
  alu : PROCESS (a, b, op_sel)
  BEGIN  -- PROCESS alu
    CASE op_sel IS
      WHEN "0000" =>                    -- increment
        y <= a + 1;
      WHEN "0001" =>                    -- decrement
        y <= a - 1;
      WHEN "0010" =>                    -- test for zero
        y <= a;
      WHEN "0111" =>                    -- addition
        y <= a + b;
      WHEN "1000" =>                    -- subtract, compare
        y <= a - b;
      WHEN "1010" =>                    -- logical and
        y <= a AND b;
      WHEN "1011" =>                    -- logical or
        y <= a OR b;
      WHEN "1100" =>                    -- logical xor
        y <= a XOR b;
      WHEN "1101" =>                    -- logical not
        y <= NOT a;
      WHEN "1110" =>                    -- shift left logical
        y <= a SLL 1;
      WHEN "1111" =>                    -- shift right logical
        y <= a SRL 1;
      WHEN OTHERS =>
        y <= a;
    END CASE;
  END PROCESS alu;

Is it normal that this is synthesized as 14 separate functions, which are
then multiplexed through a 14:1 multiplexer onto the y bus? I am just
trying to find out if this is the fastest implementation that can be had,
or if it is possible to get a faster implementation by mapping more
functions into a single slice, so that the multiplexer becomes smaller.

As an example of something similar, the multiplexers generated by default
by ISE tend to cascade, while building them from LUT6 gives the
possibility to build wide but not deep multiplexers (XAPP522).

Regards,

Jurgen

Article: 155742
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: already5chosen@yahoo.com
Date: Sun, 25 Aug 2013 09:44:51 -0700 (PDT)
On Sunday, August 25, 2013 1:51:57 AM UTC+3, rickman wrote:
> On 8/24/2013 1:14 PM, Gabor wrote:
> 
> > On 8/24/2013 12:37 PM, rickman wrote:
> 
> >>
> >> One of my potential solutions is to use a similar, but smaller part from
> >> the XO2 line (assuming I stay with Lattice after this) and roll my own
> >> processor to handle the slow logic. It will be a lot more work than
> >> just porting the HDL to a new chip though. I think rolling my own will
> >> likely produce a smaller and more efficient design. The 32 bit designs
> >> wouldn't even fit on the chip. I'm not sure how large the 8 bit designs
> >> are in LUTs. I'm thinking a 16 or 18 bit CPU would be a good data size.
> >> Or if the LUTs are really tight, a 4 or 5 bit processor might save on
> >> real estate. Maybe 8 or 9 bits is a good compromise depending on the
> >> LUT count.
> >>
> 
> > I would think the Lattice mico8 would be pretty small, and it should be
> > easy enough to try that out. Rolling your own would be more fun,
> > though...
> 
> 
> 
> I've already done the roll your own thing.  Stack processors can be 
> pretty small and they are what I prefer to use, but a register machine 
> is ok too.  I don't know how big the Micro8 is in terms of LUTs.  My 
> last design was 16 bits and about 600 LUTs and that included some 
> flotsam that isn't required in a final design.  By contrast the 
> microBlaze I know is multiple kLUTs, but that is a 32 bit design.  The 
> Xilinx picoBlaze 8 bit design is rather small (I don't recall the 
> number) but it instantiates LUTs and FFs and so is not portable.  There 
> may be a home grown version of the picoBlaze.
> -- 
> 
> Rick

I just measured Altera Nios2e on Stratix3 - 379 ALMs + 2 M9K blocks (out of 18K memory bits only 2K bits used). It's hard to translate exactly into old-fashioned LUTs, but I'd say - around 700.
Per clock Nios2e is pretty slow, but it clocks rather high and it is a 32-bit CPU - very easy to program in C.

Reimplementing Nios2 in a minimal number of LUTs, e.g. trading memory for fabric, could be an interesting exercise, well suited to a coding competition. But, probably, illegal :(

Article: 155743
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Mon, 26 Aug 2013 01:35:58 +0000 (UTC)
rickman <gnuarm@gmail.com> wrote:

(snip)
> I used to rail against the FPGA vendor's decisions in packaging.  I find 
> it very inconvenient at a minimum.  But these days I have learned to do 
> the Zen thing and reabsorb my dissatisfaction so as to turn it to an 
> advantage.  I'm not sure what that means exactly, but I've given up 
> trying to think of FPGAs as an MCU substitute.
 
> I suppose the markets are different enough that FPGAs just can't be 
> produced economically in as wide a range of packaging.  I think that is 
> what Austin Leesa used to say, that it just costs too much to provide 
> the parts in a lot of packages.  Their real market is at the high end. 
> Like the big CPU makers, it is all about raising the ASP, Average 
> Selling Price.  Which means they don't spend a lot of time wooing the 
> small part users like us.

In many cases they would put one in a package with fewer pins
than pads, a waste of good I/O drivers, but maybe useful for
some people. 

I don't know how much it costs just to support an additional
package, though.

Also, a big problem with FPGAs, and ICs in general, is a low enough
lead inductance. Many packages that would otherwise be useful have
too much inductance.

-- glen

Article: 155744
Subject: Re: Synthesis and mapping of ALU
From: Jon Elson <elson@pico-systems.com>
Date: Sun, 25 Aug 2013 23:21:16 -0500
chthon wrote:

> Dear all,
> 
> I have the following ALU code as part of a data path:
> 
>   -- purpose: simple ALU
>   -- type   : combinational
>   -- inputs : a, b, op_sel
>   -- outputs: y
>   alu : PROCESS (a, b, op_sel)
>   BEGIN  -- PROCESS alu
>     CASE op_sel IS
>       WHEN "0000" =>                    -- increment
>         y <= a + 1;
>       WHEN "0001" =>                    -- decrement
>         y <= a - 1;
>       WHEN "0010" =>                    -- test for zero
>         y <= a;
>       WHEN "0111" =>                    -- addition
>         y <= a + b;
>       WHEN "1000" =>                    -- subtract, compare
>         y <= a - b;
>       WHEN "1010" =>                    -- logical and
>         y <= a AND b;
>       WHEN "1011" =>                    -- logical or
>         y <= a OR b;
>       WHEN "1100" =>                    -- logical xor
>         y <= a XOR b;
>       WHEN "1101" =>                    -- logical not
>         y <= NOT a;
>       WHEN "1110" =>                    -- shift left logical
>         y <= a SLL 1;
>       WHEN "1111" =>                    -- shift right logical
>         y <= a SRL 1;
>       WHEN OTHERS =>
>         y <= a;
>     END CASE;
>   END PROCESS alu;
> 
> Is it normal that this is synthesized as 14 separate functions, which are
> then multiplexed through a 14:1 multiplexer onto the y bus? I am just
> trying to find out if this is the fastest implementation that can be had,
> or if it is possible to get a faster implementation by mapping more
> functions into a single slice, so that the multiplexer becomes smaller.
> 
This is a functional description, to get the correct logical output.
It goes into the synthesis program and gets massively reduced.  For
instance, the separate increment/decrement logic will get folded into
a single adder with a few extra gates to perform the two functions.
So, the resulting logic actually implemented in an FPGA will not resemble
what you describe at all, but will be pretty well optimized for the
specific chip architecture.
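
A hedged sketch of the kind of folding Jon describes (not the actual
output of any tool): one adder serves increment, decrement, addition and
subtraction, with a mux choosing the second operand and a one-bit
carry-in. Signal names and types (numeric_std unsigned) are illustrative.

```vhdl
-- One shared adder covers the four arithmetic op_sel cases:
--   a + b  =  a + b         + 0
--   a - b  =  a + (NOT b)   + 1
--   a + 1  =  a + all-zeros + 1
--   a - 1  =  a + all-ones  + 0
WITH op_sel SELECT
  operand_b <= b               WHEN "0111",
               NOT b           WHEN "1000",
               (OTHERS => '0') WHEN "0000",
               (OTHERS => '1') WHEN OTHERS;

-- carry_in is unsigned(0 DOWNTO 0) so it can be added directly
carry_in <= "1" WHEN op_sel = "1000" OR op_sel = "0000" ELSE "0";

arith_y  <= a + operand_b + carry_in;   -- the single shared adder
```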

Jon

Article: 155745
Subject: Re: Cascaded floating-point reduction?
From: jonesandy@comcast.net
Date: Mon, 26 Aug 2013 06:20:05 -0700 (PDT)
Glen,

I know Synplify Pro has a retiming/pipelining option (for Xilinx and
Altera targets), and I think Altera's and Xilinx's own tools do as well.

The last time I checked, straight retiming may only move logic into an
adjacent clock cycle, but pipelining of functions such as multipliers or
multiplexers can spread that logic over several clock cycles. I have seen
examples where a large multiply (larger than a DSP block could handle)
was automatically partitioned and pipelined to use multiple DSP blocks.

Since straight retiming may be limited to adjacent clock cycles, it might
be best to provide additional clock cycles of latency before and after
the expression, so that two empty, adjacent clock cycles are available.
Note that retiming does not need empty clock cycles to share logic
across, but there does need to be positive slack in those adjacent clock
cycles in order to "make room" for any retimed logic.

As far as timing or utilization is concerned, as long as I have positive
slack in both, with any margin requirements met, I prefer to have the
most understandable, maintainable description possible, even if a lesser
description would cut either (or both) by half. This was very hard to do
when I started VHDL-based FPGA design many years ago (just meeting timing
and utilization was tougher in those devices and with those tools, and
the "optimizer" in me was hard to re-calibrate). I now try to optimize
for maintainability whenever possible.

Andy

Article: 155746
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: jonesandy@comcast.net
Date: Mon, 26 Aug 2013 15:58:29 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Saturday, August 24, 2013 11:46:59 AM UTC-5, rickman wrote:
> Two comments. He is describing two packages with the same body
> size, and so the caps would be the same distance from the chip.
> But also, when you use power and ground planes with effective
> coupling, the distance of the cap from the chip is nearly moot.
> The power planes act as a transmission line providing the
> current until the wave reaches the capacitor. Transmission
> lines are your friend. -- Rick

Two more comments...

The problem with leaded packages, especially compared to flip-chip packages,
is the electrical distance (and characteristic impedance) from the lead/board
joint to the die pad, particularly for power/ground connections. The
substrate for a CSP looks like a mini circuit board with its own power/ground
planes.

Sure, you can put the cap on the board close to the power/ground lead, but
you cannot get it electrically as close to the die pad as you can with a
flip-chip package.

Transmission lines for power connections are not your friend, unless they
are of very low characteristic impedance at the high frequencies of interest
(e.g. transition times on your fast outputs, etc.). Until the wave traverses
the length of the transmission line, you are effectively supplying current
through a resistor with the same value as the transmission line impedance.
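To put illustrative numbers on that (these values are examples, not from the post): until the wave reaches the bypass capacitor, the droop at the load is the transient current times the line's characteristic impedance, which is why a low-Z0 plane pair beats a trace by orders of magnitude:

```latex
% Illustrative numbers only: droop seen at the load, Delta V = I * Z0.
\Delta V = I \cdot Z_0
\quad\Rightarrow\quad
\begin{cases}
0.1\,\mathrm{A} \times 50\,\Omega = 5\,\mathrm{V} & \text{(ordinary trace)}\\[2pt]
0.1\,\mathrm{A} \times 0.5\,\Omega = 50\,\mathrm{mV} & \text{(closely spaced plane pair)}
\end{cases}
```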

What power planes do is provide "very low impedance transmission lines" for
the power/ground connections, and the ability to connect an appropriately
packaged capacitor to the end of that line with very low inductance.

If your design is slow (output edge rates, not clock rates) or has few
simultaneously switching outputs, it won't matter which package you use.

Andy

Article: 155747
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: jg <j.m.granville@gmail.com>
Date: Mon, 26 Aug 2013 16:43:54 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Saturday, August 24, 2013 1:06:37 PM UTC+12, rickman wrote:
> 
> 
> I suppose the markets are different enough that FPGAs just can't be 
> produced economically in as wide a range of packaging.  I think that is 
> what Austin Lesea used to say, that it just costs too much to provide 
> the parts in a lot of packages.  Their real market is at the high end. 
> Like the big CPU makers, it is all about raising the ASP, Average 
> Selling Price.  

I was meaning more at the lower end - e.g. where Lattice can offer parts in QFN32, then take a large jump to QFP100 0.5mm.
QFN32 has only 21 I/O, so you easily exceed that, and there is a very large gap between the QFN32 and TQFP100.
 They claim to be chasing the lower-cost markets with these parts, but seem rather blinkered in doing so.
 Altera has well-priced parts in gull-wing (MAX V), but only in a 0.4mm pitch.

 
> As long as we are wishing for stuff, I'd really love to see a smallish 
> MCU mated to a smallish FPGA.  

If you push that 'smallish', the Cypress PSoC series have uC + logic.

The newest PSoC 4 seems to have solved some of the sticker shock, but I think they crippled the logic to achieve that.
 Seems there is no free lunch.

Cypress do, however, grasp the package issue, and offer QFN40 (0.5mm), as well as SSOP28 (0.65mm) and TQFP44 (0.8mm).

Article: 155748
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: jg <j.m.granville@gmail.com>
Date: Mon, 26 Aug 2013 17:26:48 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Saturday, August 24, 2013 10:02:23 PM UTC+12, Uwe Bonnes wrote:
> jg  wrote:
> > This allows higher yield PCB design rules.
>
> But that gets the decoupling caps farther away from the chip and so worsens
> simultaneous switching.

As Rick has already pointed out, the package is the same size, so the die-to-cap pathway via the leadframe is no different. 

In fact, because the leads are wider, the inductance is actually lower on the coarser-pitch package (not by much, but it is lower).

Also, someone wanting a lower pin-count logic package is less likely to be pushing 'simultaneous switching'.

-jg


Article: 155749
Subject: Re: Lattice Announces EOL for XP and EC/P Product Lines
From: jg <j.m.granville@gmail.com>
Date: Mon, 26 Aug 2013 17:32:40 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Sunday, August 25, 2013 12:35:56 PM UTC+12, rickman wrote:
> Actually, I'm fine 
> with the assembly language as long as it's a stack machine.  Very simple 
> instructions and very simple to implement.

 This tolerance of a subset opens an interesting design approach, whereby you make opcodes granular and somehow list the ones used, to allow the tools to remove the unused ones.

 I think I saw an NXP design some time ago, along these lines of removing unused opcode logic.

 That means you can start with a more powerful core (hopefully more proven)
and then limit your SW to an opcode subset.
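 One hedged way to sketch the idea (generic, entity, and opcode names here are invented for illustration, not NXP's): gate each opcode's decode/execute logic behind a generate, so opcodes the software subset never uses synthesize away entirely:

```vhdl
-- Hypothetical sketch: a per-opcode enable generic lets the tools strip
-- the logic (e.g. a multiplier) for opcodes the SW subset never uses.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity alu_subset is
  generic (
    ENABLE_MUL : boolean := false  -- set from the list of opcodes in use
  );
  port (
    op   : in  std_logic_vector(1 downto 0);
    a, b : in  unsigned(15 downto 0);
    y    : out unsigned(15 downto 0)
  );
end entity;

architecture rtl of alu_subset is
begin
  with_mul : if ENABLE_MUL generate
    y <= a + b             when op = "00" else
         a - b             when op = "01" else
         resize(a * b, 16);  -- multiplier hardware built only if enabled
  end generate;
  without_mul : if not ENABLE_MUL generate
    y <= a + b when op = "00" else a - b;  -- MUL opcode logic removed
  end generate;
end architecture;
```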
 
-jg


