Messages from 158475

Article: 158475
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Tue, 1 Dec 2015 22:58:04 -0500

On 12/1/2015 8:55 PM, BobH wrote:
> On 11/30/2015 5:34 PM, rickman wrote:
>> On 11/30/2015 6:44 PM, BobH wrote:
>>> A mistake that I have made, is to mis-spell the wire connection and then
>>> there is no user for the outputs. The easiest way to check that is to
>>> inspect the simulation at the inputs to the next stage that uses the
>>> data and make sure that they are wiggling as you expect and not showing
>>> undefined as they would for an undriven wire. The second easiest way to
>>> check that is to eyeball the naming for this problem.
>>
>> If you make a spelling error, won't that be flagged because that signal
>> hasn't been declared?
>>
> Often the auto-wire "feature" will generate a replacement. If you go
> through the logs, it is noted, and usually the auto-wire will be a
> single wide signal instead of a bus, so it shows up that way too.

That is why VHDL has strong typing: errors like this are made *very* clear.

-- 

Rick

Article: 158476
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Tue, 1 Dec 2015 20:49:48 -0800 (PST)

So this evening I implemented the PLA instruction, which reads from the
stack (at the current location of the stack pointer) and stores the value
there into A. Synthesis took about 3x as long, and at the end of it there's
a whole bunch of Info messages about how it wasn't storing the stack in a
block ram for this reason or that.

Looking at the registers, I jumped from ~260 to ~520, so it looks as though
the variably-indexed (via SP) set of stack registers were incorporated into
the design again :) Phew!

I guess I'll just get on with it and implement more instructions - I was
just afraid that as the design got larger, it would be harder to debug.
Looks like it might have been easier :)

Thanks again for all the help everyone, especially the verilog examples
Bob :)

Simon

Article: 158477
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Wed, 2 Dec 2015 00:02:19 -0500

On 12/1/2015 11:49 PM, Simon wrote:
> So this evening I implemented the PLA instruction, which reads from the stack (at the current location of the stack pointer) and stores the value there into A. Synthesis took about 3x as long, and at the end of it there's a whole bunch of Info messages about how it wasn't storing the stack in a block ram for this reason or that.
>
> Looking at the registers, I jumped from ~260 to ~520, so it looks as though the variably-indexed (via SP) set of stack registers were incorporated into the design again :) Phew!
>
> I guess I'll just get on with it and implement more instructions - I was just afraid that as the design got larger, it would be harder to debug. Looks like it might have been easier :)
>
> Thanks again for all the help everyone, especially the verilog examples Bob :)

You do have an issue if a block RAM is not being used.  The code I've 
seen looks like you are writing from a functional perspective rather 
than structural.   I would suggest you write a module for a block RAM 
using example code provided by your chip manufacturer.  Then incorporate 
that RAM module into your code as appropriate.

Block RAM must have a register delay in the RAM itself.  There are other 
restrictions as well, the details depending on the vendor.  If you code 
the module by the provider's example you should get a block RAM.  This 
should also help you see the limitations of how you can use that RAM.

I have had similar problems coding adders when I was trying to use the 
carry out.  One small issue with how I was using the adder resulted in 
a second adder being used to generate the carry out.
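
As a rough illustration of the carry-out point (in Verilog, since that is 
what the rest of the thread uses), one common way to keep sum and carry on 
a single adder is to assign both from one expression.  This is only a 
sketch: the module and port names are made up, and it is not the code that 
originally caused the problem.

```verilog
// Hypothetical 8-bit adder; names are illustrative only.
module add8_carry (
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire       cin,
    output wire [7:0] sum,
    output wire       cout
);
    // The 9-bit concatenation on the left sets the expression width to
    // 9 bits, so the carry out of bit 7 is kept and lands in cout.
    // Sum and carry come from the same addition.
    assign {cout, sum} = a + b + cin;
endmodule
```

Deriving the carry from a separately written expression is the kind of 
thing that can make a tool build a second adder, so it is worth checking 
the resource report either way.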

-- 

Rick

Article: 158478
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Wed, 2 Dec 2015 08:08:55 -0800 (PST)

On Tuesday, December 1, 2015 at 9:02:26 PM UTC-8, rickman wrote:
>
> You do have an issue if a block RAM is not being used.  The code I've
> seen looks like you are writing from a functional perspective rather
> than structural.   I would suggest you write a module for a block RAM
> using example code provided by your chip manufacturer.  Then incorporate
> that RAM module into your code as appropriate.

But I don't want a block-ram. I don't want to pay the penalty of a
clock-cycle for access to the values. I want a block of 256 registers,
which I can access with as-close-to-zero time cost as possible.
Block-ram's are great, but in this case I really want just a whole bunch
of registers.

I'm conscious that something is screwy. I don't understand why an array of
registers declared as...

    ////////////////////////////////////////////////////////////////////////////
    // Set up zero-page as register-based for speed reasons
    ////////////////////////////////////////////////////////////////////////////
    reg    [`NW:0]                       zp[0:255];                // Zero-page

... should exhibit a whole bunch of warnings along the lines of

    INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because
    address size (32) is larger than maximum supported(25)"

Um, que ? Address size == 32 ? Even if you treat it as a 1-bit array,
that's only 11 bits of address (8 * 256 = 2048) to access any given bit.
Hmm, now there's a thought. I wonder if declaring:

    reg [2047:0]                        zp;

.. and doing the bit-selections might be a way to do it. No array, just a
freaking huge register. I wonder how efficient it is at ganging up LUTs to
make a combined single register...

I actually might try implementing a module along the lines of BobH's code
above - rather than just declaring the register array, and see how that
works out. At the moment I'm busy writing unit tests :)

Cheers
   Simon

Article: 158479
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Wed, 2 Dec 2015 11:28:04 -0500

On 12/2/2015 11:08 AM, Simon wrote:
> On Tuesday, December 1, 2015 at 9:02:26 PM UTC-8, rickman wrote:
>>
>> You do have an issue if a block RAM is not being used.  The code
>> I've seen looks like you are writing from a functional perspective
>> rather than structural.   I would suggest you write a module for a
>> block RAM using example code provided by your chip manufacturer.
>> Then incorporate that RAM module into your code as appropriate.
>
> But I don't want a block-ram. I don't want to pay the penalty of a
> clock-cycle for access to the values. I want a block of 256
> registers, which I can access with as-close-to-zero time cost as
> possible. Block-ram's are great, but in this case I really want just
> a whole bunch of registers.

Ok, I understand better now.


> I'm conscious that something is screwy. I don't understand why an
> array of registers declared as...
>
> ////////////////////////////////////////////////////////////////////////////
>  // Set up zero-page as register-based for speed reasons
> ////////////////////////////////////////////////////////////////////////////
>  reg    [`NW:0]    zp[0:255];     // Zero-page
>
> ... should exhibit a whole bunch of warnings along the lines of
>
> INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because
> address size (32) is larger than maximum supported(25)"
>
> Um, que ? Address size == 32 ? Even if you treat it as a 1-bit array,
> that's only 11 bits of address (8 * 256 = 2048) to access any given
> bit. Hmm, now there's a thought. I wonder if declaring:
>
> reg [2047:0]                        zp;
>
> .. and doing the bit-selections might be a way to do it. No array,
> just a freaking huge register. I wonder how efficient it is at
> ganging up LUTs to make a combined single register...
>
> I actually might try implementing a module along the lines of BobH's
> code above - rather than just declaring the register array, and see
> how that works out. At the moment I'm busy writing unit tests :)

Now I am lost again.  Why are you trying to change the code that is 
giving you 256 registers?  The only RAM in FPGAs these days is 
synchronous RAM.  If you don't want the address register delay then your 
only choice is to use fabric FFs.

-- 

Rick

Article: 158480
Subject: Re: Simulation vs Synthesis
From: GaborSzakacs <gabor@alacron.com>
Date: Wed, 02 Dec 2015 11:41:31 -0500

rickman wrote:
> On 12/1/2015 8:55 PM, BobH wrote:
>> On 11/30/2015 5:34 PM, rickman wrote:
>>> On 11/30/2015 6:44 PM, BobH wrote:
>>>> A mistake that I have made, is to mis-spell the wire connection and 
>>>> then
>>>> there is no user for the outputs. The easiest way to check that is to
>>>> inspect the simulation at the inputs to the next stage that uses the
>>>> data and make sure that they are wiggling as you expect and not showing
>>>> undefined as they would for an undriven wire. The second easiest way to
>>>> check that is to eyeball the naming for this problem.
>>>
>>> If you make a spelling error, won't that be flagged because that signal
>>> hasn't been declared?
>>>
>> Often the auto-wire "feature" will generate a replacement. If you go
>> through the logs, it is noted, and usually the auto-wire will be a
>> single wide signal instead of a bus, so it shows up that way too.
> 
> That is why VHDL has strong typing: errors like this are made *very* clear.
> 

You don't need VHDL, just Verilog 2001 and use `default_nettype none to
prevent auto-wire generation.
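
A minimal sketch of what that looks like in practice, assuming a made-up
two-module design (stage_a, top and the signal names are purely
illustrative):

```verilog
`default_nettype none   // undeclared identifiers become errors, not implicit 1-bit wires

// Hypothetical first stage; names are illustrative only.
module stage_a (
    input  wire       clk,
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    always @(posedge clk)
        dout <= din;
endmodule

module top (
    input  wire       clk,
    input  wire [7:0] din,
    output wire [7:0] result
);
    wire [7:0] stage_data;

    // If this connection were misspelled as .dout(stage_dta), the default
    // nettype would silently create a 1-bit wire called stage_dta. With
    // `default_nettype none the tool reports an undeclared identifier.
    stage_a u_a (.clk(clk), .din(din), .dout(stage_data));

    assign result = stage_data;
endmodule

`default_nettype wire   // restore the default for files compiled afterwards
```

Note that with the directive active, ports have to be written as
"input wire" rather than plain "input".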

-- 
Gabor

Article: 158481
Subject: Re: Simulation vs Synthesis
From: gtwrek@sonic.net (Mark Curry)
Date: Wed, 2 Dec 2015 17:55:31 -0000 (UTC)

In article <n3lkei021un@news3.nntpjunkie.com>,
BobH  <wanderingmetalhead.nospam.please@yahoo.com> wrote:
>
>The brute force might look like:
>
>module reg_ram
>(
>   input wire [1:0] address,
>   input wire [7:0] write_data,
>   input wire       write_en,
>   input wire       clk,
>   input wire       rstn,
>   output reg [7:0] read_data
>);
>
>reg [7:0] cell0, cell1, cell2, cell3;
>
>always @(posedge clk or negedge rstn)
>if (~rstn)
>   cell0 <= 8'h0;
>else
>   if (write_en & (address == 2'h0))
>     cell0 <= write_data;
>
>always @(posedge clk or negedge rstn)
>if (~rstn)
>   cell1 <= 8'h0;
>else
>   if (write_en & (address == 2'h1))
>     cell1 <= write_data;
>
<snip>
>   case (address)
>     2'h0: read_data = cell0;
>     2'h1: read_data = cell1;
>     2'h2: read_data = cell2;
>     2'h3: read_data = cell3;
>   endcase
>endmodule
>
>As rude as this looks, most of the other structures that I can think of 
>result in something that looks like a huge barrel shifter and are larger 
>to implement.
<snip>

Huh.  I missed what led up to this, but explicitly coding up each case
like this is entirely unnecessary in verilog.

reg [ 7 : 0] cell [ 3 : 0];
always @( posedge clk ) // NO ASYNC RESET - messes up optimization - no reset at all actually is preferred
  if( write_en )
    cell[ address ] <= write_data;

always @*
  read_data = cell[ address ];

Done.  If resets are needed then it won't map to block RAM.
Xilinx has examples in their docs for how to successfully infer block RAM.
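
For anyone who wants to drop this into a simulator, here is a sketch of
the same idea wrapped in a complete module.  The module name, the
parameter names and the array name "mem" are my own choices for
illustration; "mem" is used because "cell" is a reserved word in
Verilog-2001 (it belongs to the config syntax), so some tools may reject
it as an identifier.

```verilog
// Sync-write, async-read memory; a sketch, not a vendor template.
module reg_ram #(
    parameter DATA_W = 8,   // word width (assumed default)
    parameter ADDR_W = 2    // address width, depth is 2**ADDR_W
) (
    input  wire              clk,
    input  wire              write_en,
    input  wire [ADDR_W-1:0] address,
    input  wire [DATA_W-1:0] write_data,
    output wire [DATA_W-1:0] read_data
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

    // Synchronous write, no reset on the storage array.
    always @(posedge clk)
        if (write_en)
            mem[address] <= write_data;

    // Asynchronous (combinational) read.
    assign read_data = mem[address];
endmodule
```

Without a reset the array contents start out as X in simulation until
each location has been written.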

Regards,

Mark

Article: 158482
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Wed, 2 Dec 2015 10:01:05 -0800 (PST)

On Wednesday, December 2, 2015 at 8:28:10 AM UTC-8, rickman wrote:
>
> > I actually might try implementing a module along the lines of BobH's
> > code above - rather than just declaring the register array, and see
> > how that works out. At the moment I'm busy writing unit tests :)
>
> Now I am lost again.  Why are you trying to change the code that is
> giving you 256 registers?  The only RAM in FPGAs these days is
> synchronous RAM.  If you don't want the address register delay then your
> only choice is to use fabric FFs.

Maybe I'm reading/understanding it incorrectly - it looks to me that
there's an always @ (posedge(clk)) dependency for writes - but I'm
relatively fine with that - I won't need the data until the next clock
anyway if I'm writing, because that's how the 6502 worked.

For reads, it looked to me as though it used always @ (*), and I (perhaps
incorrectly) thought that would get me the results on the module's data
bus as soon as the 'address' lines changed.

As for why to change it, I don't like it when I don't understand the
error/info messages the tool is giving me. Given my (relatively limited)
understanding of what the synthesis tool is actually *doing* under the
hood, it probably means I'm not getting what I actually want, or if I am,
it's in some highly-inefficient manner. Your comment about inferring extra
adders unnecessarily is pretty relevant I feel :)

It does tie me to a single write/read per clock, whereas I could set N
registers per clock (and thus "push" 3 elements onto the stack for the BRK
instruction in a single clock for example), but I'm actually ok with that
too, I think. The 6502 only had 1 databus, so *it* took multiple clocks to
do multiple writes as well.

It's entirely possible my understanding of the module is flawed. I'm happy
to be corrected :)

Cheers
   Simon

Article: 158483
Subject: Re: Simulation vs Synthesis
From: BobH <wanderingmetalhead.nospam.please@yahoo.com>
Date: Wed, 2 Dec 2015 16:27:38 -0700

On 12/2/2015 10:55 AM, Mark Curry wrote:
> In article <n3lkei021un@news3.nntpjunkie.com>,
> <snip>
>
> Huh.  I missed what led up to this, but explicitly coding up each case
> like this is entirely unnecessary in verilog.
>
> reg [ 7 : 0] cell [ 3 : 0];
> always @( posedge clk ) // NO ASYNC RESET - messes up optimization - no reset at all actually is preferred
>    if( write_en )
>      cell[ address ] <= write_data;
>
> always @*
>    read_data = cell[ address ];
>
> Done.  If resets are needed then it won't map to block RAM.
> Xilinx has examples in their docs for how to successfully infer block RAM.
>

Thanks! I have always explicitly built a model RAM when I wanted RAM in 
an FPGA rather than inferring one. I just automatically include the 
reset when I do D flops because it makes the simulation cleaner.
   I think that the original poster wanted an array of flops thinking 
that they would be faster than block ram.

This looks worth messing with when I get some breathing space. I am a 
little curious about the synthesizability of it.

Regards,
BobH

Article: 158484
Subject: Re: Simulation vs Synthesis
From: BobH <wanderingmetalhead.nospam.please@yahoo.com>
Date: Wed, 2 Dec 2015 16:42:09 -0700

On 12/2/2015 9:08 AM, Simon wrote:
> On Tuesday, December 1, 2015 at 9:02:26 PM UTC-8, rickman wrote:
>>
>> You do have an issue if a block RAM is not being used.  The code I've
>> seen looks like you are writing from a functional perspective rather
>> than structural.   I would suggest you write a module for a block RAM
>> using example code provided by your chip manufacturer.  Then incorporate
>> that RAM module into your code as appropriate.
>
> But I don't want a block-ram. I don't want to pay the penalty of a clock-cycle
> for access to the values. I want a block of 256 registers, which I can access
> with as-close-to-zero time cost as possible. Block-ram's are great, but in
> this case I really want just a whole bunch of registers.
>
> I'm conscious that something is screwy. I don't understand why an array of
> registers declared as...
>
>      ////////////////////////////////////////////////////////////////////////////
>      // Set up zero-page as register-based for speed reasons
>      ////////////////////////////////////////////////////////////////////////////
>      reg    [`NW:0]                       zp[0:255];                // Zero-page
>
> ... should exhibit a whole bunch of warnings along the lines of
>
>      INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because
> address size (32) is larger than maximum supported(25)"

Block RAM tends to be smallish and often comes in weird sizes.

>
> Um, que ? Address size == 32 ? Even if you treat it as a 1-bit array, that's
> only 11 bits of address (8 * 256 = 2048) to access any given bit. Hmm, now there's
> a thought. I wonder if declaring:
>
>      reg [2047:0]                        zp;
>
> .. and doing the bit-selections might be a way to do it. No array, just a freaking
> huge register. I wonder how efficient it is at ganging up LUTs to make a combined single
> register...

This will result in a huge barrel shifter which will likely get slow. I 
don't know what your clock speeds are relative to the FPGA capability, 
but I don't like the big barrel shifter implementations. If your clock 
speeds are a few MHz and you are using a modern FPGA, you probably can 
afford to implement it that way.

> I actually might try implementing a module along the lines of BobH's code above - rather
> than just declaring the register array, and see how that works out. At the moment
> I'm busy writing unit tests :)

Try Mark Curry's suggested syntax. If it is synthesizable, it will be 
MUCH easier to implement! From Mark's comment, if you include the reset, 
it should prevent the replacement of FF's with block RAM.

Regards,
BobH

Article: 158485
Subject: Re: Simulation vs Synthesis
From: gtwrek@sonic.net (Mark Curry)
Date: Thu, 3 Dec 2015 00:46:52 -0000 (UTC)

In article <n3nup505fn@news3.nntpjunkie.com>,
BobH  <wanderingmetalhead.nospam.please@yahoo.com> wrote:
>On 12/2/2015 10:55 AM, Mark Curry wrote:
>> In article <n3lkei021un@news3.nntpjunkie.com>,
>> <snip>
>>
>> Huh.  I missed what led up to this, but explicitly coding up each case
>> like this is entirely unnecessary in verilog.
>>
>> reg [ 7 : 0] cell [ 3 : 0];
>> always @( posedge clk ) // NO ASYNC RESET - messes up optimization - no reset at all actually is preferred
>>    if( write_en )
>>      cell[ address ] <= write_data;
>>
>> always @*
>>    read_data = cell[ address ];
>>
>> Done.  If resets are needed then it won't map to block RAM.
>> Xilinx has examples in their docs for how to successfully infer block RAM.
>>
>
>Thanks! I have always explicitly built a model RAM when I wanted RAM in 
>an FPGA rather than inferring one. I just automatically include the 
>reset when I do D flops because it makes the simulation cleaner.
>   I think that the original poster wanted an array of flops thinking 
>that they would be faster than block ram.
>
>This looks worth messing with when I get some breathing space. I am a 
>little curious about the synthesizability of it.

Bob - it's all synthesizable for FPGAs just fine.  The only trick is 
when you definitely want to infer Block RAMs.  In that case, it's best
to check the Xilinx docs, and use their templates, with little modification.

You can modify the Xilinx template, for instance, to make the RAM width
and depth parameters.  But stray too far, and it may trip up.  And when 
I say trip up - I mean it'll synthesize to something that matches your
description - however it may mess up and build it out of FFs instead
of Block RAMs. (You may also optionally attach a pragma to FORCE it
to map to FFs - in the case you mentioned above where you may want 
the faster access.  Just don't make it a very big array!)
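
As a sketch of the pragma idea: in Vivado the attribute is, as far as I
recall, called ram_style, and the value "registers" asks for flip-flops.
Other tools use different attribute names, so treat the attribute line as
an assumption to check against your tool's synthesis guide.  The module
and port names are made up for illustration.

```verilog
// 256 x 8 register array with sync write and async read.
module zp_regs (
    input  wire       clk,
    input  wire       write_en,
    input  wire [7:0] address,
    input  wire [7:0] write_data,
    output wire [7:0] read_data
);
    // Assumed Vivado-style attribute; verify the name and value for
    // the synthesis tool actually in use.
    (* ram_style = "registers" *)
    reg [7:0] zp [0:255];

    always @(posedge clk)
        if (write_en)
            zp[address] <= write_data;

    assign read_data = zp[address];
endmodule
```

At this size that is 2048 flops plus a 256-to-1 read mux per bit, which
is about where the "not a very big array" warning starts to matter.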

Play with it when you have time.  It's an excellent tool in your toolbox.

Regards,

Mark

Article: 158486
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Thu, 3 Dec 2015 00:01:29 -0500

On 12/2/2015 6:27 PM, BobH wrote:
> On 12/2/2015 10:55 AM, Mark Curry wrote:
>> In article <n3lkei021un@news3.nntpjunkie.com>,
>> <snip>
>>
>> Huh.  I missed what led up to this, but explicitly coding up each case
>> like this is entirely unnecessary in verilog.
>>
>> reg [ 7 : 0] cell [ 3 : 0];
>> always @( posedge clk ) // NO ASYNC RESET - messes up optimization -
>> // no reset at all actually is preferred
>>    if( write_en )
>>      cell[ address ] <= write_data;
>>
>> always @*
>>    read_data = cell[ address ];
>>
>> Done.  If resets are needed then it won't map to block RAM.
>> Xilinx has examples in their docs for how to successfully infer block
>> RAM.
>>
>
> Thanks! I have always explicitly built a model RAM when I wanted RAM in
> an FPGA rather than inferring one. I just automatically include the
> reset when I do D flops because it makes the simulation cleaner.
>    I think that the original poster wanted an array of flops thinking
> that they would be faster than block ram.
>
> This looks worth messing with when I get some breathing space. I am a
> little curious about the synthesizability of it.

I'm not sure the above is a correct model for block RAMs in many 
devices.  The ones I have used have a register delay even in the read 
path.   There can be separate interfaces (address, controls and data) 
for reading and writing, but in all cases the read data is registered.

What devices will this model work for?  Or maybe I'm not so familiar 
with Verilog.  The read path in the above description is async, no?

-- 

Rick

Article: 158487
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Thu, 3 Dec 2015 00:12:08 -0500

On 12/2/2015 1:01 PM, Simon wrote:
> On Wednesday, December 2, 2015 at 8:28:10 AM UTC-8, rickman wrote:
>>
>>> I actually might try implementing a module along the lines of
>>> BobH's code above - rather than just declaring the register
>>> array, and see how that works out. At the moment I'm busy writing
>>> unit tests :)
>>
>> Now I am lost again.  Why are you trying to change the code that
>> is giving you 256 registers?  The only RAM in FPGAs these days is
>> synchronous RAM.  If you don't want the address register delay then
>> your only choice is to use fabric FFs.
>
>
> Maybe I'm reading/understanding it incorrectly - it looks to me that
> there's an always @ (posedge(clk)) dependency for writes - but I'm
> relatively fine with that - I won't need the data until the next
> clock anyway if I'm writing, because that's how the 6502 worked.

My understanding is that all block RAMs have a register in the read path. 
I've always considered there to be a register on the input side (address, 
data in and control) rather than worrying about any internal details.  It 
all works the same.

Looks like I had forgotten about the distributed RAM.  It has async read 
and sync write.  So your model will work just fine.


> For reads, it looked to me as though it used always @ (*), and I
> (perhaps incorrectly) thought that would get me the results on the
> module's data bus as soon as the 'address' lines changed.
>
> As for why to change it, I don't like it when I don't understand the
> error/info messages the tool is giving me. Given my (relatively
> limited) understanding of what the synthesis tool is actually *doing*
> under the hood, it probably means I'm not getting what I actually
> want, or if I am, it's in some highly-inefficient manner. Your
> comment about inferring extra adders unnecessarily is pretty relevant
> I feel :)

Now that my misunderstanding is straightened out I see what you are 
saying.  I don't understand the error message either, but then I can't 
see the code.

Try isolating the error to a smaller section of code.  Obviously there 
is something else going on that it thinks an 8 bit address RAM is being 
indexed by a 32 bit value.  I expect it has something to do with the way 
you are using the array rather than the way you are declaring it.
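
One guess at how a 32-bit index can appear, offered only as a possibility
since I haven't seen the code: in Verilog an unsized constant such as 1 is
at least 32 bits wide, so an index like sp + 1 is a 32-bit expression even
when sp is 8 bits.  The sketch below uses made-up names (sp, push, pull,
din) and is not the actual 6502 design.

```verilog
// Hypothetical stack fragment showing index width.
module sp_index_demo (
    input  wire       clk,
    input  wire       push,   // hypothetical "push to stack" strobe
    input  wire       pull,   // hypothetical "pull from stack" strobe
    input  wire [7:0] sp,     // hypothetical 8-bit stack pointer
    input  wire [7:0] din,
    output reg  [7:0] a
);
    reg [7:0] stack [0:255];

    // stack[sp + 1] would index with a 32-bit expression, because the
    // unsized literal 1 widens the sum. Sizing the constant keeps the
    // index at 8 bits and wraps from 255 back to 0.
    wire [7:0] sp_next = sp + 8'd1;

    always @(posedge clk) begin
        if (push)
            stack[sp] <= din;
        if (pull)
            a <= stack[sp_next];
    end
endmodule
```

Whether this is what triggers the Synth 8-5545 message in your case I
can't say, but it is a cheap thing to rule out.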


> It does tie me to a single write/read per clock, whereas I could set
> N registers per clock (and thus "push" 3 elements onto the stack for
> the BRK instruction in a single clock for example), but I'm actually
> ok with that too, I think. The 6502 only had 1 databus, so *it* took
> multiple clocks to do multiple writes as well.
>
> It's entirely possible my understanding of the module is flawed. I'm
> happy to be corrected :)

-- 

Rick

Article: 158488
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Thu, 3 Dec 2015 00:13:48 -0500

On 12/3/2015 12:01 AM, rickman wrote:
> On 12/2/2015 6:27 PM, BobH wrote:
>> On 12/2/2015 10:55 AM, Mark Curry wrote:
>>> In article <n3lkei021un@news3.nntpjunkie.com>,
>>> <snip>
>>>
>>> Huh.  I missed what led up to this, but explicitly coding up each case
>>> like this is entirely unnecessary in verilog.
>>>
>>> reg [ 7 : 0] cell [ 3 : 0];
>>> always @( posedge clk ) // NO ASYNC RESET - messes up optimization -
>>> // no reset at all actually is preferred
>>>    if( write_en )
>>>      cell[ address ] <= write_data;
>>>
>>> always @*
>>>    read_data = cell[ address ];
>>>
>>> Done.  If resets are needed then it won't map to block RAM.
>>> Xilinx has examples in their docs for how to successfully infer block
>>> RAM.
>>>
>>
>> Thanks! I have always explicitly built a model RAM when I wanted RAM in
>> an FPGA rather than inferring one. I just automatically include the
>> reset when I do D flops because it makes the simulation cleaner.
>>    I think that the original poster wanted an array of flops thinking
>> that they would be faster than block ram.
>>
>> This looks worth messing with when I get some breathing space. I am a
>> little curious about the synthesizability of it.
>
> I'm not sure the above is a correct model for block RAMs in many
> devices.  The ones I have used have a register delay even in the read
> path.   There can be separate interfaces (address, controls and data)
> for reading and writing, but in all cases the read data is registered.
>
> What devices will this model work for?  Or maybe I'm not so familiar
> with Verilog.  The read path in the above description is async, no?

I Googled and found the distributed RAM in the Xilinx parts support 
async reads.  So I am clear on this now.  I must have forgotten this.

-- 

Rick

Article: 158489
Subject: Re: Simulation vs Synthesis
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Thu, 3 Dec 2015 06:34:37 +0000 (UTC)

rickman <gnuarm@gmail.com> wrote:
> On 12/2/2015 6:27 PM, BobH wrote:
>> On 12/2/2015 10:55 AM, Mark Curry wrote:

(snip)
>>> reg [ 7 : 0] cell [ 3 : 0];
>>> always @( posedge clk ) // NO ASYNC RESET - messes up optimization -
>>> // no reset at all actually is preferred
>>>    if( write_en )
>>>      cell[ address ] <= write_data;

>>> always @*
>>>    read_data = cell[ address ];

(snip)
>> Thanks! I have always explicitly built a model RAM when I wanted RAM in
>> an FPGA rather than inferring one. I just automatically include the
>> reset when I do D flops because it makes the simulation cleaner.
>>    I think that the original poster wanted an array of flops thinking
>> that they would be faster than block ram.

(snip) 
> I'm not sure the above is a correct model for block RAMs in many 
> devices.  The ones I have used have a register delay even in the read 
> path.   There can be separate interfaces (address, controls and data) 
> for reading and writing, but in all cases the read data is registered.
 
> What devices will this model work for?  Or maybe I'm not so familiar 
> with Verilog.  The read path in the above description is async, no?

Yes that has async. read and sync. write, and that doesn't
work with the usual block RAM. 

I am not sure if it wants the register before, or after, or
if it doesn't matter.

-- glen

Article: 158490
Subject: Re: Simulation vs Synthesis
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Thu, 3 Dec 2015 06:40:32 +0000 (UTC)

rickman <gnuarm@gmail.com> wrote:

(snip)

> I Googled and found the distributed RAM in the Xilinx parts support 
> async reads.  So I am clear on this now.  I must have forgotten this.

The distributed RAM is just the usual LUTs, so they support asynchronous
reads the same way they do when they are used as gates. I think they also
support asynchronous write, but that is less obvious.

-- glen

Article: 158491
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Thu, 3 Dec 2015 01:52:53 -0500

On 12/3/2015 1:40 AM, glen herrmannsfeldt wrote:
> rickman <gnuarm@gmail.com> wrote:
>
> (snip)
>
>> I Googled and found the distributed RAM in the Xilinx parts support
>> async reads.  So I am clear on this now.  I must have forgotten this.
>
> The distributed RAM is just the usual LUTs, so they support asynchronous
> reads the same way they do when they are used as gates. I think they also
> support asynchronous write, but that is less obvious.

No, they do not support async writes.  I recall it was in the XC4000 
series they got rid of async writes because they had so much trouble 
supporting it.  Basically there were too many users who didn't know how 
to properly use async memory.  There may have been some technical 
advantages to using a sync write for the FPGA designers, but I am pretty 
sure it was really an issue of complaints that it didn't work right 
which really meant they were not meeting the specs on the pulse width of 
the write strobe.  Async RAM has a lot of timing details to meet 
compared to the sync version.  With sync it is basically just setup and 
hold of the inputs.

-- 

Rick

Article: 158492
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Thu, 3 Dec 2015 01:56:31 -0500

On 12/3/2015 1:34 AM, glen herrmannsfeldt wrote:
> rickman <gnuarm@gmail.com> wrote:
>> On 12/2/2015 6:27 PM, BobH wrote:
>>> On 12/2/2015 10:55 AM, Mark Curry wrote:
>
> (snip)
>>>> reg [ 7 : 0] cell [ 3 : 0];
>>>> always @( posedge clk ) // NO ASYNC RESET - messes up optimization -
>>>> // no reset at all actually is preferred
>>>>     if( write_en )
>>>>       cell[ address ] <= write_data;
>
>>>> always @*
>>>>     read_data = cell[ address ];
>
> (snip)
>>> Thanks! I have always explicitly built a model RAM when I wanted RAM in
>>> an FPGA rather than inferring one. I just automatically include the
>>> reset when I do D flops because it makes the simulation cleaner.
>>>     I think that the original poster wanted an array of flops thinking
>>> that they would be faster than block ram.
>
> (snip)
>> I'm not sure the above is a correct model for block RAMs in many
>> devices.  The ones I have used have a register delay even in the read
>> path.   There can be separate interfaces (address, controls and data)
>> for reading and writing, but in all cases the read data is registered.
>
>> What devices will this model work for?  Or maybe I'm not so familiar
>> with Verilog.  The read path in the above description is async, no?
>
> Yes that has async. read and sync. write, and that doesn't
> work with the usual block RAM.
>
> I am not sure if it wants the register before, or after, or
> if it doesn't matter.

I'm not sure what you mean.  Before or after what exactly?

-- 

Rick

Article: 158493
Subject: Re: Simulation vs Synthesis
From: "jt_eaton" <84408@FPGARelated>
Date: Thu, 03 Dec 2015 10:27:19 -0600

>>
>> Yes that has async. read and sync. write, and that doesn't
>> work with the usual block RAM.
>>
>> I am not sure if it wants the register before, or after, or
>> if it doesn't matter.
>
>I'm not sure what you mean.  Before or after what exactly?
>
>-- 
>
>Rick

Do you put a register before or after the ram array?

You can register the addresses and then do an asynchronous read or you can
do an asynchronous read and then register the data.


The difference is known as writethru. If you have a dual port sram and do
both a read and write operation to the same address in the same cycle then
do you read the old data or the new?


In the first case you will get the new data while the second case will
give you the old data. 

In the first case the write data is written through the sram to become the
read data.

Selection depends on the circuit needs. If you are using sram in a fifo
then writing to a completely full fifo on exactly the same cycle that
data is popped off will not work with writethru. You want
to pop off the oldest and replace it with the newest.

If sram is a cpu register bank and you store in register X followed by an
instruction that uses register X then pipelining will read the new data on
the same cycle that it writes it to ram. In that case you must have
writethru.
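
A minimal sketch of the two register placements described above, assuming
a single clock, one write port and one read port. The module and signal
names are hypothetical, and this is a simulation-level illustration rather
than a vendor inference template. When read_addr equals write_addr on a
write cycle, read_new shows the new word after the clock edge while
read_old shows the previous contents.

```verilog
// Hypothetical names throughout; not taken from any vendor template.
module ram_reg_placement #(
    parameter AW = 4,
    parameter DW = 8
) (
    input  wire          clk,
    input  wire          write_en,
    input  wire [AW-1:0] write_addr,
    input  wire [DW-1:0] write_data,
    input  wire [AW-1:0] read_addr,
    output wire [DW-1:0] read_new,   // case 1: register before the array
    output reg  [DW-1:0] read_old    // case 2: register after the array
);
    reg [DW-1:0] mem [0:(1<<AW)-1];
    reg [AW-1:0] read_addr_q;

    always @(posedge clk) begin
        if (write_en)
            mem[write_addr] <= write_data;
        read_addr_q <= read_addr;       // case 1: capture the address
        read_old    <= mem[read_addr];  // case 2: capture the pre-write contents
    end

    // Case 1: asynchronous read through the registered address. By the time
    // the output settles, the array already holds the newly written word.
    assign read_new = mem[read_addr_q];
endmodule
```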

John Eaton

---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158494
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Thu, 3 Dec 2015 11:49:16 -0500

On 12/3/2015 11:27 AM, jt_eaton wrote:
>>>
>>> Yes that has async. read and sync. write, and that doesn't
>>> work with the usual block RAM.
>>>
>>> I am not sure if it wants the register before, or after, or
>>> if it doesn't matter.
>>
>> I'm not sure what you mean.  Before or after what exactly?
>>
>> --
>>
>> Rick
>
> Do you put a register before or after the ram array.
>
> You can register the addresses and then do an asynchronous read or you can
> do an asynchronous read and then register the data.
>
>
> The difference is known as writethru. If you have a dual port sram and do
> both a read and write operation to the same address in the same cycle then
> do you read the old data or the new?

Yes.


> In the first case you will get the new data while the second case will
> give you the old data.
>
> In the first case the write data is written though the sram to become the
> read data.
>
> Selection depends on the circuit needs. If you are using sram in a fifo
> then writting to a completely full fifo on exactly the same cycle that
> data is popped off will not work with writethru. You want
> to pop off the oldest and replace it with the newest.
>
> If sram is a cpu register bank and you store in register X followed by an
> instruction the uses register X then pipelining will read the new data on
> the same cycle that it writes it to ram. In that case you must have
> writethru.

Different vendors give the modes different names, but essentially on 
block RAM writes the read data can be the old data, the new data or the 
read data port is held at the last value with no change.  None of this 
is affected by where you put the registers in your HDL.  This is 
typically controlled by attributes.
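
As a rough behavioral illustration of the three read-during-write
behaviors listed above (old data, new data, output held), here is a sketch
that puts all three on one single-port memory for side-by-side simulation.
The names are hypothetical, and a real design would pick one behavior per
RAM using whatever coding style or attribute the vendor documents.

```verilog
// Hypothetical names; simulation sketch only, not a vendor template.
module ram_write_modes #(
    parameter AW = 4,
    parameter DW = 8
) (
    input  wire          clk,
    input  wire          write_en,
    input  wire [AW-1:0] address,
    input  wire [DW-1:0] write_data,
    output reg  [DW-1:0] dout_old,   // read data is the old contents
    output reg  [DW-1:0] dout_new,   // read data is the word being written
    output reg  [DW-1:0] dout_hold   // read port holds its last value on writes
);
    reg [DW-1:0] mem [0:(1<<AW)-1];

    always @(posedge clk) begin
        if (write_en)
            mem[address] <= write_data;

        // Old data: the nonblocking write has not landed yet when this samples.
        dout_old <= mem[address];

        // New data: forward the written word to the output on a write cycle.
        dout_new <= write_en ? write_data : mem[address];

        // No change: only update the output on cycles without a write.
        if (!write_en)
            dout_hold <= mem[address];
    end
endmodule
```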

-- 

Rick

Article: 158495
Subject: Re: Simulation vs Synthesis
From: Mike Field <mikefield1969@gmail.com>
Date: Thu, 3 Dec 2015 09:41:13 -0800 (PST)

>      INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because 
> address size (32) is larger than maximum supported(25)" 


The problem might be how you are indexing that small block of registers. Is the address being used to index zp_reg also 8 bits?

Also, why 255 elements and not 256?
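
A small sketch of what is being asked about, assuming the zero page is a
256-entry array of bytes indexed by an explicit 8-bit signal. Only zp_reg
comes from the quoted message; the module, port names and the registered
read are assumptions, and the original code is not shown in this thread.

```verilog
// Hypothetical module; only the name zp_reg comes from the quoted message.
module zero_page (
    input  wire       clk,
    input  wire       write_en,
    input  wire [7:0] zp_addr,     // 8-bit index matches the 256-entry depth
    input  wire [7:0] write_data,
    output reg  [7:0] read_data
);
    // [0:255] declares 256 entries; 255 is the last index, not the count.
    reg [7:0] zp_reg [0:255];

    always @(posedge clk) begin
        if (write_en)
            zp_reg[zp_addr] <= write_data;
        read_data <= zp_reg[zp_addr];
    end
endmodule
```

If the index were an integer or another 32-bit expression, that might
explain the "address size (32)" in the message, but that is a guess.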

Mike

Article: 158496
Subject: Re: Simulation vs Synthesis
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Thu, 3 Dec 2015 17:59:32 +0000 (UTC)

rickman <gnuarm@gmail.com> wrote:
(snip)
>>>>> always @( posedge clk ) // NO ASYNC RESET - messes up optimization -
>>>>> no reset at all actually is prefered
>>>>>     if( write_en )
>>>>>       cell[ address ] <= write_data;

>>>>> always @*
>>>>>     read_data = cell[ address ];

(snip)
>> Yes that has async. read and sync. write, and that doesn't
>> work with the usual block RAM.

>> I am not sure if it wants the register before, or after, or
>> if it doesn't matter.
 
> I'm not sure what you mean.  Before or after what exactly?

For the case of reading, so consider a ROM, do you put the
register on the address inputs, or the data outputs?

Or, since the difference is only delay, can the synthesis tools
move it from one to the other?

-- glen

Article: 158497
Subject: Re: Simulation vs Synthesis
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Thu, 3 Dec 2015 18:06:26 +0000 (UTC)

jt_eaton <84408@fpgarelated> wrote:
 
(snip, I wrote)
>>> I am not sure if it wants the register before, or after, or
>>> if it doesn't matter.

>>I'm not sure what you mean.  Before or after what exactly?

(snip) 
> Do you put a register before or after the ram array.

Yes that is what I meant.

> You can register the addresses and then do an asynchronous read or 
> you can do an asynchronous read and then register the data.
 
> The difference is known as writethru. If you have a dual port sram and do
> both a read and write operation to the same address in the same cycle then
> do you read the old data or the new?
 
> In the first case you will get the new data while the second case will
> give you the old data. 
 
> In the first case the write data is written though the sram to 
> become the read data.
 
> Selection depends on the circuit needs. If you are using sram in a fifo
> then writting to a completely full fifo on exactly the same cycle that
> data is popped off will not work with writethru. You want
> to pop off the oldest and replace it with the newest.

If the FIFO has the same clock for both, then I suppose you can
do that. With asynchronous read and write, you can't really do that,
as you can't prevent the read from coming just slightly after
the write.

Most FIFOs have an "almost full" that helps avoid that, and also
allows for other delays in stopping data coming in.
 
> If sram is a cpu register bank and you store in register X followed by an
> instruction the uses register X then pipelining will read the new data on
> the same cycle that it writes it to ram. In that case you must have
> writethru.

Or you add extra logic to bypass the RAM in that case.
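
One way such bypass logic might look, sketched under the assumption of a
single clock and a RAM that returns the old data when read and written at
the same address. All names are hypothetical. The comparator remembers a
same-cycle address match and the output mux substitutes the written word
for the stale RAM output.

```verilog
// Hypothetical names; a sketch of forwarding around an "old data" RAM.
module regfile_bypass #(
    parameter AW = 5,
    parameter DW = 32
) (
    input  wire          clk,
    input  wire          write_en,
    input  wire [AW-1:0] write_addr,
    input  wire [DW-1:0] write_data,
    input  wire [AW-1:0] read_addr,
    output wire [DW-1:0] read_data
);
    reg [DW-1:0] mem [0:(1<<AW)-1];
    reg [DW-1:0] ram_q;      // registered RAM output (old data on a collision)
    reg [DW-1:0] fwd_data;   // copy of the word written in the same cycle
    reg          fwd_hit;    // set when read and write addresses matched

    always @(posedge clk) begin
        if (write_en)
            mem[write_addr] <= write_data;
        ram_q    <= mem[read_addr];
        fwd_data <= write_data;
        fwd_hit  <= write_en && (write_addr == read_addr);
    end

    // Bypass mux: on a collision, use the forwarded word instead of the RAM.
    assign read_data = fwd_hit ? fwd_data : ram_q;
endmodule
```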

-- glen

Article: 158498
Subject: Re: Simulation vs Synthesis
From: gtwrek@sonic.net (Mark Curry)
Date: Thu, 3 Dec 2015 18:24:40 -0000 (UTC)

In article <n3oi6n$23b$1@dont-email.me>, rickman  <gnuarm@gmail.com> wrote:
>On 12/2/2015 6:27 PM, BobH wrote:
>> On 12/2/2015 10:55 AM, Mark Curry wrote:
>>> In article <n3lkei021un@news3.nntpjunkie.com>,
>>> <snip>
>>>
>>> Huh.  I missed what led up to this, but explicity coding up each case
>>> like this is entirely unneccesary in verilog.
>>>
>>> reg [ 7 : 0] cell [ 3 : 0];
>>> always @( posedge clk ) // NO ASYNC RESET - messes up optimization -
>>> no reset at all actually is prefered
>>>    if( write_en )
>>>      cell[ address ] <= write_data;
>>>
>>> always @*
>>>    read_data = cell[ address ];
>>>
>>> Done.  If reset's are needed then it won't map to block RAM.
>>> Xilinx has examples in their docs for how to successfully infer block
>>> RAM.
>>>
>>
>> Thanks! I have always explicitly built a model RAM when I wanted RAM in
>> an FPGA rather than inferring one. I just automatically include the
>> reset when I do D flops because it makes the simulation cleaner.
>>    I think that the original poster wanted an array of flops thinking
>> that they would be faster than block ram.
>>
>> This looks worth messing with when I get some breathing space. I am a
>> little curious about the synthesizabilty of it.
>
>I'm not sure the above is a correct model for block RAMs in many 
>devices.  The ones I have used have a register delay even in the read 
>path.   There can be separate interfaces (address, controls and data) 
>for reading and writing, but in all cases the read data is registered.
>
>What devices will this model work for?  Or maybe I'm not so familiar 
>with Verilog.  The read path in the above description is async, no?

Rick, 

My only real point in the above code was showing it was possible to
index into a multi-dimensional array in Verilog in synthesizable code.
One doesn't need to explicitly code out each index.
Synthesis WILL build SOMETHING for all of these variations.  It's
all synthesizable.

Now, if you're intending to map specifically to BLOCK, or Distributed
memories, then I strongly suggest checking the vendor documentation,
and using their templates.  It's easy to trip up the tools, and have
them not build what you intended.  Your example is a simple one.  If
you want to generate a BRAM, then you must register your read data
(as well as your write).  Missing this, you'll get Distributed (or FFs!).
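
For comparison with the asynchronous-read code quoted above, a sketch of
the same memory with a registered read. The names are hypothetical, and
the array is called mem because cell is a reserved word in Verilog-2001.
The ram_style attribute is the Xilinx spelling; other tools use different
attributes, and the vendor template remains the authority on what actually
infers block RAM.

```verilog
// Hypothetical names; a sketch, not a copy of any vendor template.
module sync_read_ram #(
    parameter AW = 2,
    parameter DW = 8
) (
    input  wire          clk,
    input  wire          write_en,
    input  wire [AW-1:0] address,
    input  wire [DW-1:0] write_data,
    output reg  [DW-1:0] read_data
);
    // Xilinx-style hint; a tool that does not know it should ignore it.
    (* ram_style = "block" *)
    reg [DW-1:0] mem [0:(1<<AW)-1];

    // No reset on the array; both write and read happen on the clock edge.
    always @(posedge clk) begin
        if (write_en)
            mem[address] <= write_data;
        read_data <= mem[address];   // registered read
    end
endmodule
```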

Regards,

Mark

Article: 158499
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Thu, 3 Dec 2015 13:57:41 -0500

On 12/3/2015 12:59 PM, glen herrmannsfeldt wrote:
> rickman <gnuarm@gmail.com> wrote:
> (snip)
>>>>>> always @( posedge clk ) // NO ASYNC RESET - messes up optimization -
>>>>>> no reset at all actually is prefered
>>>>>>      if( write_en )
>>>>>>        cell[ address ] <= write_data;
>
>>>>>> always @*
>>>>>>      read_data = cell[ address ];
>
> (snip)
>>> Yes that has async. read and sync. write, and that doesn't
>>> work with the usual block RAM.
>
>>> I am not sure if it wants the register before, or after, or
>>> if it doesn't matter.
>
>> I'm not sure what you mean.  Before or after what exactly?
>
> For the case of reading, so consider a ROM, do you put the
> register on the address inputs, or the data outputs?
>
> Or, since the difference is only delay, can the synthesis tools
> move it from one to the other?

Rather than try to guess what is happening, just read the vendor's
documentation and copy their examples for inferring RAM.  I know Xilinx
gives this info.  I looked at an 8 year old document from Lattice and
they say there are enough subtle differences between vendors that there
is little point to inferring block RAM, so just instantiate it (a newer
document may have different recommendations).  I don't like that and
have never had any trouble with inference.  I always put the registers
at the inputs to the RAM, as in some families there is an optional
additional register on the data output.  Otherwise I expect there is no
difference based on where you put it...
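
A sketch of that arrangement, assuming a registered address at the RAM
input plus an output register selected by a parameter. The names and the
OUT_REG parameter are hypothetical, and whether a given tool maps the
second register into the block RAM's own output register depends on the
family and the vendor's documented template.

```verilog
// Hypothetical names and OUT_REG parameter; illustration only.
module ram_input_reg #(
    parameter AW      = 8,
    parameter DW      = 8,
    parameter OUT_REG = 0    // 1 adds a pipeline register on the read data
) (
    input  wire          clk,
    input  wire          write_en,
    input  wire [AW-1:0] address,
    input  wire [DW-1:0] write_data,
    output wire [DW-1:0] read_data
);
    reg [DW-1:0] mem [0:(1<<AW)-1];
    reg [AW-1:0] address_q;   // register at the RAM input
    reg [DW-1:0] data_q;      // optional register at the RAM output

    always @(posedge clk) begin
        if (write_en)
            mem[address] <= write_data;
        address_q <= address;
        data_q    <= mem[address_q];
    end

    // One cycle of read latency without OUT_REG, two cycles with it.
    assign read_data = OUT_REG ? data_q : mem[address_q];
endmodule
```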

-- 

Rick
