Article: 158500
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Thu, 3 Dec 2015 22:07:21 -0800 (PST)
On Thursday, December 3, 2015 at 9:41:18 AM UTC-8, Mike Field wrote:
> >      INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because 
> > address size (32) is larger than maximum supported(25)" 
> 
> 
> The problem might be: how are you indexing that small block of registers? Is the address being used to index zp_reg also 8 bits?

Yep. The only part that references anything larger than 8 bits is where I select out 8 bits from a 32-bit register, using constant bit-ranges.

`define 		W			8
`define 		NW			(`W-1)

reg   		 [`NW:0]		stack[0:255];         // Stack-page 
reg			[`NW:0]		SP;				// Stack pointer

	...

	if ((action & `UPDATE_SP) == `UPDATE_SP)
		begin
			if (numSPBytes == 1)
				begin
					stack[SP]	<= newSPData[`NW:0];
					SP			<= SP - 1;
				end
			else if (numSPBytes == 2)
				begin
					stack[SP]	<= newSPData[`NW:0];
					stack[SP-1]	<= newSPData[`W*2-1:`W];
					SP			<= SP - 2;
				end
			else if (numSPBytes == 3)
				begin
					stack[SP]	<= newSPData[`NW:0];
					stack[SP-1]	<= newSPData[`W*2-1:`W];
					stack[SP-2]	<= newSPData[`W*3-1:`W*2];
					SP			<= SP - 3;
				end
		end


> Also, why 255 elements and not 256?

Well, it goes from 0 to 255 inclusive, so there are 256 entries.

Cheers
   Simon
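One detail in the stack code above can make simulation and synthesis disagree. Verilog evaluates an array index as a self-determined expression, and the literal 1 in stack[SP-1] is an unsized integer (32 bits in practice), so SP-1 is computed 32 bits wide. With SP = 0 it yields 32'hFFFFFFFF rather than 8'hFF. A simulator typically discards a write to that out-of-range address, whereas hardware that keeps only the low 8 address bits would wrap to entry 255. A 32-bit index expression may also feed the "address size (32)" in the Vivado message, though that is an inference. The sketch below keeps every stack address exactly 8 bits wide. The module and port names are hypothetical stand-ins for the UPDATE_SP test, numSPBytes and newSPData, and the initial SP value of 8'hFF is an assumption.

```verilog
// Hypothetical sketch: push 1..3 bytes with 8-bit wrapping stack addresses.
module stack_push_sketch (
    input  wire        clk,
    input  wire        update_sp,   // stands in for (action & `UPDATE_SP) == `UPDATE_SP
    input  wire [1:0]  num_bytes,   // stands in for numSPBytes (1, 2 or 3)
    input  wire [23:0] new_data,    // stands in for newSPData
    output reg  [7:0]  sp
);
    reg [7:0] stack [0:255];

    // Assigning to 8-bit wires (with sized literals) keeps each address
    // exactly 8 bits, so sp_m1 wraps from 8'h00 to 8'hFF.
    wire [7:0] sp_m1 = sp - 8'd1;
    wire [7:0] sp_m2 = sp - 8'd2;
    wire [7:0] sp_m3 = sp - 8'd3;

    initial sp = 8'hFF;             // assumed empty-stack value

    always @(posedge clk)
        if (update_sp)
            case (num_bytes)
                2'd1: begin
                    stack[sp]    <= new_data[7:0];
                    sp           <= sp_m1;
                end
                2'd2: begin
                    stack[sp]    <= new_data[7:0];
                    stack[sp_m1] <= new_data[15:8];
                    sp           <= sp_m2;
                end
                2'd3: begin
                    stack[sp]    <= new_data[7:0];
                    stack[sp_m1] <= new_data[15:8];
                    stack[sp_m2] <= new_data[23:16];
                    sp           <= sp_m3;
                end
                default: ;                   // 0 bytes: nothing to push
            endcase
endmodule
```

Up to three writes per clock is more write ports than a block RAM offers, so this memory would likely be built from flip-flops whatever the index width.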

Article: 158501
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Fri, 4 Dec 2015 01:35:06 -0500
On 12/4/2015 1:07 AM, Simon wrote:
> On Thursday, December 3, 2015 at 9:41:18 AM UTC-8, Mike Field wrote:
>>>       INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because
>>> address size (32) is larger than maximum supported(25)"
> [snip]
>
>> Also, why 255 elements and not 256?
>
> Well, it goes from 0 to 255 inclusive, so there are 256 entries.

Where is zp_reg[255] declared?  That is the ROM being complained about. 
I assume it is considering it a ROM because it is not written to
anywhere.

-- 

Rick

Article: 158502
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Thu, 3 Dec 2015 22:58:49 -0800 (PST)
On Thursday, December 3, 2015 at 10:35:10 PM UTC-8, rickman wrote:
> On 12/4/2015 1:07 AM, Simon wrote:
> > On Thursday, December 3, 2015 at 9:41:18 AM UTC-8, Mike Field wrote:
> >>>       INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because
> >>> address size (32) is larger than maximum supported(25)"
> > [snip]
> >
> >> Also, why 255 elements and not 256?
> >
> > Well, it goes from 0 to 255 inclusive, so there are 256 entries.
> 
> Where is zp_reg[255] declared?  That is the ROM being complained about. 
>   I assume it is considering it a ROM because it is not written to 
> anywhere.

They're both declared in the same module - you can see them at http://0x0000ff.com/6502/ (specifically http://0x0000ff.com/6502/6502.v towards the top) and they're both triggering the message...

INFO: [Synth 8-5545] ROM "stack_reg[0]" won't be mapped to RAM because address size (32) is larger than maximum supported(25)

Both 'zp' and 'stack' are in fact written to - you can see the logic in `EXECUTE in 6502.v above under `STORE_MEM. I'm not sure why it thinks it's a ROM. Unless that logic is being optimised away for the time being, of course. I haven't checked that yet.

Cheers
   Simon



Article: 158503
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Fri, 4 Dec 2015 02:18:49 -0500
On 12/4/2015 1:58 AM, Simon wrote:
> On Thursday, December 3, 2015 at 10:35:10 PM UTC-8, rickman wrote:
>> On 12/4/2015 1:07 AM, Simon wrote:
>>> On Thursday, December 3, 2015 at 9:41:18 AM UTC-8, Mike Field wrote:
>>>>>        INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because
>>>>> address size (32) is larger than maximum supported(25)"
>>> [snip]
>>>
>>>> Also, why 255 elements and not 256?
>>>
>>> Well, it goes from 0 to 255 inclusive, so there are 256 entries.
>>
>> Where is zp_reg[255] declared?  That is the ROM being complained about.
>>    I assume it is considering it a ROM because it is not written to
>> anywhere.
>
> They're both declared in the same module - you can see them at http://0x0000ff.com/6502/ (specifically http://0x0000ff.com/6502/6502.v towards the top) and they're both triggering the message...
>
> INFO: [Synth 8-5545] ROM "stack_reg[0]" won't be mapped to RAM because address size (32) is larger than maximum supported(25)
>
> Both 'zp' and 'stack' are in fact written to - you can see the logic in `EXECUTE in 6502.v above under `STORE_MEM. I'm not sure why it thinks it's a ROM. Unless that logic is being optimised away for the time being, of course. I haven't checked that yet.

I am not familiar with Verilog, but I know it does various things that 
are not obvious by looking at the code.  That is the main difference 
with VHDL.  The statements...

stack[i]    = 0;
zp[i]       = i+8'h40;

both index the memories by 'i', which is an integer.  Does Verilog do 
something like assume the memory has to be of a range 0:2^32-1 because 
the index is 32 bits?

It has been a while since I did much with VHDL, but I'm pretty sure the 
declarations define the size of the memory and using an integer index 
will only get you in trouble in simulation if the index value is out of 
range.  This would not be a problem in synthesis as long as it doesn't 
use block RAM.  This code initializes the memory and you can't 
initialize block RAM on reset, only on configuration.

-- 

Rick
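On Rick's question: in Verilog the depth of a memory is fixed by its declaration, so reg [7:0] stack [0:255] has 256 entries whatever type is used to index it. What an integer loop variable does change is the width of the index expression, which becomes 32 bits; that is plausibly the "address size (32)" in the Vivado message, but this is an inference, not something the message states. A minimal sketch of a reset loop whose index is exactly 8 bits wide follows (module and signal names are hypothetical). As Rick says, it still writes all 256 entries in one clock, so it describes flip-flops rather than block RAM.

```verilog
// Hypothetical sketch: clearing a 256-entry stack on reset with an 8-bit index.
module stack_clear_sketch (
    input wire clk,
    input wire rst
);
    reg [7:0] stack [0:255];   // the depth (256) comes from this declaration alone
    reg [8:0] j;               // 9 bits: an 8-bit counter wraps from 255 to 0,
                               // so "j < 256" would never become false

    always @(posedge clk)
        if (rst)
            for (j = 0; j < 256; j = j + 1)
                stack[j[7:0]] <= 8'd0;   // the index expression is exactly 8 bits wide
endmodule
```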

Article: 158504
Subject: Re: Simulation vs Synthesis
From: Alan Reynolds <abreynolds@me.com>
Date: Fri, 4 Dec 2015 09:44:02 -0500
On 2015-12-04 07:18:49 +0000, rickman said:

> On 12/4/2015 1:58 AM, Simon wrote:
>> On Thursday, December 3, 2015 at 10:35:10 PM UTC-8, rickman wrote:
>>> On 12/4/2015 1:07 AM, Simon wrote:
>>>> On Thursday, December 3, 2015 at 9:41:18 AM UTC-8, Mike Field wrote:
>>>>>> INFO: [Synth 8-5545] ROM "zp_reg[255]" won't be mapped to RAM because
>>>>>> address size (32) is larger than maximum supported(25)"
>>>> [snip]
>>>>
>>>>> Also, why 255 elements and not 256?
>>>> 
>>>> Well, it goes from 0 to 255 inclusive, so there are 256 entries.
>>> 
>>> Where is zp_reg[255] declared?  That is the ROM being complained about.
>>> I assume it is considering it a ROM because it is not written to
>>> anywhere.
>> 
>> They're both declared in the same module - you can see them at 
>> http://0x0000ff.com/6502/ (specifically http://0x0000ff.com/6502/6502.v 
>> towards the top) and they're both triggering the message...
>> 
>> INFO: [Synth 8-5545] ROM "stack_reg[0]" won't be mapped to RAM because 
>> address size (32) is larger than maximum supported(25)
>> 
>> Both 'zp' and 'stack' are in fact written to - you can see the logic in 
>> `EXECUTE in 6502.v above under `STORE_MEM. I'm not sure why it thinks 
>> it's a ROM. Unless that logic is being optimised away for the time 
>> being, of course. I haven't checked that yet.
> 
> I am not familiar with Verilog, but I know it does various things that 
> are not obvious by looking at the code.  That is the main difference 
> with VHDL.  The statements...
> 
> stack[i]    = 0;
> zp[i]       = i+8'h40;
> 
> both index the memories by 'i', which is an integer.  Does Verilog do 
> something like assume the memory has to be of a range 0:2^32-1 because 
> the index is 32 bits?
> 
> It has been a while since I did much with VHDL, but I'm pretty sure the 
> declarations define the size of the memory and using an integer index 
> will only get you in trouble in simulation if the index value is out of 
> range.  This would not be a problem in synthesis as long as it doesn't 
> use block RAM.  This code initializes the memory and you can't 
> initialize block RAM on reset, only on configuration.

Rick is correct. This code:
					for (i=0; i<256; i=i+1)
						begin
						   zp[i]       = i+8'h40;
						   stack[i]    = 0;
						end
accesses zp with an integer address, which is 32 bits. It is also 
supposed to be a ROM, which means that this code snippet (without the 
stack[i]) should be in an "initial" statement, not an "always" 
statement, for correct inferencing by humans and synthesis tools.
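Following Alan's suggestion, and Simon's clarification below that zp has to be read/write rather than a ROM, one way to arrange it is sketched here: the starting values are set once in an initial block, which FPGA synthesis tools that support HDL memory initialization turn into configuration-time contents, and the clocked always block performs a single write and a registered read through an 8-bit address. The module and port names are hypothetical; the i+8'h40 fill pattern is copied from the thread. The integer loop variable should be harmless here, because the initial block only computes starting values and is not built as address logic.

```verilog
// Hypothetical sketch: read/write zero page with configuration-time contents.
module zp_ram_sketch (
    input  wire       clk,
    input  wire       we,
    input  wire [7:0] addr,        // exactly 8 bits: 256 locations
    input  wire [7:0] wdata,
    output reg  [7:0] rdata
);
    reg [7:0] zp [0:255];
    integer i;

    // Evaluated once for starting contents; i + 8'h40 is truncated to 8 bits.
    initial
        for (i = 0; i < 256; i = i + 1)
            zp[i] = i + 8'h40;

    // One synchronous write port and a registered read, a shape that
    // block-RAM inference generally expects.
    always @(posedge clk) begin
        if (we)
            zp[addr] <= wdata;
        rdata <= zp[addr];         // returns the old contents on a write cycle
    end
endmodule
```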



Article: 158505
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Fri, 4 Dec 2015 07:43:17 -0800 (PST)
On Friday, December 4, 2015 at 6:44:14 AM UTC-8, Alan Reynolds wrote:
> Rick is correct. This code:
>
> 					for (i=0; i<256; i=i+1)
> 						begin
> 						   zp[i]       = i+8'h40;
> 						   stack[i]    = 0;
> 						end
>
> accesses zp with an integer address, which is 32 bits. It is also supposed to be a ROM, which means that this code snippet (without the stack[i]) should be in an "initial" statement, not an "always" statement, for correct inferencing by humans and synthesis tools.

Thanks to both of you for this - I was completely missing that. I should be used to how many times you can stare at something and your eyes just flick to the next statement without registering the problem, but it still amazes me...

It's not *actually* supposed to be a ROM; it's supposed to be R/W. But your point about always/initial is well taken; that's another error...

Cheers
   Simon.

Article: 158506
Subject: Re: Simulation vs Synthesis
From: BobH <wanderingmetalhead.nospam.please@yahoo.com>
Date: Fri, 4 Dec 2015 16:09:53 -0700
On 12/3/2015 11:58 PM, Simon wrote:

>>> Well, it goes from 0 to 255 inclusive, so there are 256 entries.
>>
>> Where is zp_reg[255] declared?  That is the ROM being complained about.
>>    I assume it is considering it a ROM because it is not written to
>> anywhere.
>
> They're both declared in the same module - you can see them at http://0x0000ff.com/6502/
> (specifically http://0x0000ff.com/6502/6502.v towards the top) and they're both triggering
> the message...
>
> INFO: [Synth 8-5545] ROM "stack_reg[0]" won't be mapped to RAM because address
> size (32) is larger than maximum supported(25)
>
> Both 'zp' and 'stack' are in fact written to - you can see the logic in `EXECUTE
> in 6502.v above under `STORE_MEM. I'm not sure why it thinks it's a ROM. Unless
> that logic is being optimised away for the time being, of course. I
> haven't checked that yet.

This may be because I am used to older syntaxes, but don't you need a 
start and a terminal number for array specification?
reg [7:0] zp_reg [0:255];

BobH


Article: 158507
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Sat, 5 Dec 2015 10:24:38 -0800 (PST)
On Friday, December 4, 2015 at 3:10:25 PM UTC-8, BobH wrote:
> On 12/3/2015 11:58 PM, Simon wrote:
> 
> >>> Well, it goes from 0 to 255 inclusive, so there are 256 entries.
> >>
> >> Where is zp_reg[255] declared?  That is the ROM being complained about.
> >>    I assume it is considering it a ROM because it is not written to
> >> anywhere.
> >
> > They're both declared in the same module - you can see them at http://0x0000ff.com/6502/
> > (specifically http://0x0000ff.com/6502/6502.v towards the top) and they're both triggering
> > the message...
> >
> > INFO: [Synth 8-5545] ROM "stack_reg[0]" won't be mapped to RAM because address
> > size (32) is larger than maximum supported(25)
> >
> > Both 'zp' and 'stack' are in fact written to - you can see the logic in `EXECUTE
> > in 6502.v above under `STORE_MEM. I'm not sure why it thinks it's a ROM. Unless
> > that logic is being optimised away for the time being, of course. I
> > haven't checked that yet.
> 
> This may be because I am used to older syntaxes, but don't you need a 
> start and a terminal number for array specification?
> reg [7:0] zp_reg [0:255];
> 
> BobH

Hmm, yes - the range is 256-long though, just like [0:1] gives you two values (0 and 1), [0:255] gives you 256 values (0,1,...,254,255). The declaration in 6502.v looks like:

   reg    [`NW:0]       		zp[0:255];

Or at least, that's my understanding. Are you saying that [0:255] only gives you 0,1,...,253,254 or something similar?

Cheers
   Simon.
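The inclusive-range point can be checked in simulation. This small sketch (the module name range_check is hypothetical) writes every address of a [0:255] array, counts the writes, and reads back the last entry; an out-of-range read through a 9-bit address returns x in a four-state Verilog simulator.

```verilog
// Hypothetical check that [0:255] declares 256 entries, 0 through 255.
module range_check;
    reg [7:0] zp [0:255];   // inclusive bounds: 256 entries
    reg [8:0] addr;         // 9 bits so the loop can reach 256 and terminate
    integer   n;            // number of entries written

    initial begin
        n = 0;
        for (addr = 0; addr < 256; addr = addr + 1) begin
            zp[addr] = addr[7:0];
            n = n + 1;
        end
        $display("entries written: %0d", n);   // prints 256
        $display("zp[255] = %h", zp[255]);      // prints ff: 255 is a valid address
        addr = 9'd256;
        $display("zp[256] = %h", zp[addr]);     // prints xx: out-of-range reads return x
    end
endmodule
```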

Article: 158508
Subject: Re: Simulation vs Synthesis
From: BobH <wanderingmetalhead.nospam.please@yahoo.com>
Date: Sat, 5 Dec 2015 13:13:02 -0700
On 12/5/2015 11:24 AM, Simon wrote:
> On Friday, December 4, 2015 at 3:10:25 PM UTC-8, BobH wrote:
>> On 12/3/2015 11:58 PM, Simon wrote:
>>
>>>>> Well, it goes from 0 to 255 inclusive, so there are 256 entries.
>>>>
>>>> Where is zp_reg[255] declared?  That is the ROM being complained about.
>>>>     I assume it is considering it a ROM because it is not written to
>>>> anywhere.
>>>
>>> They're both declared in the same module - you can see them at http://0x0000ff.com/6502/
>>> (specifically http://0x0000ff.com/6502/6502.v towards the top) and they're both triggering
>>> the message...
>>>
>>> INFO: [Synth 8-5545] ROM "stack_reg[0]" won't be mapped to RAM because address
>>> size (32) is larger than maximum supported(25)
>>>
>>> Both 'zp' and 'stack' are in fact written to - you can see the logic in `EXECUTE
>>> in 6502.v above under `STORE_MEM. I'm not sure why it thinks it's a ROM. Unless
>>> that logic is being optimised away for the time being, of course. I
>>> haven't checked that yet.
>>
>> This may be because I am used to older syntaxes, but don't you need a
>> start and a terminal number for array specification?
>> reg [7:0] zp_reg [0:255];
>>
>> BobH
>
> Hmm, yes - the range is 256-long though, just like [0:1] gives you two values (0 and 1),
> [0:255] gives you 256 values (0,1,...,254,255). The declaration in 6502.v looks like:
>
>     reg    [`NW:0]       		zp[0:255];
>
> Or at least, that's my understanding. Are you saying that [0:255] only gives
> you 0,1,...,253,254 or something similar?
>
> Cheers
>     Simon.
>
Hi Simon,
In several places in this discussion, I saw references to:
reg [`NW:0] zp [255];
I had not gone out to look at your web site with the code, as I usually 
don't follow links from people that I don't know on usenet. I went to 
your site and saw that it was actually:
reg [`NW:0] zp [0:255];
which seems correct. What I thought previously was that the declaration
was not correct, which might have been causing the synthesis issues.

Regards,
BobH


Article: 158509
Subject: Re: Lattice diamond / MachXO2
From: athar.kaludi@gmail.com
Date: Sat, 5 Dec 2015 12:56:15 -0800 (PST)
Links: << >>  << T >>  << A >>
Hi all,
I am athar kaludi (Skype: lailajamil).
I need to download the Lattice Semiconductor software for FPGA training and for the iCEstick, but the download links on the page are all broken. I emailed Lattice but have had no reply yet. Could someone please help me find out how to download the software for Lattice products? My Skype is lailajamil.
Regards,
athar.kaludi@gmail.com

Article: 158510
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Sat, 5 Dec 2015 15:40:10 -0800 (PST)
Links: << >>  << T >>  << A >>
On Saturday, December 5, 2015 at 12:13:56 PM UTC-8, BobH wrote:
> On 12/5/2015 11:24 AM, Simon wrote:
> > [earlier quoted text snipped]
> Hi Simon,
> In several places in this discussion, I saw references to:
> reg [`NW:0] zp [255];
> I had not gone out to look at your web site with the code, as I usually 
> don't follow links from people that I don't know on usenet. I went to 
> your site and saw that it was actually:
> reg [`NW:0] zp [0:255];
> which seems correct. What I thought previously, was that the declaration 
> was not correct, which might have been causing synthesis issues.
> 
> Regards,
> BobH

Ah - sorry :)

Cheers
   Simon.

Article: 158511
Subject: Error in converting code to VHDL
From: Jamil Hayder <engr.jamilhayder@gmail.com>
Date: Wed, 9 Dec 2015 03:32:35 -0800 (PST)
Links: << >>  << T >>  << A >>
Hi, can someone please help out? I am trying to convert the code below
to VHDL using HDL Coder, but I am getting an error. Could you please have a look at it and see what
the mistake in the block diagram is? Thanks.

x1=[1 2 3 4 5 6 7 8 9];
x2=[3 4 5 6 7 8 9 2 1];
n=length(x1);
xc=zeros(2*n-1,1);
for i=1:2*n-1
  if(i>n)
      j1=1;
      k1=2*n-i;
      j2=i-n+1;
      k2=n;
  else
      j1=n-i+1;
      k1=n;
      j2=1;
      k2=i;
  end
  xc(i)=sum(conj(x1(j1:k1)).*x2(j2:k2));
end
xc=flipud(xc);


Error:

Cannot connect to model 'prc4'; please try Update Diagram (Ctrl-D).

Error due to multiple causes.

Errors occurred during parsing of MATLAB function 'MATLAB Function'(#24)

Error in port widths or dimensions. Output port 1 of 'prc4/MATLAB Function/u' is a one dimensional vector with 1 elements.

Article: 158512
Subject: modulo 2**32-1 arith
From: Ilya Kalistru <stebanoid@gmail.com>
Date: Tue, 15 Dec 2015 02:32:08 -0800 (PST)
Links: << >>  << T >>  << A >>
Hello.
I need to add two unsigned numbers modulo 2**32-1.
Currently it's done in a very inefficient way: in the first clock cycle there is a simple addition of two 32-bit unsigned numbers with a 33-bit result, and in the second cycle, if the result is >= 2**32-1, we add 1 and keep only the low 32 bits.
Does anybody know a better way to do that?
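A bit-accurate Python model of the two-step scheme described above can help when comparing alternatives. It is a sketch, not RTL. The width `n` is a parameter added for illustration (32 in the question) so that the behaviour can also be checked exhaustively at a small width. The sketch assumes both operands are already reduced, i.e. in the range 0 to 2**n-2.

```python
def add_mod_two_step(a: int, b: int, n: int = 32) -> int:
    """Model of the two-cycle scheme: add, then correct if the sum >= 2**n - 1.

    Assumes 0 <= a, b <= 2**n - 2 (operands already reduced modulo 2**n - 1).
    """
    m = (1 << n) - 1          # modulus 2**n - 1, also the n-bit all-ones mask
    s = a + b                 # cycle 1: plain (n+1)-bit sum
    if s >= m:                # cycle 2: compare against 2**n - 1 ...
        return (s + 1) & m    # ... add 1 and keep only the low n bits
    return s


def check_exhaustive(n: int) -> bool:
    """Compare the model with Python's % operator for every reduced operand pair."""
    m = (1 << n) - 1
    return all(add_mod_two_step(a, b, n) == (a + b) % m
               for a in range(m) for b in range(m))


if __name__ == "__main__":
    print(check_exhaustive(4))                              # -> True
    print(hex(add_mod_two_step(0xFFFFFFFE, 0x00000001)))    # sum is exactly 2**32 - 1 -> 0x0
```

The small-width exhaustive check is a cheap way to gain confidence before trusting a 32-bit random test.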

Article: 158513
Subject: Re: modulo 2**32-1 arith
From: GaborSzakacs <gabor@alacron.com>
Date: Tue, 15 Dec 2015 09:05:00 -0500
Links: << >>  << T >>  << A >>
Ilya Kalistru wrote:
> Hello.
> I need to add two unsigned numbers modulo 2**32-1.
> Now it's done in very inefficient way: at first clock cycle there is simple addition of two 32-bit unsigned numbers with 33-bit result and on second cycle if the result >= 2**32-1, we add 1 and take only 32 bits of that.
> Does anybody know a better way to do that?

Sounds like you want an "end around carry."  That means that the
carry out of a 32-bit full adder wraps back to its carry in.  In
that way a 32-bit full adder would do what you want in one cycle.

-- 
Gabor

Article: 158514
Subject: Re: modulo 2**32-1 arith
From: GaborSzakacs <gabor@alacron.com>
Date: Tue, 15 Dec 2015 10:52:09 -0500
Links: << >>  << T >>  << A >>
GaborSzakacs wrote:
> Ilya Kalistru wrote:
>> Hello.
>> I need to add two unsigned numbers modulo 2**32-1.
>> Now it's done in very inefficient way: at first clock cycle there is 
>> simple addition of two 32-bit unsigned numbers with 33-bit result and 
>> on second cycle if the result >= 2**32-1, we add 1 and take only 32 
>> bits of that.
>> Does anybody know a better way to do that?
> 
> Sounds like you want an "end around carry."  That means that the
> carry out of a 32-bit full adder wraps back to its carry in.  In
> that way a 32-bit full adder would do what you want in one cycle.
> 

Hmmm...  On second thought, that would not handle the case where
the inputs add up to exactly 2**32-1, since there would be no carry
out, but you would want to add one to end up with zero.
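The end-around-carry idea and the corner case just described can be checked with a small Python model. This is a behavioural sketch only. The width parameter `n` is an illustrative addition, and nothing here models the combinational loop a real carry-out-to-carry-in connection would create. The carry out of the n-bit add is added back in at the LSB. When the true sum is exactly 2**n-1, no carry is produced, so the model returns the all-ones pattern (one's-complement "negative zero"). That value is congruent to 0 modulo 2**n-1 but is not equal to 0.

```python
def add_end_around(a: int, b: int, n: int = 32) -> int:
    """n-bit add whose carry out is fed back into the carry in (one's-complement style)."""
    mask = (1 << n) - 1
    s = a + b
    carry = s >> n                 # carry out of the n-bit adder: 0 or 1
    return (s & mask) + carry      # for n-bit inputs this cannot carry out a second time


if __name__ == "__main__":
    m = (1 << 32) - 1
    print(hex(add_end_around(0x80000000, 0x7FFFFFFF)))   # sum == 2**32 - 1 -> 0xffffffff, not 0
    print(add_end_around(0x80000000, 0x7FFFFFFF) % m)    # ... which is still 0 modulo 2**32 - 1
```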

-- 
Gabor

Article: 158515
Subject: Re: modulo 2**32-1 arith
From: rickman <gnuarm@gmail.com>
Date: Tue, 15 Dec 2015 11:01:25 -0500
Links: << >>  << T >>  << A >>
On 12/15/2015 5:32 AM, Ilya Kalistru wrote:
> Hello. I need to add two unsigned numbers modulo 2**32-1. Now it's
> done in very inefficient way: at first clock cycle there is simple
> addition of two 32-bit unsigned numbers with 33-bit result and on
> second cycle if the result >= 2**32-1, we add 1 and take only 32 bits
> of that. Does anybody know a better way to do that?

Not sure how efficient your implementation would be this way.  You can
achieve the same result by adding one to the sum and adding the high bit
of this result to the original sum to get your answer.  This should give
a simpler result, because the comparison your approach uses requires a
full adder, whereas this approach needs only an incrementer.


signal sum, temp, mod_result : unsigned (32 downto 0);
signal answer : unsigned (31 downto 0);

sum <= RESIZE(a, 33) + RESIZE(b, 33);

temp <= sum + 1;

mod_result <= sum + temp(32);

answer <= mod_result(31 downto 0);


Doing the addition with an end around carry will not cover the case of 
the sum being 2^n - 1.

No guarantees of the code above.  I'm a bit rusty these days.
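As a sanity check of the increment-and-carry idea above, here is a Python transcription of the four VHDL assignments. It is a sketch. The function name and the width parameter `n` are illustrative additions so the logic can be tested exhaustively at a small width. Operands are assumed to lie in 0 to 2**n-2.

```python
def add_mod_increment(a: int, b: int, n: int = 32) -> int:
    """Transcription of: sum = a + b; temp = sum + 1; answer = (sum + temp(n))(n-1 downto 0)."""
    mask = (1 << n) - 1
    total = a + b                  # 'sum': (n+1)-bit result
    temp = total + 1               # 'temp': sum + 1, still fits in n+1 bits
    top = (temp >> n) & 1          # temp(n) is 1 exactly when sum >= 2**n - 1
    return (total + top) & mask    # 'mod_result', keeping only the low n bits


if __name__ == "__main__":
    m = 15
    print(all(add_mod_increment(a, b, 4) == (a + b) % m
              for a in range(m) for b in range(m)))     # -> True
```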

-- 

Rick

Article: 158516
Subject: Re: modulo 2**32-1 arith
From: KJ <kkjennings@sbcglobal.net>
Date: Tue, 15 Dec 2015 12:38:49 -0800 (PST)
Links: << >>  << T >>  << A >>
On Tuesday, December 15, 2015 at 5:32:14 AM UTC-5, Ilya Kalistru wrote:
> Hello.
> I need to add two unsigned numbers modulo 2**32-1.
> Now it's done in very inefficient way: at first clock cycle there is simple addition of two 32-bit unsigned numbers with 33-bit result and on second cycle if the result >= 2**32-1, we add 1 and take only 32 bits of that.
> Does anybody know a better way to do that?

Maybe you should revisit the need for 'modulo 2**32-1' instead of 'modulo 2**32'.  Synthesis results of your method and Rickman's method indicate that the modulo portion is consuming significantly more logic than the addition itself.  Given the code posted below, here are the synthesis results using Quartus to target a Cyclone IV GX:

Method#  Logic Elements  Notes
=======  ==============  =====
0        32              Sum <= A+B, no 'modulo 2**32-1' as a baseline
1        32              Sum <= A+B+1, as another reference point
2        76              Ilya's method
3        98              Rickman's method

Kevin Jennings

===== START OF CODE =====
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity Custom_Adder is generic(METHOD: integer range 0 to 3);
port(
    A:  in  unsigned(31 downto 0);
    B:  in  unsigned(31 downto 0);
    C:  out unsigned(31 downto 0)
);
end Custom_Adder;

architecture RTL of Custom_Adder is
    signal Sum:     unsigned(32 downto 0);
    signal Sum_P1:  unsigned(32 downto 0);
    constant MAX:   unsigned(32 downto 0) := '0' & x"FFFF_FFFF";
begin
    Sum     <= resize(A, Sum'length) + resize(B, Sum'length);
    Sum_P1  <= Sum + 1;

    -- KJ note:  Performing all calculations on one clock cycle in order to determine logic cells.

    GEN_METHOD0: if (METHOD = 0) generate
        C   <= Sum(C'range);
    end generate GEN_METHOD0;

    GEN_METHOD1: if (METHOD = 1) generate
        C   <= Sum_P1(C'range);
    end generate GEN_METHOD1;

    GEN_METHOD2: if (METHOD = 2) generate
        -- Ilya's method:
        -- at first clock cycle there is simple addition of two 32-bit unsigned numbers with 33-bit result
        -- and on second cycle if the result >= 2**32-1, we add 1 and take only 32 bits of that
        C   <= Sum_P1(C'range) when (Sum >= MAX) else Sum(C'range);
    end generate GEN_METHOD2;

    GEN_METHOD3: if (METHOD = 3) generate
        signal Mod_Res: unsigned(32 downto 0);
    begin
        -- sum <= RESIZE(a, 33) + RESIZE(b, 33);
        -- temp <= sum + 1;
        -- mod_result <= sum + temp(32);
        -- answer <= mod_result(31 downto 0);
        -- The computation of 'mod_result' above
        Mod_Res <= Sum + unsigned'(Sum_P1(32 downto 32));
        C       <= Sum_P1(C'range) when (Sum_P1(32) = '1') else Sum(C'range);
    end generate GEN_METHOD3;
end RTL;
===== END OF CODE =====

Article: 158517
Subject: Re: modulo 2**32-1 arith
From: KJ <kkjennings@sbcglobal.net>
Date: Tue, 15 Dec 2015 12:44:50 -0800 (PST)
Links: << >>  << T >>  << A >>
On Tuesday, December 15, 2015 at 3:38:57 PM UTC-5, KJ wrote:
OOPS, correction to the Method3 code:  
Original:   C       <= Sum_P1(C'range) when (Sum_P1(32) = '1') else Sum(C'range);
Corrected:  C       <= Mod_Res(C'range);

Number of logic elements used is 97 rather than 98.

Kevin

Article: 158518
Subject: Re: modulo 2**32-1 arith
From: Rob Gaddi <rgaddi@highlandtechnology.invalid>
Date: Tue, 15 Dec 2015 21:31:15 -0000 (UTC)
Links: << >>  << T >>  << A >>
KJ wrote:

> On Tuesday, December 15, 2015 at 5:32:14 AM UTC-5, Ilya Kalistru wrote:
>> Hello.
>> I need to add two unsigned numbers modulo 2**32-1.
>> Now it's done in very inefficient way: at first clock cycle there is simple addition of two 32-bit unsigned numbers with 33-bit result and on second cycle if the result >= 2**32-1, we add 1 and take only 32 bits of that.
>> Does anybody know a better way to do that?
>
> Maybe you should revisit the need for 'modulo 2**32-1' instead of 'modulo 2**32'.  Synthesis results of your method and Rickman's method indicate that the modulo portion is consuming significantly more logic than the addition itself.  Given the code posted below, here are the synthesis results using Quartus to target a Cyclone IV GX:
>
> Method#  Logic Elements  Notes
> =======  ==============  =====
> 0        32              Sum <= A+B, no 'modulo 2**32-1' as a baseline
> 1        32              Sum <= A+B+1, as another reference point
> 2        76              Ilya's method
> 3        98              Rickman's method
>
> Kevin Jennings
>
> [quoted code snipped]

I'm guessing the requirement for modulo 2**32-1 is driven by the
algorithm, possibly some checksummy sort of thing.  I know Fletcher uses
2**N-1 modulo pretty heavily.

If your concern is speed rather than size, you could probably do it by
running two parallel adders, Y0 = A+B and Y1 = A+B+1.  If the Y1
add carries out then Y = Y1 else Y = Y0.

Alternatively, the way you do it in an optimized Fletcher is in blocks. 
Add a whole bunch of samples together with sufficient extra bits on the
high end to count the overflows, then periodically stop taking in new
data and add the overflows back into the LSBs.  You could easily get
that "periodically" down to once out of every 1024 samples.
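The block-accumulation idea can be modelled in Python as follows. This is a sketch under assumptions that are not in the post. Samples are taken to be 32-bit. The accumulator gets just enough extra bits for `block` additions on top of a folded value. A "fold" adds the bits above bit 31 back into the LSBs and repeats until the value fits, which is valid because 2**32 is congruent to 1 modulo 2**32-1. The names `fold` and `block_sum_mod` are hypothetical.

```python
import random

MASK32 = (1 << 32) - 1   # also the modulus 2**32 - 1


def fold(acc: int) -> int:
    """Add the overflow bits above bit 31 back into the LSBs until the value fits in 32 bits."""
    while acc > MASK32:
        acc = (acc & MASK32) + (acc >> 32)   # 2**32 is congruent to 1 mod 2**32 - 1
    return acc


def block_sum_mod(samples, block: int = 1024) -> int:
    """Sum 32-bit samples modulo 2**32 - 1, folding overflows only once per `block` samples."""
    acc_bits = 32 + (block + 1).bit_length()  # headroom for `block` samples on top of a folded value
    acc = 0
    for i, x in enumerate(samples, start=1):
        acc += x
        assert acc < (1 << acc_bits), "accumulator would overflow its assumed width"
        if i % block == 0:
            acc = fold(acc)                   # the periodic "add the overflows back" step
    acc = fold(acc)
    return 0 if acc == MASK32 else acc        # all-ones is the second representation of zero


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.getrandbits(32) for _ in range(5000)]
    print(block_sum_mod(data) == sum(data) % MASK32)   # -> True
```

With `block=1024`, the assumed accumulator is 43 bits wide. The fold then runs once per 1024 samples instead of on every add.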

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com

Email address domain is currently out of order.  See above to fix.

Article: 158519
Subject: Re: modulo 2**32-1 arith
From: rickman <gnuarm@gmail.com>
Date: Tue, 15 Dec 2015 19:16:30 -0500
Links: << >>  << T >>  << A >>
On 12/15/2015 4:31 PM, Rob Gaddi wrote:
> KJ wrote:
>
>> On Tuesday, December 15, 2015 at 5:32:14 AM UTC-5, Ilya Kalistru wrote:
>>> Hello.
>>> I need to add two unsigned numbers modulo 2**32-1.
>>> Now it's done in very inefficient way: at first clock cycle there is simple addition of two 32-bit unsigned numbers with 33-bit result and on second cycle if the result >= 2**32-1, we add 1 and take only 32 bits of that.
>>> Does anybody know a better way to do that?
>>
>> Maybe you should revisit the need for 'modulo 2**32-1' instead of 'modulo 2**32'.  Synthesis results of your method and Rickman's method indicate that the modulo portion is consuming significantly more logic than the addition itself.  Given the code posted below, here are the synthesis results using Quartus to target a Cyclone IV GX:
>>
>> Method#  Logic Elements  Notes
>> =======  ==============  =====
>> 0        32              Sum <= A+B, no 'modulo 2**32-1' as a baseline
>> 1        32              Sum <= A+B+1, as another reference point
>> 2        76              Ilya's method
>> 3        98              Rickman's method
>>
>> Kevin Jennings

I'm not clear on how "Ilya's method" uses only 76 LEs. I assume an LE is 
a 4 input LUT and/or a register. I count at least 131. Producing Sum 
uses 33, producing Sum_P1 uses another 33, evaluating (Sum >= MAX) uses 
33 then there are 32 used in the mux to select the result. How can that 
reduce to 76 LEs?


>> [quoted code snipped]
>
> I'm guessing the requirement for modulo 2**32-1 is driven by the
> algorithm, possibly some checksummy sort of thing.  I know Fletcher uses
> 2**N-1 modulo pretty heavily.
>
> If your concern is speed rather than size, you could probably do it by
> running two parallel adders, Y0 = A+B and Y1 = A+B+1.  If the Y1
> add carries out then Y = Y1 else Y = Y0.

That would reduce to the same complexity as my method as implemented 
above.  My approach can be optimized by producing temp (Y1) from the two 
inputs.  Then in parallel adding bit 32 into the sum of the two inputs 
that produces Y.  Take Y modulo 2^n as the final step using 66 LEs. Like 
this...

signal Y: unsigned(31 downto 0);
signal Y, Y1, A, B: unsigned(32 downto 0);
Y1 <= A + B + 1;
Y <= resize(A + B + resize(Y1(32 downto 32),33), 32);

If the tool is of any use it will add the upper bit of Y1 as a carry in 
to the sum of A + B utilizing a total of 65 LEs.
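The width of Y1 is easy to get wrong in this kind of expression, so here is a Python sketch of this second formulation with that width made explicit. The `y1_bits` parameter and the function name are illustration-only additions. When Y1 is kept 33 bits wide, its top bit is the carry that selects the +1. When Y1 is only as wide as 32-bit operands, that bit is always 0 and the result degrades to an ordinary sum modulo 2**32.

```python
MASK32 = (1 << 32) - 1


def add_mod_carry_select(a: int, b: int, y1_bits: int = 33) -> int:
    """Y1 = A + B + 1 truncated to y1_bits; Y = (A + B + Y1(32)) mod 2**32."""
    y1 = (a + b + 1) & ((1 << y1_bits) - 1)   # Y1, as wide as the signal it is assigned to
    carry = (y1 >> 32) & 1                     # Y1(32); always 0 when y1_bits == 32
    return (a + b + carry) & MASK32            # Y: keep the low 32 bits


if __name__ == "__main__":
    a, b = 0xFFFFFFFE, 0xFFFFFFFE
    print(hex(add_mod_carry_select(a, b)))              # 33-bit Y1 -> 0xfffffffd
    print(hex(add_mod_carry_select(a, b, y1_bits=32)))  # 32-bit Y1 -> 0xfffffffc
    print(hex((a + b) % MASK32))                        # reference -> 0xfffffffd
```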

-- 

Rick

Article: 158520
Subject: Re: modulo 2**32-1 arith
From: KJ <kkjennings@sbcglobal.net>
Date: Wed, 16 Dec 2015 06:01:44 -0800 (PST)
Links: << >>  << T >>  << A >>
On Tuesday, December 15, 2015 at 7:16:42 PM UTC-5, rickman wrote:
>
> I'm not clear on how "Ilya's method" uses only 76 LEs. I assume an LE is
> a 4 input LUT and/or a register.

The target device family was stated.

> I count at least 131. Producing Sum
> uses 33, producing Sum_P1 uses another 33, evaluating (Sum >= MAX) uses
> 33 then there are 32 used in the mux to select the result. How can that
> reduce to 76 LEs?
>

It reduces because synthesis does not work the way you have described it above.

> That would reduce to the same complexity as my method as implemented
> above.  My approach can be optimized by producing temp (Y1) from the two
> inputs.  Then in parallel adding bit 32 into the sum of the two inputs
> that produces Y.  Take Y modulo 2^n as the final step using 66 LEs. Like
> this...
>
> signal Y: unsigned(31 downto 0);
> signal Y, Y1, A, B: unsigned(32 downto 0);
> Y1 <= A + B + 1;
> Y <= resize(A + B + resize(Y1(32 downto 32),33), 32);
>
> If the tool is of any use it will add the upper bit of Y1 as a carry in
> to the sum of A + B utilizing a total of 65 LEs.
>
Your updated algorithm is actually the same as your first.  There are also a few problems with your code that indicate that you did not bother to compile your design to produce actual results, so your conclusions are simply speculation (and they are incorrect).
- You've declared 'Y' twice, once as a 32-bit vector and once as 33 bits.  Minor thing, easily fixed.
- The calculation of Y1 is not correct and will result in Y1(32) always being 0, which then affects the computation of Y in the next line.  This error functionally changes the output to simply be the 32-bit sum of the inputs mod 2^32, not mod 2^32-1 as requested.
- The code you posted, with the fix to remove the declaration of Y as a 33-bit number, results in 32 logic cells, not 66 as you speculated.  However, this number is not really meaningful, because due to the second error it is not computing what the OP wanted.  When that second error is corrected as shown in the code posted below, it results in the same 97 logic cells as your originally posted method.

Bottom line: what you have outlined for your method takes more logic simply because it requires an additional adder that is not needed with Ilya's method.

You and Rob are also not correct in thinking that computing Sum+1 as A+B+1 will save anything.  In the updated code posted at the end, I've added another generic control called SP1_METHOD, which controls how 'Sum + 1' gets computed.  Sum + 1 can now be computed as:
    Sum_P1  <= Sum + 1; (as it was in the original code)
or
    Sum_P1  <= A+B+1;

Quartus sees through the smoke and produces the exact same logic independent of SP1_METHOD.  However, Quartus can be made to overshoot:  if you compute Sum + 1 as 'A+1+B' rather than 'A+B+1', then the resource usage shoots up from 76 to 108 using Ilya's method.  This is the result posted for Method=6, SP1_Method=1.  Evidently Quartus is able to spot the common 'A+B' expression and not recompute it.  Changing the order of the two adds makes a difference because the VHDL language specifies that the expression be evaluated from left to right, and 'A+B+1' is not the same as 'A+1+B' in that context.  The LRM does not allow for a 'commutative property of addition'.  To take advantage of that mathematical property, the code must be explicitly written in a way that takes advantage of it.
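The point about operand order concerns expression trees rather than arithmetic. As an analogy only (this is Python, not VHDL, and nothing here models Quartus), Python's binary `+` is also left-associative, so its parser shows which subexpressions actually exist. `A+B+1` contains an `A+B` node that could be shared with `Sum`, while `A+1+B` does not. The helper name `contains_subexpr` is hypothetical.

```python
import ast


def contains_subexpr(expr: str, sub: str) -> bool:
    """True if the parse tree of `expr` contains a node identical to the parse tree of `sub`."""
    target = ast.dump(ast.parse(sub, mode="eval").body)
    tree = ast.parse(expr, mode="eval")
    return any(ast.dump(node) == target for node in ast.walk(tree))


if __name__ == "__main__":
    print(contains_subexpr("A+B+1", "A+B"))   # -> True:  parsed as (A+B)+1
    print(contains_subexpr("A+1+B", "A+B"))   # -> False: parsed as (A+1)+B
```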

Back to the OP's problem.  Barring the discovery of some other mathematical relationship that can be taken advantage of (which is what is really needed), what Ilya originally described is optimal.  In fact, spreading the computation over two clock cycles as he described is the best approach.  Whereas computing the output all within a single clock cycle takes 76 logic cells, spreading the computation over two clocks takes only 65 (see the new Method=5).  The tradeoff is the additional clock cycle of latency.  Whether or not that is important in Ilya's application is for him to decide.

Results for all of the variations are:

         Logic Elements
Method#  SP1=0   SP1=1   Notes
0        32      32      C <= A+B, no modulo '2**32-1' as a baseline
1        32      32      C <= A+B+1, as another reference point
2        76      76      Ilya's method, although implemented all in one clock cycle
3        97      97      Rickman's method #1
4        97      97      Rickman's method #2
5        65      65      Ilya's method as stated, two clock cycles
6        76      108     Same as method 2, but when SP1_METHOD=1 it computes Sum + 1 as A+1+B rather than A+B+1
Kevin Jennings

==== START OF CODE ====
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity Custom_Adder is generic(
    SP1_METHOD:     integer range 0 to 1;   -- Controls how 'Sum + 1' is calculated
    METHOD:         integer range 0 to 6);
port(
    Clock:  in  std_ulogic;
    A:      in  unsigned(31 downto 0);
    B:      in  unsigned(31 downto 0);
    C:      out unsigned(31 downto 0)
);
end Custom_Adder;

-- Results:
--          Logic Elements
-- Method#  SP1=0   SP1=1   Notes
-- 0        32      32      C <= A+B, no modulo '2**32-1' as a baseline
-- 1        32      32      C <= A+B+1, as another reference point
-- 2        76      76      Ilya's method, although implemented all in one clock cycle
-- 3        97      97      Rickman's method #1
-- 4        97      97      Rickman's method #2
-- 5        65      65      Ilya's method as stated, two clock cycles
-- 6        76      108     Same as method 2, but when SP1_METHOD=1 it computes Sum + 1 as A+1+B rather than A+B+1
architecture RTL of Custom_Adder is
    signal Sum:     unsigned(32 downto 0);
    signal Sum_P1:  unsigned(32 downto 0);
    constant MAX:   unsigned(32 downto 0) := '0' & x"FFFF_FFFF";
begin
    Sum     <= resize(A, Sum'length) + resize(B, Sum'length);
    Sum_P1  <= Sum + 1 when (SP1_METHOD = 0) else resize(A, Sum'length) + resize(B, Sum'length) + 1;
    -- KJ note:  If you change Sum_P1 to compute A+1+B rather than A+B+1 then the number of logic cells
    --           increases by 27 (i.e. almost one for each bit of the adder)

    -- KJ note:  Performing all calculations on one clock cycle in order to determine logic cells.

    GEN_METHOD0: if (METHOD = 0) generate
        C   <= Sum(C'range);
    end generate GEN_METHOD0;

    GEN_METHOD1: if (METHOD = 1) generate
        C   <= Sum_P1(C'range);
    end generate GEN_METHOD1;

    GEN_METHOD2: if (METHOD = 2) generate
        -- Ilya's method, implemented all in one clock cycle:
        -- at first clock cycle there is simple addition of two 32-bit unsigned numbers with 33-bit result
        -- and on second cycle if the result >= 2**32-1, we add 1 and take only 32 bits of that
        C   <= Sum_P1(C'range) when (Sum >= MAX) else Sum(C'range);
    end generate GEN_METHOD2;

    GEN_METHOD3: if (METHOD = 3) generate
        signal Mod_Res: unsigned(32 downto 0);
    begin
        -- Rickman's first method
        -- sum <= RESIZE(a, 33) + RESIZE(b, 33);
        -- temp <= sum + 1;
        -- mod_result <= sum + temp(32);
        -- answer <= mod_result(31 downto 0);
        Mod_Res <= Sum + unsigned'(Sum_P1(32 downto 32));
        C       <= Mod_Res(C'range);
    end generate GEN_METHOD3;
    GEN_METHOD4: if (METHOD = 4) generate
        -- Rickman's second method
        -- signal Y: unsigned(31 downto 0);
        -- signal Y, Y1, A, B: unsigned(32 downto 0);   <-- KJ note:  Redeclares 'Y'
        -- Y1 <= A + B + 1;     <-- KJ note:  Error:  must have 33 elements not 32
        -- Y <= resize(A + B + resize(Y1(32 downto 32),33), 32);
        signal Y:   unsigned(31 downto 0);
        signal Y1:  unsigned(32 downto 0);
    begin
        Y1  <= resize(A, 33) + resize(B, 33) + 1;
        Y   <= resize(A + B + resize(Y1(32 downto 32),33), 32);
        C   <= Y;
    end generate GEN_METHOD4;

    GEN_METHOD5: if (METHOD = 5) generate
        -- Ilya's method:
        -- at first clock cycle there is simple addition of two 32-bit unsigned numbers with 33-bit result
        -- and on second cycle if the result >= 2**32-1, we add 1 and take only 32 bits of that
        signal Sum_Dlyd:    unsigned(Sum'range);
        signal Sum_Dlyd_P1: unsigned(Sum'range);
    begin
        Sum_Dlyd    <= Sum when rising_edge(Clock);
        Sum_Dlyd_P1 <= Sum_Dlyd + 1;
        C   <= Sum_Dlyd_P1(C'range) when (Sum_Dlyd(32) = '1') else Sum_Dlyd(C'range);
    end generate GEN_METHOD5;

    GEN_METHOD6: if (METHOD = 6) generate
        -- Same as method 2 (Ilya's method, implemented all in one clock cycle) except
        -- the ordering of operands for the computation of 'Sum + 1' is modified.
        signal Sum_P1:  unsigned(Sum'range);
    begin
        Sum_P1  <= Sum + 1 when (SP1_METHOD = 0) else resize(A, Sum'length) + 1 + resize(B, Sum'length);
        C   <= Sum_P1(C'range) when (Sum >= MAX) else Sum(C'range);
    end generate GEN_METHOD6;

end RTL;
==== END OF CODE ====
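One detail of GEN_METHOD5 as posted is worth checking before comparing LE counts. It selects on Sum_Dlyd(32), i.e. sum >= 2**32, whereas the original description compares against 2**32-1. The Python sketch below is a behavioural model, not the VHDL, and its function names are hypothetical. It puts the two conditions side by side. They agree except when the sum is exactly 2**32-1. In that case the bit-32 test returns 0xFFFFFFFF instead of 0, and both values are congruent to 0 modulo 2**32-1.

```python
MASK32 = (1 << 32) - 1   # also the modulus 2**32 - 1


def select_on_bit32(a: int, b: int) -> int:
    """Model of GEN_METHOD5 as posted: add 1 when Sum_Dlyd(32) = '1'."""
    s = a + b                                            # Sum_Dlyd (33 bits)
    return (s + 1) & MASK32 if s >> 32 else s & MASK32


def select_on_compare(a: int, b: int) -> int:
    """Model of the original description: add 1 when the sum >= 2**32 - 1."""
    s = a + b
    return (s + 1) & MASK32 if s >= MASK32 else s


if __name__ == "__main__":
    print(hex(select_on_bit32(0x80000000, 0x7FFFFFFF)))    # -> 0xffffffff
    print(hex(select_on_compare(0x80000000, 0x7FFFFFFF)))  # -> 0x0
```

Whether the all-ones "negative zero" is acceptable depends on the application. For example, a checksum that is later compared against 0 would need the compare-based version or a final clean-up step.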

Article: 158521
Subject: Re: modulo 2**32-1 arith
From: KJ <kkjennings@sbcglobal.net>
Date: Wed, 16 Dec 2015 06:19:49 -0800 (PST)
Links: << >>  << T >>  << A >>
On Tuesday, December 15, 2015 at 4:33:51 PM UTC-5, Rob Gaddi wrote:
> I'm guessing the requirement for modulo 2**32-1 is driven by the
> algorithm, possibly some checksummy sort of thing.  I know Fletcher uses
> 2**N-1 modulo pretty heavily.
>

Could be right; at least there is some possible context to the question now.  Thanks.

> If your concern is speed rather than size, you could probably do it by
> running two parallel adders, Y0 = A+B and Y1 = A+B+1.  If the Y1
> add carries out then Y = Y1 else Y = Y0.
>

See my earlier post today.  Quartus will implement 'Y1=A+B+1' exactly the same as 'Y1=Y+1'.

> Alternatively, the way you do it in an optimized Fletcher is in blocks.
> Add a whole bunch of samples together with sufficient extra bits on the
> high end to count the overflows, then periodically stop taking in new
> data and add the overflows back into the LSBs.  You could easily get
> that "periodically" down to once out of every 1024 samples.
>
Interesting idea, but I don't think it saves anything over Ilya's method.  Ilya's method as originally specified (i.e. spread out over two clock cycles) takes essentially 2 logic cells per bit (but it takes ~2.4 per bit for a single clock cycle).  When it comes time to periodically update the overflows that you mentioned, you will end up with another adder and consume another 32 logic cells (one per bit).

Kevin Jennings

Article: 158522
Subject: Re: modulo 2**32-1 arith
From: thomas.entner99@gmail.com
Date: Wed, 16 Dec 2015 06:23:42 -0800 (PST)
Links: << >>  << T >>  << A >>
On Tuesday, December 15, 2015 at 11:32:14 AM UTC+1, Ilya Kalistru wrote:
> Hello.
> I need to add two unsigned numbers modulo 2**32-1.
> Now it's done in a very inefficient way: on the first clock cycle there is a simple addition of two 32-bit unsigned numbers with a 33-bit result, and on the second cycle, if the result >= 2**32-1, we add 1 and take only 32 bits of that.
> Does anybody know a better way to do that?

Which logic family?
What is "better"? Fewer resources? Faster? Do you need a result every cycle?

If you can live with a result every other cycle, you could try the following:
- Make a 32-bit adder with carry-in and carry-out.
- First cycle: Add the two numbers with the carry-in set.
- Second cycle: Check whether the carry-out is set. If yes, we already have the result. Otherwise clear the carry-in and use the new result.

This should take fewer than 40 LEs, but I have not tried it out...
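The carry-in scheme above can be sketched as a Python model. It assumes both inputs are already reduced below 2**32-1. `carry_in_two_cycle` is a hypothetical name, and this is an illustration, not Thomas's code. The key fact is that the carry out of A+B+1 is set exactly when A+B >= 2**32-1. In that case the low 32 bits of A+B+1 already equal A+B-(2**32-1).

```python
def carry_in_two_cycle(a: int, b: int, width: int = 32) -> int:
    """Model of one width-bit adder reused over two cycles.

    Assumes 0 <= a, b < 2**width - 1.
    """
    mask = (1 << width) - 1
    # First cycle: add with the carry-in set.
    total = a + b + 1
    carry_out = total >> width
    if carry_out:
        # a + b >= 2**width - 1: the low bits are already the reduced result.
        return total & mask
    # Second cycle: clear the carry-in and use the new result.
    return (a + b) & mask


print(carry_in_two_cycle(0xFFFFFFFE, 0x00000002))  # → 1
```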

Regards,

Thomas

www.entner-electronics.com - Home of EEBlaster

Article: 158523
Subject: Re: modulo 2**32-1 arith
From: Rob Gaddi <rgaddi@highlandtechnology.invalid>
Date: Wed, 16 Dec 2015 17:09:41 -0000 (UTC)
Links: << >>  << T >>  << A >>
KJ wrote:

> On Tuesday, December 15, 2015 at 4:33:51 PM UTC-5, Rob Gaddi wrote:
>> I'm guessing the requirement for modulo 2**32-1 is driven by the
>> algorithm, possibly some checksummy sort of thing.  I know Fletcher uses
>> 2**N-1 modulo pretty heavily.
>> 
>
> Could be right, at least there is some possible context now to the question.  Thanks.
>
>> If your concern is speed rather than size, you could probably do it by
>> running two parallel adders, Y0 = A+B and Y1 = A+B+1.  If the Y1
>> add carries out then Y = Y1 else Y = Y0.
>> 
>
> See my earlier post today.  Quartus will implement 'Y1=A+B+1' exactly the same as 'Y1=Y+1'.
>

Y0=A+B is a 32-bit adder with CIN=0.  Y1=A+B+1 is a second 32-bit adder
with CIN=1.  Choosing between them is a 32-bit 2:1 mux.  So my rough
math puts my answer at 96 LEs and 32 flops.  But the worst case
propagation path is from the LSB, through the 32-bit Y1 carry chain, to
the mux select, to the output, which should be pretty screamingly
fast.
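The two-adder arrangement described here is a carry-select structure. Below is a minimal Python model of it, assuming reduced inputs (below 2**32-1). The name `carry_select_mod` is hypothetical. Both sums are computed independently, and the Y1 carry-out only drives the final 2:1 mux. The critical path is therefore one carry chain plus a mux rather than two chained adds.

```python
def carry_select_mod(a: int, b: int, width: int = 32) -> int:
    """(a + b) mod (2**width - 1) via two parallel adders and a 2:1 mux.

    Assumes 0 <= a, b < 2**width - 1.
    """
    mask = (1 << width) - 1
    y0 = a + b            # adder with CIN = 0
    y1 = a + b + 1        # adder with CIN = 1, evaluated in parallel
    sel = y1 >> width     # carry out of the Y1 adder is the mux select
    return (y1 if sel else y0) & mask
```

Functionally this matches the sequential carry-in approach. The difference is purely in hardware cost (two adders and a mux) versus latency.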

Whereas Ilya's (and rickman's as well, if I'm reading it right) use the
carry-out of the A+B+1 add as the carry in to the "real" add.  That
gives you the total prop delay of a 64-bit carry chain, plus some slop.

The OP, who seems to have vanished off into the ether, never specified
what "efficiency" he was trying to optimize on, or whether pipeline
delays were acceptable, etc.

>> Alternatively, the way you do it in an optimized Fletcher is in blocks. 
>> Add a whole bunch of samples together with sufficient extra bits on the
>> high end to count the overflows, then periodically stop taking in new
>> data and add the overflows back into the LSBs.  You could easily get
>> that "periodically" down to once out of every 1024 samples.
>> 
> Interesting idea, but I don't think it saves anything over Ilya's method.  Ilya's method as originally specified (i.e. spread out over two clock cycles) takes essentially 2 logic cells per bit (but it takes ~2.4 per bit for single clock cycle).  When it comes time to periodically update the overflows that you mentioned, you will end up with another adder and consume another 32 logic cells (one per bit).
>

My comment above applies again.  If I'm right about the checksum, then Ilya's
method (at least on my reading) gives you an operating duty cycle of
50%: you can put in new data no faster than every other clock, because
you're waiting for the result of that "add one more" decision to
percolate back around.  Leave out the pipeline registers and your fmax
gets ugly.

If you just carry the excess and cook it periodically, you get a "data
accept" duty cycle of well past 99%.  There may be some cleverer way to
get to a 100% duty cycle pipeline, but with only one cup of coffee in me
I don't see it.
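The "carry the excess and cook it periodically" approach can be sketched as a wide accumulator. Every `block` samples, its overflow bits are folded back into the LSBs. This works because 2**width is congruent to 1 modulo 2**width-1, so a bit above position `width` is worth the same as a 1 added at the LSB. The function names and the 1024-sample block size are illustrative assumptions. With reduced inputs, the accumulator needs roughly log2(block)+1 bits of headroom.

```python
def fold(acc: int, width: int) -> int:
    """Add the bits above `width` back into the low bits (end-around carry)."""
    mask = (1 << width) - 1
    while acc >> width:
        acc = (acc & mask) + (acc >> width)
    return acc


def block_accumulate(samples, width: int = 32, block: int = 1024) -> int:
    """Sum of samples mod (2**width - 1), folding overflows once per block."""
    m = (1 << width) - 1
    acc = 0                         # wide accumulator with headroom for overflows
    for i, x in enumerate(samples, start=1):
        acc += x                    # plain add: a new sample can enter every clock
        if i % block == 0:
            acc = fold(acc, width)  # the periodic "cook" step
    acc = fold(acc, width)
    # fold() can leave 2**width - 1, the second encoding of zero; canonicalize it.
    return 0 if acc == m else acc
```

For example, `block_accumulate([0xFFFFFFFF, 2])` returns 2. Only the periodic fold needs the end-around carry, so the per-sample path stays a plain adder.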

> Kevin Jennings

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.

Article: 158524
Subject: Re: modulo 2**32-1 arith
From: rickman <gnuarm@gmail.com>
Date: Wed, 16 Dec 2015 12:11:52 -0500
Links: << >>  << T >>  << A >>
The tool was not using the carry-in.  Here is a version that uses the
carry-in.  66 LEs in one clock cycle.  It can be reduced to about 40 LEs
if done in two clock cycles.

BTW, did you test any of the designs you synthesized?  How do you know 
they actually work?

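library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Computes Y = (A + B) mod (2**WIDTH - 1) in a single combinational stage.
ENTITY FPGA IS
   GENERIC (
	  WIDTH	: POSITIVE	:= 32
	);
   port(
	  -- Operands and result (purely combinational, no clock)
	  A, B	: in	unsigned(WIDTH - 1 downto 0);
	  Y	: out	unsigned(WIDTH - 1 downto 0)
	);
END FPGA;

ARCHITECTURE behavior OF FPGA IS

   signal Y1 : unsigned(WIDTH downto 0);    -- A + B + 1; bit WIDTH is its carry out
   signal temp : unsigned(WIDTH downto 0);  -- shifted sum; bit 0 is a helper LSB

BEGIN

   -- Carry out of A + B + 1 is '1' exactly when A + B >= 2**WIDTH - 1.
   Y1 <= ('0' & A) + ('0' & B) + 1;
   -- The appended LSBs ('1' and the carry) act as the carry-in of the real add:
   -- temp = 2*(A + B) + 1 + Y1(WIDTH).
   temp <= (A & '1') + (B & Y1(WIDTH));
   -- Drop the helper LSB: Y = A + B + Y1(WIDTH), truncated to WIDTH bits.
   Y <= temp (WIDTH downto 1);

END;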
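To sanity-check this trick without a simulator, here is a bit-accurate Python model of the three assignments, followed by an exhaustive comparison against a plain modulo at a small width. It is a sketch, and the function name is hypothetical. Appending '1' below A and the carry below B turns the extra LSB position into the carry-in of the second add. So temp(WIDTH downto 1) equals A + B + carry, truncated to WIDTH bits.

```python
def rickman_mod_add(a: int, b: int, width: int = 32) -> int:
    """Model of Y1 <= A+B+1; temp <= (A & '1') + (B & Y1(WIDTH)); Y <= temp(WIDTH downto 1)."""
    mask1 = (1 << (width + 1)) - 1                     # Y1 and temp are WIDTH+1 bits
    y1 = (a + b + 1) & mask1
    carry = (y1 >> width) & 1                          # Y1(WIDTH)
    temp = (((a << 1) | 1) + ((b << 1) | carry)) & mask1
    return temp >> 1                                   # temp(WIDTH downto 1)


# Exhaustive check for WIDTH = 5 (modulus 31), reduced inputs only.
for a in range(31):
    for b in range(31):
        assert rickman_mod_add(a, b, width=5) == (a + b) % 31
```

The check only covers reduced inputs. An operand equal to 2**WIDTH-1, the second encoding of zero, is outside what it verifies.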

-- 

Rick


