Messages from 113950

Article: 113950
Subject: Tools available to split the design into multiple FPGAs.
From: "subint" <subin.82@gmail.com>
Date: 29 Dec 2006 21:30:41 -0800

Hi,
I would like to know whether there are any tools available, from
Xilinx or others, to split a design across multiple FPGAs and
synthesize it.
regards
subin

Article: 113951
Subject: Re: SPI slave problem
From: Thomas Reinemann <tom.reinemann@gmx.net>
Date: Sat, 30 Dec 2006 08:31:31 +0100

Ralf Hildebrandt wrote:
> Ben Jackson wrote:
> 
>>> 	reg	num = 7;
>> That's almost certainly wrong.
> 
> Initial values are ignored during synthesis. -> Create a reset for it!
Is that true for Verilog? At least XST honors initial values in
VHDL, although I know that some years ago they were ignored.

Bye Tom

Article: 113952
Subject: (Improve Verilog skill) Recommend CPU core with good document and coding?
From: "Shenli" <zhushenli@gmail.com>
Date: 29 Dec 2006 23:44:35 -0800

Hi all,

These days I have found that my Verilog code-reading speed is not as fast as my
C/C++ reading speed. It takes me much longer to understand Verilog
code than C/C++ code.

So I want to read through a small CPU core (preferably under 10k lines) to
improve my Verilog reading/writing skills. Please recommend a
small open CPU core, or anything else, with good documentation and coding
style.

Any suggestions on improving Verilog code-reading speed are welcome!

Best regards,
Shenli

Article: 113953
Subject: ERROR:NgdBuild:604
From: "Venu" <get2venu@gmail.com>
Date: 30 Dec 2006 02:36:45 -0800

Hi people,

I am using a custom BRAM, but it is not getting synthesized; it keeps
giving this error:
   ERROR:NgdBuild:604 - logical block 'my_transmitter_0/my_transmitter_0/dpram0'
   with type 'custom_bram_0' could not be resolved. A pin name misspelling can
   cause this, a missing edif or ngc file, or the misspelling of a type name.
   Symbol 'custom_bram_0' is not supported in target 'virtex2p'.

Custom BRAM generated using       ---> Xilinx Core Generator
Xilinx ISE functional simulation  ---> completed successfully
Xilinx EDK BFM simulation         ---> completed successfully
Netlist                           ---> completed successfully
Bitstream generation              ---> FAILURE

There has been traffic in this group regarding the problem that I am
facing, but none of the solutions that have been proposed work
in my case.

I have designed an OPB master/slave peripheral in which I have used a
dual-port RAM generated by the Xilinx Core Generator. I named this core
custom_bram. The following files are generated:
1) custom_bram.asy     2) custom_bram.edn     3) custom_bram.sym
4) custom_bram.v       5) custom_bram.veo     6) custom_bram.vhd
7) custom_bram.vho     8) custom_bram.xco     9) custom_bram_flist.txt

I copied custom_bram.vhd into my user directory and instantiated it as
a component in the main module.

Things that I have tried:
1) custom_bram uses an entity called XilinxCoreLib.blkmemdp_v6_3
defined in the library XilinxCoreLib. I copied all the related files
from blkmemdp_v6_3 into the pcores/hdl/vhdl directory.

2) I copied the generated custom_bram.edn file into a directory called
/pcores/netlist, and in the my_transmitter_0.mpd file I made an entry
specifying
OPTIONS STYLE = MIX.

3) In the instantiation of custom_bram I have tried
my_transmitter_v1_00_0.custom_bram (as suggested in one of the posts
in this group).

4) I am using 4 block RAMs in this design. In one of the documents that
I read, it was stated that you cannot make multiple instantiations of
the same module, so I made 4 copies of custom_bram, renamed them,
and referred to each only once.

None of these have worked .... Any ideas ? :)


Thanks 
Venu

Article: 113954
Subject: Re: Tools available to split the design into multiple FPGAs.
From: "Hans" <hans64@ht-lab.com>
Date: Sat, 30 Dec 2006 10:57:07 GMT


"subint" <subin.82@gmail.com> wrote in message 
news:1167456641.853139.193370@s34g2000cwa.googlegroups.com...
> Hi,
> I would like to know whether there are any tools available, from
> Xilinx or others, to split a design across multiple FPGAs and
> synthesize it.

Have a look at Certify 
(http://www.synplicity.com/products/certify/index.html), BYO 
(http://www.byo-solutions.com/index.htm) and Auspy (http://www.auspy.com/).

Hans
www.ht-lab.com


> regards
> subin
>

Article: 113955
Subject: Re: (Improve Verilog skill) Recommend CPU core with good document and coding?
From: "Jon Beniston" <jon@beniston.com>
Date: 30 Dec 2006 06:56:18 -0800


Shenli wrote:
> Hi all,
>
> These days I have found that my Verilog code-reading speed is not as fast as my
> C/C++ reading speed. It takes me much longer to understand Verilog
> code than C/C++ code.
>
> So I want to read through a small CPU core (preferably under 10k lines) to
> improve my Verilog reading/writing skills. Please recommend a
> small open CPU core, or anything else, with good documentation and coding
> style.
>
> Any suggestions on improving Verilog code-reading speed are welcome!

Have a look at the LatticeMico32:

http://www.latticesemi.com/products/intellectualproperty/ipcores/mico32/index.cfm

Cheers,
Jon

Article: 113956
Subject: Re: SPI slave problem
From: "KJ" <kkjennings@sbcglobal.net>
Date: Sat, 30 Dec 2006 12:34:07 -0500


"Thomas Reinemann" <tom.reinemann@gmx.net> wrote in message 
news:en54kt$s5u$1@news.boerde.de...
> Ralf Hildebrandt wrote:
>> Ben Jackson wrote:
>>
>>>> reg num = 7;
>>> That's almost certainly wrong.
>>
>> Initial values are ignored during synthesis. -> Create a reset for it!
> Is that true for Verilog? At least XST honors initial values in
> VHDL, although I know that some years ago they were ignored.

As a blanket statement, Ralf is incorrect in stating that "initial values
are ignored during synthesis". First of all, it depends on the target
device: does the target device have a defined state at power-up (CPLD) or
after configuration (FPGA)? Many devices do have such a definition. The
second consideration is the tool set used to synthesize the bitstream from
the source code. Some tools might not support an initial value. It really
does not depend on the language itself but on the synthesis tool.

In any case, it's not hard to find a device and tool that will support
initial values. Ralf's advice to use a reset, though, is well founded.
Having something that depends solely on the power-up reset state is
'usually' not sound design practice. Again, though, there are exceptions,
the shift chain that one should use to generate a synchronous reset being a
good example.
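A minimal behavioral sketch of such a power-up reset shift chain, written in Python for illustration (the class and method names are hypothetical, and the chain length of 4 is an arbitrary assumption). The register relies on its configuration-time initial value of all ones, shifts in a 0 on every clock, and uses its last stage as the reset output, so reset is held for a few clocks after configuration and then released synchronously.

```python
class ResetShiftChain:
    """Behavioral model of a power-up synchronous reset generator."""

    def __init__(self, length: int = 4) -> None:
        self.length = length
        # Initial value taken at configuration: every stage starts at 1,
        # so the reset output (last stage) is asserted from power-up.
        self.stages = [1] * length

    def clock(self, external_reset: bool = False) -> int:
        """Advance one clock edge and return the reset output (1 = asserted)."""
        if external_reset:
            self.stages = [1] * self.length       # re-arm the whole chain
        else:
            self.stages = [0] + self.stages[:-1]  # shift a 0 in at stage 0
        return self.stages[-1]


chain = ResetShiftChain(length=4)
print([chain.clock() for _ in range(6)])   # [1, 1, 1, 0, 0, 0]
print(chain.clock(external_reset=True))    # 1
```

In hardware the external reset would commonly preset the chain asynchronously; this model only samples it at clock edges.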

Kevin Jennings

Article: 113957
Subject: How to deal with the negative value
From: "ZHI" <threeinchnail@gmail.com>
Date: 30 Dec 2006 10:41:03 -0800

I want to transmit a set of data R to the FPGA board, with |R| <= 1.
Initially, I scale these data by multiplying them by 2^7. If a value is
negative, I add 2^16 to it. Actually, I did not think about this too
much at the beginning, and the result back from the FPGA board is correct.
Now I am thinking that if a value is negative, I should add 2^8 instead
of 2^16. But then the result is wrong. I am really confused now. Could you
tell me what's wrong with it? Thank you.
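The underlying issue can be shown with a small Python model (a sketch; the function names are hypothetical). Adding 2^16 to a negative value yields exactly its 16-bit two's-complement encoding, while adding 2^8 yields the 8-bit encoding, which only decodes correctly if the word is read back as 8 bits. Note also that scaling by 2^7 maps R = +1 to 128, which does not fit in an 8-bit signed word (range -128 to 127).

```python
def encode(value: int, bits: int) -> int:
    """Return the two's-complement bit pattern of `value` in a `bits`-wide word."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    return value + (1 << bits) if value < 0 else value


def decode(word: int, bits: int) -> int:
    """Interpret a `bits`-wide word as a two's-complement signed integer."""
    return word - (1 << bits) if word & (1 << (bits - 1)) else word


scaled = round(-0.5 * 2**7)          # R = -0.5 scaled by 2^7 -> -64
word16 = encode(scaled, 16)          # same as -64 + 2^16
print(hex(word16), decode(word16, 16))   # 0xffc0 -64
print(decode(scaled + 2**8, 16))         # 192: the 8-bit encoding read as 16 bits
```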

Article: 113958
Subject: Re: How to deal with the negative value
From: "ZHI" <threeinchnail@gmail.com>
Date: 30 Dec 2006 11:29:19 -0800

I would like to add some information here. I transmit these data by
splitting each value into 2 parts like this:

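    k = 1;  % output index into R3 (initialization assumed; not shown in the post)
    for i = 1:N^2
        if R1(i) < 0
            R1(i) = R1(i) + 2^8;
        end
        for j = 1:2
            if j == 1
                R3(k) = rem(R1(i), 256);     % low byte
                k = k + 1;
            elseif j == 2
                R3(k) = floor(R1(i) / 256);  % high byte
                k = k + 1;
            end
        end
    end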

Then I use a dual-port RAM in the FPGA to store these data. I use an
8-bit-wide data port to receive these data, and the data is read out from
a 16-bit-wide port, so the values are reassembled into the original ones.
I guess that if these data actually need 16 bits, adding 2^8 will give the
wrong result, but I cannot convince myself. Does anybody know something
about it? Thank you.
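The round trip described above can be modeled in a few lines of Python (a sketch; the helper names are hypothetical, and it assumes the low byte is written at the lower RAM address, so the 16-bit read port returns high byte << 8 | low byte). With the 2^16 offset each value survives the split and reassembly; with the 2^8 offset, a negative value comes back with a zero high byte.

```python
def split_bytes(words):
    """Mirror of the MATLAB loop: low byte first, then high byte."""
    out = []
    for w in words:
        out.append(w % 256)     # rem(R1(i), 256)
        out.append(w // 256)    # floor(R1(i) / 256)
    return out


def read_16bit(byte_stream):
    """Model the 16-bit read port (assumed: low byte at the lower address)."""
    return [byte_stream[i] | (byte_stream[i + 1] << 8)
            for i in range(0, len(byte_stream), 2)]


def to_signed16(word):
    """Interpret a 16-bit word as two's complement."""
    return word - 0x10000 if word & 0x8000 else word


samples = [-64, 37, -1]
with_2_16 = [s + 2**16 if s < 0 else s for s in samples]
with_2_8 = [s + 2**8 if s < 0 else s for s in samples]

print([to_signed16(w) for w in read_16bit(split_bytes(with_2_16))])  # [-64, 37, -1]
print([to_signed16(w) for w in read_16bit(split_bytes(with_2_8))])   # [192, 37, 255]
```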

Article: 113959
Subject: Memory controller design
From: "Piotr Wyderski" <wyderski@mothers.against.spam-ii.uni.wroc.pl>
Date: Sat, 30 Dec 2006 23:48:16 +0100

Hi,

I would like to connect many independent data source/targets
to a common data stream. There will be a 36-bit static RAM
block of 2^20 words (9x IDT71V428-12) running as fast
as possible, i.e. at ~83MHz, which is supposed to be the
main storage of the system and a number of completely
unsynchronized components, trying to send/receive their
data streams to/from the RAM block. The FPGA chip will
be a Spartan 3 or 3E, I haven't chosen it yet. The FPGA 
will host, among other things, the following components:

a) a 2-way 18-bit SIMD fixed-point complex math processor
running at 65 MHz. All its simple scalar instructions should
complete in 1 cycle, which is doable, as there are hardware
18x18 multipliers. It will thus consume 292.5 MiB/s of the
available bandwidth.

b) a high-speed USB2.0 bidirectional 8-bit datalink running
at 48 MHz, which gives 48 MiB/s.

c) an Ethernet 100 controller, full duplex mode => ~20 MiB/s.

d) an LCD display driver, about 2 MiB/s.

e) many slow links (SPI-like, AC-97 TDMA etc.), won't consume
much bandwidth.

The total bandwidth is 373 MiB/s, which easily covers the
requirements. My idea is to implement a static DMA-like
RAM transaction slot allocator, which will grant the bus for
the CPU in 65 slots out of 83, in 11 for the USB link etc.,
but how to implement a bunch of low-latency half-duplex
bridges between the 83MHz domain and the remaining ones?
I don't want to waste my precious BRAMs for that purpose,
so what should I do?
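The slot budget above can be checked with a short Python sketch (hypothetical names; it treats the quoted MiB/s figures as 10^6 bytes/s, which is how 65 MHz x 4.5 bytes = 292.5 comes out). One 36-bit SRAM access moves 4.5 bytes, so a frame of 83 slots at 83 MHz lasts 1 us and each slot per frame carries 4.5 MB/s. The table is interleaved with a smooth weighted round-robin, which is only one possible way to order a static allocation; the post does not say how slots are ordered.

```python
import math

BYTES_PER_SLOT = 36 / 8      # one 36-bit SRAM access
FRAME_SLOTS = 83             # 83 slots at 83 MHz = one 1 us frame


def slots_needed(mb_per_s: float) -> int:
    """Slots per frame needed to carry `mb_per_s` (10^6 bytes/s)."""
    return math.ceil(mb_per_s / BYTES_PER_SLOT)


def build_slot_table(weights: dict, frame: int) -> list:
    """Spread each client's slots evenly over one frame using smooth
    weighted round-robin; leftover slots are labelled 'spare'."""
    weights = dict(weights)
    spare = frame - sum(weights.values())
    if spare < 0:
        raise ValueError("requested slots exceed the frame")
    if spare:
        weights["spare"] = spare
    credit = {name: 0 for name in weights}
    table = []
    for _ in range(frame):
        for name, w in weights.items():
            credit[name] += w
        pick = max(credit, key=credit.get)   # largest accumulated credit wins
        credit[pick] -= frame                # frame == sum of all weights here
        table.append(pick)
    return table


demand = {"cpu": 292.5, "usb": 48, "ethernet": 20, "lcd": 2}
alloc = {name: slots_needed(bw) for name, bw in demand.items()}
print(alloc)   # {'cpu': 65, 'usb': 11, 'ethernet': 5, 'lcd': 1}
table = build_slot_table(alloc, FRAME_SLOTS)
```

With these figures the four listed clients take 82 of the 83 slots, leaving one slot per frame for the slow links.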

    Best regards
    Piotr Wyderski

Article: 113960
Subject: Re: (Improve Verilog skill) Recommend CPU core with good document and coding?
From: "Shenli" <zhushenli@gmail.com>
Date: 30 Dec 2006 17:21:08 -0800


Jon Beniston wrote:
> Shenli wrote:
> > Hi all,
> >
> > These days I have found that my Verilog code-reading speed is not as fast as my
> > C/C++ reading speed. It takes me much longer to understand Verilog
> > code than C/C++ code.
> >
> > So I want to read through a small CPU core (preferably under 10k lines) to
> > improve my Verilog reading/writing skills. Please recommend a
> > small open CPU core, or anything else, with good documentation and coding
> > style.
> >
> > Any suggestions on improving Verilog code-reading speed are welcome!
>
> Have a look at the LatticeMico32:
>
> http://www.latticesemi.com/products/intellectualproperty/ipcores/mico32/index.cfm
>
> Cheers,
> Jon

Hi Jon,

Thanks a lot for the information!

Is LatticeMico32 easy to understand? Does it have good documents describing
the Verilog files? Does it have a good testbench?

Best regards,
Davy

Article: 113961
Subject: hi......
From: "salu" <karanashu@gmail.com>
Date: 30 Dec 2006 22:38:36 -0800

Can anyone explain DCMs or clock trees to me?
If there are multiple clocks in my design, how do I handle them?
What is the BUFG concept for clocks in an FPGA?
I only know about the dedicated FPGA pins used to assign the different
clocks that are input to the FPGA.

Article: 113962
Subject: Re: Memory controller design
From: Jerzy Gbur <furia1024@wp.pl>
Date: Sun, 31 Dec 2006 12:18:30 +0100

Hi Piotr,

Piotr Wyderski wrote:
> Hi,
> 
> I would like to connect many independent data source/targets
> to a common data stream. There will be a 36-bit static RAM
> block of 2^20 words (9x IDT71V428-12) running as fast
> as possible, i.e. at ~83MHz, which is supposed to be the
> main storage of the system and a number of completely
> unsynchronized components, trying to send/receive their
> data streams to/from the RAM block. The FPGA chip will
> be a Spartan 3 or 3E, I haven't chosen it yet. The FPGA will host, among 
> other things, the following components:
> 
> a) a 2-way 18-bit SIMD fixed-point complex math processor
> running at 65 MHz. All its simple scalar instructions should
> complete in 1 cycle, which is doable, as there are hardware
> 18x18 multipliers. It will thus consume 292.5 MiB/s of the
> available bandwidth.
> 
> b) a high-speed USB2.0 bidirectional 8-bit datalink running
> at 48 MHz, which gives 48 MiB/s.
> 
> c) an Ethernet 100 controller, full duplex mode => ~20 MiB/s.
> 
> d) an LCD display driver, about 2 MiB/s.
> 
> e) many slow links (SPI-like, AC-97 TDMA etc.), won't consume
> much bandwidth.
> 
> The total bandwidth is 373 MiB/s, which easily covers the
> requirements. My idea is to implement a static DMA-like
> RAM transaction slot allocator, which will grant the bus for
> the CPU in 65 slots out of 83, in 11 for the USB link etc.,
> but how to implement a bunch of low-latency half-duplex
> bridges between the 83MHz domain and the remaining ones?
> I don't want to waste my precious BRAMs for that purpose,
> so what should I do?

IMHO you should use at least a BRAM for preparing data to/from the SRAM bus.
The bus side should work at 83 MHz, but the inner side should work faster to
accomplish multiplexing data into the appropriate "slots".
I don't know how you plan to match the address bus and the data bus. If I were
you, I would use a second BRAM for matching addresses.

Best Regards,

Jerzy Gbur

Article: 113963
Subject: Re: Memory controller design
From: "Piotr Wyderski" <wyderski@mothers.against.spam-ii.uni.wroc.pl>
Date: Sun, 31 Dec 2006 13:10:34 +0100

Jerzy Gbur wrote:

> IMHO you should use at least a BRAM for preparing data to/from the SRAM bus.

Yes, but this way the fast random access time will be lost and the
whole system will behave like a DRAM-based system with a tiny
cache. Another option is to clock the CPU at 83 MHz to match
the bus speed and add an HLD signal, as in the good old DMA
controllers. It simplifies a lot of things, but the initial question
"how to connect many slower participants to the bus?" remains
open. In this design some of them can be easily attached, as
83/2 = 41.5 and 83/4 = 20.75, so my USB and Ethernet links
could work synchronously with the bus, but many other sources
(AC-97 codecs, display) cannot be synchronized this way.

> The bus side should work at 83 MHz, but the inner side should work faster to
> accomplish multiplexing data into the appropriate "slots".

It depends on what you call the "inner side". The CPU is supposed to
work at 2-3 times the frequency I stated, to hide its
simple internal pipeline and appear to be a single-cycle design.
But its memory interface is bounded by the available bandwidth.
There is a large data source/target domain that _must_ be clocked
at 65 MHz, but I can connect it to the CPU domain via a BRAM.

> I don't know how you plan to match the address bus and the data bus

What do you mean by "bus matching"?

    Best regards
    Piotr Wyderski

Article: 113964
Subject: Re: hi......
From: Austin <austin@xilinx.com>
Date: Sun, 31 Dec 2006 09:53:44 -0800

salu,

http://www.xilinx.com/cgi-bin/search/googleSearch?btnG=Google+Search&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=iso-8859-1&client=xilinx&oe=iso-8859-1&proxystylesheet=xilinx&filter=0&requiredfields=&q=dcm+clock+tree&site=Documentation&submit2.x=0&submit2.y=0&submit2=Search
or
http://tinyurl.com/yjmq5r

Read all about DCMs and clock trees. Or refine your search for a specific
product, and read less.

Austin

Article: 113965
Subject: xilinx xc9536?
From: <highZ>
Date: Sun, 31 Dec 2006 18:18:36 -0000

Hello, there are some pins on the Xilinx XC9536 which are called global
clock 1/2/3, global reset, etc. Where are these explained?

Article: 113966
Subject: Help with ISE (multi-source in unit error)
From: "idp2" <ian.peikon@gmail.com>
Date: 31 Dec 2006 10:28:22 -0800

Hi,

     I'm pretty new to Verilog and I am trying to write code to compute
a mean and store it in RAM. I update the RAM each time a new sample
comes in, so the RAM contents become my second addend. Here is a bit of
the code. Why am I getting a "multi-source in unit" error on cal_ram_di?

always @(posedge clk)
begin
	if(~done & stepCnt == 0)
	begin
		if(cal_cnt == 0)
		begin
			addend1 <= {8'h00, samp12}; //pad with zeros to the left, new sample to add to running sum
			addend2 <= cal_ram_do[20:0];	//the current running sum for the channel
			gotoffset <= 0;
		end
		else if(cal_cnt == 1)
		begin
			if(sampCtr == 0)
			begin
				cal_ram_di[20:0] <= addend1; //this is the first sample
				cal_ram_we <= 1; //assert the write enable so we can latch the data in the RAM
			end
			else
			begin
				cal_ram_di[20:0] <= addend1 + addend2; //every other sample
				cal_ram_we <= 1;
			end
		end
		else if(cal_cnt == 2)
		begin
			if(sampCtr == 511)
				meanVal <= cal_ram_di[20:9]; //we have monitored for 512 samples so divide by 2^9 (512) --- leave this as di.
		end
		else if(cal_cnt == 3)
		begin
			if(sampCtr == 511) //we have computed sums for all 512 samples so compute the offset
			begin
				cal_ram_di[28:0] <= {(meanVal-12'h800), 16'h0000, 1'b0}; //put the offset in the right spot because if it is not there i can't do the offset subtraction the same the whole time
				cal_ram_we <= 1; //assert ram write enable and latch the data
				gotoffset <=1;
			end	
		end
	end
end
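For reference, the arithmetic this block implements can be modeled in Python (a sketch; it assumes samp12 is an unsigned 12-bit sample, which the name suggests but the post does not state). 512 samples of at most 4095 sum to at most 2,096,640, which fits in the 21-bit accumulator cal_ram_di[20:0] (2^21 = 2,097,152), and taking bits [20:9] is a divide by 2^9 = 512.

```python
SAMPLE_BITS = 12
ACC_BITS = 21            # width of cal_ram_di[20:0]
N_SAMPLES = 512          # 2**9, so the mean is the accumulator shifted right by 9


def mean_and_offset(samples):
    """Return (running sum, mean, offset) as the always block computes them."""
    if len(samples) != N_SAMPLES:
        raise ValueError("expected 512 samples")
    acc = 0
    for s in samples:
        if not 0 <= s < (1 << SAMPLE_BITS):
            raise ValueError("sample out of 12-bit range")
        # Masked like a 21-bit register; 512 12-bit samples cannot overflow it.
        acc = (acc + s) & ((1 << ACC_BITS) - 1)
    mean = (acc >> 9) & 0xFFF      # cal_ram_di[20:9]
    offset = mean - 0x800          # meanVal - 12'h800 (a 12-bit two's-complement pattern in hardware)
    return acc, mean, offset


print(mean_and_offset([4095] * 512))   # (2096640, 4095, 2047)
print(mean_and_offset([0x800] * 512))  # (1048576, 2048, 0)
```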

Article: 113967
Subject: Re: Memory controller design
From: "KJ" <kkjennings@sbcglobal.net>
Date: Sun, 31 Dec 2006 18:49:49 GMT


"Piotr Wyderski" <wyderski@mothers.against.spam-ii.uni.wroc.pl> wrote in 
message news:en6qbk$p2s$1@news.dialog.net.pl...
> Hi,
>
> I would like to connect many independent data source/targets
> to a common data stream. There will be a 36-bit static RAM
<snip>
> The total bandwidth is 373 MiB/s, which easily covers the
> requirements. My idea is to implement a static DMA-like
> RAM transaction slot allocator, which will grant the bus for
> the CPU in 65 slots out of 83, in 11 for the USB link etc.,
> but how to implement a bunch of low-latency half-duplex
> bridges between the 83MHz domain and the remaining ones?
> I don't want to waste my precious BRAMs for that purpose,
> so what should I do?
>
The function that you're describing is an arbitrator; you have multiple 
sources that need to share access to a shared resource (the SRAM), the 
management of who gets control of that resource at any particular time is up 
to whatever arbitration function you choose to implement.

If you view it in that context, your 'bunch of low-latency half-duplex
bridges' will not present as much of a challenge as you may think. The best way
to go about this is to start with the entity definition for the SRAM 
arbitration function.  Each potential master requires a private interface to 
the arbitrator, the arbitrator also has a master interface to the external 
SRAM itself.  So if you have 10 potential sources to the SRAM then the 
arbitrator will have 10 slave interfaces (to each of those sources) plus an 
SRAM master interface.

Next consider the requirements of each of those sources.  Do they have some 
sort of 'wait' signal that will cause it to hold address and write data 
(during a write) and cause it to hold address while waiting for a read to 
complete?  What kind of read cycle time performance is required?  It sounds 
like you have a handle on the bandwidth requirements but are there any 
latency requirements (i.e. how long can something 'wait')?

If you go about the process by figuring out the requirements of the
arbitration function and working through the requirements that each master
and the target SRAM slave present, then it should start to fall into place.

Kevin Jennings

Article: 113968
Subject: Re: Help with ISE (multi-source in unit error)
From: mk <kal*@dspia.*comdelete>
Date: Sun, 31 Dec 2006 19:37:12 GMT

On 31 Dec 2006 10:28:22 -0800, "idp2" <ian.peikon@gmail.com> wrote:

>Hi,
>
>     I'm pretty new to Verilog and I am trying to write code to compute
>a mean and store it in RAM. I update the RAM each time a new sample
>comes in, so the RAM contents become my second addend. Here is a bit of
>the code. Why am I getting a "multi-source in unit" error on cal_ram_di?
>
>always @(posedge clk)
>begin
>	if(~done & stepCnt == 0)
>	begin
>		if(cal_cnt == 0)

Are you sure this is all the code which assigns to cal_ram_di? What
you show is a single always block so it's difficult to get a
multi-source out of it. Check where else you're using cal_ram_di to
see if you're declaring it as input or whether you're assigning to it
again.

Another comment is that you can change the "if (cal_cnt==0)" chain to a case
statement, which might give you better performance.

Article: 113969
Subject: Re: Help with ISE (multi-source in unit error)
From: "idp2" <ian.peikon@gmail.com>
Date: 31 Dec 2006 12:04:31 -0800

That is only one of my always blocks that works with cal_ram_di. I
have two others, but they are based on the conditions if(~done
& stepCnt==1) and if(~done & stepCnt==2). Is that what is causing the
problem? If so, how do I fix it?
mk wrote:
> On 31 Dec 2006 10:28:22 -0800, "idp2" <ian.peikon@gmail.com> wrote:
>
> >Hi,
> >
> >     I'm pretty new to Verilog and I am trying to write code to compute
> >a mean and store it in RAM. I update the RAM each time a new sample
> >comes in, so the RAM contents become my second addend. Here is a bit of
> >the code. Why am I getting a "multi-source in unit" error on cal_ram_di?
> >
> >always @(posedge clk)
> >begin
> >	if(~done & stepCnt == 0)
> >	begin
> >		if(cal_cnt == 0)
>
> Are you sure this is all the code which assigns to cal_ram_di? What
> you show is a single always block so it's difficult to get a
> multi-source out of it. Check where else you're using cal_ram_di to
> see if you're declaring it as input or whether you're assigning to it
> again.
>
> Another comment is that you can change the "if (cal_cnt==0)" chain to a case
> statement, which might give you better performance.

Article: 113970
Subject: Re: Memory controller design
From: "Piotr Wyderski" <wyderski@mothers.against.spam-ii.uni.wroc.pl>
Date: Sun, 31 Dec 2006 21:44:45 +0100

KJ wrote:

> Next consider the requirements of each of those sources.  Do they have
> some sort of 'wait' signal that will cause it to hold address and write 
> data (during a write) and cause it to hold address while waiting for a 
> read to complete?

Yes, they do.

> What kind of read cycle time performance is required?

The CPU must run as fast as possible because of its computationally
intensive tasks, but no access time restriction is required, i.e. it is not
important whether a particular single load or store takes one or ten
cycles to complete, as long as they statistically complete in 1.28 cycles
(83/65) on average for a truly random access pattern. The USB and
Ethernet links work similarly, as their master controllers are in the
FPGA itself (i.e. no external component screams "feed me!"), so
again, there are no real-time requirements. The only real-time
components are the AC-97 codecs and the display (that is, its pixel bus),
but they are slow.

> but are there any latency requirements (i.e. how long can something 
> 'wait')?

Fortunately not, only the bandwidth matters. Well, several channels
have bounded maximal latency, but it is so long compared to the
RAM bus cycle that it could be easily fulfilled by an appropriate
arbitration function. A simple round-robin prioritizer will be perfectly
enough.
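A round-robin prioritizer of the kind mentioned above can be sketched behaviorally in Python (hypothetical names; one grant per SRAM slot is an assumption). The master granted most recently becomes the lowest priority, so a master that keeps requesting is served within n slots.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grant one of `n` requesting masters per slot, rotating priority."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1            # master 0 has top priority after reset

    def grant(self, requests: List[bool]) -> Optional[int]:
        # Search starting just after the previous winner, wrapping around.
        for step in range(1, self.n + 1):
            candidate = (self.last + step) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None                  # no requests: the slot stays idle


arb = RoundRobinArbiter(3)
print([arb.grant([True, True, True]) for _ in range(4)])   # [0, 1, 2, 0]
print(arb.grant([False, False, True]))                      # 2
```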

> If you go about the process by figuring out the requirements of the
> arbitration function and working through the requirements that each master
> and the target SRAM slave present, then it should start to fall into place.

Well, think of many DMA channels connected to much slower
clock domains, it's a good model. The problem is how to pass
their data and configuration parameters between the main clock
domain and their respective domains.

Now I think that a separate RAM clock domain is too hard to
implement reliably, so I can redesign the system in order
to run the CPU at the same clock rate. It will allow me to
implement the arbitrator the old way, i.e. to add the HLD
signal to the CPU and state that the DMA controller has
higher priority, but it will require more (mostly unidirectional)
synchronization bridges elsewhere. They must be made of
CLBs, because I need BRAMs for better purposes.
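One common way to build such a bridge without BRAM, which the thread itself does not name, is a small asynchronous FIFO in distributed (LUT) RAM whose read and write pointers cross clock domains in Gray code. Since consecutive Gray codes differ in exactly one bit, a pointer sampled while it changes is at worst one step stale. The pointer arithmetic can be sketched in Python (function names are hypothetical; the pointers are assumed to be one bit wider than the RAM address, and the two-flop synchronizers themselves are not modeled):

```python
def bin_to_gray(b: int) -> int:
    """Binary counter value -> Gray code."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Gray code -> binary, by XOR-folding all higher bits."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def fifo_empty(rd_gray: int, wr_gray_synced: int) -> bool:
    """Read side: empty when its pointer equals the synchronized write pointer."""
    return rd_gray == wr_gray_synced


def fifo_full(wr_gray: int, rd_gray_synced: int, ptr_bits: int) -> bool:
    """Write side: full when the Gray pointers differ only in their two MSBs,
    i.e. the writer is exactly one lap ahead of the reader."""
    top_two = 0b11 << (ptr_bits - 2)
    return wr_gray == rd_gray_synced ^ top_two


# 8-entry FIFO: 3 address bits, 4-bit pointers
print(fifo_full(bin_to_gray(8), bin_to_gray(0), 4))   # True: 8 writes, 0 reads
print(fifo_empty(bin_to_gray(5), bin_to_gray(5)))     # True
```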

    Best regards
    Piotr Wyderski

Article: 113971
Subject: Re: Memory controller design
From: nico@puntnl.niks (Nico Coesel)
Date: Sun, 31 Dec 2006 20:54:28 GMT

"KJ" <kkjennings@sbcglobal.net> wrote:

>
>"Piotr Wyderski" <wyderski@mothers.against.spam-ii.uni.wroc.pl> wrote in 
>message news:en6qbk$p2s$1@news.dialog.net.pl...
>> Hi,
>>
>> I would like to connect many independent data source/targets
>> to a common data stream. There will be a 36-bit static RAM
><snip>
>> The total bandwidth is 373 MiB/s, which easily covers the
>> requirements. My idea is to implement a static DMA-like
>> RAM transaction slot allocator, which will grant the bus for
>> the CPU in 65 slots out of 83, in 11 for the USB link etc.,
>> but how to implement a bunch of low-latency half-duplex
>> bridges between the 83MHz domain and the remaining ones?
>> I don't want to waste my precious BRAMs for that purpose,
>> so what should I do?
>>
>If you view it in that context, your 'bunch of low-latency half-duplex
>bridges' will not present as much of a challenge as you may think. The best way
>to go about this is to start with the entity definition for the SRAM 
>arbitration function.  Each potential master requires a private interface to 
>the arbitrator, the arbitrator also has a master interface to the external 
>SRAM itself.  So if you have 10 potential sources to the SRAM then the 
>arbitrator will have 10 slave interfaces (to each of those sources) plus an 
>SRAM master interface.
>
>Next consider the requirements of each of those sources.  Do they have some 
>sort of 'wait' signal that will cause it to hold address and write data 
>(during a write) and cause it to hold address while waiting for a read to 
>complete?  What kind of read cycle time performance is required?  It sounds 
>like you have a handle on the bandwidth requirements but are there any 
>latency requirements (i.e. how long can something 'wait')?
>
>If you go about the process by figuring out the requirements of the
>arbitration function and working through the requirements that each master
>and the target SRAM slave present, then it should start to fall into place.

This is not so difficult to implement. Using a priority encoder and a
state machine which performs a memory transaction, the entire arbiter
is almost finished. The trick is to design the state machine in such a way
that the maximum bandwidth can be used and the bandwidth is shared
properly.
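The arbiter described above can be sketched behaviorally in Python (hypothetical names; a fixed number of clocks per SRAM transaction is an assumption). The sketch also shows why the bandwidth sharing needs care: with a plain fixed-priority encoder, a continuously requesting high-priority master keeps the bus and starves the others.

```python
from typing import Optional, Sequence


def priority_encode(requests: Sequence[bool]) -> Optional[int]:
    """Fixed priority: the lowest-numbered active request wins."""
    for index, req in enumerate(requests):
        if req:
            return index
    return None


class SramTransactionFsm:
    """IDLE -> BUSY for `access_cycles` clocks -> back to IDLE."""

    def __init__(self, access_cycles: int = 1) -> None:
        self.access_cycles = access_cycles
        self.owner: Optional[int] = None
        self.remaining = 0

    def clock(self, requests: Sequence[bool]) -> Optional[int]:
        """Advance one clock; return the master that owns the SRAM this cycle."""
        if self.remaining == 0:                  # IDLE: arbitrate
            self.owner = priority_encode(requests)
            if self.owner is None:
                return None
            self.remaining = self.access_cycles
        self.remaining -= 1                      # BUSY: continue the transaction
        return self.owner


fsm = SramTransactionFsm(access_cycles=2)
print([fsm.clock([False, True, True]) for _ in range(4)])   # [1, 1, 1, 1]
print(fsm.clock([False, False, True]))                       # 2
```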

There is also a different approach which has been discussed in this
group before. I believe it is called a ring bus. It seems pretty
clever and I will consider using it the next time I have to share a
memory between different devices.

Daniel Sauvageau wrote something about it before in a thread called
'ddr with multiple users':


Why use a ring bus?
- Nearly immune to wire delays since each node inserts bus pipelining 
FFs with distributed buffer control (big plus for ASICs)
- Low signal count (all things being relative) memory controller:
	- 36bits input (muxed command/address/data/etc.)
	- 36bits output (muxed command/address/data/etc.)
- Same interface regardless of how many memory clients are on the bus
- Can double as a general-purpose modular interconnect, this can be 
useful for node-to-node burst transfers like DMA
- Bandwidth and latency can be tailored by shuffling components, 
inserting extra memory controller taps or adding rings as necessary
- Basic arbitration is provided for free by node ordering

The only major downside to ring buses is worst-case latency. Not much
of an issue for me since my primary interest is video
processing/streaming - I can simply preload one line ahead and pretty
much forget about latency.

Flexibility, scalability and routability are what make ring buses so
popular in modern large-scale, high-bandwidth ASICs and systems. It is
all a matter of trading some up-front complexity and latency for
long-term gain.

-- 
Reply to nico@nctdevpuntnl (punt=.)
Businesses and shops can be found at www.adresboekje.nl

Article: 113972
Subject: Re: (Improve Verilog skill) Recommend CPU core with good document and coding?
From: joseph2k <quiettechblue@yahoo.com>
Date: Sun, 31 Dec 2006 20:55:23 GMT

Shenli wrote:

> 
> Jon Beniston wrote:
>> Shenli wrote:
>> > Hi all,
>> >
>> > These days I have found that my Verilog code-reading speed is not as fast as my
>> > C/C++ reading speed. It takes me much longer to understand Verilog
>> > code than C/C++ code.
>> >
>> > So I want to read through a small CPU core (preferably under 10k lines) to
>> > improve my Verilog reading/writing skills. Please recommend a
>> > small open CPU core, or anything else, with good documentation and coding
>> > style.
>> >
>> > Any suggestions on improving Verilog code-reading speed are welcome!
>>
>> Have a look at the LatticeMico32:
>>
>> http://www.latticesemi.com/products/intellectualproperty/ipcores/mico32/index.cfm
>>
>> Cheers,
>> Jon
> 
> Hi Jon,
> 
> Thanks a lot for the information!
> 
> Is LatticeMico32 easy to understand? Does it have good documents describing
> the Verilog files? Does it have a good testbench?
> 
> Best regards,
> Davy

You could also try opencores.org.

-- 
 JosephKK
 Against stupidity the gods themselves contend in vain.
  --Schiller

Article: 113973
Subject: Re: SPI slave problem
From: tersono <ethel.thefrog@ntlworld.com>
Date: Sun, 31 Dec 2006 20:58:26 GMT


Check your email.
--
Per ardua ad nauseam

Article: 113974
Subject: Re: xilinx xc9536?
From: Ben Jackson <ben@ben.com>
Date: Sun, 31 Dec 2006 15:50:22 -0600

On 2006-12-31, <highZ> <> wrote:
> Hello, there are some pins on the Xilinx XC9536 which are called global
> clock 1/2/3, global reset, etc. Where are these explained?

There's a document called something like "XC9500 device family datasheet".
Those pins are (optionally) connected to special internal routing resources
that make them suitable for use as input clocks and global set/reset.
Isn't there also a global tristate?

-- 
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/
