"Weng Tianxiang" <wtxwtx@gmail.com> wrote in message > Never spend time doing post-map simulation; I wouldn't necessarily recommend this. What you need to do is first get your experience up to a level where post-route simulation reveals no surprises. The way to get that experience is to do a few of these in the first place and then get a feel for where your code is not quite up to snuff. As an example, it's possible to get all the way through the build process and have no errors or warnings and have it still not work simply because you used a 'natural' to build a counter instead of unsigned (see my previous post for more on when this can be a problem). In the hands of an experienced designer, use of 'natural' can be better than 'unsigned'; outside of those experienced hands is quite a different story. There are also times (like ASIC designs) or contracting where post-route sim is required as a check off item that needs to be completed. In any case, in the hands of an experienced designer doing an FPGA, post-route sim can typically be skipped as you suggest but as a general rule no it can not. In fact, in the case of the original poster of this thread, the post-route simulation is waving the big red flag indicating that there is something wrong either with his design or testbench.....that's a good thing, better to find out sooner rather than later....the problem is that instead of simply debugging to find the cause of the problem he seems to want to flip whatever build time switches are available to make the problem somehow disappear. KJArticle: 108426
"David Ashley" <dash@nowhere.net.dont.email.me> wrote in message news:4505047b$1_1@x-privat.org... > Weng Tianxiang wrote: >> Hi Daniel, >> Here is my suggestion. >> For example, there are 5 components which have access to DDR controller >> module. >> What I would like to do is: >> 1. Each of 5 components has an output buffer shared by DDR controller >> module; Not sure what is being 'shared'. If it is the actual DDR output pins then this is problematic....you likely won't be able to meet DDR timing when those DDR signals are coming and spread out to 5 locations instead of just one as it would be with a standard DDR controller. Even if it did work for 5 it wouldn't scale well either (i.e. 10 users of the DDR). If what is 'shared' is the output from the 5 component that feed in to the input of the DDR controller, than you're talking about internal tri-states which may be a problem depending on which target device is in question. <snip> >> In the command data, you may add any information you like. >> The best benefit of this scheme is it has no delays and no penalty in >> performance, and it has minimum number of buses. You haven't convinced me of any of these points. Plus how it would address the pecularities of DDRs themselves where there is a definite performance hit for randomly thrashing about in memory has not been addressed. >> >> Weng >> > > Weng, > > Your strategy seems to make sense to me. I don't actually know what a > ring buffer is. Your design seems appropriate for the imbalance built > into the system -- that is, any of the 5 components can initiate a > command at any time, however the DDR controller can only respond > to one command at a time. So you don't need a unique link to each > component for data coming from the DDR. A unique link to an arbitrator though allows each component to 'think' that it is running independently and addressing DDR at the same time. In other words, all 5 components can start up their own transaction at the exact same time. The arbitration logic function would buffer up all 5, selecting one of them for output to the DDR. When reading DDR this might not help performance much but for writing it can be a huge difference. > > However thinking a little more on it, each of the 5 components must > have logic to ignore the data that isn't targeted at themselves. Also > in order to be able to deal with data returned from the DDR at a > later time, perhaps a component might store it in a fifo anyway. > > The approach I had sort of been envisioning involved for each > component you have 2 fifos, one goes for commands and data > from the component to the ddr, and the other is for data coming > back from the ddr. The ddr controller just needs to decide which > component to pull commands from -- round robin would be fine > for my application. If it's a read command, it need only stuff the > returned data in the right fifo. That's one approach. If you think some more on this you should be able to see a way to have a single fifo for the readback data from the DDR (instead of one per component). KJArticle: 108427
In message <45052536.FAFBA9C6@earthlink.net>, dated Mon, 11 Sep 2006, Michael A. Terrell <mike.terrell@earthlink.net> writes
> No one is going to pay attention to any idiot who posts in all caps.

Who did that?
--
OOO - Own Opinions Only. Try www.jmwa.demon.co.uk and www.isce.org.uk
There are benefits from being irrational - just ask the square root of 2.
John Woodgate, J M Woodgate and Associates, Rayleigh, Essex UK

Article: 108428
> I am trying it right now - seems like lot of fun, when trying it with
> Xilinx ISE it has already managed to make 3 different kinds of fatal
> crashes !!

And how exactly is this different from any other large design? :-)

Article: 108429
Jon Beniston schrieb:
> > I am trying it right now - seems like lot of fun, when trying it with
> > Xilinx ISE it has already managed to make 3 different kinds of fatal
> > crashes !!
>
> And how exactly is this different from any other large design? :-)

you said it :) well, the only difference is that this is the one large design I would like to get going right now. hm.. the other largish design I'm also interested in is the OpenRisc1000 - that one does not crash, but it terminates the build saying that 1GB RAM is not enough! so it may be that the only large designs that do not crash are those that Xilinx is using for ISE harness testing.

Antti

Article: 108430
Hi,

in order to integrate my VHDL module into a SoC, I used the Create / Import Custom Peripheral Wizard in the Xilinx Platform Studio.

I followed two articles on the Xilinx webpage for that problem
(http://direct.xilinx.com/direct/ise7_tutorials/import_peripheral_tutorial.pdf#search=%22Xilinx%20custom%20peripheral%20base%20address%22
and
http://www.reconfigurable.com/xlnx/xweb/xil_tx_display.jsp?iLanguageID=1&category=&sGlobalNavPick=&sSecondaryNavPick=&multPartNum=3&sTechX_ID=rg_cust_periph)

In both articles, it seems that the C_BASEADDR and C_HIGHADDR are automatically calculated by Platform Studio (via "Generate Addresses" in the "Add/Edit Core" dialog). Unfortunately, both articles work with an older version of Platform Studio, and some menus/dialogs seem to have vanished.

So does anyone know where the magic button that calculates the addresses is hidden? :-)

Thanks and regards,
Peter

Article: 108431
I am looking for an evaluation board for image processing and implementing machine vision algorithms. So far, I could find 3 boards. The first one is the ML402 'Video Starter Kit', the second one is the V4IP-500, and the last is the V4IP-1000.

The V4IP-500 and V4IP-1000 evaluation boards include a camera and LCD. The V4IP-1000 also has Camera Link and DVI in/out. However, the cameras I have do not support Camera Link; they only have S-video or RF TV out. So I am confused.

Most machine vision algorithms need a lot of memory, and to check that an algorithm works well, the video in/out interface should sometimes be easy to use. Writing image files to memory and reading image files (result images) back should also be easy.

I know it is hard to fulfill all these conditions, but I think there would be a best solution. So if anyone has experience, please help me.

Thank you sooooo much for reading my post.

P.S. English is not my mother language, so please understand my poor English.

Peace be with you!!!
IoI

Article: 108432
peter.kampmann@googlemail.com schrieb:
> Hi,
>
> in order to integrate my VHDL module into a SoC, I used the Create /
> Import Custom Peripheral Wizard in the Xilinx Platform Studio.
>
> I followed two articles on the Xilinx webpage for that problem
> (http://direct.xilinx.com/direct/ise7_tutorials/import_peripheral_tutorial.pdf#search=%22Xilinx%20custom%20peripheral%20base%20address%22
> and
> http://www.reconfigurable.com/xlnx/xweb/xil_tx_display.jsp?iLanguageID=1&category=&sGlobalNavPick=&sSecondaryNavPick=&multPartNum=3&sTechX_ID=rg_cust_periph)
>
> In both articles, it seems that the C_BASEADDR and C_HIGHADDR are
> automatically calculated by Platform Studio (via "Generate Addresses"
> in the "Add/Edit Core" dialog). Unfortunately, both articles work with
> an older version of Platform Studio, and some menus/dialogs seem to
> have vanished.
>
> So does anyone know where the magic button that calculates the
> addresses is hidden? :-)
>
> Thanks and regards,
> Peter

right pane
system assembly view
(*) addresses
"generate addresses"

Antti

Article: 108433
Thanks a lot!

Regards,
Peter

Antti schrieb:
> peter.kampmann@googlemail.com schrieb:
>
> > Hi,
> >
> > in order to integrate my VHDL module into a SoC, I used the Create /
> > Import Custom Peripheral Wizard in the Xilinx Platform Studio.
> >
> > I followed two articles on the Xilinx webpage for that problem
> > (http://direct.xilinx.com/direct/ise7_tutorials/import_peripheral_tutorial.pdf#search=%22Xilinx%20custom%20peripheral%20base%20address%22
> > and
> > http://www.reconfigurable.com/xlnx/xweb/xil_tx_display.jsp?iLanguageID=1&category=&sGlobalNavPick=&sSecondaryNavPick=&multPartNum=3&sTechX_ID=rg_cust_periph)
> >
> > In both articles, it seems that the C_BASEADDR and C_HIGHADDR are
> > automatically calculated by Platform Studio (via "Generate Addresses"
> > in the "Add/Edit Core" dialog). Unfortunately, both articles work with
> > an older version of Platform Studio, and some menus/dialogs seem to
> > have vanished.
> >
> > So does anyone know where the magic button that calculates the
> > addresses is hidden? :-)
> >
> > Thanks and regards,
> > Peter
>
> right pane
> system assembly view
> (*) addresses
> "generate addresses"
>
> Antti

Article: 108434
KJ wrote: > "David Ashley" <dash@nowhere.net.dont.email.me> wrote in message > news:4505047b$1_1@x-privat.org... > > Weng Tianxiang wrote: > >> Hi Daniel, > >> Here is my suggestion. > >> For example, there are 5 components which have access to DDR controller > >> module. > >> What I would like to do is: > >> 1. Each of 5 components has an output buffer shared by DDR controller > >> module; > Not sure what is being 'shared'. If it is the actual DDR output pins then > this is problematic....you likely won't be able to meet DDR timing when > those DDR signals are coming and spread out to 5 locations instead of just > one as it would be with a standard DDR controller. Even if it did work for > 5 it wouldn't scale well either (i.e. 10 users of the DDR). > > If what is 'shared' is the output from the 5 component that feed in to the > input of the DDR controller, than you're talking about internal tri-states > which may be a problem depending on which target device is in question. > > <snip> > >> In the command data, you may add any information you like. > >> The best benefit of this scheme is it has no delays and no penalty in > >> performance, and it has minimum number of buses. > You haven't convinced me of any of these points. Plus how it would address > the pecularities of DDRs themselves where there is a definite performance > hit for randomly thrashing about in memory has not been addressed. > >> > >> Weng > >> > > > > Weng, > > > > Your strategy seems to make sense to me. I don't actually know what a > > ring buffer is. Your design seems appropriate for the imbalance built > > into the system -- that is, any of the 5 components can initiate a > > command at any time, however the DDR controller can only respond > > to one command at a time. So you don't need a unique link to each > > component for data coming from the DDR. > A unique link to an arbitrator though allows each component to 'think' that > it is running independently and addressing DDR at the same time. In other > words, all 5 components can start up their own transaction at the exact same > time. The arbitration logic function would buffer up all 5, selecting one > of them for output to the DDR. When reading DDR this might not help > performance much but for writing it can be a huge difference. > > > > > However thinking a little more on it, each of the 5 components must > > have logic to ignore the data that isn't targeted at themselves. Also > > in order to be able to deal with data returned from the DDR at a > > later time, perhaps a component might store it in a fifo anyway. > > > > The approach I had sort of been envisioning involved for each > > component you have 2 fifos, one goes for commands and data > > from the component to the ddr, and the other is for data coming > > back from the ddr. The ddr controller just needs to decide which > > component to pull commands from -- round robin would be fine > > for my application. If it's a read command, it need only stuff the > > returned data in the right fifo. > That's one approach. If you think some more on this you should be able to > see a way to have a single fifo for the readback data from the DDR (instead > of one per component). > > KJ Hi, My scheme is not only a strategy, but a finished work. The following is more to disclose. 1. What means sharing between 1 component and DDR controller system is: The output fifo of one component are shared by one component and DDR controller module, one component uses write half and DDR uses another read half. 2. 
The output fifo uses the same technique as what I mentioned in the previous email: command word and data words are mixed, but there are more than that: The command word contains either write or read commands. So in the output fifo, data stream looks like this: Read command, address, number of bytes; Write command, address, number of bytes; Data; ... Data; Write command, address, number of bytes; Data; ... Data; Read command, address, number of bytes; Read command, address, number of bytes; ... 3. In DDR controller side, there is small logic to pick read commands from input command/data stream, then put them into a read command queue that is used by DDR module to access read commands. You don't have to worry why read command is put behind a write command. For all components, if a read command is issued after a write command, the read command cannot be executed until write data is fully written into DDR system to avoid interfering the write/read order. 4. The DDR has its output fifo and a different output bus. The output fifo plays a buffer that separate coupling between DDR its own operations and output function. DDR read data from DDR memory and put data into its output fifo. There is output bus driver that picks up data from the DDR output buffer, then put it in output bus in a format that target component likes best. Then the output bus is shared by 5 components which read their own data, like a wireless communication channel: they only listen and get their own data on the output bus, never inteference with others. 5. All components work at their full speeds. 6. Arbitor module resides in DDR controller module. It doesn't control which component should output data, but it controls which fifo should be read first to avoid its fullness and determine how to insert commands into DDR command streams that will be sent to DDR chip. In that way, all output fifo will work in full speeds according to their own rules. 7. Every component must have a read fifo to store data read from DDR output bus. One cannot skip the read fifo, because you must have a capability to adjust read speed for each component and read data from DDR output bus will disappear after 1 clock. In short, each component has a write fifo whose read side is used by DDR controller and a read fifo that picks data from DDR controller output bus. In the result, the number of wires used for communications between DDR controller and all components are dramatically reduced at least by more than 100 wires for a 5 component system. What is the other problem? WengArticle: 108435
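[Weng doesn't give his actual word encoding, so as an invented illustration only, a mixed command/data fifo word along the lines he describes might be packed like this - all field names and widths are assumptions:]

   library ieee;
   use ieee.std_logic_1164.all;
   use ieee.numeric_std.all;

   package cmd_stream_pkg is
      -- One 36-bit fifo word: a command/data flag, a read/write flag,
      -- a 26-bit address and an 8-bit byte count (widths assumed).
      constant WORD_W : natural := 36;
      subtype fifo_word_t is std_logic_vector(WORD_W-1 downto 0);

      function make_cmd(is_read : boolean;
                        addr    : unsigned(25 downto 0);
                        nbytes  : unsigned(7 downto 0)) return fifo_word_t;
   end package cmd_stream_pkg;

   package body cmd_stream_pkg is
      function make_cmd(is_read : boolean;
                        addr    : unsigned(25 downto 0);
                        nbytes  : unsigned(7 downto 0)) return fifo_word_t is
         variable w : fifo_word_t;
      begin
         w(35) := '1';                    -- '1' = command word, '0' = data word
         if is_read then
            w(34) := '1';                 -- '1' = read, '0' = write
         else
            w(34) := '0';
         end if;
         w(33 downto 8) := std_logic_vector(addr);
         w(7 downto 0)  := std_logic_vector(nbytes);
         return w;
      end function make_cmd;
   end package body cmd_stream_pkg;

[A write command would be followed by its data words with the flag bit at '0'; since reads carry no data, the controller's picker logic can divert them into the read-command queue just by testing the top two bits.]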
KJ wrote: > "Weng Tianxiang" <wtxwtx@gmail.com> wrote in message > news:1157941504.597171.318920@i42g2000cwa.googlegroups.com... > > I widely use the equation like: > > a <= a +1; > > > > Usually a is an unsigned (or std_logic_vector) for a counter, it > > doesn't matter whether the equation is in a process or in a concurrent > > area. > > It matters very much whether it is in a clocked process or not. If 'a<=a+1' > is in an unclocked process or concurrent statement you've just created a > latch. As a general guideline if you ever have a signal on both sides of > the '<=' in an area outside of a clocked process you've got a latch. > > > > No any problem. > > I doubt that. a<= a+1 outside of a clocked process will (at best) produce a > counter that increments by one at whatever uncontrolled propogation delay of > the device you have > > > > > I don't see why VHDL dislikes it or it cannot be synthesized. > > It can be synthesized....it just is highly unlikely to do what you want it > to do. > > KJ Hi KJ, No, I disagree with you about that it would generate a latch. Actually it is a combinational logic. Through one of Xilinx tools, you can check that it just generates combinational logic. That is all. No latch would be generated if 'a <= a+1;' is in a concurrent area and even in a process without clock. I don't know Verilog, but know well about VHDL. Counter a with 'a <= a+1;' means same thing as a variable in a process, but different in simulation: 'a' can be reviewed in simulation ModelSim on every clock like any signal, but cannot be seen if it is a variable. WengArticle: 108436
PARTICLEREDDY (STRAYDOG) wrote:
> DONT YOU THINK THIS IS WASTE OF TIME SPENDING TIME ON SOME NON CORE NON
> TECHNOLOGY RELATED MATTER. END UP THIS DISCUSSION SORRY TO SAY THIS..
> BUT I SEE THAT OUR GENIUS PEOPLE IN THIS GROUP ARE SPENDING MUCH AMOUNT
> OF THEIR BRAINS IN THIS KIND OF DISCUSSION..PLEASE DO INVEST YOUR
> EFFORTS IN MORE TECHNOLOGY RELATED THINGS..
>
> I AM NOT PREACHING ANY..ITS THEIR INTEREST..BUT SEE MANY A QUESTIONS OF
> TRUE TECHNOLOGIES ARE NOT ATTENDED.
>
> REGARDS
> PARTICLEREDDY.

Stop shouting. Bug off.

Jerry
--
Engineering is the art of making what you want from things you can get.

Article: 108437
Weng Tianxiang wrote:
> KJ wrote:
<snip>
> Hi,
> My scheme is not only a strategy, but a finished work. The following is
> more to disclose.
>
> 1. What sharing between one component and the DDR controller means is:
> the output fifo of a component is shared by that component and the DDR
> controller module; the component uses the write half and the DDR
> controller uses the read half.
>
<snip>
>
> 7. Every component must have a read fifo to store data read from the DDR
> output bus. One cannot skip the read fifo, because you must be able to
> adjust the read speed for each component, and read data on the DDR
> output bus disappears after one clock.
>
> In short, each component has a write fifo whose read side is used by
> the DDR controller, and a read fifo that picks its data from the DDR
> controller output bus.
>
> As a result, the number of wires used for communication between the DDR
> controller and all components is dramatically reduced, by more than
> 100 wires for a 5-component system.
>
> What is the other problem?
>
> Weng

Weng,

OK, I'm a bit clearer now on what you have. What you've described is (I think) also functionally identical to what I was suggesting earlier (which is also a working, tested and shipping design).

From a design reuse standpoint it is not quite as good as what I suggested though. A better partitioning would be to have the fifos and control logic in a standalone module. Each component would talk point to point with this new module on one side (equivalent to your components writing commands and data into the fifo). The function of this module would be to select (based on whatever arbitration algorithm is preferable) and output commands over a point to point connection to a standard DDR controller (this is equivalent to your DDR controller 'read' side of the fifo). This module is essentially the bus arbitration module.

Whether implemented as a standalone module (as I've done) or embedded into a customized DDR controller (as you've done) ends up with the same functionality, and should result in the same logic/resource usage and result in a working design that can run the DDRs at the best possible rate.

But in my case, I now have a standalone arbitration module with standardized interfaces that can be used to arbitrate with totally different things other than DDRs. In my case, I instantiated three arbitrators that connected to three separate DDRs (two with six masters, one with 12) and a fourth arbitrator that connected 13 bus masters to a single PCI bus. No code changes are required, only change the generics when instantiating the module to essentially 'tune' it to the particular usage.

One other point: you probably don't need a read data fifo per component; you can get away with just one single fifo inside the arbitration module. That fifo would not hold the read data but just the code to tell the arbitrator who to route the read data back to. The arbiter would write this code into the fifo at the point where it initiates a read to the DDR controller. The read data itself could be broadcast to all components in parallel once it arrives back. Only one component though would get the signal flagging that the data was valid, based on a simple decode of the above-mentioned code that the arbiter put into the small read fifo. In other words, this fifo would only need to be wide enough to handle the number of users (i.e. 5 masters would imply a 3-bit code) and only deep enough to handle whatever the latency is between initiating a read command to the DDR controller and when the data actually comes back.

KJ

Article: 108438
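[KJ's single read-tag fifo is worth sketching. The following is an invented illustration: it assumes the tag fifo is a plain 3-bit-wide synchronous fifo elsewhere in the design, written by the arbiter on each read it issues, with one tag popped per returned data beat (per-burst popping would work the same way). Every name below is made up.]

   library ieee;
   use ieee.std_logic_1164.all;
   use ieee.numeric_std.all;

   -- Broadcasts DDR read data to all masters; only the master whose index
   -- sits at the head of the tag fifo sees its valid strobe fire.
   entity read_router is
      generic (NUM_MASTERS : positive := 5);
      port (
         clk          : in  std_logic;
         ddr_rd_data  : in  std_logic_vector(31 downto 0);
         ddr_rd_valid : in  std_logic;
         tag_dout     : in  unsigned(2 downto 0);  -- head of tag fifo;
                                                   -- must be < NUM_MASTERS
         tag_pop      : out std_logic;
         rd_data      : out std_logic_vector(31 downto 0);
         rd_valid     : out std_logic_vector(NUM_MASTERS-1 downto 0));
   end entity read_router;

   architecture rtl of read_router is
   begin
      process(clk)
      begin
         if rising_edge(clk) then
            rd_data  <= ddr_rd_data;        -- same data wires go to everyone
            rd_valid <= (others => '0');
            tag_pop  <= '0';
            if ddr_rd_valid = '1' then
               -- decode the tag: exactly one master's valid flag fires
               rd_valid(to_integer(tag_dout)) <= '1';
               tag_pop <= '1';              -- consume one tag per beat
            end if;
         end if;
      end process;
   end architecture rtl;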
Weng Tianxiang wrote:
> > It can be synthesized....it just is highly unlikely to do what you want it
> > to do.
> >
> > KJ
>
> Hi KJ,
> No, I disagree with you that it would generate a latch.
>
> Actually it is combinational logic. Through one of the Xilinx tools, you
> can check that it just generates combinational logic. That is all.

You're right, it wouldn't be a typical latch, but "a<=a+1" is a form of combinatorial feedback (i.e. there is a combinatorial path from 'a' back to itself) which, while not really a latch, is a form that one must almost always avoid (no 'almost' inside FPGAs though). In any case "a<=a+1" would be pretty useless if instantiated in a concurrent area. What I also said was...

> > I doubt that. a<= a+1 outside of a clocked process will (at best) produce a
> > counter that increments by one at whatever uncontrolled propagation delay of
> > the device you have

Think about it. For starters, how does 'a' get initialized to anything? Ignoring that for the moment, assume that 'a' was somehow magically '0' at some time. Then the logic would be trying to update a to be '1' (by virtue of the a<=a+1). But now a is '1' and will want to be updated to be '2', and then '3', '4', etc. All seems well, it's a counter after all....but since this is not in a clocked process, 'a' would be changing at whatever propagation delay there is in computing 'a+1'....which is useless. In a real device those outputs probably wouldn't even resemble a counter either. In any simulation environment you'll error out with an iteration limit error because signal 'a' will never settle down (again assuming that it ever got to be defined in the first place).

> > Counter a with 'a <= a+1;' means same thing as a variable in a process,
> > but different in simulation: 'a' can be reviewed in simulation ModelSim
> > on every clock like any signal, but cannot be seen if it is a variable.

What clock? You said we're in a concurrent statement!

I think what you really mean to say is....

b <= a+1; -- But this is updating a new signal called 'b', not 'a'.
process(clock)
begin
   if rising_edge(clock) then
      a <= b;
   end if;
end process;

where the 'b <= a+1' is the concurrent statement.

Personally though, I would've written it as

process(clock)
begin
   if rising_edge(clock) then
      a <= a+1;
   end if;
end process;

But in either case, we're talking about the counter being implemented in a synchronous process; "a<=a+1" in a concurrent statement won't work.

KJ

Article: 108439
Hi,

http://www.latticesemi.com/products/developmenthardware/fpgafspcboards/scpciexpressx1evaluationb.cfm

I wonder if that board is available, and if it really supports SATA as it is advertised to? On the website there is no price information, which is usually bad news regarding actual board availability :(

Antti

Article: 108440
On 2006-09-11, Antti <Antti.Lukats@xilant.com> wrote:
> well, the only difference is that this is the one large design I would
> like to get going right now. hm.. the other largish design I'm also
> interested in is the OpenRisc1000 - that one does not crash, but it
> terminates the build saying that 1GB RAM is not enough !

That is rather odd, because I have successfully synthesized a system containing an or1200, an sdram controller, a vga controller and some other peripherals on a machine with 512MB RAM. Probably using ISE 7.1 IIRC. (Nowadays I have more memory in the machine though...)

Did you define SYNTHESIZE and the correct XILINX defines in or1200_defines.v? Otherwise I know that the synthesis will run for longer than I care to wait...

/Andreas

Article: 108441
hello all,

I have experience with ARM microcontrollers in C/C++ programming; now my job function forces me to expand my skills into FPGAs. I wonder if anyone out there in the same shoes has experience with this transition: which hardware description language - VHDL, Verilog HDL, SystemC or Handel-C - will make my life easier?

I guess if you master one, it should be quicker to jump into another one, but for me, with limited hardware experience, which one can make this transition smoother?

thanks
jet

Article: 108442
Jerry Avins wrote:
> PARTICLEREDDY (STRAYDOG) wrote:
> > DONT YOU THINK THIS IS WASTE OF TIME SPENDING TIME ON SOME NON CORE NON
> > TECHNOLOGY RELATED MATTER. END UP THIS DISCUSSION SORRY TO SAY THIS..
> > BUT I SEE THAT OUR GENIUS PEOPLE IN THIS GROUP ARE SPENDING MUCH AMOUNT
> > OF THEIR BRAINS IN THIS KIND OF DISCUSSION..PLEASE DO INVEST YOUR
> > EFFORTS IN MORE TECHNOLOGY RELATED THINGS..
> >
> > I AM NOT PREACHING ANY..ITS THEIR INTEREST..BUT SEE MANY A QUESTIONS OF
> > TRUE TECHNOLOGIES ARE NOT ATTENDED.
> >
> > REGARDS
> > PARTICLEREDDY.
>
> Stop shouting. Bug off.
>
> Jerry
> --
> Engineering is the art of making what you want from things you can get.

And besides, all work and no play makes a dull person (and a dull employee too ;)

Cheers

PeteS

Article: 108443
Andreas Ehliar schrieb:
> On 2006-09-11, Antti <Antti.Lukats@xilant.com> wrote:
> > well, the only difference is that this is the one large design I would
> > like to get going right now. hm.. the other largish design I'm also
> > interested in is the OpenRisc1000 - that one does not crash, but it
> > terminates the build saying that 1GB RAM is not enough !
>
> That is rather odd, because I have successfully synthesized a system
> containing an or1200, an sdram controller, a vga controller and
> some other peripherals on a machine with 512MB RAM. Probably using
> ISE 7.1 IIRC. (Nowadays I have more memory in the machine though...)
>
> Did you define SYNTHESIZE and the correct XILINX defines in
> or1200_defines.v? Otherwise I know that the synthesis will run
> for longer than I care to wait...
>
> /Andreas

? I did take the ORP thing from opencores and have tried to define everything out to make the minimal possible system, but all I get is

ERROR:Portability:3 - This Xilinx application has run out of memory or has encountered a memory conflict. Current memory usage is 2091920 kb.

BTW there is no SYNTHESIZE thing in the or1200_defines.v :( there is something about xilinx memories, and the synthesis really uses xilinx RAMB16 prims, but what else to check I don't know. it simply runs out of memory. well, it's 8.1 SP3 - maybe it works on 7.1 ??

Antti

Article: 108444
KJ wrote:
> Weng Tianxiang wrote:
<snip>
> OK, I'm a bit clearer now on what you have. What you've described
> is (I think) also functionally identical to what I was suggesting
> earlier (which is also a working, tested and shipping design).
>
> From a design reuse standpoint it is not quite as good as what I
> suggested though. A better partitioning would be to have the fifos and
> control logic in a standalone module. Each component would talk point
> to point with this new module on one side (equivalent to your
> components writing commands and data into the fifo). The function of
> this module would be to select (based on whatever arbitration algorithm
> is preferable) and output commands over a point to point connection to
> a standard DDR controller (this is equivalent to your DDR controller
> 'read' side of the fifo). This module is essentially the bus
> arbitration module.
>
> But in my case, I now have a standalone arbitration module with
> standardized interfaces that can be used to arbitrate with totally
> different things other than DDRs. In my case, I instantiated three
> arbitrators that connected to three separate DDRs (two with six
> masters, one with 12) and a fourth arbitrator that connected 13 bus
> masters to a single PCI bus. No code changes are required, only change
> the generics when instantiating the module to essentially 'tune' it to
> the particular usage.
>
> One other point: you probably don't need a read data fifo per
> component; you can get away with just one single fifo inside the
> arbitration module. That fifo would not hold the read data but just
> the code to tell the arbitrator who to route the read data back to.
<snip>
>
> KJ

Hi KJ,

1. My design never uses module design methodology. I use a big file to contain all logic statements except modules from the Xilinx core. If a segment is to be used for another project, just a copy and paste does the same thing as module methodology does, but all signal names never change across all function modules.

2. An individual read fifo is needed for each component. The reason is that issuing a read command and the data read back are not synchronous, and each component must have its own read fifo to store its own read data. After read data falls into its read fifo, each component can decide what to do next based on its own situation.

If only one read buffer is used, big problems would arise. For example, if you have a PCI-x/PCI bus and its module has read data, it cannot immediately transfer the read data until it gets PCI-x/PCI bus control. That process may last very long, for example 1K clocks, causing other read data to be blocked by a one-read-buffer design.

3. Strategically, by using my method one has great flexibility to do anything you want at the fastest speed and with minimum wire connections between the DDR controller and all components.

Actually in my design there is no arbiter, because there is no common bus to arbitrate. There is only write-fifo select logic to decide which write fifo should be picked first to write its data into the DDR chip, based on many factors, not only whether a write fifo has data.

The many write factors include:
a. write priority;
b. whether the write address falls into the same bank+column as the current write command;
c. whether the write fifo is approaching full, depending on the source data input rate;
d. ...

4. Different components have different priorities of access to the DDR controller. You may imagine, for example, there are 2 PowerPCs, one PCI-e, one PCI-x, one Gigabit stream. You may set up a priority table like this to handle read commands:
a. the two PowerPCs have top priority and they have equal rights to access DDR;
b. PCI-e may be the lowest in priority, because it is a packet protocol and delays do little damage to performance, if any;
c. ...

Weng

Article: 108445
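[As an invented illustration of how such write-fifo select factors could combine - the scoring weights, names and priority values below are all assumptions, not Weng's implementation:]

   library ieee;
   use ieee.std_logic_1164.all;
   use ieee.numeric_std.all;

   entity wr_fifo_pick is
      generic (NUM_FIFOS : positive := 5);
      port (
         clk                 : in  std_logic;
         fifo_has_cmd        : in  std_logic_vector(NUM_FIFOS-1 downto 0);
         same_row_as_current : in  std_logic_vector(NUM_FIFOS-1 downto 0);
         fifo_nearly_full    : in  std_logic_vector(NUM_FIFOS-1 downto 0);
         sel_fifo            : out integer range 0 to NUM_FIFOS-1;
         sel_valid           : out std_logic);
   end entity wr_fifo_pick;

   architecture rtl of wr_fifo_pick is
      -- factor a: fixed per-component priority (values assumed)
      type prio_t is array (0 to NUM_FIFOS-1) of unsigned(1 downto 0);
      constant STATIC_PRIO : prio_t := (0 => "11", 1 => "11", others => "00");
   begin
      process(clk)
         variable best, score : unsigned(3 downto 0);
      begin
         if rising_edge(clk) then
            best      := (others => '0');
            sel_valid <= '0';
            for i in 0 to NUM_FIFOS-1 loop
               if fifo_has_cmd(i) = '1' then
                  score := resize(STATIC_PRIO(i), 4);
                  if same_row_as_current(i) = '1' then
                     score := score + 4;  -- factor b: stay in the open row
                  end if;
                  if fifo_nearly_full(i) = '1' then
                     score := score + 8;  -- factor c: drain fullest fifo first
                  end if;
                  if score > best then
                     best      := score;
                     sel_fifo  <= i;
                     sel_valid <= '1';
                  end if;
               end if;
            end loop;
         end if;
      end process;
   end architecture rtl;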
David Ashley wrote:
> Weng Tianxiang wrote:
> > Never spend time doing post-map simulation;
> > Never spend time using DOS command lines;
> > Never spend time turning off Xilinx's optimization;
>
> Weng,
>
> Can you clarify the 2nd one about "DOS command lines"?
> I'm using xilinx webpack tools under linux, operating
> from the command line. Actually I've built up a Makefile
> that invokes the commands. Is there some gotcha I need
> to know about? I prefer command line tools operated by
> "make" as opposed to IDE's.
>
> Below's the important pieces of the Makefile. The commands
> I got from the pacman source build script, converted to unix
> make syntax. Works fine.
>
> -Dave
>
> XILINX=/Xilinx
> NAME=main
> SETUP=LD_LIBRARY_PATH=$(XILINX)/bin/lin XILINX=$(XILINX) \
>     PATH=$(PATH):$(XILINX)/bin/lin
>
> bitfile: step0 step1 step2 step3 step4 step5
>
> step0:
>     $(SETUP) xst -ifn $(NAME).scr -ofn $(NAME).srp
> step1:
>     $(SETUP) ngdbuild -nt on -uc $(NAME).ucf $(NAME).ngc $(NAME).ngd
> step2:
>     $(SETUP) map -pr b $(NAME).ngd -o $(NAME).ncd $(NAME).pcf
> step3:
>     $(SETUP) par -w -ol high $(NAME).ncd $(NAME).ncd $(NAME).pcf
> step4:
>     $(SETUP) trce -v 10 -o $(NAME).twr $(NAME).ncd $(NAME).pcf
> step5:
>     $(SETUP) bitgen $(NAME).ncd $(NAME).bit -w #-f $(NAME).ut
> hwtest:
>     sudo xc3sprog $(NAME).bit
>
> -----
> main.scr contains this:
>
> run
> -ifn main.prj
> -ifmt VHDL
> -ofn main.ngc
> -ofmt NGC -p XC3S500E-FG320-4
> -opt_mode Area
> -opt_level 2
>
> ------
> main.prj just lists the vhd source files.
>
> --
> David Ashley            http://www.xdr.com/dash
> Embedded linux, device drivers, system architecture

Hi David,

I never use DOS commands, and all options are accessible through the Xilinx ISE window system, so I don't know how to answer any questions about it.

Weng

Article: 108446
KJ wrote:
> Weng Tianxiang wrote:
<snip>
> I think what you really mean to say is....
>
> b <= a+1; -- But this is updating a new signal called 'b', not 'a'.
> process(clock)
> begin
>    if rising_edge(clock) then
>       a <= b;
>    end if;
> end process;
>
> where the 'b <= a+1' is the concurrent statement.
>
> Personally though, I would've written it as
>
> process(clock)
> begin
>    if rising_edge(clock) then
>       a <= a+1;
>    end if;
> end process;
>
> But in either case, we're talking about the counter being implemented
> in a synchronous process; "a<=a+1" in a concurrent statement won't
> work.
>
> KJ

Hi KJ,

b <= a+1; -- But this is updating a new signal called 'b', not 'a'.
process(clock)
begin
   if rising_edge(clock) then
      a <= b;
   end if;
end process;

where the 'b <= a+1' is the concurrent statement.

You are right.

Weng

Article: 108447
Weng Tianxiang wrote: > KJ wrote: > > Weng Tianxiang wrote: > > > KJ wrote: > > > > "David Ashley" <dash@nowhere.net.dont.email.me> wrote in message > > > > news:4505047b$1_1@x-privat.org... > > > > > Weng Tianxiang wrote: > > > > >> Hi Daniel, > > > > >> Here is my suggestion. > > > > >> For example, there are 5 components which have access to DDR controller > > > > >> module. > > > > >> What I would like to do is: > > > > >> 1. Each of 5 components has an output buffer shared by DDR controller > > > > >> module; > > > > Not sure what is being 'shared'. If it is the actual DDR output pins then > > > > this is problematic....you likely won't be able to meet DDR timing when > > > > those DDR signals are coming and spread out to 5 locations instead of just > > > > one as it would be with a standard DDR controller. Even if it did work for > > > > 5 it wouldn't scale well either (i.e. 10 users of the DDR). > > > > > > > > If what is 'shared' is the output from the 5 component that feed in to the > > > > input of the DDR controller, than you're talking about internal tri-states > > > > which may be a problem depending on which target device is in question. > > > > > > > > <snip> > > > > >> In the command data, you may add any information you like. > > > > >> The best benefit of this scheme is it has no delays and no penalty in > > > > >> performance, and it has minimum number of buses. > > > > You haven't convinced me of any of these points. Plus how it would address > > > > the pecularities of DDRs themselves where there is a definite performance > > > > hit for randomly thrashing about in memory has not been addressed. > > > > >> > > > > >> Weng > > > > >> > > > > > > > > > > Weng, > > > > > > > > > > Your strategy seems to make sense to me. I don't actually know what a > > > > > ring buffer is. Your design seems appropriate for the imbalance built > > > > > into the system -- that is, any of the 5 components can initiate a > > > > > command at any time, however the DDR controller can only respond > > > > > to one command at a time. So you don't need a unique link to each > > > > > component for data coming from the DDR. > > > > A unique link to an arbitrator though allows each component to 'think' that > > > > it is running independently and addressing DDR at the same time. In other > > > > words, all 5 components can start up their own transaction at the exact same > > > > time. The arbitration logic function would buffer up all 5, selecting one > > > > of them for output to the DDR. When reading DDR this might not help > > > > performance much but for writing it can be a huge difference. > > > > > > > > > > > > > > However thinking a little more on it, each of the 5 components must > > > > > have logic to ignore the data that isn't targeted at themselves. Also > > > > > in order to be able to deal with data returned from the DDR at a > > > > > later time, perhaps a component might store it in a fifo anyway. > > > > > > > > > > The approach I had sort of been envisioning involved for each > > > > > component you have 2 fifos, one goes for commands and data > > > > > from the component to the ddr, and the other is for data coming > > > > > back from the ddr. The ddr controller just needs to decide which > > > > > component to pull commands from -- round robin would be fine > > > > > for my application. If it's a read command, it need only stuff the > > > > > returned data in the right fifo. > > > > That's one approach. 
If you think some more on this you should be able to > > > > see a way to have a single fifo for the readback data from the DDR (instead > > > > of one per component). > > > > > > > > KJ > > > > > > Hi, > > > My scheme is not only a strategy, but a finished work. The following is > > > more to disclose. > > > > > > 1. What means sharing between 1 component and DDR controller system is: > > > The output fifo of one component are shared by one component and DDR > > > controller module, one component uses write half and DDR uses another > > > read half. > > > > > > 2. The output fifo uses the same technique as what I mentioned in the > > > previous email: > > > command word and data words are mixed, but there are more than that: > > > The command word contains either write or read commands. > > > > > > So in the output fifo, data stream looks like this: > > > Read command, address, number of bytes; > > > Write command, address, number of bytes; > > > Data; > > > ... > > > Data; > > > Write command, address, number of bytes; > > > Data; > > > ... > > > Data; > > > Read command, address, number of bytes; > > > Read command, address, number of bytes; > > > ... > > > > > > 3. In DDR controller side, there is small logic to pick read commands > > > from input command/data stream, then put them into a read command queue > > > that is used by DDR module to access read commands. You don't have to > > > worry why read command is put behind a write command. For all > > > components, if a read command is issued after a write command, the read > > > command cannot be executed until write data is fully written into DDR > > > system to avoid interfering the write/read order. > > > > > > 4. The DDR has its output fifo and a different output bus. The output > > > fifo plays a buffer that separate coupling between DDR its own > > > operations and output function. > > > > > > DDR read data from DDR memory and put data into its output fifo. There > > > is output bus driver that picks up data from the DDR output buffer, > > > then put it in output bus in a format that target component likes best. > > > Then the output bus is shared by 5 components which read their own > > > data, like a wireless communication channel: they only listen and get > > > their own data on the output bus, never inteference with others. > > > > > > 5. All components work at their full speeds. > > > > > > 6. Arbitor module resides in DDR controller module. It doesn't control > > > which component should output data, but it controls which fifo should > > > be read first to avoid its fullness and determine how to insert > > > commands into DDR command streams that will be sent to DDR chip. In > > > that way, all output fifo will work in full speeds according to their > > > own rules. > > > > > > 7. Every component must have a read fifo to store data read from DDR > > > output bus. One cannot skip the read fifo, because you must have a > > > capability to adjust read speed for each component and read data from > > > DDR output bus will disappear after 1 clock. > > > > > > In short, each component has a write fifo whose read side is used by > > > DDR controller and a read fifo that picks data from DDR controller > > > output bus. > > > > > > In the result, the number of wires used for communications between DDR > > > controller and all components are dramatically reduced at least by more > > > than 100 wires for a 5 component system. > > > > > > What is the other problem? 
> > Weng,
> >
> > OK, I'm a bit clearer now on what you have. What you've described is (I
> > think) also functionally identical to what I was suggesting earlier (which
> > is also a working, tested and shipping design).
> >
> > From a design reuse standpoint it is not quite as good as what I
> > suggested, though. A better partitioning would be to have the fifos and
> > control logic in a standalone module. Each component would talk point to
> > point with this new module on one side (equivalent to your components
> > writing commands and data into the fifo). The function of this module
> > would be to select (based on whatever arbitration algorithm is preferable)
> > and output commands over a point to point connection to a standard DDR
> > controller (this is equivalent to your DDR controller's 'read' side of the
> > fifo). This module is essentially the bus arbitration module.
> >
> > Whether implemented as a standalone module (as I've done) or embedded into
> > a customized DDR controller (as you've done), you end up with the same
> > functionality, should see the same logic/resource usage, and get a working
> > design that can run the DDRs at the best possible rate.
> >
> > But in my case I now have a standalone arbitration module with
> > standardized interfaces that can arbitrate for things other than DDRs. In
> > my case, I instantiated three arbiters that connected to three separate
> > DDRs (two with six masters, one with 12) and a fourth arbiter that
> > connected 13 bus masters to a single PCI bus. No code changes are
> > required; you only change the generics when instantiating the module,
> > essentially 'tuning' it to the particular usage.
> >
> > One other point: you probably don't need a read data fifo per component;
> > you can get away with a single fifo inside the arbitration module. That
> > fifo would hold not the read data but just a code telling the arbiter whom
> > to route the read data back to. The arbiter writes this code into the fifo
> > at the point where it initiates a read to the DDR controller. The read
> > data itself can be broadcast to all components in parallel once it arrives
> > back; only one component gets the signal flagging the data as valid, based
> > on a simple decode of the code the arbiter put into the small read fifo.
> > In other words, this fifo only needs to be wide enough to encode the
> > number of users (5 masters imply a 3-bit code) and deep enough to cover
> > the latency between initiating a read command to the DDR controller and
> > the data actually coming back.
> >
> > KJ
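A minimal VHDL sketch of the single routing-tag fifo KJ describes, assuming reads return in order and, for simplicity, one tag per returned word (a burst-capable design would add a length counter, and a real one would likely use a vendor fifo primitive). All names and widths are illustrative:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative readback routing: one small fifo of 3-bit master codes
-- instead of a read-data fifo per master.  A tag is pushed when the
-- arbiter issues a read and popped when data returns; the data itself
-- is broadcast, and only the tagged master sees its valid strobe.
entity rd_route is
  generic (N_MASTERS : natural := 5);  -- tags must stay below this
  port (
    clk       : in  std_logic;
    -- pushed by the arbiter when it issues a read to the controller
    issue     : in  std_logic;
    issue_id  : in  unsigned(2 downto 0);
    -- from the DDR controller as read data comes back (in order)
    rd_valid  : in  std_logic;
    -- per-master data-valid strobes; the data bus is broadcast
    mst_valid : out std_logic_vector(N_MASTERS-1 downto 0)
  );
end entity;

architecture rtl of rd_route is
  type tag_mem_t is array (0 to 15) of unsigned(2 downto 0);
  signal tags   : tag_mem_t;
  signal wr_ptr : unsigned(3 downto 0) := (others => '0');
  signal rd_ptr : unsigned(3 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      mst_valid <= (others => '0');
      if issue = '1' then                 -- remember who asked
        tags(to_integer(wr_ptr)) <= issue_id;
        wr_ptr <= wr_ptr + 1;
      end if;
      if rd_valid = '1' then              -- route the returned word
        mst_valid(to_integer(tags(to_integer(rd_ptr)))) <= '1';
        rd_ptr <= rd_ptr + 1;
      end if;
    end if;
  end process;
end architecture;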
> Hi KJ,
>
> 1. My design never uses a module design methodology. I use one big file
> containing all the logic statements, except for modules from the Xilinx
> core library.
>
> If a segment is to be used in another project, a copy and paste does the
> same job as the module methodology, and all signal names stay the same
> across all functional blocks.
>
> 2. An individual read fifo is needed for each component. The reason is
> that issuing a read command and receiving the data back are not
> synchronous, so each component must have its own read fifo to store its
> own read data. After the data falls into its read fifo, each component
> can decide what to do next based on its own situation.
>
> If only one read buffer were used, big problems would arise. For example,
> with a PCI-X/PCI bus, even when its module has read data available, it
> cannot transfer that data until it gets control of the PCI-X/PCI bus.
> That can take a very long time, 1K clocks for example, and behind a
> single read buffer all other read data would be blocked.
>
> 3. Strategically, my method gives you great flexibility to do anything
> you want at the highest speed and with the minimum number of wire
> connections between the DDR controller and the components.
>
> Actually, in my design there is no arbiter, because there is no common
> bus to arbitrate. There is only write-fifo select logic that decides
> which write fifo gets to write its data into the DDR chip first, based on
> many factors, not merely whether a write fifo has data.
>
> The write factors include:
> a. write priority;
> b. whether the write address falls into the same bank and row as the
> current write command;
> c. whether a write fifo is approaching full, which depends on its source
> data input rate;
> d. ...
>
> 4. Different components have different priorities for access to the DDR
> controller. Imagine, for example, two PowerPCs, one PCI-e, one PCI-X and
> one Gigabit stream. You might set up the priority table for read commands
> like this:
> a. the two PowerPCs have top priority, with equal rights to access the
> DDR;
> b. PCI-e may be the lowest priority, because it is a packet protocol and
> delays do little damage to performance, if any;
> c. ...
>
> Weng

Hi KJ,

If you like, please post your module interface to the group and I will
point out which wires would be redundant if my design were implemented.

"In my case, I instantiated three arbiters that connected to three
separate DDRs (two with six masters, one with 12) and a fourth arbiter
that connected 13 bus masters to a single PCI bus."

What you did is extend the PCI bus arbiter idea to the DDR input bus. In
my design the DDR needs no bus arbiter at all. The components connected to
a DDR controller share no common bus, and they give better performance
than yours.

So from this point of view, my DDR controller interface has nothing in
common with yours. Both work, but with different strategies. My strategy
is more complex than yours, but gives the best performance. It saves a
middle write fifo in the DDR controller: the DDR controller has no write
fifo of its own; it uses all the components' write fifos as its write
fifo, saving clocks and memory space and getting the best performance out
of the DDR controller.

WengArticle: 108448
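A minimal VHDL sketch of the kind of write-fifo select logic Weng describes, assuming four requesters with fixed priorities and an almost-full override; the priority rule and all names are illustrative only, not Weng's actual logic:

library ieee;
use ieee.std_logic_1164.all;

-- Illustrative write-fifo select: fixed priority (index 0 highest),
-- but a requester whose fifo is almost full pre-empts everyone else.
entity wr_fifo_select is
  port (
    clk         : in  std_logic;
    rst         : in  std_logic;
    req         : in  std_logic_vector(3 downto 0); -- fifo has a command
    almost_full : in  std_logic_vector(3 downto 0); -- fifo nearly full
    grant       : out std_logic_vector(3 downto 0)  -- one-hot select
  );
end entity;

architecture rtl of wr_fifo_select is
begin
  process (clk)
    variable picked : boolean;
  begin
    if rising_edge(clk) then
      if rst = '1' then
        grant <= (others => '0');
      else
        grant  <= (others => '0');
        picked := false;
        -- pass 1: rescue any nearly-full fifo first
        for i in 0 to 3 loop
          if not picked and req(i) = '1' and almost_full(i) = '1' then
            grant(i) <= '1';
            picked := true;
          end if;
        end loop;
        -- pass 2: otherwise plain fixed priority
        for i in 0 to 3 loop
          if not picked and req(i) = '1' then
            grant(i) <= '1';
            picked := true;
          end if;
        end loop;
      end if;
    end if;
  end process;
end architecture;

Factors like address locality (point b in Weng's list) would add further terms to the first pass; the two-pass loop structure stays the same.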
jetq88 wrote:
> hello all,
>
> I have experience with ARM microcontrollers and C/C++ programming, and now
> my job function is forcing me to expand my skills into FPGAs. Has anyone
> out there been in the same shoes and made this transition? Which hardware
> description language -- VHDL, Verilog, SystemC or Handel-C -- will make my
> life easier? I guess if you master one it should be quicker to jump into
> another, but for me, with limited hardware experience, which one makes the
> transition smoothest?
>
> thanks
>
> jet

If your job is forcing you to expand, I'd say you should use the language
that's prevalent in your company. AFAIK the choice between Verilog and VHDL
is somewhat arbitrary -- most people I know who favor one over the other
choose whichever one they learned first. VHDL seems to be more amenable to
structured design -- but I can't say for sure, because I only know Verilog.

What I _can_ say is that if you fall into a group of ten FPGA folks who all
use language A and you only know language B, you should learn language A
quick, because you're not going to be productive using the 'wrong' thing.

As for SystemC or Handel-C -- I'm a luddite, so I view them with deep
suspicion. I read magazine articles about them being used successfully, but
I have yet to see widespread adoption, particularly in the FPGA world.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Posting from Google?  See http://cfaj.freeshell.org/google/

"Applied Control Theory for Embedded Systems" came out in April.
See details at http://www.wescottdesign.com/actfes/actfes.htmlArticle: 108449
Weng Tianxiang wrote:
> <big cut>
> 1. My design never uses a module design methodology. I use one big file
> containing all the logic statements, except for modules from the Xilinx
> core library.
>
> If a segment is to be used in another project, a copy and paste does the
> same job as the module methodology, and all signal names stay the same
> across all functional blocks.

This is an interesting point. I just finished "VHDL for Logic Synthesis" by
Andrew Rushton, a book recommended in an earlier post a few weeks ago, so I
bought a copy. Rushton goes to great pains to say, multiple times:

"The natural form of hierarchy in VHDL, at least when it is used for RTL
design, is the component. Do not be tempted to use subprograms as a form of
hierarchical design! Any entity/architecture pair can be used as a component
in a higher level architecture. Thus, complex circuits can be built up in
stages from lower level components."

I was convinced by his arguments + examples. I'd think having a modular
component approach wouldn't harm you, because during synthesis redundant
interfaces + wires + logic would likely get optimized away. So the
overriding factor is choosing whichever is easiest to implement, understand,
maintain, share, etc. -- i.e. human factors.

Having said that, as a C programmer I almost never create libraries. I have
source code that does what I want for a specific task. Later, if I have to
do something similar, I go look at what I've already done and copy sections
of code out as needed. A perfect example is the Berkeley sockets layer: the
library calls are so obscure that all you want to do is cut and paste
something you managed to get working before, to do the same thing again. The
alternative would be to wrap the sockets interface in something else,
supposedly simpler -- but then it wouldn't have all the functionality...

-Dave

--
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
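For readers coming from C, a bare-bones illustration of the component hierarchy Rushton advocates: an entity/architecture pair instantiated, twice, inside a higher-level architecture. The names are made up for the example:

library ieee;
use ieee.std_logic_1164.all;

-- Lower-level design unit: an entity/architecture pair.
entity blinker is
  port (clk : in std_logic; led : out std_logic);
end entity;

architecture rtl of blinker is
  signal q : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q <= not q;          -- toggle every clock
    end if;
  end process;
  led <= q;
end architecture;

library ieee;
use ieee.std_logic_1164.all;

-- Higher-level architecture reuses the pair by instantiating it.
entity top is
  port (clk : in std_logic; led_a, led_b : out std_logic);
end entity;

architecture rtl of top is
begin
  u_a : entity work.blinker port map (clk => clk, led => led_a);
  u_b : entity work.blinker port map (clk => clk, led => led_b);
end architecture;

Direct instantiation (entity work.blinker) is VHDL-93; older code declares a component first, but the hierarchy idea is the same.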