Messages from 107075

Article: 107075
Subject: Re: Why No Process Shrink On Prior FPGA Devices ?
From: "Antti" <Antti.Lukats@xilant.com>
Date: 24 Aug 2006 07:55:19 -0700

tweed_deluxe wrote:

> I'm wondering what intrinsic economic, technical, or "other" barriers
> have precluded FPGA device vendors from taking this step.   In other
> words, why are there no advertised, periodic refreshes of older
> generation FPGA devices.

it's just virtually impossible.

if you do a process shrink on some MCU, for example, then after
the shrink the datasheet can remain almost the same, with minor
changes.

now if we were to technology-shrink some FPGA family, the amount
of work needed for a new 'characterization' of the silicon would be
enormous.

and if you know the mask set pricing then you can easily understand
that this is not an option for any FPGA vendor.

some pin compatibility could be achieved between families, maybe, but
if you want to have V2Pro in 'shrunk' technology (e.g. cheaper) then
it won't happen.

Antti
http://xilant.com

Article: 107076
Subject: Modelsim XE problem with Xilinx ISE 8.1i and 8.2i
From: "Dan K" <danielgkNOSPAM@visi.com>
Date: Thu, 24 Aug 2006 09:59:56 -0500

This is an all VHDL design.
Modelsim XE is installed as full VHDL.
Design uses a number of block RAMs built with Xilinx coregen, which produces 
the VHDL and Verilog files along with a bunch of other files.

Here's the problem that started showing up with ISE 8.1i webpack and is 
still there with ISE 8.2 full (not the webpack):

When I first fire up the simulation, Modelsim sees the verilog files and 
puts them in the Modelsim work workspace (thinking that they are part of the 
design I would guess).  Then it proceeds to compile a bunch of stuff and 
then fails because Modelsim XE does not support a mixed design with both 
VHDL and Verilog.  If I then remove the verilog files from the project, and 
delete them out of the Modelsim work workspace, the next time I fire up 
Modelsim everything runs fine. - If the fix was just to delete the verilog 
files I could live with that, but there's more...

If I delete the verilog files before Modelsim has EVER run on this project, 
Modelsim fails immediately saying the verilog files are missing from the 
project directory and refuses to do anything until they are put back in the 
project directory.  So the verilog files have to be there so Modelsim can 
run the first time (which does a bunch of stuff and then errors out for a 
mixed design) and then the verilog files have to be removed so Modelsim will 
run correctly.

Can someone tell me what's going on?  This problem never showed up until 
Xilinx ISE 8.1 (which I was running with Modelsim XE 5.5) and it's still 
there with my new upgrade to Xilinx ISE 8.2i and Modelsim XE 6.1e.  Note 
that I was also running Modelsim XE 5.5 when I was back at Xilinx ISE 6.2i 
and the problem was not present then.

Thanks

Dan

Article: 107077
Subject: Re: ISERDES strange simulation behaviour
From: "Antti" <Antti.Lukats@xilant.com>
Date: 24 Aug 2006 08:03:03 -0700

GaLaKtIkUs™ wrote:

> In the Virtex-4 user guide (ug070.pdf p.365 table 8-4) it is clearly
> indicated that for INTERFACE_TYPE=NETWORKING and DATA_RATE=SDR the
> latency should be 2 CLKDIV clock periods.
> I instantiated an ISERDES of DATA_WIDTH=6 but I see that valid output
> appears on the next CLKDIV rising edge.
> Any explanations?
>
> Thanks in advance!

advice: don't believe the simulator, it's not always correct.
place the ISERDES and a ChipScope ILA into a dummy toplevel, load it into
some FPGA and look at what happens in real silicon.

Antti

Article: 107078
Subject: Re: Why No Process Shrink On Prior FPGA Devices ?
From: "Brannon" <brannonking@yahoo.com>
Date: 24 Aug 2006 08:24:22 -0700


> now if we would technology shrink some FPGA family then the amount
> of work to be done for new 'characterization' of the silicon is
> enormous.
>
> and if you know the mask set pricing then you can easily understand
> that this is not an option for any FPGA vendor.

So let me summarize this to make sure I understand:

1. You cannot shrink FPGAs because when you shrink them, you have to
   update all the design PAR files to match the new timing.
2. Characterizing the timing on these internal lines is a pain in the
   butt.
3. Hence, nobody wants to invest that money when they could be
   spending their time getting the timing right on their latest designs.

Sound right?

Article: 107079
Subject: Re: Xilinx Floorplanner
From: "Brad Smallridge" <bradsmallridge@dslextreme.com>
Date: Thu, 24 Aug 2006 08:31:18 -0700


> The strategy is more art than science.

That's what I was expecting.

> It is sort of like putting together a puzzle which has many possible 
> solutions.  You should start with a block diagram of your design, grouping 
> the pieces together on paper to minimize the lengths of critical 
> interconnect.  Then you use that as a guide to placing the pieces.  It 
> helps tremendously to do the floorplanning hierarchically rather than 
> attempting it on a flat design, as it is far easier to optimize small 
> pieces and then place them in the larger design than it is to optimize the 
> whole thing at once. Unfortunately, the floorplanner is not all that 
> hierarchical.  You can use the hierarchy browser to work on the design 
> more or less hierarchically.

It seemed that what I was looking at was completely flattened, part of the
mapping process? The only thing that was grouped was the carry chains for
some of the counters in the design. Everything else was in top.  Should
I be adding constraints to my vhdl code instead of using the floorplanner?

>  Basically you want critical connections to be short, preferably in the 
> same row/column for source and destination.  Carry chains are typically 
> the long pole in the tent, so you want flip-flops outputting to carry 
> chain logic located in close proximity to the carry chain.  Best bet is to 
> play with it with some small designs to get a feel for how it works and to 
> start learning some placement strategies.


>> What's the difference between the post place and the post map
>> floorplanner as shown on the process pane of the ISE 7.1 ?
> Post map floorplanner only shows the stuff you manually placed.  The rest 
> of the design is not yet placed, so that information is not shown. The 
> post PAR floorplanner brings up an additional pane that shows the actual 
> placement of all of the elements in the design.  You can use the post PAR 
> floorplanner to tweak an automatic placement in order to improve the 
> timing.
>
>>
>> How do I add registers to allow a bus to traverse the chip and
>> not have the synth tool pack the registers into an SRL16?
> It has to be done in your RTL of course.

I still don't know what RTL is.

> The easiest way to prevent SRL16 inference is to put a reset on the 
> flip-flops.

That's clever.

>  You can also do it by putting syn_keeps on the signal between each 
> flip-flop, or a syn_preserve on the flip-flops, or with a synthesis 
> directive.  I've had varying success with the synthesis directive, often 
> finding it doesn't work right.  I've also had trouble with syn_keep and 
> syn_preserve behaving differently from one synthesis tool version to the 
> next in certain cases (the most notable is inputs to carry 
> chains...Synplicity currently infers an additional LUT if you put a 
> syn_keep on an instantiated carry chain bit input).
>
> Hope that helps.

Very much, thanks.

Brad Smallridge
aivision
dot com

Article: 107080
Subject: Re: DCM vs. PLL
From: "Barry Brown" <b0_nws2@agilent.com>
Date: Thu, 24 Aug 2006 08:43:32 -0700

Xilinx offers a reference design in XAPP265 for a 7:1 serdes (see app A).  I
have used the receiver successfully at 60MHz clock in Virtex II to
deserialize data that was transmitted by National "Channel Link"
transmitters.  Each clock requires one DCM and two global clocks.

I noticed that the DCM phase shift value calculated per the app note was not
optimum.  By changing the phase shift value back and forth until I found the
limits of good data, I empirically determined a value to put the clock in
the center of the data eye.

Barry


"Rob" <robnstef@frontiernet.net> wrote in message
news:BcQGg.8182$oa1.7226@news02.roc.ny...
> Serious question:  Do Altera's PLLs offer an advantage (versatility,
> jitter, etc) over Xilinx's DCMs?  I'm to understand that the DCM is not a
> PLL, correct?  What is the working principle behind the DCM (any
> literature links?)
>
> This question arises from an upcoming design where we have three serial
> LVDS interfaces that need to go into a V2PRO part.  I implemented this
> interface within an Altera Stratix part without a problem; but I'm told
> by the group responsible for the V2PRO design that their FPGA doesn't
> have the resources to handle the aforementioned interface, which will
> necessitate putting deserializers on the board at an added cost.
>
> Here's some information on the LVDS interface:
> Each interface has its own clock and 3 serialized lanes, for a total of 3
> clocks (45MHz) and 9 x 7 deep (315MHz fast clock) serialized lanes.
>
> Much obliged,
> Rob

Article: 107081
Subject: Re: DCM vs. PLL
From: Austin Lesea <austin@xilinx.com>
Date: Thu, 24 Aug 2006 08:45:41 -0700

Rob,

See below,

Austin

Rob wrote:
> I appreciate the feedback guys.  Sorry for the delayed response--I've been 
> tied up all day.
> 
> The interface, though it looks like Camera Link, is not.  The V2PRO part 
> would need to receive from three separate IC's ,each generating its own 
> 45MHz clock and three lanes of x7 serialized data.  So, yes, the FPGA would 
> need to take the 45MHz and derive the 315MHz fast clock to pick off the 
> data.  So, based on this thread, there's nothing inherent about the DCM 
> which would preclude it from this task.
Yes.
> 
> I wonder why this other group is telling me that the V2PRO can't do the job? 
> Perhaps the V2PRO30 doesn't have enough DCM's?  I believe the FPGA would 
> need to have three, since there are three IC's generating the data.  You 
> couldn't just use one of the three clocks for all the data lanes because the 
> three chips could have some slight variation in timing.
Three DCMs, one for each clock.  That works.  No problem here.
> The implementation I
> chose was to use three PLL's, and clock everything into a FIFO to switch and 
> synchronize all the data from the three chips into a single clock domain. 
> I implemented this within a Stratix (sorry Austin--nothing personal.
That is OK.  Some of my best friends use Altera FPGAs...
> When I
> entered into the FPGA world my group used Altera) and it works.  The design 
> must be transferred to another group for a different design and the 
> constraint (long story) is to use the V2PRO.  The contingency plan is to use 
> National's 90C flavor chips to deserialize the data before the V2PRO, thus 
> sending it parallel data.
> 
> My original post--due to my ignorance about the device--was to inquire about 
> the DCM and its capability and limitations (if any) with this interface.
As I said, the X7 requires the DFS section of the DCM, and will add
jitter due to synthesis (the CLKFX output).  But, you can also use the
phase shift feature, and dynamically move the CLKFX output  in relation
to the first rising edge of the 45 MHz input.  The phase control will be
0 to 255 steps in 256 of one 45 MHz period. 1/45 MHz = 22.2 ns.  So
22.2/256 = 87 ps per step, and with the min phase step of the V2Pro DCM
being ~ 30 ps, that is going to be from 2 to 3 tap steps automatically
selected based on the XXX/256 of a 45 MHz period that you wish to move
the 315 MHz output.  You can find the exact best phase shift that
recovers all the 7 data bits error free, and then use that number as a
fixed phase shift for all PCBs (as it is all digital, once you find
the best sample point, it shouldn't move).  Each DCM may have its own
best phase shift.

You could also do M=7, D=2, and only go up to 315/2 MHz, use CLKFX
rising edge and falling edge for DDR recovery of all 7 bits, but with
everything running at half the 315 MHz rate, for less power and more
timing margin (with some complexity in the interface).
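
The arithmetic in this post can be checked with a few lines of Python. This is only a sketch that restates the figures quoted above (45 MHz input, x7 multiplication, 256 phase steps per input period, roughly 30 ps per delay tap); the variable names are illustrative and are not tied to any Xilinx tool or attribute.

```python
# Worked numbers for the DCM phase-shift discussion.
# All inputs are the figures quoted in the post; names are illustrative only.
F_IN_HZ = 45e6       # clock from each serializer chip
MULTIPLY = 7         # x7 serialization, so CLKFX uses M=7
PHASE_STEPS = 256    # phase shift is set in 1/256ths of the input period
TAP_PS = 30.0        # approximate V2Pro DCM tap delay quoted in the post

period_ns = 1e9 / F_IN_HZ                      # one 45 MHz period
step_ps = period_ns * 1000.0 / PHASE_STEPS     # size of one phase-shift step
taps_per_step = step_ps / TAP_PS               # delay taps consumed per step
fast_clock_mhz = F_IN_HZ * MULTIPLY / 1e6      # M=7, D=1 option
ddr_clock_mhz = F_IN_HZ * MULTIPLY / 2 / 1e6   # M=7, D=2 option with DDR capture

print(f"input period     : {period_ns:.1f} ns")
print(f"phase step       : {step_ps:.0f} ps")
print(f"taps per step    : {taps_per_step:.1f}")
print(f"fast clock (D=1) : {fast_clock_mhz:.1f} MHz")
print(f"DDR clock (D=2)  : {ddr_clock_mhz:.1f} MHz")
```

The taps-per-step figure lands between 2 and 3, which matches the "2 to 3 tap steps" mentioned in the post.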

Article: 107082
Subject: Re: Why No Process Shrink On Prior FPGA Devices ?
From: "Peter Alfke" <peter@xilinx.com>
Date: 24 Aug 2006 08:48:46 -0700

Let me throw some conservative numbers at this:
$ 10 million for the mask sets for a new family
$ 50 million in design, characterization, software upgrade,
manufacturing logistics
another $ 50 M in "lost opportunity cost", because we would have to
delay other work
Plus a much higher cost caused by the potential loss of the leadership
position that Xilinx enjoys in this industry.
Redesigning an old family would be an irresponsible waste of our
resources.
Peter Alfke

Brannon wrote:
> > now if we would technology shrink some FPGA family then the amount
> > of work to be done for new 'characterization' of the silicon is
> > enormous.
> >
> > and if you know the mask set pricing then you can easily understand
> > that this is not an option for any FPGA vendor.
>
> So let me conclude this to make sure I understand:
>
> 1. You cannot shrink FPGAs because when you shrink them, you have to
> update all the design PAR files to match the new timing. 2.
> Characterizing the timing on these internal lines is a pain in the
> butt. 3. Hence, nobody wants to invest that money when they could be
> spending their time getting the timing right on their latest designs.
> 
> Sound right?

Article: 107083
Subject: Re: Why No Process Shrink On Prior FPGA Devices ?
From: Austin Lesea <austin@xilinx.com>
Date: Thu, 24 Aug 2006 08:53:51 -0700

Brannon,

That, and more.

Back when going from 1u, to .8u, to .65u, etc. was as simple as just
making a mask where everything was smaller, shrinking was just good
business.  Cheaper parts, maybe even faster parts, same functionality.

But now, making something smaller is a complete re-design, with all
circuits getting completely re-simulated, and redone.  And finally the
layout has changed such that a plain shrink would violate all the design
rules.

Basically, not an option anymore.

The last shrink we did was 0.18u to 0.15u in Spartan 2E for cost reasons
(years ago).  It involved a lot of work, but just slightly less than a
completely new product, so it made sense.

Austin

Brannon wrote:
>> now if we would technology shrink some FPGA family then the amount
>> of work to be done for new 'characterization' of the silicon is
>> enormous.
>>
>> and if you know the mask set pricing then you can easily understand
>> that this is not an option for any FPGA vendor.
> 
> So let me conclude this to make sure I understand:
> 
> 1. You cannot shrink FPGAs because when you shrink them, you have to
> update all the design PAR files to match the new timing. 2.
> Characterizing the timing on these internal lines is a pain in the
> butt. 3. Hence, nobody wants to invest that money when they could be
> spending their time getting the timing right on their latest designs.
> 
> Sound right?
>

Article: 107084
Subject: Xilinx BRAMs question - help needed ..
From: me_2003@walla.co.il
Date: 24 Aug 2006 08:57:36 -0700

Hi all,
My problem is as follows, I need to use a BRAM to create a true dual
port memory (6x512).
My memory should be written to/read from simultaneously. But !!! my
problem is that I need to fill the memory one bit at a time, i.e.
whenever I get a bit on my input I also get two signals - address and
bit field (0-5) - and that bit should be written to the correct bit
location at that address. Now if another bit arrives for the same address
but a different bit field, it should also be written there (like a bit-wise
OR between the current value and the new value).

I thought to implement it with Xilinx's BRAM by OR'ing the RAM output
with the new written input but I noticed (too late..) that the BRAM
output comes out one cycle after the address. This makes it a problem
for me because now I need 2 clock cycles for every one of these
"bit-enabled write" operations.

I hope that my post is not too long and exhausting and I would
appreciate any ideas from you guys...

Thanks, Mordehay.   

p.s - I'm using V4 (lx).

Article: 107085
Subject: Re: Xilinx BRAMs question - help needed ..
From: "Peter Alfke" <peter@xilinx.com>
Date: 24 Aug 2006 09:06:59 -0700

I did not really understand the question, but:
You can configure the BRAM 1-bit wide, and thus address each bit
individually.
It seems to me that this solves all your problems.
You can configure the two ports separately, e.g. one can be 1-bit wide,
the other 9 bits wide.
You can write into both ports simultaneously.
You cannot in one cycle write data that also depends on the prior
content of that location, but BRAMs are fairly fast. You did not
mention your clock rate.
Peter Alfke, Xilinx

me_2...@walla.co.il wrote:
> Hi all,
> My problem is as follows, I need to use a BRAM to create a true dual
> port memory (6x512).
> My memory should be written to/read from simultaneously. But !!! my
> problem is that I need to fill the memory with bit at a time i.e.
> whenever I get a bit in my input I also get two signals - address and
> bit field (0-5) and that bit should be written to that address correct
> bit location. Now if another bit arrives to the same address but to a
> different bit field it should be also written to it (like a bit-wise OR
> between current value and new value).
>
> I thought to implement it with Xilinx's BRAM by Or'ing the RAM output
> with the new written input but I noticed (too late..) that the BRAM
> output comes out one cycle after the address. This makes it a problem
> for me because now I need 2 clock cycles for every one of these
> "bit-enabled write" operations.
>
> I hope that my post is not too long and exhausting and I would
> appreciate any ideas from you guys...
> 
> Thanks, Mordehay.   
> 
> p.s - I'm using V4 (lx).
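
The 1-bit-wide port suggestion can be illustrated with a small Python model. It is a behavioral sketch, not a description of the actual block RAM primitive: the class name, the choice of 8 data bits per word on the wide port, and the bit-address mapping (word address times 8 plus bit field) are all assumptions made for illustration. It shows that when each bit is written individually through the narrow port, setting a bit never requires reading the old word first.

```python
class DualPortBitRam:
    """Behavioral model of a dual-port RAM with a 1-bit port and a wide port.

    Assumption: the wide port sees 8 data bits per word, and bit i of word a
    lives at narrow-port address a * 8 + i. Check the device user guide for
    the real mapping before relying on this.
    """

    BITS_PER_WORD = 8  # hypothetical wide-port data width

    def __init__(self, words=512):
        self.bits = [0] * (words * self.BITS_PER_WORD)

    def write_bit(self, word_addr, bit_field, value=1):
        """Narrow port: write one bit; no read of the old word is needed."""
        self.bits[word_addr * self.BITS_PER_WORD + bit_field] = value & 1

    def read_word(self, word_addr):
        """Wide port: read all bits of one word as an integer."""
        base = word_addr * self.BITS_PER_WORD
        return sum(self.bits[base + i] << i for i in range(self.BITS_PER_WORD))


ram = DualPortBitRam()
ram.write_bit(word_addr=3, bit_field=0)
ram.write_bit(word_addr=3, bit_field=5)   # same address, different bit field
print(bin(ram.read_word(3)))              # both bits are set: 0b100001
```

With this arrangement the "OR with the current value" happens implicitly, because each write touches only its own bit.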

Article: 107086
Subject: Re: Global signal conservation
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: 24 Aug 2006 18:27:21 +0200

Marc Randolph wrote:
> Could you name the parts you are referring to?  I find it surprising
> that an inverter within the global clock network would take that long,
> and more surprising that you can get off the global clock net and
> through a LUT-based inverter.  So surprised, in fact, that I had to go
> try it for myself.  I put three different examples in the design,
> rising_edge(clk), falling_edge(clk) as well as a clk_inv <= NOT clk
> were each clocking an output.  The design used one global clock and
> the inverter immediately in front of the FF (which doesn't take 0.5 ns).
>   Synthesis tool (Synplicity) choice or tool options may have something
> to do with this.  Of course, it pulled the FF for the outputs with
> inverted clocks out of the IOBs, but that's a separate issue

This is the spartan-3e series. I had just assumed the LUT would be
involved in inverting a clock.

I'm looking at the spartan-3e family data sheet, and I have the
diagram for the CLB up. On the right side are FFX and FFY
and these are only clocked by a single CK line, and there is no
option for inverting it. Are you talking about inverting the signal
or the clock?

There is a note on page 22 of ds312-spartan-3e-family.pdf
  1. Options to invert signal polarity as well as other options that
   enable lines for various functions are not shown.

I spent some time looking but didn't find any optional inverter
on the clock.

Hmmm. That rising_edge() vs falling_edge() issue is interesting.
The design I'm studying uses unisim fddrrse entities. These
expect 2 clocks -- true and inverted. But the vhdl for the
entity could be rewritten to take a single clock, but just
use rising_edge and falling_edge in its implementation.
Xilinx didn't do this.

So this brings up a question -- would it be better coding practice
to make use of rising_edge/falling_edge, or instead use 2 clocks?
Unisim seems to opt for the 2nd option. Is compatibility with
unisim important? In fact the things in unisim that are
used by the designs I'm studying (opencores ddr controller for
example) are trivial -- having unisim even involved doesn't
really buy anything. And unisim I think is xilinx specific.

-Dave

Article: 107087
Subject: DDR controller on Spartan-3e 500
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: 24 Aug 2006 18:32:35 +0200

Hi,

I want to control a 16 bit DDR module with the 256 pin
Spartan-3e 500 package. There are too many pins on the
DDR to fit in one quadrant of the device, so it will have
to use 2. Can this work? I was thinking address and
control lines on one quadrant, and the data on another.

The 320 pin package brings out enough IO so that a
single quadrant can control everything, but that's more
expensive.

I'm wondering if the timing will be ok (100 mhz or
faster is what I'm shooting for) with the logic spread
across quadrants.

Thanks--
Dave

Article: 107088
Subject: Re: Global signal conservation
From: "Peter Alfke" <peter@xilinx.com>
Date: 24 Aug 2006 09:35:58 -0700

David Ashley wrote:
> There is a note on page 22 of ds312-spartan-3e-family.pdf
>   1. Options to invert signal polarity as well as other options that
>    enable lines for various functions are not shown.
>
> I spent some time looking but didn't find any optional inverter
> on the clock.

David, when we say that something "is not shown", it means that we
uncluttered the drawing by not showing something (that really is
there).
No wonder you could not find it in the schematic, it "is not shown".
The conditional inverter is usually some kind of XOR circuit, carefully
balanced, so that it does not affect the through-delay. 
Peter Alfke

Article: 107089
Subject: Re: USB PHYs and drivers that folks have used
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: 24 Aug 2006 18:49:24 +0200

KJ wrote:
> OpenCores.org and googling did not give any such assurances which is my 
> reason for querying for people who may have actually used the above 
> mentioned cores or suggest ones that they have used and qualified for use in 
> a product.

I'm planning on using a Cypress CY7C67200 in a project. It has 2 USB
ports and one can be on-the-go, meaning it can switch between host
and peripheral mode as necessary. I really just need 2 host ports.

The chip is 8051 based with support hardware. However it seems to
have some mode where it can get out of the way and be controlled
by an external device, which would be the FPGA. It's all theory, I
need to prove out functionality.

-Dave

Article: 107090
Subject: Re: Global signal conservation
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: 24 Aug 2006 18:55:19 +0200

Peter Alfke wrote:
> David Ashley wrote:
> 
>>>There is a note on page 22 of ds312-spartan-3e-family.pdf
>>
>>  1. Options to invert signal polarity as well as other options that
>>   enable lines for various functions are not shown.
>>
>>I spent some time looking but didn't find any optional inverter
>>on the clock.
> 
> 
> David, when we say that something "is not shown", it means that we
> uncluttered the drawing by not showing something (that really is
> there).
> No wonder you could not find it in the schematic, it "is not shown"
> The conditional inverter is usually some kind of XOR circuit, carefully
> balanced, so that it does not affect the through-delay. 
> Peter Alfke
> 

Peter,

Very good info. BTW is there any mention of the optional
inverter on the CK input in the datasheet? If this is a word-of-mouth
type of tidbit I'm worried about what *else* I need to learn this way :^).

Anyway this clears up the whole question I had. Just because the
DCM outputs an inverted clock doesn't mean you gain any advantage
by using it -- because the CLB's can use the true clock or inverted
and there is no penalty. This is really good stuff I need to know
but didn't.

Thanks all,

-Dave

Article: 107091
Subject: Re: JOP as SOPC component
From: Tommy Thorn <foobar@nowhere.void>
Date: Thu, 24 Aug 2006 10:03:06 -0700

KJ wrote:
.... a (AFAICT) correct description of Avalon.

> During a read, Avalon allows the delay between the clock cycle with "read 
> and not(waitrequest)" and the eventual clock cycle with "readdatavalid" to be 
> either fixed or variable.  If fixed, then SOPC Builder allows the fixed 
> latency number to be entered into the class.ptf file for the slave and no 
> readdatavalid output from the slave is required.  All that does though is 
> cause SOPC Builder to synthesize the code itself to generate readdatavalid 
> as if it came from the slave code itself.  If the readdatavalid output IS 
> part of the component then SOPC Builder allows the latency delay to be 
> variable; whether it actually is or not is up to the slave's VHDL/Verilog 
> design code.  Bottom line is that Avalon does have a mechanism built right 
> into the basic specification that allows a master device to start up another 
> read or write cycle one clock cycle prior to readdata actually having been 
> provided.

Ah, we only differ in perspective. Yes, Avalon _allows_ you to write 
slaves like that and if your fabric consists only of such slaves, then 
yes, they are the same. But variable latency does _not_ work like that, 
thus you can't make such an assumption in general if you wish the fabric 
to be able to accommodate arbitrary Avalon slaves.

> Given the description that Martin posted on how his SimpCon interface logic 
> works it 'appears' that he believes that this ability to start up another 
> cycle prior to completion (meaning the data from the read has actually been 
> returned) is what is giving SimpCon the edge over Avalon.  At least that's 
> how it appears to me, which is why I asked him to walk me through the 
> transaction to find where I'm missing something.  My basic confusion is not 
> understanding just exactly where in the read transaction does SimpCon 'pull 
> ahead' of Avalon and give 'JOP on SimpCon' the performance edge over 'JOP on 
> Avalon'.

That was not my understanding. SimpCon allows Martin to get an "early 
warning" that a transaction is about to complete. As I mentioned, this 
is not an uncommon idea and it works great for point-to-point 
interfaces. My claim is that it doesn't scale if you wish to use SimpCon 
as a general purpose fabric like Avalon.

Being able to "start up another cycle prior to completion" is what I 
mean by multiple outstanding requests (known as "posted reads" in PCI 
lingo). It is definitely a feature of Avalon.

> Anyway, hopefully that explains why it's not abusing Avalon in any way.

My wording was poor. Another way to say it is "to use Avalon in a 
constrained way". Used this way you cannot hook up slaves with variable 
latency, so it's not really Avalon, it's a subset of Avalon.

Cheers,
Tommy

Article: 107092
Subject: Re: Global signal conservation
From: "Alan Nishioka" <alan@nishioka.com>
Date: 24 Aug 2006 10:14:46 -0700

David Ashley wrote:
> I'm looking at the spartan-3e family data sheet, and I have the
> diagram for the CLB up. On the right side are FFX and FFY
> and these are only clocked by a single CK line, and there is no
> option for inverting it. Are you talking about inverting the signal
> or the clock?
>
> There is a note on page 22 of ds312-spartan-3e-family.pdf
>   1. Options to invert signal polarity as well as other options that
>    enable lines for various functions are not shown.
>
> I spent some time looking but didn't find any optional inverter
> on the clock.

When using Xilinx, the best way to see what hardware is actually there
is to use fpga_editor.  You don't even need a design; just create a new
one, make up a name, and select the part you want to look at.  Then you
can double-click on the slice and see what is inside of it.

Alan Nishioka

Article: 107093
Subject: Re: Who should buffer, fabric or slave? [was: JOP as SOPC component]
From: Tommy Thorn <foobar@nowhere.void>
Date: Thu, 24 Aug 2006 10:28:05 -0700
Links: << >> << T >> << A >>


> However, issuing multiple requests to different slaves and
> then delivering them in order is a pain for the switch
> logic. You have to remember your request order and handle
> the results arriving in a different order. However, for
> this issue a slave that holds the data till used can
> simplify the switching a little bit...

See attachment. I do that in 137 lines of simple Verilog, but of course 
it depends on slaves replying to requests in the order received.
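
The bookkeeping involved (remember which master issued each read, then hand returning data back in the same order) can be sketched in Python. This is an illustrative model only, not a translation of the attached module: the class name, method names, and the default depth are hypothetical, and it assumes, as stated above, that the slave answers reads strictly in the order it received them.

```python
from collections import deque


class ReadRouter:
    """Tracks which master issued each outstanding read, in issue order."""

    def __init__(self, depth=16):
        self.depth = depth        # hypothetical limit on outstanding reads
        self.pending = deque()    # master ids, oldest request first

    def issue_read(self, master):
        """Record that a read from `master` (1 or 2) was accepted by the slave."""
        if len(self.pending) >= self.depth:
            raise OverflowError("too many outstanding reads")
        self.pending.append(master)

    def route_data(self, data):
        """Return (master, data) for read data arriving from the slave."""
        if not self.pending:
            raise LookupError("read data with no outstanding request")
        return self.pending.popleft(), data


router = ReadRouter()
router.issue_read(1)
router.issue_read(2)
router.issue_read(1)
print([router.route_data(d) for d in ("a", "b", "c")])
# [(1, 'a'), (2, 'b'), (1, 'c')]
```

If slaves could reply out of order, a plain queue like this would no longer be enough and each request would need a tag.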

> Perhaps I should state how I see SimpCon: A *simple*
> SoC interconnect that allows for lower latency and
> pipelining to some extent. The main application I have
> in mind is a single master (CPU) with multiple slaves
> (memory and IO). The interconnect/address decoding
> should be simple - and it is - see an example at:
> http://www.opencores.org/cvsweb.cgi/~checkout~/jop/vhdl/scio/scio_min.vhd

Well, then it's not an alternative to Avalon or WISHBONE. Multimaster is 
an absolute requirement for many designs and multiple outstanding 
requests ("pipelined" in Avalon lingo) is needed for performance. You 
are obviously fully free to do whatever you want, but this discussion 
started with a desire for an open/free interconnect fabric alternative to 
WISHBONE.

The subset of Avalon I use is *simple* IMO. I'd be happy to show you 
everything, but the only remotely complicated part is what I attached.

I'll ponder on your "slaves buffer readdata" idea, but whoever designed 
Avalon must have considered it.

> Besides component declaration and IO signal routing
> the interconnect is just 18 lines of VHDL. The read
> MUX is driven by registered select, which helps in
> the critical path when you have plenty of slaves.

I have n-1 arbitration structures for n masters. The interface overhead 
in the slaves themselves is too trivial to count, but I have many more 
slaves than arbitration. If logic resources are the prime concern, then 
you should keep the interface overhead in slaves as simple as possible. 
Forcing them to buffer readdata doesn't sound like a simplification.

I do like the apparent symmetry between the input and output ports this 
entails.

Cheers,
Tommy

 filename="arbitration.v"

module arbitration
            (input         clock

            // Master port 1
            ,input         transfer_request1
            ,input  [31:0] address1
            ,input         wren1
            ,input  [31:0] wrdata1
            ,input  [ 3:0] wrmask1
            ,output        wait_request1
            ,output        read_data_valid1
            ,output [31:0] read_data1

            // Master port 2
            ,input         transfer_request2
            ,input  [31:0] address2
            ,input         wren2
            ,input  [31:0] wrdata2
            ,input  [ 3:0] wrmask2
            ,output        wait_request2
            ,output        read_data_valid2
            ,output [31:0] read_data2

            // Target port
            ,output        transfer_request
            ,output [31:0] address
            ,output        wren
            ,output [31:0] wrdata
            ,output [ 3:0] wrmask
            ,input         wait_request
            ,input         read_data_valid
            ,input  [31:0] read_data
            );

   /*
    * Data routing fifo.  Size must cover all potential outstanding
    * transactions.
    */
   parameter   FIFO_SIZE_LG2 = 4;
   parameter   debug = 0;

   reg         data_for_1[(1 << FIFO_SIZE_LG2) - 1:0];
   reg [FIFO_SIZE_LG2-1:0] rp = 0;
   reg [FIFO_SIZE_LG2-1:0] wp = 0;

   wire [FIFO_SIZE_LG2-1:0] wp_next = wp + 1;
   wire [FIFO_SIZE_LG2-1:0] rp_next = rp + 1;

   assign      transfer_request = transfer_request1 | transfer_request2;
   assign      read_data1       = read_data;
   assign      read_data2       = read_data;
   assign      read_data_valid1 = read_data_valid & data_for_1[rp];
   assign      read_data_valid2 = read_data_valid & ~data_for_1[rp];

   wire        en1              = transfer_request1 & ~wait_request1;
   wire        en2              = transfer_request2 & ~wait_request2;
   assign      address          = en1 ? address1 : address2;
   assign      wren             = en1 ? wren1    : wren2;
   assign      wrdata           = en1 ? wrdata1  : wrdata2;
   assign      wrmask           = en1 ? wrmask1  : wrmask2;

   always @(posedge clock) begin
      // Remember which master issued each accepted read request.
      if ((en1 | en2) & ~wren) begin
         data_for_1[wp] <= en1;
         wp <= wp_next;
         // Testing debug first keeps each else paired with the pointer
         // comparison instead of with the inner if (debug).
         if (debug) begin
            if (wp_next == rp)
              $display("%05d ARB: FIFO OVERFLOW! wp: %d rp: %d", $time, wp_next, rp);
            else
              $display("%05d ARB: FIFO remembered a read req wp: %d rp: %d", $time, wp_next, rp);
         end
      end

      // Route returning read data and retire the oldest FIFO entry.
      if (read_data_valid) begin
         rp <= rp_next;
         if (debug) begin
            if (rp == wp)
              $display("%05d ARB: FIFO UNDERFLOW! wp: %d rp: %d", $time, wp, rp_next);
            else
              $display("%05d ARB: FIFO routed read data wp: %d rp: %d", $time, wp, rp_next);
         end
      end
   end

   /*
    * Share based
    */
   parameter SHARES_1 =  5; // > 0
   parameter SHARES_2 = 10; // > 0
   parameter LIKE_AVALON = 1;

   parameter OVERFLOW_BIT = 6;

   reg         current_master = 0;
   reg [OVERFLOW_BIT:0] countdown = SHARES_1 - 2;
   assign      wait_request1 = wait_request | (transfer_request2 & current_master);
   assign      wait_request2 = wait_request | (transfer_request1 & ~current_master);

   reg [31:0]  count1 = 1, count2 = 1;

   always @(posedge clock) begin
      if (transfer_request1 | transfer_request2)
        if(debug)
          $display("%05d ARB: Req %d/%d  Arbit %d/%d  W:%d %d (shares left %d, cumulative ratio %f)",
                 $time,
                 transfer_request1, transfer_request2,
                 transfer_request1 & ~wait_request1,
                 transfer_request2 & ~wait_request2,
                 wren1, wren2,
                 countdown + 2,
                 1.0 * count1 / count2);

      /* statistics */
      count1 <= count1 + (transfer_request1 & ~wait_request1);
      count2 <= count2 + (transfer_request2 & ~wait_request2);

      /* The arbitration is only relevant when two masters try to
       * initiate at the same time.  We swap priorities when the
       * current master runs out of shares.
       *
       * Notice, unlike Avalon, a master does not forfeit its shares if
       * it temporarily skips a request.  IMO this leads to better QoS
       * for a master that initiates at a less frequent rate.
       *
       * In this model, the arbitration tries to approximate a
       * SHARES_1 : SHARES_2 ratio for Master 1 and Master 2
       * transactions (as much as the available requests will allow it).
       */

      if (~wait_request) begin
         if (transfer_request1 | transfer_request2) begin
            countdown <= countdown - 1;
            if (countdown[OVERFLOW_BIT]) begin
               current_master <= ~current_master;
               countdown <= (current_master ? SHARES_1 - 2 : SHARES_2 - 2);
            end
         end
      end
   end
endmodule
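
The share countdown in arbitration.v above can be tried out without a simulator. What follows is only a rough behavioural sketch in Python, not part of the original attachment: the class name ShareArbiter and its step() method are made up for illustration, and the model covers just current_master, countdown and the two wait_request outputs (the read-data routing FIFO and the statistics counters are left out). It assumes the default parameters SHARES_1 = 5, SHARES_2 = 10 and OVERFLOW_BIT = 6.

```python
class ShareArbiter:
    """Behavioural model of the share countdown in arbitration.v (two masters)."""

    def __init__(self, shares_1=5, shares_2=10, overflow_bit=6):
        self.shares = (shares_1, shares_2)          # SHARES_1, SHARES_2
        self.overflow_bit = overflow_bit            # OVERFLOW_BIT
        self.mask = (1 << (overflow_bit + 1)) - 1   # width of the countdown reg
        self.current_master = 0                     # 0: master 1 has priority
        self.countdown = (shares_1 - 2) & self.mask

    def step(self, req1, req2, wait_request=False):
        """Advance one clock; return (en1, en2), the request accepted this cycle."""
        cm = self.current_master
        # Combinational part: the master without priority waits only while the
        # other one is requesting (or the target itself asks everyone to wait).
        wait1 = wait_request or (req2 and cm == 1)
        wait2 = wait_request or (req1 and cm == 0)
        en1 = bool(req1 and not wait1)
        en2 = bool(req2 and not wait2)

        # Clocked part: count every accepted transfer; once the top bit of the
        # countdown is set (it wrapped below zero), swap priority and reload.
        if not wait_request and (req1 or req2):
            if (self.countdown >> self.overflow_bit) & 1:
                self.current_master = 1 - cm
                self.countdown = (self.shares[1 - cm] - 2) & self.mask
            else:
                self.countdown = (self.countdown - 1) & self.mask
        return en1, en2


if __name__ == "__main__":
    arb = ShareArbiter()
    grants = [arb.step(True, True) for _ in range(30)]
    print(sum(g[0] for g in grants), sum(g[1] for g in grants))  # prints: 10 20
```

With both masters requesting on every cycle and the target never waiting, the model grants master 1 five transfers, then master 2 ten, and repeats, which is the SHARES_1 : SHARES_2 ratio the comment in the Verilog describes.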

Article: 107094
Subject: Re: Open source Xilinx JTAG Programmer released on sourceforge.net
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: 24 Aug 2006 19:28:59 +0200
Links: << >> << T >> << A >>

Andreas Ehliar wrote:
> No need to do that, the S3E starter kit is supported by Bryan's
> modified XC3Sprog, available at http://inisyn.org/src/xup/ .
> 
> /Andreas

Andreas,

Very good! It says USB1 is not supported, get a USB2
card. I'm not sure what this means. I'm able to download
bit files to the Spartan-3E starter board with no problems under
Windows -- but I don't think my USB controller is USB 2.0
(as in high speed). Will the xup work on my machine?

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Article: 107095
Subject: Re: high level languages for synthesis
From: David Ashley <dash@nowhere.net.dont.email.me>
Date: 24 Aug 2006 19:39:15 +0200
Links: << >> << T >> << A >>

Sanka Piyaratna wrote:
> Hi,
> 
> What is your opinion on high level languages such as SystemC, Handel-C
> etc. for FPGA development instead of VHDL/Verilog?
> 
> Sanka

Gulp.

I've just embarked on trying to master VHDL, and now
you're telling me there's something easier. Doh!

-Dave
PS I don't think I answered your question.

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Article: 107096
Subject: QuickLogic
From: "Chuck Levin" <clevin1234@comcast.net>
Date: Thu, 24 Aug 2006 13:43:52 -0400
Links: << >> << T >> << A >>

Hi,

  I was thinking about using QuickLogic for a low power FPGA design. Does
anyone have any experience with their devices or tools that they would
like to share?

Thanks

Article: 107097
Subject: Re: high level languages for synthesis
From: "Antti" <Antti.Lukats@xilant.com>
Date: 24 Aug 2006 10:50:15 -0700
Links: << >> << T >> << A >>

David Ashley wrote:

> Sanka Piyaratna wrote:
> > Hi,
> >
> > What is your opinion on high level languages such as SystemC, Handel-C
> > etc. for FPGA development instead of VHDL/Verilog?
> >
> > Sanka
>
> Gulp.
>
> I've just embarked on trying to master VHDL, and now
> you're telling me there's something easier. Doh!
>
> -Dave
> PS I don't think I answered your question.
>
> --
> David Ashley                http://www.xdr.com/dash
> Embedded linux, device drivers, system architecture

No, C is not the answer.
Sure, some C-to-FPGA tools work pretty nicely.

I think the ability to use any HDL when required is a big plus:
quite often some imported IP is in the "other HDL",
so you can't do it all in a single language, and always
rewriting from one into the other isn't an option either.

Antti

Article: 107098
Subject: Re: DDR controller on Spartan-3e 500
From: nico@puntnl.niks (Nico Coesel)
Date: Thu, 24 Aug 2006 18:00:28 GMT
Links: << >> << T >> << A >>

David Ashley <dash@nowhere.net.dont.email.me> wrote:

>Hi,
>
>I want to control a 16 bit DDR module with the 256 pin
>Spartan-3e 500 package. There are too many pins on the
>DDR to fit in one quadrant of the device, so it will have
>to use 2. Can this work? I was thinking address and
>control lines on one quadrant, and the data on another.
>
>The 320 pin package brings out enough IO so that a
>single quadrant can control everything, but that's more
>expensive.
>
>I'm wondering if the timing will be ok (100 MHz or
>faster is what I'm shooting for) with the logic spread
>across quadrants.

100 MHz shouldn't be a problem. I have designed a DDR controller
which has its data path spread over 2 FPGAs in a PQ204 package.

-- 
Reply to nico@nctdevpuntnl (punt=.)
Bedrijven en winkels vindt U op www.adresboekje.nl

Article: 107099
Subject: Re: DDR controller on Spartan-3e 500
From: "John_H" <newsgroup@johnhandwork.com>
Date: Thu, 24 Aug 2006 18:01:34 GMT
Links: << >> << T >> << A >>

You should have no problems with the functional division and frequency as 
you've specified.  Synchronous logic and global clocks make it easy.

"David Ashley" <dash@nowhere.net.dont.email.me> wrote in message 
news:44edd4a3$1_3@x-privat.org...
> Hi,
>
> I want to control a 16 bit DDR module with the 256 pin
> Spartan-3e 500 package. There are too many pins on the
> DDR to fit in one quadrant of the device, so it will have
> to use 2. Can this work? I was thinking address and
> control lines on one quadrant, and the data on another.
>
> The 320 pin package brings out enough IO so that a
> single quadrant can control everything, but that's more
> expensive.
>
> I'm wondering if the timing will be ok (100 MHz or
> faster is what I'm shooting for) with the logic spread
> across quadrants.
>
> Thanks--
> Dave
