Messages from 153400

Article: 153400
Subject: Re: LUT6 FPGAs and Carry Logic
From: Jan Bruns <jansaccount@arcor.de>
Date: 16 Feb 2012 15:50:15 GMT


Kolja Sulimma:
> The problem here is that users tend to evaluate the 
> capabilities of an FPGA mainly as logic, while really 
> you pay mostly for routing. Logic is a very small 
> portion of the silicon area. Of course the vendors 
> don't publish the numbers, but university research
> suggests the area of LUT and LUT configuration is 
> only a few percent of total area.

That's what I expected. This becomes pretty obvious
if you imagine a LUT2 FPGA, where everyone should
intuitively understand that the entire silicon would 
be filled up with routing resources. And LUT4 can't
be far off.

> Therefore when going from 4-LUT to 6-LUT you don't 
> get a 4x area increase (16 entries to 64 entries) 
> but more like a 60% increase (going from 4 inputs 
> that must be routed to 6 inputs that must be routed 
> in a somewhat worse than linear routing area).

So let's compare Spartans:

Spartan6  LUT6: about  7 ins, about 3 outs = 10 ports
Spartan3 Slice: about 10 ins, about 6 outs = 16 ports

Here the port count for the Spartan3 Slice doesn't
include the FXMUX path, but does include the full XB/YB
(I doubt this path has or needs full routing capabilities, anyway).

So from what you said about area with taking routing
resources into account, the Spartan3 Slice might very
well consume a little more area, although it has only
about half the SRAM bits.

What do we get for that?

For SLICEL, I think of:

2*any 4 inp-func: LUT4:yes, LUT6:no
2*any 4 inp-func, paired invert: LUT4:yes, LUT6:no
any 5-inp func: both
any 6-inp func: LUT4:no LUT6:yes
MUX4: both
half/partial populated Carry:  LUT4:yes, LUT6:no
2 Bit full Adder: both
2 Bits of long Adder: LUT4:yes, LUT6:one, but 2?
2 Bits of long MulAdder: LUT4:yes, LUT6:one, but 2?
1 Bit ALU (fast Carry): maybe both
--with dual Ext-feedin: LUT4:yes(paired with DPram), LUT6:no
Large Chain Logic: LUT4: 8Bit/Slice, LUT6:6Bit/LUT
DblLUTed Chain Logic: LUT4: no, BX, only, LUT6: yes


For SLICEM, I also think of:
64x1 RAM: LUT4:no, LUT6 yes
32x2 RAM: LUT4:no, LUT6 yes
32x1 RAM: LUT4:yes, LUT6 yes
16x2 RAM: LUT4:yes, LUT6 yes
16x1 RAM+Adder: LUT4:yes, LUT6 no

Well, for the SLICEM part, the LUT6 might be the better
choice, but for SLICEL, I'd still prefer the LUT4,
given 50% area overhead, although I'd like to have a
few more static MUXes and FF paths
(independent clock inverters, or something).

Regards

Jan Bruns


-- 
A few photos: http://abnuto.de/gal/

Article: 153401
Subject: Re: LUT6 FPGAs and Carry Logic
From: rickman <gnuarm@gmail.com>
Date: Thu, 16 Feb 2012 12:45:51 -0800 (PST)

On Feb 16, 7:07 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> Martin Thompson <martin.j.thomp...@trw.com> wrote:
>
> (snip)
>
> > Don't ask me - I'm not making the decisions.  Ultimately, Xilinx
> > presumably decided it was a "win" in business terms: "We'll make the
> > most money doing it this way."
>
> Well, they do have some competition. If they don't design
> and build what works for their customers, they will lose out.
>
> >> I don't believe there's no market for LUT4 FPGAs using current
> >> silicon process.
> > No-one is saying there is not a market.  Just that it's not
> > big enough for Xilinx to be targeting it.
>
> As I understand it, 6LUT is better for larger chips.
>
> For smaller ones, it likely doesn't make so much difference.
> There is some advantage as far as synthesis software of
> keeping a minimum number of different architectures.
>
> Still, 4LUT chips should be around for a while.
>
> -- glen

I believe that is what it comes down to.  Given the fact that routing
is a huge percentage of the chip area (and so cost) this becomes a
more important factor as the chips get larger.  After all, routing
does go up at a faster rate than linear.  So minimizing routing is
more important in larger chips.  The tradeoff provides for lower costs
with LUT6 in larger devices.

The other side of the coin is more "wasted" logic when larger LUTs are
underutilized.  So it would seem that we have reached the point where
the LUT6 is optimal for many if not the vast majority of designs.

I don't know that there is a performance penalty in using LUT6.  I
would expect that is minimal since the muxes in the LUTs are done with
transmission gates with very little delay, but I don't really know.
If so, the only issue then becomes cost.  So if your design is one of
the minority designs that can indeed be done more efficiently in a
LUT4 architecture, then you will pay a bit more for a LUT6 based
part... but given the advantages of smaller feature size you will
likely get lower costs with the newer parts than sticking with an old
generation.

As to design reworks required to optimize a design for a newer part, I
expect that would be done for speed and/or cost.  My experience is
that Xilinx is more than willing to help you with that, especially if
it means a design win over a competitor.  But would anyone really
expect much lost ground from a LUT4 design to a current LUT6 design?
Software changes can greatly impact results, but I can't see needing
to touch a design from a Spartan 3 to get it to run well in a newer
device given the large improvements in the hardware from using a much
smaller process.  I suppose if you have used hard constraints you may
have to remove them.  But you knew the risk when you used those
features, no?

Rick

Article: 153402
Subject: Re: LUT6 FPGAs and Carry Logic
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Thu, 16 Feb 2012 21:06:17 +0000 (UTC)

rickman <gnuarm@gmail.com> wrote:

(snip, I wrote)
>> For smaller ones, it likely doesn't make so much difference.
>> There is some advantage as far as synthesis software of
>> keeping a minimum number of different architectures.

(snip)
> I believe that is what it comes down to.  Given the fact that routing
> is a huge percentage of the chip area (and so cost) this becomes a
> more important factor as the chips get larger.  After all, routing
> does go up at a faster rate than linear.  So minimizing routing is
> more important in larger chips.  The tradeoff provides for lower 
> costs with LUT6 in larger devices.

> The other side of the coin is more "wasted" logic when larger LUTs are
> underutilized.  So it would seem that we have reached the point where
> the LUT6 is optimal for many if not the vast majority of designs.

One that I am interested in, though, is that 6LUT should be
much better for building the MUX needed for barrel shifters.
A 4LUT makes a two input MUX, but 6LUT can make a 4 input
(and two select line) MUX. Other than that, I haven't thought
much about how useful different sizes are. The less logic
between FF's, the less advantage to larger ones.
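
A minimal sketch of the 4-input MUX mentioned above. The module and port names are assumptions for illustration only. Four data inputs plus two select lines make six inputs in total, which is why the function can fit one 6-input LUT; whether a given tool actually packs it that way depends on the tool and device and is not verified here.

```verilog
// Sketch only: module and port names are hypothetical.
// 4 data inputs + 2 select inputs = 6 inputs, one output,
// i.e. a single 6-input function.
module mux4 (
    input  wire [3:0] d,    // four data inputs
    input  wire [1:0] sel,  // two select lines
    output wire       y     // selected data bit
);
    // Index the data vector with the select lines.
    assign y = d[sel];
endmodule
```

In a barrel shifter, one such MUX per output bit would cover a shift stage with four shift choices, where a 2-input MUX covers only two.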

> I don't know that there is a performance penalty in using LUT6.  I
> would expect that is minimal since the muxes in the LUTs are done with
> transmission gates with very little delay, but I don't really know.
> If so, the only issue then becomes cost.  So if your design is one of
> the minority designs that can indeed be done more efficiently in a
> LUT4 architecture, then you will pay a bit more for a LUT6 based
> part... but given the advantages of smaller feature size you will
> likely get lower costs with the newer parts than sticking with an old
> generation.

Well, they have to be designed not to glitch when switching
between entries with the same output value. That doesn't
naturally happen with an SRAM. Also, with transmission gates
you can't go through too many without a buffer, but presumably
that is part of optimizing the cell.

-- glen

Article: 153403
Subject: Re: LUT6 FPGAs and Carry Logic
From: Kolja Sulimma <ksulimma@googlemail.com>
Date: Thu, 16 Feb 2012 14:39:59 -0800 (PST)

On Feb 16, 4:50 pm, Jan Bruns <jansacco...@arcor.de> wrote:

> Spartan6  LUT6: about  7 ins, about 3 outs = 10 ports
> Spartan3 Slice: about 10 ins, about 6 outs = 16 ports
>
> So from what you said about area with taking routing
> resources into account, the Spartan3 Slice might very
> well consume a little more area, although it has only
> about half the SRAM bits.

No. It will consume a lot more area if you include routing.
Routing grows faster than linear (look up "Rent exponent").
Of course it can cover more flexible circuit areas, because
you can choose many more combinations of input signals
with two 4-LUTs than with one 6-LUT (except if you have
high fan-in random logic). But the area is much larger.

The point is: It does not matter if a LUT-6 on average has
lower utilization, as LUT area is virtually free.  What matters
is routing utilization.

There is research that clearly shows that from an efficiency
standpoint FPGAs are best that can't achieve 100% LUT utilization
because they have sparse routing.

The reasons why vendors choose to provide lots of routing anyway are:
a) Customers don't understand this and tend to start whining when they
don't get 100% LUT utilization instead of being happy that they get
better wire utilization. (Remember: wires are the expensive part.)

b) It gets hard to predict what can be implemented and what can't.

c) Software gets harder to write and slower with worse routing
resources.

So you pay a premium to be able to reliably plan your design and to
simplify marketing.

Back to LUT size: Have a look at figure 3.3 in this:
http://www.eecg.utoronto.ca/~jayar/pubs/theses/Ahmed/EliasAhmed.pdf

area is virtually constant in that analysis for LUT sizes from 4 to 6.
But with LUT size 6 you get much better software runtimes.

Kolja

Article: 153404
Subject: Re: LUT6 FPGAs and Carry Logic
From: Jan Bruns <jansaccount@arcor.de>
Date: 17 Feb 2012 03:07:38 GMT


Kolja Sulimma:

>> Spartan6  LUT6: about  7 ins, about 3 outs = 10 ports
>> Spartan3 Slice: about 10 ins, about 6 outs = 16 ports

>> So from what you said about area with taking routing resources into
>> account, the Spartan3 Slice might very well consume a little more area,
>> although it has only about half the SRAM bits.
 
> No. It will consume a lot more area if you include routing. Routing
> grows faster than linear (look up "Rent exponent"). Of course it can
> cover more flexible circuit areas, because you can choose many more
> combinations of input signals with two 4-LUTs than with one 6-LUT
> (except if you have high fan-in random logic). But the area is much
> larger.

Take some area A of silicon and put n_1 blocks of type T_1 into it.
Take another area A of silicon and put n_2 blocks of a similar type 
T_2 into it.

If  n_1*portcount(T_1)  = n_2*portcount(T_2)  then
portcount(A) won't depend on what blocktype was implemented,
and I don't see any reason why one or the other should consume
more routing overhead.
 
> The point is: It does not matter if a LUT-6 on average has lower
> utilization, as LUT area is virtually free.  What matters is routing
> utilization.

If the utilization of a given LUT goes down, the routing will on average
become less localized, so that wires become longer.
 
> There is research that clearly shows that from an efficiency standpoint
> FPGAs are best that can't achieve 100% LUT utilization because they have
> sparse routing.
> 
> The reasons why vendors choose to provide lots of routing anyway are: a)
> customers don't understand this and tend to start whining when they
> don't get 100% LUT utilization instead of being happy that they get
> better wire utilization. (Remember: wires are the expensive part.)
> 
> b) It gets hard to predict what can be implemented and what can't.
> 
> c) Software gets harder to write and slower with worse routing resources.
> 
> 
> So you pay a premium to be able to reliably plan your design and to
> simplify marketing.
> 
> Back to LUT size: Have a look at figure 3.3 in this:
> http://www.eecg.utoronto.ca/~jayar/pubs/theses/Ahmed/EliasAhmed.pdf
> 

Thanks for sharing that link. 

However, my understanding of that document is that LUT4..6 give
the same overall area utilization, LUT>6 would give the shortest delays,
and LUT4..6 all give the same best area*delay product.

> area is virtually constant in that analysis for LUT sizes from 4 to 6.
> But with LUT size 6 you get much better software runtimes.

Overall area (including routing) doesn't significantly change from LUT4 
to LUT6, and even the delay was similar from LUT4 to LUT6.

But these results don't reflect the fact that the Xilinx LUT4 design
is an enormously good fit for many practically relevant problems (for
example, adders and bus muxes are very frequently used). Even
software-generated technology mapping makes heavy use of these additional
LUT4 features, which are almost free compared to the theoretical, simple
LUT4 design.


The technology mapping might become easier for synthesis software, if
the CLB design comes nearer to the bare LUT (with LUT6, the Carry seems
to become the only additional specialized circuit), but the Xilinx 
software is already able to make good use of their LUT4 specials, it's
only that it doesn't always notice the ideal, obvious solution.

Regards

Jan Bruns

-- 
A few photos: http://abnuto.de/gal/

Article: 153405
Subject: Re: LUT6 FPGAs and Carry Logic
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 17 Feb 2012 05:37:25 +0000 (UTC)

Kolja Sulimma <ksulimma@googlemail.com> wrote:

(snip)
> There is research that clearly shows that from an efficiency
> standpoint FPGAs are best that can't achieve 100% LUT utilization
> because they have sparse routing.

I have done place and route on pipelined arrays with different
numbers of cells per chip, and found that speed goes fairly
close to inversely proportional to the number of cells, over
a fairly wide range.

-- glen

Article: 153406
Subject: Re: LUT6 FPGAs and Carry Logic
From: Jan Bruns <jansaccount@arcor.de>
Date: 17 Feb 2012 06:06:04 GMT


glen herrmannsfeldt:

>(snip)
>> There is research that clearly shows that from an efficiency
>> standpoint FPGAs are best that can't achieve 100% LUT utilization
>> because they have sparse routing.

> I have done place and route on pipelined arrays with different numbers
> of cells per chip, and found that speed goes fairly close to inversely
> proportional to the number of cells, over a fairly wide range.

Some pipeline control signals crossing the data-path and getting slower 
with wider fanouts? 

Regards

Jan Bruns

-- 
A few photos: http://abnuto.de/gal/

Article: 153407
Subject: Re: LUT6 FPGAs and Carry Logic
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 17 Feb 2012 08:53:07 +0000 (UTC)

Jan Bruns <jansaccount@arcor.de> wrote:

(snip, I wrote)
>> I have done place and route on pipelined arrays with different numbers
>> of cells per chip, and found that speed goes fairly close to inversely
>> proportional to the number of cells, over a fairly wide range.

> Some pipeline control signals crossing the data-path and getting slower 
> with wider fanouts? 

It is a linear array of fairly simple cells. I believe it is
that the routes get longer and slower as things get more
tightly packed together.

-- glen

Article: 153408
Subject: Re: LUT6 FPGAs and Carry Logic
From: Jan Bruns <jansaccount@arcor.de>
Date: 17 Feb 2012 09:49:24 GMT

glen herrmannsfeldt:
> Jan Bruns <jansaccount@arcor.de> wrote:

> (snip, I wrote)
>>> I have done place and route on pipelined arrays with different numbers
>>> of cells per chip, and found that speed goes fairly close to inversely
>>> proportional to the number of cells, over a fairly wide range.
> 
>> Some pipeline control signals crossing the data-path and getting slower
>> with wider fanouts?
> 
> It is a linear array of fairly simple cells. I believe it is that the
> routes get longer and slower as things get more tightly packed together.

Just some days ago, I had a similar problem.
There was a horizontal data flow, with the parallel data lines
vertically aligned.

The bottleneck was one CLB column using a couple of "control
signals"  sourced elsewhere.
The timing heavily scaled down with bus size, and timing analysis
showed a couple of ns of routing delay, just for the control.

Luckily, the critical CLB row had some unused regs, so
I used them to replicate the most critical controls.

At first, this didn't work out as expected. It even got worse
than without the replication. This was caused by the way
I've arranged the replicates, with more vertical direct lines 
than available. So the router came up with solutions like routing 
a critical, local CLB signal once around that CLB (a lot of hops 
through a handful of neighbor switch matrices).

A simple rearrangement of the replicate usage, however, fully
solved that further problem (by halving the direct neighbor route
consumption). 

Although there are now some more signals on the switches (remember 
the original signals still need to go to the replicate regs), 
all the replicates now have direct neighbor connects (or better) to 
the LUTs. So timing no longer scales with bus width.
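
A rough sketch of the replication idea described above, not the actual design: one registered copy of a control signal per group of data bits, so that each copy can be placed next to the LUTs it drives. The module name, port names, and the WIDTH and GROUP parameters are all hypothetical, and the 2:1 select stands in for whatever the real data path does.

```verilog
// Sketch only: every name and parameter here is an assumption.
module replicated_ctrl #(
    parameter WIDTH = 32, // data bus width
    parameter GROUP = 8   // data bits served by each control replica
) (
    input  wire             clk,
    input  wire             ctrl_in, // control signal sourced elsewhere
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    output reg  [WIDTH-1:0] q
);
    // Number of replicas, rounded up.
    localparam NREP = (WIDTH + GROUP - 1) / GROUP;

    // One registered copy of the control per group of data bits.
    // Synthesis tools may merge identical registers unless told not to;
    // the attribute for that is tool-specific and is not shown here.
    reg [NREP-1:0] ctrl_rep;

    integer i;
    always @(posedge clk) begin
        // The original signal still has to reach every replica register.
        ctrl_rep <= {NREP{ctrl_in}};
        // Each data bit uses only its local replica.
        for (i = 0; i < WIDTH; i = i + 1)
            q[i] <= ctrl_rep[i / GROUP] ? a[i] : b[i];
    end
endmodule
```

Registering the replicas delays the control by one clock relative to a direct connection, so this form only applies where that extra cycle can be absorbed. Placement of the replicas, which was the deciding factor in the case above, is left to the tools or to placement constraints.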

Regards

Jan Bruns

-- 
A few photos: http://abnuto.de/gal/

Article: 153409
Subject: Re: problem with Global Clock pin and normal IO pin as Clock input
From: "nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com>
Date: Fri, 17 Feb 2012 22:51:45 -0600

>nba83 wrote:
>> hi
>> i am trying to detect falling edge of a 200ns pulse(WriteStrobe)
>> synchronously with this code. GlobalClk is 100MHz(10ns) oscillator clk
>> attached to global clk pin of Xilinx Spartan 3 XC3s400-5I. the problem i am
>> facing is that about 1000 falling edges 100 of them are missed. i used
>> IBUFG at the input clk but the output is the same. but if I connect the
>> oscillator to a normal io pin with the constraint CLOCK_DEDICATED_ROUTE =
>> FALSE; i can detect all the falling edges without error. i don't know
>> what's the problem. any help would be appreciated :)
>> 	 always @(posedge GlobalClk)
>> 	 begin
>> 		pre_WriteStrobe <= WriteStrobe;
>> 		if(  pre_WriteStrobe & ~WriteStrobe)
>> 		begin			
>> 			StartWritingMemory <=1;			
>> 			WriteNibble <=0;
>> 			Write_Address <= 4095;		
>> 		end 
>>         end
>> 
>> 	   
>> 					
>> ---------------------------------------		
>> Posted through http://www.FPGARelated.com
>
>You can't use the asynchronous input in your "if" clause.  To
>properly sense the falling edge of an asynchronous input you
>need two flops and then compare the outputs of those flops.
>If instead you compare the output of the first flop to the
>input signal (yes I know this would have less latency) you
>have the possibility of a vanishingly small time when the
>two signals are different.  Then you don't meet the setup
>time to the registers inside your if statement.
>
>Try this:
>
>reg [1:0] pre_WriteStrobe;
>
>  	 always @(posedge GlobalClk)
>  	 begin
>  		pre_WriteStrobe <= {pre_WriteStrobe[0],WriteStrobe};
>  		if(  pre_WriteStrobe[1] & ~pre_WriteStrobe[0])
>  		begin			
>  			StartWritingMemory <=1;			
>  			WriteNibble <=0;
>  			Write_Address <= 4095;		
>  		end
>          end
>
>
>-- Gabor
>
Hey, it seems that was the problem, because it was solved with this
solution :). But I have a question: this solution is equivalent to lowering
the GlobalClk frequency, yet when I decreased the GlobalClk frequency to
10 MHz the previous code still had errors detecting edges, while the code
you suggested works fine with GlobalClk at 50 MHz. So what may be the
issue?
Thanks in advance for any help.
					
---------------------------------------		
Posted through http://www.FPGARelated.com

Article: 153410
Subject: Re: problem with Global Clock pin and normal IO pin as Clock input
From: KJ <kkjennings@sbcglobal.net>
Date: Sat, 18 Feb 2012 11:51:50 -0800 (PST)

On Feb 17, 11:51 pm, "nba83"
<nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com> wrote:
> this solution is equal to lowering
> GlobalClk frequency,

No it is not.

As Gabor explained, the problem is that the input WriteStrobe is
asynchronous to your clock.  This means that it can change states
likely anywhere within the clock cycle.  Now ask yourself the
question, if the flip flop has a requirement that the input be stable
for some period of time prior to the clock in order to work correctly,
how are you going to meet that requirement for the flip flop that is
the receiver of logic derived from WriteStrobe?

Resynchronizing WriteStrobe to the clock before using it to control
any real logic means that the 'real logic' will have a delayed
WriteStrobe that only changes immediately after the clock and
therefore should meet the timing requirements.

Kevin Jennings

Article: 153411
Subject: Re: problem with Global Clock pin and normal IO pin as Clock input
From: Gabor <gabor@alacron.com>
Date: Sat, 18 Feb 2012 17:45:17 -0800 (PST)

On Feb 18, 2:51 pm, KJ <kkjenni...@sbcglobal.net> wrote:
> On Feb 17, 11:51 pm, "nba83"
>
> <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com> wrote:
> > this solution is equal to lowering
> > GlobalClk frequency,
>
> No it is not.
>
> As Gabor explained, the problem is that the input WriteStrobe is
> asynchronous to your clock.  This means that it can change states
> likely anywhere within the clock cycle.  Now ask yourself the
> question, if the flip flop has a requirement that the input be stable
> for some period of time prior to the clock in order to work correctly,
> how are you going to meet that requirement for the flip flop that is
> the receiver of logic derived from WriteStrobe?
>
> Resynchronizing WriteStrobe to the clock before using it to control
> any real logic means that the 'real logic' will have a delayed
> WriteStrobe that only changes immediately after the clock and
> therefore should meet the timing requirements.
>
> Kevin Jennings

To belabor the point, in your original post you said you missed about
10% of the incoming edges at 100 MHz.  Now imagine that your
flip-flops need 1ns setup time, and you use the original equation
"pre_WriteStrobe & ~WriteStrobe" where WriteStrobe can change
state anywhere within the clock cycle.  If it changes about 1ns
before the clock edge, then the pulse created by
"pre_WriteStrobe & ~WriteStrobe" will only be about 1ns long.
This would happen about 10% of the
time when the clock is 100 MHz.  It would still happen, but only
about 1% of the time at 10 MHz.  Regardless of the clock frequency
you would never get down to 0% missed edges.

When you use two flip-flops, the real trick is that the first flop
is the only one that receives an asynchronous input.  Thus
all flops down the road rely on it to make the decision as to
which clock cycle the input changed.  So when WriteStrobe
changes 1 ns from a clock edge, either the first flop will
change state at that edge or it will change state only at the
following clock edge, but in all cases its output will change
just after a clock edge and every flop that uses its output will
agree on which clock edge the change occurred.  Any
design where more than one flop is involved in deciding
the edge where an asynchronous change happens will
be subject to errors like you have observed.

-- Gabor

Article: 153412
Subject: Re: gigabit ethernet problem
From: "nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com>
Date: Sat, 18 Feb 2012 23:38:06 -0600

Hi,
Has your problem been solved? I'm facing the same problem with a Realtek
RTL8201 chip in the receive section. I connected RXD to TXD and RXDV to TXE
to test the chip in loopback through a Xilinx Spartan 3 xc3s400 FPGA. In this
test I got 4 out of 50 packets with FCS errors at the PC. What's the issue?
Thanks for any help.
>Hi,
>I am using xilinx spartan3 xc3s4000 in my design. It is interfaced with 2
>national Gigabit PHYs. So i receive a packet from phy A and transmit it to
>PHY B and vice versa. Now the problem i am facing is that one of the bytes
>in the packet randomly gets corrupt after a while.. 
>
>First the packet drop was very frequent at high speeds, then i checked the
>power requirements of my PHYs and got to know that my regulator couldn't
>source that much current. Then i changed the regulator and now the problem
>occurs very rarely or it doesnt occur at all.
>
>I have some checks in the RTL to identify if the error is FCS or buffer
>overflow.So every time the packet drops, my fcs flag is raised. So i viewed
>the incoming packet and saw that it always had some random corrupt byte.
>Like i was sending packets with known pattern, so after a while some random
>byte is getting corrupt. I don't know what to look for from now onwards. 
>I thought maybe it was the heat issue so used heat gun but nah it wasn't
>the heat problem.
>My ground noise is 80mv peak-to-peak.
>
>Need some pointers..
>
>Regards
>	   
>					
>---------------------------------------		
>Posted through http://www.FPGARelated.com
>	   
					
---------------------------------------		
Posted through http://www.FPGARelated.com

Article: 153413
Subject: Re: gigabit ethernet problem
From: MK <mk@nospam.co.uk>
Date: Mon, 20 Feb 2012 08:25:33 +0000

On 22/09/2011 20:21, salimbaba wrote:
> Hi,
> I am using xilinx spartan3 xc3s4000 in my design. It is interfaced with 2
> national Gigabit PHYs. So i receive a packet from phy A and transmit it to
> PHY B and vice versa. Now the problem i am facing is that one of the bytes
> in the packet randomly gets corrupt after a while..
>
> First the packet drop was very frequent at high speeds, then i checked the
> power requirements of my PHYs and got to know that my regulator couldn't
> source that much current. Then i changed the regulator and now the problem
> occurs very rarely or it doesnt occur at all.
>
> I have some checks in the RTL to identify if the error is FCS or buffer
> overflow.So every time the packet drops, my fcs flag is raised. So i viewed
> the incoming packet and saw that it always had some random corrupt byte.
> Like i was sending packets with known pattern, so after a while some random
> byte is getting corrupt. I don't know what to look for from now onwards.
> I thought maybe it was the heat issue so used heat gun but nah it wasn't
> the heat problem.
> My ground noise is 80mv peak-to-peak.
>
> Need some pointers..
>
> Regards
> 	
> 					
> ---------------------------------------		
> Posted through http://www.FPGARelated.com
Can you please describe the hardware setup in more detail? Is it your 
own board or a known good board? Has this hardware setup ever worked (i.e. 
been error free)?
Is there any pattern in the 'random' corruption, e.g. is it always bit 0, 
or a 1 seen as 0 (or a 0 seen as 1)? Is it always the nth byte 
in a packet?
Michael Kellett

Article: 153414
Subject: Re: problem with Global Clock pin and normal IO pin as Clock input
From: "Morten Leikvoll" <mleikvol@yahoo.nospam>
Date: Mon, 20 Feb 2012 15:16:58 +0100

"nba83" <nba_baheri@n_o_s_p_a_m.yahoo.com> wrote in message 
news:RaSdnQf1bZqJJ6bSnZ2dnUVZ_uudnZ2d@giganews.com...
> hi
> i am trying to detect falling edge of a 200ns pulse(WriteStrobe)
> synchronously with this code. GlobalClk is 100MHz(10ns) oscillator clk
> attached to global clk pin of Xilinx Spartan 3 XC3s400-5I. the problem i am
> facing is that about 1000 falling edges 100 of them are missed. i used
> IBUFG at the input clk but the output is the same. but if I connect the
> oscillator to a normal io pin with the constraint CLOCK_DEDICATED_ROUTE =
> FALSE; i can detect all the falling edges without error. i don't know
> what's the problem. any help would be appreciated :)
> always @(posedge GlobalClk)
> begin
> pre_WriteStrobe <= WriteStrobe;
> if(  pre_WriteStrobe & ~WriteStrobe)
> begin
> StartWritingMemory <=1;
> WriteNibble <=0;
> Write_Address <= 4095;
> end
>        end
>
>
>
> --------------------------------------- 

In theory this will never work, but statistically you will get an acceptable 
result by reclocking your signal.
Make sure you read up on and understand metastability. The thing is that if your 
first FF tries to grab a signal that may change at an invalid phase (setup 
and hold are violated for the FF), the output of this FF can get into a state where 
the output voltage level is somewhere between 0 and 1. So even the next FF 
may have problems seeing whether this is a 0 or 1. In the worst case it can oscillate. 
The chance of this happening is reduced for every FF you pass, so after 
2-3 FFs you should most likely have an acceptable error rate.
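
A minimal sketch of the reclocking described above, assuming a two-flop synchronizer followed by one more flop for the edge compare. The module and port names are hypothetical and not taken from the original code; the number of stages is a design choice, not a guarantee of any particular error rate.

```verilog
// Sketch only: module and port names are hypothetical.
module strobe_fall_detect (
    input  wire clk,          // sampling clock, e.g. GlobalClk
    input  wire strobe_async, // e.g. WriteStrobe, asynchronous to clk
    output reg  fall_pulse    // high for one clk cycle per falling edge
);
    // sync[0] is the only flop that samples the asynchronous input.
    // sync[1] gives a possibly metastable sync[0] one more cycle to settle.
    // sync[2] holds the previous synchronized value for the edge compare.
    reg [2:0] sync;

    initial begin
        sync       = 3'b000;
        fall_pulse = 1'b0;
    end

    always @(posedge clk) begin
        sync       <= {sync[1:0], strobe_async};
        // Falling edge: previous synchronized sample high, current one low.
        fall_pulse <= sync[2] & ~sync[1];
    end
endmodule
```

Only sync[0] ever sees the raw input, so every downstream flop agrees on which clock edge the change occurred. The price is a few clock cycles of latency between the real edge and fall_pulse.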

Article: 153415
Subject: Re: LUT6 FPGAs and Carry Logic
From: rickman <gnuarm@gmail.com>
Date: Mon, 20 Feb 2012 16:12:27 -0800 (PST)

On Feb 16, 4:06 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> rickman <gnu...@gmail.com> wrote:
>
> (snip, I wrote)
>
> >> For smaller ones, it likely doesn't make so much difference.
> >> There is some advantage as far as synthesis software of
> >> keeping a minimum number of different architectures.
>
> (snip)
>
> > I believe that is what it comes down to.  Given the fact that routing
> > is a huge percentage of the chip area (and so cost) this becomes a
> > more important factor as the chips get larger.  After all, routing
> > does go up at a faster rate than linear.  So minimizing routing is
> > more important in larger chips.  The tradeoff provides for lower
> > costs with LUT6 in larger devices.
> > The other side of the coin is more "wasted" logic when larger LUTs are
> > underutilized.  So it would seem that we have reached the point where
> > the LUT6 is optimal for many if not the vast majority of designs.
>
> One that I am interested in, though, is that 6LUT should be
> much better for building the MUX needed for barrel shifters.
> A 4LUT makes a two input MUX, but 6LUT can make a 4 input
> (and two select line) MUX. Other than that, I haven't thought
> much about how useful different sizes are. The less logic
> between FF's, the less advantage to larger ones.

Yes, the 4LUT can be finagled by using the fourth input as an enable
which is in essence the AND gate of the next mux stage, then you can
use all four inputs of a LUT as the OR gate to combine 8 inputs in two
levels.  So the 4LUT is more like 1.5 2 input muxes.


> > I don't know that there is a performance penalty in using LUT6.  I
> > would expect that is minimal since the muxes in the LUTs are done with
> > transmission gates with very little delay, but I don't really know.
> > If so, the only issue then becomes cost.  So if your design is one of
> > the minority designs that can indeed be done more efficiently in a
> > LUT4 architecture, then you will pay a bit more for a LUT6 based
> > part... but given the advantages of smaller feature size you will
> > likely get lower costs with the newer parts than sticking with an old
> > generation.
>
> Well, they have to be designed not to glitch when switching
> between entries with the same output value. That doesn't
> naturally happen with an SRAM. Also, with transmission gates
> you can't go through too many without a buffer, but presumably
> that is part of optimizing the cell.
>
> -- glen

The glitching is from logic race conditions.  Using transmission gates
pretty much eliminates that as long as you use break before make
connections.  Then the capacitance of the line retains the last value
until the new value comes up.

Rick

Article: 153416
Subject: Re: gigabit ethernet problem
From: "nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com>
Date: Mon, 20 Feb 2012 23:04:58 -0600

I designed and routed the PCB myself. I have one RTL8201BL as the LAN PHY layer and
a Xilinx Spartan 3 XC3s400 as controller. I transmit raw packets. I don't
have any errors when sending packets; I tested the transmit section at 95%
bandwidth without a packet loss. But on receive there are random errors
in the received packets. The bytes that are corrupt are also random and do not
have a pattern. Only when the packet length becomes large (about 1400 bytes)
do the errors occur more frequently (about 3-4 out of 20 packets).
I don't know how I can debug the problem.
Thanks in advance for any help :)

	   
					
---------------------------------------		
Posted through http://www.FPGARelated.com

Article: 153417
Subject: Re: gigabit ethernet problem
From: MK <mk@nospam.co.uk>
Date: Tue, 21 Feb 2012 08:08:58 +0000
Links: << >> << T >> << A >>

On 21/02/2012 05:04, nba83 wrote:
> i designed and rout the pcb board,i have one RTL8201BL as lan phy layer and
> Xilinx Spartan 3 XC3s400 as controller. i transmit raw packets. i don't
> have any error in sending packets, i tested transmit section at 95%
> bandwidth without a packet loss, but at receive there are random error
> receiving packets. the bytes that are corrupt are also random and does not
> have a pattern. only when the packet length become large(about 1400 Bytes),
> the error occur more frequent about (3-4 out of 20 packets),
> i don't know how i can debug the problem,
> tnx in advanced for help :)
>
> 	
> 					
> ---------------------------------------		
> Posted through http://www.FPGARelated.com
I can only make the most general suggestions. How are you generating the 
test packets, and are you sure they are good? Can you check the signal 
integrity on either side of the PHY?
Are the errors similar to those you got when the power supply was poor?
Is the errored byte at the start or end of the packet, or 
just anywhere? Do you ever get more than one error per packet? Does it 
get worse or better with any boundary cases, i.e. does it never fail with 
a minimum-length packet, or fail much more often with maximum length?
Is there any sensitivity to packet contents?
Is there any sensitivity to the rate of packets?

MK

Article: 153418
Subject: Re: gigabit ethernet problem
From: johnp <jprovidenza@yahoo.com>
Date: Tue, 21 Feb 2012 07:50:32 -0800 (PST)
Links: << >> << T >> << A >>

On Feb 20, 9:04 pm, "nba83"
<nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com> wrote:
> i designed and rout the pcb board,i have one RTL8201BL as lan phy layer and
> Xilinx Spartan 3 XC3s400 as controller. i transmit raw packets. i don't
> have any error in sending packets, i tested transmit section at 95%
> bandwidth without a packet loss, but at receive there are random error
> receiving packets. the bytes that are corrupt are also random and does not
> have a pattern. only when the packet length become large(about 1400 Bytes),
> the error occur more frequent about (3-4 out of 20 packets),
> i don't know how i can debug the problem,
> tnx in advanced for help :)
>
> ---------------------------------------
> Posted through http://www.FPGARelated.com

Have you checked all your timing?  Making setup/hold times for the Rx
side of GigE can be tough with a Spartan.  Been there, done that, you
need to be very careful.
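
As a rough illustration of one common way to make receive-side setup/hold timing more predictable, the PHY's receive signals can be registered once, right at the pins, before any other logic sees them. The sketch below assumes the signal names from later in this thread (RTL_RXCLK, RTL_RXD, RTL_RXDV) and a 4-bit MII data bus. The module name is hypothetical, and the `IOB` synthesis attribute is shown in the form the Xilinx tools of that era accepted, so check your tool's documentation before relying on it. The actual setup and hold numbers have to come from the PHY datasheet and be checked with an input offset constraint in static timing; this code does not do that for you.

```verilog
// Sketch: capture MII receive signals in flip-flops next to the pads.
// Assumptions: 4-bit MII data bus, signal names as used in this thread,
// hypothetical module name, Xilinx-style IOB attribute.
module mii_rx_capture (
    input  wire       RTL_RXCLK,  // receive clock from the PHY
    input  wire [3:0] RTL_RXD,    // receive data nibble from the PHY
    input  wire       RTL_RXDV,   // receive data valid from the PHY
    output wire [3:0] rxd_q,      // registered data, RTL_RXCLK domain
    output wire       rxdv_q      // registered valid, RTL_RXCLK domain
);
    // Ask the tools to place these registers in the I/O blocks so the
    // pad-to-flip-flop delay is short and the same for every bit.
    (* IOB = "TRUE" *) reg [3:0] rxd_iob  = 4'd0;
    (* IOB = "TRUE" *) reg       rxdv_iob = 1'b0;

    always @(posedge RTL_RXCLK) begin
        rxd_iob  <= RTL_RXD;
        rxdv_iob <= RTL_RXDV;
    end

    assign rxd_q  = rxd_iob;
    assign rxdv_q = rxdv_iob;
endmodule
```

Everything downstream would then use rxd_q and rxdv_q in the RTL_RXCLK domain instead of the raw pins.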

John P

Article: 153419
Subject: Using both Verilog and VHDL for Xilinx simulation
From: Michael <michael_laajanen@yahoo.com>
Date: Tue, 21 Feb 2012 22:43:57 +0100
Links: << >> << T >> << A >>

Hi,

How do I setup synopsys_sim.setup for simulating both Verilog and VHDL 
using VCS for a Xilinx FPGA?

I need for instance have SIMPRIM point to both the VHDL and the Verilog 
compiled library path, I did try using a : and simply append them but it 
failed.

/michael

Article: 153420
Subject: Re: gigabit ethernet problem
From: "nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com>
Date: Tue, 21 Feb 2012 21:57:47 -0600
Links: << >> << T >> << A >>

>On Feb 20, 9:04 pm, "nba83"
><nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com> wrote:
>> i designed and rout the pcb board,i have one RTL8201BL as lan phy layer and
>> Xilinx Spartan 3 XC3s400 as controller. i transmit raw packets. i don't
>> have any error in sending packets, i tested transmit section at 95%
>> bandwidth without a packet loss, but at receive there are random error
>> receiving packets. the bytes that are corrupt are also random and does not
>> have a pattern. only when the packet length become large(about 1400 Bytes),
>> the error occur more frequent about (3-4 out of 20 packets),
>> i don't know how i can debug the problem,
>> tnx in advanced for help :)
>>
>> ---------------------------------------
>> Posted through http://www.FPGARelated.com
>
>Have you checked all your timing?  Making setup/hold times for the Rx
>side of GigE can be tough with a Spartan.  Been there, done that, you
>need to be very careful.
>
>John P
>

No, I don't know how to do that. I don't know what the setup/hold times for
Rx are. Also, my PHY is 100 Mbit/s, not gigabit: I use the RTL8201BL. I wrote a
simple loopback design in which I connect RXDV to TXE and RXD to TXD, clocked
by RXCLK and TXCLK respectively. Do I need to do any timing work for such a
simple design?
Here is my code:
	
	always @ (posedge RTL_RXCLK)
	begin
		data	<=	RTL_RXD;
		en		<=	RTL_RXDV;
	end

	always @ (posedge RTL_TXCLK)
	begin
		RTL_TXD_I	<=	data;
		RTL_TXE_I	<=	en;
	end
	   
					

Article: 153421
Subject: Re: gigabit ethernet problem
From: "Morten Leikvoll" <mleikvol@yahoo.nospam>
Date: Wed, 22 Feb 2012 08:44:38 +0100
Links: << >> << T >> << A >>

"nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com> wrote in message 
news:H5qdnQ4pDqAm-tnSnZ2dnUVZ_q-dnZ2d@giganews.com...
> here is my code:
>
> always @ (posedge RTL_RXCLK)
> begin
> data <= RTL_RXD;
> en <= RTL_RXDV;
> end
>
> always @ (posedge RTL_TXCLK)
> begin
> RTL_TXD_I <= data;
> RTL_TXE_I <= en;
> end

Huh? I haven't been reading this thread in detail, but how are these clocks 
synchronized? Maybe they aren't, hence your problem.

Article: 153422
Subject: Re: gigabit ethernet problem
From: "nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com>
Date: Wed, 22 Feb 2012 03:35:24 -0600
Links: << >> << T >> << A >>

>"nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com> wrote in message 
>news:H5qdnQ4pDqAm-tnSnZ2dnUVZ_q-dnZ2d@giganews.com...
>> here is my code:
>>
>> always @ (posedge RTL_RXCLK)
>> begin
>> data <= RTL_RXD;
>> en <= RTL_RXDV;
>> end
>>
>> always @ (posedge RTL_TXCLK)
>> begin
>> RTL_TXD_I <= data;
>> RTL_TXE_I <= en;
>> end
>
>Huh? I havent been reading this thread in details, but how are these clocks
>syncronized? Maybe they arent, hence your problem.
>
>
>
>
I wrote it as two processes to make sure that, if the two clocks are not
synchronized, the data is not read or transmitted incorrectly.
					

Article: 153423
Subject: Re: gigabit ethernet problem
From: "Morten Leikvoll" <mleikvol@yahoo.nospam>
Date: Wed, 22 Feb 2012 10:45:33 +0100
Links: << >> << T >> << A >>

"nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com> wrote in message 
news:dtKdneQMFslBK9nSnZ2dnUVZ_hudnZ2d@giganews.com...
> >"nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com> wrote in message
>>news:H5qdnQ4pDqAm-tnSnZ2dnUVZ_q-dnZ2d@giganews.com...
>>> here is my code:
>>>
>>> always @ (posedge RTL_RXCLK)
>>> begin
>>> data <= RTL_RXD;
>>> en <= RTL_RXDV;
>>> end
>>>
>>> always @ (posedge RTL_TXCLK)
>>> begin
>>> RTL_TXD_I <= data;
>>> RTL_TXE_I <= en;
>>> end
>>
>>Huh? I havent been reading this thread in details, but how are these clocks
>>syncronized? Maybe they arent, hence your problem.
>>
>>
>>
>>
> i write it in two processes to ensure if the two clk are not syncronized
> the data won't read and transmited bad.

So you are trying to pass data captured in one clock domain straight into a 
different one? You know that will fail?
If the two clocks are almost the same frequency, it will work for the periods 
when the clock phases are close.
If txclk is derived from rxclk, you may get it to work, but first make a 
common domain.
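
When the two clocks really are independent, a common way to move a data stream between them is a small dual-clock FIFO. The sketch below is a minimal, generic version using Gray-coded pointers and two-flop synchronizers. It is not anyone's tested loopback design from this thread: the module name, port names and parameter defaults are all made up for illustration, there is no reset (it relies on register initial values, which FPGA tools commonly support), and it needs AW >= 2.

```verilog
// Sketch: minimal dual-clock FIFO with Gray-coded pointers.
// Assumptions: hypothetical names; WIDTH=5 would hold one 4-bit MII
// nibble plus a flag bit; no reset, registers start from initial values.
module async_fifo_sketch #(
    parameter WIDTH = 5,
    parameter AW    = 4              // depth is 2**AW entries, AW >= 2
) (
    input  wire             wclk,    // write-side clock (e.g. RX clock)
    input  wire             wen,
    input  wire [WIDTH-1:0] wdata,
    output wire             full,
    input  wire             rclk,    // read-side clock (e.g. TX clock)
    input  wire             ren,
    output reg  [WIDTH-1:0] rdata,   // valid one rclk after an accepted read
    output wire             empty
);
    reg [WIDTH-1:0] mem [0:(1<<AW)-1];

    // Pointers carry one extra bit so full and empty can be told apart.
    reg [AW:0] wbin = 0, wgray = 0;
    reg [AW:0] rbin = 0, rgray = 0;

    // Two-flop synchronizers for the Gray pointers, one per direction.
    reg [AW:0] rgray_w1 = 0, rgray_w2 = 0;   // read pointer seen in wclk
    reg [AW:0] wgray_r1 = 0, wgray_r2 = 0;   // write pointer seen in rclk

    // ---------------- write side (wclk domain) ----------------
    wire        do_write   = wen && !full;
    wire [AW:0] wbin_next  = wbin + (do_write ? 1'b1 : 1'b0);
    wire [AW:0] wgray_next = (wbin_next >> 1) ^ wbin_next;

    always @(posedge wclk) begin
        if (do_write) mem[wbin[AW-1:0]] <= wdata;
        wbin     <= wbin_next;
        wgray    <= wgray_next;
        rgray_w1 <= rgray;
        rgray_w2 <= rgray_w1;
    end

    // Full: Gray pointers match except for the two top bits.
    assign full = (wgray == {~rgray_w2[AW:AW-1], rgray_w2[AW-2:0]});

    // ---------------- read side (rclk domain) -----------------
    wire        do_read    = ren && !empty;
    wire [AW:0] rbin_next  = rbin + (do_read ? 1'b1 : 1'b0);
    wire [AW:0] rgray_next = (rbin_next >> 1) ^ rbin_next;

    always @(posedge rclk) begin
        if (do_read) rdata <= mem[rbin[AW-1:0]];
        rbin     <= rbin_next;
        rgray    <= rgray_next;
        wgray_r1 <= wgray;
        wgray_r2 <= wgray_r1;
    end

    // Empty: read pointer has caught up with the synchronized write pointer.
    assign empty = (rgray == wgray_r2);
endmodule
```

For a loopback like the one in this thread, one possible arrangement would be to write the received nibbles on the receive clock while data valid is high, wait until a few entries have accumulated, and then read them out on the transmit clock while asserting transmit enable. The depth and start threshold needed depend on how far the two clocks can drift apart over one frame, which this sketch does not work out.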

Article: 153424
Subject: Re: gigabit ethernet problem
From: "nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com>
Date: Wed, 22 Feb 2012 05:00:20 -0600
Links: << >> << T >> << A >>

>"nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com> wrote in message 
>news:dtKdneQMFslBK9nSnZ2dnUVZ_hudnZ2d@giganews.com...
>> >"nba83" <nba_baheri@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.com> wrote in message
>>>news:H5qdnQ4pDqAm-tnSnZ2dnUVZ_q-dnZ2d@giganews.com...
>>>> here is my code:
>>>>
>>>> always @ (posedge RTL_RXCLK)
>>>> begin
>>>> data <= RTL_RXD;
>>>> en <= RTL_RXDV;
>>>> end
>>>>
>>>> always @ (posedge RTL_TXCLK)
>>>> begin
>>>> RTL_TXD_I <= data;
>>>> RTL_TXE_I <= en;
>>>> end
>>>
>>>Huh? I havent been reading this thread in details, but how are these clocks
>>>syncronized? Maybe they arent, hence your problem.
>>>
>>>
>>>
>>>
>> i write it in two processes to ensure if the two clk are not syncronized
>> the data won't read and transmited bad.
>
>So you are trying to send some data grabbed in one clock domain into a 
>different? You know that will fail?
>If the clocks are almost similar, it will work for periods when the clock
>phases are close.
>If txclk is derived from rxclk, you may get it to work, but first make a 
>common domain.
>
>
>
>
Thanks for your comment. So what should I do? I have also tested the following
code, but I still get errors in the received packets:
assign RTL_TXE=RTL_RXDV;
assign RTL_TXD=RTL_RXD;

Thanks in advance for any help :)
Neda Baheri
					
