
Messages from 160950

Article: 160950
Subject: Re: Estimating ROM gate count in ASIC
From: gnuarm.deletethisbit@gmail.com
Date: Sun, 30 Dec 2018 13:37:42 -0800 (PST)
On Sunday, December 30, 2018 at 2:52:43 PM UTC-5, Kevin Neilson wrote:
> On Saturday, December 29, 2018 at 9:53:20 AM UTC-7, gnuarm.del...@gmail.com wrote:
> > On Friday, December 28, 2018 at 2:49:37 PM UTC-5, Kevin Neilson wrote:
> > > On Saturday, December 22, 2018 at 12:14:43 PM UTC-7, gnuarm.del...@gmail.com wrote:
> > > > On Thursday, December 6, 2018 at 6:02:25 PM UTC-5, Kevin Neilson wrote:
> > > > > I've searched for this but to no avail.  I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.
> > > > >
> > > > > Yes, I know it's dependent on the contents.  However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.
> > > > >
> > > > > It's easy for me to figure out an upper limit.  A basic ROM is an AND-OR array.  The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns.  Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.
> > > > >
> > > > > What I don't know is how many gates can be eliminated by sharing terms.  As W increases, term sharing should go up.  Again, I'm looking for a *formula*.
> > > >
> > > > That would be pretty easy.  Consider the costs of a D wide multiplexer with 1 or 0 on each input.  That would be an upper bound in any case.
> > > >
> > > > I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.
> > > >
> > > >   Rick C.
> > > >
> > > >   Tesla referral code - https://ts.la/richard11209
> > > >   Get 6 months of free supercharging
> > >
> > > Thanks, but I am looking for an accurate estimate.
> >
> > I'm confused.  Do you want an accurate measurement or an estimate?
> >
> >   Rick C.
> >
> >   - Get 6 months of free supercharging
> >   - Tesla referral code - https://ts.la/richard11209
>
> Both.  An estimate will be very close for large D,W.  If I roll a die 1e6 times, estimating there will be 5e5 heads is pretty accurate.  Estimating there will be less than or equal to the upper bound of 1e6 heads is correct but not helpful.

Good thing you aren't rolling a die.

How much do you think this upper bound will vary from your estimate?  Have you tried any tests?  What is the result of your "estimate" under-estimating?

I think if I were doing this I'd try to get a handle on the expected results and how much they might vary before I start looking for an equation to "estimate" the number.  The one thing you didn't do with the die example is to figure out how much you need to adjust your estimate to get a bound that will include 99.9xxx% of your cases or whatever value you need.  Do you know that equation?
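
One way to put a number on that question for the coin-flip analogy is sketched below in Python. It assumes independent trials with a fixed success probability and relies on the normal approximation to the binomial, which is only reasonable for large n. The function name `upper_bound` and the coverage values are illustrative and do not come from the thread.

```python
from math import sqrt
from statistics import NormalDist

def upper_bound(n, p, coverage):
    """One-sided bound on a Binomial(n, p) count: mean + z * sigma.

    Uses the normal approximation, so treat the result as a rough guide.
    """
    mean = n * p
    sigma = sqrt(n * p * (1.0 - p))
    z = NormalDist().inv_cdf(coverage)   # standard normal quantile
    return mean + z * sigma

# How far above the expected 5e5 the bound must sit for a few coverages.
for cov in (0.99, 0.999, 0.99999):
    print(cov, round(upper_bound(1_000_000, 0.5, cov)))
```

The same idea would apply to a gate-count estimate only if the spread of the synthesized results were known, which is the measurement being asked for here.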


  Rick C.

  + Get 6 months of free supercharging
  + Tesla referral code - https://ts.la/richard11209

Article: 160951
Subject: Re: Estimating ROM gate count in ASIC
From: Kevin Neilson <kevin.neilson@xilinx.com>
Date: Tue, 1 Jan 2019 15:55:53 -0800 (PST)
On Sunday, December 30, 2018 at 2:37:46 PM UTC-7, gnuarm.del...@gmail.com wrote:
> On Sunday, December 30, 2018 at 2:52:43 PM UTC-5, Kevin Neilson wrote:
> > On Saturday, December 29, 2018 at 9:53:20 AM UTC-7, gnuarm.del...@gmail.com wrote:
> > > On Friday, December 28, 2018 at 2:49:37 PM UTC-5, Kevin Neilson wrote:
> > > > On Saturday, December 22, 2018 at 12:14:43 PM UTC-7, gnuarm.del...@gmail.com wrote:
> > > > > On Thursday, December 6, 2018 at 6:02:25 PM UTC-5, Kevin Neilson wrote:
> > > > > > I've searched for this but to no avail.  I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.
> > > > > >
> > > > > > Yes, I know it's dependent on the contents.  However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.
> > > > > >
> > > > > > It's easy for me to figure out an upper limit.  A basic ROM is an AND-OR array.  The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns.  Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.
> > > > > >
> > > > > > What I don't know is how many gates can be eliminated by sharing terms.  As W increases, term sharing should go up.  Again, I'm looking for a *formula*.
> > > > >
> > > > > That would be pretty easy.  Consider the costs of a D wide multiplexer with 1 or 0 on each input.  That would be an upper bound in any case.
> > > > >
> > > > > I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.
> > > > >
> > > > >   Rick C.
> > > > >
> > > > >   Tesla referral code - https://ts.la/richard11209
> > > > >   Get 6 months of free supercharging
> > > >
> > > > Thanks, but I am looking for an accurate estimate.
> > >
> > > I'm confused.  Do you want an accurate measurement or an estimate?
> > >
> > >   Rick C.
> > >
> > >   - Get 6 months of free supercharging
> > >   - Tesla referral code - https://ts.la/richard11209
> >
> > Both.  An estimate will be very close for large D,W.  If I roll a die 1e6 times, estimating there will be 5e5 heads is pretty accurate.  Estimating there will be less than or equal to the upper bound of 1e6 heads is correct but not helpful.
>
> Good thing you aren't rolling a die.
>
> How much do you think this upper bound will vary from your estimate?  Have you tried any tests?  What is the result of your "estimate" under-estimating?
>
> I think if I were doing this I'd try to get a handle on the expected results and how much they might vary before I start looking for an equation to "estimate" the number.  The one thing you didn't do with the die example is to figure out how much you need to adjust your estimate to get a bound that will include 99.9xxx% of your cases or whatever value you need.  Do you know that equation?
>
>
>   Rick C.
>
>   + Get 6 months of free supercharging
>   + Tesla referral code - https://ts.la/richard11209

Sorry, my response could've been more nicely expressed.  I originally thought that ASIC guys must have some formula for this but perhaps not--maybe they just run it through the synthesizer and check.  I'm writing ASIC code but don't have direct access to the synthesizer.  If I did I could maybe plot a few points and fit a curve to it.  I only have access to FPGA synthesizers, and of course they just implement ROMs in LUTs: the LUT count is proportional to the number of bits in the ROM, and there is no logic sharing as the ROM size increases.
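
For reference, the upper limit described in the original question (shared address decoders plus an OR tree per column) can be written down directly. The Python sketch below is only that bound, not the estimate being sought: it assumes one multi-input AND gate per address, 2-input OR gates, and no term sharing, and it ignores the address inverters. The function name and the `ones_fraction` parameter are made up for illustration.

```python
def rom_gate_upper_bound(depth, width, ones_fraction=0.5):
    """Upper bound on gate count for a depth x width ROM as an AND-OR array.

    Assumptions (illustrative): one AND decoder gate per address, shared by
    all columns; each column ORs its '1' rows with a tree of 2-input gates,
    which takes (number of ones - 1) gates. No sharing of OR terms.
    """
    decoder_ands = depth
    ones_per_column = int(depth * ones_fraction)
    ors_per_column = max(ones_per_column - 1, 0)
    return decoder_ands + width * ors_per_column

# Example: a 256-deep, 8-bit-wide ROM with half of the bits set.
print(rom_gate_upper_bound(256, 8))
```

Any formula that accounts for term sharing would have to come in below this number, and the gap between the two is what a few synthesis runs would reveal.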

I did think about the die/coin example and how to find how good the estimate is.  For example, say you flip a coin 1000 times.  My expected number of heads is 500.  Say you want to know how many runs will have between 450 and 550 heads.  You can use the Poisson CDF.  In Matlab/Octave:

  octave:32> n_flips=1000; exp_heads = n_flips*0.5; poisscdf(550,exp_heads)-poisscdf(450,exp_heads)
  ans =  0.97469

So I'll be in that range 97.5% of the time.
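
A note on the model, with a Python sketch for comparison. The number of heads in n fair coin flips is binomially distributed with variance n/4 (250 here), whereas a Poisson distribution with mean 500 has variance 500, so the Poisson figure understates how often the count lands in the range. The helper names below are illustrative, and the Poisson difference is computed the same way as in the Octave line, so it covers 451 to 550.

```python
from math import comb, exp, lgamma, log

def binomial_prob_between(n, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, 1/2), via integer arithmetic."""
    favourable = sum(comb(n, k) for k in range(lo, hi + 1))
    return favourable / 2**n

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); terms are formed in log space."""
    return sum(exp(i * log(lam) - lam - lgamma(i + 1)) for i in range(k + 1))

exact = binomial_prob_between(1000, 450, 550)
approx = poisson_cdf(550, 500.0) - poisson_cdf(450, 500.0)
print(f"binomial: {exact:.5f}  poisson: {approx:.5f}")
```

In Octave the corresponding exact calculation would use the binomial CDF rather than `poisscdf`.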

Article: 160952
Subject: Re: Estimating ROM gate count in ASIC
From: gnuarm.deletethisbit@gmail.com
Date: Tue, 1 Jan 2019 18:48:40 -0800 (PST)
On Tuesday, January 1, 2019 at 6:55:57 PM UTC-5, Kevin Neilson wrote:
> On Sunday, December 30, 2018 at 2:37:46 PM UTC-7, gnuarm.del...@gmail.com wrote:
> > On Sunday, December 30, 2018 at 2:52:43 PM UTC-5, Kevin Neilson wrote:
> > > On Saturday, December 29, 2018 at 9:53:20 AM UTC-7, gnuarm.del...@gmail.com wrote:
> > > > On Friday, December 28, 2018 at 2:49:37 PM UTC-5, Kevin Neilson wrote:
> > > > > On Saturday, December 22, 2018 at 12:14:43 PM UTC-7, gnuarm.del...@gmail.com wrote:
> > > > > > On Thursday, December 6, 2018 at 6:02:25 PM UTC-5, Kevin Neilson wrote:
> > > > > > > I've searched for this but to no avail.  I'd like a function f(D,W), where D=depth and W=width, which provides an estimate of the gate count of a lookup ROM implemented in ASIC gates.
> > > > > > >
> > > > > > > Yes, I know it's dependent on the contents.  However, if half the bits are ones and the contents are randomly distributed, a formula should be pretty accurate.
> > > > > > >
> > > > > > > It's easy for me to figure out an upper limit.  A basic ROM is an AND-OR array.  The D address decoders (comprising ANDs/NOTs) can be shared amongst the W columns.  Each of the W columns would require D/2-1 OR gates if half the ROM bits in each column are 1.
> > > > > > >
> > > > > > > What I don't know is how many gates can be eliminated by sharing terms.  As W increases, term sharing should go up.  Again, I'm looking for a *formula*.
> > > > > >
> > > > > > That would be pretty easy.  Consider the costs of a D wide multiplexer with 1 or 0 on each input.  That would be an upper bound in any case.
> > > > > >
> > > > > > I believe my text book of many years ago used one of the input variables in either true or inverted form combined with 1s and 0s as choices for inputs which simplified the mux by one address input.
> > > > > >
> > > > > >   Rick C.
> > > > > >
> > > > > >   Tesla referral code - https://ts.la/richard11209
> > > > > >   Get 6 months of free supercharging
> > > > >
> > > > > Thanks, but I am looking for an accurate estimate.
> > > >
> > > > I'm confused.  Do you want an accurate measurement or an estimate?
> > > >
> > > >   Rick C.
> > > >
> > > >   - Get 6 months of free supercharging
> > > >   - Tesla referral code - https://ts.la/richard11209
> > >
> > > Both.  An estimate will be very close for large D,W.  If I roll a die 1e6 times, estimating there will be 5e5 heads is pretty accurate.  Estimating there will be less than or equal to the upper bound of 1e6 heads is correct but not helpful.
> >
> > Good thing you aren't rolling a die.
> >
> > How much do you think this upper bound will vary from your estimate?  Have you tried any tests?  What is the result of your "estimate" under-estimating?
> >
> > I think if I were doing this I'd try to get a handle on the expected results and how much they might vary before I start looking for an equation to "estimate" the number.  The one thing you didn't do with the die example is to figure out how much you need to adjust your estimate to get a bound that will include 99.9xxx% of your cases or whatever value you need.  Do you know that equation?
> >
> >
> >   Rick C.
> >
> >   + Get 6 months of free supercharging
> >   + Tesla referral code - https://ts.la/richard11209
>
> Sorry, my response could've been more nicely expressed.  I originally thought that ASIC guys must have some formula for this but perhaps not--maybe they just run it through the synthesizer and check.  I'm writing ASIC code but don't have direct access to the synthesizer.  If I did I could maybe plot a few points and fit a curve to it.  I only have access to FPGA synthesizers and of course they just implement ROMs in LUTs and the LUT count is proportional to the number of bits in the ROM and there is no logic sharing as the ROM size increases.
>
> I did think about the die/coin example and how to find how good the estimate is.  For example, say you flip a coin 1000 times.  My expected number of heads is 500.  Say you want to know how many runs will have between 450 and 550 heads.  You can use the Poisson CDF.  In Matlab/Octave:
>
>   octave:32> n_flips=1000; exp_heads = n_flips*0.5; poisscdf(550,exp_heads)-poisscdf(450,exp_heads)
>   ans =  0.97469
>
> So I'll be in that range 97.5% of the time.

My point is that unless you can come up with a number like this for your estimate, how much good will it do you?  Even if you are right 99% of the time, when designing a chip is that of much value?

I would think knowing the upper bound is a very useful thing indeed when planning an ASIC.  But I don't have any more info on calculating the estimate you say you need, so I can't help you.

  Rick C.

  -- Get 6 months of free supercharging
  -- Tesla referral code - https://ts.la/richard11209

Article: 160953
Subject: Can I use Verilog or SystemVerilog to write a state machine with
From: Weng Tianxiang <wtxwtx@gmail.com>
Date: Fri, 4 Jan 2019 20:29:58 -0800 (PST)
Hi,

Can I use Verilog or SystemVerilog to write a state machine with clock gating function?

I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.

Thank you.

Weng

Article: 160954
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Nicolas Matringe <nicolas.matringe@fre.fre>
Date: Sat, 5 Jan 2019 12:44:37 +0100
On 05/01/2019 05:29, Weng Tianxiang wrote:
> Hi,
> 
> Can I use Verilog or SystemVerilog to write a state machine with clock gating function?
> 
> I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.

Clock gating can be written in any language you like. It's FPGAs that 
don't support clock gating.

Nicolas

Article: 160955
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Weng Tianxiang <wtxwtx@gmail.com>
Date: Sat, 5 Jan 2019 06:18:45 -0800 (PST)
On Saturday, January 5, 2019 at 3:44:39 AM UTC-8, Nicolas Matringe wrote:
> On 05/01/2019 05:29, Weng Tianxiang wrote:
> > Hi,
> >
> > Can I use Verilog or SystemVerilog to write a state machine with clock gating function?
> >
> > I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.
>
> Clock gating can be written in any language you like. It's FPGAs that
> don't support clock gating.
>
> Nicolas

Hi Nicolas,

I am asking if Verilog or SystemVerilog has the ability to automatically generate a state machine with clock gating function without any extra new statements. For example, do they have an attribute such that, if the attribute is set, the state machine generated will have the clock gating function?

At least VHDL-2008 does not have the ability.

Thank you.

Weng

Article: 160956
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Nicolas Matringe <nicolas.matringe@fre.fre>
Date: Sat, 5 Jan 2019 19:18:13 +0100
On 05/01/2019 15:18, Weng Tianxiang wrote:

> Hi Nicolas,
> 
> I am asking if Verilog or SystemVerilog has the ability to automatically generate a state machine with clock gating function without any extra new statements? For example, do they have an attribute if the attribute being set the state machine generated will have the clock gating function?
> 

Well then I don't know what that "clock gating function" is, I'm sorry.

Nicolas

Article: 160957
Subject: Can I use Verilog or SystemVerilog to write a state machine with
From: KJ <kkjennings@sbcglobal.net>
Date: Sat, 5 Jan 2019 10:47:36 -0800 (PST)
Apparently you cannot, but yes it can be done by others. It can also be written in VHDL but apparently you don't like how to do that so you state that it can't be done.  Perhaps you should more clearly state your problem. 

Kevin

Article: 160958
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with clock gating function?
From: Theo <theom+news@chiark.greenend.org.uk>
Date: 05 Jan 2019 21:29:32 +0000 (GMT)
Weng Tianxiang <wtxwtx@gmail.com> wrote:

> I am asking if Verilog or SystemVerilog has the ability to automatically
> generate a state machine with clock gating function without any extra new
> statements?

What do you mean 'extra new statements'?  This looks to me like clock
gating:


input clk;
input enable;
wire gated;

assign gated = clk & enable;

always @(posedge gated) begin
...
end


>  For example, do they have an attribute if the attribute being
> set the state machine generated will have the clock gating function?

I don't know what you mean by that.  (System)Verilog's abstraction doesn't
generate abstract state machines, it just allows you to write them. 
Whatever synthesis tools do with that code is up to them.  I presume tools
could pick up the above style if they so desire (I don't know if any ASIC
tools do but expect they would).

Theo

Article: 160959
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Richard Damon <Richard@Damon-Family.org>
Date: Sat, 5 Jan 2019 17:35:25 -0500
On 1/4/19 11:29 PM, Weng Tianxiang wrote:
> Hi,
> 
> Can I use Verilog or SystemVerilog to write a state machine with clock gating function?
> 
> I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.
> 
> Thank you.
> 
> Weng
> 

One big question is what do you mean by 'clock gating'

As was mentioned, one option for this is to do something like

assign gatedclk = clk & gate;

or sometimes

assign gatedclk = clk | gate;

and then use the gatedclk as the clock. The big issue with this is that
you need to worry about clock skew when you do this, as well as glitches
(the second version works better for gate changing on the rising edge of
clk, but needs to be stable before the falling edge.)
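
The glitch remark can be illustrated with a toy discrete-time model in Python. This is only a sketch: the four-samples-per-period clock and the timing of the gate (changing one sample after a rising edge, as if driven by a flop on clk) are assumptions chosen for illustration, and real hardware behaviour depends on actual delays.

```python
def rising_edges(wave):
    """Return the sample indices at which a 0/1 waveform goes from 0 to 1."""
    return [i for i in range(1, len(wave)) if wave[i - 1] == 0 and wave[i] == 1]

# Four samples per clock period: high for two samples, then low for two.
clk = [1, 1, 0, 0] * 6

# Hypothetical gate signal: it changes at samples 9 and 17, each one sample
# after a rising clock edge, i.e. while clk is still high.
gate = [0] * 9 + [1] * 8 + [0] * 7

and_gated = [c & g for c, g in zip(clk, gate)]   # clk & gate: gate=1 passes clk
or_gated = [c | g for c, g in zip(clk, gate)]    # clk | gate: gate=0 passes clk

print("clk edges:      ", rising_edges(clk))
print("AND-gated edges:", rising_edges(and_gated))
print("OR-gated edges: ", rising_edges(or_gated))
```

In this model the AND version produces an edge that does not line up with any clk edge, plus shortened pulses, while every edge of the OR version coincides with a clk rising edge. If the gate instead changed while clk was low, the roles would swap.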

A second thing called 'clock gating' is to condition the transition on
the gate signal, something like

always @(posedge clk) begin
  if(gate) begin
... state machine here.
  end
end

This makes the machine run on the original clock, but it will only change
state on the cycles where the gate signal is true.
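
A behavioural model of this clock-enable style, sketched in Python. The 3-state ring counter and the gate sequence are hypothetical and serve only to exercise the model.

```python
def run_with_enable(next_state, state, gates):
    """Advance a state machine once per clock edge, but only when gate is true."""
    trace = [state]
    for gate in gates:
        if gate:                      # mirrors `if(gate)` inside the clocked block
            state = next_state(state)
        trace.append(state)           # state is held on cycles where gate is false
    return trace

# Hypothetical 3-state ring counter: 0 -> 1 -> 2 -> 0.
trace = run_with_enable(lambda s: (s + 1) % 3, 0, [1, 0, 0, 1, 1, 0])
print(trace)
```

The clock itself still toggles every cycle in this scheme; only the state update is conditional.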

VHDL can do the same.

There is no need for a 'special statement', you just do it.  If doing
the first version, of actually gating the clock, you may want to use
some implementation defined macro function to buffer the clock and put
it into a low skew distribution network, like may have been done for the
original clock.

Article: 160960
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Weng Tianxiang <wtxwtx@gmail.com>
Date: Sat, 5 Jan 2019 17:23:39 -0800 (PST)
On Saturday, January 5, 2019 at 2:35:28 PM UTC-8, Richard Damon wrote:
> On 1/4/19 11:29 PM, Weng Tianxiang wrote:
> > Hi,
> > 
> > Can I use Verilog or SystemVerilog to write a state machine with clock gating function?
> > 
> > I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.
> > 
> > Thank you.
> > 
> > Weng
> > 
> 
> One big question is what do you mean by 'clock gating'
> 
> As was mentioned, one option for this is to do something like
> 
> assign gatedclk = clk & gate;
> 
> or sometimes
> 
> assign gatedclk = clk | gate;
> 
> and then us the gatedclk as the clock. The big issue with this is that
> you need to worry about clock skew when you do this, as well as glitches
> (the second version works better for gate changing on the rising edge of
> clk, but needs to be stable before the falling edge.)
> 
> A second thing called 'clock gating' is to condition the transition on
> the gate signal, something like
> 
> always @(posedge clk) begin
>   if(gate) begin
> ... state machine here.
>   end
> end
> 
> This make the machine run on the original clock, but it will only change
> on the cycles where the gate signal is true.
> 
> VHDL can do the same.
> 
> There is no need for a 'special statement', you just do it.  If doing
> the first version, of actually gating the clock, you may want to use
> some implementation defined macro function to buffer the clock and put
> it into a low skew distribution network, like may have been done for the
> original clock.

Hi Theo and Richard,

Thank you for your help.

The purpose of a clock gating function is to save power. The reason I ask the question is this:

A cache line in Cache I, Cache II or even Cache III of a CPU usually has 64 (2**6) bytes, and each cache line must have a state machine to keep its data coherent in all situations.

For a 6M (2**22 + 2**21) byte Cache II (the largest I have seen on the current market), a CPU must have at least (2**16 + 2**15) state machines, ~= 100,000, and those ~100,000 state machines don't change state most of the time.
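
The line count follows directly from the two sizes given above, as this small Python check shows. The constant names are illustrative.

```python
CACHE_BYTES = 2**22 + 2**21   # 6M bytes, as stated in the post
LINE_BYTES = 2**6             # 64-byte cache line

# One state machine per cache line, following the post's premise.
lines = CACHE_BYTES // LINE_BYTES
print(lines, lines == 2**16 + 2**15)
```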

In this situation, each of the ~100,000 state machines, each having more than 10 states, must have a clock gating function to save power:

when a state machine will not change state on the next cycle, no clock pulse should be generated, so that the state stays unchanged and power is saved.

Do you think this is reasonable?

For an application implemented in an FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.

After posting, I realized how to implement the power-saving scheme in VHDL, as follows:

type STATE_TYPE is (s0, s1, ..., Sn);

signal WState, WState_NS: STATE_TYPE;

...;
a: process(clk)
begin
   if rising_edge(clk) then
      if SINI then
         WState <= S0;

      elsif WState /= WState_NS then --  WState /= WState_NS is necessary!
         WState <= WState_NS;
      end if;
   end if;
end process;

b: process(all)
begin
   case WState is
      when S0 =>
         if C00 then
            WState_NS <= S1;

         elsif C01 then
            WState_NS <= S2;

         else
            WState_NS <= S0;
         end if;

      ...;
   end case;
end process;
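
The effect of the `WState /= WState_NS` comparison can be modelled in Python. In this sketch the register contents are the same with or without the guard, since holding a value and reloading the same value are indistinguishable, but the guard exposes an explicit enable condition and makes it possible to count how many clock edges are actually needed. Whether a synthesis tool turns that enable into a gated clock is tool dependent and is not shown here. The S0 transitions follow the VHDL above; the behaviour of the other states and the input names are made-up placeholders.

```python
def next_state(state, cond):
    # S0 branches follow the post; S1/S2 behaviour is a hypothetical placeholder.
    if state == "S0":
        return {"C00": "S1", "C01": "S2"}.get(cond, "S0")
    return "S0" if cond == "done" else state

def simulate(inputs, reset_state="S0"):
    """Step the FSM and count the clock edges on which the state really changes."""
    plain = reset_state        # register that loads WState_NS on every edge
    guarded = reset_state      # register that loads only when WState /= WState_NS
    needed_edges = 0
    for cond in inputs:
        ns = next_state(plain, cond)
        if guarded != ns:      # the comparison acts as a clock enable
            guarded = ns
            needed_edges += 1
        plain = ns
        assert plain == guarded   # both registers hold the same value every cycle
    return guarded, needed_edges

inputs = ["idle"] * 5 + ["C00"] + ["idle"] * 3 + ["done"] + ["idle"] * 4
final, edges = simulate(inputs)
print(final, edges, "of", len(inputs), "cycles")
```

With a mostly idle input stream like this one, only a small fraction of the cycles need a clock edge at the state register, which is the saving the post is after.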

Thank you.

Weng

Article: 160961
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Richard Damon <Richard@Damon-Family.org>
Date: Sat, 5 Jan 2019 21:28:31 -0500
On 1/5/19 8:23 PM, Weng Tianxiang wrote:
> On Saturday, January 5, 2019 at 2:35:28 PM UTC-8, Richard Damon wrote:
>> On 1/4/19 11:29 PM, Weng Tianxiang wrote:
>>> Hi,
>>>
>>> Can I use Verilog or SystemVerilog to write a state machine with clock gating function?
>>>
>>> I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.
>>>
>>> Thank you.
>>>
>>> Weng
>>>
>>
>> One big question is what do you mean by 'clock gating'
>>
>> As was mentioned, one option for this is to do something like
>>
>> assign gatedclk = clk & gate;
>>
>> or sometimes
>>
>> assign gatedclk = clk | gate;
>>
>> and then us the gatedclk as the clock. The big issue with this is that
>> you need to worry about clock skew when you do this, as well as glitches
>> (the second version works better for gate changing on the rising edge of
>> clk, but needs to be stable before the falling edge.)
>>
>> A second thing called 'clock gating' is to condition the transition on
>> the gate signal, something like
>>
>> always @(posedge clk) begin
>>   if(gate) begin
>> ... state machine here.
>>   end
>> end
>>
>> This make the machine run on the original clock, but it will only change
>> on the cycles where the gate signal is true.
>>
>> VHDL can do the same.
>>
>> There is no need for a 'special statement', you just do it.  If doing
>> the first version, of actually gating the clock, you may want to use
>> some implementation defined macro function to buffer the clock and put
>> it into a low skew distribution network, like may have been done for the
>> original clock.
> 
> Hi Theo and Richard,
> 
> Thank you for your help.
> 
> Using clock gating function is to save power consumption. Why I ask the question is:
> 
> A cache line in Cache I, Cache II or even Cache III in a CPU usually has 64 (2**6) bytes and each cache line must have a state machine to keep data coherence among data over all situations. 
> 
> For a 6M (2**22 + 2**21) bytes cache II (the most I have seen in current market) a CPU must have at least (2**16 + 2**15) state machines, ~= 100,000, and those ~100,000 state machines don't change states most of time.
> 
> In above situation each of the ~100,000 state machines with each having more than 10 states must have a clock gating function to save power consumption: 
> 
> when it will not change states on the next cycle, a clock pulse should not be generated to keep the state unchanged and save power consumption.
> 
> Do you think if it is reasonable?
> 
> For an application implemented in a FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.
> > 
> Thank you.
> 
> Weng
> 

One issue with gated clocks is that each gating of the clock needs to be
considered a different clock domain from every other gating of the clock
and from the ungated clock, because the gating (and rebuffering) of the
clock introduces a delay in the clock, so you need to take precautions
when the signal passes from one domain to another. A FPGA might have,
and a gate array may provide a special circuit to generate a set of
gated clocks that will be kept in good enough alignment to not need
this, but then that would be a special application macro that needs to
be instanced.

Second, the difference in power consumption between my first and second
methods (actual gating of the clock and using a clock enable) is primarily
in the power to drive the clock line, as the clock enable also keeps the
state the same in the 'skipped' clock cycle.

Article: 160962
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Weng Tianxiang <wtxwtx@gmail.com>
Date: Sun, 6 Jan 2019 09:08:28 -0800 (PST)
On Saturday, January 5, 2019 at 6:28:35 PM UTC-8, Richard Damon wrote:
> On 1/5/19 8:23 PM, Weng Tianxiang wrote:
> > On Saturday, January 5, 2019 at 2:35:28 PM UTC-8, Richard Damon wrote:
> >> On 1/4/19 11:29 PM, Weng Tianxiang wrote:
> >>> Hi,
> >>>
> >>> Can I use Verilog or SystemVerilog to write a state machine with clock gating function?
> >>>
> >>> I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.
> >>>
> >>> Thank you.
> >>>
> >>> Weng
> >>>
> >>
> >> One big question is what do you mean by 'clock gating'
> >>
> >> As was mentioned, one option for this is to do something like
> >>
> >> assign gatedclk = clk & gate;
> >>
> >> or sometimes
> >>
> >> assign gatedclk = clk | gate;
> >>
> >> and then us the gatedclk as the clock. The big issue with this is that
> >> you need to worry about clock skew when you do this, as well as glitches
> >> (the second version works better for gate changing on the rising edge of
> >> clk, but needs to be stable before the falling edge.)
> >>
> >> A second thing called 'clock gating' is to condition the transition on
> >> the gate signal, something like
> >>
> >> always @(posedge clk) begin
> >>   if(gate) begin
> >> ... state machine here.
> >>   end
> >> end
> >>
> >> This make the machine run on the original clock, but it will only change
> >> on the cycles where the gate signal is true.
> >>
> >> VHDL can do the same.
> >>
> >> There is no need for a 'special statement', you just do it.  If doing
> >> the first version, of actually gating the clock, you may want to use
> >> some implementation defined macro function to buffer the clock and put
> >> it into a low skew distribution network, like may have been done for the
> >> original clock.
> > 
> > Hi Theo and Richard,
> > 
> > Thank you for your help.
> > 
> > Using clock gating function is to save power consumption. Why I ask the question is:
> > 
> > A cache line in Cache I, Cache II or even Cache III in a CPU usually has 64 (2**6) bytes and each cache line must have a state machine to keep data coherence among data over all situations. 
> > 
> > For a 6M (2**22 + 2**21) bytes cache II (the most I have seen in current market) a CPU must have at least (2**16 + 2**15) state machines, ~= 100,000, and those ~100,000 state machines don't change states most of time.
> > 
> > In above situation each of the ~100,000 state machines with each having more than 10 states must have a clock gating function to save power consumption: 
> > 
> > when it will not change states on the next cycle, a clock pulse should not be generated to keep the state unchanged and save power consumption.
> > 
> > Do you think if it is reasonable?
> > 
> > For an application implemented in a FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.
> > > 
> > Thank you.
> > 
> > Weng
> > 
> 
> One issue with gated clocks is that each gating of the clock needs to be
> considered a different clock domain from every other gating of the clock
> and from the ungated clock, because the gating (and rebuffering) of the
> clock introduces a delay in the clock, so you need to take precautions
> when the signal passes from one domain to another. A FPGA might have,
> and a gate array may provide a special circuit to generate a set of
> gated clocks that will be kept in good enough alignment to not need
> this, but then that would be a special application macro that needs to
> be instanced.
> 
> Second, the power consumption between my first and second method (actual
> gating of the clock and using a clock enable) is primarily in the power
> to drive the clock line as the clock enable also keeps the state the
> same in the 'skipped' clock cycle.

Hi Richard,

There are 2 things to consider when generating a clock gating function:
1. Generate the CE logic.
2. Make the gated clock signal work properly.

You address part 2) and I emphasize part 1).

Is it complex to generate the CE logic?

In my understanding, generating a clock pulse consumes more power than skipping the clock pulse.

I want to know whether each of a CPU's ~100,000 state machine implementations actually has a clock gating function.

Based on your code, I think it is reasonable to assume that each of them does.

Only CPU designers know their implementation. I need the information.

Thank you.

Weng

Article: 160963
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Richard Damon <Richard@Damon-Family.org>
Date: Sun, 6 Jan 2019 13:20:09 -0500
Links: << >>  << T >>  << A >>
On 1/6/19 12:08 PM, Weng Tianxiang wrote:
> On Saturday, January 5, 2019 at 6:28:35 PM UTC-8, Richard Damon wrote:
>> On 1/5/19 8:23 PM, Weng Tianxiang wrote:
>>> On Saturday, January 5, 2019 at 2:35:28 PM UTC-8, Richard Damon wrote:
>>>> On 1/4/19 11:29 PM, Weng Tianxiang wrote:
>>>>> Hi,
>>>>>
>>>>> Can I use Verilog or SystemVerilog to write a state machine with clock gating function?
>>>>>
>>>>> I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Weng
>>>>>
>>>>
>>>> One big question is what do you mean by 'clock gating'
>>>>
>>>> As was mentioned, one option for this is to do something like
>>>>
>>>> assign gatedclk = clk & gate;
>>>>
>>>> or sometimes
>>>>
>>>> assign gatedclk = clk | gate;
>>>>
>>>> and then us the gatedclk as the clock. The big issue with this is that
>>>> you need to worry about clock skew when you do this, as well as glitches
>>>> (the second version works better for gate changing on the rising edge of
>>>> clk, but needs to be stable before the falling edge.)
>>>>
>>>> A second thing called 'clock gating' is to condition the transition on
>>>> the gate signal, something like
>>>>
>>>> always @(posedge clk) begin
>>>>   if(gate) begin
>>>> ... state machine here.
>>>>   end
>>>> end
>>>>
>>>> This make the machine run on the original clock, but it will only change
>>>> on the cycles where the gate signal is true.
>>>>
>>>> VHDL can do the same.
>>>>
>>>> There is no need for a 'special statement', you just do it.  If doing
>>>> the first version, of actually gating the clock, you may want to use
>>>> some implementation defined macro function to buffer the clock and put
>>>> it into a low skew distribution network, like may have been done for the
>>>> original clock.
>>>
>>> Hi Theo and Richard,
>>>
>>> Thank you for your help.
>>>
>>> Using clock gating function is to save power consumption. Why I ask the question is:
>>>
>>> A cache line in Cache I, Cache II or even Cache III in a CPU usually has 64 (2**6) bytes and each cache line must have a state machine to keep data coherence among data over all situations. 
>>>
>>> For a 6M (2**22 + 2**21) bytes cache II (the most I have seen in current market) a CPU must have at least (2**16 + 2**15) state machines, ~= 100,000, and those ~100,000 state machines don't change states most of time.
>>>
>>> In above situation each of the ~100,000 state machines with each having more than 10 states must have a clock gating function to save power consumption: 
>>>
>>> when it will not change states on the next cycle, a clock pulse should not be generated to keep the state unchanged and save power consumption.
>>>
>>> Do you think if it is reasonable?
>>>
>>> For an application implemented in a FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.
>>>>
>>> Thank you.
>>>
>>> Weng
>>>
>>
>> One issue with gated clocks is that each gating of the clock needs to be
>> considered a different clock domain from every other gating of the clock
>> and from the ungated clock, because the gating (and rebuffering) of the
>> clock introduces a delay in the clock, so you need to take precautions
>> when the signal passes from one domain to another. A FPGA might have,
>> and a gate array may provide a special circuit to generate a set of
>> gated clocks that will be kept in good enough alignment to not need
>> this, but then that would be a special application macro that needs to
>> be instanced.
>>
>> Second, the power consumption between my first and second method (actual
>> gating of the clock and using a clock enable) is primarily in the power
>> to drive the clock line as the clock enable also keeps the state the
>> same in the 'skipped' clock cycle.
> 
> Hi Richard,
> 
> There are 2 things to consider on how to generate a clock gating function:
> 1. Generate CE logic.
> 2. Make gated clock signal working properly.
> 
> You address the part 2) and I emphasize on the part 1). 
> 
> Is it complex to generate CE logic?
> 
> In my understanding generating a clock pulse is consuming more power than skipping the clock pulse.
> 
> I want to know if each of CPU ~100,000 state machine implementation actually has clock gating function. 
> 
> Based on your code I think it is reasonable to think each of CPU ~100,000 state machine implementation actually has clock gating function.
> 
> Only CPU designers know their implementation. I need the information.
> 
> Thank you.
> 
> Weng
> 

Actually gating the clock is a single gate (but then in an ASIC it can't
drive much logic, so things start to get more complicated). Making it
work gets things much more complicated, and probably gets you out of the
domain of portable Verilog or VHDL. That is the nature of clock trees.

Thus, step one is in a sense trivial if you are ignoring step two, but
doing step one while ignoring step two is worthless.

I personally don't know whether it is simpler/better to add the clock
enable functionality to the flip flops or gate the clock and deal with
all the timing/buffering issues, and it wouldn't surprise me if it
turned out that which is better very much depends on the process and
other criteria.

The only real answer would be to talk to the process people, but my
guess is that the answer is very much proprietary, and unless it looks
like you are willing and planning on spending the big bucks to actually
do this, won't waste their time talking about it.

Article: 160964
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: gnuarm.deletethisbit@gmail.com
Date: Sun, 6 Jan 2019 11:23:06 -0800 (PST)
Links: << >>  << T >>  << A >>
On Sunday, January 6, 2019 at 1:20:16 PM UTC-5, Richard Damon wrote:
> Actually gating the clock is a single gate (but then in an ASIC it can't
> drive much logic, so things start to get more complicated). Making it
> work gets things much more complicated, and probably gets you out of the
> domain of portable Verilog or VHDL. That is the nature of clock trees.
>
> Thus, step one is in a sense trivial if you are ignoring step two, but
> doing step one while ignoring step two is worthless.
>
> I personally don't know whether it is simpler/better to add the clock
> enable functionality to the flip flops or gate the clock and deal with
> all the timing/buffering issues, and it wouldn't surprise me if it
> turned out that which is better very much depends on the process and
> other criteria.
>
> The only real answer would be to talk to the process people, but my
> guess is that the answer is very much proprietary, and unless it looks
> like you are willing and planning on spending the big bucks to actually
> do this, won't waste their time talking about it.

Sure, in full custom ASICs it is not uncommon to gate the clock.  In fast chips the clock tree design can consume half the dynamic power in the chip.  So gating the clock can bring significant power savings.  However, the clock gating being described here is over far too small a portion of the chip to be effective on many levels if I understand what is going on.  The OP is talking about 100,000 identical state machines, one for each cache item.  I believe what he is talking about as FSMs are really just a handful of FFs but I'm not sure.  If so, the clock gating logic is nearly as large and so would consume nearly as much power and area as the logic it is controlling.

Will it be practical to design 100,000 clock gating circuits to control 100,000 tiny FSMs?  Maybe I am wrong about the size of the FSMs.  Or maybe it would be practical to combine the clock gating to many of the 100,000 FSMs so they are shut off in large blocks?  I don't know, but the OP seems preoccupied with the idea of this being a language feature rather than a design feature added by the user.  I'm sure he wants to produce an idea using a library or something that he can patent.  That seems to be his MO.  Oh well...

  Rick C.


Article: 160965
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Weng Tianxiang <wtxwtx@gmail.com>
Date: Sun, 6 Jan 2019 13:30:23 -0800 (PST)
Links: << >>  << T >>  << A >>
On Sunday, January 6, 2019 at 11:23:10 AM UTC-8, gnuarm.del...@gmail.com wrote:
> Sure, in full custom ASICs it is not uncommon to gate the clock.  In fast chips the clock tree design can consume half the dynamic power in the chip.  So gating the clock can bring significant power savings.  However, the clock gating being described here is over far too small a portion of the chip to be effective on many levels if I understand what is going on.  The OP is talking about 100,000 identical state machines, one for each cache item.  I believe what he is talking about as FSMs are really just a handful of FFs but I'm not sure.  If so, the clock gating logic is nearly as large and so would consume nearly as much power and area as the logic it is controlling.
>
> Will it be practical to design 100,000 clock gating circuits to control 100,000 tiny FSMs?  Maybe I am wrong about the size of the FSMs.  Or maybe it would be practical to combine the clock gating to many of the 100,000 FSMs so they are shut off in large blocks?  I don't know, but the OP seems preoccupied with the idea of this being a language feature rather than a design feature added by the user.  I'm sure he wants to produce an idea using a library or something that he can patent.  That seems to be his MO.  Oh well...
>
>   Rick C.

Hi Rick,
You misunderstand and ~100,000 state machines are even coded as the same but with different input signals and output signals, act differently and you cannot "combine the clock gating to many of the 100,000 FSMs". Each has more than 10 states, so each state machine must have 4 registers to implement and each has its clock gating logic and clock gating device.
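
The figures used in this exchange can be checked with a few lines of arithmetic. The following is only a minimal Python sketch, assuming a 6 MiB cache with 64-byte lines and taking 11 states as the smallest case of "more than 10 states". The function names are hypothetical, and nothing here says how a real CPU implements coherence state.

```python
def cache_line_count(cache_bytes: int, line_bytes: int) -> int:
    """Number of cache lines, i.e. one state machine per line in this argument."""
    return cache_bytes // line_bytes


def state_register_bits(num_states: int, one_hot: bool = False) -> int:
    """Flip-flops needed to hold the state, for binary or one-hot encoding."""
    if one_hot:
        return num_states
    # ceil(log2(num_states)) computed with integers only
    return max(1, (num_states - 1).bit_length())


lines = cache_line_count(6 * 2**20, 64)   # 6 MiB = 2**22 + 2**21 bytes
bits = state_register_bits(11)            # smallest case of "more than 10 states"
print(lines, bits, lines * bits)          # machines, FFs per machine, total state FFs
```

With binary encoding, 4 flip-flops cover anything from 9 to 16 states, which is where the "4 registers" figure comes from.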

Weng

Article: 160966
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Richard Damon <Richard@Damon-Family.org>
Date: Sun, 6 Jan 2019 18:22:19 -0500
Links: << >>  << T >>  << A >>
On 1/6/19 4:30 PM, Weng Tianxiang wrote:
> Hi Rick,
> You misunderstand and ~100,000 state machines are even coded as the same but with different input signals and output signals, act differently and you cannot "combine the clock gating to many of the 100,000 FSMs". Each has more than 10 states, so each state machine must have 4 registers to implement and each has its clock gating logic and clock gating device.
> 
> Weng

If you are really talking gating for 4 FFs, then my guess is that using
clock-enabled FFs would be much simpler and probably better than trying
to gate the clock and keeping things synchronized.

The big issue would be that to make the gated clocking work you may need
to double the clock distribution tree, one for an 'early' clock that is to
be gated, and a second 'late' clock that the ungated parts of the system
use and that will line up with the gated clocks. This need for the second
clock distribution tree probably eats up more power than you are saving
by stopping the clock to those flip flops.

The primary alternative to two clocks would be running on opposite edges
(so skew isn't as much of a problem), but that then limits the speed the
system can run at.
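
To make the comparison concrete, here is a rough cycle-level sketch in Python, not a timing model. It assumes an ideal glitch-free gate and ignores skew, buffering and the second clock tree discussed above. The function names and the stimulus are hypothetical. It only shows that both schemes hold the same state while differing in how many clock edges reach the flip-flops.

```python
def run_clock_enable(stimulus, reset_state=0):
    """Register with a clock enable: the clock pin toggles every cycle,
    but the state is only updated when enable is true."""
    state, clock_edges = reset_state, 0
    for enable, next_state in stimulus:
        clock_edges += 1              # free-running clock reaches the FFs
        if enable:
            state = next_state
    return state, clock_edges


def run_gated_clock(stimulus, reset_state=0):
    """Register behind an ideal clock gate: the FFs only see an edge
    when enable is true, and then always load next_state."""
    state, clock_edges = reset_state, 0
    for enable, next_state in stimulus:
        if enable:
            clock_edges += 1          # gate passes this clock pulse
            state = next_state
    return state, clock_edges


# Hypothetical stimulus: one (enable, next_state) pair per cycle, mostly idle.
stimulus = [(False, 0)] * 6 + [(True, 3)] + [(False, 3)] * 8 + [(True, 7)]
print(run_clock_enable(stimulus))
print(run_gated_clock(stimulus))
```

The final state is identical in both cases. The only difference the model captures is the edge count at the flip-flop clock pins, which is exactly the part that has to be weighed against the cost of the gate and the extra clock distribution.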

Article: 160967
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Weng Tianxiang <wtxwtx@gmail.com>
Date: Sun, 6 Jan 2019 17:59:24 -0800 (PST)
Links: << >>  << T >>  << A >>
I want to use my method in all types of circuits. A clock gating device is basically a latch. A FF with a clock enable input is a FF having a latch. Thank you.

Article: 160968
Subject: Re: Estimating ROM gate count in ASIC
From: Thomas Stanka <usenet_nospam_valid@stanka-web.de>
Date: Sun, 6 Jan 2019 23:38:31 -0800 (PST)
Links: << >>  << T >>  << A >>
On Wednesday, 2 January 2019 00:55:57 UTC+1, Kevin Neilson wrote:

> So I'll be in that range 97.5% of the time.

I think you did not understand my posting. There is no way of correctly estimating the synthesis result of a ROM by a simple formula, unless you build a model of the synthesis tool itself into this formula.

A change of 1 data word in a ROM with 1024 words could significantly change the synthesis result, by more than 1%.

There is a simple upper bound and a simple lower bound, but real ROMs are always in between.
The simple upper bound is calculated by building the DNF (disjunctive normal form) for the ROM in the given technology, and the lower bound is 1 gate per bit (tie0 or tie1) for stand-alone synthesis.
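
As a rough illustration of the two bounds, here is a minimal Python sketch. It assumes the ROM is given as a list of integers and counts canonical DNF product terms rather than technology gates, since the gate cost of each term depends on the library. The function name and the example contents are hypothetical.

```python
def rom_bounds(words, data_width):
    """Return (lower, upper) rough bounds for a ROM given as a list of ints.

    lower: one tie cell per output bit (the case where every bit is constant).
    upper: number of product terms in the canonical DNF, i.e. one term
           for every address at which an output bit is 1.
    """
    lower = data_width
    upper = sum((word >> bit) & 1
                for word in words
                for bit in range(data_width))
    return lower, upper


rom = [0b00, 0b01, 0b10, 0b11]   # hypothetical 4-word, 2-bit ROM
print(rom_bounds(rom, 2))
```

Turning the upper figure into a gate count still needs the width of each AND term (the address width) and the OR tree per output bit for the chosen technology.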

Take for example the S-Boxes of the DES encryption algorithm. These are lookup tables (ROMs) designed not to be easily reduced by synthesis tools. You could synthesize one of these S-Boxes 10 times with different seeds and would not get 2 identical results for the same S-Box.

If you have no access to an ASIC synthesis tool, then download a free FPGA synthesis tool and test the results for FPGA synthesis. This gives at least a feeling for how synthesis tools deal with your ROM.

bye Thomas

Article: 160969
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Thomas Stanka <usenet_nospam_valid@stanka-web.de>
Date: Sun, 6 Jan 2019 23:48:29 -0800 (PST)
Links: << >>  << T >>  << A >>
On Saturday, 5 January 2019 05:30:08 UTC+1, Weng Tianxiang wrote:
> Can I use Verilog or SystemVerilog to write a state machine with clock gating function?
>
> I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.

All languages support clock gating when it is explicitly expressed, and no language has an implicit statement for it.
This is because the clock is not really anything special in the language [1], and clock gating has several side effects that need to be dealt with during layout. But in many cases you need to deal with some implications of clock gating during the architectural design phase when writing the code.

[1] rising_edge(enable) and rising_edge(clock) are no different to the language, but give very different results when using synthesis tools

bye Thomas

Article: 160970
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: KJ <kkjennings@sbcglobal.net>
Date: Mon, 7 Jan 2019 05:10:59 -0800 (PST)
Links: << >>  << T >>  << A >>
On Saturday, January 5, 2019 at 8:23:43 PM UTC-5, Weng Tianxiang wrote:
>
> In above situation each of the ~100,000 state machines with each having more than 10 states must have a clock gating function to save power consumption:

That is your unsubstantiated claim, not a fact.

>
> when it will not change states on the next cycle, a clock pulse should not be generated to keep the state unchanged and save power consumption.
>

Any perceived lower power consumption has very, very little to do with the fact that the state does not change.  A flip flop that is clocked but does not happen to change its output does not consume much power.  The power is needed to charge/discharge the loads that are being driven.  Any decreased power consumption would have to do with the decrease in power in generating the clock input to the flip flop.  But shifting from a common clock to adding a gate that generates a clock probably does not lower power since the same number of clock signals are being generated.  If the gated clock routing is a higher capacitive route than when using a free-running clock then you can consume more power.  This is the result when trying to implement gated clocks in FPGA.  ASIC will be different.
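
The charge/discharge argument is the usual first-order switching power relation, P = a * C * Vdd^2 * f, where a is the fraction of cycles in which the node makes a power-consuming transition. The Python sketch below only illustrates that relation. The capacitance, voltage and frequency values are made up, and the function name is hypothetical.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order switching power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical numbers, for illustration only.
clock_net = dynamic_power(1.0, 2e-12, 1.0, 1e9)     # clock net switches every cycle
idle_output = dynamic_power(0.01, 2e-12, 1.0, 1e9)  # FF output that rarely changes
print(clock_net, idle_output)
```

With equal load, the net that toggles every cycle dominates, which is why any saving has to come from the clock path rather than from the state bits that were not going to change anyway.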

>
> For an application implemented in a FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.
>
As I pointed out to you back in 2010 (I think), implementing what you describe in an FPGA results in an increase in power consumption.  I provided you with all of the details for your sample design.  The results of that analysis are not "because too few state machines are implemented", it is because gated clocks in FPGA use more power, not less.  Again, that was with your sample design of that time which appears to be the same thing you are reusing here.

> Actually I realized how to implement the power consumption scheme in VHDL as follows after the post is posted:
>
I noticed that you did not show the actual gating of the clock, only the apparent usage of a possibly free running clock.

> a: process(clk)
> begin
>    if rising_edge(clk) then

Also, the following 'elsif' is not necessary even though your comment says it is.  No worries though, synthesis tools should optimize out the 'elsif' and leave the assignment 'WState <= WState_NS;' on every clock.  If the tool somehow leaves it in, then there will be an increase in power consumption due to use of additional logic required to implement 'elsif WState /= WState_NS then'.  That increase would need to be counted against any power savings that you think you're achieving.  Again, it would probably be worthwhile for you to do some analysis prior to posting and claiming...but after all these years of not acting on this advice it doesn't appear that you're willing to make that behavioral change.
>       elsif WState /= WState_NS then --  WState /= WState_NS is necessary!
>          WState <= WState_NS;
>       end if;
>    end if;
> end process;

I suspect that you did not actually test any of this prior to posting and claiming since the code is not complete and does not compile...as usual.

Kevin

Article: 160971
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Weng Tianxiang <wtxwtx@gmail.com>
Date: Mon, 7 Jan 2019 11:21:46 -0800 (PST)
Links: << >>  << T >>  << A >>
On Monday, January 7, 2019 at 5:11:05 AM UTC-8, KJ wrote:
> I suspect that you did not actually test any of this prior to posting and=
 claiming since the code is not complete and does not compile...as usual.
>=20
> Kevin

Hi,

There are several experts responding to my post. Thank you. Notably, I do not see Hans of www.ht-lab.com giving his opinion. Usually his opinion is reasonable and informative, and he knows many things outside the FPGA chips that are beyond my knowledge.

Here is the background for the purpose of my post:
1. On 12/31/2018 I filed a non-provisional patent application. I asked for early publication. The publication will happen about 14 weeks after the filing date.

2. On 01/06/2019 I sent almost the same version as a regular paper to IEEE Transactions on Circuits and Systems for publication. The review process may take up to 3 months.

Because of the IEEE Transactions' strict restrictions on a paper's originality, I cannot disclose any details about my invention until the journal either agrees to publish my paper 3 months from now or rejects it within 1 or 2 weeks.

Here are some facts about my invention:
1. The logic used to generate a state machine with clock gating devices is almost the same as what the conventional method would generate, or maybe even simpler.

2. I don't know how a CPU deals with the clocking scheme for the 100,000*4 FFs used in the state machines for the Cache II control. If the designers don't care about the power saving, or they have already implemented some scheme, my invention would be of little value; otherwise it would be worth millions of dollars.

3. The purpose of my post is to test whether such an invention is of any value, not to discuss how to implement a state machine with clock gating function.

4. After my application is published 3 months from now, I will immediately register and offer the application for sale at http://www.ast.com/interested-in-selling-to-ast/. I know the website because Google refers to it and indicates they are a member of the site. I expect that Intel, IBM, AMD, and Apple may also be members. The site asks for the selling price during registration, so it is important for me to assess my invention's value properly.

5. I think no developers at Intel, IBM, AMD, or Apple would visit this website, not to mention take part in the discussion of my post.

6. I hope to discuss the invention in more detail 3 months from now, before my registration on the patent selling website.

7. A Xilinx chip has a clock enable signal built into its cell block, with one CE input for 8 registers in the block. Altera may be in the same situation. So clock enable is nothing new and we don't have to pay attention to how the clock trees work. For a CPU design, in my opinion, logic design and clock tree design are 2 separate domains handled one after another, and logic designers never have to pay attention to the clock trees.

Thank you.

Weng


Article: 160972
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: gnuarm.deletethisbit@gmail.com
Date: Tue, 8 Jan 2019 09:37:41 -0800 (PST)
Links: << >>  << T >>  << A >>
On Sunday, January 6, 2019 at 4:30:27 PM UTC-5, Weng Tianxiang wrote:
> On Sunday, January 6, 2019 at 11:23:10 AM UTC-8, gnuarm.del...@gmail.com wrote:
> > On Sunday, January 6, 2019 at 1:20:16 PM UTC-5, Richard Damon wrote:
> > > On 1/6/19 12:08 PM, Weng Tianxiang wrote:
> > > > On Saturday, January 5, 2019 at 6:28:35 PM UTC-8, Richard Damon wrote:
> > > >> On 1/5/19 8:23 PM, Weng Tianxiang wrote:
> > > >>> On Saturday, January 5, 2019 at 2:35:28 PM UTC-8, Richard Damon wrote:
> > > >>>> On 1/4/19 11:29 PM, Weng Tianxiang wrote:
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> Can I use Verilog or SystemVerilog to write a state machine with clock gating function?
> > > >>>>>
> > > >>>>> I know VHDL has no such function and want to know if Verilog or SystemVerilog has the clock gating function for a state machine.
> > > >>>>>
> > > >>>>> Thank you.
> > > >>>>>
> > > >>>>> Weng
> > > >>>>>
> > > >>>>
> > > >>>> One big question is what do you mean by 'clock gating'
> > > >>>>
> > > >>>> As was mentioned, one option for this is to do something like
> > > >>>>
> > > >>>> assign gatedclk = clk & gate;
> > > >>>>
> > > >>>> or sometimes
> > > >>>>
> > > >>>> assign gatedclk = clk | gate;
> > > >>>>
> > > >>>> and then use the gatedclk as the clock. The big issue with this is that
> > > >>>> you need to worry about clock skew when you do this, as well as glitches
> > > >>>> (the second version works better for gate changing on the rising edge of
> > > >>>> clk, but needs to be stable before the falling edge.)
> > > >>>>
> > > >>>> A second thing called 'clock gating' is to condition the transition on
> > > >>>> the gate signal, something like
> > > >>>>
> > > >>>> always @(posedge clk) begin
> > > >>>>   if(gate) begin
> > > >>>> ... state machine here.
> > > >>>>   end
> > > >>>> end
> > > >>>>
> > > >>>> This makes the machine run on the original clock, but it will only change
> > > >>>> on the cycles where the gate signal is true.
> > > >>>>
> > > >>>> VHDL can do the same.
> > > >>>>
> > > >>>> There is no need for a 'special statement', you just do it.  If doing
> > > >>>> the first version, of actually gating the clock, you may want to use
> > > >>>> some implementation defined macro function to buffer the clock and put
> > > >>>> it into a low skew distribution network, like may have been done for the
> > > >>>> original clock.
> > > >>>
> > > >>> Hi Theo and Richard,
> > > >>>
> > > >>> Thank you for your help.
> > > >>>
> > > >>> Using clock gating function is to save power consumption. The reason I ask the question is:
> > > >>>
> > > >>> A cache line in Cache I, Cache II or even Cache III in a CPU usually has 64 (2**6) bytes and each cache line must have a state machine to keep data coherence among data over all situations.
> > > >>>
> > > >>> For a 6M (2**22 + 2**21) bytes cache II (the most I have seen in current market) a CPU must have at least (2**16 + 2**15) state machines, ~= 100,000, and those ~100,000 state machines don't change states most of the time.
> > > >>>
> > > >>> In above situation each of the ~100,000 state machines with each having more than 10 states must have a clock gating function to save power consumption:
> > > >>>
> > > >>> when it will not change states on the next cycle, a clock pulse should not be generated to keep the state unchanged and save power consumption.
> > > >>>
> > > >>> Do you think it is reasonable?
> > > >>>
> > > >>> For an application implemented in a FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.
> > > >>>>
> > > >>> Thank you.
> > > >>>
> > > >>> Weng
> > > >>>
> > > >>
> > > >> One issue with gated clocks is that each gating of the clock needs to be
> > > >> considered a different clock domain from every other gating of the clock
> > > >> and from the ungated clock, because the gating (and rebuffering) of the
> > > >> clock introduces a delay in the clock, so you need to take precautions
> > > >> when the signal passes from one domain to another. A FPGA might have,
> > > >> and a gate array may provide a special circuit to generate a set of
> > > >> gated clocks that will be kept in good enough alignment to not need
> > > >> this, but then that would be a special application macro that needs to
> > > >> be instanced.
> > > >>
> > > >> Second, the difference in power consumption between my first and second
> > > >> method (actual gating of the clock and using a clock enable) is primarily
> > > >> in the power to drive the clock line as the clock enable also keeps the
> > > >> state the same in the 'skipped' clock cycle.
> > > >
> > > > Hi Richard,
> > > >
> > > > There are 2 things to consider on how to generate a clock gating function:
> > > > 1. Generate CE logic.
> > > > 2. Make gated clock signal working properly.
> > > >
> > > > You address the part 2) and I emphasize on the part 1).
> > > >
> > > > Is it complex to generate CE logic?
> > > >
> > > > In my understanding generating a clock pulse is consuming more power than skipping the clock pulse.
> > > >
> > > > I want to know if each of CPU ~100,000 state machine implementation actually has clock gating function.
> > > >
> > > > Based on your code I think it is reasonable to think each of CPU ~100,000 state machine implementation actually has clock gating function.
> > > >
> > > > Only CPU designers know their implementation. I need the information.
> > > >
> > > > Thank you.
> > > >
> > > > Weng
> > > >
> > >
> > > Actually gating the clock is a single gate (but then in an ASIC it can't
> > > drive much logic, so things start to get more complicated). Making it
> > > work gets things much more complicated, and probably gets you out of the
> > > domain of portable Verilog or VHDL. That is the nature of clock trees.
> > >
> > > Thus, step one is in a sense trivial if you are ignoring step two, but
> > > doing step one while ignoring step two is worthless.
> > >
> > > I personally don't know whether it is simpler/better to add the clock
> > > enable functionality to the flip flops or gate the clock and deal with
> > > all the timing/buffering issues, and it wouldn't surprise me if it
> > > turned out that which is better very much depends on the process and
> > > other criteria.
> > >
> > > The only real answer would be to talk to the process people, but my
> > > guess is that the answer is very much proprietary, and unless it looks
> > > like you are willing and planning on spending the big bucks to actually
> > > do this, they won't waste their time talking about it.
> >
> > Sure, in full custom ASICs it is not uncommon to gate the clock.  In fast chips the clock tree design can consume half the dynamic power in the chip.  So gating the clock can bring significant power savings.  However, the clock gating being described here is over far too small a portion of the chip to be effective on many levels if I understand what is going on.  The OP is talking about 100,000 identical state machines, one for each cache item.  I believe what he is talking about as FSMs are really just a handful of FFs but I'm not sure.  If so, the clock gating logic is nearly as large and so would consume nearly as much power and area as the logic it is controlling.
> >
> > Will it be practical to design 100,000 clock gating circuits to control 100,000 tiny FSMs?  Maybe I am wrong about the size of the FSMs.  Or maybe it would be practical to combine the clock gating to many of the 100,000 FSMs so they are shut off in large blocks?  I don't know, but the OP seems preoccupied with the idea of this being a language feature rather than a design feature added by the user.  I'm sure he wants to produce an idea using a library or something that he can patent.  That seems to be his MO.  Oh well...
> >
> >   Rick C.
> >
> >   - Get 6 months of free supercharging
> >   - Tesla referral code - https://ts.la/richard11209
>
> Hi Rick,
> You misunderstand. The ~100,000 state machines are all coded the same, but with different input and output signals they act differently, and you cannot "combine the clock gating to many of the 100,000 FSMs". Each has more than 10 states, so each state machine must have 4 registers to implement and each has its clock gating logic and clock gating device.
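
As a quick check on the figures quoted above (a 6M-byte cache, 64-byte lines, more than 10 states per FSM), the arithmetic can be sketched in a few lines of Python. The variable names are illustrative, and a binary-encoded state register is an assumption; a one-hot encoding would need one FF per state instead.

```python
# Sanity check of the numbers quoted in the thread (names are illustrative).
CACHE_BYTES = 2**22 + 2**21      # 6M bytes, as quoted
LINE_BYTES = 2**6                # 64-byte cache line, as quoted
MIN_STATES = 11                  # "more than 10 states" means at least 11

fsm_count = CACHE_BYTES // LINE_BYTES          # one FSM per cache line
# Assumption: binary-encoded state register, so ceil(log2(states)) bits.
ffs_per_fsm = (MIN_STATES - 1).bit_length()
total_ffs = fsm_count * ffs_per_fsm

print(fsm_count, ffs_per_fsm, total_ffs)       # 98304 4 393216
```

So "~100,000 state machines" and "100,000*4 FFs" are consistent with the stated cache size, and 4 bits cover anything from 9 to 16 states.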

So you know what the clock gating circuitry would look like?  Try comparing that circuit to the FSM circuit.  You will see they are comparable in size and the gating circuit adds to the timing delay as well.

Please keep in mind that the 4 FFs in a single FSM can be lumped together with the 4 FFs from another FSM in your analysis to consider them to be a single FSM for the purposes of clock gating.  When any one FSM is active you can make the entire circuit active.  This still retains the clock power savings for all the remaining 99,998 FSMs not in that circuit.
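
A minimal Python sketch of the lumping idea above, assuming a hypothetical per-cycle `activity` table (True when an FSM needs to change state) and 4 FFs per FSM. The function `clock_pulses` is illustrative only: it counts how many FF clock pins would see a pulse, which is a crude stand-in for clock power and ignores the gating logic itself.

```python
from typing import List


def clock_pulses(activity: List[List[bool]], group_size: int, ffs_per_fsm: int = 4) -> int:
    """Count FF clock pulses when FSMs share one clock gate per group.

    activity[c][i] is True when FSM i needs to change state in cycle c.
    A group's clock is enabled in a cycle if any FSM in the group is active.
    """
    pulses = 0
    for cycle in activity:
        for start in range(0, len(cycle), group_size):
            group = cycle[start:start + group_size]
            if any(group):                      # OR of the per-FSM requests
                pulses += len(group) * ffs_per_fsm
    return pulses


# Four FSMs over three cycles; only FSM 0 and FSM 3 ever change state.
activity = [
    [True,  False, False, False],
    [False, False, False, False],
    [False, False, False, True],
]
free_running = len(activity) * len(activity[0]) * 4   # every FF clocked every cycle
per_fsm = clock_pulses(activity, group_size=1)
paired = clock_pulses(activity, group_size=2)
print(free_running, per_fsm, paired)                  # 48 8 16
```

When two FSMs in the same group are active on the same clock, the group enable is simply the OR of their requests, so both are clocked by the one pulse. The cost of grouping in this model is the extra pulses delivered to idle FSMs in an active group.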

I'm not sure this will provide much in the way of logic savings.  But I am confident no one is going to want to implement clock gating circuits for each of the 100,000 FSMs independently.  But then it seems they are scrounging around for ways to improve power consumption of CPUs these days and there are lots of transistors available.  I'm also a guy who thought cell phones would not be widely accepted. lol


  Rick C.

  + Get 6 months of free supercharging
  + Tesla referral code - https://ts.la/richard11209

Article: 160973
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: Weng Tianxiang <wtxwtx@gmail.com>
Date: Tue, 8 Jan 2019 10:26:38 -0800 (PST)
Links: << >>  << T >>  << A >>
If 2 state machines as you suggested may be active on the same clock, how do you handle it using your scheme?

Article: 160974
Subject: Re: Can I use Verilog or SystemVerilog to write a state machine with
From: gnuarm.deletethisbit@gmail.com
Date: Tue, 8 Jan 2019 10:32:30 -0800 (PST)
Links: << >>  << T >>  << A >>
On Monday, January 7, 2019 at 2:21:51 PM UTC-5, Weng Tianxiang wrote:
> On Monday, January 7, 2019 at 5:11:05 AM UTC-8, KJ wrote:
> > On Saturday, January 5, 2019 at 8:23:43 PM UTC-5, Weng Tianxiang wrote:
> > >
> > > In above situation each of the ~100,000 state machines with each having more than 10 states must have a clock gating function to save power consumption:
> >
> > That is your unsubstantiated claim, not a fact.
> >
> > >
> > > when it will not change states on the next cycle, a clock pulse should not be generated to keep the state unchanged and save power consumption.
> > >
> >
> > Any perceived lower power consumption has very, very little to do with the fact that the state does not change.  A flip flop that is clocked but does not happen to change its output does not consume much power.  The power is needed to charge/discharge the loads that are being driven.  Any decreased power consumption would have to do with the decrease in power in generating the clock input to the flip flop.  But shifting from a common clock to adding a gate that generates a clock probably does not lower power since the same number of clock signals are being generated.  If the gated clock routing is a higher capacitive route than when using a free-running clock then you can consume more power.  This is the result when trying to implement gated clocks in FPGA.  ASIC will be different.
> >
> > >
> > > For an application implemented in a FPGA chip, the clock gating function may not be necessary because too few state machines are implemented in any normal application.
> > >
> > As I pointed out to you back in 2010 (I think), implementing what you describe in an FPGA results in an increase in power consumption.  I provided you with all of the details for your sample design.  The results of that analysis are not "because too few state machines are implemented", it is because gated clocks in FPGA use more power, not less.  Again, that was with your sample design of that time which appears to be the same thing you are reusing here.
> >
> > > Actually I realized how to implement the power consumption scheme in VHDL as follows after the post is posted:
> > >
> > I noticed that you did not show the actual gating of the clock, only the apparent usage of a possibly free running clock.
> >
> > > a: process(clk)
> > > begin
> > >    if rising_edge(clk) then
> >
> > Also, the following 'elsif' is not necessary even though your comment says it is.  No worries though, synthesis tools should optimize out the 'elsif' and leave the assignment 'WState <= WState_NS;' on every clock.  If the tool somehow leaves it in, then there will be an increase in power consumption due to use of additional logic required to implement 'elsif WState /= WState_NS then'.  That increase would need to be counted against any power savings that you think you're achieving.  Again, it would probably be worthwhile for you to do some analysis prior to posting and claiming...but after all these years of not acting on this advice it doesn't appear that you're willing to make that behavioral change.
> > >       elsif WState /= WState_NS then --  WState /= WState_NS is necessary!
> > >          WState <= WState_NS;
> > >       end if;
> > >    end if;
> > > end process;
> >
> > I suspect that you did not actually test any of this prior to posting and claiming since the code is not complete and does not compile...as usual.
> >
> > Kevin
>=20
> Hi,
>
> There are several experts responding to my post. Thank you. Notably, I do not see Hans of www.ht-lab.com giving his opinion. Usually his opinion is reasonable and informative, and he knows many things outside the FPGA chips that are beyond my knowledge.
>
> Here is the background for the purpose of my post:
> 1. On 12/31/2018 I filed a non-provisional patent application. I asked for early publication. The publication will happen about 14 weeks after the filing date.
>
> 2. On 01/06/2019 I sent almost the same version as a regular paper to IEEE Transactions on Circuits and Systems for publication. The review process may take up to 3 months.
>
> Because of the IEEE Transactions' strict restrictions on a paper's originality, I cannot disclose any details about my invention until the journal either agrees to publish my paper 3 months from now or rejects it within 1 or 2 weeks.
>
> Here are some facts about my invention:
> 1. The logic used to generate a state machine with clock gating devices is almost the same as what the conventional method would generate, or maybe even simpler.

I think you missed the mark by a wide margin on this one.  The logic needed for the clock gating is this...

elsif WState /= WState_NS then

This is not so trivial compared to the FSM itself, especially in an ASIC.  I would estimate it is approximately the same amount of logic in general.
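
To put a rough number on that comparison logic, here is a Python sketch that models 'WState /= WState_NS' for a 4-bit state register as one XOR per bit followed by an OR reduction. The function names and the 2-input gate count are assumptions for illustration; a real synthesis tool and cell library may map the logic differently.

```python
def state_differs(state: int, next_state: int, width: int = 4) -> bool:
    """Bit-level model of 'WState /= WState_NS': XOR each bit pair, OR the results."""
    diff = 0
    for bit in range(width):
        a = (state >> bit) & 1
        b = (next_state >> bit) & 1
        diff |= a ^ b          # one XOR per bit, folded into an OR chain
    return bool(diff)


def comparator_gate_count(width: int) -> int:
    """2-input gates in a plain XOR/OR comparator: width XORs + (width - 1) ORs."""
    return width + (width - 1)


print(state_differs(0b0101, 0b0101))   # False: no state change, no clock pulse needed
print(state_differs(0b0101, 0b0111))   # True: state changes, clock pulse needed
print(comparator_gate_count(4))        # 7
```

Under these assumptions that is seven 2-input gates, plus the clock gating cell itself, sitting beside a 4-FF state register.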


> 2. I don't know how a CPU deals with the clocking scheme for the 100,000*4 FFs used in the state machines for the Cache II control. If the designers don't care about the power saving, or they have already implemented some scheme, my invention would be of little value; otherwise it would be worth millions of dollars.

For a patent to be valid it has to be non-obvious to a practitioner in the field.  I don't know how this is non-obvious to someone in the field of CPU design.  You may obtain a patent, but then lose a patent defense case in court.  But again, I didn't think cell phones would take off and now I have two.


> 3. The purpose of my post is to test whether such an invention is of any value, not to discuss how to implement a state machine with clock gating function.

What exactly is your "invention"???  Clock gating is nothing new.  It is applied to many parts of a CPU.  Is your invention the idea of applying it to the individual FSMs in a CPU cache?  So if someone instead applies it to groupings of FSMs in a CPU cache they will have worked around your patent.


> 4. After my application is published 3 months from now, I will immediately register and offer the application for sale at http://www.ast.com/interested-in-selling-to-ast/. I know the website because Google refers to it and indicates they are a member of the site. I expect that Intel, IBM, AMD, and Apple may also be members. The site asks for the selling price during registration, so it is important for me to assess my invention's value properly.

What value have you assessed so far?


> 5. I think no developers at Intel, IBM, AMD, or Apple would visit this website, not to mention take part in the discussion of my post.
>
> 6. I hope to discuss the invention in more detail 3 months from now, before my registration on the patent selling website.
>
> 7. A Xilinx chip has a clock enable signal built into its cell block, with one CE input for 8 registers in the block. Altera may be in the same situation. So clock enable is nothing new and we don't have to pay attention to how the clock trees work. For a CPU design, in my opinion, logic design and clock tree design are 2 separate domains handled one after another, and logic designers never have to pay attention to the clock trees.

Clock enable and clock gating are not the same thing.  Clock enable saves power by not changing the FF state, but if the FF input is the same as the output the state won't change anyway.
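
The distinction can be made concrete with a small behavioral model in Python. This is only a sketch under simple assumptions: a single D flip-flop, one enable value per cycle, and a count of clock edges reaching the FF clock pin as a stand-in for clock power. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def run_clock_enable(d_seq: Iterable[int], en_seq: Iterable[int], q0: int = 0) -> Tuple[int, int]:
    """FF with a clock enable: the clock pin toggles every cycle, Q updates only when en is 1."""
    q, edges = q0, 0
    for d, en in zip(d_seq, en_seq):
        edges += 1            # free-running clock always reaches the FF
        if en:
            q = d
    return q, edges


def run_gated_clock(d_seq: Iterable[int], en_seq: Iterable[int], q0: int = 0) -> Tuple[int, int]:
    """FF behind a clock gate: the clock pin only sees an edge when en is 1."""
    q, edges = q0, 0
    for d, en in zip(d_seq, en_seq):
        if en:
            edges += 1        # gated clock pulses only on enabled cycles
            q = d
    return q, edges


d = [1, 0, 1, 1, 0]
en = [1, 0, 0, 1, 0]
print(run_clock_enable(d, en))   # (1, 5)
print(run_gated_clock(d, en))    # (1, 2)
```

Both versions end in the same state; the only difference in this model is how many clock edges reach the flip-flop.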

Here is something to consider.  Clock gating saves power compared to clock enabling by reducing the power consumed in the clock tree.  How much of the clock tree will you actually be gating with a fine grained approach?  Clock trees are exponential structures with a multiplier for the fan out at each level.  With this fine grain approach you are only saving power in the final level and in fact, may be adding a level if your clock gating control is at a finer resolution than the last level of clock drive.

Generally clock gating is used at a high level to gate the clock to sections of a chip.  I expect it is seldom if ever used at a low level because the power saved is not optimal and the logic required is maximal.

  Rick C.

  -- Get 6 months of free supercharging
  -- Tesla referral code - https://ts.la/richard11209


