Messages from 124325

Article: 124325
Subject: Population Count circuit
From: acd <acd4usenet@lycos.de>
Date: Tue, 18 Sep 2007 12:00:06 -0700

Out of curiosity:
I am currently studying some problems which require many
population-count operations of various sizes.

I have used population count in a course I taught last year, as it
can demonstrate various design styles (function, recursive function,
nested GENERATE statements, etc.) in VHDL.
I used a tree of full adders and the performance on a Spartan 3 as
reported by the Xilinx tools was moderate (using pipelining).
I think that one should get a much faster implementation even on an
FPGA if one uses redundant representations, very much like multipliers
do (which essentially contain several population-count circuits for the
partial products).

So, FPGA experts: is that true, and from which pipeline depth/input size
is it worthwhile?
How does the cost/performance compare to a full-adder implementation?

Theoreticians: I could not find a complexity expression for this
operation.
But it sounds like textbook knowledge. Maybe I have the wrong
textbooks.

Andreas

Article: 124326
Subject: Re: Guess: what is the largest number of state machines in a current
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Tue, 18 Sep 2007 11:01:04 -0800

comp.arch.fpga wrote:

(snip)

> But to give you the benefit of the doubt and return to your original
> question:
> Smith-Waterman hardware implementations instantiate tens of thousands
> of identical state machines.

Like the ones that Paracel used to build and sell as real systems?

> Oh, wait, the implementation that we used had no reset. Damn.

Which one was that?

-- glen

Article: 124327
Subject: Re: Altera / Lattice / Xilinx CPLDs ?
From: Dave Pollum <vze24h5m@verizon.net>
Date: Tue, 18 Sep 2007 12:16:44 -0700

On Sep 17, 12:14 pm, "Amontec, Larry"
<laurent.ga...@ANTI-SPAMamontec.com> wrote:
> Hi
>
> We are searching a small CPLD gate count like a coolrunner 128.
>
> - Two IO banks 1.4V to 3.3V with 5V tolerant
> - VCC should be 3.3V or 1.8V
>
> The 5V tolerant is important !
>
> Volume : 5000 - 10000 pces
>
> Any CPLDs ?
>
> Regards,
> Laurent

Laurent;
1) The Xilinx XC9500XL CPLDs have a VCCINT of 3.3V; the I/Os accept 5V,
3.3V, and 2.5V inputs and drive 2.5V or 3.3V outputs; each I/O has
input hysteresis (all info from the Xilinx datasheet).  The XC95144XL is
the closest part to your needs.  Altera probably has something similar.
2) Use a CoolRunner-II (CRII) along with voltage translators.
-Dave Pollum

Article: 124328
Subject: Re: Guess: what is the largest number of state machines in a current
From: Jim Granville <no.spam@designtools.maps.co.nz>
Date: Wed, 19 Sep 2007 07:46:25 +1200

Symon wrote:

> "Weng Tianxiang" <wtxwtx@gmail.com> wrote in message 
> news:1189988802.612765.289620@50g2000hsm.googlegroups.com...
> 
>>Weng
>>
> 
> 
> IF OP = "Weng Tianxiang" AND group = comp_arch_fpga THEN
>   be_prepared_for_a_long_thread;
> ORIF crossposted = to_comp_lang_vhdl THEN
>   this_could_go_on_all_week;
> ANDIF both_the_above THEN
>   make_that_a_month;
> BUTIF plonk! THEN
>   blessed_relief;
> ELSIF experiences < imagination THEN
>   OP_question <= not(sense);
> ELSE
>   possibly_on_topic;
> END IF;
> 
> HTH., Syms. ;-)
> 
> p.s. Sorry, couldn't resist it!

:)

Whoa - hang on there, Syms !!!!

What's this ANDIF, BUTIF ?!?!

You can't use that until there has been a long discussion first ?! ;)

( I like the sound of the BUTIF, you might be onto something there... )

-jg

Article: 124329
Subject: Re: Population Count circuit
From: "John_H" <newsgroup@johnhandwork.com>
Date: Tue, 18 Sep 2007 12:52:16 -0700

"acd" <acd4usenet@lycos.de> wrote in message 
news:1190142006.209627.97610@50g2000hsm.googlegroups.com...
> Out of curiosity:
> I am currently studying some problems which require many
> population-count operations of various sizes.
>
> I have used population count in a course I taught last year, as it
> can demonstrate various design styles (function, recursive function,
> nested GENERATE statements, etc.) in VHDL.
> I used a tree of full adders and the performance on a Spartan 3 as
> reported by the Xilinx tools was moderate (using pipelining).
> I think that one should get a much faster implementation even on an
> FPGA if one uses redundant representations, very much like multipliers
> do (which essentially contain several population-count circuits for the
> partial products).
>
> So, FPGA experts: is that true, and from which pipeline depth/input size
> is it worthwhile?
> How does the cost/performance compare to a full-adder implementation?
>
> Theoreticians: I could not find a complexity expression for this
> operation.
> But it sounds like textbook knowledge. Maybe I have the wrong
> textbooks.
>
> Andreas

Is your task looking at one input vector, determining which of many 
population counts need to be incremented, and providing the many counts to 
the user after the full population represented by the input vectors has 
been processed?

Counters are fast and simple in FPGAs, at least when compared to the logic 
used to determine what counters need to be incremented in the first place.

A little more elaboration on the specific needs would be helpful; if I'm far 
off base it's because I don't yet understand what you're trying to 
accomplish and what basic limitations you're trying to design around.

- John_H

Article: 124330
Subject: Re: Altera / Lattice / Xilinx CPLDs ?
From: Jim Granville <no.spam@designtools.maps.co.nz>
Date: Wed, 19 Sep 2007 07:58:42 +1200

John_H wrote:

> "Amontec, Larry" <laurent.gauch@ANTI-SPAMamontec.com> wrote in message 
> news:46eeb60f$1_6@news.bluewin.ch...
> 
>>Hi
>>
>>We are searching a small CPLD gate count like a coolrunner 128.
>>
>>- Two IO banks 1.4V to 3.3V with 5V tolerant
>>- VCC should be 3.3V or 1.8V
>>
>>The 5V tolerant is important !
>>
>>Volume : 5000 - 10000 pces
>>
>>Any CPLDs ?
>>
>>Regards,
>>Laurent
> 
> 
> For my own senseless curiosity, would you mind mentioning the application? 
> I'm wondering what new designs require the way-over-the-hill 5V standard. 
> The continuing requirements for 5V interfaces baffle me. 

5V is not over the hill at all, and is actually making a comeback.
Look at the newest microcontrollers in the Automotive sector, 5V
dominates.

Look at the newest uC devices from Silabs (C8051F530) and Freescale RS08
- both these are advanced process parts, and use on-chip regulators
to power the core.
So the IC vendor solves the problem, not the customer.

The trend to lower voltages is process-driven, not customer driven, and
it is the laziness of the suppliers, that makes this a customer problem.
- but it can be solved, with a little more effort, and that is what
is happening now in the Automotive sector.
Now the FPGA sector has problems with absolute power, but with the CPLD
it is possible.

CPLD with regulators do exist, ( just pretty poor Icc regulators )
and CPLD are more likely to do 'interface tasks', similar to uC, than FPGA.

Try driving a Power MOSFET from a 3.3V device !
Also 5V sensors are common, and give best ADC noise immunity.

-jg

Article: 124331
Subject: Re: Altera / Lattice / Xilinx CPLDs ?
From: Jim Granville <no.spam@designtools.maps.co.nz>
Date: Wed, 19 Sep 2007 08:06:04 +1200

Amontec, Larry wrote:
> Hi
> 
> We are searching a small CPLD gate count like a coolrunner 128.
> 
> - Two IO banks 1.4V to 3.3V with 5V tolerant
> - VCC should be 3.3V or 1.8V
> 
> The 5V tolerant is important !
> 
> Volume : 5000 - 10000 pces
> 
> Any CPLDs ?

Choices are
Atmel ATF1508ASL - 5V, and lowish power, but not 1.4V bank.

Older Coolrunner parts are 5V, but not 1.4V

Lattice spec 5V tolerant, but with a strange count-limit.
(how does one pin know/care, how many other pins are at 5V ?)

Atmel ATF1504BE (ATF1508BE soon) - does not have clamp
diodes to VccIO, and OD-ESD fires at ~5.6V, but they do not spec 
continual 5V operation.

Can you clarify the '5V tolerant' details ?

-jg

Article: 124332
Subject: Re: Population Count circuit
From: acd <acd4usenet@lycos.de>
Date: Tue, 18 Sep 2007 13:25:24 -0700

On 18 Sep., 21:52, "John_H" <newsgr...@johnhandwork.com> wrote:
> Is your task looking at one input vector, determining which of many
> population counts need to be incremented and providing the many counts to
> the user after the full population represented by the input vectors have
> been processed?

Sorry, I thought the term "population count" was clear.
I mean counting the number of ones in a given input vector.
In this case, I want a pipelined circuit which can process a new input
vector every clock cycle, giving the result a fixed number of cycles
later at the output.
As mentioned, a tree of adders with increasing width would do it.
But I think using only a full adder at the last stage may be
faster/cheaper.

Andreas

Article: 124333
Subject: Re: Population Count circuit
From: "John_H" <newsgroup@johnhandwork.com>
Date: Tue, 18 Sep 2007 14:11:06 -0700

"acd" <acd4usenet@lycos.de> wrote in message 
news:1190147124.046912.180750@y42g2000hsy.googlegroups.com...
> On 18 Sep., 21:52, "John_H" <newsgr...@johnhandwork.com> wrote:
>> Is your task looking at one input vector, determining which of many
>> population counts need to be incremented and providing the many counts to
>> the user after the full population represented by the input vectors have
>> been processed?
>
> Sorry, I thought the term "population count" was clear.
> I mean counting the number of ones in a given input vector.
> In this case, I want a pipelined circuit which can process a new input
> vector every clock cycle, giving the result a fixed number of cycles
> later at the output.
> As mentioned, a tree of adders with increasing width would do it.
> But I think using only a full adder at the last stage may be faster/
> cheaper.
>
> Andreas

Thanks for the clarification.
What's the size of your N?
A simple pipelined tree will give you speed limited by the largest 
register-to-register adder, which is pretty good.  Do you need a higher 
speed still?

When Altera and Xilinx had maximum speeds just over 100 MHz, I had a "ones 
counter" that digested 32 bits at a time, implemented differently in the two 
architectures, with a maximum frequency in the high 90s of MHz.  The 
implementations differed in order to leverage some Altera logic paths that 
were faster than I could get with simple adders.  For where FPGAs are now, I 
wouldn't expect the different flow to produce any significant gain, 
especially for larger N.  My recollection is that the alternate approach's 
delays went up linearly with N, while the pipelined tree goes up as log(N).  
The approach wasn't something I found in a book but was home-grown.

There are other ways, but your absolute "best" implementation depends on 
your requirements for N, for speed, and for latency.
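
As a rough illustration of the pipelined tree described above, here is a small behavioural model in Python. It is only a sketch under stated assumptions: one level of two-input adders per pipeline stage, one new input word per clock, and function names (`add_pairs`, `pipelined_popcount`) that are made up for this example. It says nothing about real FPGA timing; it only shows that the latency grows as ceil(log2(N)) stages.

```python
import math

def add_pairs(level):
    """One adder level: sum neighbouring partial counts (pad odd levels with 0)."""
    if len(level) % 2:
        level = level + [0]
    return [level[i] + level[i + 1] for i in range(0, len(level), 2)]

def pipelined_popcount(words, width):
    """Cycle-by-cycle model: one adder level per pipeline register stage."""
    depth = max(1, math.ceil(math.log2(width)))   # number of pipeline stages
    regs = [None] * depth                         # regs[k] = output of stage k
    results = []
    # Extra empty cycles at the end flush the words still in flight.
    for word in list(words) + [None] * (depth - 1):
        new = [None] * depth
        if word is not None:
            bits = [(word >> i) & 1 for i in range(width)]
            new[0] = add_pairs(bits)
        for k in range(1, depth):
            if regs[k - 1] is not None:
                new[k] = add_pairs(regs[k - 1])
        regs = new                                # clock edge
        if regs[-1] is not None:
            results.append(regs[-1][0])
    return results

print(pipelined_popcount([0xFF, 0x00, 0xA5, 0x01], 8))
```

In this model the widest adder sits in the last stage, which matches the remark that the largest register-to-register adder sets the speed.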

- John_H

Article: 124334
Subject: Re: Altera / Lattice / Xilinx CPLDs ?
From: Jon Elson <elson@wustl.edu>
Date: Tue, 18 Sep 2007 16:15:24 -0500



John_H wrote:
> "Amontec, Larry" <laurent.gauch@ANTI-SPAMamontec.com> wrote in message 
> news:46eeb60f$1_6@news.bluewin.ch...
> 
>>Hi
>>
>>We are searching a small CPLD gate count like a coolrunner 128.
>>
>>- Two IO banks 1.4V to 3.3V with 5V tolerant
>>- VCC should be 3.3V or 1.8V
>>
>>The 5V tolerant is important !
>>
>>Volume : 5000 - 10000 pces
>>
>>Any CPLDs ?
>>
>>Regards,
>>Laurent
> 
> 
> For my own senseless curiosity, would you mind mentioning the application? 
> I'm wondering what new designs require the way-over-the-hill 5V standard. 
> The continuing requirements for 5V interfaces baffle me. 
> 
> 
Well, OK, here's one:  We have a custom ASIC made through MOSIS in the 
5V AMIS C5N process.  It has 16 channels of traditional nuclear 
instrumentation 
front end, everything but the ADC.  We are aiming for 14-bit resolution, 
and getting a real 12+ bits through the entire system.  So, we use the 
5V analog range to help us meet the 14-bit requirement.  Admittedly, we 
are pushing CMOS just about as far as you can go for dynamic range.
We have a 5V Xilinx CPLD on each board with two of the ASICs, and a 5V
Spartan on the motherboard to decode and assemble control signals.

Jon

Article: 124335
Subject: Re: Guess: what is the largest number of state machines in a current
From: Jim Granville <no.spam@designtools.maps.co.nz>
Date: Wed, 19 Sep 2007 09:17:05 +1200

Weng Tianxiang wrote:
> 
> I cannot guess the largest number of state machines you have written
> for a design, but I know clearly the number of state machines you may
> have written in a design is less than 100k. Any question?

More questions? You still have not answered older ones !?

Here is my question again :

So are you talking about a Silicon Ceiling, or a Software Ceiling ?

Since a single FF_CE can be considered a state machine, the silicon 
limit will vary with FPGA, and be higher than any
practical requirement.

If there is some lower SW ceiling (that will clearly be SW release 
dependent) then that only matters if it is below someone's real design 
needs.

-jg

Article: 124336
Subject: Re: Population Count circuit
From: John McCaskill <jhmccaskill@gmail.com>
Date: Tue, 18 Sep 2007 21:19:05 -0000

On Sep 18, 4:29 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> acd wrote:
> > I am currently studying some problems which require many population-
> > count operations of various sizes.
> > I have used population count in a course I taught last year, as it
> > can demonstrate various design styles (function, recursive function,
> > nested GENERATE statements, etc.) in VHDL.
> > I used a tree of full adders and the performance on a Spartan 3 as
> > reported by the xilinx tools was moderate (using pipelining). *
> > I think that one should get a much faster implementation even on an
> > FPGA if one uses redundant representations very much like
> > multipliers do (which do essentially several population circuits
> > of the partial products).
>
> The full adder tree is known as a carry save adder tree.
>
> As far as I know, it is the best for large N, and assuming that
> a full adder is a reasonable logic unit.
>
> For FPGAs with LUTs with other than three inputs, and for not
> so large N, it might be that there are other implementations,
> but I don't believe they are all that much better.
>
> For not so large N, it is the boundary cases that become significant,
> similar to the problem of how many circles you can fit inside a
> given sized square without overlapping, as the size of the square
> increases.
>
> With 4 input LUTs, two implemented as full adders will convert
> three inputs into a two bit (ones and twos) output.  Three will
> convert four bits into three outputs (ones, twos, fours).
> When N isn't so large, in many cases the number of inputs
> needed at a given level is not a multiple of three, and the
> four input LUTs can be useful.
>
> I don't believe the result will be much faster, (in number
> of logic stages), but might use fewer LUTs.
>
> The last one I did only needed to output 0, 1, 2, 3 or more, which
> simplified the logic a little (but not a lot) from 36 inputs.
>
> -- glen



In addition to this, if you have block rams that you are not using for
anything else, they can be used for a look up table.  Four 16K by 1
bit block rams would give you the population count of a 14 bit
vector.  They can also be combined with the LUTs for a hybrid
solution.
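
The block RAM idea can be sketched in Python as follows. This is only a model, assuming each 16K by 1 RAM holds one bit plane of the 4-bit count and that the 14-bit input vector is used directly as the address; the names `BIT_PLANES` and `popcount14_lut` are invented for the illustration.

```python
ADDR_BITS = 14   # 2**14 = 16K addresses
COUNT_BITS = 4   # counts 0..14 fit in 4 bits

# One 16K x 1 table per output bit: BIT_PLANES[k][addr] is bit k of popcount(addr).
BIT_PLANES = [
    [(bin(addr).count("1") >> k) & 1 for addr in range(1 << ADDR_BITS)]
    for k in range(COUNT_BITS)
]

def popcount14_lut(vector):
    """Look the vector up in all four 1-bit RAMs and reassemble the count."""
    addr = vector & ((1 << ADDR_BITS) - 1)
    return sum(BIT_PLANES[k][addr] << k for k in range(COUNT_BITS))

print(popcount14_lut(0x3FFF))
```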

Regards,

John McCaskill
www.fastertechnology.com

Article: 124337
Subject: Re: Population Count circuit
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Tue, 18 Sep 2007 13:29:01 -0800

acd wrote:

> I am currently studying some problems which require many population-
> count operations of various sizes.

> I have used population count in a course I taught last year, as it
> can demonstrate various design styles (function, recursive function,
> nested GENERATE statements, etc.) in VHDL.
> I used a tree of full adders and the performance on a Spartan 3 as
> reported by the xilinx tools was moderate (using pipelining). *

> I think that one should get a much faster implementation even on an
> FPGA if one uses redundant representations very much like 
> multipliers do (which do essentially several population circuits
> of the partial products).

The full adder tree is known as a carry save adder tree.

As far as I know, it is the best for large N, and assuming that
a full adder is a reasonable logic unit.

For FPGAs with LUTs with other than three inputs, and for not
so large N, it might be that there are other implementations,
but I don't believe they are all that much better.

For not so large N, it is the boundary cases that become significant,
similar to the problem of how many circles you can fit inside a
given sized square without overlapping, as the size of the square
increases.

With 4 input LUTs, two implemented as full adders will convert
three inputs into a two bit (ones and twos) output.  Three will
convert four bits into three outputs (ones, twos, fours).
When N isn't so large, in many cases the number of inputs
needed at a given level is not a multiple of three, and the
four input LUTs can be useful.

I don't believe the result will be much faster, (in number
of logic stages), but might use fewer LUTs.
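
To make the carry-save idea concrete, here is a small Python model. It is a sketch only: it assumes plain 3-input full adders (it does not use the 4-input LUT trick mentioned above), reduces every weight column level by level until at most two bits remain per column, and leaves the final carry-propagate addition to ordinary integer arithmetic. The name `csa_popcount` is made up for this example.

```python
def full_adder(a, b, c):
    """3:2 compressor: three bits of equal weight -> sum bit and carry bit."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def csa_popcount(bits):
    """Return (count, levels): the popcount and the number of full-adder levels."""
    columns = {0: list(bits)}          # weight exponent -> bits of that weight
    levels = 0
    while any(len(col) > 2 for col in columns.values()):
        nxt = {}
        for w, col in columns.items():
            i = 0
            while len(col) - i >= 3:   # group bits of this weight in threes
                s, c = full_adder(col[i], col[i + 1], col[i + 2])
                nxt.setdefault(w, []).append(s)       # sum keeps its weight
                nxt.setdefault(w + 1, []).append(c)   # carry has double weight
                i += 3
            nxt.setdefault(w, []).extend(col[i:])     # leftovers pass through
        columns = nxt
        levels += 1
    # At most two bits per column remain: one final carry-propagate add.
    total = sum(bit << w for w, col in columns.items() for bit in col)
    return total, levels

print(csa_popcount([1] * 36))
```

The leftover bits that pass through a level unchanged are the boundary cases mentioned above, where the number of inputs at a level is not a multiple of three.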

The last one I did only needed to output 0, 1, 2, 3 or more, which
simplified the logic a little (but not a lot) from 36 inputs.
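
One possible way to model the "0, 1, 2, 3 or more" case in Python is a tree of saturating adders, so every partial result stays two bits wide. This is an assumption about how such a counter could be built, not a description of the design mentioned above; `sat_add` and `popcount_saturating` are hypothetical names.

```python
def sat_add(a, b, limit=3):
    """Add two partial counts but clamp at `limit` ("3 or more")."""
    return min(a + b, limit)

def popcount_saturating(bits, limit=3):
    """Tree of saturating adders; every node output fits in two bits for limit=3."""
    level = [1 if b else 0 for b in bits]
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)
        level = [sat_add(level[i], level[i + 1], limit)
                 for i in range(0, len(level), 2)]
    return level[0] if level else 0

print(popcount_saturating([1] * 36))
```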

-- glen

Article: 124338
Subject: Re: Tristate bus on spartan FPGA
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Tue, 18 Sep 2007 13:33:18 -0800

Mike Treseler wrote:

(I wrote)

>>I thought the tools would synthesize the appropriate MUX given
>>tristate buffer logic.  They probably do that better than
>>explicitly programmed MUX logic.

> I would expect the same utilization for either description.
> The question is which description is easier for
> the designer to write and test.
> That of course, depends on the designer.

Someone from Xilinx posted a method that uses the FF's in the
FPGA to do this.  I presume that only works if the output is
registered, but it does seem to be an interesting solution.
I doubt it will generate that from explicit MUX logic, though.

Also, the cases with more than one selected at the same time
should be "don't care" states, which MUX logic might not
take into consideration.

-- glen

Article: 124339
Subject: Re: Peripheral Trouble!
From: John McCaskill <jhmccaskill@gmail.com>
Date: Tue, 18 Sep 2007 22:01:16 -0000

On Sep 18, 12:08 pm, "MJ Pearson" <mjp...@york.ac.uk> wrote:
> >On 13 Sep, 14:21, "MJ Pearson" <mjp...@york.ac.uk> wrote:
> >> Hello,
>
> >> I am using the xilinx virtex II development board xupv2p. I have built
> >> an expansion board that connects to the high speed expansion port, and
> >> delivers information from a camera. In ISE, I have produced what I
> >> think is a working piece of vhdl to synchronise the camera data (there
> >> will be 2 data sets incoming) to the FPGA clock.
>
> >> What I would like to do is just read out this incoming data so I can
> >> check it. My idea was to use a microblaze processor - I will need one
> >> to do some processing at a later stage, and just do a printf to write
> >> the data to hyper-terminal.
>
> >> I used the create / import peripheral wizard, and have been hacking
> >> the user_logic file to incorporate my synchronization design. I am a
> >> bit unsure of ports though - do I have any user ports - are these the
> >> input pins??? I have altered the UCF file to assign the pins of the
> >> expansion port to my inputs in my vhdl file. Should I write this data
> >> to a slave register??
>
> >> Then in my C file, is it just a simple read and printf procedure.
>
> >> Any help / ideas I'd be grateful,
>
> >> Thanks
>
> >> Marc.
>
> >Hi,
> >I don't know if I have understood your point but it seems to me that
> >you want to print data coming from a camera.
>
> >If so, you should create a peripheral (like you have already done)
> >with some SW registers. This registers are the interface between the
> >peripheral and the microcontroller. The additional ports of your
> >peripheral will be the interface between the peripheral itself and the
> >camera.
> >So your IP, once acquired the desired data, should write something to
> >a register so that you can read it from the microprocessor. Probably
> >you also need to write a simple driver (in C language) to find if
> >there's something new in the register, read the register and so on (the
> >driver isn't strictly necessary).
> >Googling around you'll find all the details.
>
> >Hope this helps a little.
>
> >Andrea
>
> Thanks for the reply.
>
> Have got (a little) further with this....
>
> I have made a number of external input ports for the inputs from my
> camera(s). I have altered the ucf file, mpd, mhs...
>
> I used the create / import peripheral wizard, and as mentioned have
> altered the user_logic.vhdl file. Do I need to map my external ports on
> the peripheral .vhdl file (created by the wizard)?
>
> I get an error when I build the netlist:
>
> ERROR:MDT - HDL synthesis failed!
> INFO:MDT - Refer to
> C:\MATLAB\R2006a\work\edkStuff\periphWizard\synthesis\camtoleds_0_wrapper_xst.srp
> for details
>
> ERROR:MDT - platgen failed with errors!
>
> Looking at the .srp file :
>
> Formal CAM1_I_0 of entity with no default value must be associated with an
> actual value.
>
> CAM1_I_0 is an input port in my user logic .vhdl file. Which I have (think
> I have) mapped to an external pin via my ucf file:
>
> UCF snippet:
> NET "CAM_IN_0_pin" LOC = "AE5";
> NET "CAM_IN_0_pin" IOSTANDARD = LVTTL;
>
> MHS:
> PORT CAM1_I_0 = CAM_IN_0
>
> .mpd:
> PORT CAM1_I_0 = "", DIR = I
>
> Sorry if all this sounds pathetic, I have no idea how to fix that error, I
> am a bit lost!!



You may want to look at the "Platform Specification Format Reference
Manual", a.k.a. UG131.  It is included with EDK and is located at
$EDK/doc/psf_rm.pdf.  Take a look at the section on ports and the
keyword IOB_STATE.

In your .mpd, you have not specified the IOB_STATE, so it will default
to INFER, which will cause EDK to instantiate its own IBUF in the top
level VHDL file.  If you had an IBUF already in your code, this would
cause a problem.  I don't know if it would cause the problem you are
seeing.  If that is not it, read through the psf_rm, it describes how
all the EDK files work.

Regards,

John McCaskill
www.fastertechnology.com

Article: 124340
Subject: Re: Population Count circuit
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Tue, 18 Sep 2007 14:47:27 -0800

acd wrote:

(snip)

> Sorry, I thought the term "population count" was clear.
> I mean counting the number of ones in a given input vector.
> In this case, I want a pipelined circuit which can process a new input
> vector every clock cycle, giving the result a fixed number of cycles
> later at the output.

> As mentioned, a tree of adders with increasing width would do it.
> But I think using only a full adder at the last stage may be faster/
> cheaper.

What do you mean by faster?  Depending on your clock rate, you might do
more than one stage of adders between pipeline latches.  That will give
you the result sooner.  In most FPGAs there are more latches than 
needed, so there is no savings.  It might be that some design could use
slightly less logic (LUTs), but it won't be a lot less.

As someone else mentioned, using block RAMs, if you have them available,
might help.

Otherwise, if your clock rate is slow you might be able to reuse some of
the logic.  That is, trade speed for logic using a faster clock on the
pipeline registers than the data input.
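
The trade of speed for logic can be illustrated with a short Python model. It is a sketch under the assumption that a small popcount unit handles `chunk_bits` bits per fast clock and an accumulator adds up the partial counts; the function name and parameters are invented for this example.

```python
def popcount_time_multiplexed(vector, width, chunk_bits):
    """Return (count, fast_cycles) when one small counter is reused per chunk."""
    vector &= (1 << width) - 1            # ignore anything above `width` bits
    acc = 0                               # accumulator register
    fast_cycles = 0
    for offset in range(0, width, chunk_bits):
        chunk = (vector >> offset) & ((1 << chunk_bits) - 1)
        acc += bin(chunk).count("1")      # small popcount block, reused each cycle
        fast_cycles += 1
    return acc, fast_cycles

print(popcount_time_multiplexed(0xFFFFFFFF, 32, 8))
```

In this model the fast clock has to run width / chunk_bits times faster than the data input (rounded up) to keep pace with one new vector per input clock.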

-- glen

Article: 124341
Subject: Re: Guess: what is the largest number of state machines in a current chip design: 1k, 10k, or...
From: Weng Tianxiang <wtxwtx@gmail.com>
Date: Tue, 18 Sep 2007 17:07:36 -0700

On Sep 18, 11:43 am, "John_H" <newsgr...@johnhandwork.com> wrote:
> "Weng Tianxiang" <wtx...@gmail.com> wrote in message
>
> news:1190135851.404698.228480@y42g2000hsy.googlegroups.com...
>
> > Hi,
> > 1. I am talking about GUESSING the largest number of state machines a
> > current finished design may have. Not ceiling.
>
> My official guess:
>   light blue.

Hi John_H,
1. The guess misses the target.
2. The design falls in the blind spot of every person who has responded
to the post so far.

Weng

Article: 124342
Subject: Re: Looking for fast AES cores with low latency
From: IDDLife <xing.starwill@gmail.com>
Date: Tue, 18 Sep 2007 17:45:40 -0700

On Sep 19, 12:34 am, "Sylvain Munaut <Some...@SomeDomain.com>"
<246...@gmail.com> wrote:
> On Sep 18, 5:35 pm, Allan Herriman <allanherri...@hotmail.com> wrote:
>
> > Hi,
>
> > Since the initial rash of AES / Rijndael cores a few years ago, I
> > haven't seen much research at the high speed end.
>
> > Does anyone know how low the latency is for a recent high-end core in
> > a current FPGA family?
> > A quick web search reveals plenty of heavily pipelined implementations
> > with poor latency, but none that are really quick in terms of latency.
>
> > Thanks,
> > Allan
>
> What kind of frequency / latency are you looking for ?
>
> Most cores can pretty easily be "de-pipelined" to reduce
> latency at the cost of a lower frequency ...
>
>    Sylvain

I implemented the AES algorithm several months ago and tried to find
the highest achievable frequency. However, using the GF (Galois field)
calculation, the FPGA resource cost may be lower.

Article: 124343
Subject: Re: Tristate bus on spartan FPGA
From: aravind <aramosfet@gmail.com>
Date: Tue, 18 Sep 2007 19:46:03 -0700

On Sep 18, 5:31 pm, Jon Beniston <j...@beniston.com> wrote:
> On 18 Sep, 07:33, aravind <aramos...@gmail.com> wrote:
>
> > Hi, I'm implementing a 16-bit bus along the lines of AMBA APB for some
> > of my peripherals like an IDE ATA controller, LCD display controller,
> > FTDI USB interface, etc. But I found that Xilinx Spartan devices have no
> > internal tristate buffers.
> > I have a dozen or more peripherals to connect. Any idea of how I can
> > implement this?
>
> > thanks,
> > aravind
>
> As the others have said, use muxes on chip. Also, doesn't AMBA APB
> use muxes anyway?
>
> Cheers,
> Jon

True, AMBA 2.0 suggests we could use separate read and write data
buses. There shouldn't be a problem for the write data bus, but for
reads, multiple peripherals will be driving the PRDATA signals of the
AMBA APB controller. Anyway, multiplexing seems to be the only way to
implement it. Just wondering: what approach do ASICs, processors, etc.
use these days when a tristate bus is ruled out?
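
One common way to replace an internal tristate read bus is an AND-OR multiplexer: each peripheral's read data is gated by its select line and the gated values are ORed together. The Python model below is only a sketch of that idea; the 16-bit width comes from the bus described above, while `prdata_mux` and its arguments are hypothetical names, not part of any AMBA specification.

```python
def prdata_mux(psel, rdata, width=16):
    """AND-OR read mux: OR together the read data of every selected peripheral."""
    mask = (1 << width) - 1
    out = 0
    for sel, data in zip(psel, rdata):
        gated = (data & mask) if sel else 0   # AND with the select line
        out |= gated                          # wide OR replaces the shared wire
    return out

# Three peripherals, the second one selected.
print(hex(prdata_mux([0, 1, 0], [0x1111, 0xBEEF, 0x2222])))
```

With exactly one select asserted the output is that peripheral's data, and with none asserted it is zero. If several selects were asserted at once the result would be the OR of their data, a case a correctly decoded bus never produces.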

Article: 124344
Subject: Re: Guess: what is the largest number of state machines in a current chip design: 1k, 10k, or...
From: Shannon <sgomes@sbcglobal.net>
Date: Wed, 19 Sep 2007 03:08:46 -0000

On Sep 18, 5:07 pm, Weng Tianxiang <wtx...@gmail.com> wrote:
> On Sep 18, 11:43 am, "John_H" <newsgr...@johnhandwork.com> wrote:
>
> > "Weng Tianxiang" <wtx...@gmail.com> wrote in message
>
> >news:1190135851.404698.228480@y42g2000hsy.googlegroups.com...
>
> > > Hi,
> > > 1. I am talking about GUESSING the largest number of state machines a
> > > current finished design may have. Not ceiling.
>
> > My official guess:
> >   light blue.
>
> Hi John_H,
> 1. The guess misses the target.
> 2. The design falls in the blind spot of every person who has responded
> to the post so far.
>
> Weng

I guess 28.

Article: 124345
Subject: Re: how to bidirectional signal in xilinx EDK tool ?
From: "lionheart70" <lhuipeng@dso.org.sg>
Date: Tue, 18 Sep 2007 23:22:08 -0400

I encountered the same error recently when I tried to create a custom IP
for EDK that has an INOUT data bus.
I removed the error by moving my tri-state buffers from the IP VHDL
code to the MHS file. This way, I have 3 signals for the data bus,
data_I, data_O and data_T, from the IP.
Apparently, EDK handles the bidirectional signal in the MHS file.
For a syntax example, you can refer to the MPD of the EDK IIC (or was it
I2C) bus controller IP and the project MHS after you've added the IIC bus
controller.
Disclaimer: I managed to remove the error and generate the bitstream, but
as I don't have the hardware, this method has not been tested.

Hope this helps.

Cheers!
lionheart70

Article: 124346
Subject: Re: Guess: what is the largest number of state machines in a current
From: John_H <newsgroup@johnhandwork.com>
Date: Wed, 19 Sep 2007 05:12:13 GMT

Weng Tianxiang wrote:
> On Sep 18, 11:43 am, "John_H" <newsgr...@johnhandwork.com> wrote:
>> "Weng Tianxiang" <wtx...@gmail.com> wrote in message
>>
>> news:1190135851.404698.228480@y42g2000hsy.googlegroups.com...
>>
>>> Hi,
>>> 1. I am talking about GUESSING the largest number of state machines a
>>> current finished design may have. Not ceiling.
>> My official guess:
>>   light blue.
> 
> Hi John_H,
> 1. The guess misses the target.
> 2. The design falls in the blind spot of every person who has responded
> to the post so far.
> 
> Weng

To the extent that my guess is as applicable as anyone else's guess, I 
stand behind it.

You are asking a seriously senseless question and frankly I'm tired of 
watching the thread drone on and on and on so I just added a little to 
it because of the absurdity.

Who cares?!

Article: 124347
Subject: Re: Looking for fast AES cores with low latency
From: backhus <nix@nirgends.xyz>
Date: Wed, 19 Sep 2007 08:51:42 +0200

Hi Allan,
The minimum latency of an AES core (at a reasonable clock frequency) is 
limited by the number of rounds (iterations) needed. That number depends 
mainly on the key length.
  128 Bit Key : Round Number 10
  192 Bit Key : Round Number 12
  256 Bit Key : Round Number 14

There is an initial Round 0, but the latency of that can be eliminated 
by design. So the latency for a simple AES-128 Core will always be at 
least 10 clock cycles.
If you have enough chip area to unroll the rounds, only the initial 
latency (for the first conversion) needs that number of clock cycles. 
All following blocks are calculated on each following clock cycle 
because of the data pipelining in the unrolled architecture.
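
The latency argument can be written down as a small Python calculation. It is a sketch that assumes one round per clock cycle and ignores round 0, as described above. The round count follows the AES relation Nr = Nk + 6 with Nk = key length / 32; the function names are made up for this example.

```python
def aes_rounds(key_bits):
    """Number of AES rounds for a given key length (Nr = Nk + 6)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    return key_bits // 32 + 6

def total_cycles(key_bits, blocks, unrolled):
    """Clock cycles to finish `blocks` independent blocks, one round per cycle."""
    rounds = aes_rounds(key_bits)
    if unrolled:
        # Fully unrolled pipeline: fill once, then one result per cycle.
        return rounds + (blocks - 1)
    # Iterative core: every block occupies the single round unit for all rounds.
    return rounds * blocks

for bits in (128, 192, 256):
    print(bits, aes_rounds(bits))
print(total_cycles(128, 100, unrolled=True), total_cycles(128, 100, unrolled=False))
```

The unrolled figure only applies when consecutive blocks are independent of each other; the latency of the first block is the same in both cases.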

You may take a look at this paper:
http://www.i3m.hs-bremen.de/internet/download/elis/aes_i3m_overview.pdf

Please keep in mind that the clock frequencies given in this paper are 
examples only for the old Virtex-E FPGAs. Current FPGAs perform much better.

Best regards
   Eilert


Allan Herriman wrote:
> Hi,
> 
> Since the initial rash of AES / Rijndael cores a few years ago, I
> haven't seen much research at the high speed end.
> 
> Does anyone know how low the latency is for a recent high-end core in
> a current FPGA family?
> A quick web search reveals plenty of heavily pipelined implementations
> with poor latency, but none that are really quick in terms of latency.
> 
> Thanks,
> Allan

Article: 124348
Subject: Re: Guess: what is the largest number of state machines in a current chip design: 1k, 10k, or...
From: "comp.arch.fpga" <ksulimma@googlemail.com>
Date: Wed, 19 Sep 2007 07:55:06 -0000

On 18 Sep., 19:17, Weng Tianxiang <wtx...@gmail.com> wrote:
> Hi,
> 1. I am talking about GUESSING the largest number of state machines a
> current finished design may have. Not ceiling.
But you do not react when someone answers your question. Can you beat
the 10k+ state machines of a Smith-Waterman DNA matcher?

> 3. A synchronous or an asynchronous reset signal is vital, either with
> clear routing or a hidden within other procedures.

Again, you did not read my post. Many state machines have no reset
signal. For example, the reset signal of a JTAG controller is optional.
This is a state machine that is implemented in virtually every complex
piece of silicon out there.

Kolja Sulimma

Article: 124349
Subject: Re: Population Count circuit
From: "Symon" <symon_brewer@hotmail.com>
Date: Wed, 19 Sep 2007 10:02:33 +0100

"John McCaskill" <jhmccaskill@gmail.com> wrote in message 
news:1190150345.057393.118080@g4g2000hsf.googlegroups.com...
> On Sep 18, 4:29 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>> acd wrote:
>
> In addition to this, if you have block rams that you are not using for
> anything else, they can be used for a look up table.  Four 16K by 1
> bit block rams would give you the population count of a 14 bit
> vector.  They can also be combined with the LUTs for a hybrid
> solution.
>
> Regards,
>
> John McCaskill
> www.fastertechnology.com
>
...furthermore, you can do 28 bits if you're prepared to add the outputs of 
the two ports of the dual port RAMs.
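
A Python sketch of the dual-port variant, assuming both ports read the same 14-bit popcount table and a small adder combines the two results. `TABLE` and `popcount28_dual_port` are names invented for this model.

```python
# Contents of the shared RAM: popcount of every 14-bit address (values 0..14).
TABLE = [bin(addr).count("1") for addr in range(1 << 14)]

def popcount28_dual_port(vector):
    """Read the same table through two ports and add the two 4-bit results."""
    port_a = TABLE[vector & 0x3FFF]           # port A: low 14 bits
    port_b = TABLE[(vector >> 14) & 0x3FFF]   # port B: high 14 bits
    return port_a + port_b                    # result needs 5 bits (0..28)

print(popcount28_dual_port(0xFFFFFFF))
```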
HTH., Syms.
