Workshop on Cryptographic Hardware and Embedded Systems 2002 (CHES 2002)
www.chesworkshop.org
Hotel Sofitel, San Francisco Bay (Redwood City), USA
August 13 - 15, 2002
Second Call for Papers

General Information

The focus of this workshop is on all aspects of cryptographic hardware and security in embedded systems. The workshop will be a forum for new results from the research community as well as from industry. Of special interest are contributions that describe new methods for efficient hardware implementations and high-speed software for embedded systems, e.g., smart cards, microprocessors, DSPs, etc. We hope that the workshop will help to fill the gap between the cryptography research community and the application areas of cryptography. Consequently, we encourage submissions from academia, industry, and other organizations. All submitted papers will be reviewed.

This will be the fourth CHES workshop. CHES '99 and CHES 2000 were held at WPI, and CHES 2001 was held in Paris. The number of participants has grown to more than 200, with attendees coming from industry, academia, and government organizations.

The topics of CHES 2002 include but are not limited to:
* Computer architectures for public-key and secret-key cryptosystems
* Efficient algorithms for embedded processors
* Reconfigurable computing in cryptography
* Cryptographic processors and co-processors
* Cryptography in wireless applications (mobile phone, LANs, etc.)
* Security in pay-TV systems
* Smart card attacks and architectures
* Tamper resistance on the chip and board level
* True and pseudo random number generators
* Special-purpose hardware for cryptanalysis
* Embedded security
* Device identification

Instructions for Authors

Authors are invited to submit original papers. The preferred submission form is by electronic mail to submission@chesworkshop.org. The submissions must be anonymous, with no author names, affiliations, acknowledgments, or obvious references. Papers should be formatted in 12pt type and not exceed 12 pages (not including the title page and the bibliography). Please submit the paper in Postscript or PDF, together with an extra file containing the email and physical address of the authors, and an indication of the corresponding author. We recommend that you generate the PS or PDF file using LaTeX; however, MS Word is also acceptable. All submissions will be refereed. Only original research contributions will be considered. Submissions must not substantially duplicate work that any of the authors have published elsewhere or have submitted in parallel to any other conferences or workshops that have proceedings.

Important Dates

Submission Deadline: May 1st, 2002.
Acceptance Notification: July 1st, 2002.
Final Version due: August 1st, 2002.
Workshop: August 13th - 15th, 2002.

NOTE: The CHES dates August 13th - 15th are the Tuesday through Thursday preceding CRYPTO 2002, which starts on the evening of Sunday, August 18th.

Mailing List

If you want to receive emails with subsequent Calls for Papers and registration information, please send a brief mail to mailinglist@chesworkshop.org.

Program Committee

Beni Arazi, Ben Gurion University, Israel; Jean-Sebastien Coron, Gemplus Card International, France; Kris Gaj, George Mason University, USA; Craig Gentry, DoCoMo Communications Laboratories, USA; Jim Goodman, Lumic Electronics, Canada; M. Anwar Hasan, University of Waterloo, Canada; David Jablon, Phoenix Technologies, USA; Peter Kornerup, Odense University, Denmark; Pil Joong Lee, Pohang Univ. of Sci. & Tech., Korea; Preda Mihailescu, University of Paderborn, Germany; David Naccache, Gemplus Card International, France; Bart Preneel, Universite Catholique de Louvain, Belgium; Jean-Jacques Quisquater, Universite Catholique de Louvain, Belgium; Erkay Savas, rTrust Technologies, USA; Joseph Silverman, Brown University and NTRU Cryptosystems, Inc., USA; Jacques Stern, Ecole Normale Superieure, France; Berk Sunar, Worcester Polytechnic Institute, USA; Colin Walter, Computation Department - UMIST, U.K.

Organizational Committee

All correspondence and/or questions should be directed to any of the Organizational Committee Members:

Burt Kaliski (Program Chair), RSA Laboratories, 20 Crosby Drive, Bedford, MA 01730, USA. Phone: +1 781 687 7057, Fax: +1 781 687 7213, Email: bkaliski@rsasecurity.com

Cetin Kaya Koc (Local Organization), Dept. of Electrical & Computer Engineering, Oregon State University, Corvallis, Oregon 97331, USA. Phone: +1 541 737 4853, Fax: +1 541 737 8377, Email: Koc@ece.orst.edu

Christof Paar (Publicity Chair), Electrical Eng. & Information Sciences Dept., Ruhr-Universitaet Bochum, 44780 Bochum, Germany. Phone: +49 234 32 23988, Fax: +49 234 32 14444, Email: cpaar@crypto.ruhr-uni-bochum.de

Workshop Proceedings

The post-proceedings will be published in Springer-Verlag's Lecture Notes in Computer Science (LNCS) series. Note that in order to be included in the proceedings, the authors of an accepted paper must guarantee to present their contribution at the workshop.

Article: 40151
In article <d049f91b.0202281030.206aeb6b@posting.google.com>, kayrock66@yahoo.com (Jay) writes:
|> I'm trying to pitch that my client use Synopsys Design Compiler
|> instead of an FPGA specific synthesizer from another vendor since his
|> Xilinx Virtex 2 FPGA is a proto for a standard cell part. The clock
|> speed isn't important, verification of the tool flow and design
|> database is.
|>
|> The problem I'm running into is that the Design Compiler output uses
|> almost 200% the LUTs compared to the purpose built FPGA synthesizer.
|> So the logic will no longer fit the proto board.
|>
|> Mini Example:
|> Design Compiler: 1760 LUTs
|> FPGA synthesizer: 824 LUTs
|>
|> Design Compiler synthesizes to cells like AND2, OR2, AND4, etc., whereas
|> the FPGA specific tool maps directly to special LUTs custom made for
|> the logic required, like LUT_AB5A and LUT_67FE, etc. Now I figured the
|> Xilinx mapper would be smart enough to "map" the Design Compiler AND2,
|> OR2, etc., into more compact LUT_ABCD and LUT_6534 type cells, but it just
|> seems to be doing a one-for-one map with no optimization.
|>
|> It appears that Xilinx did not write the mapper optimization (option
|> -oe) for the recent products Virtex E/2 and Spartan 2, in effect giving
|> up support for Design Compiler.
|>
|> Can anyone else comment on this? It seems crazy that I can't use the
|> old man of synthesis (Design Compiler) at $100k a seat anymore.

My last experiences with DC are only on the 4k-Series (now I'm using fpga_compiler2 for Spartan2), but maybe this helps: "map -help spartan2" shows among others:

-k 4|5|6 Function size for covering combinational logic. If -k is not specified, the default is -k 4. This gives the best balance of runtime to quality of results. Using 5 or 6 can give superior results at the expense of runtime.

So try to use -k 6; this can make the design much smaller (and faster...). Have you used the usual tweaks in DC, like "compile -boundary_optimization -map_effort high" and "compile -map_effort high -ungroup_all" afterwards? Are you sure that the reset net is replaced by the STARTUP symbol and removed in the design ("disconnect_net reset -all")?

--
Georg Acher, acher@in.tum.de
http://www.in.tum.de/~acher/
"Oh no, not again !" The bowl of petunias

Article: 40152
Hi,

Do the different devices in the APEX20KE family have different maximum speeds of operation? E.g., would the EP20K1500E be expected to run much faster than the EP20K160E?

I'm trying to implement a 16x16 combinational multiplier in the EP20K160E, but the timing simulations seem to take 40-50 ns for a single multiply.

Anyone have any ideas how I can speed up the operation? Would mapping the multiplier onto an EP20K1500E help? I'm using the demo version, hence I can't map to the EP20K1500E, but I would appreciate it if anyone knows the speed improvement from the 160E to the 1500E.

Thanks,
Prashant

Article: 40153
One document I think you are forgetting to obtain is the PCI Local Bus Specification, Revision 2.2. You can purchase this specification from http://www.pcisig.com. The reason I think this document is more important than the other three books you mentioned is that Appendix B of the PCI Specification has sample state machines for a target interface and a master (initiator) interface. Although I added a few more states in my PCI IP core design, my PCI IP core's state machines for target and master pretty much resemble the sample state machines of Appendix B.

Regarding the three books you mentioned, I own PCI System Architecture 4th Edition (costs $39.95, ISBN 0-201-30974-2) and PCI Hardware and Software 4th Edition (costs about $100, ISBN 092939259-0). The problem with both books, I think, is that they are pretty much bus protocol books: they don't discuss how a PCI IP core should be designed (neither book has even a single state machine diagram in it), or how the designer should design it to meet timing (setup time, Tsu < 7ns for 33MHz PCI and Tsu < 3ns for 66MHz PCI, is the hardest part to meet). So, if what you want from those two books is how you should design a PCI IP core, it won't be there.

As a bus protocol book, I will say that PCI System Architecture 4th Edition is more for a beginner, and it is fairly easy to read, so it is not bad having it; but the PCI Specification is also well written and easy to read, so one may have to ask oneself, "Do I really have to have PCI System Architecture 4th Edition?" PCI Hardware and Software 4th Edition is for an experienced designer and contains a lot of detail, but because it contains so much detail, I find it hard to read. Another thing I don't like is the diagrams in the book, which look ugly compared to those of the PCI Specification or PCI System Architecture 4th Edition. Maybe PCI and PCI-X Hardware and Software 5th Edition is better, but I won't count on it.

To tell you the truth, when I developed my PCI IP core (although it is still not done yet), I pretty much relied on the PCI Specification for making design decisions, and rarely referenced PCI System Architecture 4th Edition or PCI Hardware and Software 4th Edition. So, I will say that while PCI System Architecture 4th Edition and PCI Hardware and Software 4th Edition are nice to have, they are not absolutely necessary the way the PCI Specification is.

Regarding the more important part of how to implement a PCI IP core, the first thing you will want to do is download copies of the Xilinx LogiCORE PCI Design Guide from Xilinx, the Altera PCI MegaCore Function User Guide from Altera, and the Synopsys DWPCI (a PCI IP core for ASICs) Data Book from Synopsys. That should give you a picture of what a PCI IP core is like, and what the backend user interface is like. However, those documents won't tell you how they designed the inside, so you will have to figure that out yourself (otherwise, no one would pay several thousand dollars for a license).

From my experience of designing a vendor-independent PCI IP core and implementing it in a Xilinx Spartan-II XC2S150-5, the hardest part of the design was meeting the setup time of Tsu < 7ns for 33MHz PCI. Meeting Clock-to-Output Valid (Tval) of Tval < 11ns for 33MHz PCI and Tval < 6ns for 66MHz PCI should be easy, assuming that you know how to constrain FFs within the IO pads (IOB FFs in Xilinx and IOE FFs in Altera).
The reason meeting the setup time is so hard is that in PCI there are cases where the registered versions of the control signals cannot be used and the unregistered versions have to be used instead: when doing a no-wait-cycle burst transfer (in initiator mode the unregistered DEVSEL#, TRDY#, and STOP# have to be monitored, and in target mode the unregistered FRAME# and IRDY#), or when asking for a disconnect in target mode (the unregistered FRAME# has to be monitored in this case). These unregistered signals have to go through multiple levels of LUTs before reaching a FF, and to meet the setup time you cannot go through too many levels of LUT. Assuming that you use a Spartan-II-5 and these control signals are close to each other, going through 3 levels of LUT should still meet Tsu < 7ns with automatic P&R. If you don't mind doing manual floorplanning, 4 levels of LUT should still meet Tsu < 7ns.

Another difficult part of a PCI IP design is how to update the output port of AD[31:0] during a target read or an initiator write. IRDY# in target mode, and TRDY# in initiator mode, will influence the output port of AD[31:0], but the problem here is that unlike the above case of unregistered input control signals feeding the FFs of output control signals (in target mode, unregistered FRAME# and IRDY# to DEVSEL#, TRDY#, and STOP#'s output port; and in initiator mode, unregistered DEVSEL#, TRDY#, and STOP# to FRAME# and IRDY#'s output port), the routing distance will likely be large, so the number of LUT levels will have to be even less than the above (should be at most 2 levels of LUT). Use the Clock Enable (CE) input of the FF to keep the number of LUT levels low.

If you are using Xilinx devices like the Spartan-II, you should use Xilinx's infamous and secret PCILOGIC. Only Virtex/Virtex-E/Spartan-II/Spartan-IIE support PCILOGIC (Virtex-II doesn't support it). Basically, PCILOGIC has a bunch of NAND gates that generate CE signals for IOB FFs, and I guess the benefit of using it is that it supposedly allows predictable timing. To keep your code generic for ASIC porting or for use with other FPGAs (e.g., Altera FPGAs), you should try to "emulate" this PCILOGIC with regular LUTs. One 5-input LUT, or three 4-input LUTs (two levels of 4-input LUT), can emulate PCILOGIC. If you are interested, I can post a sample Verilog code that works with ISE WebPACK 4.1.

Parity generation is another issue during a target read cycle, because unregistered C/BE#[3:0] has to go through some kind of parity generator within Tsu < 7ns. The trick is to compute the parity of the AD[31:0] that is being read one cycle ahead, since there is 30ns to do so, and then merge that with C/BE#[3:0] to compute the final parity that goes out on PAR. Virtex's 5-input LUT can handle this nicely, but I haven't been able to instruct XST (ISE WebPACK's synthesis tool) to do so even when the KEEP attribute is used. Using a carry-chain parity generator might be better than using a combinational parity generator, but I haven't figured out a way to instruct XST to infer that.

I recommend using Address/Data Stepping, which, of course, reduces bus utilization (performance) in initiator mode because GNT# has to be asserted for at least 2 cycles (you won't be able to start a transaction if GNT# is asserted for only one cycle; AD[31:0] and C/BE#[3:0] have to be turned off immediately), but it will help you meet Tsu for the OE (Output Enable) FFs. Xilinx and Altera use this technique, presumably to meet Tsu < 3ns for 66MHz PCI.
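As a rough illustration of the read-parity trick described above, here is a minimal VHDL sketch (the signal names are made up, the exact cycle alignment of PAR versus AD depends on the rest of the core's datapath, and the poster's own code is Verilog and is not shown here). The 32-input XOR over the outgoing read data is done one cycle early, so only a 5-input function of C/BE#[3:0] and the stored parity bit remains in the Tsu-critical path:

library ieee;
use ieee.std_logic_1164.all;

entity par_gen is
  port (
    clk     : in  std_logic;
    ad_next : in  std_logic_vector(31 downto 0);  -- read data about to be driven on AD
    cbe_n   : in  std_logic_vector(3 downto 0);   -- C/BE# as sampled from the bus (unregistered)
    par_out : out std_logic                       -- registered PAR (even parity over AD and C/BE#)
  );
end entity;

architecture rtl of par_gen is
  signal ad_par : std_logic := '0';  -- parity of AD, computed one cycle ahead
begin
  process (clk)
    variable p : std_logic;
  begin
    if rising_edge(clk) then
      -- Cycle N: while ad_next is being registered into the AD output FFs,
      -- reduce it to a single parity bit (this path has the full clock period).
      p := '0';
      for i in ad_next'range loop
        p := p xor ad_next(i);
      end loop;
      ad_par <= p;
      -- Cycle N+1: only C/BE#[3:0] plus the stored bit remain, a 5-input
      -- function that fits one Virtex LUT/F5 level inside the Tsu budget.
      par_out <= ad_par xor cbe_n(3) xor cbe_n(2) xor cbe_n(1) xor cbe_n(0);
    end if;
  end process;
end architecture;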
The bottom line is that if the logic is not designed carefully, even a relatively new device like the Xilinx Spartan-II won't meet even 33MHz PCI's setup time. To meet the setup time, you will need a good understanding of the target architecture (for example, what the delays of internal resources such as LUTs look like). Since I got mine to meet 33MHz PCI timing comfortably with Verilog, I don't believe there is any need to use schematics, but you should never hesitate to do manual floorplanning, because the automatic P&R tool just doesn't know where a LUT should really be placed. The human brain can do a better job than the software can.

Kevin Brace (Don't respond to me directly, respond within the newsgroup.)

Matthias Scheerer wrote:
>
> Hi there,
>
> to see a posting about PCI books was very interesting for me, because we
> too have to decide what PCI book to buy. We are currently developing
> FPGA/ASIC logic to connect to PCI and also software drivers (Linux by
> now). We now have to decide whether to buy
> "PCI System Architecture, 4th Ed. and PCI-X System Architecture" (both)
> or
> "PCI and PCI-X Hardware and Software, 5th Ed."
>
> Any comments on those (three) books ?
>
> Thanks.
> Matthias

Article: 40154
Martin Thompson wrote:
> <snip>

Thank you both Martin and Muzaffer, that's just what I needed to know to get started.

Cheers,
John

--
Dr John Williams, Postdoctoral Research Fellow
Queensland University of Technology, Brisbane, Australia
Phone : (+61 7) 3864 2427 Fax : (+61 7) 3864 1517

Article: 40155
Wow, I hope things really haven't gotten this bad....

-Ryan

Article: 40156
Prashant wrote: > I'm trying to implement a 16x16 combinational multiplier in the > EP20K160E. But the timing simulations seem to take 40-50 ns for a > single multiply. > > Anyone have any ideas, how can I speedup the operation. Well, you could change to a Virtex-II chip that performs this multiplication much faster (<6 ns combinatorial delay, < 4 ns with internal pipeline) Just a friendly reminder that there are other options... I couldn't resist this opportunity! :-) Peter Alfke, Xilinx ApplicationsArticle: 40157
In article <ea62e09.0202281345.1467d3c2@posting.google.com>, Prashant <prashantj@usa.net> wrote: >hi, > >Do the different devices in the APEX20KE family have different maximum >speeds of operation. e.g. Would EP20K1500E be expected to run much >faster than EP20K160E ? > >I'm trying to implement a 16x16 combinational multiplier in the >EP20K160E. But the timing simulations seem to take 40-50 ns for a >single multiply. A) Restructure. What does your multiplier look like? It MAY (repeat MAY) be better to go with a carry-save structure, or simply reorder the adders. B) Pipeline. Pipeline. With FPGAs, ALWAYS pipeline! -- Nicholas C. Weaver nweaver@cs.berkeley.eduArticle: 40158
Nope, different marketing gates. The virtexII has a higher concentration of memory, plus the multipliers contribute to the gate count. You will need to compare the number of slices, which will leave you somewhat less than 3:1 when comparing 2V6000 to 1000E. Also, it is worth noting that the cost is more or less exponential with device size. For a given number of slices, rather than a given number of marketing gates (and ignoring the other goodies) the VIrtexE is very competitive. king wrote: > Hi all, > I have a design which uses say X no of XCV1000E FPGAs. I wud like to > go for denser FPGAs ( XC2V6000). The total system gates in XCV1000E is > approximately 1.5 Million while in XC2V6000 is 6 Million. So can I > assume that the logic implemented in four (6/1.5) FPGAs can be > implemented using a single XC2V6000 FPGAs? But the LUTs of the two > looks different. Will this affect the beforesaid ratio? Or is there > any other decisive factors involved? Ur reply will be most welcom > with kind regs > king -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 40159
Have you been to the comp.arch.fpga FAQ site maintained by Philip Freidin? I think that between this newsgroup and that website, all the intents of your forum are satisfied. I think it would probably serve the community better if you contributed to the FAQ. Paul wrote: > Having come back to logic design after a 5 year break, I had a bit of a > culture-shock with the myriad of different tools and their inherent > strengths and weaknesses. > > I'm hoping to help make it a bit easier for others and expand my current > limited understanding by creating a set of forums for discussion. > > My aim in creating the forums was: > > 1) Provide a forum for discussion of various programmable logic tools and > how best to use them. > > 2) Provide a place to store tips and techniques used by programmable logic > designers. > > 3) Complement the discussions on the main programmable logic newsgroups and > perhaps go into more specific detail and provide more tutorial information > to supplement the newsgroup information. > > 4) Provide an edited summary of valuable discussions on the newsgroups. > > At present I'd appreciate any comment and assistance in starting up the > process. > > http://pub64.ezboard.com/bfpgatipsandtricks > > Because the forums are new I've focussed on Altera-based tools, but over the > coming weeks if there is sufficient interest I'll attempt to extend them to > other device toolsets. > > I should point out that there is little useful content on the forums as yet, > which is where your assistance would be invaluable. > > You will need to register a user name, email and some details to post > (viewing doesn't require this). How accurately you want to do this is > entirely up to you. > > If you need to contact me, try pauljnospambaxter@hotnospammail.com without > the nospam bits. > > Feedback appreciated. -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 40160
Well, as of now the choice is not in my hands and I have to make do with whatever I have. But I did try to map the 16x16 multiplier on a XC2V250 and the results were much worse (about 100ns for a single multiply). I was pretty impressed with the <6ns delay that you mention below and had read about it myself some time earlier. But I havent seen those numbers in my simulations. Any ideas ? Thanks, Prashant Peter Alfke <peter.alfke@xilinx.com> wrote in message news:<3C7EBCCE.36D08841@xilinx.com>... > Prashant wrote: > > > I'm trying to implement a 16x16 combinational multiplier in the > > EP20K160E. But the timing simulations seem to take 40-50 ns for a > > single multiply. > > > > Anyone have any ideas, how can I speedup the operation. > > Well, you could change to a Virtex-II chip that performs this multiplication > much faster > (<6 ns combinatorial delay, < 4 ns with internal pipeline) > Just a friendly reminder that there are other options... > I couldn't resist this opportunity! :-) > Peter Alfke, Xilinx ApplicationsArticle: 40161
Reminds me of a furniture store advertisement that has been running up here...A woman is cutting her furniture up with a chainsaw, while her husband is reading a mail notice "It says here you MAY be a winner" followed shortly by the announcer saying "time for new furniture?" Anyway, 40-50 ns sounds a tad high. What is the structure of your multiplier? You might consider a computed partial products structure as discussed on my website, that will get a logic depth of 5.5 (I'm counting the cascade gate as .5) if I did the math right. If you can manage to get it set up in adjacent LABs you should be able to cut the delay down a bit. If at all possible, consider pipelining the multiplier. Pipelining won't improve the time for it to produce a particular result (in fact it will actually increase the latency due to slack times needed between each register), but it will increase the throughput by allowing you to start on a new product before the previous is completed. Sometimes, pipelining just isn't possible because of a loop in the data path that includes the multiplier. If that is the case, then you might get some improvement by using the EABs as larger LUTs to reduce the number of product terms. You may also be able to use booth recoding to reduce the number of product terms. Finally, as Peter mentioned, you could use one of the newer devices with dedicated multipliers. Right now, I believe the only ones shipping are the Xilinx VirtexII family. Make sure you check the latest data sheets carefully regarding the speeds. There have been a number of adjustments to the multiplier speeds, most of them not favorable. Also make sure the speeds include the routing delays to get to/from the multiplier. Nicholas Weaver wrote: > In article <ea62e09.0202281345.1467d3c2@posting.google.com>, > Prashant <prashantj@usa.net> wrote: > >hi, > > > >Do the different devices in the APEX20KE family have different maximum > >speeds of operation. e.g. Would EP20K1500E be expected to run much > >faster than EP20K160E ? > > > >I'm trying to implement a 16x16 combinational multiplier in the > >EP20K160E. But the timing simulations seem to take 40-50 ns for a > >single multiply. > > A) Restructure. What does your multiplier look like? It MAY (repeat > MAY) be better to go with a carry-save structure, or simply reorder > the adders. > > B) Pipeline. Pipeline. With FPGAs, ALWAYS pipeline! > -- > Nicholas C. Weaver nweaver@cs.berkeley.edu -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 40162
Prashant <prashantj@usa.net> wrote in message news:ea62e09.0202281345.1467d3c2@posting.google.com...
> I'm trying to implement a 16x16 combinational multiplier in the
> EP20K160E. But the timing simulations seem to take 40-50 ns for a
> single multiply.
>
> Anyone have any ideas, how can I speedup the operation? ...
> Thanks,
> Prashant

Prashant,

If anything, the 20K1500E is going to be slower than the 20K160E. However, you should be able to get much faster speeds out of your 20K160E than you are currently seeing. Make sure that your inputs and outputs are registered. It takes a relatively long time for a signal to go between the pins and the FPGA routing structure, so it's helpful to move the inputs into a register before running them through the multiplier, and then to register the multiplier outputs before running them off the chip.

Once you have the inputs and outputs registered, you should be able to get a multiplier working at about 50 MHz with no pipeline stages, or over 110 MHz with two stages, in a 20K160E. You can use the MegaWizard to create a pipelined multiplier without having to figure out the partial products implementation yourself.

Let me know if this doesn't help.

-Pete-
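For what it's worth, a minimal VHDL sketch of the "register the inputs, register the outputs, add a pipeline stage" advice above might look like the following (port names, widths and the unsigned arithmetic are illustrative assumptions; the MegaWizard or a vendor core will produce an equivalent but better-optimized structure). Latency is three clocks, but a new multiply can start every cycle:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult16x16 is
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(15 downto 0);
    b   : in  std_logic_vector(15 downto 0);
    p   : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of mult16x16 is
  signal a_r, b_r : unsigned(15 downto 0) := (others => '0');
  signal prod_r   : unsigned(31 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      a_r    <= unsigned(a);                -- input registers isolate pad/routing delay
      b_r    <= unsigned(b);
      prod_r <= a_r * b_r;                  -- one internal pipeline stage on the product
      p      <= std_logic_vector(prod_r);   -- output register before the pads
    end if;
  end process;
end architecture;

Article: 40163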
I am inclined to agree with Ray. However, if you can afford the time and energy to do a good job, a new source of info can't hurt. By a good job, I mean well organized, accurate, and concise. (With a family, etc. I don't have that much time myself.) Really good tutorials are hard to find and there is a good reason why. They take a LOT of effort and skill to do well. I have generally found good support in these news groups. In my case, I am using almost solely Xilinx parts and the Xilinx people seem to haunt these groups extensively. They respond well, with accurate answers. (THANKS!! Xilinx and especially Peter Alfke) Given your direction toward brand "A", perhaps this project might need doing. This is not intended as a slam at Altera, just an indication of my own lack of knowledge. (BTW, I hope my all caps above is not offensive. If so can somebody please let me know. My knowledge of proper newsgroup manners is lacking.) Thanks, and whatever you decide, good luck and keep us posted. Theron Hicks Ray Andraka wrote: > Have you been to the comp.arch.fpga FAQ site maintained by Philip Freidin? I > think that between this newsgroup and that website, all the intents of your > forum are satisfied. I think it would probably serve the community better if > you contributed to the FAQ. > > Paul wrote: > > > Having come back to logic design after a 5 year break, I had a bit of a > > culture-shock with the myriad of different tools and their inherent > > strengths and weaknesses. > > > > I'm hoping to help make it a bit easier for others and expand my current > > limited understanding by creating a set of forums for discussion. > > > > My aim in creating the forums was: > > > > 1) Provide a forum for discussion of various programmable logic tools and > > how best to use them. > > > > 2) Provide a place to store tips and techniques used by programmable logic > > designers. > > > > 3) Complement the discussions on the main programmable logic newsgroups and > > perhaps go into more specific detail and provide more tutorial information > > to supplement the newsgroup information. > > > > 4) Provide an edited summary of valuable discussions on the newsgroups. > > > > At present I'd appreciate any comment and assistance in starting up the > > process. > > > > http://pub64.ezboard.com/bfpgatipsandtricks > > > > Because the forums are new I've focussed on Altera-based tools, but over the > > coming weeks if there is sufficient interest I'll attempt to extend them to > > other device toolsets. > > > > I should point out that there is little useful content on the forums as yet, > > which is where your assistance would be invaluable. > > > > You will need to register a user name, email and some details to post > > (viewing doesn't require this). How accurately you want to do this is > > entirely up to you. > > > > If you need to contact me, try pauljnospambaxter@hotnospammail.com without > > the nospam bits. > > > > Feedback appreciated. > > -- > --Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email ray@andraka.com > http://www.andraka.com > > "They that give up essential liberty to obtain a little > temporary safety deserve neither liberty nor safety." > -Benjamin Franklin, 1759Article: 40164
FIRs can generally be run at much higher data rates than IIRs in FPGA implementations because they can be heavily pipelined. IIR filters can't really be pipelined, since the output has to be back at the input within one sample interval. Large FIRs can be had by using symmetry and distributed arithmetic. See my papers on my website regarding radar in a chip... in that case there are four 256-tap FIR filters (2 per complex filter) per FPGA running data at a 5 MHz sample rate. If you want bigger or faster, you might consider using an FFT to do fast convolution. We've done a 2K-tap complex filter at 60 MHz sample rates that way.

MANDY & DOUGLAS wrote:
> Depends on the application. Basically an IIR filter can give a similar
> response to an FIR filter. The use of an IIR in place of an FIR makes sense
> when any implementation of the FIR would cause it to become too expensive
> to implement - typically high numbers of taps at high data rates. There are
> lots of books on the subjects of filtering that I'm sure some other readers
> of this site can recommend.
>
> "Alkos Nikos" <alkosd@yahoo.co.uk> wrote in message
> news:24b1bde3a9b9f5b214e841e536d8a5a7.57871@mygate.mailgate.org...
> > basic question, could we tell that IIR performs a convolution operation
> > as FIR does
> > thanks
> >
> > --
> > Posted via Mailgate.ORG Server - http://www.Mailgate.ORG

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
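To illustrate why FIRs pipeline so well, here is a minimal transposed-form FIR sketch in VHDL (coefficients, widths and the four-tap length are placeholders, not taken from the post above; distributed-arithmetic and symmetric structures would look quite different). Each tap is a multiply feeding a registered adder, so the critical path stays at one multiply-add no matter how many taps are added:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir4 is
  port (
    clk   : in  std_logic;
    x_in  : in  signed(7 downto 0);
    y_out : out signed(19 downto 0)
  );
end entity;

architecture rtl of fir4 is
  type coef_t is array (0 to 3) of signed(7 downto 0);
  constant COEF : coef_t := (to_signed(3, 8), to_signed(-1, 8),
                             to_signed(7, 8), to_signed(2, 8));
  type acc_t is array (0 to 3) of signed(19 downto 0);
  signal acc : acc_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Transposed structure: the new sample hits every tap at once and the
      -- partial sums shift toward the output, one register per tap.
      acc(3) <= resize(x_in * COEF(3), 20);
      acc(2) <= resize(x_in * COEF(2), 20) + acc(3);
      acc(1) <= resize(x_in * COEF(1), 20) + acc(2);
      acc(0) <= resize(x_in * COEF(0), 20) + acc(1);
    end if;
  end process;
  y_out <= acc(0);
end architecture;

Article: 40165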
Quartus II v2.0 Web Edition is now on the altera web page and it supports ACEX 1K. http://www.altera.com/products/software/sfw-quarwebmain.html rickman <spamgoeshere4@yahoo.com> wrote in message news:<3C6F58FB.1E11DA9B@yahoo.com>... > I see that this question has met with no reply after two weeks. I guess > that is the answer... > > > > Russell Shaw wrote: > > > > Guy Schlacter wrote: > > > > > > QuartusII v2.0 just released Friday and has been producing very good > > > results for both this new family and ApexII. > > > > When is Quartus web edition going to include Acex 1k devices? > > > > > As far as GATE COUNTING, every vendor and family uses differnet > > > nomenclature. For the user, you are best off descregarding gate counts > > > and comparing > > > 4input LUTs > > > Available Memory counts > > > Other dedicated Resources Multipliers etc. > > > > > > Best of Luck, > > > Guy Schlacter > > > Altera Corp. > > > > > > "Steve Holroyd" <spholroyd@iee.org> wrote in message > > > news:b623f4cf.0201111039.2a16155@posting.google.com... > > > > > > > I am currently task of recommending the largest, fastest and most > > > > memory FPGA that's readily available the first half of this year for a > > > > FPGA Array Card. > > > > > > > > The choices have been narrowed down to two families Altera's APEX-II > > > > (EP2A70) and XILINX Virtex-II (XC2V6000). > > > > > > > > Which can operate at the highest speed? > > > > > > > > Steve > > > > > > -- > > > Posted via Mailgate.ORG Server - http://www.Mailgate.ORG > > > > -- > > ___ ___ > > / /\ / /\ > > / /__\ Russell Shaw, B.Eng, M.Eng(Research) / /\/\ > > /__/ / Victoria, Australia, Down-Under /__/\/\/ > > \ \ / \ \/\/ > > \__\/ \__\/ > > -- > > Rick "rickman" Collins > > rick.collins@XYarius.com > Ignore the reply address. To email me use the above address with the XY > removed. > > Arius - A Signal Processing Solutions Company > Specializing in DSP and FPGA design URL http://www.arius.com > 4 King Ave 301-682-7772 Voice > Frederick, MD 21701-3110 301-682-7666 FAXArticle: 40166
Hi Kevin, I forgot to say that we already have the PCI Spec, which is actually the basis of our development. Due to your comment I think PCI Hardware and Software will be right for us. Thanks for the comprehensive answer. Matthias Kevin Brace wrote: > > One document I think you are forgetting to obtain is PCI Local > Bus Specification Revision 2.2. > You can purchase this specification from http://www.pcisig.com. > The reason I think this document is more important than the other three > books you mentioned is because Appendix B of PCI Specification have > sample state machines of a target interface and a master (initiator) > interface. > Although, I added a few more states in my PCI IP core design, my PCI IP > core's state machine for target and master pretty much resembles the > sample state machines of Appendix B. > Regarding the three books you mentioned, I own > PCI System Architecture 4th Edition (Costs $39.95 ISBN 0-201-30974-2) > and PCI Hardware and Software 4th Edition (Costs about $100 ISBN > 092939259-0). > The problem with both books I think is that they are pretty much bus > protocol books, and doesn't discuss how a PCI IP core should be designed > (Neither books have even a single state machine diagram in them.), and > how the designer should design it to meet timings (Setup time (Tsu < 7ns > for 33MHz PCI and Tsu < 3ns for 66MHz PCI) is the hardest part to meet). > So, if what you want from those two books is how you should design a PCI > IP core, it won't be there. > As a bus protocol book, I will say that PCI System Architecture 4th > Edition is more for a beginner, and it is fairly easy to read, so it is > not bad having it, but the PCI specification is also well written, and > easy to read, so one may have to ask itself, "Do I really have to have > PCI System Architecture 4th Edition?" > PCI Hardware and Software 4th Edition is for an experienced designer, > and it contains a lot of details, but I feel like because it contains > too much details, it is hard to read. > Another thing I don't like about it will be the diagrams shown in the > book which looks ugly compared to those of PCI Specification or PCI > System Architecture 4th Edition. > Maybe PCI and PCI-X Hardware and Software 5th Edition is better, but I > won't count on it. > To tell you the truth, when I developed (Although it is still not done > yet.) my PCI IP core, I pretty much relied on PCI Specification for > making design decisions, and rarely referenced PCI System Architecture > 4th Edition or PCI Hardware and Software 4th Edition. > So, I will say that PCI System Architecture 4th Edition and PCI Hardware > and Software 4th Edition are nice to have them, it is not absolutely > necessary to have it like the PCI Specification. > Regarding the more important part of how to implement a PCI IP > core, first thing you will like to do is to download copies of Xilinx > LogiCORE PCI Design Guide from Xilinx, Altera PCI MegaCore Function User > Guide from Altera, and Synopsys DWPCI (a PCI IP core for ASICs) Data > Book from Synopsys. > That should give you a picture of what a PCI IP core is like, and what > the backend user interface is like. > However, those documents won't discuss you how they designed the inside, > so you will have to figure out that yourself (Otherwise, no one will pay > several thousand of dollars for a license.). 
> From my experience of designing a vendor independent PCI IP > core, and implementing it in Xilinx Spartan-II XC2S150-5, the hardest > part of the design was meeting the setup time of Tsu < 7ns for 33MHz > PCI. > Meeting Clock-to-Output Valid (Tval) of Tval < 11ns for 33MHz PCI and > Tval < 6ns should be easy assuming that you know how to constrain FFs > within the IO pads (IOB FFs in Xilinx and IOE FFs in Altera.). > The reason meeting the setup time is so hard is because in PCI there are > cases where the registered version of control signals cannot be used > like when doing a no-wait cycle burst transfer (in initiator mode, > unregistered version of DEVSEL#, TRDY#, and STOP# has to be monitored, > and in target mode, unregistered version of FRAME# and IRDY# has to be > monitored.), or when asking for a disconnect in target mode > (unregistered version of FRAME# has to be monitored in this case), and > instead unregistered version has to be used. > These unregistered signals have to go through multiple levels of LUTs > before reaching a FF, and to meet the setup time, you cannot go through > too many levels of LUT. > Assuming that you use Spartan-II-5, and these control signals are close > to each other, going through 3 levels of LUT should still meet Tsu < 7ns > with automatic P&R. > If you don't mind doing manual floorplanning, 4 levels of LUT should > still meet Tsu < 7ns. > Another difficult part of a PCI IP design is how to update the > output port of AD[31:0] during a target read or an initiator write. > IRDY# in target mode, and TRDY# in initiator mode will influence the > output port of AD[31:0], but the problem here is that unlike the above > case of unregistered input control signals to a FF of output control > signals (in target mode, unregistered FRAME# and IRDY# to DEVSEL#, > TRDY#, and STOP#'s output port, and in initiator mode, unregistered > DELSEL#, TRDY#, and STOP# to FRAME# and IRDY#'s output port.), the > routing distance will likely be large, so the levels of LUT will have to > be even less than the above (Should be at most 2 levels of LUT.). > Use Clock Enable (CE) input of a FF to keep the levels of LUT low. > If you are using Xilinx devices like Spartan-II, you should use Xilinx's > infamous and secret PCILOGIC. > Only Virtex/Virtex-E/Spartan-II/Spartan-IIE (Virtex-II doesn't support > it) support PCILOGIC. > Basically, PCILOGIC has bunch of NAND gates that generates CE signals > for IOB FFs, and I guess the benefit of using it is that it supposedly > allows predictable timings. > To keep your code generic for ASIC porting or in use with other FPGAs > (i.e., Altera FPGAs), you should try to "emulate" this PCILOGIC with > regular LUTs. > One 5-input LUT, or three 4-input LUTs (will have 2 levels of 4-input > LUT) can emulate PCILOGIC. > If you are interested, I can post a sample Verilog code that works with > ISE WebPACK 4.1. > Parity generation is another issue during a target read cycle > because unregistered C/BE#[3:0] has to be through some kind of a parity > generator in Tsu < 7ns. > The trick is to compute the parity of AD[31:0] that is getting read > ahead, since there is 30ns to do so, and merge that with C/BE#[3:0] to > compute the final parity that goes out to PAR. > Virtex's 5-input LUT can handle this nicely, but I haven't been able to > instruct XST (ISE WebPACK's synthesis tool) to do so even if KEEP > attribute is used. 
> Using a carry-chain parity generator might be better than using a > combinational parity generator, but I haven't figured out a way to > instruct XST to infer that. > I recommend using Address/Data Stepping which, of course, > reduces bus utilization (performance) in initiator mode because GNT# has > to be asserted at least for 2 cycles (Won't be able to start a > transaction if GNT# is asserted for only one cycle. AD[31:0] and > C/BE#[3:0] has to be turned off immediately.), but will help you meet > Tsu for OE (Output Enable) FFs. > Xilinx and Altera uses this technique, presumably to meet Tsu < 3ns for > 66MHz PCI. > The bottom line is that if the logic is not designed carefully, > even a relatively new device like Xilinx Spartan-II won't meet even > 33MHz PCI's setup time. > To meet the setup time, you will need to have good understanding of > target architecture (Like what the delays of various internal resources > like LUT is like.). > Since I got mine to meet 33MHz PCI timings comfortably with Verilog, I > don't believe there is any necessity to use Schematics, but you should > never be hesitant to do manual floorplanning because automatic P&R tool > just doesn't know the correct location of where the LUT should be > placed. > Human brain can do a better job than what software can. > > Kevin Brace (Don't respond to me directly, respond within the > newsgroup.) > > Matthias Scheerer wrote: > > > > Hi there, > > > > to see a posting about PCI books was very interesting for me, because we > > too have to decide, what PCI book to buy. We are currently developing > > FPGA/ASIC logic to connect to PCI and also software drivers (linux by > > now). We now have to decide whether to buy > > "PCI System Architecture, 4th Ed. and PCI-X System Architecture" (both) > > or > > "PCI and PCI-X Hardware and Software, 5th Ed." > > > > Any comments on those (three) books ? > > > > Thanks. > > Matthias -- Matthias Scheerer (mailto:scheerer@uni-mannheim.de) University of Mannheim - Computer Architecture Group 68161 Mannheim - GERMANY (http://mufasa.informatik.uni-mannheim.de) Phone: +49 621 181 2721 Fax: +49 621 181 2713Article: 40167
Yes, i've been using it:) I still got a minor glitch:( Girl wrote: > > Quartus II v2.0 Web Edition is now on the altera web page and it supports ACEX 1K. > > http://www.altera.com/products/software/sfw-quarwebmain.html > > rickman <spamgoeshere4@yahoo.com> wrote in message news:<3C6F58FB.1E11DA9B@yahoo.com>... > > I see that this question has met with no reply after two weeks. I guess > > that is the answer... > > > > > > > > Russell Shaw wrote: > > > > > > Guy Schlacter wrote: > > > > > > > > QuartusII v2.0 just released Friday and has been producing very good > > > > results for both this new family and ApexII. > > > > > > When is Quartus web edition going to include Acex 1k devices?Article: 40168
Hi my experience with DC vs FC2 (fpga compiler II from Synopsys) is that the results in terms of speed and area from dc are very poor compared to fc2. This is probably true for other FPGA specific synthesizers like sinplicity, ... Check with Synopsys if a dc license enables you to use fc2. Unfortunately fc2 uses other commands than dc, e.g. set_multicycle_path is not available in fc2. I was told by Synopsys that there is a customer demand for reintegrating fc2 into dc again so you can keep all your constraints and synthesis scripts and just have to change the libs. I don't know if or when this will happen. HTH Ansgar -- Attention reply address is invalid. Please remove _xxx_ Jay <kayrock66@yahoo.com> schrieb in im Newsbeitrag: d049f91b.0202281030.206aeb6b@posting.google.com... > I'm trying to pitch that my client use Synopsys Design Compiler > instead of an FPGA specific synthesizer from another vendor since his > Xilinx Vertex 2 FPGA is a proto for a standard cell part. The clock > speed isn't important, verification of the tool flow and design > database is. > > The problem I'm running into is that the Design Compiler output uses > almost 200% the LUTs compared to the purpose built FPGA synthesizer. > So the logic will no longer fit the proto board. > > Mini Example: > Design compiler: 1760 LUTS > FPGA synthesizer: 824 LUTS > > Design compiler synthesizes to cells like AND2, OR2, AND4, etc whereas > the FPGA specific tool maps directly to special LUTs custom made for > the logic required like LUT_AB5A and LUT_67FE, etc. Now I figured the > Xilinx mapper would be smart enough to "map" the Design Compiler AND2, > OR2, etc, into more compact LUT_ABCD and LUT_6534 type cells but just > seems to be doing a 1 for one map with no optimization. > > It appears that Xilinx did not write the mapper optimization (option > -oe) for the recent products Vertex E/2 an Spartan 2 in effect giving > up support for Design Compiler. > > Can any one else comment on this? It seems crazy that I can't use the > old man of sythesis (Design Compiler) at $100k seat anymore. > > BTW- Altera DOES still do map optimization on Design Compiler EDIF > files.Article: 40169
alw@al-williams.com (Al Williams) writes:

> True, but what if you want to get tricky and have one flip flop latch
> on the rising edge and another latch on the falling edge? This seems
> to work, but it does whine about the clock not being global.
>

It will then not be global.

> I haven't tried it, but I was wondering if you could invert the clock
> and then feed it through a global buffer (assuming you haven't used
> all the global buffers). Don't know if that'd work or not.
>

If you have global buffers (which there aren't on the MAX3000, just a global clock pin IIRC) you can do that, but even with a 50-50 clock you can't guarantee where in the cycle the falling edge is, as you've gone through the routing/inverter/routing/global delays.

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt, Solihull, UK
http://www.trw.com/conekt
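At the HDL level the "one flip-flop on each edge" idea is just two registers with opposite active edges on the same clock signal, as in this small VHDL sketch (entity and signal names are made up; whether the device implements the falling-edge register on the global clock net or via a product-term clock is exactly the issue discussed above):

library ieee;
use ieee.std_logic_1164.all;

entity both_edges is
  port (
    clk    : in  std_logic;
    d      : in  std_logic;
    q_rise : out std_logic;
    q_fall : out std_logic
  );
end entity;

architecture rtl of both_edges is
begin
  -- Captured on the rising edge of clk.
  process (clk)
  begin
    if rising_edge(clk) then
      q_rise <= d;
    end if;
  end process;

  -- Captured on the falling edge of the same clock signal; no explicit
  -- inverter is described, though the fitter may still implement one.
  process (clk)
  begin
    if falling_edge(clk) then
      q_fall <= d;
    end if;
  end process;
end architecture;

Article: 40170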
And the routing resources (and speed) steadily increase as the generations advance. For some applications this can be decisive. Which sadly means that the only certain guide for your design is a trial implementation. Ray Andraka wrote: > Nope, different marketing gates. The virtexII has a higher concentration > of memory, plus the multipliers contribute to the gate count. You will > need to compare the number of slices, which will leave you somewhat less > than 3:1 when comparing 2V6000 to 1000E. Also, it is worth noting that > the cost is more or less exponential with device size. For a given number > of slices, rather than a given number of marketing gates (and ignoring the > other goodies) the VIrtexE is very competitive. > > king wrote: > > > Hi all, > > I have a design which uses say X no of XCV1000E FPGAs. I wud like to > > go for denser FPGAs ( XC2V6000). The total system gates in XCV1000E is > > approximately 1.5 Million while in XC2V6000 is 6 Million. So can I > > assume that the logic implemented in four (6/1.5) FPGAs can be > > implemented using a single XC2V6000 FPGAs? But the LUTs of the two > > looks different. Will this affect the beforesaid ratio? Or is there > > any other decisive factors involved? Ur reply will be most welcom > > with kind regs > > king > > -- > --Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email ray@andraka.com > http://www.andraka.com > > "They that give up essential liberty to obtain a little > temporary safety deserve neither liberty nor safety." > -Benjamin Franklin, 1759 > >Article: 40171
You could use DC to compile to a Verilog netlist, then an FPGA-specific compiler to reprocess the Verilog netlist into EDIF. This would add another tool to your chain, but since you are using Xilinx 'map' anyway, you are not really adding any more uncertainty. Whether the FPGA-specific optimisations would survive all this... Jay wrote > I'm trying to pitch that my client use Synopsys Design Compiler > instead of an FPGA specific synthesizer from another vendor since his > Xilinx Vertex 2 FPGA is a proto for a standard cell part. The clock > speed isn't important, verification of the tool flow and design > database is. > > The problem I'm running into is that the Design Compiler output uses > almost 200% the LUTs compared to the purpose built FPGA synthesizer. > So the logic will no longer fit the proto board. > > Mini Example: > Design compiler: 1760 LUTS > FPGA synthesizer: 824 LUTS > > Design compiler synthesizes to cells like AND2, OR2, AND4, etc whereas > the FPGA specific tool maps directly to special LUTs custom made for > the logic required like LUT_AB5A and LUT_67FE, etc. Now I figured the > Xilinx mapper would be smart enough to "map" the Design Compiler AND2, > OR2, etc, into more compact LUT_ABCD and LUT_6534 type cells but just > seems to be doing a 1 for one map with no optimization. > > It appears that Xilinx did not write the mapper optimization (option > -oe) for the recent products Vertex E/2 an Spartan 2 in effect giving > up support for Design Compiler. > > Can any one else comment on this? It seems crazy that I can't use the > old man of sythesis (Design Compiler) at $100k seat anymore. > > BTW- Altera DOES still do map optimization on Design Compiler EDIF > files.Article: 40172
Phil, Your solution did indeed solve the hanging problem - thanks very much for the code! The design now runs for 28 seconds instead of 3 for some other reason but I am sure I can track this down now I can make changes to my VHDL without random state hangs happening! Thanks again, KenArticle: 40173
Phil,

Forgot to mention that the DAC is fed the 425kHz clock via a pin of my Spartan-II. The DAC uses the negative edges of the 425kHz clock to take data from process C. Process C uses the positive edges of the 425kHz clock to place data on the serial data pin for the DAC, and the negative edges then drive the DAC. My DAC output now seems to be irregular - what signal should I use to clock my DAC now?

Thanks again,
Ken

> > Process A: Runs at 33MHz to fill a 16 location FIFO with 8-bit data
> > samples and then keep it full.
> > Process B: Runs at 33MHz to take data from the FIFO when told to and
> > supply it to process C via a register.
> > Process C: Runs at 425kHz and sends FIFO 8-bit data samples to a DAC
> > bit-serially then asks process B for next piece of data.
> >
> > Process A is fed with data via a Visual C++ app (the Spartan-II is mounted
> > on a PCI board) which is synchronised with the FPGA using an interrupt pin
> > that the FPGA can assert and the C++ can read.
> >
> > I have used this system for many designs with no trouble (none of them had
> > multiple clock domains however). The problem here is that the design is
> > getting stuck in state for no apparent reason! (i.e. the C++ hangs waiting
> > for the interrupt pin!).
>
> Sounds to me like you have a problem crossing clock domains.
>
> > The system works for a random number of samples (between 28 and 33 it seems)
> > and then gets stuck in state. This is very strange because it means that my
> > protocols do work.
>
> Not strange at all. Suppose we have two registers in one clock domain sampling
> a signal from the other clock domain. If both get the same value for the next
> clock, the logic works. If only one gets the value, the logic hangs.
>
> > The weird thing is that I put a piece of debug code in another state to send
> > a signal out to a pin to probe, I ran the flow again to get a bitstream and
> > the system ran perfectly for all 75001 samples I am using! The debug code
> > was "Debug <= '1'".
>
> Different placement, different routing, different timing, different odds of
> failure. Might work well at 25C, and fail like above at 28C.
>
> > Then I enabled clock DLLs using the BUFGDLL component and it hangs again!
>
> Different timing, different odds of failure.
>
> > Previously, I had it working perfectly using the clock DLLs but without a
> > FIFO (i.e. 1 sample at a time from C++ to FPGA to DAC) but I got some
> > stutters hence I introduced the FIFO.
> >
> > In that design I also had hanging problems but after I rejigged my protocol
> > in my VHDL state machines it worked perfectly.
> >
> > It seems that seemingly random changes of VHDL make or break the system. I
> > guess it must be to do with my 2 different clock rates but that is the way
> > it has to be.
> >
> > I am at a loss - anyone any ideas?
>
> First, sync the 425KHz to 33 MHz, then edge detect, and then use the edge
> detected clock 425 for a clock enable for process C. Code fragments:
>
> process(clk33) begin
>   if rising_edge(clk33) then
>     synslow <= clk425;
>     synslow2 <= synslow;
>     synslow3 <= synslow2;
>     en425 <= synslow2 and not synslow3;
>   end if;
> end process;
>
> processc:
> process(clk33) begin -- was (clk425)
>   if rising_edge(clk33) then
>     if en425 = '1' then -- was rising_edge(clk425)
>       ....
>
> The reason this (hopefully!) will solve your problem is that almost all logic
> will be running on a single clock.
> While there is a chance that synslow will not correctly clock in clk425 on
> rising or falling edge ("go metastable"), synslow2 is much less likely to fail
> (as mean time between failures >> age of universe), and en425 even less so.
>
> Also, you could synchronize all control signals between the two processes.
> More complex.
>
> --
> Phil Hays
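One possible follow-through on Phil's suggestion, sketched purely as an illustration (none of this comes from the thread, and the names are invented): keep process C entirely in the 33 MHz domain and update the DAC serial-data pin only when en425 fires. Since en425 arrives a few 33 MHz cycles after each rising edge of the 425 kHz clock, the data has settled long before the next falling edge, so the DAC could still be clocked from the original 425 kHz pin:

library ieee;
use ieee.std_logic_1164.all;

entity dac_serial is
  port (
    clk33    : in  std_logic;
    en425    : in  std_logic;                     -- edge-detected 425 kHz enable (Phil's en425)
    load     : in  std_logic;                     -- start of a new 8-bit sample
    sample   : in  std_logic_vector(7 downto 0);  -- from the register fed by process B
    dac_data : out std_logic                      -- serial data pin to the DAC
  );
end entity;

architecture rtl of dac_serial is
  signal shreg : std_logic_vector(7 downto 0) := (others => '0');
begin
  process (clk33)
  begin
    if rising_edge(clk33) then
      if en425 = '1' then                          -- one action per 425 kHz period
        if load = '1' then
          dac_data <= sample(7);                   -- first (MSB) bit of the new sample
          shreg    <= sample(6 downto 0) & '0';    -- keep the remaining bits
        else
          dac_data <= shreg(7);                    -- next bit, registered output to the pin
          shreg    <= shreg(6 downto 0) & '0';
        end if;
      end if;
    end if;
  end process;
end architecture;

Article: 40174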
Hi, there were some "counter discussions" in this newsgroup, however I wonder if it might be possible to - divide down a couple of LVPECL clocks running at 622 MHz to a few MHz or kHz and - build a programmable LVPECL divider that can either pass 155 MHz and drive it off-chip as LVDS clock or divide down a 622 MHz LVPECL clock to 155 MHz and again drive it as LVDS clock to an another chip on board I requirement would be, that the FPGA is small in respect to the board space - the small Virtex-II 50/80 (BGA256) devices looking very nice. Might there be a way to do that? Bye Thomas
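The divider logic itself is tiny, as the hedged VHDL sketch below shows (the names are invented). Whether the input flip-flops, clock network and LVDS outputs of a small Virtex-II actually keep up at 622 MHz is the real question raised above and is not settled by this sketch; a production version would also put the final selection into a DDR output register or a dedicated clock path rather than a fabric mux, which as written can glitch when sel_div changes. Dividing down to a few MHz or kHz is the same idea with a wider counter.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity clk_div is
  port (
    clk_in  : in  std_logic;   -- 622 MHz (or 155 MHz) clock from the LVPECL input buffer
    sel_div : in  std_logic;   -- '1' = divide by 4, '0' = pass through
    clk_out : out std_logic    -- to the LVDS output buffer
  );
end entity;

architecture rtl of clk_div is
  signal cnt : unsigned(1 downto 0) := "00";
begin
  process (clk_in)
  begin
    if rising_edge(clk_in) then
      cnt <= cnt + 1;          -- cnt(0) = clk_in/2, cnt(1) = clk_in/4
    end if;
  end process;

  -- Fabric mux shown only for illustration; see the caveat above.
  clk_out <= cnt(1) when sel_div = '1' else clk_in;
end architecture;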