Workshop on Cryptographic Hardware and Embedded Systems 2002 (CHES 2002)
www.chesworkshop.org
Hotel Sofitel, San Francisco Bay (Redwood City), USA
August 13 - 15, 2002
Second Call for Papers

General Information

The focus of this workshop is on all aspects of cryptographic hardware and security in embedded systems. The workshop will be a forum for new results from the research community as well as from industry. Of special interest are contributions that describe new methods for efficient hardware implementations and high-speed software for embedded systems, e.g., smart cards, microprocessors, DSPs, etc. We hope that the workshop will help to fill the gap between the cryptography research community and the application areas of cryptography. Consequently, we encourage submissions from academia, industry, and other organizations. All submitted papers will be reviewed.

This will be the fourth CHES workshop. CHES '99 and CHES 2000 were held at WPI, and CHES 2001 was held in Paris. The number of participants has grown to more than 200, with attendees coming from industry, academia, and government organizations.

The topics of CHES 2002 include but are not limited to:
* Computer architectures for public-key and secret-key cryptosystems
* Efficient algorithms for embedded processors
* Reconfigurable computing in cryptography
* Cryptographic processors and co-processors
* Cryptography in wireless applications (mobile phone, LANs, etc.)
* Security in pay-TV systems
* Smart card attacks and architectures
* Tamper resistance on the chip and board level
* True and pseudo random number generators
* Special-purpose hardware for cryptanalysis
* Embedded security
* Device identification

Instructions for Authors

Authors are invited to submit original papers. The preferred submission form is by electronic mail to submission@chesworkshop.org. The submissions must be anonymous, with no author names, affiliations, acknowledgments, or obvious references. Papers should be formatted in 12pt type and not exceed 12 pages (not including the title page and the bibliography). Please submit the paper in Postscript or PDF, together with an extra file containing the email and physical address of the authors, and an indication of the corresponding author. We recommend that you generate the PS or PDF file using LaTeX; however, MS Word is also acceptable. All submissions will be refereed. Only original research contributions will be considered. Submissions must not substantially duplicate work that any of the authors have published elsewhere or have submitted in parallel to any other conferences or workshops that have proceedings.

Important Dates

Submission Deadline: May 1st, 2002.
Acceptance Notification: July 1st, 2002.
Final Version due: August 1st, 2002.
Workshop: August 13th - 15th, 2002.

NOTE: The CHES dates August 13th - 15th are the Tuesday through Thursday preceding CRYPTO 2002, which starts on the evening of Sunday, August 18th.

Mailing List

If you want to receive emails with subsequent Calls for Papers and registration information, please send a brief mail to mailinglist@chesworkshop.org.

Program Committee

Beni Arazi, Ben Gurion University, Israel; Jean-Sebastien Coron, Gemplus Card International, France; Kris Gaj, George Mason University, USA; Craig Gentry, DoCoMo Communications Laboratories, USA; Jim Goodman, Lumic Electronics, Canada; M. Anwar Hasan, University of Waterloo, Canada; David Jablon, Phoenix Technologies, USA; Peter Kornerup, Odense University, Denmark; Pil Joong Lee, Pohang Univ. of Sci. & Tech., Korea; Preda Mihailescu, University of Paderborn, Germany; David Naccache, Gemplus Card International, France; Bart Preneel, Universite Catholique de Louvain, Belgium; Jean-Jacques Quisquater, Universite Catholique de Louvain, Belgium; Erkay Savas, rTrust Technologies, USA; Joseph Silverman, Brown University and NTRU Cryptosystems, Inc., USA; Jacques Stern, Ecole Normale Superieure, France; Berk Sunar, Worcester Polytechnic Institute, USA; Colin Walter, Computation Department - UMIST, U.K.

Organizational Committee

All correspondence and/or questions should be directed to any of the Organizational Committee Members:

Burt Kaliski (Program Chair), RSA Laboratories, 20 Crosby Drive, Bedford, MA 01730, USA. Phone: +1 781 687 7057, Fax: +1 781 687 7213, Email: bkaliski@rsasecurity.com

Cetin Kaya Koc (Local Organization), Dept. of Electrical & Computer Engineering, Oregon State University, Corvallis, Oregon 97331, USA. Phone: +1 541 737 4853, Fax: +1 541 737 8377, Email: Koc@ece.orst.edu

Christof Paar (Publicity Chair), Electrical Eng. & Information Sciences Dept., Ruhr-Universitaet Bochum, 44780 Bochum, Germany. Phone: +49 234 32 23988, Fax: +49 234 32 14444, Email: cpaar@crypto.ruhr-uni-bochum.de

Workshop Proceedings

The post-proceedings will be published in Springer-Verlag's Lecture Notes in Computer Science (LNCS) series. Note that in order to be included in the proceedings, the authors of an accepted paper must guarantee to present their contribution at the workshop.

Article: 40151
In article <d049f91b.0202281030.206aeb6b@posting.google.com>, kayrock66@yahoo.com (Jay) writes:
|> I'm trying to pitch that my client use Synopsys Design Compiler
|> instead of an FPGA specific synthesizer from another vendor since his
|> Xilinx Virtex 2 FPGA is a proto for a standard cell part. The clock
|> speed isn't important, verification of the tool flow and design
|> database is.
|>
|> The problem I'm running into is that the Design Compiler output uses
|> almost 200% the LUTs compared to the purpose built FPGA synthesizer.
|> So the logic will no longer fit the proto board.
|>
|> Mini Example:
|> Design Compiler: 1760 LUTs
|> FPGA synthesizer: 824 LUTs
|>
|> Design Compiler synthesizes to cells like AND2, OR2, AND4, etc., whereas
|> the FPGA specific tool maps directly to special LUTs custom made for
|> the logic required, like LUT_AB5A and LUT_67FE, etc. Now I figured the
|> Xilinx mapper would be smart enough to "map" the Design Compiler AND2,
|> OR2, etc., into more compact LUT_ABCD and LUT_6534 type cells, but it just
|> seems to be doing a one-for-one map with no optimization.
|>
|> It appears that Xilinx did not write the mapper optimization (option
|> -oe) for the recent products Virtex E/2 and Spartan 2, in effect giving
|> up support for Design Compiler.
|>
|> Can anyone else comment on this? It seems crazy that I can't use the
|> old man of synthesis (Design Compiler) at $100k a seat anymore.

My last experiences with DC are only on the 4k-Series (now I'm using fpga_compiler2 for Spartan2), but maybe this helps: "map -help spartan2" shows among others:

-k 4|5|6 Function size for covering combinational logic. If -k is not specified, the default is -k 4. This gives the best balance of runtime to quality of results. Using 5 or 6 can give superior results at the expense of runtime.

So try to use -k 6; this can make the design much smaller (and faster...). Have you used the usual tweaks in DC, like "compile -boundary_optimization -map_effort high" and "compile -map_effort high -ungroup_all" afterwards? Are you sure that the reset net is replaced by the STARTUP symbol and removed in the design ("disconnect_net reset -all")?

--
Georg Acher, acher@in.tum.de
http://www.in.tum.de/~acher/
"Oh no, not again !" The bowl of petunias

Article: 40152
Hi,

Do the different devices in the APEX20KE family have different maximum speeds of operation? E.g., would the EP20K1500E be expected to run much faster than the EP20K160E?

I'm trying to implement a 16x16 combinational multiplier in the EP20K160E, but the timing simulations seem to take 40-50 ns for a single multiply.

Anyone have any ideas how I can speed up the operation? Would mapping the multiplier onto an EP20K1500E help? I'm using the demo version, hence I can't map to the EP20K1500E, but I would appreciate it if anyone knows the speed improvement from the 160E to the 1500E.

Thanks,
Prashant

Article: 40153
One document I think you are forgetting to obtain is the PCI Local Bus Specification, Revision 2.2. You can purchase this specification from http://www.pcisig.com. The reason I think this document is more important than the other three books you mentioned is that Appendix B of the PCI Specification has sample state machines for a target interface and a master (initiator) interface. Although I added a few more states in my PCI IP core design, my PCI IP core's state machines for target and master pretty much resemble the sample state machines of Appendix B.

Regarding the three books you mentioned, I own PCI System Architecture 4th Edition (costs $39.95, ISBN 0-201-30974-2) and PCI Hardware and Software 4th Edition (costs about $100, ISBN 092939259-0). The problem with both books, I think, is that they are pretty much bus protocol books: they don't discuss how a PCI IP core should be designed (neither book has even a single state machine diagram in it), or how the designer should design it to meet timing (setup time, Tsu < 7ns for 33MHz PCI and Tsu < 3ns for 66MHz PCI, is the hardest part to meet). So, if what you want from those two books is how you should design a PCI IP core, it won't be there.

As a bus protocol book, I will say that PCI System Architecture 4th Edition is more for a beginner, and it is fairly easy to read, so it is not bad having it; but the PCI Specification is also well written and easy to read, so one may have to ask oneself, "Do I really have to have PCI System Architecture 4th Edition?" PCI Hardware and Software 4th Edition is for an experienced designer and contains a lot of detail, but because it contains so much detail, I find it hard to read. Another thing I don't like is the diagrams in the book, which look ugly compared to those of the PCI Specification or PCI System Architecture 4th Edition. Maybe PCI and PCI-X Hardware and Software 5th Edition is better, but I won't count on it.

To tell you the truth, when I developed my PCI IP core (although it is still not done yet), I pretty much relied on the PCI Specification for making design decisions, and rarely referenced PCI System Architecture 4th Edition or PCI Hardware and Software 4th Edition. So, I will say that while PCI System Architecture 4th Edition and PCI Hardware and Software 4th Edition are nice to have, they are not absolutely necessary the way the PCI Specification is.

Regarding the more important part of how to implement a PCI IP core, the first thing you will want to do is download copies of the Xilinx LogiCORE PCI Design Guide from Xilinx, the Altera PCI MegaCore Function User Guide from Altera, and the Synopsys DWPCI (a PCI IP core for ASICs) Data Book from Synopsys. That should give you a picture of what a PCI IP core is like, and what the backend user interface is like. However, those documents won't tell you how they designed the inside, so you will have to figure that out yourself (otherwise, no one would pay several thousand dollars for a license).

From my experience of designing a vendor-independent PCI IP core and implementing it in a Xilinx Spartan-II XC2S150-5, the hardest part of the design was meeting the setup time of Tsu < 7ns for 33MHz PCI. Meeting Clock-to-Output Valid (Tval) of Tval < 11ns for 33MHz PCI and Tval < 6ns for 66MHz PCI should be easy, assuming that you know how to constrain FFs within the IO pads (IOB FFs in Xilinx and IOE FFs in Altera).
The reason meeting the setup time is so hard is that in PCI there are cases where the registered versions of the control signals cannot be used and the unregistered versions have to be used instead: when doing a no-wait-cycle burst transfer (in initiator mode the unregistered DEVSEL#, TRDY#, and STOP# have to be monitored, and in target mode the unregistered FRAME# and IRDY#), or when asking for a disconnect in target mode (the unregistered FRAME# has to be monitored in this case). These unregistered signals have to go through multiple levels of LUTs before reaching a FF, and to meet the setup time you cannot go through too many levels of LUT. Assuming that you use a Spartan-II-5 and these control signals are close to each other, going through 3 levels of LUT should still meet Tsu < 7ns with automatic P&R. If you don't mind doing manual floorplanning, 4 levels of LUT should still meet Tsu < 7ns.

Another difficult part of a PCI IP design is how to update the output port of AD[31:0] during a target read or an initiator write. IRDY# in target mode, and TRDY# in initiator mode, will influence the output port of AD[31:0], but the problem here is that unlike the above case of unregistered input control signals feeding the FFs of output control signals (in target mode, unregistered FRAME# and IRDY# to DEVSEL#, TRDY#, and STOP#'s output port; and in initiator mode, unregistered DEVSEL#, TRDY#, and STOP# to FRAME# and IRDY#'s output port), the routing distance will likely be large, so the number of LUT levels will have to be even less than the above (should be at most 2 levels of LUT). Use the Clock Enable (CE) input of the FF to keep the number of LUT levels low.

If you are using Xilinx devices like the Spartan-II, you should use Xilinx's infamous and secret PCILOGIC. Only Virtex/Virtex-E/Spartan-II/Spartan-IIE support PCILOGIC (Virtex-II doesn't support it). Basically, PCILOGIC has a bunch of NAND gates that generate CE signals for IOB FFs, and I guess the benefit of using it is that it supposedly allows predictable timing. To keep your code generic for ASIC porting or for use with other FPGAs (e.g., Altera FPGAs), you should try to "emulate" this PCILOGIC with regular LUTs. One 5-input LUT, or three 4-input LUTs (two levels of 4-input LUT), can emulate PCILOGIC. If you are interested, I can post a sample Verilog code that works with ISE WebPACK 4.1.

Parity generation is another issue during a target read cycle, because unregistered C/BE#[3:0] has to go through some kind of parity generator within Tsu < 7ns. The trick is to compute the parity of the AD[31:0] that is being read one cycle ahead, since there is 30ns to do so, and then merge that with C/BE#[3:0] to compute the final parity that goes out on PAR. Virtex's 5-input LUT can handle this nicely, but I haven't been able to instruct XST (ISE WebPACK's synthesis tool) to do so even when the KEEP attribute is used. Using a carry-chain parity generator might be better than using a combinational parity generator, but I haven't figured out a way to instruct XST to infer that.

I recommend using Address/Data Stepping, which, of course, reduces bus utilization (performance) in initiator mode because GNT# has to be asserted for at least 2 cycles (you won't be able to start a transaction if GNT# is asserted for only one cycle; AD[31:0] and C/BE#[3:0] have to be turned off immediately), but it will help you meet Tsu for the OE (Output Enable) FFs. Xilinx and Altera use this technique, presumably to meet Tsu < 3ns for 66MHz PCI.
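As a rough illustration of the read-parity trick described above, here is a minimal VHDL sketch (the signal names are made up, the exact cycle alignment of PAR versus AD depends on the rest of the core's datapath, and the poster's own code is Verilog and is not shown here). The 32-input XOR over the outgoing read data is done one cycle early, so only a 5-input function of C/BE#[3:0] and the stored parity bit remains in the Tsu-critical path:

library ieee;
use ieee.std_logic_1164.all;

entity par_gen is
  port (
    clk     : in  std_logic;
    ad_next : in  std_logic_vector(31 downto 0);  -- read data about to be driven on AD
    cbe_n   : in  std_logic_vector(3 downto 0);   -- C/BE# as sampled from the bus (unregistered)
    par_out : out std_logic                       -- registered PAR (even parity over AD and C/BE#)
  );
end entity;

architecture rtl of par_gen is
  signal ad_par : std_logic := '0';  -- parity of AD, computed one cycle ahead
begin
  process (clk)
    variable p : std_logic;
  begin
    if rising_edge(clk) then
      -- Cycle N: while ad_next is being registered into the AD output FFs,
      -- reduce it to a single parity bit (this path has the full clock period).
      p := '0';
      for i in ad_next'range loop
        p := p xor ad_next(i);
      end loop;
      ad_par <= p;
      -- Cycle N+1: only C/BE#[3:0] plus the stored bit remain, a 5-input
      -- function that fits one Virtex LUT/F5 level inside the Tsu budget.
      par_out <= ad_par xor cbe_n(3) xor cbe_n(2) xor cbe_n(1) xor cbe_n(0);
    end if;
  end process;
end architecture;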
The bottom line is that if the logic is not designed carefully, even a relatively new device like the Xilinx Spartan-II won't meet even 33MHz PCI's setup time. To meet the setup time, you will need a good understanding of the target architecture (for example, what the delays of internal resources such as LUTs look like). Since I got mine to meet 33MHz PCI timing comfortably with Verilog, I don't believe there is any need to use schematics, but you should never hesitate to do manual floorplanning, because the automatic P&R tool just doesn't know where a LUT should really be placed. The human brain can do a better job than the software can.

Kevin Brace (Don't respond to me directly, respond within the newsgroup.)

Matthias Scheerer wrote:
>
> Hi there,
>
> to see a posting about PCI books was very interesting for me, because we
> too have to decide what PCI book to buy. We are currently developing
> FPGA/ASIC logic to connect to PCI and also software drivers (Linux by
> now). We now have to decide whether to buy
> "PCI System Architecture, 4th Ed. and PCI-X System Architecture" (both)
> or
> "PCI and PCI-X Hardware and Software, 5th Ed."
>
> Any comments on those (three) books ?
>
> Thanks.
> Matthias

Article: 40154
Martin Thompson wrote:
> <snip>

Thank you both Martin and Muzaffer, that's just what I needed to know to get started.

Cheers,
John

--
Dr John Williams, Postdoctoral Research Fellow
Queensland University of Technology, Brisbane, Australia
Phone : (+61 7) 3864 2427 Fax : (+61 7) 3864 1517

Article: 40155
Wow, I hope things really haven't gotten this bad....

-Ryan

Article: 40156
Prashant wrote: > I'm trying to implement a 16x16 combinational multiplier in the > EP20K160E. But the timing simulations seem to take 40-50 ns for a > single multiply. > > Anyone have any ideas, how can I speedup the operation. Well, you could change to a Virtex-II chip that performs this multiplication much faster (<6 ns combinatorial delay, < 4 ns with internal pipeline) Just a friendly reminder that there are other options... I couldn't resist this opportunity! :-) Peter Alfke, Xilinx ApplicationsArticle: 40157
In article <ea62e09.0202281345.1467d3c2@posting.google.com>, Prashant <prashantj@usa.net> wrote: >hi, > >Do the different devices in the APEX20KE family have different maximum >speeds of operation. e.g. Would EP20K1500E be expected to run much >faster than EP20K160E ? > >I'm trying to implement a 16x16 combinational multiplier in the >EP20K160E. But the timing simulations seem to take 40-50 ns for a >single multiply. A) Restructure. What does your multiplier look like? It MAY (repeat MAY) be better to go with a carry-save structure, or simply reorder the adders. B) Pipeline. Pipeline. With FPGAs, ALWAYS pipeline! -- Nicholas C. Weaver nweaver@cs.berkeley.eduArticle: 40158
Nope, different marketing gates. The virtexII has a higher concentration of memory, plus the multipliers contribute to the gate count. You will need to compare the number of slices, which will leave you somewhat less than 3:1 when comparing 2V6000 to 1000E. Also, it is worth noting that the cost is more or less exponential with device size. For a given number of slices, rather than a given number of marketing gates (and ignoring the other goodies) the VIrtexE is very competitive. king wrote: > Hi all, > I have a design which uses say X no of XCV1000E FPGAs. I wud like to > go for denser FPGAs ( XC2V6000). The total system gates in XCV1000E is > approximately 1.5 Million while in XC2V6000 is 6 Million. So can I > assume that the logic implemented in four (6/1.5) FPGAs can be > implemented using a single XC2V6000 FPGAs? But the LUTs of the two > looks different. Will this affect the beforesaid ratio? Or is there > any other decisive factors involved? Ur reply will be most welcom > with kind regs > king -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 40159
Have you been to the comp.arch.fpga FAQ site maintained by Philip Freidin? I think that between this newsgroup and that website, all the intents of your forum are satisfied. I think it would probably serve the community better if you contributed to the FAQ. Paul wrote: > Having come back to logic design after a 5 year break, I had a bit of a > culture-shock with the myriad of different tools and their inherent > strengths and weaknesses. > > I'm hoping to help make it a bit easier for others and expand my current > limited understanding by creating a set of forums for discussion. > > My aim in creating the forums was: > > 1) Provide a forum for discussion of various programmable logic tools and > how best to use them. > > 2) Provide a place to store tips and techniques used by programmable logic > designers. > > 3) Complement the discussions on the main programmable logic newsgroups and > perhaps go into more specific detail and provide more tutorial information > to supplement the newsgroup information. > > 4) Provide an edited summary of valuable discussions on the newsgroups. > > At present I'd appreciate any comment and assistance in starting up the > process. > > http://pub64.ezboard.com/bfpgatipsandtricks > > Because the forums are new I've focussed on Altera-based tools, but over the > coming weeks if there is sufficient interest I'll attempt to extend them to > other device toolsets. > > I should point out that there is little useful content on the forums as yet, > which is where your assistance would be invaluable. > > You will need to register a user name, email and some details to post > (viewing doesn't require this). How accurately you want to do this is > entirely up to you. > > If you need to contact me, try pauljnospambaxter@hotnospammail.com without > the nospam bits. > > Feedback appreciated. -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 40160
Well, as of now the choice is not in my hands and I have to make do with whatever I have. But I did try to map the 16x16 multiplier on a XC2V250 and the results were much worse (about 100ns for a single multiply). I was pretty impressed with the <6ns delay that you mention below and had read about it myself some time earlier. But I havent seen those numbers in my simulations. Any ideas ? Thanks, Prashant Peter Alfke <peter.alfke@xilinx.com> wrote in message news:<3C7EBCCE.36D08841@xilinx.com>... > Prashant wrote: > > > I'm trying to implement a 16x16 combinational multiplier in the > > EP20K160E. But the timing simulations seem to take 40-50 ns for a > > single multiply. > > > > Anyone have any ideas, how can I speedup the operation. > > Well, you could change to a Virtex-II chip that performs this multiplication > much faster > (<6 ns combinatorial delay, < 4 ns with internal pipeline) > Just a friendly reminder that there are other options... > I couldn't resist this opportunity! :-) > Peter Alfke, Xilinx ApplicationsArticle: 40161
Reminds me of a furniture store advertisement that has been running up here...A woman is cutting her furniture up with a chainsaw, while her husband is reading a mail notice "It says here you MAY be a winner" followed shortly by the announcer saying "time for new furniture?" Anyway, 40-50 ns sounds a tad high. What is the structure of your multiplier? You might consider a computed partial products structure as discussed on my website, that will get a logic depth of 5.5 (I'm counting the cascade gate as .5) if I did the math right. If you can manage to get it set up in adjacent LABs you should be able to cut the delay down a bit. If at all possible, consider pipelining the multiplier. Pipelining won't improve the time for it to produce a particular result (in fact it will actually increase the latency due to slack times needed between each register), but it will increase the throughput by allowing you to start on a new product before the previous is completed. Sometimes, pipelining just isn't possible because of a loop in the data path that includes the multiplier. If that is the case, then you might get some improvement by using the EABs as larger LUTs to reduce the number of product terms. You may also be able to use booth recoding to reduce the number of product terms. Finally, as Peter mentioned, you could use one of the newer devices with dedicated multipliers. Right now, I believe the only ones shipping are the Xilinx VirtexII family. Make sure you check the latest data sheets carefully regarding the speeds. There have been a number of adjustments to the multiplier speeds, most of them not favorable. Also make sure the speeds include the routing delays to get to/from the multiplier. Nicholas Weaver wrote: > In article <ea62e09.0202281345.1467d3c2@posting.google.com>, > Prashant <prashantj@usa.net> wrote: > >hi, > > > >Do the different devices in the APEX20KE family have different maximum > >speeds of operation. e.g. Would EP20K1500E be expected to run much > >faster than EP20K160E ? > > > >I'm trying to implement a 16x16 combinational multiplier in the > >EP20K160E. But the timing simulations seem to take 40-50 ns for a > >single multiply. > > A) Restructure. What does your multiplier look like? It MAY (repeat > MAY) be better to go with a carry-save structure, or simply reorder > the adders. > > B) Pipeline. Pipeline. With FPGAs, ALWAYS pipeline! > -- > Nicholas C. Weaver nweaver@cs.berkeley.edu -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 40162
Prashant <prashantj@usa.net> wrote in message news:ea62e09.0202281345.1467d3c2@posting.google.com...
> I'm trying to implement a 16x16 combinational multiplier in the
> EP20K160E. But the timing simulations seem to take 40-50 ns for a
> single multiply.
>
> Anyone have any ideas, how can I speedup the operation? ...
> Thanks,
> Prashant

Prashant,

If anything, the 20K1500E is going to be slower than the 20K160E. However, you should be able to get much faster speeds out of your 20K160E than you are currently seeing. Make sure that your inputs and outputs are registered. It takes a relatively long time for a signal to go between the pins and the FPGA routing structure, so it's helpful to move the inputs into a register before running them through the multiplier, and then to register the multiplier outputs before running them off the chip.

Once you have the inputs and outputs registered, you should be able to get a multiplier working at about 50 MHz with no pipeline stages, or over 110 MHz with two stages, in a 20K160E. You can use the MegaWizard to create a pipelined multiplier without having to figure out the partial products implementation yourself.

Let me know if this doesn't help.

-Pete-
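For what it's worth, a minimal VHDL sketch of the "register the inputs, register the outputs, add a pipeline stage" advice above might look like the following (port names, widths and the unsigned arithmetic are illustrative assumptions; the MegaWizard or a vendor core will produce an equivalent but better-optimized structure). Latency is three clocks, but a new multiply can start every cycle:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult16x16 is
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(15 downto 0);
    b   : in  std_logic_vector(15 downto 0);
    p   : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of mult16x16 is
  signal a_r, b_r : unsigned(15 downto 0) := (others => '0');
  signal prod_r   : unsigned(31 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      a_r    <= unsigned(a);                -- input registers isolate pad/routing delay
      b_r    <= unsigned(b);
      prod_r <= a_r * b_r;                  -- one internal pipeline stage on the product
      p      <= std_logic_vector(prod_r);   -- output register before the pads
    end if;
  end process;
end architecture;

Article: 40163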
I am inclined to agree with Ray. However, if you can afford the time and energy to do a good job, a new source of info can't hurt. By a good job, I mean well organized, accurate, and concise. (With a family, etc. I don't have that much time myself.) Really good tutorials are hard to find and there is a good reason why. They take a LOT of effort and skill to do well. I have generally found good support in these news groups. In my case, I am using almost solely Xilinx parts and the Xilinx people seem to haunt these groups extensively. They respond well, with accurate answers. (THANKS!! Xilinx and especially Peter Alfke) Given your direction toward brand "A", perhaps this project might need doing. This is not intended as a slam at Altera, just an indication of my own lack of knowledge. (BTW, I hope my all caps above is not offensive. If so can somebody please let me know. My knowledge of proper newsgroup manners is lacking.) Thanks, and whatever you decide, good luck and keep us posted. Theron Hicks Ray Andraka wrote: > Have you been to the comp.arch.fpga FAQ site maintained by Philip Freidin? I > think that between this newsgroup and that website, all the intents of your > forum are satisfied. I think it would probably serve the community better if > you contributed to the FAQ. > > Paul wrote: > > > Having come back to logic design after a 5 year break, I had a bit of a > > culture-shock with the myriad of different tools and their inherent > > strengths and weaknesses. > > > > I'm hoping to help make it a bit easier for others and expand my current > > limited understanding by creating a set of forums for discussion. > > > > My aim in creating the forums was: > > > > 1) Provide a forum for discussion of various programmable logic tools and > > how best to use them. > > > > 2) Provide a place to store tips and techniques used by programmable logic > > designers. > > > > 3) Complement the discussions on the main programmable logic newsgroups and > > perhaps go into more specific detail and provide more tutorial information > > to supplement the newsgroup information. > > > > 4) Provide an edited summary of valuable discussions on the newsgroups. > > > > At present I'd appreciate any comment and assistance in starting up the > > process. > > > > http://pub64.ezboard.com/bfpgatipsandtricks > > > > Because the forums are new I've focussed on Altera-based tools, but over the > > coming weeks if there is sufficient interest I'll attempt to extend them to > > other device toolsets. > > > > I should point out that there is little useful content on the forums as yet, > > which is where your assistance would be invaluable. > > > > You will need to register a user name, email and some details to post > > (viewing doesn't require this). How accurately you want to do this is > > entirely up to you. > > > > If you need to contact me, try pauljnospambaxter@hotnospammail.com without > > the nospam bits. > > > > Feedback appreciated. > > -- > --Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email ray@andraka.com > http://www.andraka.com > > "They that give up essential liberty to obtain a little > temporary safety deserve neither liberty nor safety." > -Benjamin Franklin, 1759Article: 40164
FIRs can generally be run at much higher data rates than IIRs in FPGA implementations because they can be heavily pipelined. IIR filters can't really be pipelined, since the output has to be back at the input within one sample interval. Large FIRs can be had by using symmetry and distributed arithmetic. See my papers on my website regarding radar in a chip... in that case there are four 256-tap FIR filters (2 per complex filter) per FPGA running data at a 5 MHz sample rate. If you want bigger or faster, you might consider using an FFT to do fast convolution. We've done a 2K-tap complex filter at 60 MHz sample rates that way.

MANDY & DOUGLAS wrote:
> Depends on the application. Basically an IIR filter can give a similar
> response to an FIR filter. The use of an IIR in place of an FIR makes sense
> when any implementation of the FIR would cause it to become too expensive
> to implement - typically high numbers of taps at high data rates. There are
> lots of books on the subjects of filtering that I'm sure some other readers
> of this site can recommend.
>
> "Alkos Nikos" <alkosd@yahoo.co.uk> wrote in message
> news:24b1bde3a9b9f5b214e841e536d8a5a7.57871@mygate.mailgate.org...
> > basic question, could we tell that IIR performs a convolution operation
> > as FIR does
> > thanks
> >
> > --
> > Posted via Mailgate.ORG Server - http://www.Mailgate.ORG

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
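To illustrate why FIRs pipeline so well, here is a minimal transposed-form FIR sketch in VHDL (coefficients, widths and the four-tap length are placeholders, not taken from the post above; distributed-arithmetic and symmetric structures would look quite different). Each tap is a multiply feeding a registered adder, so the critical path stays at one multiply-add no matter how many taps are added:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir4 is
  port (
    clk   : in  std_logic;
    x_in  : in  signed(7 downto 0);
    y_out : out signed(19 downto 0)
  );
end entity;

architecture rtl of fir4 is
  type coef_t is array (0 to 3) of signed(7 downto 0);
  constant COEF : coef_t := (to_signed(3, 8), to_signed(-1, 8),
                             to_signed(7, 8), to_signed(2, 8));
  type acc_t is array (0 to 3) of signed(19 downto 0);
  signal acc : acc_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Transposed structure: the new sample hits every tap at once and the
      -- partial sums shift toward the output, one register per tap.
      acc(3) <= resize(x_in * COEF(3), 20);
      acc(2) <= resize(x_in * COEF(2), 20) + acc(3);
      acc(1) <= resize(x_in * COEF(1), 20) + acc(2);
      acc(0) <= resize(x_in * COEF(0), 20) + acc(1);
    end if;
  end process;
  y_out <= acc(0);
end architecture;

Article: 40165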
Quartus II v2.0 Web Edition is now on the altera web page and it supports ACEX 1K. http://www.altera.com/products/software/sfw-quarwebmain.html rickman <spamgoeshere4@yahoo.com> wrote in message news:<3C6F58FB.1E11DA9B@yahoo.com>... > I see that this question has met with no reply after two weeks. I guess > that is the answer... > > > > Russell Shaw wrote: > > > > Guy Schlacter wrote: > > > > > > QuartusII v2.0 just released Friday and has been producing very good > > > results for both this new family and ApexII. > > > > When is Quartus web edition going to include Acex 1k devices? > > > > > As far as GATE COUNTING, every vendor and family uses differnet > > > nomenclature. For the user, you are best off descregarding gate counts > > > and comparing > > > 4input LUTs > > > Available Memory counts > > > Other dedicated Resources Multipliers etc. > > > > > > Best of Luck, > > > Guy Schlacter > > > Altera Corp. > > > > > > "Steve Holroyd" <spholroyd@iee.org> wrote in message > > > news:b623f4cf.0201111039.2a16155@posting.google.com... > > > > > > > I am currently task of recommending the largest, fastest and most > > > > memory FPGA that's readily available the first half of this year for a > > > > FPGA Array Card. > > > > > > > > The choices have been narrowed down to two families Altera's APEX-II > > > > (EP2A70) and XILINX Virtex-II (XC2V6000). > > > > > > > > Which can operate at the highest speed? > > > > > > > > Steve > > > > > > -- > > > Posted via Mailgate.ORG Server - http://www.Mailgate.ORG > > > > -- > > ___ ___ > > / /\ / /\ > > / /__\ Russell Shaw, B.Eng, M.Eng(Research) / /\/\ > > /__/ / Victoria, Australia, Down-Under /__/\/\/ > > \ \ / \ \/\/ > > \__\/ \__\/ > > -- > > Rick "rickman" Collins > > rick.collins@XYarius.com > Ignore the reply address. To email me use the above address with the XY > removed. > > Arius - A Signal Processing Solutions Company > Specializing in DSP and FPGA design URL http://www.arius.com > 4 King Ave 301-682-7772 Voice > Frederick, MD 21701-3110 301-682-7666 FAXArticle: 40166
Hi Kevin, I forgot to say that we already have the PCI Spec, which is actually the basis of our development. Due to your comment I think PCI Hardware and Software will be right for us. Thanks for the comprehensive answer. Matthias Kevin Brace wrote: > > One document I think you are forgetting to obtain is PCI Local > Bus Specification Revision 2.2. > You can purchase this specification from http://www.pcisig.com. > The reason I think this document is more important than the other three > books you mentioned is because Appendix B of PCI Specification have > sample state machines of a target interface and a master (initiator) > interface. > Although, I added a few more states in my PCI IP core design, my PCI IP > core's state machine for target and master pretty much resembles the > sample state machines of Appendix B. > Regarding the three books you mentioned, I own > PCI System Architecture 4th Edition (Costs $39.95 ISBN 0-201-30974-2) > and PCI Hardware and Software 4th Edition (Costs about $100 ISBN > 092939259-0). > The problem with both books I think is that they are pretty much bus > protocol books, and doesn't discuss how a PCI IP core should be designed > (Neither books have even a single state machine diagram in them.), and > how the designer should design it to meet timings (Setup time (Tsu < 7ns > for 33MHz PCI and Tsu < 3ns for 66MHz PCI) is the hardest part to meet). > So, if what you want from those two books is how you should design a PCI > IP core, it won't be there. > As a bus protocol book, I will say that PCI System Architecture 4th > Edition is more for a beginner, and it is fairly easy to read, so it is > not bad having it, but the PCI specification is also well written, and > easy to read, so one may have to ask itself, "Do I really have to have > PCI System Architecture 4th Edition?" > PCI Hardware and Software 4th Edition is for an experienced designer, > and it contains a lot of details, but I feel like because it contains > too much details, it is hard to read. > Another thing I don't like about it will be the diagrams shown in the > book which looks ugly compared to those of PCI Specification or PCI > System Architecture 4th Edition. > Maybe PCI and PCI-X Hardware and Software 5th Edition is better, but I > won't count on it. > To tell you the truth, when I developed (Although it is still not done > yet.) my PCI IP core, I pretty much relied on PCI Specification for > making design decisions, and rarely referenced PCI System Architecture > 4th Edition or PCI Hardware and Software 4th Edition. > So, I will say that PCI System Architecture 4th Edition and PCI Hardware > and Software 4th Edition are nice to have them, it is not absolutely > necessary to have it like the PCI Specification. > Regarding the more important part of how to implement a PCI IP > core, first thing you will like to do is to download copies of Xilinx > LogiCORE PCI Design Guide from Xilinx, Altera PCI MegaCore Function User > Guide from Altera, and Synopsys DWPCI (a PCI IP core for ASICs) Data > Book from Synopsys. > That should give you a picture of what a PCI IP core is like, and what > the backend user interface is like. > However, those documents won't discuss you how they designed the inside, > so you will have to figure out that yourself (Otherwise, no one will pay > several thousand of dollars for a license.). 
> From my experience of designing a vendor independent PCI IP > core, and implementing it in Xilinx Spartan-II XC2S150-5, the hardest > part of the design was meeting the setup time of Tsu < 7ns for 33MHz > PCI. > Meeting Clock-to-Output Valid (Tval) of Tval < 11ns for 33MHz PCI and > Tval < 6ns should be easy assuming that you know how to constrain FFs > within the IO pads (IOB FFs in Xilinx and IOE FFs in Altera.). > The reason meeting the setup time is so hard is because in PCI there are > cases where the registered version of control signals cannot be used > like when doing a no-wait cycle burst transfer (in initiator mode, > unregistered version of DEVSEL#, TRDY#, and STOP# has to be monitored, > and in target mode, unregistered version of FRAME# and IRDY# has to be > monitored.), or when asking for a disconnect in target mode > (unregistered version of FRAME# has to be monitored in this case), and > instead unregistered version has to be used. > These unregistered signals have to go through multiple levels of LUTs > before reaching a FF, and to meet the setup time, you cannot go through > too many levels of LUT. > Assuming that you use Spartan-II-5, and these control signals are close > to each other, going through 3 levels of LUT should still meet Tsu < 7ns > with automatic P&R. > If you don't mind doing manual floorplanning, 4 levels of LUT should > still meet Tsu < 7ns. > Another difficult part of a PCI IP design is how to update the > output port of AD[31:0] during a target read or an initiator write. > IRDY# in target mode, and TRDY# in initiator mode will influence the > output port of AD[31:0], but the problem here is that unlike the above > case of unregistered input control signals to a FF of output control > signals (in target mode, unregistered FRAME# and IRDY# to DEVSEL#, > TRDY#, and STOP#'s output port, and in initiator mode, unregistered > DELSEL#, TRDY#, and STOP# to FRAME# and IRDY#'s output port.), the > routing distance will likely be large, so the levels of LUT will have to > be even less than the above (Should be at most 2 levels of LUT.). > Use Clock Enable (CE) input of a FF to keep the levels of LUT low. > If you are using Xilinx devices like Spartan-II, you should use Xilinx's > infamous and secret PCILOGIC. > Only Virtex/Virtex-E/Spartan-II/Spartan-IIE (Virtex-II doesn't support > it) support PCILOGIC. > Basically, PCILOGIC has bunch of NAND gates that generates CE signals > for IOB FFs, and I guess the benefit of using it is that it supposedly > allows predictable timings. > To keep your code generic for ASIC porting or in use with other FPGAs > (i.e., Altera FPGAs), you should try to "emulate" this PCILOGIC with > regular LUTs. > One 5-input LUT, or three 4-input LUTs (will have 2 levels of 4-input > LUT) can emulate PCILOGIC. > If you are interested, I can post a sample Verilog code that works with > ISE WebPACK 4.1. > Parity generation is another issue during a target read cycle > because unregistered C/BE#[3:0] has to be through some kind of a parity > generator in Tsu < 7ns. > The trick is to compute the parity of AD[31:0] that is getting read > ahead, since there is 30ns to do so, and merge that with C/BE#[3:0] to > compute the final parity that goes out to PAR. > Virtex's 5-input LUT can handle this nicely, but I haven't been able to > instruct XST (ISE WebPACK's synthesis tool) to do so even if KEEP > attribute is used. 
> Using a carry-chain parity generator might be better than using a > combinational parity generator, but I haven't figured out a way to > instruct XST to infer that. > I recommend using Address/Data Stepping which, of course, > reduces bus utilization (performance) in initiator mode because GNT# has > to be asserted at least for 2 cycles (Won't be able to start a > transaction if GNT# is asserted for only one cycle. AD[31:0] and > C/BE#[3:0] has to be turned off immediately.), but will help you meet > Tsu for OE (Output Enable) FFs. > Xilinx and Altera uses this technique, presumably to meet Tsu < 3ns for > 66MHz PCI. > The bottom line is that if the logic is not designed carefully, > even a relatively new device like Xilinx Spartan-II won't meet even > 33MHz PCI's setup time. > To meet the setup time, you will need to have good understanding of > target architecture (Like what the delays of various internal resources > like LUT is like.). > Since I got mine to meet 33MHz PCI timings comfortably with Verilog, I > don't believe there is any necessity to use Schematics, but you should > never be hesitant to do manual floorplanning because automatic P&R tool > just doesn't know the correct location of where the LUT should be > placed. > Human brain can do a better job than what software can. > > Kevin Brace (Don't respond to me directly, respond within the > newsgroup.) > > Matthias Scheerer wrote: > > > > Hi there, > > > > to see a posting about PCI books was very interesting for me, because we > > too have to decide, what PCI book to buy. We are currently developing > > FPGA/ASIC logic to connect to PCI and also software drivers (linux by > > now). We now have to decide whether to buy > > "PCI System Architecture, 4th Ed. and PCI-X System Architecture" (both) > > or > > "PCI and PCI-X Hardware and Software, 5th Ed." > > > > Any comments on those (three) books ? > > > > Thanks. > > Matthias -- Matthias Scheerer (mailto:scheerer@uni-mannheim.de) University of Mannheim - Computer Architecture Group 68161 Mannheim - GERMANY (http://mufasa.informatik.uni-mannheim.de) Phone: +49 621 181 2721 Fax: +49 621 181 2713Article: 40167
Yes, i've been using it:) I still got a minor glitch:( Girl wrote: > > Quartus II v2.0 Web Edition is now on the altera web page and it supports ACEX 1K. > > http://www.altera.com/products/software/sfw-quarwebmain.html > > rickman <spamgoeshere4@yahoo.com> wrote in message news:<3C6F58FB.1E11DA9B@yahoo.com>... > > I see that this question has met with no reply after two weeks. I guess > > that is the answer... > > > > > > > > Russell Shaw wrote: > > > > > > Guy Schlacter wrote: > > > > > > > > QuartusII v2.0 just released Friday and has been producing very good > > > > results for both this new family and ApexII. > > > > > > When is Quartus web edition going to include Acex 1k devices?Article: 40168
Hi my experience with DC vs FC2 (fpga compiler II from Synopsys) is that the results in terms of speed and area from dc are very poor compared to fc2. This is probably true for other FPGA specific synthesizers like sinplicity, ... Check with Synopsys if a dc license enables you to use fc2. Unfortunately fc2 uses other commands than dc, e.g. set_multicycle_path is not available in fc2. I was told by Synopsys that there is a customer demand for reintegrating fc2 into dc again so you can keep all your constraints and synthesis scripts and just have to change the libs. I don't know if or when this will happen. HTH Ansgar -- Attention reply address is invalid. Please remove _xxx_ Jay <kayrock66@yahoo.com> schrieb in im Newsbeitrag: d049f91b.0202281030.206aeb6b@posting.google.com... > I'm trying to pitch that my client use Synopsys Design Compiler > instead of an FPGA specific synthesizer from another vendor since his > Xilinx Vertex 2 FPGA is a proto for a standard cell part. The clock > speed isn't important, verification of the tool flow and design > database is. > > The problem I'm running into is that the Design Compiler output uses > almost 200% the LUTs compared to the purpose built FPGA synthesizer. > So the logic will no longer fit the proto board. > > Mini Example: > Design compiler: 1760 LUTS > FPGA synthesizer: 824 LUTS > > Design compiler synthesizes to cells like AND2, OR2, AND4, etc whereas > the FPGA specific tool maps directly to special LUTs custom made for > the logic required like LUT_AB5A and LUT_67FE, etc. Now I figured the > Xilinx mapper would be smart enough to "map" the Design Compiler AND2, > OR2, etc, into more compact LUT_ABCD and LUT_6534 type cells but just > seems to be doing a 1 for one map with no optimization. > > It appears that Xilinx did not write the mapper optimization (option > -oe) for the recent products Vertex E/2 an Spartan 2 in effect giving > up support for Design Compiler. > > Can any one else comment on this? It seems crazy that I can't use the > old man of sythesis (Design Compiler) at $100k seat anymore. > > BTW- Altera DOES still do map optimization on Design Compiler EDIF > files.Article: 40169
alw@al-williams.com (Al Williams) writes:

> True, but what if you want to get tricky and have one flip flop latch
> on the rising edge and another latch on the falling edge? This seems
> to work, but it does whine about the clock not being global.
>

It will then not be global.

> I haven't tried it, but I was wondering if you could invert the clock
> and then feed it through a global buffer (assuming you haven't used
> all the global buffers). Don't know if that'd work or not.
>

If you have global buffers (which there aren't on the MAX3000, just a global clock pin IIRC) you can do that, but even with a 50-50 clock you can't guarantee where in the cycle the falling edge is, as you've gone through the routing/inverter/routing/global delays.

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt, Solihull, UK
http://www.trw.com/conekt
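At the HDL level the "one flip-flop on each edge" idea is just two registers with opposite active edges on the same clock signal, as in this small VHDL sketch (entity and signal names are made up; whether the device implements the falling-edge register on the global clock net or via a product-term clock is exactly the issue discussed above):

library ieee;
use ieee.std_logic_1164.all;

entity both_edges is
  port (
    clk    : in  std_logic;
    d      : in  std_logic;
    q_rise : out std_logic;
    q_fall : out std_logic
  );
end entity;

architecture rtl of both_edges is
begin
  -- Captured on the rising edge of clk.
  process (clk)
  begin
    if rising_edge(clk) then
      q_rise <= d;
    end if;
  end process;

  -- Captured on the falling edge of the same clock signal; no explicit
  -- inverter is described, though the fitter may still implement one.
  process (clk)
  begin
    if falling_edge(clk) then
      q_fall <= d;
    end if;
  end process;
end architecture;

Article: 40170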
And the routing resources (and speed) steadily increase as the generations advance. For some applications this can be decisive. Which sadly means that the only certain guide for your design is a trial implementation. Ray Andraka wrote: > Nope, different marketing gates. The virtexII has a higher concentration > of memory, plus the multipliers contribute to the gate count. You will > need to compare the number of slices, which will leave you somewhat less > than 3:1 when comparing 2V6000 to 1000E. Also, it is worth noting that > the cost is more or less exponential with device size. For a given number > of slices, rather than a given number of marketing gates (and ignoring the > other goodies) the VIrtexE is very competitive. > > king wrote: > > > Hi all, > > I have a design which uses say X no of XCV1000E FPGAs. I wud like to > > go for denser FPGAs ( XC2V6000). The total system gates in XCV1000E is > > approximately 1.5 Million while in XC2V6000 is 6 Million. So can I > > assume that the logic implemented in four (6/1.5) FPGAs can be > > implemented using a single XC2V6000 FPGAs? But the LUTs of the two > > looks different. Will this affect the beforesaid ratio? Or is there > > any other decisive factors involved? Ur reply will be most welcom > > with kind regs > > king > > -- > --Ray Andraka, P.E. > President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email ray@andraka.com > http://www.andraka.com > > "They that give up essential liberty to obtain a little > temporary safety deserve neither liberty nor safety." > -Benjamin Franklin, 1759 > >Article: 40171
You could use DC to compile to a Verilog netlist, then an FPGA-specific compiler to reprocess the Verilog netlist into EDIF. This would add another tool to your chain, but since you are using Xilinx 'map' anyway, you are not really adding any more uncertainty. Whether the FPGA-specific optimisations would survive all this... Jay wrote > I'm trying to pitch that my client use Synopsys Design Compiler > instead of an FPGA specific synthesizer from another vendor since his > Xilinx Vertex 2 FPGA is a proto for a standard cell part. The clock > speed isn't important, verification of the tool flow and design > database is. > > The problem I'm running into is that the Design Compiler output uses > almost 200% the LUTs compared to the purpose built FPGA synthesizer. > So the logic will no longer fit the proto board. > > Mini Example: > Design compiler: 1760 LUTS > FPGA synthesizer: 824 LUTS > > Design compiler synthesizes to cells like AND2, OR2, AND4, etc whereas > the FPGA specific tool maps directly to special LUTs custom made for > the logic required like LUT_AB5A and LUT_67FE, etc. Now I figured the > Xilinx mapper would be smart enough to "map" the Design Compiler AND2, > OR2, etc, into more compact LUT_ABCD and LUT_6534 type cells but just > seems to be doing a 1 for one map with no optimization. > > It appears that Xilinx did not write the mapper optimization (option > -oe) for the recent products Vertex E/2 an Spartan 2 in effect giving > up support for Design Compiler. > > Can any one else comment on this? It seems crazy that I can't use the > old man of sythesis (Design Compiler) at $100k seat anymore. > > BTW- Altera DOES still do map optimization on Design Compiler EDIF > files.Article: 40172
Phil, Your solution did indeed solve the hanging problem - thanks very much for the code! The design now runs for 28 seconds instead of 3 for some other reason but I am sure I can track this down now I can make changes to my VHDL without random state hangs happening! Thanks again, KenArticle: 40173
Phil,

Forgot to mention that the DAC is fed the 425kHz clock via a pin of my Spartan-II. The DAC uses the negative edges of the 425kHz clock to take data from process C. Process C uses the positive edges of the 425kHz clock to place data on the serial data pin for the DAC, and the negative edges then drive the DAC. My DAC output now seems to be irregular - what signal should I use to clock my DAC now?

Thanks again,
Ken

> > Process A: Runs at 33MHz to fill a 16 location FIFO with 8-bit data
> > samples and then keep it full.
> > Process B: Runs at 33MHz to take data from the FIFO when told to and
> > supply it to process C via a register.
> > Process C: Runs at 425kHz and sends FIFO 8-bit data samples to a DAC
> > bit-serially then asks process B for next piece of data.
> >
> > Process A is fed with data via a Visual C++ app (the Spartan-II is mounted
> > on a PCI board) which is synchronised with the FPGA using an interrupt pin
> > that the FPGA can assert and the C++ can read.
> >
> > I have used this system for many designs with no trouble (none of them had
> > multiple clock domains however). The problem here is that the design is
> > getting stuck in state for no apparent reason! (i.e. the C++ hangs waiting
> > for the interrupt pin!).
>
> Sounds to me like you have a problem crossing clock domains.
>
> > The system works for a random number of samples (between 28 and 33 it seems)
> > and then gets stuck in state. This is very strange because it means that my
> > protocols do work.
>
> Not strange at all. Suppose we have two registers in one clock domain sampling
> a signal from the other clock domain. If both get the same value for the next
> clock, the logic works. If only one gets the value, the logic hangs.
>
> > The weird thing is that I put a piece of debug code in another state to send
> > a signal out to a pin to probe, I ran the flow again to get a bitstream and
> > the system ran perfectly for all 75001 samples I am using! The debug code
> > was "Debug <= '1'".
>
> Different placement, different routing, different timing, different odds of
> failure. Might work well at 25C, and fail like above at 28C.
>
> > Then I enabled clock DLLs using the BUFGDLL component and it hangs again!
>
> Different timing, different odds of failure.
>
> > Previously, I had it working perfectly using the clock DLLs but without a
> > FIFO (i.e. 1 sample at a time from C++ to FPGA to DAC) but I got some
> > stutters hence I introduced the FIFO.
> >
> > In that design I also had hanging problems but after I rejigged my protocol
> > in my VHDL state machines it worked perfectly.
> >
> > It seems that seemingly random changes of VHDL make or break the system. I
> > guess it must be to do with my 2 different clock rates but that is the way
> > it has to be.
> >
> > I am at a loss - anyone any ideas?
>
> First, sync the 425KHz to 33 MHz, then edge detect, and then use the edge
> detected clock 425 for a clock enable for process C. Code fragments:
>
> process(clk33) begin
>   if rising_edge(clk33) then
>     synslow <= clk425;
>     synslow2 <= synslow;
>     synslow3 <= synslow2;
>     en425 <= synslow2 and not synslow3;
>   end if;
> end process;
>
> processc:
> process(clk33) begin -- was (clk425)
>   if rising_edge(clk33) then
>     if en425 = '1' then -- was rising_edge(clk425)
>       ....
>
> The reason this (hopefully!) will solve your problem is that almost all logic
> will be running on a single clock.
> While there is a chance that synslow will not correctly clock in clk425 on
> rising or falling edge ("go metastable"), synslow2 is much less likely to fail
> (as mean time between failures >> age of universe), and en425 even less so.
>
> Also, you could synchronize all control signals between the two processes.
> More complex.
>
> --
> Phil Hays
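One possible follow-through on Phil's suggestion, sketched purely as an illustration (none of this comes from the thread, and the names are invented): keep process C entirely in the 33 MHz domain and update the DAC serial-data pin only when en425 fires. Since en425 arrives a few 33 MHz cycles after each rising edge of the 425 kHz clock, the data has settled long before the next falling edge, so the DAC could still be clocked from the original 425 kHz pin:

library ieee;
use ieee.std_logic_1164.all;

entity dac_serial is
  port (
    clk33    : in  std_logic;
    en425    : in  std_logic;                     -- edge-detected 425 kHz enable (Phil's en425)
    load     : in  std_logic;                     -- start of a new 8-bit sample
    sample   : in  std_logic_vector(7 downto 0);  -- from the register fed by process B
    dac_data : out std_logic                      -- serial data pin to the DAC
  );
end entity;

architecture rtl of dac_serial is
  signal shreg : std_logic_vector(7 downto 0) := (others => '0');
begin
  process (clk33)
  begin
    if rising_edge(clk33) then
      if en425 = '1' then                          -- one action per 425 kHz period
        if load = '1' then
          dac_data <= sample(7);                   -- first (MSB) bit of the new sample
          shreg    <= sample(6 downto 0) & '0';    -- keep the remaining bits
        else
          dac_data <= shreg(7);                    -- next bit, registered output to the pin
          shreg    <= shreg(6 downto 0) & '0';
        end if;
      end if;
    end if;
  end process;
end architecture;

Article: 40174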
Hi, there were some "counter discussions" in this newsgroup, however I wonder if it might be possible to - divide down a couple of LVPECL clocks running at 622 MHz to a few MHz or kHz and - build a programmable LVPECL divider that can either pass 155 MHz and drive it off-chip as LVDS clock or divide down a 622 MHz LVPECL clock to 155 MHz and again drive it as LVDS clock to an another chip on board I requirement would be, that the FPGA is small in respect to the board space - the small Virtex-II 50/80 (BGA256) devices looking very nice. Might there be a way to do that? Bye Thomas
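The divider logic itself is tiny, as the hedged VHDL sketch below shows (the names are invented). Whether the input flip-flops, clock network and LVDS outputs of a small Virtex-II actually keep up at 622 MHz is the real question raised above and is not settled by this sketch; a production version would also put the final selection into a DDR output register or a dedicated clock path rather than a fabric mux, which as written can glitch when sel_div changes. Dividing down to a few MHz or kHz is the same idea with a wider counter.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity clk_div is
  port (
    clk_in  : in  std_logic;   -- 622 MHz (or 155 MHz) clock from the LVPECL input buffer
    sel_div : in  std_logic;   -- '1' = divide by 4, '0' = pass through
    clk_out : out std_logic    -- to the LVDS output buffer
  );
end entity;

architecture rtl of clk_div is
  signal cnt : unsigned(1 downto 0) := "00";
begin
  process (clk_in)
  begin
    if rising_edge(clk_in) then
      cnt <= cnt + 1;          -- cnt(0) = clk_in/2, cnt(1) = clk_in/4
    end if;
  end process;

  -- Fabric mux shown only for illustration; see the caveat above.
  clk_out <= cnt(1) when sel_div = '1' else clk_in;
end architecture;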