Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Jay, I think that the link you gave is to a Xilinx implementation that would be suitable for elliptic curves over the finite field F2m, rather than elliptic curves over the finite field Fp. I think I mentioned that F2k would be a somewhat kinder, gentler problem for FPGAs, but they could rip solutions out for Fp as well.

Here's the Certicom challenge you linked to (i.e. ECCP-109): http://www.certicom.com/research/ch42.html

Here's the one that the article would be good for solving (i.e. ECC2-k): http://www.certicom.com/research/ch4.html

By the way, here's a link to the C code that's required for the ECCP problems. It looks to me like it would be easily put into an FPGA, and that it wouldn't take up a lot of bandwidth on the PCI interface if done correctly. (I'd try to put all the math algorithms on, and only write control to the PCI, and read back results on the PCI interface that were "distinctive".)

http://www.nd.edu/~cmonico/eccp109/downloads/eccp109-130-3.tar.gz

Carl
--
Posted from firewall.terabeam.com [216.137.15.2] via Mailgate.ORG Server - http://www.Mailgate.ORG
Article: 37601
I want to implement a division operation in an FPGA, e.g. divide by 64. How do I do it?
Article: 37602
Hello, Jason.

The SPI-4 IP cores are available from both Altera and Xilinx. In our project, I use the Altera one. It can be downloaded freely from the Altera site for evaluation:
http://www.altera.com/products/ip/altera/m-alt-posphy4.html

It does not include much VHDL code, mostly encrypted netlists. But there is a 'USER GUIDE' which gives some information on how it is implemented. It could be useful.

Alex.
Article: 37603
By the way, how do I implement it if it uses addition, subtraction, multiplication, and division?
Article: 37604
Hi Carl,

You da man! I was thinking about using a BlockRAM to do squaring in a project of mine, more for the latency issues than device usage in my particular case, and was left with one 'regular' multiply. I like your suggestion. I'll try it out in the new year and get back to the group with any issues it raises, or successes. It seems so simple... I guess I don't need to feel so bad about only having Virtex hardware to play with, and no hardware multipliers ;-)

As an aside, thanks to the posters who answered my previous article about modifying blockram contents in a bitstream. I haven't got round to trying anything yet, but I will do...

Regards,
Chris Saunter

Carl Brannen (carl.brannen@terabeam.com) wrote:
: I'm not sure if people are doing this already, but I couldn't find a reference
: on the Xilinx web site.
:
: Block RAMs make more efficient squaring circuits than they do multipliers. And
: you can get multipliers out of squarers.
:
: An explanation for the arithmetic. Let A and B be the numbers to be
: multiplied.
:
: Compute C = (A+B) * (A+B) = A**2 + 2AB + B**2
: Compute D = (A-B) * (A-B) = A**2 - 2AB + B**2
: Then C - D = 4AB.
:
: This is particularly efficient in Xilinx Spartan2, Virtex, and Virtex2
: architectures because the block RAM is dual port. That means you can use one
: side for the (A-B)**2 calculation and the other side for the (A+B)**2
: calculation.
:
: With the Xilinx Spartan2, Virtex or VirtexE, use the RAMB4_S16_S16. It has 8
: inputs and 16 outputs in two sections. Each section can conveniently compute
: the square of an 8-bit number. Note that the lowest two bits of the two
: squares are going to have to be equal (i.e. C-D = 4AB, so C and D have to match
: two bits), so you don't have to subtract bits 1 and 0 of the two squares.
:
: If "A" and "B" are both 7-bit, their sum will be no worse than 8-bit, so you
: can compute a 7x7 multiply using only the 8 LUTs for each of "A+B" and "A-B",
: and another 14 LUTs for the result, a total of 30 LUTs (i.e. 15 slices) and one
: block RAM. Maybe there's a way to get the bit back, and let A and B be 8-bit
: numbers; I haven't looked at it long enough to conclude there isn't.
:
: The circuit uses about half the LUTs required by the standard algorithm, at an
: expense of one block RAM.
:
: To put the LUT utilization in perspective, the Xilinx 8x8 multiply takes 39
: slices:
: http://www.xilinx.com/ipcenter/reference_designs/vmult/vmult_v1_4.pdf
:
: Using RAMB4s alone to implement even a 7x7 multiply would require a huge number
: of them, as multiplies require twice as many address inputs as squares.
:
: You can iterate on the calculation of the square. That is, if A is too big to
: square in a single operation, then break A into two parts. With A broken into
: two parts, say A = AH + AL, you can compute AH**2, AL**2 with block RAM, and
: compute 2*AH*AL by computing the difference between (AH+AL)**2 and (AH-AL)**2.
:
: Breaking A and B into more than 3 parts may be worth exploring, for certain bit
: sizes.
:
: Carl
: --
: Posted from firewall.terabeam.com [216.137.15.2]
: via Mailgate.ORG Server - http://www.Mailgate.ORG
Article: 37605
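The quarter-square arithmetic in Carl Brannen's quoted post (C - D = 4AB) can be checked with a small software model. The sketch below is not from the thread. It is a Python model in which a 256-entry list stands in for one port of the block RAM, and the names SQUARES and multiply_7x7 are made up for illustration. Addressing the second lookup with |A-B| is an assumption about how a negative difference would be handled in hardware.

```python
# Software model of the block-RAM squaring trick: A*B = ((A+B)**2 - (A-B)**2) / 4.
# SQUARES plays the role of a 256 x 16 lookup table (8 address bits, 16 data bits).
SQUARES = [n * n for n in range(256)]


def multiply_7x7(a, b):
    """Multiply two 7-bit unsigned numbers using only table lookups and adds."""
    if not (0 <= a < 128 and 0 <= b < 128):
        raise ValueError("operands must be 7-bit unsigned values")
    c = SQUARES[a + b]        # (A+B)**2; the sum is at most 254, so 8 address bits suffice
    d = SQUARES[abs(a - b)]   # (A-B)**2; the square does not depend on the sign
    diff = c - d              # equals 4*A*B, so its two low bits are always zero
    return diff >> 2          # dividing by 4 is just dropping those two bits
```

Because the two low bits of the difference are always zero, a hardware subtractor would only need to cover the upper 14 bits, which matches the 14-LUT figure in the post.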
On Sun, 16 Dec 2001 21:09:17 -0800, "Jay Berg" <admin@eCompute.org> wrote:
>Let me see if I can explain this. But given that I'm not a math expert, bear
>with me.
>
>Modulo math (also known as "clock arithmetic") can be thought of as using
>remainders. Imagine the following numbers.
>
> ....
>
>Since the need is for 128-bit multiplication (128x128=256), the result of
>the multiplication can be 256-bits in size. Following the multiplication,
>the 256-bit result is reduced by the modulus value N. This translates the
>result into a number between 0 and (N-1). With the assumption that N is
>128-bits (or less), the final result of the modulo multiplication will be
>128-bits (or smaller).

So, is the N you want to use a
    variable ( 1 .. (2^128)-1 )
    constant ( 1 .. (2^128)-1 )

or easiest of all, the number (2^128)-1

Philip Freidin
Fliptronics
Article: 37606
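One reason the modulus (2^128)-1 would be the easiest case in Philip Freidin's list: 2^128 is congruent to 1 modulo (2^128)-1, so a 256-bit product can be reduced by adding its upper and lower 128-bit halves, with no division. The post itself does not spell this out. The Python sketch below is only an illustration of that identity, and the function name is hypothetical.

```python
MASK128 = (1 << 128) - 1  # the modulus N = 2**128 - 1


def reduce_mod_2_128_minus_1(x):
    """Reduce a non-negative integer modulo 2**128 - 1 by folding 128-bit halves."""
    if x < 0:
        raise ValueError("this sketch only handles non-negative values")
    while x > MASK128:
        # x = hi * 2**128 + lo, and 2**128 is congruent to 1, so x is congruent to hi + lo
        x = (x & MASK128) + (x >> 128)
    # The all-ones pattern is congruent to zero
    return 0 if x == MASK128 else x
```

In hardware terms the fold is one 128-bit addition with an end-around carry, which is why a fixed modulus of this shape is much cheaper than an arbitrary constant or variable N.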
I use the following code to initialize the block RAM of a Spartan-II FPGA, following the Xilinx handbook:

module MYMEM(clk, we, addr, din, dout);
  input clk, we;
  input [7:0] din;
  output [7:0] dout;
  input [8:0] addr;
  wire logic0, logic1;
  assign logic0 = 1'b0;
  assign logic1 = 1'b1;
  RAMB4_S8 ram0(.WE(we), .EN(logic1), .RST(logic0), .CLK(clk), .ADDR(addr), .DI(din), .DO(dout));
  // synopsys translate_off
  defparam ram0.INIT_00 = 256'h0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF;
  defparam ram0.INIT_01 = 256'hFEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210;
  // synopsys translate_on
endmodule

But when I implement it using Xilinx Foundation 3.1i, it reports that the INIT_xx attributes, including INIT_00 and INIT_01, are not initialized. I use Synopsys FPGA Express for synthesis. Who can tell me how to initialize the block RAM? Thank you!
Article: 37607
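The 64-digit INIT_xx strings in the block RAM question are tedious to write by hand. As a hedged aid, the Python sketch below packs a byte image into such strings for a byte-wide, 512-deep memory like the RAMB4_S8 in the post. The helper name init_strings is hypothetical, and the ordering (lowest address in the right-most hex digits of each string) is an assumption that should be checked against the Xilinx libraries guide before use. The sketch does not address why the tools ignored the attributes.

```python
def init_strings(data, bytes_per_init=32, depth=512):
    """Return {"INIT_00": "...", ...} hex strings for a byte-wide block RAM image.

    Assumption: within each 256-bit string the lowest address sits in the
    right-most two hex digits.
    """
    if len(data) > depth:
        raise ValueError("image does not fit in the memory")
    if any(not 0 <= value <= 0xFF for value in data):
        raise ValueError("every entry must be a byte")
    strings = {}
    for start in range(0, len(data), bytes_per_init):
        chunk = list(data[start:start + bytes_per_init])
        chunk += [0] * (bytes_per_init - len(chunk))       # pad a short final row with zeros
        hex_text = "".join("%02X" % value for value in reversed(chunk))
        strings["INIT_%02X" % (start // bytes_per_init)] = hex_text
    return strings
```

Under that ordering assumption, the INIT_00 value from the post corresponds to address 0 holding 0xEF, address 1 holding 0xCD, and so on.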
Mardin wrote:
> I want to implement a division operation in an FPGA, e.g. divide by 64.
> How do I do it?

Very simple: Just shift the binary number six positions beyond the LSB.

Peter Alfke, Xilinx Applications
Article: 37608
If you just want to divide by 64, shift right by 6 places. The modulo is what was shifted out. To do the others, I recommend reading HDL Chip Design, by Douglas J. Smith. Chapter 9 has a good number of examples on how to implement various arithmetic functions, and the pros and cons of each.

next wrote:
> By the way, how do I implement it if it uses addition, subtraction, multiplication, and division?
Article: 37609
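The divide-by-64 advice in the replies above (shift right by six, and the remainder is what was shifted out) can be stated as a two-line software model. This is only a sketch for unsigned values, and the function name is made up for illustration.

```python
def divide_by_64(value):
    """Return (quotient, remainder) of an unsigned value divided by 64."""
    if value < 0:
        raise ValueError("this sketch only models unsigned values")
    quotient = value >> 6       # drop the six least significant bits
    remainder = value & 0x3F    # the six bits that were shifted out
    return quotient, remainder
```

In an FPGA neither step costs any logic: both are just a choice of which wires of the input bus to use.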
The current SW does not lend itself well to parallel processing at the level we're talking about. The results of the first series of operations provide the inputs used for the next series. Block operations would be quite difficult to achieve without redoing the entire SW application. I'll start thinking about how to rework the method into a process that would allow working on multiple results in parallel.

Jay

"Steven Derrien" <sderrien@irisa.fr> wrote in message news:3C1DBD59.ECD96E70@irisa.fr...
>
> Hello,
>
> On a typical PCI FPGA board, it is likely that your performance is limited by the
> PCI bandwidth rather than by the FPGA processing power. Assuming N fixed, you
> need 3*128 bits (96 bytes, 2Wr 1Rd) I/O per iteration.
>
> If you use PCI MMAP IOs, you will hardly get more than 15MBytes/sec between the
> host and the board. This poses a bound on your achievable performance (15/96*10^6
> Mul operation per second ~ 150 000 Mul/sec), which will be less than what you
> get by software.
>
> If your algorithm has no data dependencies between the different multiplication
> results (which I doubt), you could use blocked I/O (or DMA) operations, and
> maybe reach 60-80Mbytes/sec, but even then, you would not get more than 1
> million multiplications per second.
>
> The only solution would be to implement a larger part of the algorithm (like a
> whole loop nest) on the FPGA board, which is much more difficult (unless your
> algorithm is very regular, and requires little control) but this generally
> reduces the amount of I/O operations on the PCI bus.
>
> Steven
Article: 37610
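The bandwidth bound in Steven Derrien's quoted estimate is a single division, sketched below in Python. The helper name is hypothetical, and "MBytes" is assumed to mean 10^6 bytes. The post uses 96 bytes per iteration. Three 128-bit words are 48 bytes, so the byte count is left as a parameter rather than fixed to either figure.

```python
def max_ops_per_second(bus_bytes_per_second, bytes_per_operation):
    """Upper bound on operations per second when every operation must cross the bus."""
    if bytes_per_operation <= 0:
        raise ValueError("bytes_per_operation must be positive")
    return bus_bytes_per_second / bytes_per_operation


# Figures as posted: 96 bytes per multiply, 15 MB/s memory-mapped, 60-80 MB/s with DMA.
mmap_bound = max_ops_per_second(15_000_000, 96)
dma_bound = max_ops_per_second(80_000_000, 96)

# Same bus rates if only three 128-bit words (48 bytes) cross the bus per multiply.
bytes_for_three_words = 3 * 128 // 8
mmap_bound_48 = max_ops_per_second(15_000_000, bytes_for_three_words)
```

Either way the bound stays near or below the 190,000 iterations per second that the software client is reported to reach elsewhere in the thread, once each iteration's 8 to 11 operations are counted, which supports the conclusion that the whole loop has to move onto the board.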
I recently came across a 'Pilchard', an FPGA prototype system built on an SDRAM module, which when used in a Linux host allows very fast comms between the PC and a cryptoprocessor etc. This project was headed by Prof. Philip Leong at the Chinese University of Hong Kong. They used a Virtex 300 chip, in QFP format, which I believe is pin compatible with larger chips (although if you used a Virtex 800 on there, you might want to run a separate power line to the card...)

I have nothing to do with this project, other than I saw it and thought 'that's sweet'... ;-)

Try a google search for: pilchard fpga

There is a .pdf available but I can't find the address right now... (It's linked to from slashdot.................)

Best of luck with the project!

Cheers
Chris Saunter

Jay Berg (admin@eCompute.org) wrote:
: After making the mistake of getting involved in the current ECCp109
: distributed computing project (see URL below), I'm now casting around to
: determine if there's a possibility of finding a PCI board with an FPGA
: co-processor capable of handling a small set of modular math functions.

< big snip>
Article: 37611
<see below for comments>

"Carl Brannen" <carl.brannen@terabeam.com> wrote in message news:382a3bda7feccf0799b4a71c935b2c57.51709@mygate.mailgate.org...
> Jay, I think that the link you gave is to a Xilinx implementation that would be
> suitable for elliptic curves over the finite field F2m, rather than elliptic
> curves over the finite field Fp. I think I mentioned that F2k would be a
> somewhat kinder, gentler problem for FPGAs, but they could rip solutions out
> for Fp as well.
>
> Here's the Certicom challenge you linked to (i.e. ECCP-109) :
> http://www.certicom.com/research/ch42.html
>
> Here's the one that the article would be good for solving (i.e. ECC2-k):
> http://www.certicom.com/research/ch4.html
>
> By the way, here's a link to the C code that's required for the ECCP problems.
> It looks to me like it would be easily put into an FPGA, and that it wouldn't
> take up a lot of bandwidth on the PCI interface if done correctly. (I'd try to
> put all the math algorithms on, and only write control to the PCI, and read
> back results on the PCI interface that were "distinctive".)
>
> http://www.nd.edu/~cmonico/eccp109/downloads/eccp109-130-3.tar.gz
>
> Carl

Yes, the last URL (for the SW download) you provided is for the ECCp109 challenge that I'm speaking of.

There are a total of three paths through the math to achieve results. Each is very slightly different, but they share many characteristics. As you can see, the only steps used are a series of adds, subtracts, and multiplies, with each of the math operations being modulo the same N value. Also note that all functions use a parameter list in the format of:

function (result, inputX, inputX)

And last, each of the three paths produces a single result value, which is checked for being one of the target values.
Path 1 - Total of input parameters needed: 5
    PY
    PX
    op_list[i].y
    op_list[i].x
    needInverting [i<<2]

    submod_p109 (lambda, PY, op_list[i].y);
    mulmod_p109 (lambda, lambda, &needInverting [i << 2]);
    addmod_p109 (temp_ul, op_list[i].x, PX);
    mulmod_p109 (temp2_ul, lambda, lambda);
    submod_p109 (tempx, temp2_ul, temp_ul);
    submod_p109 (temp_ul, op_list[i].x, tempx);
    mulmod_p109 (temp_ul, lambda, temp_ul);
    submod_p109 (res_list[i].y, temp_ul, op_list[i].y);

Path 2 - Total of input parameters needed: 5
    QY
    QX
    op_list[i].y
    op_list[i].x
    needInverting [i<<2]

    submod_p109 (lambda, QY, op_list[i].y);
    mulmod_p109 (lambda, lambda, &needInverting [i << 2]);
    addmod_p109 (temp_ul, op_list[i].x, QX);
    mulmod_p109 (temp2_ul, lambda, lambda);
    submod_p109 (tempx, temp2_ul, temp_ul);
    submod_p109 (temp_ul, op_list[i].x, tempx);
    mulmod_p109 (temp_ul, lambda, temp_ul);
    submod_p109 (res_list[i].y, temp_ul, op_list[i].y);

Path 3 - Total of input parameters needed: 4
    A
    op_list[i].y
    op_list[i].x
    needInverting [i<<2]

    mulmod_p109 (temp_ul, op_list[i].x, op_list[i].x);
    addmod_p109 (temp2_ul, temp_ul, temp_ul);
    addmod_p109 (temp2_ul, temp2_ul, temp_ul);
    addmod_p109 (lambda, temp2_ul, A);
    mulmod_p109 (lambda, lambda, &needInverting [i << 2]);
    mulmod_p109 (temp_ul, lambda, lambda);
    submod_p109 (temp_ul, temp_ul, op_list[i].x);
    submod_p109 (tempx, temp_ul, op_list[i].x);
    submod_p109 (temp_ul, op_list[i].x, tempx);
    mulmod_p109 (temp_ul, lambda, temp_ul);
    submod_p109 (res_list[i].y, temp_ul, op_list[i].y);

If it was possible to put all three paths into firmware, it would be easy enough for the SW to preload the correct parameters and trigger the correct firmware operation.

Is this in line with what you were thinking?
Article: 37612
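The three paths listed in Jay Berg's post can be transcribed step by step into a small software model, which is a convenient reference when checking a firmware version. The Python sketch below is not the client's code. The function names are hypothetical, the modulus is passed in as p, and two things are assumptions: that needInverting[i<<2] already holds the modular inverse of the slope denominator, and that tempx is the x coordinate of the result.

```python
def addmod(a, b, p):
    return (a + b) % p


def submod(a, b, p):
    return (a - b) % p


def mulmod(a, b, p):
    return (a * b) % p


def add_path(fixed_x, fixed_y, x, y, inv, p):
    """Paths 1 and 2: add the fixed point (P or Q) to (x, y).

    inv is assumed to be the inverse of (fixed_x - x) modulo p.
    """
    lam = submod(fixed_y, y, p)
    lam = mulmod(lam, inv, p)
    temp = addmod(x, fixed_x, p)
    temp2 = mulmod(lam, lam, p)
    tempx = submod(temp2, temp, p)
    temp = submod(x, tempx, p)
    temp = mulmod(lam, temp, p)
    return tempx, submod(temp, y, p)


def double_path(a_coeff, x, y, inv, p):
    """Path 3: double (x, y); inv is assumed to be the inverse of 2*y modulo p."""
    temp = mulmod(x, x, p)
    temp2 = addmod(temp, temp, p)
    temp2 = addmod(temp2, temp, p)          # 3 * x**2
    lam = addmod(temp2, a_coeff, p)
    lam = mulmod(lam, inv, p)
    temp = mulmod(lam, lam, p)
    temp = submod(temp, x, p)
    tempx = submod(temp, x, p)
    temp = submod(x, tempx, p)
    temp = mulmod(lam, temp, p)
    return tempx, submod(temp, y, p)
```

On a toy curve such as y^2 = x^3 + 2x + 2 over the integers modulo 17, doubling (5, 1) and then adding (5, 1) to the result gives points that still satisfy the curve equation, which is a quick sanity check for the transcription. For a prime p the inverse can be produced in the same model with pow(d, p - 2, p).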
Dividing by a power of two is easy and can be accomplished by just shifting in the direction of the LSB. In case you want to implement more sophisticated divisions, check out the LPM (Library of Parameterized Modules) in the Altera tools MAX+plus II and Quartus II. It contains a free divide megafunction called LPM_DIVIDE as well as VHDL and Verilog simulation models that go with it.

- Wolfgang
http://www.elca.de

"Mardin" <chens_w@yahoo.com.cn> wrote in message news:ee73c01.-1@WebX.sUN8CHnE...
> I want to implement a division operation in an FPGA, e.g. divide by 64.
> How do I do it?
Article: 37613
The N (modulo) value for the ECCp109 challenge is: 00001BD579792B380B5B521E6D9FB599 As you can see, it does not utilize the full 128-bit domain. As I said elsewhere in the thread, N does not change during operation. But when the current challenge is completed, a new challenge would be launched and the N factor would need to become a new value. But if we are to move the entire equation into firmware rather than discrete math operations, the firmware solution becomes unique to this challenge. As a result, a new design would be needed for later challenges. Be aware that the current challenge was started in April and is currently estimated to be about 14% completed. It is believed that 58 million DPs (Distinguished Points) will need to be found to achieve a solution. As of this morning, there have been 8.6 million DPs found. To give an idea of the complexity of this challenge, I have one 450p2 that has been running the client for the last month or so. It has a total of 459,756,170,368 iterations performed (max of 190,000 iterations per second) and has found a total of 829 DPs. Each iteration consists of approximately 8-11 math operations on 128-bit numbers. Jay Berg jberg@eCompute.org "Philip Freidin" <philip@fliptronics.com> wrote in message news:i01s1u445chatt8sjimupf54m04j8gmp4n@4ax.com... > On Sun, 16 Dec 2001 21:09:17 -0800, "Jay Berg" <admin@eCompute.org> wrote: > >Let me see if I can explain this. But given that I'm not a math expert, bear > >with me. > > > >Modulo math (also known as "clock arithmetic") can be thought of as using > >remainders. Imagine the following numbers. > > > > .... > > > >Since the need is for 128-bit multiplication (128x128=256), the result of > >the multiplication can be 256-bits in size. Following the multiplication, > >the 256-bit result is reduced by the modulus value N. This translates the > >result into a number between 0 and (N-1). 
With the assumption that N is > >128-bits (or less), the final result of the modulo multiplication will be > >128-bits (or smaller). > > So, is the N you want to use a > variable ( 1 .. (2^128)-1 ) > constant( 1 .. (2^128)-1 ) > > or easiest of all, the number (2^128)-1 > > > Philip Freidin > FliptronicsArticle: 37614
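The modular multiply described in the quoted explanation (a 128x128 multiply giving up to 256 bits, then reduction by N) can be modelled directly with the N value from Jay Berg's post. The Python sketch below is only a reference model for checking hardware results, not the client's mulmod_p109 routine, and the function name is hypothetical.

```python
# Modulus as given in the post for the ECCp109 challenge (it occupies 109 bits).
N = 0x00001BD579792B380B5B521E6D9FB599


def mulmod_128(a, b, n=N):
    """Multiply two values of at most 128 bits and reduce the product modulo n."""
    limit = 1 << 128
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in 128 bits")
    product = a * b                       # full-width product, at most 256 bits
    return product % n                    # result lies in the range 0 .. n-1
```

Because N is fixed for the lifetime of a challenge, a hardware design could hard-wire the reduction step for this one constant, at the cost of needing a new design when the modulus changes, as the post notes.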
Hi Jason, look at http://www.disi.unige.it/person/AnconaD/Architettura/vhdl_man/spi_ex.htm Regards Alex "Jason Berringer" <jberringer@trace-logic.com> wrote in message news:<S5bT7.2519$NC5.476993@news20.bellglobal.com>... > Hello again > > I'm curious to know if anyone out there knows where there are some examples > of an SPI interface coded in VHDL. Just curious as I have to code one in the > near future and I always like to compare the various approaches taken by > others. > > Thanks > > JasonArticle: 37615
As usual I'm in the position of trying to shut the stable door when the horse is already 2 counties away and accelerating fast, but ... Has anyone on C.A.F used the ChipScope ILA stuff? Does it work as advertised? Had successes/failures? Does it take up a lot of space per embedded analyser? In short, where does it lie in the spectrum [essential ... helpful ... difficult to use ... waste of time & gates]?
Article: 37616
Hello,

I'm new to this domain of computer science, so I don't know where to look for sites or articles on this specific subject, that is: fast carry chains for FPGAs. I've searched on citeseer.nj.nec.com/cs and found nothing interesting. So, do you have any idea where I can find resources about this subject?

supaman.

PS: I'm mainly interested in articles. I already have two:
"high-performance carry chains for FPGA's" by S. Hauck, M. M. Hosler, and T. W. Fry.
"FPGA adders : performance evaluation and optimal design" by Shanzhen Xing and William W.H. Yu.
Article: 37617
Ray and Adarsh,

XST can infer dual port RAM. Supported templates are documented in the XST User Guide within the Xilinx online documentation. Not every possible variant is supported; please let us know if the configuration you wish to use is not listed. FPGA Express does not infer any RAM.

thanks,
david.

David Dye
Xilinx Boulder

Ray Andraka wrote:
> XST won't infer a dual port RAM, and I don't believe express will either, which
> is why you are getting FF's. Instantiate the RAMB4_S16_S16 directly from the
> unisim library. Any generics have to be made invisible to the synthesizer
> using translate_off/on pragmas (and you'll have to put matching attributes in
> to pass the parameters to the netlist).
>
> It might help for you to post the errors along with snippets of your code.
>
> adarsh wrote:
>
> > we are having a similar problem.
> > we need a Dual Port Ram for our design and were trying to instantiate one
> > of the Block Rams available on the Virtex-E device using Verilog.
> > Initially we just declared the required memory as an array of
> > registers, something like
> >
> > reg [15:0] memory [255:0] ;
> > The synthesizers (XST, FPGAexpress) are not inferring this as a RAM but as
> > FFs
> >
> > Then we tried with the Xilinx Language template RAMB4_S16_S16, this gave a
> > synthesis error.
> > Last we tried with CoreGen, which gives an error when we hit the Generate
> > button.
> >
> > Any suggestions ?
> >
> > adarsh kumar jain,
> >
> > Ray Andraka wrote:
> >
> > > I think he was looking for a low cost or free tool that would infer
> > > one. If your tools do not support RAM inference, then you can always
> > > instantiate the RAM primitive. I usually just instantiate the primitive
> > > because it gives me more portability between tools and more flexibility
> > > in describing what I want (and you don't have to rely on the tool for
> > > doing the right thing, especially in regards to a dual ported memory).
> > >
>
> --
> --Ray Andraka, P.E.
> President, the Andraka Consulting Group, Inc. > 401/884-7930 Fax 401/884-7950 > email ray@andraka.com > http://www.andraka.com > > "They that give up essential liberty to obtain a little > temporary safety deserve neither liberty nor safety." > -Benjamin Franklin, 1759Article: 37618
Has anybody done this? The installer complains about MFC42.DLL and hiccups but continues. When in Project Manager, everything synthesis-related is grayed out. I've got my LM_LICENSE_FILE var pointed to \xilinx\data\license.loadngo. Any help most appreciated. Thanks.

-Dave
Not speaking for my employer, etc.
Article: 37619
Steven,

I too use the parallel port for both FPGA configuration and post-configuration control. I've been using a switchbox (the kind for sharing two printers on one port) and that works great. If that's an option for you, it might be easier than debugging the PCI port. Occasionally, things get in a mode where I get the DONE FAILED TO ASSERT message repeatedly, but cycling power on the BurchEd board seems to fix it.

Nick

Steven Derrien wrote:
>
> Hello,
>
> This might be slightly off-topic, but I guess several people in this NG
> had to face this kind of problem.
> We are using a BurchEd Board, with a parallel port download cable,
> however, because we need to communicate with the board once it is
> configured, we use two parallel ports, the one on the motherboard (for
> communication in EPP mode) and another one connected on a PCI parallel
> port extension board (using netmos 9705 chip) for configuration*.
>
> The PCI // port does not work properly when it comes to configuring the
> FPGA board (I managed to make it work for a week or so, but now for
> mysterious reasons, the FPGA DONE signal does not behave correctly). BTW
> configuration with the motherboard // port works fine.
>
> The general PCI // port behavior is correct (checked by feeding back
> CTRL signal on STATUS), so I really don't understand where this problem
> is coming from. Has anybody faced the same kind of problems ?
>
> * We have no choice since the PCI board does not seem to allow anything
> else than SPP
>
> Thank you for your help,
>
> Steven
Article: 37620
sdfjsd wrote:
>
> Last year, I used Xilinx Foundation Express 3.3i, to develop
> for a Virtex300 part. I recently went to Xilinx's homepage,
> and found that the 'Foundation ISE' has replaced the older
> Foundation (non-ISE.)
>
> Does this mean :
>
> 1) goodbye old Windows 16-bit legacy code
> (3.3i would crash on average, every 4-6 compiles, and
> sometimes take down my NT4 workstation)

What the heck were you doing with your computer that it would cause it to crash? I ran NT 4 SP6 with the Xilinx service packs, and it never crashed the computer.

--a
Article: 37621
Peter Alfke wrote:
>
> Mardin wrote:
>
> > I want to implement a division operation in an FPGA, e.g. divide by 64.
> > How do I do it?
>
> Very simple:
> Just shift the binary number six positions beyond the LSB.

Even simpler: select the bits you want. F'rinstance,

    signal foo : unsigned (15 downto 0);
    signal result : unsigned (9 downto 0);

    result <= foo (15 downto 6);

You could probably use an alias to do the same thing...

OK, so I'll admit that it's not bleedingly obvious that you want to divide by 64, and I should've written the example in Verilog, but...

-------a
Article: 37622
Hi,

I am debugging an Altera FPGA board with the old "ByteBlaster" interface on board (just a 74HC244). I only have the MAX+plus II Baseline software, which supports only "ByteBlasterMV". The "ByteBlasterMV" circuit adds some pull-up resistors and has other slight differences.

Problem: When programming through the JTAG chain in "ByteBlasterMV" mode, MAX+plus II can't detect the device on the board (an EPC2L20).

Questions:
1. Should I change the circuit on the board to "ByteBlasterMV"? (I mean cut some wires, add some resistors.)
2. Or is there a patch or driver that can be installed into MAX+plus II Baseline to make it support the old ByteBlaster mode?

Thank you very much.
Shawn
Article: 37623
Thanks! How should I get the remainder?
Article: 37624
On Mon, 17 Dec 2001 16:35:15 -0800, next <chensw20@hotmail.com> wrote:
>Thanks!
>how should i get the remainder ?

It's the six bits you were going to throw away.

Philip Freidin
Fliptronics