Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
jan wrote: > Hello. > The reg statement will probably synthesize into RAM on the FPGA, the > initial statement will not synthesize at all (I'm using Leonardo > targeting Altera's FLEX devices). Initialization does not work for synthesis. Use a reset input. -- Mike Treseler
Article: 46526
Kunal wrote: > I read about Transmeta's Crusoe chip some time back, which has > something called the Code Morphing Software. This code morphing s/w > actually reads hex from its code memory, and at run-time translates > the hex code into equivalent native machine language instructions. So > the whole system itself is like a Java Virtual Machine (or a run-time > cross-assembler), only there is no partitioning between the H/W and > the system S/W. > > The whole thing is an overhead, of course, but it's highly optimized and > parallelized in hardware wherever possible. Last I read, they had code > morphing software for 8086 instructions, i.e. the Code Morphing > Software could only "understand" x86 hex. This system also allows you > to run programs compiled for different processors at the same time, > i.e. it decides at run-time which instruction set is supported. > > Ok, here's an idea... how about code-morphing HARDWARE? This sounds crazy enough to be interesting ;-) [...] > > Of course, there are a LOT of issues here, and operation may be slowed > down slightly, but it IS possible. The biggest problem would be > mapping between specific registers, but we can leave that to the > application programmer or the source assembler / compiler. IMHO the biggest problem is the PC (program counter) mapping. The straightforward approach is to simulate every virtual instruction with the same number of real instructions. If the processor you are emulating has multiplication built in and your micro hasn't, that can be quite a lot. You would do a jump to skip the unused part in simpler instructions.
First approach:
- easy to implement, only needs an EPROM as a lookup-table
- has the overhead of a jump for almost all instructions
- wastes much of the available address space

Second approach: use a call for each instruction
- almost as simple to implement
- greater overhead (call/ret) per instruction
- maximum usage of address space
- simple instructions may not need a call

Third approach: mix both. Do all commands that are quite short (perhaps 4 words) directly. Do a call for the rest.
- still doable with a few standard logic parts (74xxx)

Another thought: using variable-length replacements is very difficult, because you have to analyse the whole program and build a PC translation table. I wouldn't do that.

> The applications of such a device would be very interesting indeed. A > code-morph for Java bytecode is only the beginning.... Backward > Compatibility will not be an issue anymore. This, I understand, is > keeping them from using all the features on the latest Intel chips. We > can load protocol-translation mappings too, transparently converting > from, say, RS-232 to I2C (we already have hardware TCP/IP stacks). We > could port the hex code itself to other processors, instead of > re-writing the source code and re-compiling. Programmers can include > useful language features from other instruction sets without having > to worry about implementing them in the target processor code. > > Ok that's enough speculation for now, but could anyone well-versed in > VLSI design tell me how feasible this is? I don't think it will be > very difficult to implement, but the design of such a chip would be > very challenging. Also I need to know from experienced embedded > systems designers how truly useful such a device would be, and would > all the effort of developing it pay off, in terms of financial returns > and intellectual property rights. Or would there be lots of Transmeta lawyers hunting you down?
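The fixed-length first approach can be sketched in a few lines of Python. This is a toy model only: the slot size, opcode names, and native instruction lengths are all invented for illustration, but it shows why PC mapping is trivial and where the address space goes.

```python
# Toy model of the fixed-length translation approach: every virtual
# instruction gets a fixed-size slot of native code; short translations
# waste the rest of the slot (skipped with a jump word).
# All names and sizes below are invented for illustration.

SLOT = 8  # native words reserved per virtual instruction (assumed)

# invented translation table: virtual opcode -> native words needed
NATIVE_LEN = {"NOP": 1, "ADD": 2, "LD": 3, "MUL": 7}

def translate_pc(virtual_pc):
    """PC mapping is trivial when every slot has the same size."""
    return virtual_pc * SLOT

def wasted_words(program):
    """Native words lost to padding across a virtual program."""
    # each under-full slot burns its unused tail (incl. the skip jump)
    return sum(SLOT - NATIVE_LEN[op] for op in program)

program = ["LD", "ADD", "NOP", "MUL"]
print(translate_pc(3))        # native address of the 4th instruction: 24
print(wasted_words(program))  # 5 + 6 + 7 + 1 = 19 wasted words
```

The waste is the price of the one-to-one PC mapping; the call-per-instruction and mixed approaches trade it for call/return overhead.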
I think you are better off with a software VM:
- you can use a uC without external program memory (more choices)
- you can run all code from internal memory (more speed)
- you can use a chip with Harvard architecture (more speed)

The hardware VM could be faster if most instructions can be simulated by one or two instructions. But remember the wait states you have to add for the translation logic. Lastly: would it be cheaper to buy the hardware VM or to buy a faster processor? Still, it would be a great fun project. The reason that it works for Transmeta is that their processor is designed to do this.

> kundi
Article: 46527
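The software VM alternative is essentially a dispatch loop. Here is a minimal sketch with an invented two-register instruction set, purely to illustrate the "one handler per virtual opcode" structure; a real VM would add many more opcodes and a denser encoding.

```python
# Minimal software-VM dispatch loop (illustrative only; the opcodes,
# registers, and encoding are invented, not any real instruction set).

def run(program):
    """Simulate each virtual opcode with a Python handler."""
    regs = {"A": 0, "B": 0}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "LDI":                        # load immediate
            regs[args[0]] = args[1]
        elif op == "ADD":                      # regs[x] += regs[y]
            regs[args[0]] += regs[args[1]]
        elif op == "DEC":                      # regs[x] -= 1
            regs[args[0]] -= 1
        elif op == "JNZ" and regs[args[0]] != 0:
            pc = args[1]                       # taken branch
            continue
        pc += 1
    return regs

# sum 4 + 3 + 2 + 1 into A via a counted loop
result = run([
    ("LDI", "A", 0), ("LDI", "B", 4),
    ("ADD", "A", "B"),     # pc 2: loop body
    ("DEC", "B"),
    ("JNZ", "B", 2),       # loop back while B != 0
])
print(result)              # {'A': 10, 'B': 0}
```

Note that the virtual PC is just a list index here, which is exactly the PC-mapping headache the hardware approaches above have to solve in logic.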
Falk Brunner wrote: > I never understood what the hell is the advantage of putting some kind of > realtime compiler into expensive silliy-cone? Wouldn't those Transmeta guys > be much smarter (and nowadays much richer) if they had done a nice optimized > RISC or whatever CPU and wrote a "simple" piece of software (aka translator, > compiler, whatever) to translate x86 code to native RISC just before > executing it, then load the RISC code into RAM and execute it? Anyone can > enlighten me? That is very much akin to what they did do. The Crusoe core is a 4-issue VLIW implementation that runs the "Code Morphing" software loaded from a Flash-ROM on the mainboard. The software takes care of code translation, branch prediction, register renaming, and instruction reordering largely in software, although the Crusoe does have some specialized hardware to speed this up. -- Wishing you good fortune, --Robin Kay-- (komadori)
Article: 46528
Falk Brunner wrote: > "Kunal" <kundi_forever@yahoo.com> schrieb im Newsbeitrag > news:cd714c44.0209020625.5b892675@posting.google.com... > > [Code morphing in microprocessors and FPGAs ] > > I never understood what the hell is the advantage of putting some kind of > realtime compiler into expensive silliy-cone? Wouldn't those Transmeta guys > be much smarter (and nowadays much richer) if they had done a nice optimized > RISC or whatever CPU and wrote a "simple" piece of software (aka translator, > compiler, whatever) to translate x86 code to native RISC just before > executing it, then load the RISC code into RAM and execute it? Anyone can > enlighten me? A key idea: to work in the x86-compatibility market, the CPU must be capable of booting multiple OSs, which means that the translator had to be internal to the CPU. The pure SW emulation approach was tried with DEC's FX!32; it worked OK for applications but could never handle an OS. Terje -- - <Terje.Mathisen@hda.hydro.com> "almost all programming can be viewed as an exercise in caching"
Article: 46529
In article <3D73BD32.F3295793@myrealbox.com>, Robin KAY <komadori@myrealbox.com> wrote: >That is very much akin to what they did do. The Crusoe core is 4 issue VLIW >implementation that runs the "Code Morphing" software loaded from a Flash-ROM >on the mainboard. The software takes care of code translation, branch >prediction, register renaming, and instruction reordering largely in software >although the Crusoe does have some specialized hardware to speed this up. The problem is that by putting the compiler in the inner loop, you have a pretty severe penalty for misses in the translation cache. As such, they were going for performance and lost (it's hard to statically schedule for a 4-issue VLIW when translating from x86, and the in-order design is heavily penalized on minor cache misses). Since they failed on performance, they tried to make claims about power. The problem is, this puts them directly up against Intel's fabrication facilities. If ultra-low-power x86 really mattered, Intel would process-shrink the Pentium core. -- Nicholas C. Weaver nweaver@cs.berkeley.edu
Article: 46530
In article <al0f65$evv$1@vkhdsu24.hda.hydro.com>, Terje Mathisen <terje.mathisen@hda.hydro.com> wrote: >The pure sw emulation approach was tried with DECs FX-32 (sp?), it >worked OK for applications but could never handle an OS. Well, it handled MOST of the OS, IIRC. I seem to recall the portable NT relying very heavily on a 486 emulator. -- Nicholas C. Weaver nweaver@cs.berkeley.edu
Article: 46531
In article <al08lg$1litsf$1@ID-84877.news.dfncis.de>, Falk Brunner <Falk.Brunner@gmx.de> wrote: >"Kunal" <kundi_forever@yahoo.com> schrieb im Newsbeitrag >news:cd714c44.0209020625.5b892675@posting.google.com... > >[Code morphing in microprocessors and FPGAs ] > >I never understood what the hell is the advantage of putting some kind of >realtime compiler into expensive silliy-cone? Wouldnt those Transmeta guys >be much smarter (and nowadays much richer) if the had done a nice optimiced >RISC or whatever CPU and wrote a "simple" piece of software "aka translator, >compiler whatever) to translate a x86 code to native RISC just before >executing it, then load the RISC-code into RAM and execute it.? Anyone can >enlighten me? Effectively, that is what they do. N decades of experience is that typical application-level ISAs are ghastly for programming fast emulators in, so what you want is a RISC ISA specifically designed for executing emulators. Then you put the emulator into firmware as a mixture of a just-in-time compiler and run-time emulation. Firmware/microcode/whatever is cheap silicon! This approach is common to both the Crusoe and Pentium 4; they have made different choices of detail, but they have a lot in common. As most readers will remember, I believe that this approach is grossly underutilised, and could be used (for example) to produce much more streamlined arithmetic units (and hence more and faster ones). Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1@cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679
Article: 46532
In article <al0goq$1mp$1@pegasus.csx.cam.ac.uk>, Nick Maclaren <nmm1@cus.cam.ac.uk> wrote: >Effectively, that is what they do. N decades of experience is that >typical application-level ISAs are ghastly for programming fast >emulators in, so what you want is a RISC ISA specifically designed >for executing emulators. Then you put the emulator into firmware >as a mixture of a just-in-time compiler and run-time emulation. >Firmware/microcode/whatever is cheap silicon! Effectively all performance-oriented x86s do translation to an internal representation. AMD Athlons require 3 pipeline stages, IIRC. But both AMD and Intel are translating to out-of-order cores, while Transmeta is in-order. Since so many x86 instructions touch memory, this can hurt Transmeta. Also, Intel, by caching the translations, decided they could make the translator slower (1 op/cycle as opposed to 3), which magnifies the icache miss penalty. In retrospect, probably not the right decision. -- Nicholas C. Weaver nweaver@cs.berkeley.edu
Article: 46533
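The translation-cache trade-off discussed in this thread can be illustrated with a toy cost model. The cycle counts below are invented and only the shape of the trade-off matters: an expensive translator is fine for hot loops (each block is translated once) but punishing for code with little reuse.

```python
# Toy model of a translation cache: translating a block is expensive,
# re-running an already-translated block is cheap. The cycle costs are
# invented purely to illustrate the trade-off, not measured numbers.

TRANSLATE_COST = 30   # cycles to translate one block (assumed)
EXEC_COST = 1         # cycles to run a cached translation (assumed)

def run_trace(trace):
    """Total cycles to execute a trace of basic-block IDs."""
    cache = set()
    cycles = 0
    for block in trace:
        if block not in cache:      # translation-cache miss
            cache.add(block)
            cycles += TRANSLATE_COST
        cycles += EXEC_COST
    return cycles

hot_loop = ["a", "b"] * 50            # 100 executions, only 2 unique blocks
print(run_trace(hot_loop))            # 2*30 + 100*1 = 160
print(run_trace(list("abcdefghij")))  # no reuse: 10*30 + 10*1 = 310
```

With heavy reuse the translator cost amortizes to almost nothing; with no reuse it dominates, which is the miss penalty Weaver is pointing at.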
nweaver@ribbit.CS.Berkeley.EDU (Nicholas C. Weaver) wrote >Since they failed on performance, they tried to make claims about >power. The problem is, this puts them directly up against Intel's >fabrication facilities. If ultra low power x86 really mattered, intel >would process shrink the Pentium core. It appears that the Transmeta laptops don't offer significantly better battery life - when you compare MHz for MHz. I have done a fair bit of very low power ASIC development work, and it amazes me that, after all these years of CMOS, we still have a "CMOS" CPU, in a PC which 99% of the time does NOTHING (waits for a keystroke) and yet the damn thing still dissipates several WATTS!! I expect these CPUs are not anywhere near fully static CMOS, but ... Peter. -- Return address is invalid to help stop junk mail. E-mail replies to zX80@digiYserve.com but remove the X and the Y. Please do NOT copy usenet posts to email - it is NOT necessary.
Article: 46534
In article <98i7nu83nln4ge0dbrdmimda24074kqa8g@4ax.com>, Peter <z80@ds1.com> wrote: >>Since they failed on performance, they tried to make claims about >>power. The problem is, this puts them directly up against Intel's >>fabrication facilities. If ultra low power x86 really mattered, intel >>would process shrink the Pentium core. > >It appears that the Transmeta laptops don't offer significantly better >battery life - when you compare MHz for MHz. They do have some better power management (better than Intel's "Fast or Slow" model in the P3 low-power core), but then again, if low-power performance MATTERED in the x86 world, you do what StrongARM did: build a fully static, in-order processor, and process-shrink it as much as you can. Intel has such great fab technology that you could probably do a 1-2 W, 1 GHz x86 that way. Also, Amdahl's law keeps Transmeta down; only about 1/3 of the notebook power budget goes to the processor. >I have done a fair bit of very low power ASIC development work, and it >amazes me that, after all these years of CMOS, we still have a "CMOS" >CPU, in a PC which 99% of the time does NOTHING (waits for a >keystroke) and yet the damn thing still dissipates several WATTS!! >I expect these CPUs are not anywhere near fully static CMOS, but ... Nowhere even close. :) It's CMOS only in transistor types, not design methodology. -- Nicholas C. Weaver nweaver@cs.berkeley.edu
Article: 46535
Hi, Does anyone have experience with C++ to Verilog/VHDL converters? I thought that as a C++ programmer and a total FPGA newbie with just a minimum of digital design experience, this avenue would be very interesting, especially because I seek to implement a medium-complex software algorithm in an FPGA, and this algorithm will be subject to gradual refinement. I specifically would like to know:
1) Price
2) Capability (in terms of supported C/C++)
3) Capability (in terms of what it can produce)
4) Efficiency in using the FPGA resources
I already looked at these products: http://www.forteds.com/products/cynthesizer.html http://www.synopsys.com/products/cocentric_systemC/cocentric_systemC.html But they seem incredibly expensive and it is very hard to judge their capabilities from their websites. Thanks for any enlightenment, Frank de Groot, Oslo
Article: 46536
OOPS. <sigh> I guess that's that then... "C2Verilog 2.0 is available now on Unix and Windows platforms starting at $75,000." Frank "Frank Andreas de Groot" <nospam@nospam.com> wrote in message news:DLQc9.17039$sR2.303475@news4.ulv.nextra.no... > I specifically would like to know: > > 1) Price
Article: 46537
"Mike Treseler" <tres@tc.fluke.com> wrote in message news:3D73AEC0.6010508@tc.fluke.com... > jan wrote: > > > Hello. > > > The reg statement will probably syntehsize into RAM on the FPGA, the > > initial statement will not synthesize at all ( I'm using Leonardo > > targeting Altera's FLEX devices). > > > Initialization does not work for synthesis. > Use a reset input. > > -- Mike Treseler

Yeah, you probably want something like:

reg [11:0] sR;

always @(posedge clk) begin
    if (reset) sR[11:0] <= 12'h6AD;
    else       sR[11:0] <= { sR[0], sR[11:1] };
end

That will rotate the data one bit each clock. If you want to pad with zeroes, use { 1'b0, sR[11:1] }. (Real hardware guys don't use "shift".) :-) -Stan
Article: 46538
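For anyone who wants to sanity-check the rotate behaviour without an HDL simulator, here is a bit-accurate Python model of the { sR[0], sR[11:1] } rotate. The 12-bit width and the 12'h6AD reset value are taken from Stan's snippet; the function name is just illustrative.

```python
# Bit-accurate model of the Verilog rotate: sR <= { sR[0], sR[11:1] }
# i.e. a rotate-right: the old LSB becomes the new MSB.

def rotate_right(sr, width=12):
    """One clock of the rotate register."""
    return ((sr & 1) << (width - 1)) | (sr >> 1)

state = 0x6AD                  # the reset value from the snippet above
for _ in range(12):
    state = rotate_right(state)
print(hex(state))              # a full 12-clock rotation restores 0x6ad
```

As expected for a pure rotate, no bits are ever lost, so after `width` clocks the register is back at its reset pattern; the zero-padding variant ({ 1'b0, sR[11:1] }) would instead converge to zero.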
In general the fastest logic will only have a simple equation between flip-flop stages. -Stan "Arguo" <s9323090@cc.ncu.edu.tw> wrote in message news:52758910.0209020319.70d66511@posting.google.com... > I designed an SDRAM controller in Verilog using an Altera APEX20KE. > But the timing analyser reports that the critical path is about 15ns. > I think that my Verilog module design is not good. Maybe too much combinational logic. > Which book or reference can tell me how to design high-speed logic? > > Best Regards
Article: 46539
On 2 Sep 2002 07:25:59 -0700, kundi_forever@yahoo.com (Kunal) wrote: >A pretty challenging VLSI project actually, possible too. Here's how I >think it may work: >This Code Morphing (CM) chip would be placed on the bus in between the >target uC and the code memory (ROM, flash, whatever). It would route the >addresses generated by the uC to the code memory, and translate the >returned contents into hex code of the target uC, and send the >translated version back to the uC. This is pretty much what the JVM >does, but this virtual machine is a HARDWARE virtual machine, i.e. the >mapping between various instruction sets is HARD-WIRED. This topic is extensively covered in the following book: Interpretation and Instruction Path Coprocessing, Eddy H. Debaere and Jan M. van Campenhout, MIT Press, 1990, ISBN 0-262-04107-3. Several languages are discussed. Stephen -- Stephen Pelc, sfp@mpeltd.demon.co.uk MicroProcessor Engineering Ltd - More Real, Less Time 133 Hill Lane, Southampton SO15 5AF, England tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691 web: http://www.mpeltd.demon.co.uk - free VFX Forth downloads
Article: 46540
Frank Andreas de Groot wrote: > > Hi, > > Has anyone experience with C++ to Verilog/VHDL convertors? > I thought that as a C++ programmer and a total FPGA newbie with just a > minimum of > digital design experience, this avenue would be very interesting, especially > because > I seek to implement a medium-complex software algorithm into a FPGA, and > this algorithm will be subject to gradual refinement. C/C++ and other sequential/procedural languages are 'one thing at a time/one place at a time' languages, and so do not map well onto FPGA resources. You need to look carefully at your medium-complex software algorithm and decide if it can benefit from parallel execution. If not, then a procedural-language compiler targeting a soft core running on an FPGA would be a valid path. Soft cores are improving all the time, and I believe you can now add your own opcodes - this allows some 'subroutine function -> HW replacement'. If the algorithm can benefit from parallel execution, then look to use a SW language that is implicitly parallel to code it. If VHDL does not look suitable, a new language (AsmL) that I think has good promise to blur the SW/HW boundary is at http://www.research.microsoft.com/fse/asml/overview.html This is free, and you could use it to experiment with. A softcore that swallows .NET bytecodes is probably not far off... - jg
Article: 46541
Too many of these shift elements and the resources get a bit ugly for a distributed register scheme; the approach is fine for a few coefficients and easy to understand from a code-support standpoint. If you want many fixed values, you should be able to appease both simulation and synthesis by generating lookup tables. Rather than using shift registers, which would need discrete registers for each bit in most of the Altera families (I think the Stratix has shift register memory elements), use a ROM-type approach where you should end up with only one LE per coefficient. By generating your own bit index and defining the values in a way that both simulation and synthesis can use, the whole task folds together nicely. The Index is input to each LE and the single registered output bit is your result. I haven't tested the code but I think it makes sense.

reg  [11:0] Coeff [2:0];
reg  [ 3:0] Index;
wire [ 4:0] next_index;
integer     i;
reg  [ 2:0] SRout;

always begin
    Coeff[0] = 12'b011010101101;
    Coeff[1] = 12'h1a5;
    Coeff[2] = 12'd3192;
end

always @(posedge clock) begin
    if( next_index[4] ) Index <= 4'd11;   // if next index is -1, reload
    else                Index <= next_index;
    for( i=0; i<3; i=i+1 )
        SRout[i] <= Coeff[i] >> Index;
end

assign next_index = Index - 1;

Real hardware guys already have their shift together. - John_H

Stan wrote: > "Mike Treseler" <tres@tc.fluke.com> wrote in message > news:3D73AEC0.6010508@tc.fluke.com... > >>jan wrote: >> >> >>>Hello. >>> >>>The reg statement will probably syntehsize into RAM on the FPGA, the >>>initial statement will not synthesize at all ( I'm using Leonardo >>>targeting Altera's FLEX devices). >>> >> >>Initialization does not work for synthesis. >>Use a reset input. >> >> -- Mike Treseler >> > > Yeah, you probably want something like: > > reg [11:0] sR; > > always @(posedge clk) begin > if (reset) sR[11:0] <= 12'h6AD; > else sR[11:0] <= { sR[0], sR[11:1] }; > end > > That will rotate the data one bit each clock. > If you want to pad with zeroes, use { 1'b0, sR[11:1] }. > (Real hardware guys don't use "shift".) :-) > > -Stan
Article: 46542
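A quick bit-accurate model of John_H's ROM-index scheme: Index reloads to 11 and counts down, and each clock SRout[i] latches bit [Index] of Coeff[i], so every coefficient is emitted serially MSB first. The three constants are taken from the Verilog above; the function name is illustrative.

```python
# Model of the ROM-index coefficient scheme: SRout[i] <= Coeff[i] >> Index
# (taken as a 1-bit value, i.e. bit [Index]) while Index counts 11..0.
COEFF = [0b011010101101, 0x1A5, 3192]   # same three 12-bit constants

def serial_bits(coeff, width=12):
    """Bits produced on one SRout lane for a full Index pass, MSB first."""
    return [(coeff >> i) & 1 for i in range(width - 1, -1, -1)]

print(serial_bits(COEFF[0]))  # [0,1,1,0,1,0,1,0,1,1,0,1]
```

This confirms the LE-per-coefficient idea: each lane is just a 12-entry, 1-bit lookup addressed by the shared Index counter.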
LOL Yes, I understand that it's a waste of hardware opportunities to code in C and have it translated to a crappy Verilog equivalent... In the FPGA should be something like a 64-bit tri-state counter that gets an initial value from a PCI bus, and a pattern recognizer that looks at the output and advances the counter when it sees certain pattern combinations. All in all quite complex I'm afraid, too complex for a newbie. To write it in C, though, would not be a big problem at all for me. I would spend months testing and debugging the C, but then at least I'd have a (slow and inefficient) Verilog representation that I can refine, analyze the hardware equivalent of, etc. I will look into your suggestions. I have seen exactly what I need, and it seems to be available for free too, but they don't offer it for download any more; it's called CTOV from Tenison Tech: http://www.tenisontech.com/products/ctov-page.html Very sad... But maybe I should just stop whining and learn how to do it the right way... Frank "Jim Granville" <jim.granville@designtools.co.nz> wrote in message news:3D73E8D3.279B@designtools.co.nz... > > If VHDL does not look suitable, a new language (AsmL) that I think > has good promise to blur the Sw/HW boundary is at > > > This is free, and you could use it to experiment with. > > A softcore that swallows .NET bytecodes is probably not far off... > > - jg
Article: 46543
"Josh Model" <model@ll.mit.edu> wrote in message news:<wkc89.44$I7.3516@llslave.llan.ll.mit.edu>... > Has anyone come across any 3rd party prototype boards for Actel FPGA's? > It seems as if Actel's stuff starts at ~$1k, and I was looking for one > closer to the $500 range. > > Thanks, > --Josh QuickCores offers a low cost IP Delivery System based on Actel's new ProASIC+. Prices start at $175 for the APA075. It's all self-contained in that no external device programmer is required. It also includes a built-in JTAG boundary scan controller and built-in JTAG real-time debug controller for microcontroller designs. No JTAG pod is required since everything is done via RS-232 using Actel Libero-generated STAPL files. It's all packaged in a 28-pin postage stamp form factor for easy prototyping. It's called the Musketeer (All-for-One Stamp). Visit www.quickcores.com for info. Regards, Jerry
Article: 46544
Jerry D. Harthcock wrote: > > "Josh Model" <model@ll.mit.edu> wrote in message news:<wkc89.44$I7.3516@llslave.llan.ll.mit.edu>... > > Has anyone come across any 3rd party prototype boards for Actel FPGA's? > > It seems as if Actel's stuff starts at ~$1k, and I was looking for one > > closer to the $500 range. > > > > Thanks, > > --Josh > > QuickCores offers a low cost IP Delivery System based on Actel's new > ProASIC+. > Prices start at $175 for the APA075. It's all self-contained in that > no external device programmer is required. It also includes a > built-in JTAG boundary scan controller and built-in JTAG real-time > debug controller for microcontroller designs. No JTAG pod is required > since everything is done via RS-232 using Actel Libero-generated STAPL > files. > > It's all packaged in a 28-pin postage stamp form factor for easy > prototyping. It's called the Musketeer (All-for-One Stamp). > > Visit www.quickcores.com for info. Interesting lineup. What if a designer wants to mix some FPGA HW design with one of your soft cores? How is that done? Also missing from the web is any speed info on these cores. - jg -- ======= 80x51 Tools & IP Specialists ========= = http://www.DesignTools.co.nz
Article: 46545
"Holger Venus" <Holger.Venus@dlr.de> wrote in message news:<aji8kt$1ai94m$1@ID-143082.news.dfncis.de>... > Hi all, > does anybody have experience with embedded processor IP cores in FPGAs? . . > Thanks for any (related) comment, > > Holger Venus QuickCores specializes in FPGA-embeddable microcontroller cores such as the 68HC05, 16C5x and a proprietary 9-bit RISC, at the moment synthesized only for Actel ProASIC+ and QuickLogic. Each includes a JTAG real-time debug module. You can download reference designs (STAPL format) for free at www.quickcores.com. You can also take a look at the opencores.org website: www.opencores.org Jerry
Article: 46546
"Nicholas C. Weaver" <nweaver@ribbit.CS.Berkeley.EDU> wrote in message news:al0fks$in9$2@agate.berkeley.edu... > In article <al0f65$evv$1@vkhdsu24.hda.hydro.com>, > Terje Mathisen <terje.mathisen@hda.hydro.com> wrote: > >The pure sw emulation approach was tried with DECs FX-32 (sp?), it > >worked OK for applications but could never handle an OS. > > Well, it handled MOST of the OS, IIRC. I seem to recall the portable > NT relying very heavily on a 486 emulator. > -- And the first versions of MacOS for the early PowerMacs emulated most of the 68k OS code on the PPC.
Article: 46547
Hi all! I would like to know the utility of these different buffers and the differences between them. I've understood this:
- when I want to input a CLK signal into my FPGA, I must use an IBUFG, but why?
- when I want to map an output of a DLL to another process or to another DLL, I must use a BUFG. But why?
Thanks a lot, -- Laurent
Article: 46548
Hello, > I have two data buses running at two different but constant clock... Have a look at http://www.sunburst-design.com/papers/ There are some papers regarding this topic, namely:
- Synthesis and Scripting Techniques for Designing Multi-Asynchronous Clock Designs
And if you finally need a FIFO to do it:
- Simulation and Synthesis Techniques for Asynchronous FIFO Design
- Simulation and Synthesis Techniques for Asynchronous FIFO Design with Asynchronous Pointer Comparisons
The other papers on this website are highly recommended as well. Regards, Paul.
Article: 46549
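The async-FIFO papers cited above pass read/write pointers across clock domains in Gray code, so that only one bit changes per increment and a synchronizer can never sample a multi-bit glitch. A minimal sketch of the two conversions (these are the standard textbook forms, not code taken from the papers):

```python
# Standard binary <-> Gray conversions used for async FIFO pointers.

def bin_to_gray(b):
    """Gray code: each increment flips exactly one bit."""
    return b ^ (b >> 1)

def gray_to_bin(g):
    """Inverse conversion: XOR-accumulate the shifted code."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# property check: successive Gray codes differ in exactly one bit
for n in range(15):
    diff = bin_to_gray(n) ^ bin_to_gray(n + 1)
    assert diff != 0 and diff & (diff - 1) == 0  # power of two => one bit

print(bin_to_gray(5))   # 5 (0b101) -> 7 (0b111)
print(gray_to_bin(7))   # and back to 5
```

In the FIFO itself only the Gray-coded pointer crosses the domain boundary (through a two-flop synchronizer); the binary form is kept locally for addressing.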
IT often pays better, which isn't offensive ;-) -- ------------------ Hans Summers http://www.HansSummers.Com "Duy K Do" <duy_do@angelfire.com> wrote in message news:3ecf0f2c.0209020735.32af1a34@posting.google.com... > Do you get offended if someone label you as IT consultant? > Sadly a lot of talented engineering graduates ended up in IT department. > > Cheers! > > Duy K Do