"Nicholas C. Weaver" <nweaver@ribbit.CS.Berkeley.EDU> wrote in message news:al0fks$in9$2@agate.berkeley.edu... > In article <al0f65$evv$1@vkhdsu24.hda.hydro.com>, > Terje Mathisen <terje.mathisen@hda.hydro.com> wrote: > >The pure sw emulation approach was tried with DECs FX-32 (sp?), it > >worked OK for applications but could never handle an OS. > > Well, it handled MOST of the OS, IIRC. I seem to recall the portable > NT relying very heavily on a 486 emulator. No, FX!32 was for user-mode code only. What is "the portable NT"? The Alpha port certainly had no reliance at all on a 486 emulator. PJDM -- Peter Mayne IBM GSA (but the opinions are mine) Canberra, ACT, Australia May contain traces of nuts.Article: 46551
For nails you use a hammer, and for screws you take a screwdriver.
Likewise are the tools for FPGAs. VHDL/Verilog wasn't invented because
C/C++ was unknown; C/C++ just doesn't fit.

Rene
--
Ing.Buero R.Tschaggelar - http://www.ibrtses.com
& commercial newsgroups - http://www.talkto.net

Frank Andreas de Groot wrote:
> Hi,
>
> Has anyone experience with C++ to Verilog/VHDL convertors?
> I thought that as a C++ programmer and a total FPGA newbie with just a
> minimum of digital design experience, this avenue would be very
> interesting, especially because I seek to implement a medium-complex
> software algorithm into an FPGA, and this algorithm will be subject to
> gradual refinement.

Article: 46552
> IMHO the biggest problem is the PC (program counter) mapping.
> ....
> Another thought:
> Using variable length replacements is very difficult, because you have
> to analyse the whole program and make a PC translation table. I
> wouldn't do that.

I think you can avoid that. When you can translate opcodes at run-time,
you can also translate the PC. See, the foreign language program would
have been compiled/assembled with the foreign processor in mind, so all
jumps would be relative. It would be similar to the Logical Address/
Physical Address scenario in modern memory management schemes. The CPU
generates the address for the next instruction; the HW Code Morpher can
simply add the byte length of the previous instruction to get the
address of the next instruction.

Which brings to mind another possible advantage... all JUMP statements
can be directly processed within the CM chip without having to reach
the uC/uP. Imagine the H/W VM receives an instruction saying "JUMP
0x001000": now it won't have to translate the instruction, it can
directly generate the address 0x001000 to the code memory and begin
translating and routing the instructions received from the new "virtual
PC" address. This will not work well with conditional jumps of course,
but it can save some of the routing overhead for unconditional jump
instructions. But when an instruction saying JUMP 0x0100 is passed to
the uC, the VM can be guaranteed that the next address generated by the
uC *will* be 0x0100. We are basically abstracting the uC from the
actual (physical) addresses within the program ROM. Conversely, there
can be a reverse mapping for special addresses and interrupt vectors...
i.e. if the CPU-generated address is a known interrupt vector, the VM
will pass the foreign-code equivalent of that vector to the program ROM.

This is why I think the register translation would be most difficult.
If the target processor simply doesn't have enough on-chip RAM to
simulate all that used in the foreign code, it will be a huge problem.
But then I think it shouldn't be a problem for modern processors. All
of them have plenty of RAM! :D

> The straightforward approach is to simulate every virtual instruction
> with the same number of real instructions. If the processor you are
> emulating has multiplication built in and your micro hasn't, that can
> be quite a lot. You would do a jump to skip the unused part in simpler
> instructions.
> - This is easy to implement, only needs an EPROM as lookup-table
> - This has overhead of a jump for almost all instructions
> - This wastes much of the available address space
>
> Second approach:
> Use a call for each instruction
> - almost as simple to implement
> - greater overhead (call/ret) per instruction
> - maximum usage of address space
> - simple instructions may not need a call
>
> Third approach:
> Mix both. Do all commands that are quite short (perhaps 4 words)
> directly. Do a call for the rest.
> - still doable with a few standard logic parts (74xxx)
> ....

What you are discussing seems to be a dedicated processor that
implements all this. What I was considering was having it ALL
implemented in VLSI... i.e. the only code that would run would be
microcode, at most. So the translation should occur almost within the
space of time it normally takes for the uC to retrieve code without a
VM in between... the translation may be triggered by the Read/Write
signals on the bus. The foreign language code is translated to native
code outside the target processor.
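A rough Verilog sketch of the EPROM-lookup scheme discussed above (a
hypothetical illustration only: the opcode widths, the {length, native
opcode} ROM layout, and the "morph_table.hex" mapping file are all
invented for the example, not taken from any real design):

    // Foreign opcode indexes a translation ROM that yields a native
    // opcode plus the foreign instruction's byte length; the length
    // is accumulated into a "virtual PC" in the foreign address space.
    module morph_lut (
        input  wire        clk,
        input  wire        rst,
        input  wire [7:0]  foreign_op,  // opcode fetched from foreign ROM
        output reg  [15:0] native_op,   // translated native instruction
        output reg  [15:0] virtual_pc   // next fetch address, foreign space
    );
        reg [17:0] rom [0:255];         // {len[1:0], native_op[15:0]}
        reg [17:0] entry;
        initial $readmemh("morph_table.hex", rom); // hypothetical mapping

        always @(posedge clk)
            if (rst) begin
                native_op  <= 16'h0000;
                virtual_pc <= 16'h0000;
            end else begin
                entry = rom[foreign_op];  // blocking read of the ROM word
                native_op  <= entry[15:0];
                // step the virtual PC by the foreign instruction length
                virtual_pc <= virtual_pc + entry[17:16];
            end
    endmodule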
Actually, having a dedicated processor sitting in between the target
processor and the foreign language code memory would be the most
flexible, and probably practical, way to achieve this. It would allow
the loading of multiple translations and mappings.

> The hardware VM could be faster if most instructions can be simulated
> by one or two instructions. But remember the wait states you have to
> add for the translation logic.

I think the optimum design would be a sort of highly-specialized,
asynchronous microcontroller... one where the address and instruction
mapping is instantaneous through digital logic, yet there might be some
program code that can be triggered and executed during certain
contingencies where simple mapping is not enough. This way translation
latencies can be minimized to the point where they are infrequent at
worst.

> I think you are better off with a software VM:
> - you can use a uC without external program memory (more choices)
> - you can run all code from internal memory (more speed)
> - you can use a chip with Harvard architecture (more speed)

Well, an external H/W VM doesn't make good sense if all code memory
resides inside the uC itself. I mean, you are basically getting it
translated from an external source, and that is just extra overhead. My
concept applies to uC/uPs where the code memory lies in a separate ROM
chip... so the program can be translated en route from the memory chip
to the processor on the fly.

> At last: Would it be cheaper to buy the hardware VM or to buy a faster
> processor?

That would depend on the complexity and development costs of the
hardware VM. How complex could it be? Most of the on-chip space would
be dedicated to a huge bunch of registers to contain the target code,
the foreign code(s) and the code mapping(s) - unless it is all
hardwired for higher speed and zero flexibility. The mapping logic can
be quite easily done with simple digital components. The contingencies
I spoke of will require some more thought... maybe it can all be done
on a SOC. If we talk about that, we can come up with a cost estimate.
Then we can think of its possible applications, come up with some idea
of demand, and might be able to decide if it's worthwhile to market.

> Still it would be a great fun project.

Ya, I think so too ;D If I had any familiarity with VHDL etc., I'd try
to simulate something for two disparate RISC instruction sets. What I
need to know is more possible applications of this, and some sort of
feasibility analysis. Any ideas?

kundi

Article: 46553
Falk Brunner wrote:
> I never understood what the hell is the advantage of putting some kind
> of realtime compiler into expensive silly-cone? Wouldn't those
> Transmeta guys be much smarter (and nowadays much richer) if they had
> done a nice optimized RISC or whatever CPU and wrote a "simple" piece
> of software (aka translator, compiler, whatever) to translate x86 code
> to native RISC just before executing it, then load the RISC code into
> RAM and execute it? Anyone can enlighten me?

That's what they do. I think that's a tradeoff issue. Crusoe has a
software translator, which is slow (a number of cycles per translation),
does a lot of optimization work, and caches a lot in RAM. Pentium 4 has
a hardware translator, which is somewhat slow (1 cycle per translation),
does no optimization work, and caches an L1's worth in the trace cache.
Athlon, Hammer, and Pentium 3 have a hardware translator, which is fast
(3 translations per cycle), doesn't optimize (rumors about the Hammer
optimizing a little haven't been confirmed), and doesn't cache anything.

I think the Pentium 4 approach should have been taken a bit further,
towards more optimization. The P4 contains an estimation-based scheduler
that queues dependent instructions into its very fast in-order execution
engine, and restarts (or reschedules) these dependent instructions if
the calculation was wrong (e.g. due to a cache miss for a memory
argument), making it effectively work like an OoOE engine, though it
isn't. This stuff could have gone to the trace cache in this optimized
form, removing a number of pipeline stages from the design.

The mistake of TM is that as long as you do simple benchmarks,
everything goes well, because you have only a few core routines that get
executed very often, and therefore the translation overhead is small.
When you run real-world applications, things change. When you run
benchmarks like BapCo (which execute one piece of code after the other),
performance goes down even further. That's also a problem for the
Pentium 4, but not nearly as much.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Article: 46554
Duy K Do wrote:
>
> Do you get offended if someone labels you as an IT consultant?

It's the word 'consultant' I steer clear of. As someone said, a
'consultant' is someone who borrows your watch to tell you the time. As
a contracting engineer I prefer to think I get paid for some work.

Nial.

Article: 46555
Hi

The issue is not that simple. I suggest that you take a look at some of
our papers at http://www.es.isy.liu.se/publications/index.html

Best regards
Lasse

In article <b0ab35d4.0207280653.5ebb19b8@posting.google.com>, hristo
<hristostev@yahoo.com> wrote:
> hello,
> may be a basic question
> if someone has to implement an FIR using bit serial, he has to see the
> output wordlength, thus the FIR bit growth. Then, he needs to expand
> the input data with zeros to have a regular wordlength through the
> structure
>
> in parallel we do not have to do that
>
> what about digit serial, should we still need to expand the input data
> with zero digits
>
> Many thanks

Article: 46556
In article <cd714c44.0209020625.5b892675@posting.google.com>, Kunal
<kundi_forever@yahoo.com> wrote:
> I read about Transmeta's Crusoe chip some time back, which has
> something called the Code Morphing Software.

Basically, the code morphing software is a combination interpreter and
JIT recompiler from the IA32 instruction set to the high-performance
hardware-specific instruction set the processor itself uses. It's very
much like the front-end recompilation hardware in just about every high
performance IA32 chip since, oh, maybe as far back as the 486 with its
much-touted "RISC core", except by doing it in software they can build
a chip that's *almost* as fast as the hardware equivalent with a lot
less silicon (and hence power consumption).

> Ok here's an idea... how about code-morphing HARDWARE?

Congratulations, you just re-invented the last ten years of IA32
processor design. :)

--
I've seen things you people can't imagine. Chimneysweeps on fire over
the roofs of London. I've watched kite-strings glitter in the sun at
Hyde Park Gate. All these things will be lost in time, like
chalk-paintings in the rain.
`-_-' Time for your nap. | Peter da Silva | Har du kramat din varg,
'U`  idag?

Article: 46557
In article <al08lg$1litsf$1@id-84877.news.dfncis.de>, Falk Brunner
<Falk.Brunner@gmx.de> wrote:
> I never understood what the hell is the advantage of putting some kind
> of realtime compiler into expensive silly-cone? Wouldn't those
> Transmeta guys be much smarter (and nowadays much richer) if they had
> done a nice optimized RISC or whatever CPU and wrote a "simple" piece
> of software (aka translator, compiler, whatever) to translate x86 code
> to native RISC just before executing it, then load the RISC code into
> RAM and execute it?

What a brilliant idea. You could call it Code Morphing.

--
I've seen things you people can't imagine. Chimneysweeps on fire over
the roofs of London. I've watched kite-strings glitter in the sun at
Hyde Park Gate. All these things will be lost in time, like
chalk-paintings in the rain.
`-_-' Time for your nap. | Peter da Silva | Har du kramat din varg,
'U`  idag?

Article: 46558
"BROTO Laurent" <lbroto@free.fr> schrieb im Newsbeitrag news:3d746c4f$0$573$626a54ce@news.free.fr... > Hi all ! > > I would like to know what's utility of these differents buffers and > differences between them. > I've undertsood this: > - when I want to input CLK signal in my fpga, I must to use IBUFG, but why > ? Because IBUFGs are special clock input buffers, that have a very short and predictable (= short delay) connection to a DLL. This is essential for clock management at high frequencies. > - when I want to map an output of a DLL on other process or on other DLL, I > must to use BUFG. But why ? Similar to IBUFG. BUFG are global clock buffers, that (in 99.9% of all cases) feed the clock inputs of the FlipFlops and RAMs on your design, which can be some hundred to some then thousand. Again, to do clock management at high frequencies, you need predictical timing, which can only be achieved by using BUFGs (Iam not talking about hacking and manual routing stuff) -- MfG FalkArticle: 46559
Article: 46559

There are some times when hand-coded MUXCY primitives use the XB or YB
outputs, but the most common use is in comparators. If you have an
if(a>b) construct, the logic that follows is conditional on the a>b
result, which is often realized with a carry chain, the output going
through XB or YB. Many designers even use the comparison in simple
counters, suggesting the use is pervasive:
if( count>28 ) count<=0; else count<=count+1;

If you've looked into the routing-level implementation of the Xilinx
devices (the options available in FPGA_EDITOR) you'll see that the XB
and YB outputs have an initial routing path that takes them along any
of 8 lines out of the CLB, the same paths available to the X, Y, XQ,
and YQ outputs. I recall that these signals don't have all 8 outputs
available for both outputs (XB/YB) for both slices, but have most of
that first layer routing available (6 out of 8).

The FPGA Editor is the best way for me to figure out where to place
critical elements for best routing utilization - it should help you
figure out some details as well.

"Nicholas C. Weaver" wrote:
> For my research purposes, I'm considering what effects a corner
> turning interconnect has, by comparing apples to apples with Virtex
> family interconnect (long story).
>
> One disadvantage is that less frequently used inputs and outputs cost
> relatively more (since inputs and outputs connect to ALL
> possibilities, it's a different logical depopulation). So I want to,
> in my comparisons, remove a couple of outputs for modeling my logic
> block.
>
> So the question is, how often and WHY are the carry chains driven to
> the XB and YB outputs. According to the slice internals, they are
> only capable of being driven by the carry chain (XB) or carry chain
> or route-through (YB).
>
> What logic tends to use these outputs beyond the top carry out?
> --
> Nicholas C. Weaver nweaver@cs.berkeley.edu
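For reference, the counter pattern mentioned above as a complete
synthesizable Verilog module (a trivial sketch; exactly how the compare
maps onto the MUXCYs and the XB/YB pins is up to the synthesis tool):

    module wrap_counter (
        input  wire       clk,
        output reg  [4:0] count
    );
        // the >28 compare is typically built on the carry chain, and
        // its result leaves the last slice on an XB/YB output
        always @(posedge clk)
            if (count > 5'd28) count <= 5'd0;
            else               count <= count + 5'd1;
    endmodule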
Article: 46560

Haha, you have too many "whys". To my understanding, the clk net is
critical, so it has special hardware dedicated to it such as IBUFG,
BUFG, and the "golden" wires in the FPGA (hehe, I ain't sure they're
golden or not). Anyway, it's kind of expensive stuff and we can't
afford to use it everywhere in the FPGA. Maybe someone else has a
better answer than mine.

Regards.

Article: 46561
Hi Laurent,

An IBUFG is very similar to an IBUF, but it's located in a global clock
input IOB. These are the IOBs that are well located and connected to
your GCLK pins externally. The IBUFG is not meant to buffer a signal
for high fanout, however; it's there for connecting to either the DLL
or directly to the BUFG.

The BUFG is not an I/O buffer like the IBUF and IBUFG; it's located
outside the IOB on its own. It IS meant to buffer high fanout clock
signals and keep a clock signal on the global clock routing.

Have a look in FPGA Editor at the design that you've alluded to here
(any design with some clocks that are properly buffered); it might help
to explain the locations and usefulness of the various buffers.

Cheers,
Ryan

BROTO Laurent wrote:
> Hi all !
>
> I would like to know what's the utility of these different buffers and
> the differences between them.
>
> I've understood this:
> - when I want to input a CLK signal into my FPGA, I must use IBUFG,
> but why?
> - when I want to map an output of a DLL to another process or another
> DLL, I must use BUFG. But why?
>
> Thanks a lot,
>
> --
>
> Laurent

Article: 46562
In article <al08lg$1litsf$1@ID-84877.news.dfncis.de>, Falk Brunner
<Falk.Brunner@gmx.de> wrote:
>"Kunal" <kundi_forever@yahoo.com> schrieb im Newsbeitrag
>news:cd714c44.0209020625.5b892675@posting.google.com...
>
>[Code morphing in microprocessors and FPGAs]
>
>I never understood what the hell is the advantage of putting some kind
>of realtime compiler into expensive silly-cone? Wouldn't those
>Transmeta guys be much smarter (and nowadays much richer) if they had
>done a nice optimized RISC or whatever CPU and wrote a "simple" piece
>of software (aka translator, compiler, whatever) to translate x86 code
>to native RISC just before executing it, then load the RISC code into
>RAM and execute it? Anyone can enlighten me?
>--
>MfG
>Falk
----------------------
If they just want to download printable code, why don't they put in a
simple in-silicon ASCII-to-code converter as an adjunct to the command
register, so that if you set it to read ASCII hex it reads it and
executes it - that is nothing but a decoder!! Then people can just
learn the hex numbers as commands!

-Steve
--
-Steve Walz rstevew@armory.com ftp://ftp.armory.com/pub/user/rstevew
Electronics Site!! 1000's of Files and Dirs!! With Schematics Galore!!
http://www.armory.com/~rstevew or http://www.armory.com/~rstevew/Public

Article: 46563
Why does Xilinx not make its own HDL synthesiser? Why does it have to
use a third party? Why has it opted for Forge, for example?

Article: 46564
In article <3d74d981$0$79564$8eec23a@newsreader.tycho.net>, Richard
Steven Walz <rstevew@deeptht.armory.com> wrote:
>In article <al08lg$1litsf$1@ID-84877.news.dfncis.de>,
>Falk Brunner <Falk.Brunner@gmx.de> wrote:
>>"Kunal" <kundi_forever@yahoo.com> schrieb im Newsbeitrag
>>news:cd714c44.0209020625.5b892675@posting.google.com...
>>
>>[Code morphing in microprocessors and FPGAs]
>>
>>I never understood what the hell is the advantage of putting some kind
>>of realtime compiler into expensive silly-cone? Wouldn't those
>>Transmeta guys be much smarter (and nowadays much richer) if they had
>>done a nice optimized RISC or whatever CPU and wrote a "simple" piece
>>of software (aka translator, compiler, whatever) to translate x86 code
>>to native RISC just before executing it, then load the RISC code into
>>RAM and execute it? Anyone can enlighten me?
>>--
>>MfG
>>Falk
>----------------------
>If they just want to download printable code, why don't they put in a
>simple in-silicon ASCII-to-code converter as an adjunct to the command
>register, so that if you set it to read ASCII hex it reads it and
>executes it - that is nothing but a decoder!! Then people can just
>learn the hex numbers as commands!
-----------------------
Nevermind, that's best done in the display anyway, how silly.

-Steve
--
-Steve Walz rstevew@armory.com ftp://ftp.armory.com/pub/user/rstevew
Electronics Site!! 1000's of Files and Dirs!! With Schematics Galore!!
http://www.armory.com/~rstevew or http://www.armory.com/~rstevew/Public

Article: 46565
In article <3D74D5B6.3356282B@mail.com>, John_H <johnhandwork@mail.com>
wrote:
>There are some times when hand-coded MUXCY primitives use the XB or YB
>outputs but the most common use is in comparators. If you have an
>if(a>b) construct, the logic that follows is conditional on the a>b
>result which is often realized with a carry chain, the output going
>through XB or YB. Many designers even use the comparison in simple
>counters suggesting the use is pervasive:
>if( count>28 ) count<=0; else count<=count+1;

OK. Let me clarify. How often is BOTH the XB and X, YB and Y used in
the same logic block?
--
Nicholas C. Weaver nweaver@cs.berkeley.edu

Article: 46566
They are 2 different things. It's hard to convert an algorithm
expressed in C into an efficient VHDL representation. It took a while
before these tools were made, and they have their advantages &
disadvantages. They produce slow, inefficient designs, but they do it
extremely fast, however often you change the algorithm. Extremely
complex algorithms that take a few hours to write can be translated to
ten thousand lines of VHDL in a few minutes.

I don't think Forge or Handel-C are 'screwdrivers to drive in nails'.
CERN for example uses those tools, which cost up to 75,000 USD. There
must be an economic/engineering justification for them in certain
niches. And I think it will just be a matter of time before designing
in Verilog or VHDL will be just as uncommon as programming in assembly.

Frank

"Rene Tschaggelar" <tschaggelar@dplanet.ch> wrote in message
news:3D74938D.3070903@dplanet.ch...
> For nails you use a hammer and for screws you take a screwdriver.
> Likewise are the tools for FPGAs. VHDL/Verilog wasn't invented
> because C/C++ was unknown, it just doesn't fit.

Article: 46567
Jim Granville <jim.granville@designtools.co.nz> wrote in message
news:<3D740E06.3058@designtools.co.nz>...
> Jerry D. Harthcock wrote:
> >
> > "Josh Model" <model@ll.mit.edu> wrote in message
> > news:<wkc89.44$I7.3516@llslave.llan.ll.mit.edu>...
> > > Has anyone come across any 3rd party prototype boards for Actel
> > > FPGA's? It seems as if Actel's stuff starts at ~$1k, and I was
> > > looking for one closer to the $500 range.
> > >
> > > Thanks,
> > > --Josh
> >
> > QuickCores offers a low cost IP Delivery System based on Actel's new
> > ProASIC+. Prices start at $175 for the APA075. It's all
> > self-contained in that no external device programmer is required. It
> > also includes a built-in JTAG boundary scan controller and built-in
> > JTAG real-time debug controller for microcontroller designs. No JTAG
> > pod is required since everything is done via RS-232 using Actel
> > Libero-generated STAPL files.
> >
> > It's all packaged in a 28-pin postage stamp form factor for easy
> > prototyping. It's called the Musketeer (All-for-One Stamp).
> >
> > Visit www.quickcores.com for info.
>
> Interesting lineup.
>
> What if a designer wants to mix some FPGA HW design, with one of
> your soft-cores ? - how is that done ?
>
> Missing from the web, is any speed info on these cores ?
>
> - jg

QuickCores offers the cores in synthesizable Verilog netlist format
under separate license. Hook-up is straightforward: you simply
instantiate at the top level the CPU, memory, I/O, and whatever other
modules you need for your application. We're working on an
object-oriented builder which will allow you to do this automatically.

On the Musketeer, the ProASIC+ is fed with a 24.5 MHz clock (see the
data sheet on the QuickCores web site) from the Musketeer's built-in
"helper" micro. For the Q68HC05 soft core, this equates to 12.25 MIPS
(single-cycle instructions). If implemented in anti-fuse, such as
QuickLogic QuickDSP or Actel for example, it's about 2x that.

Jerry

Article: 46568
Well, the problem is worse. C or C++ is a single-thread, single-process
language: if a, then b, else c. An HDL can describe parallelism,
concurrency, and control in such a way as to provide a more optimal
solution.

Can you structure C code such that it can be converted more efficiently
into gates and logic? Perhaps. But coding style is the one thing one
can not enforce. For example, trying to use a C program for a DSP
application that ran on a popular DSP uC, and retargeting it for an
FPGA, might be a real disappointment (been there, done that). Since
most DSP is developed from simulations using math simulators, it is far
more efficient to convert the math simulations to gates, rather than to
use an inefficient intermediate language that was not even the source
of the algorithm.

The two leading "high level" languages being used for describing logic
(System C, super Verilog) have both been attempts to resolve this issue
and provide a higher level of abstraction. Recently, folks have found
that each is better suited to some tasks at the exclusion of others,
indicating the languages are still at too low a level of abstraction
(they don't solve all problems equally). Reminds me of Fortran and
Cobol... ugly, nasty, hard to deal with... but the best we had at the
time. Each was good for a specific area or problem.

Working in MatLab is more like a high level solution, although it is
also too specific, but far better than writing in C code and expecting
a massively parallel solution to somehow fall out. Other interesting
work, such as http://ptolemy.eecs.berkeley.edu/, leads to a more
interesting paradigm for systems design. A graphical GUI "language" is
perhaps the most efficient of all. If the simulation works, you can
press a button, compile it (bitgen it), and ship it. Wouldn't that be
heavenly?

Austin

Frank Andreas de Groot wrote:
> They are 2 different things.
> It's hard to convert an algorithm expressed in C into an efficient
> VHDL representation.
> It took a while before these tools were made, and they have their
> advantages & disadvantages.
> They produce slow, inefficient designs, but they do it extremely fast,
> however often you change the algorithm.
> Extremely complex algorithms that take a few hours to write can be
> translated to ten thousand lines of VHDL in a few minutes.
> I don't think Forge or Handel-C are 'screwdrivers to drive in nails'.
> CERN for example uses those tools, which cost up to 75,000 USD.
> There must be an economic/engineering justification for them in
> certain niches.
> And I think it will just be a matter of time before designing in
> Verilog or VHDL will be just as uncommon as programming in assembly.
>
> Frank
>
> "Rene Tschaggelar" <tschaggelar@dplanet.ch> wrote in message
> news:3D74938D.3070903@dplanet.ch...
> > For nails you use a hammer and for screws you take a screwdriver.
> > Likewise are the tools for FPGAs. VHDL/Verilog wasn't invented
> > because C/C++ was unknown, it just doesn't fit.
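To make the concurrency point concrete, the two always blocks below are
independent hardware, both updating on every clock edge, with no
C-style sequential ordering between them (an illustrative Verilog
fragment only; the module and signal names are invented):

    module parallel_macs (
        input  wire        clk,
        input  wire [7:0]  a, b, c, d,
        output reg  [15:0] acc0, acc1
    );
        // two multiply-accumulates that run concurrently each cycle -
        // in C these would be two statements executed one after another
        always @(posedge clk) acc0 <= acc0 + a * b;
        always @(posedge clk) acc1 <= acc1 + c * d;
    endmodule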
"how often" I don't think anyone can say. The code I put up in the thread in this newsgroup "synthesizing hard coded numbers" (started Monday) includes a carry-out and a value in the same logic block. The applicable parts repeated here, reg [ 3:0] Index; wire [ 4:0] next_index; always @(posedge clock) if( next_index[4] ) Index <= 4'd11; // if next index is -1, reload else Index <= next_index; assign next_index = Index - 1; The next_index[4] bit should be the output of a MUXCY (YB) at the same point Index[3] is found (YQ). If the next_index were used live elsewhere rather than the registered version, all three outputs - Y, YQ, YB - would be used. If it can happen, it usually does. "Nicholas C. Weaver" wrote: > In article <3D74D5B6.3356282B@mail.com>, John_H <johnhandwork@mail.com> wrote: > >There are some times when hand-coded MUXCY primitives use the XB or YB outputs > >but the most common use is in comparators. If you have an if(a>b) construct, > >the logic that follows is conditional on the a>b result which is often > >realized with a carry chain, the output going through XB or YB. Many > >designers even use the comparison in simple counters suggesting the use is > >pervasive: if( count>28 ) count<=0; else count<=count+1; > > OK. Let me clarify. How often is BOTH the XB and X, YB and Y used in > the same logic block? > -- > Nicholas C. Weaver nweaver@cs.berkeley.eduArticle: 46570
Hi Kunal,

Transmeta's "concept" is not new, though they appear to want people to
believe differently. This "concept" has been around for well over 20
years, at least. The UCSD P-system was of similar concept; it just did
this with a software translation layer, translating the P (pseudo) code
into the machine instructions for the particular machine it was running
on. Microprogramming is pretty much the same thing as well, which, as
it happens, is implemented in hardware.

Is what they have done of any real use? I really don't think so... They
made wild claims about power savings that simply don't exist, as the
CPU is not really a large chunk of the overall power budget of notebook
computers. The same advances in power savings that occur in the
storage/display etc. devices are applicable to ANY CPU, not just the
Transmeta. I've always found their claims a bit dubious.

Austin

"Kunal" <kundi_forever@yahoo.com> wrote in message
news:cd714c44.0209020625.5b892675@posting.google.com...
> I read about Transmeta's Crusoe chip some time back, which has
> something called the Code Morphing Software. This code morphing s/w
> actually reads hex from its code memory, and at run-time translates
> the hex code into equivalent native machine language instructions. So
> the whole system itself is like a Java Virtual Machine (or a run-time
> cross-assembler), only there is no partitioning between the H/W and
> the system S/W.
>
> The whole thing is an overhead, of course, but it's highly optimized
> and parallelized in hardware wherever possible. Last I read, they had
> code morphing software for 8086 instructions, i.e. the Code Morphing
> Software could only "understand" 86 hex. This system also allows you
> to run programs compiled for different processors at the same time,
> i.e. it decides at run-time which instruction set is supported.
>
> Ok here's an idea... how about code-morphing HARDWARE?
>
> A pretty challenging VLSI project actually, possible too. Here's how
> I think it may work:
> This Code Morphing (CM) chip would be placed on the bus in between
> the target uC and the code memory (ROM, flash whatever). It would
> route the addresses generated by the uC to the code memory, and
> translate the returned contents into hex code of the target uC, and
> send the translated version back to the uC. This is pretty much what
> the JVM does, but this virtual machine is a HARDWARE virtual machine,
> i.e. the mapping between various instruction sets is HARD-WIRED.
>
> Ok, maybe we could make it more generic, and endow the CM chip with
> large register sets and/or memory areas, which can be dynamically
> loaded with the target and foreign instruction sets and the mapping
> between them. In fact maybe later on we could add a number of
> code-mappings onto a single device. Since all the translation happens
> in hardware, there can be virtually no overheads (I think!). It will
> be especially easy when dealing with similar instruction sets, like
> CISC-to-CISC and RISC-to-RISC. Even if it is CISC-to-RISC, the
> performance will not be truly affected, because it will simply
> replace the CISC instruction with the equivalent RISC instructions,
> and may actually end up saving code memory. Since we have software
> cross-assemblers, it is conceivable that they can be implemented in
> hardware.
>
> Of course, there are a LOT of issues here, and operation may be
> slowed down slightly, but it IS possible. The biggest problem would
> be mapping between specific registers, but we can leave that to the
> application programmer or the source assembler / compiler.
>
> The applications of such a device would be very interesting indeed. A
> code-morph for Java bytecode is only the beginning... Backward
> compatibility will not be an issue anymore. This, I understand, is
> keeping them from using all the features on the latest Intel chips.
> We can load protocol-translation mappings too, transparently
> converting from, say, RS-232 to I2C (we already have hardware TCP/IP
> stacks). We could port the hex code itself to other processors,
> instead of re-writing the source code and re-compiling. Programmers
> can include useful language features from other instruction sets
> without having to worry about implementing them in the target
> processor code.
>
> Ok that's enough speculation for now, but could anyone well-versed in
> VLSI design tell me how feasible this is? I don't think it will be
> very difficult to implement, but the design of such a chip would be
> very challenging. Also I need to know from experienced embedded
> systems designers how truly useful such a device would be, and would
> all the effort of developing it pay off, in terms of financial
> returns and intellectual property rights.
>
> kundi

Article: 46571
"Austin Lesea" <austin.lesea@xilinx.com> wrote in message news:3D74F026.77355BAC@xilinx.com... > > c, or c++ is a single thread, single process language. The tools we are talking about extend a subset of C to include keywords for paralellism, and JAVA (Forge) has built-in mechanisms to work with threads. > For example, to try to use a c program for a DSP application that ran on a > popular DSP uC, and retargeting it for an FPGA might be a real dissapointment > (been there, done that). I don't doubt that. But some niche markets benefit greatly from a C/JAVA to HDL converter. I want to make a PCI addin card with a FPGA-based coprocessor for a massively parallel problem. As long as it approaches the speed of an equivalent implementation on an ordinary CPU, it is comercially justified. To replace a motherboard with a dual-Pentium for example would be more expensive for the customer, not to mention that most customers would not be able/willing to do that for the sake of my product. I think that for most purposes, a HDL will remain into the far future the method of choice to design ASIC's or FPGA's, but there is an emerging market that has much less strict demands for speed of execution as opposed to speed of implementation. It may be that in the future, there will be very clever optimizers for VHDL that can turn the stuff that comes out of a C --> VHDL converter into something efficient. Who knows what software improvements will bring us? A large library of VHDL used by such a converter, advanced optimization techniques etc. And there will be many directives that can be used to 'hint' the converter on what kind of hardware should be generated, constraints that can optionally be specified etc. Just because it's extremely hard to make such a tool doesn't mean that it won't be done... FrankArticle: 46572
> Also, Amdahl's law keeps Transmeta down, only about 1/3 of the
> notebook power budget is to the processor.

Having done a few power budget analyses for notebooks and other
portable devices, the power budget allowance for the processor is quite
a bit less, more like 8%-10% of the OVERALL power budget, and it's
typically significantly less in fact, depending on the "typical" use
ascribed to the notebook. You also have to add in the power
requirements for the "code morphing" memory too, AS part of the
processor power budget, as it's only required FOR the Transmeta CPU.

Austin

Article: 46573
Take a look at patent number 5,684,980

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-adv.htm&r=5&p=1&f=G&l=50&d=ft00&S1=(casselman.INZZ.+AND+virtual.ASNM.)&OS=in/casselman+AND+an/virtual&RS=(IN/casselman+AND+AN/virtual)

I call it the runtime hardware generation patent...

Steve Casselman

> Ok here's an idea... how about code-morphing HARDWARE?

Article: 46574
In fact, just this morning I added a StateCAD tutorial to the page. You can find it from the front page at http://tutor.al-williams.com