"Nicholas C. Weaver" <nweaver@ribbit.CS.Berkeley.EDU> wrote in message news:al0fks$in9$2@agate.berkeley.edu... > In article <al0f65$evv$1@vkhdsu24.hda.hydro.com>, > Terje Mathisen <terje.mathisen@hda.hydro.com> wrote: > >The pure sw emulation approach was tried with DECs FX-32 (sp?), it > >worked OK for applications but could never handle an OS. > > Well, it handled MOST of the OS, IIRC. I seem to recall the portable > NT relying very heavily on a 486 emulator. No, FX!32 was for user-mode code only. What is "the portable NT"? The Alpha port certainly had no reliance at all on a 486 emulator. PJDM -- Peter Mayne IBM GSA (but the opinions are mine) Canberra, ACT, Australia May contain traces of nuts.Article: 46551
For nails you use a hammer, and for screws you take a screwdriver.
Likewise are the tools for FPGAs. VHDL/Verilog wasn't invented because
C/C++ was unknown; C/C++ just doesn't fit.

Rene
--
Ing.Buero R.Tschaggelar - http://www.ibrtses.com
& commercial newsgroups - http://www.talkto.net

Frank Andreas de Groot wrote:
> Hi,
>
> Has anyone experience with C++ to Verilog/VHDL convertors?
> I thought that as a C++ programmer and a total FPGA newbie with just a
> minimum of digital design experience, this avenue would be very
> interesting, especially because I seek to implement a medium-complex
> software algorithm into an FPGA, and this algorithm will be subject to
> gradual refinement.

Article: 46552
> IMHO the biggest problem is the PC (program counter) mapping.
> ....
> Another thought:
> Using variable length replacements is very difficult, because you have
> to analyse the whole program and make a PC translation table. I
> wouldn't do that.

I think you can avoid that. When you can translate opcodes at run-time,
you can also translate the PC. See, the foreign language program would
have been compiled/assembled with the foreign processor in mind, so all
jumps would be relative. It would be similar to the Logical Address/
Physical Address scenario in modern memory management schemes. The CPU
generates the address for the next instruction; the HW Code Morpher can
simply add the byte length of the previous instruction to get the
address of the next instruction.

Which brings to mind another possible advantage... all JUMP statements
can be directly processed within the CM chip without having to reach
the uC/uP. Imagine the H/W VM receives an instruction saying "JUMP
0x001000": now it won't have to translate the instruction, it can
directly generate the address 0x001000 to the code memory and begin
translating and routing the instructions received from the new "virtual
PC" address. This will not work well with conditional jumps of course,
but it can save some of the routing overhead for unconditional jump
instructions. But when an instruction saying JUMP 0x0100 is passed to
the uC, the VM can be guaranteed that the next address generated by the
uC *will* be 0x0100. We are basically abstracting the uC from the
actual (physical) addresses within the program ROM. Conversely, there
can be a reverse mapping for special addresses and interrupt vectors...
i.e. if the CPU-generated address is a known interrupt vector, the VM
will pass the foreign-code equivalent of that vector to the program ROM.

This is why I think the register translation would be most difficult.
If the target processor simply doesn't have enough on-chip RAM to
simulate all that used in the foreign code, it will be a huge problem.
But then I think it shouldn't be a problem for modern processors. All
of them have plenty of RAM! :D

> The straightforward approach is to simulate every virtual instruction
> with the same number of real instructions. If the processor you are
> emulating has multiplication built in and your micro hasn't, that can
> be quite a lot. You would do a jump to skip the unused part in simpler
> instructions.
> - This is easy to implement, only needs an EPROM as lookup-table
> - This has overhead of a jump for almost all instructions
> - This wastes much of the available address space
>
> Second approach:
> Use a call for each instruction
> - almost as simple to implement
> - greater overhead (call/ret) per instruction
> - maximum usage of address space
> - simple instructions may not need a call
>
> Third approach:
> Mix both. Do all commands that are quite short (perhaps 4 words)
> directly. Do a call for the rest.
> - still doable with a few standard logic parts (74xxx)
> ....

What you are discussing seems to be a dedicated processor that
implements all this. What I was considering was having it ALL
implemented in VLSI... i.e. the only code that would run would be
microcode, at most. So the translation should occur almost within the
space of time it normally takes for the uC to retrieve code without a
VM in between... the translation may be triggered by the Read/Write
signals on the bus. The foreign language code is translated to native
code outside the target processor.
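A rough Verilog sketch of the EPROM-lookup scheme discussed above (a
hypothetical illustration only: the opcode widths, the {length, native
opcode} ROM layout, and the "morph_table.hex" mapping file are all
invented for the example, not taken from any real design):

    // Foreign opcode indexes a translation ROM that yields a native
    // opcode plus the foreign instruction's byte length; the length
    // is accumulated into a "virtual PC" in the foreign address space.
    module morph_lut (
        input  wire        clk,
        input  wire        rst,
        input  wire [7:0]  foreign_op,  // opcode fetched from foreign ROM
        output reg  [15:0] native_op,   // translated native instruction
        output reg  [15:0] virtual_pc   // next fetch address, foreign space
    );
        reg [17:0] rom [0:255];         // {len[1:0], native_op[15:0]}
        reg [17:0] entry;
        initial $readmemh("morph_table.hex", rom); // hypothetical mapping

        always @(posedge clk)
            if (rst) begin
                native_op  <= 16'h0000;
                virtual_pc <= 16'h0000;
            end else begin
                entry = rom[foreign_op];  // blocking read of the ROM word
                native_op  <= entry[15:0];
                // step the virtual PC by the foreign instruction length
                virtual_pc <= virtual_pc + entry[17:16];
            end
    endmodule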
Actually, having a dedicated processor sitting in between the target
processor and the foreign language code memory would be the most
flexible, and probably practical, way to achieve this. It would allow
the loading of multiple translations and mappings.

> The hardware VM could be faster if most instructions can be simulated
> by one or two instructions. But remember the wait states you have to
> add for the translation logic.

I think the optimum design would be a sort of highly-specialized,
asynchronous microcontroller... one where the address and instruction
mapping is instantaneous through digital logic, yet there might be some
program code that can be triggered and executed during certain
contingencies where simple mapping is not enough. This way translation
latencies can be minimized to the point where they are infrequent at
worst.

> I think you are better off with a software VM:
> - you can use a uC without external program memory (more choices)
> - you can run all code from internal memory (more speed)
> - you can use a chip with Harvard architecture (more speed)

Well, an external H/W VM doesn't make good sense if all code memory
resides inside the uC itself. I mean, you are basically getting it
translated from an external source, and that is just extra overhead. My
concept applies to uC/uPs where the code memory lies in a separate ROM
chip... so the program can be translated en route from the memory chip
to the processor on the fly.

> At last: Would it be cheaper to buy the hardware VM or to buy a faster
> processor?

That would depend on the complexity and development costs of the
hardware VM. How complex could it be? Most of the on-chip space would
be dedicated to a huge bunch of registers to contain the target code,
the foreign code(s) and the code mapping(s) - unless it is all
hardwired for higher speed and zero flexibility. The mapping logic can
be quite easily done with simple digital components. The contingencies
I spoke of will require some more thought... maybe it can all be done
on a SOC. If we talk about that, we can come up with a cost estimate.
Then we can think of its possible applications, come up with some idea
of demand, and might be able to decide if it's worthwhile to market.

> Still it would be a great fun project.

Ya, I think so too ;D If I had any familiarity with VHDL etc., I'd try
to simulate something for two disparate RISC instruction sets. What I
need to know is more possible applications of this, and some sort of
feasibility analysis. Any ideas?

kundi

Article: 46553
Falk Brunner wrote:
> I never understood what the hell is the advantage of putting some kind
> of realtime compiler into expensive silly-cone? Wouldn't those
> Transmeta guys be much smarter (and nowadays much richer) if they had
> done a nice optimized RISC or whatever CPU and wrote a "simple" piece
> of software (aka translator, compiler, whatever) to translate x86 code
> to native RISC just before executing it, then load the RISC code into
> RAM and execute it? Anyone can enlighten me?

That's what they do. I think that's a tradeoff issue. Crusoe has a
software translator, which is slow (a number of cycles per translation),
does a lot of optimization work, and caches a lot in RAM. Pentium 4 has
a hardware translator, which is somewhat slow (1 cycle per translation),
does no optimization work, and caches an L1's worth in the trace cache.
Athlon, Hammer, and Pentium 3 have a hardware translator, which is fast
(3 translations per cycle), doesn't optimize (rumors about the Hammer
optimizing a little haven't been confirmed), and doesn't cache anything.

I think the Pentium 4 approach should have been taken a bit further,
towards more optimization. The P4 contains an estimation-based scheduler
that queues dependent instructions into its very fast in-order execution
engine, and restarts (or reschedules) these dependent instructions if
the calculation was wrong (e.g. due to a cache miss for a memory
argument), making it effectively work like an OoOE engine, though it
isn't. This stuff could have gone to the trace cache in this optimized
form, removing a number of pipeline stages from the design.

The mistake of TM is that as long as you do simple benchmarks,
everything goes well, because you have only a few core routines that get
executed very often, and therefore the translation overhead is small.
When you run real-world applications, things change. When you run
benchmarks like BapCo (which execute one piece of code after the other),
performance goes down even further. That's also a problem for the
Pentium 4, but not nearly as much.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Article: 46554
Duy K Do wrote:
>
> Do you get offended if someone labels you as an IT consultant?

It's the word 'consultant' I steer clear of. As someone said, a
'consultant' is someone who borrows your watch to tell you the time. As
a contracting engineer I prefer to think I get paid for some work.

Nial.

Article: 46555
Hi

The issue is not that simple. I suggest that you take a look at some of
our papers at http://www.es.isy.liu.se/publications/index.html

Best regards
Lasse

In article <b0ab35d4.0207280653.5ebb19b8@posting.google.com>, hristo
<hristostev@yahoo.com> wrote:
> hello,
> may be a basic question
> if someone has to implement an FIR using bit serial, he has to see the
> output wordlength, thus the FIR bit growth. Then, he needs to expand
> the input data with zeros to have a regular wordlength through the
> structure
>
> in parallel we do not have to do that
>
> what about digit serial, should we still need to expand the input data
> with zero digits
>
> Many thanks

Article: 46556
In article <cd714c44.0209020625.5b892675@posting.google.com>, Kunal
<kundi_forever@yahoo.com> wrote:
> I read about Transmeta's Crusoe chip some time back, which has
> something called the Code Morphing Software.

Basically, the code morphing software is a combination interpreter and
JIT recompiler from the IA32 instruction set to the high-performance
hardware-specific instruction set the processor itself uses. It's very
much like the front-end recompilation hardware in just about every high
performance IA32 chip since, oh, maybe as far back as the 486 with its
much-touted "RISC core", except by doing it in software they can build
a chip that's *almost* as fast as the hardware equivalent with a lot
less silicon (and hence power consumption).

> Ok here's an idea... how about code-morphing HARDWARE?

Congratulations, you just re-invented the last ten years of IA32
processor design. :)

--
I've seen things you people can't imagine. Chimneysweeps on fire over
the roofs of London. I've watched kite-strings glitter in the sun at
Hyde Park Gate. All these things will be lost in time, like
chalk-paintings in the rain.
`-_-' Time for your nap. | Peter da Silva | Har du kramat din varg,
'U`  idag?

Article: 46557
In article <al08lg$1litsf$1@id-84877.news.dfncis.de>, Falk Brunner
<Falk.Brunner@gmx.de> wrote:
> I never understood what the hell is the advantage of putting some kind
> of realtime compiler into expensive silly-cone? Wouldn't those
> Transmeta guys be much smarter (and nowadays much richer) if they had
> done a nice optimized RISC or whatever CPU and wrote a "simple" piece
> of software (aka translator, compiler, whatever) to translate x86 code
> to native RISC just before executing it, then load the RISC code into
> RAM and execute it?

What a brilliant idea. You could call it Code Morphing.

--
I've seen things you people can't imagine. Chimneysweeps on fire over
the roofs of London. I've watched kite-strings glitter in the sun at
Hyde Park Gate. All these things will be lost in time, like
chalk-paintings in the rain.
`-_-' Time for your nap. | Peter da Silva | Har du kramat din varg,
'U`  idag?

Article: 46558
"BROTO Laurent" <lbroto@free.fr> schrieb im Newsbeitrag news:3d746c4f$0$573$626a54ce@news.free.fr... > Hi all ! > > I would like to know what's utility of these differents buffers and > differences between them. > I've undertsood this: > - when I want to input CLK signal in my fpga, I must to use IBUFG, but why > ? Because IBUFGs are special clock input buffers, that have a very short and predictable (= short delay) connection to a DLL. This is essential for clock management at high frequencies. > - when I want to map an output of a DLL on other process or on other DLL, I > must to use BUFG. But why ? Similar to IBUFG. BUFG are global clock buffers, that (in 99.9% of all cases) feed the clock inputs of the FlipFlops and RAMs on your design, which can be some hundred to some then thousand. Again, to do clock management at high frequencies, you need predictical timing, which can only be achieved by using BUFGs (Iam not talking about hacking and manual routing stuff) -- MfG FalkArticle: 46559
Article: 46559

There are some times when hand-coded MUXCY primitives use the XB or YB
outputs, but the most common use is in comparators. If you have an
if(a>b) construct, the logic that follows is conditional on the a>b
result, which is often realized with a carry chain, the output going
through XB or YB. Many designers even use the comparison in simple
counters, suggesting the use is pervasive:
if( count>28 ) count<=0; else count<=count+1;

If you've looked into the routing-level implementation of the Xilinx
devices (the options available in FPGA_EDITOR) you'll see that the XB
and YB outputs have an initial routing path that takes them along any
of 8 lines out of the CLB, the same paths available to the X, Y, XQ,
and YQ outputs. I recall that these signals don't have all 8 outputs
available for both outputs (XB/YB) for both slices, but have most of
that first layer routing available (6 out of 8).

The FPGA Editor is the best way for me to figure out where to place
critical elements for best routing utilization - it should help you
figure out some details as well.

"Nicholas C. Weaver" wrote:
> For my research purposes, I'm considering what effects a corner
> turning interconnect has, by comparing apples to apples with Virtex
> family interconnect (long story).
>
> One disadvantage is that less frequently used inputs and outputs cost
> relatively more (since inputs and outputs connect to ALL
> possibilities, it's a different logical depopulation). So I want to,
> in my comparisons, remove a couple of outputs for modeling my logic
> block.
>
> So the question is, how often and WHY are the carry chains driven to
> the XB and YB outputs. According to the slice internals, they are
> only capable of being driven by the carry chain (XB) or carry chain
> or route-through (YB).
>
> What logic tends to use these outputs beyond the top carry out?
> --
> Nicholas C. Weaver nweaver@cs.berkeley.edu
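For reference, the counter pattern mentioned above as a complete
synthesizable Verilog module (a trivial sketch; exactly how the compare
maps onto the MUXCYs and the XB/YB pins is up to the synthesis tool):

    module wrap_counter (
        input  wire       clk,
        output reg  [4:0] count
    );
        // the >28 compare is typically built on the carry chain, and
        // its result leaves the last slice on an XB/YB output
        always @(posedge clk)
            if (count > 5'd28) count <= 5'd0;
            else               count <= count + 5'd1;
    endmodule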
Article: 46560

Haha, you have too many "whys". To my understanding, the clk net is
critical, so it has special hardware dedicated to it such as IBUFG,
BUFG, and the "golden" wires in the FPGA (hehe, I ain't sure they're
golden or not). Anyway, it's kind of expensive stuff and we can't
afford to use it everywhere in the FPGA. Maybe someone else has a
better answer than mine.

Regards.

Article: 46561
Hi Laurent,

An IBUFG is very similar to an IBUF, but it's located in a global clock
input IOB. These are the IOBs that are well located and connected to
your GCLK pins externally. The IBUFG is not meant to buffer a signal
for high fanout, however; it's there for connecting to either the DLL
or directly to the BUFG.

The BUFG is not an I/O buffer like the IBUF and IBUFG; it's located
outside the IOB on its own. It IS meant to buffer high fanout clock
signals and keep a clock signal on the global clock routing.

Have a look in FPGA Editor at the design that you've alluded to here
(any design with some clocks that are properly buffered); it might help
to explain the locations and usefulness of the various buffers.

Cheers,
Ryan

BROTO Laurent wrote:
> Hi all !
>
> I would like to know what's the utility of these different buffers and
> the differences between them.
>
> I've understood this:
> - when I want to input a CLK signal into my FPGA, I must use IBUFG,
> but why?
> - when I want to map an output of a DLL to another process or another
> DLL, I must use BUFG. But why?
>
> Thanks a lot,
>
> --
>
> Laurent

Article: 46562
In article <al08lg$1litsf$1@ID-84877.news.dfncis.de>, Falk Brunner
<Falk.Brunner@gmx.de> wrote:
>"Kunal" <kundi_forever@yahoo.com> schrieb im Newsbeitrag
>news:cd714c44.0209020625.5b892675@posting.google.com...
>
>[Code morphing in microprocessors and FPGAs]
>
>I never understood what the hell is the advantage of putting some kind
>of realtime compiler into expensive silly-cone? Wouldn't those
>Transmeta guys be much smarter (and nowadays much richer) if they had
>done a nice optimized RISC or whatever CPU and wrote a "simple" piece
>of software (aka translator, compiler, whatever) to translate x86 code
>to native RISC just before executing it, then load the RISC code into
>RAM and execute it? Anyone can enlighten me?
>--
>MfG
>Falk
----------------------
If they just want to download printable code, why don't they put in a
simple in-silicon ASCII-to-code converter as an adjunct to the command
register, so that if you set it to read ASCII hex it reads it and
executes it - that is nothing but a decoder!! Then people can just
learn the hex numbers as commands!

-Steve
--
-Steve Walz rstevew@armory.com ftp://ftp.armory.com/pub/user/rstevew
Electronics Site!! 1000's of Files and Dirs!! With Schematics Galore!!
http://www.armory.com/~rstevew or http://www.armory.com/~rstevew/Public

Article: 46563
Why does Xilinx not make its own HDL synthesiser? Why does it have to
use a third party? Why has it opted for Forge, for example?

Article: 46564
In article <3d74d981$0$79564$8eec23a@newsreader.tycho.net>, Richard
Steven Walz <rstevew@deeptht.armory.com> wrote:
>In article <al08lg$1litsf$1@ID-84877.news.dfncis.de>,
>Falk Brunner <Falk.Brunner@gmx.de> wrote:
>>"Kunal" <kundi_forever@yahoo.com> schrieb im Newsbeitrag
>>news:cd714c44.0209020625.5b892675@posting.google.com...
>>
>>[Code morphing in microprocessors and FPGAs]
>>
>>I never understood what the hell is the advantage of putting some kind
>>of realtime compiler into expensive silly-cone? Wouldn't those
>>Transmeta guys be much smarter (and nowadays much richer) if they had
>>done a nice optimized RISC or whatever CPU and wrote a "simple" piece
>>of software (aka translator, compiler, whatever) to translate x86 code
>>to native RISC just before executing it, then load the RISC code into
>>RAM and execute it? Anyone can enlighten me?
>>--
>>MfG
>>Falk
>----------------------
>If they just want to download printable code, why don't they put in a
>simple in-silicon ASCII-to-code converter as an adjunct to the command
>register, so that if you set it to read ASCII hex it reads it and
>executes it - that is nothing but a decoder!! Then people can just
>learn the hex numbers as commands!
-----------------------
Nevermind, that's best done in the display anyway, how silly.

-Steve
--
-Steve Walz rstevew@armory.com ftp://ftp.armory.com/pub/user/rstevew
Electronics Site!! 1000's of Files and Dirs!! With Schematics Galore!!
http://www.armory.com/~rstevew or http://www.armory.com/~rstevew/Public

Article: 46565
In article <3D74D5B6.3356282B@mail.com>, John_H <johnhandwork@mail.com>
wrote:
>There are some times when hand-coded MUXCY primitives use the XB or YB
>outputs but the most common use is in comparators. If you have an
>if(a>b) construct, the logic that follows is conditional on the a>b
>result which is often realized with a carry chain, the output going
>through XB or YB. Many designers even use the comparison in simple
>counters suggesting the use is pervasive:
>if( count>28 ) count<=0; else count<=count+1;

OK. Let me clarify. How often is BOTH the XB and X, YB and Y used in
the same logic block?
--
Nicholas C. Weaver nweaver@cs.berkeley.edu

Article: 46566
They are 2 different things. It's hard to convert an algorithm
expressed in C into an efficient VHDL representation. It took a while
before these tools were made, and they have their advantages &
disadvantages. They produce slow, inefficient designs, but they do it
extremely fast, however often you change the algorithm. Extremely
complex algorithms that take a few hours to write can be translated to
ten thousand lines of VHDL in a few minutes.

I don't think Forge or Handel-C are 'screwdrivers to drive in nails'.
CERN for example uses those tools, which cost up to 75,000 USD. There
must be an economic/engineering justification for them in certain
niches. And I think it will just be a matter of time before designing
in Verilog or VHDL will be just as uncommon as programming in assembly.

Frank

"Rene Tschaggelar" <tschaggelar@dplanet.ch> wrote in message
news:3D74938D.3070903@dplanet.ch...
> For nails you use a hammer and for screws you take a screwdriver.
> Likewise are the tools for FPGAs. VHDL/Verilog wasn't invented
> because C/C++ was unknown, it just doesn't fit.

Article: 46567
Jim Granville <jim.granville@designtools.co.nz> wrote in message
news:<3D740E06.3058@designtools.co.nz>...
> Jerry D. Harthcock wrote:
> >
> > "Josh Model" <model@ll.mit.edu> wrote in message
> > news:<wkc89.44$I7.3516@llslave.llan.ll.mit.edu>...
> > > Has anyone come across any 3rd party prototype boards for Actel
> > > FPGA's? It seems as if Actel's stuff starts at ~$1k, and I was
> > > looking for one closer to the $500 range.
> > >
> > > Thanks,
> > > --Josh
> >
> > QuickCores offers a low cost IP Delivery System based on Actel's new
> > ProASIC+. Prices start at $175 for the APA075. It's all
> > self-contained in that no external device programmer is required. It
> > also includes a built-in JTAG boundary scan controller and built-in
> > JTAG real-time debug controller for microcontroller designs. No JTAG
> > pod is required since everything is done via RS-232 using Actel
> > Libero-generated STAPL files.
> >
> > It's all packaged in a 28-pin postage stamp form factor for easy
> > prototyping. It's called the Musketeer (All-for-One Stamp).
> >
> > Visit www.quickcores.com for info.
>
> Interesting lineup.
>
> What if a designer wants to mix some FPGA HW design, with one of
> your soft-cores ? - how is that done ?
>
> Missing from the web, is any speed info on these cores ?
>
> - jg

QuickCores offers the cores in synthesizable Verilog netlist format
under separate license. Hook-up is straightforward: you simply
instantiate at the top level the CPU, memory, I/O, and whatever other
modules you need for your application. We're working on an
object-oriented builder which will allow you to do this automatically.

On the Musketeer, the ProASIC+ is fed with a 24.5 MHz clock (see the
data sheet on the QuickCores web site) from the Musketeer's built-in
"helper" micro. For the Q68HC05 soft core, this equates to 12.25 MIPS
(single-cycle instructions). If implemented in anti-fuse, such as
QuickLogic QuickDSP or Actel for example, it's about 2x that.

Jerry

Article: 46568
Well, the problem is worse. C or C++ is a single-thread, single-process
language: if a, then b, else c. An HDL can describe parallelism,
concurrency, and control in such a way as to provide a more optimal
solution.

Can you structure C code such that it can be converted more efficiently
into gates and logic? Perhaps. But coding style is the one thing one
can not enforce. For example, trying to use a C program for a DSP
application that ran on a popular DSP uC, and retargeting it for an
FPGA, might be a real disappointment (been there, done that). Since
most DSP is developed from simulations using math simulators, it is far
more efficient to convert the math simulations to gates, rather than to
use an inefficient intermediate language that was not even the source
of the algorithm.

The two leading "high level" languages being used for describing logic
(System C, super Verilog) have both been attempts to resolve this issue
and provide a higher level of abstraction. Recently, folks have found
that each is better suited to some tasks at the exclusion of others,
indicating the languages are still at too low a level of abstraction
(they don't solve all problems equally). Reminds me of Fortran and
Cobol... ugly, nasty, hard to deal with... but the best we had at the
time. Each was good for a specific area or problem.

Working in MatLab is more like a high level solution, although it is
also too specific, but far better than writing in C code and expecting
a massively parallel solution to somehow fall out. Other interesting
work, such as http://ptolemy.eecs.berkeley.edu/, leads to a more
interesting paradigm for systems design. A graphical GUI "language" is
perhaps the most efficient of all. If the simulation works, you can
press a button, compile it (bitgen it), and ship it. Wouldn't that be
heavenly?

Austin

Frank Andreas de Groot wrote:
> They are 2 different things.
> It's hard to convert an algorithm expressed in C into an efficient
> VHDL representation.
> It took a while before these tools were made, and they have their
> advantages & disadvantages.
> They produce slow, inefficient designs, but they do it extremely fast,
> however often you change the algorithm.
> Extremely complex algorithms that take a few hours to write can be
> translated to ten thousand lines of VHDL in a few minutes.
> I don't think Forge or Handel-C are 'screwdrivers to drive in nails'.
> CERN for example uses those tools, which cost up to 75,000 USD.
> There must be an economic/engineering justification for them in
> certain niches.
> And I think it will just be a matter of time before designing in
> Verilog or VHDL will be just as uncommon as programming in assembly.
>
> Frank
>
> "Rene Tschaggelar" <tschaggelar@dplanet.ch> wrote in message
> news:3D74938D.3070903@dplanet.ch...
> > For nails you use a hammer and for screws you take a screwdriver.
> > Likewise are the tools for FPGAs. VHDL/Verilog wasn't invented
> > because C/C++ was unknown, it just doesn't fit.
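To make the concurrency point concrete, the two always blocks below are
independent hardware, both updating on every clock edge, with no
C-style sequential ordering between them (an illustrative Verilog
fragment only; the module and signal names are invented):

    module parallel_macs (
        input  wire        clk,
        input  wire [7:0]  a, b, c, d,
        output reg  [15:0] acc0, acc1
    );
        // two multiply-accumulates that run concurrently each cycle -
        // in C these would be two statements executed one after another
        always @(posedge clk) acc0 <= acc0 + a * b;
        always @(posedge clk) acc1 <= acc1 + c * d;
    endmodule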
"how often" I don't think anyone can say. The code I put up in the thread in this newsgroup "synthesizing hard coded numbers" (started Monday) includes a carry-out and a value in the same logic block. The applicable parts repeated here, reg [ 3:0] Index; wire [ 4:0] next_index; always @(posedge clock) if( next_index[4] ) Index <= 4'd11; // if next index is -1, reload else Index <= next_index; assign next_index = Index - 1; The next_index[4] bit should be the output of a MUXCY (YB) at the same point Index[3] is found (YQ). If the next_index were used live elsewhere rather than the registered version, all three outputs - Y, YQ, YB - would be used. If it can happen, it usually does. "Nicholas C. Weaver" wrote: > In article <3D74D5B6.3356282B@mail.com>, John_H <johnhandwork@mail.com> wrote: > >There are some times when hand-coded MUXCY primitives use the XB or YB outputs > >but the most common use is in comparators. If you have an if(a>b) construct, > >the logic that follows is conditional on the a>b result which is often > >realized with a carry chain, the output going through XB or YB. Many > >designers even use the comparison in simple counters suggesting the use is > >pervasive: if( count>28 ) count<=0; else count<=count+1; > > OK. Let me clarify. How often is BOTH the XB and X, YB and Y used in > the same logic block? > -- > Nicholas C. Weaver nweaver@cs.berkeley.eduArticle: 46570
Hi Kunal,

Transmeta's "concept" is not new, though they appear to want people to
believe differently. This "concept" has been around for well over 20
years, at least. The UCSD P-system was of similar concept; it just did
this with a software translation layer, translating the P (pseudo) code
into the machine instructions for the particular machine it was running
on. Microprogramming is pretty much the same thing as well, which, as
it happens, is implemented in hardware.

Is what they have done of any real use? I really don't think so... They
made wild claims about power savings that simply don't exist, as the
CPU is not really a large chunk of the overall power budget of notebook
computers. The same advances in power savings that occur in the
storage/display etc. devices are applicable to ANY CPU, not just the
Transmeta. I've always found their claims a bit dubious.

Austin

"Kunal" <kundi_forever@yahoo.com> wrote in message
news:cd714c44.0209020625.5b892675@posting.google.com...
> I read about Transmeta's Crusoe chip some time back, which has
> something called the Code Morphing Software. This code morphing s/w
> actually reads hex from its code memory, and at run-time translates
> the hex code into equivalent native machine language instructions. So
> the whole system itself is like a Java Virtual Machine (or a run-time
> cross-assembler), only there is no partitioning between the H/W and
> the system S/W.
>
> The whole thing is an overhead, of course, but it's highly optimized
> and parallelized in hardware wherever possible. Last I read, they had
> code morphing software for 8086 instructions, i.e. the Code Morphing
> Software could only "understand" 86 hex. This system also allows you
> to run programs compiled for different processors at the same time,
> i.e. it decides at run-time which instruction set is supported.
>
> Ok here's an idea... how about code-morphing HARDWARE?
>
> A pretty challenging VLSI project actually, possible too. Here's how
> I think it may work:
> This Code Morphing (CM) chip would be placed on the bus in between
> the target uC and the code memory (ROM, flash whatever). It would
> route the addresses generated by the uC to the code memory, and
> translate the returned contents into hex code of the target uC, and
> send the translated version back to the uC. This is pretty much what
> the JVM does, but this virtual machine is a HARDWARE virtual machine,
> i.e. the mapping between various instruction sets is HARD-WIRED.
>
> Ok, maybe we could make it more generic, and endow the CM chip with
> large register sets and/or memory areas, which can be dynamically
> loaded with the target and foreign instruction sets and the mapping
> between them. In fact maybe later on we could add a number of
> code-mappings onto a single device. Since all the translation happens
> in hardware, there can be virtually no overheads (I think!). It will
> be especially easy when dealing with similar instruction sets, like
> CISC-to-CISC and RISC-to-RISC. Even if it is CISC-to-RISC, the
> performance will not be truly affected, because it will simply
> replace the CISC instruction with the equivalent RISC instructions,
> and may actually end up saving code memory. Since we have software
> cross-assemblers, it is conceivable that they can be implemented in
> hardware.
>
> Of course, there are a LOT of issues here, and operation may be
> slowed down slightly, but it IS possible. The biggest problem would
> be mapping between specific registers, but we can leave that to the
> application programmer or the source assembler / compiler.
>
> The applications of such a device would be very interesting indeed. A
> code-morph for Java bytecode is only the beginning... Backward
> compatibility will not be an issue anymore. This, I understand, is
> keeping them from using all the features on the latest Intel chips.
> We can load protocol-translation mappings too, transparently
> converting from, say, RS-232 to I2C (we already have hardware TCP/IP
> stacks). We could port the hex code itself to other processors,
> instead of re-writing the source code and re-compiling. Programmers
> can include useful language features from other instruction sets
> without having to worry about implementing them in the target
> processor code.
>
> Ok that's enough speculation for now, but could anyone well-versed in
> VLSI design tell me how feasible this is? I don't think it will be
> very difficult to implement, but the design of such a chip would be
> very challenging. Also I need to know from experienced embedded
> systems designers how truly useful such a device would be, and would
> all the effort of developing it pay off, in terms of financial
> returns and intellectual property rights.
>
> kundi

Article: 46571
"Austin Lesea" <austin.lesea@xilinx.com> wrote in message news:3D74F026.77355BAC@xilinx.com... > > c, or c++ is a single thread, single process language. The tools we are talking about extend a subset of C to include keywords for paralellism, and JAVA (Forge) has built-in mechanisms to work with threads. > For example, to try to use a c program for a DSP application that ran on a > popular DSP uC, and retargeting it for an FPGA might be a real dissapointment > (been there, done that). I don't doubt that. But some niche markets benefit greatly from a C/JAVA to HDL converter. I want to make a PCI addin card with a FPGA-based coprocessor for a massively parallel problem. As long as it approaches the speed of an equivalent implementation on an ordinary CPU, it is comercially justified. To replace a motherboard with a dual-Pentium for example would be more expensive for the customer, not to mention that most customers would not be able/willing to do that for the sake of my product. I think that for most purposes, a HDL will remain into the far future the method of choice to design ASIC's or FPGA's, but there is an emerging market that has much less strict demands for speed of execution as opposed to speed of implementation. It may be that in the future, there will be very clever optimizers for VHDL that can turn the stuff that comes out of a C --> VHDL converter into something efficient. Who knows what software improvements will bring us? A large library of VHDL used by such a converter, advanced optimization techniques etc. And there will be many directives that can be used to 'hint' the converter on what kind of hardware should be generated, constraints that can optionally be specified etc. Just because it's extremely hard to make such a tool doesn't mean that it won't be done... FrankArticle: 46572
> Also, Amdahl's law keeps Transmeta down, only about 1/3 of the
> notebook power budget is to the processor.

Having done a few power budget analyses for notebooks and other
portable devices, the power budget allowance for the processor is quite
a bit less, more like 8%-10% of the OVERALL power budget, and it's
typically significantly less in fact, depending on the "typical" use
ascribed to the notebook. You also have to add in the power
requirements for the "code morphing" memory too, AS part of the
processor power budget, as it's only required FOR the Transmeta CPU.

Austin

Article: 46573
Take a look at patent number 5,684,980

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-adv.htm&r=5&p=1&f=G&l=50&d=ft00&S1=(casselman.INZZ.+AND+virtual.ASNM.)&OS=in/casselman+AND+an/virtual&RS=(IN/casselman+AND+AN/virtual)

I call it the runtime hardware generation patent...

Steve Casselman

> Ok here's an idea... how about code-morphing HARDWARE?

Article: 46574
In fact, just this morning I added a StateCAD tutorial to the page. You can find it from the front page at http://tutor.al-williams.com