Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
In article <al2icr$9mm@web.eng.baileynm.com>, Peter da Silva <peter@abbnm.com> wrote: >In article <cd714c44.0209020625.5b892675@posting.google.com>, >Kunal <kundi_forever@yahoo.com> wrote: >> Ok here's an idea... how about code-morphing HARDWARE? > >Congratulations, you just re-invented the last ten years of IA32 processor >design. :) There have been some recent papers on doing optimizations on code in the trace cache that is probably a bit closer to what he was thinking about than the traditional 486/P5/P6 design you are talking about. nateArticle: 46576
hristostev@yahoo.com (hristo) writes: > why Xilinx does not make its own HDL synthesiser? > why it has to use a third party? > why it has opted for Forge for example? Xilinx does have their own synthesizer. It's called "XST".Article: 46577
Nial Stewart wrote: > Duy K Do wrote: > > > > Do you get offended if someone label you as IT consultant? > > It's the word 'consultant' I shy clear of. > > As someone said a 'consultant' is someone who borrows your watch > to tell you the time. > Judging by the fee:performance ratio of most management consultants the best term is probably ``insultant''. As someone once said of MBA types ``Yesterday's solutions to last year's problems''.Article: 46578
The flops are inside the dedicated multiplier, not add-ons that use up CLB resources. If you can tolerate the latency, they are a win since they cut the minimum clock cycle time considerably. In order to get reasonable performance, you'll also want to avoid logic between the multiplier and the registers leading into and out of it, and you'll want to put those registers immediately adjacent to the multiplier. The set-up and clock to Q of the multipliers are pretty lengthy, and adding a long route plus combinatorial delays to those will kill your performance quite quickly. The "enhanced" multiplier is a redesign of the multiplier that speeds it up a bunch. The original multipliers have a very slow path in them that limits performance to 130 MHz or so with a pipelined multiplier with the added i/o registers in the fabric in a -4 part. The enhanced multipliers get the speed up to over 200 MHz. You want the silicon with the "enhanced" multipliers. Jason Phillips wrote: > I've been attempting to determine just where those flops are myself. In my case the 20% speed-up is an absolute necessity on a 6000 -4 part I have, but not required on a -5 1000 part I also have. The -5 part non-pipelined multipliers are actually the same speed as the -4 pipelined mults. Latency is not an issue for me, so I am attempting to determine why I shouldn't always use the faster pipelined multipliers no matter what the speed grade. However, if the flops are external, I won't have the resources on the smaller part. > > Also, what is this I have seen concerning an "enhanced" multiplier? > > -Jason -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759Article: 46579
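For readers who think in clock periods rather than frequencies, the multiplier figures in Ray Andraka's post above can be restated with a few lines of Python. This is only an arithmetic sketch: the 130 MHz and 200 MHz values come from the post, "over 200 MHz" is treated as a lower bound, and the function name is made up for illustration.

```python
def period_ns(fmax_mhz):
    """Minimum clock period in nanoseconds for a given fmax in MHz."""
    return 1000.0 / fmax_mhz

original_mhz = 130.0   # "130 MHz or so", pipelined, -4 part (from the post)
enhanced_mhz = 200.0   # "over 200 MHz" (from the post; used as a lower bound)

original_ns = period_ns(original_mhz)
enhanced_ns = period_ns(enhanced_mhz)
speedup = enhanced_mhz / original_mhz

print(f"original: {original_ns:.2f} ns per cycle")
print(f"enhanced: {enhanced_ns:.2f} ns per cycle or better")
print(f"speedup:  at least {speedup:.2f}x")
```

Roughly 2.7 ns of cycle time is the gap between the two multiplier versions, which is also why a long route in front of the multiplier is so costly: it eats directly into that margin.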
I've been that drunk, too. -MooCow "Kunal" <kundi_forever@yahoo.com> wrote in message news:cd714c44.0209020625.5b892675@posting.google.com... > I read about Transmeta's Crusoe chip some time back, which has > something called the Code Morphing Software. This code morphing s/w > actually reads hex from its code memory, and at run-time translates > the hex code into equivalent native machine language instructions. So > the whole system itself is like a Java Virtual Machine (or a run-time > cross-assembler), only there is no partitioning between the H/W and > the system S/W. > > The whole thing is an overhead, of course, but it's highly optimized and > parallelized in hardware wherever possible. Last I read, they had code > morphing software for 8086 instructions, i.e. the Code Morphing > Software could only "understand" 86 hex. This system also allows you > to run programs compiled for different processors at the same time, > i.e. it decides at run-time which instruction set is supported. > > Ok here's an idea... how about code-morphing HARDWARE? > > A pretty challenging VLSI project actually, possible too. Here's how I > think it may work: > This Code Morphing (CM) chip would be placed on the bus in between the > target uC and the code memory (ROM, flash wotever). It would route the > addresses generated by the uC to the code memory, and translate the > returned contents into hex code of the target uC, and send the > translated version back to the uC. This is pretty much what the JVM > does, but this virtual machine is a HARDWARE virtual machine, i.e. the > mapping between various instruction sets is HARD-WIRED. > > Ok, maybe we could make it more generic, and endow the CM chip with > large register sets and/or memory areas, which can be dynamically > loaded with the target and foreign instruction sets and the mapping > between them. In fact maybe later on we could add a number of > code-mappings onto a single device. 
Since all the translation happens > in hardware, there can be virtually no overheads (I think!). It will > be especially easy when dealing with similar instruction sets, like > CISC-to-CISC and RISC-to-RISC. Even if it is CISC-to-RISC, the > performance will not be truly affected, because it will simply replace > the CISC instruction with the equivalent CISC instructions, and may > actually end up saving code memory. Since we have software > cross-assemblers, it is conceivable that they can be implemented in > hardware. > > Of course, there are a LOT of issues here, and operation may be slowed > down slightly, but it IS possible. The biggest problem would be > mapping between specific registers, but we can leave that to the > application programmer or the source assembler / compiler. > > The applications of such a device would be very interesting indeed. A > code-morph for Java bytecode is only the beginning.... Backward > Compatibility will not be an issue anymore. This, I understand, is > keeping them from using all the features on the latest Intel chips. We > can load protocol-translation mappings too, transparently converting > from, say RS-232 to I2C (we already have hardware TCP/IP stacks). We > could port the hex code itself to other processors, instead of > re-writing the source code and re-compiling. Programmers can include > useful language features from other instruction sets without having > to worry about implementing them in the target processor code. > > Ok that's enough speculation for now, but could anyone well-versed in > VLSI design tell me how feasible this is? I don't think it will be > very difficult to implement, but the design of such a chip would be > very challenging. Also I need to know from experienced embedded > systems designers how truly useful such a device would be, and would > all the effort of developing it pay off, in terms of financial returns > and intellectual property rights. > > kundiArticle: 46580
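The table-driven translation Kunal describes in the quoted post can be modelled in a few lines of Python. This is a toy sketch, not a design: the opcode values, the mapping table, and the `translate` function are all invented for illustration, and real instruction sets have operands, variable lengths, and branches that this ignores.

```python
# Hypothetical one-byte opcodes for a made-up "foreign" and "target" ISA.
FOREIGN_TO_TARGET = {
    0x01: [0x10],        # foreign LOAD  -> target LOAD
    0x02: [0x11],        # foreign STORE -> target STORE
    0x03: [0x12, 0x13],  # foreign ADDM (add from memory) -> target LOAD, ADD
}

def translate(foreign_code):
    """Translate foreign opcodes to target opcodes.

    Returns the target code plus an address map from each foreign
    address to the target address where its translation begins.
    """
    target_code = []
    address_map = {}
    for addr, opcode in enumerate(foreign_code):
        address_map[addr] = len(target_code)
        target_code.extend(FOREIGN_TO_TARGET[opcode])
    return target_code, address_map

code, amap = translate([0x01, 0x03, 0x02])
print(code)  # [16, 18, 19, 17]
print(amap)  # {0: 0, 1: 1, 2: 3}
```

Even in this toy, one foreign instruction can expand to several target instructions, so the addresses the target uC generates no longer line up with the code memory. A chip sitting on the bus would need some form of address map like `amap` to cope with that, which is one of the "issues" the post alludes to.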
> > Ok here's an idea... how about code-morphing HARDWARE?
>
> Congratulations, you just re-invented the last ten years of IA32 processor
> design. :)

Give him a break, Peter. Although it's not a totally novel idea, it is a point on the spectrum of decoding CISC instructions.

* Decode CISC instructions and execute directly
* Decode CISC instructions into simpler instructions (not necessarily RISC, unless you call an instruction that is more than 100 bits wide, with more than 2 inputs, lots of widgets, possibly load-alu, etc., "RISC")
* Decode CISC instructions into simpler instructions and store them in a decoded instruction cache ... trace cache
* Decode CISC instructions into simpler instructions and perform optimizations on them ... actually, the "spectrum" forks here, since you can perform some optimizations on a stream of instructions without storing them, although storing the optimized instructions is attractive
* Continuing, as to where to store the decoded and optimized instructions, and how visible they are:
  ** Store the decoded instructions in SRAM on-CPU
    - the decoded instruction cache or trace cache we already described
    - visible only to hardware
    - special purpose storage
    - store them in a big uniform structure, e.g. take over part of the L2 cache array
    - visible only to hardware / microcode
    - less special purpose stuff, but you may still need the PC mapper
    - store them in main, DRAM, memory
      a) in main memory that the OS has dedicated to microarchitecture purposes e.g. at boot
    - requires "protection" to prevent viruses modifying the cached micro-operations
      b) in OS visible main memory
* Finally, OS visibility
    - hardware/ucode only
    - OS cooperative
    - running in a virtual machine layer "under" the so-called native OS

=== Personal post. Not opinions of employer.Article: 46581
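The "decode once, store the result, reuse it" point on that spectrum can be illustrated with a toy Python model of a decoded instruction cache keyed by program counter. Everything here is hypothetical: the class, the fake decoder, and the micro-op names are invented, and a real trace cache stores traces across branches rather than single instructions.

```python
class DecodedCache:
    """Toy decoded-instruction cache: maps a PC to its decoded micro-ops."""

    def __init__(self, decode):
        self.decode = decode      # function: pc -> list of micro-ops
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, pc):
        if pc in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[pc] = self.decode(pc)
        return self.entries[pc]

def toy_decode(pc):
    # Hypothetical decoder: pretend each instruction splits into two micro-ops.
    return [f"uop{pc}a", f"uop{pc}b"]

cache = DecodedCache(toy_decode)
for pc in [0, 1, 2, 0, 1, 2, 0, 1, 2]:   # a three-instruction loop, run three times
    cache.fetch(pc)
print(cache.hits, cache.misses)  # 6 3
```

The loop pays the decode cost only on its first pass. Where `entries` physically lives (dedicated SRAM, part of L2, or DRAM) and who is allowed to see it are exactly the remaining choices the list above enumerates.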
Following up on my earlier article, "RAM32X1S, Virtex-II, 4.1i PAR travails" ( http://groups.google.com/groups?selm=a5j91k%24akt%241%40slb3.atl.mindspring.net ), I am now trying to make a trivial RPM, consisting of a RAM32X1S and a DFF, RLOC'd to the same slice. No matter what I try (including adding BELs), no joy. I get a variant of:

ERROR:Pack:679 - Unable to obey design constraints (MACRONAME=r/hset, RLOC=X0Y0) which require the combination of the following symbols into a single slice component:
  FLOP symbol "r/dff" (Output Signal = q)
  RAM symbol "r/ram/F" (Output Signal = r/ram/F)
  RAM symbol "r/ram/G" (Output Signal = r/ram/G)
  WEDECODE symbol "r/ram/WEDECODE" (Output Signal = r/ram/WEDECODE)
  BUF symbol "r/ram/BXBUF" (Output Signal = r/ram/A4')
Unable to pack the register r/dff because of connectivity restrictions. Please correct the design constraints accordingly.

I happen to know this is a perfectly legal packing of the slice, because if I DELETE the RLOC constraint on the DFF, the placer is quite happy to place the DFF in the same slice as the RAM32X1S on its own initiative. Specifically, the two 4-LUT RAM outputs go through the F5MUX, the F5 input of the FXMUX, and out onto output X, then back in on input DX, through the DXMUX and into the D input of FFX.

This problem occurs in 4.1i, so I thought I'd try it under 4.2i. So I installed 4.2i. And 4.2i SP3. See following message for my blue screen of death travails there. Nope, not even 4.2i SP3 will accept my RAM32X1S and my DFF RLOC'd to the same slice.

Can anyone offer any workarounds short of hard LOCs or XDL? I use RPMs to get repeatable datapath placements and timings so that I can tune up my design in a methodical way. Anything else is playing whack-a-mole on the critical paths.
Quick and dirty Synplicity example:

    module tramq(clk, i, o);
      input clk, i;
      output o;
      reg o;
      reg we;
      reg [4:0] ad;
      reg d;
      ramq r(clk, we, ad, d, q);
      always @(posedge clk) begin
        {we, ad, d} <= {ad, d, i};
        o <= q;
      end
    endmodule

    module ramq(clk, we, ad, d, q) /* synthesis syn_hier="hard"*/;
      input clk, we, d;
      input [4:0] ad;
      output q;
      wire o;
      RAM32X1S ram(.A0(ad[0]), .A1(ad[1]), .A2(ad[2]), .A3(ad[3]), .A4(ad[4]),
                   .D(d), .O(o), .WCLK(clk), .WE(we)) /* synthesis RLOC="X0Y0" */;
      FD dff(.C(clk), .D(o), .Q(q)) /* synthesis RLOC="X0Y0" */;
    endmodule

Thanks. Jan Gray, Gray Research LLCArticle: 46582
I finally installed 4.2i ... on my Windows 2000 SP3 system. At the end of the installation process, it prompted me to reboot. When I did so, my system blue screened, stop 00000050, page fault in non-paged area. Scary and rather disappointing. After figuring out and fixing the problem myself (over an hour of anxiety and head scratching), I discovered that Xilinx is already aware of the problem: 4.2i Install - Windows 2000 Service Pack 3 causes a crash or "STOP 50" blue screen error: http://support.xilinx.com/xlnx/xil_ans_display.jsp?iLanguageID=1&iCountryID=1&getPagePath=15380 It would be nice if Xilinx made this very severe problem more prominent on this page: http://support.xilinx.com/support/software/install_info.htm. (Philip, you win. I have now seen my Windows 2000 machine do a BSOD. I have now experienced booting to SAFE MODE. And so forth.) Jan Gray, Gray Research LLCArticle: 46583
> 4.2i Install - Windows 2000 Service Pack 3 causes a crash or "STOP 50" blue > screen error: > http://support.xilinx.com/xlnx/xil_ans_display.jsp?iLanguageID=1&iCountryID= > 1&getPagePath=15380 Oops, I suppose I ought to describe the problem and the remedy. 4.2i (which predates Windows 2000 SP3) installs a driver (windrvr.sys) which BSODs under Windows 2000 SP3. The fix is to remove windrvr.sys (in SAFE MODE) and later install a newer windrvr.sys (version 5.05b). Jan Gray, Gray Research LLCArticle: 46584
The original premise is a fantasy, anyway. There's no tool that's going to take the place of actual hardware design knowledge. Sorry, but if you want to do an effective hardware design, you're just going to have to learn about hardware. If you want to program in C the way you've always done, use a general-purpose micro and run C on it. -Stan "Frank Andreas de Groot" <nospam@nospam.com> wrote in message news:tV5d9.17346$sR2.306931@news4.ulv.nextra.no... > They are 2 different things. > It is hard to convert an algorithm expressed in C into an efficient VHDL > representation. > It took a while before these tools were made, and they have their advantages > & disadvantages. > They produce slow, inefficient designs, but they do it extremely fast, > however often you change the algorithm. > Extremely complex algorithms that take a few hours to write can be > translated to ten thousand lines of VHDL in a few minutes. > I don't think Forge or Handel-C are 'screwdrivers to drive in nails'. > CERN for example uses those tools, which cost up to 75,000 USD. > There must be an economic/engineering justification for them in certain > niches. > And I think it will just be a matter of time before designing in Verilog or > VHDL will be just as uncommon as programming in assembly. > > Frank > > > "Rene Tschaggelar" <tschaggelar@dplanet.ch> wrote in message > news:3D74938D.3070903@dplanet.ch... > > For nails you use a hammer and for screws you take a screwdriver. > > Likewise are the tools for FPGAs. VHDL/Verilog wasn't invented > > because the C/C++ was unknown, it just doesn't fit. > > >Article: 46585
> ...but there is an emerging market > that has much less strict demands for speed of execution as opposed to speed > of implementation. ...and less strict demands on quality of engineering, if even any engineering at all... AustinArticle: 46586
"Frank Andreas de Groot" <nospam@nospam.com> wrote in message news:b57d9.17394$sR2.307156@news4.ulv.nextra.no... > "Austin Lesea" <austin.lesea@xilinx.com> wrote in message > news:3D74F026.77355BAC@xilinx.com... > > > > c, or c++ is a single thread, single process language. > > The tools we are talking about extend a subset of C to include keywords for > paralellism, > and JAVA (Forge) has built-in mechanisms to work with threads. We already have that... it's called Verilog > > For example, to try to use a c program for a DSP application that ran on a > > popular DSP uC, and retargeting it for an FPGA might be a real > dissapointment > > (been there, done that). > > I don't doubt that. But some niche markets benefit greatly from a C/JAVA to > HDL converter. > I want to make a PCI addin card with a FPGA-based coprocessor for a > massively parallel problem. > As long as it approaches the speed of an equivalent implementation on an > ordinary CPU, it is comercially justified. > To replace a motherboard with a dual-Pentium for example would be more > expensive for the customer, > not to mention that most customers would not be able/willing to do that for > the sake of my product. Really? Go out and compare the price of a large, fast FPGA to the price of a Pentium. > I think that for most purposes, a HDL will remain into the far future the > method of choice to design ASIC's or FPGA's, but there is an emerging market > that has much less strict demands for speed of execution as opposed to speed > of implementation. > > It may be that in the future, there will be very clever optimizers for VHDL > that can turn the stuff that comes out of a C --> VHDL converter into > something efficient. > Who knows what software improvements will bring us? A large library of VHDL > used by such a converter, advanced optimization techniques etc. 
And there > will be many directives that can be used to 'hint' the converter on what > kind of hardware should be generated, constraints that can optionally be > specified etc. Just because it's extremely hard to make such a tool doesn't > mean that it won't be done... When it's harder to make and use a tool than it is to do the actual work, it's not very cost effective. -StanArticle: 46587
> SRout[i] <= Coeff[i] >> Index; > end > assign next_index = Index - 1; > > Real hardware guys already have their shift together. > > - John_H > OK OK so there are real applications for 'shift' in this world !! :-) StanArticle: 46588
I am planning on using an FPGA to drive a small LCD display (not much more than a simple counter circuit), and this should run off of a typical watch battery. Does anyone have any recommendations as to which FPGA I should use? Thanks in advance!Article: 46589
I remember reading one of his appnotes on why DSP implementation is much better/faster in FPGAs than in DSP processors. Amazingly, that article was written way back in 1994 (or 95), when FPGAs were not so feature-rich as today.... Steve is definitely an asset to Xilinx. Welcome back! --Neeraj "Peter Alfke" <peter@xilinx.com> wrote in message news:3D6FE22D.959F4F3B@xilinx.com... > Xilinx welcomes Steve Knapp, back from 5 years trying his luck > elsewhere. > Some of you may remember his Optimagic Jumpstation. > Here we remember him as a hell of an Applications engineer, > a good writer, and a genuinely nice guy. > We are glad he found the way back home, where he belongs ! > > Peter Alfke, Xilinx Applications >Article: 46590
Xilinx CoolRunner (CPLD) is your best bet for low current consumption, but may require two batteries in series. Peter Alfke ================ Sheila Sim wrote: > I am planning on using an FPGA to drive a small LCD display (not much more > than a simple counter circuit), and this should run off of a typical watch > battery. Does anyone have any recommendations as to which FPGA I should > use? > > Thanks in advance!Article: 46591
Jan Gray wrote: > Oops, I suppose I ought to describe the problem and the remedy. 4.2i (which > predates Windows 2000 SP3) installs a driver (windrvr.sys) which BSODs under > Windows 2000 SP3. The fix is to remove windrvr.sys (in SAFE MODE) and later > install a newer windrvr.sys (version 5.05b). > What in *'s name is a user mode app. like Foundation doing installing a device driver?! What does this driver do, and why must it be in kernel mode? -- Steve Williams "The woods are lovely, dark and deep. steve at icarus.com But I have promises to keep, steve at picturel.com and lines to code before I sleep, http://www.picturel.com And lines to code before I sleep." abuse@xo.com uce@ftc.govArticle: 46592
"Stephen Williams" <1vntkd4i001@sneakemail.com> wrote > What in *'s name is a user mode app. like Foundation doing installing > a device driver?! What does this driver do, and why must it be in > kernel mode? I believe the driver in question is used to wiggle parallel port I/Os to drive the JTAG programming cable. Jan Gray, Gray Research LLCArticle: 46593
Dear all, I want to develop a simple MCU and DSP chip with an FPGA. Apart from OpenCores, is there any other resource on the internet for me to learn "How to design MCU/DSP" step by step? Or any suggestions for me? Thank you. RealaArticle: 46594
"Jerry D. Harthcock" <jerry@quickcores.com> wrote in message news:c4cfbb5c.0209030916.415f8e44@posting.google.com... > > QuickCores offers the cores in synthesizable Verilog netlist format > under separate license. Hook up is straightforward. You simply > instantiate at the top level the CPU, memory, I/O, and whatever other > modules you need for your application. We're working on an object > oriented builder which will allow you to do this automatically. > > On the Musketeer, the ProASIC+ is fed with a 24.5 MHz clock (see data > sheet at QC web)from the Musketeer's built-in "helper" micro. For the > Q68HC05 soft core, this equates to 12.25 MIPs (single cycle > instructions). If implemented in anti-fuse such as QuickLogic > QuickDSP or Actel for example, it's about 2x that. > > Jerry Any plans for a 6809 core? -- Greg readgc@xxxhotmail.com (Remove the 'xxx' to send Email)Article: 46595
Thanks, I had the same problem, not very nice... The second-latest version doesn't have that problem, but now you posted the solution, great. I *knew* I shouldn't have installed SP3 though. People just don't test their software with it... Frank "Jan Gray" <jsgray@acm.org> wrote in message news:al3r9j$9a9$2@slb0.atl.mindspring.net... > I finally installed 4.2i ... on my Windows 2000 SP3 system. > > At the end of the installation process, it prompted me to reboot. When I did > so, my system blue screened, stop 00000050, page fault in non-paged area. > Scary and rather disappointing.Article: 46596
In a SystemC behavioral specification I want to map an array to a FPGA (Virtex) memory implementation (DesignWare memory model "DW_ram_rw_s_dff " for example). with Synopsys tools : CCC SystemC Behavioral Compiler, dc_shell, Memory Wrapper Generator. Has anybody been able to do this ? CharlesArticle: 46597
nweaver@ribbit.CS.Berkeley.EDU (Nicholas C. Weaver) wrote in message news:<al0fks$in9$2@agate.berkeley.edu>... > In article <al0f65$evv$1@vkhdsu24.hda.hydro.com>, > Terje Mathisen <terje.mathisen@hda.hydro.com> wrote: > >The pure sw emulation approach was tried with DECs FX-32 (sp?), it > >worked OK for applications but could never handle an OS. > > Well, it handled MOST of the OS, IIRC. I seem to recall the portable > NT relying very heavily on a 486 emulator. Hardly. NT booted, and ran, natively on non-x86 systems since day one (when it was NT 3.1 running on MIPS and x86). Of course, the non-x86 NT systems would not run x86 Win32 binaries. They *did*, however, have a 16 bit x86 emulator so that they could run DOS and Win16 applications (obviously the x86 versions used the real hardware to run Win16 and DOS applications). Until DEC shipped FX!32 for NT (Alpha), there was never any x86 (Win32) capability on non-x86 versions of NT.Article: 46598
Xanatos <fpsbb98@yahoo.com> wrote: > And Works great too.....Good show guys. I have been using Quartus 2.0 for Linux - but this was only a modified Windows version using Wine, it was in no way a native Linux program. Is this the same with Quartus 2.1? But I must admit, I did not test it very long, since I was missing Leonardo and Modelsim under Linux, and it did not make very much sense to steadily reboot the system just to switch between the applications. Roman > -Xanatos > "LET" <vvcd@ath.forthnet.gr> wrote in message > news:3D51661F.976EDFC6@ath.forthnet.gr... >> QUARTUS II V2.1 LINUX (C) ALTERA >> fully functional >> just arrived >> >>Article: 46599
>Hmm, this sounds an awful lot like the MCNC suite. There are several problems >with benchmarks done this way: I wasn't trying to start the political/marketing wars that I remember from the SPEC days. I wonder if it is possible to have a non-political "benchmark". Or what would you call a good example? Here is my straw man. Remember, I'm wearing my rosy tinted glasses. We find a neat problem that fits into an FPGA. The problem is simple to describe. Perhaps that description uses some c code. It seems reasonable to implement a RISC machine to run that sample software. So we invent a RISC machine, publish the ISA, implement it in an FPGA, and publish a "good" sample implementation in some HDL. This will take a year or more out of somebody's life. (I'm thinking of something like Knuth's TeX/Metafont.) Now the fun begins. One obvious step is to port the implementation to other HDLs and see how well it works and/or propose alternatives. Somewhere in here we have to actually implement it on various development boards and make the LEDs blink as predicted. I'd also hope that various people (FAEs?) would make vendor specific implementations/patches. First would be "simple" things like floorplanning. Next would be tweaking the source code to help the compiler. Next would be major changes to take advantage of architecture specific ideas/hacks. Then somebody could redesign the ISA and compiler and/or source code to fit better in their favorite chip/HDL/software. >1) Subtle changes in the RTL coding can make a big difference in how well a >design maps to a particular family. Case in point: an adder tree with a clock >enable. If done in altera 10K, this forces each level of the tree to be done ... Isn't that something a compiler should be able to recognize? (Remember when c compilers didn't recognize common sub expressions?) >2) performance of a given part is usually very dependent on place and route. ... Yup. 
We all know that P&R software sucks, but the marketing types keep pushing the "big green button". Maybe a few examples showing how much crap you have to enter to get reasonable placement would help either convince people that floorplanners are important or get them to work on the placer. So I'll add another metric to my list above. If you use a GUI floorplanner, how long does it take you to enter the necessary info? How many key/mouse actions? Other people can add their times to show their skill or poke fun at horrible GUIs. >3) performance and density is affected by use of architecture specific features. ... This gets interesting. How much of this can/should a compiler recognize? How much of this is appropriate use of macros or IP cores or ??? (an example would be a multiplier) I guess I think both/anything is reasonable. >4) What is a representative design? You suggest a barrel shift, ... I wasn't trying to focus on a barrel shifter. It was just an example. My hope was that a RISC machine would be big/complicated enough so that it would include enough things to demonstrate many of the special features offered by various vendors/architectures. Of course, if any of that works then we get to (try to?) do it all over again with a new/different problem. -- The suespammers.org mail server is located in California. So are all my other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited commercial e-mail to my suespammers.org address or any of my other addresses. These are my opinions, not necessarily my employer's. I hate spam.