I'm pretty sure I'll be time-multiplexing the register file. I at least have to multiplex the write port. Depending on timing and space constraints, I may multiplex the read ports, clone the register file, or do some of both. This shouldn't be _too_ much of a problem. The Distributed RAM in the Xilinx FPGAs is pretty fast, and the register file shouldn't be on a critical path. The register file write data will come straight from the Re-Order Buffer with little or no logic in-between. The register file read data, however, will likely have more logic between it and a flop. If multiplexing the read ports puts it on the critical path, then I can always clone the register file for each read port. It's most likely that the bypass logic will be on the critical path, so this shouldn't be an issue. (On second thought: I may need to register the output of the register file, as it will essentially be clocked at 4x the system clock...) I think that's the only data structure that I will need to time-multiplex. I've been analyzing the various data structures for instruction scheduling and it looks like I will be able to neatly partition their write ports into segments based on instruction issue slots. This will take up marginally more logic than regular Distributed RAM (just a couple of LUTs if it's placed well and uses built-in MUXes), and will be about the same speed. On the other hand, the write-back bus will need to properly route results to the correct partitions, potentially limiting throughput.Article: 102026
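As a rough illustration of the read-port cloning the post above talks about, here is a generic Verilog sketch (module and signal names are made up, not from the actual design): two identical distributed-RAM copies share one write port, and each copy serves one read port, so no read-port multiplexing is needed.

// Two-read, one-write register file built from two cloned copies.
// Each copy infers a small distributed RAM; both see the same writes,
// so their contents stay identical and each one feeds one read port.
module regfile_2r1w #(
    parameter WIDTH = 32,
    parameter DEPTH = 32,
    parameter AW    = 5
) (
    input                  clk,
    input                  we,      // write enable, e.g. driven by the ROB
    input      [AW-1:0]    waddr,
    input      [WIDTH-1:0] wdata,
    input      [AW-1:0]    raddr0,
    input      [AW-1:0]    raddr1,
    output     [WIDTH-1:0] rdata0,
    output     [WIDTH-1:0] rdata1
);
    reg [WIDTH-1:0] bank0 [0:DEPTH-1];
    reg [WIDTH-1:0] bank1 [0:DEPTH-1];

    always @(posedge clk) begin
        if (we) begin
            bank0[waddr] <= wdata;   // identical write into both copies
            bank1[waddr] <= wdata;
        end
    end

    // Asynchronous distributed-RAM reads; register these outputs if the
    // array ends up effectively clocked faster than the surrounding logic.
    assign rdata0 = bank0[raddr0];
    assign rdata1 = bank1[raddr1];
endmodule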
Jim Granville wrote: > Bringing this back into the FPGA domain: > > The idea is to build the closest thing a FPGA fabric allows. Use the > routing path-lengths to dominate the delays, and place the (series) > buffers only sparingly. > The result should be a Physical Ring Osc, where the Physical > ring dominates, and thus gives better precision. > With each FPGA generation, the buffer effects will decrease. > 65nm FPGAs are in the labs now ? Hmm. At the moment the RC delay of the wires will probably dominate. At least R is not well controlled. But in large chips you start seeing inductive effects, and then you might be right. KoljaArticle: 102027
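For readers wondering what such a routing-dominated ring oscillator looks like in the fabric, here is a minimal sketch assuming Xilinx LUT1 primitives and the usual KEEP attribute; the placement (LOC/RLOC) constraints that would actually set the routing lengths are omitted, and a zero-delay simulator will not run a combinational loop like this, so treat it as a synthesis-only sketch.

// Ring oscillator sketch: a chain of LUT buffers closed by a single
// inversion. KEEP stops synthesis from collapsing the chain, so the
// period is dominated by how far apart the stages are placed, i.e. by
// routing delay, which is the point of the discussion above.
module ring_osc #(
    parameter STAGES = 15
) (
    input  enable,
    output osc_out
);
    (* KEEP = "TRUE" *) wire [STAGES:0] chain;

    // Single inverting feedback makes the loop oscillate while enabled.
    assign chain[0] = enable & ~chain[STAGES];

    genvar i;
    generate
        for (i = 0; i < STAGES; i = i + 1) begin : buf_stage
            LUT1 #(.INIT(2'b10)) u_buf (   // INIT 2'b10 = plain buffer
                .I0(chain[i]),
                .O (chain[i+1])
            );
        end
    endgenerate

    assign osc_out = chain[STAGES];
endmodule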
I actually did build a CPU for pure MHz speed. It was a super-fast dual-issue CPU, but in order to get the high clock rates, I had to make some serious trade-offs. Number one tradeoff: the execution stage is two stages and there is at least one delay slot after every instruction before the result can be used. This version runs at 150MHz. I have another version with not much bypass hardware that makes it up to 180MHz. But with three delay slots and only 8 registers per issue slot, scheduling becomes a major issue. Number two: 16-bit architecture. Addition actually takes a long time using ripple-carry in an FPGA, and there's really no way around it. 16-bit is pretty easy to add, so that's what it gets. It's also 16-bit to cut down on utilization. Number three: Some arithmetic instructions are split in two. For example, shift instructions and 32-bit additions are split into two instructions. I simply could not afford the logic and delay of doing these with one instruction. Number four: 16-bit addressing. Same deal with addition, it takes too long, and I don't want to extend the delay slots any further, so I have 16-bit addressing only. Also, instruction sizes were 16 bits to cut down on logic and keep things "simple". So besides being a total pain in the butt to schedule and program, it really is rocket-fast. It is at its very worst 2 times faster than a 32-bit pipelined processor I designed, and at its best, it is 10 times faster. With decent scheduling and a superb compiler or hand coding, it should be able to sustain 5-8 times faster. The other advantage is that I could put 12 of them on a Spartan-3 1000. Theoretically, I could get the performance of a really crappy modern computer with these things. And now I come back to reality. It's such a specialized system, and the memory architecture, ISA and whole system all around are a mess. Yes, it's super fast, but so what? I would be so much better off just designing custom pipelined logic to do something rather than this gimp of a CPU. So that's why I'm designing a "modern" processor. It's a general-purpose CPU that could run real software such as Linux. It's that much more useful ;) Also, anyone can create a fast, simple processor. What's important is proper balance. I do agree that OOO and all that is not suitable for an FPGA. But it sure is fun!Article: 102028
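A hedged sketch of what "splitting 32-bit addition into two instructions" can look like at the datapath level; the module name, mnemonics and carry-flag convention below are illustrative assumptions, not taken from the design described above.

// 16-bit add-with-carry execution unit: a 32-bit add in software becomes
//   ADD  rlo_a, rlo_b    (carry_in forced to 0, sets the carry flag)
//   ADDC rhi_a, rhi_b    (carry_in = carry flag saved by the ADD)
module add16c (
    input             clk,
    input             use_carry,   // 0 = ADD, 1 = ADDC
    input      [15:0] a,
    input      [15:0] b,
    output reg [15:0] sum,
    output reg        carry_flag   // saved for a following ADDC
);
    wire        cin  = use_carry & carry_flag;
    wire [16:0] full = {1'b0, a} + {1'b0, b} + cin;

    always @(posedge clk) begin
        sum        <= full[15:0];
        carry_flag <= full[16];
    end
endmodule

The ripple-carry chain is only 16 bits long, which is what keeps the cycle time down, at the cost of an extra instruction and a dependency through the carry flag.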
The assumption may have been made (but not thoroughly communicated) that shifting would be through SRLs to deliver the max power. Transition of every *register* from 1s to 0s would be one heck of a strain but perhaps not as much of a strain as half that many SRLs (SLICEM vs SLICEL these days) feeding an all-1 to all-0 transition on the registers fed by those SRLs. "Austin Lesea" <austin@xilinx.com> wrote in message news:e3qvsb$rel14@xco-news.xilinx.com... > lenz, > > You really have to draw yourself a picture. > > I don't think anyone has really thought this through, unless they are > doing it in reality. > > For example, if I place a 1,1,1,1,... in a shift register, and clock it, I > get a transition from a 1 to a 1, and no charging or discharging, so no > current! > > If I place 1,0,1,0,1,0 ... in the shift register, then I maximize my > average dynamic current, as on every clock, I make a node change from 0 to > 1, or 1 to 0. > > If I place an isolated 0 to 1 transition I can see the effective impulse > response for a single transition of 0 to 1. This would have to be done > with all the DFF tied to the same D input, and not a giant shift register, > however. > > One has to examine how the skew across the global clock will affect the > outcome (nothing is really synchronous in reality - never all the exact > same phase). > > So, there are many experiments one can perform, and as Peter points out, > many of them are degenerate cases (unlikely to exist in reality). > > These are exacly the kinds of patterns we use in verification and > charaterization. And, we have been doing this for many years now. > > AustinArticle: 102029
Hi Clark, Anonymous wrote: > Has anyone used USB from linux on the ml-403 board? I'd like to get some > peripherals like usb memory or bluetooth adapter to work on it but usb is > not in the kernel they provide. The hardware does appear to be on the board > though. We (PetaLogix) recently ported the Cypress Reference drivers for the Cypress-EZUSB devices to the ML40x boards. We targeted MicroBlaze but they'll work on PPC (PetaLogix auto-config, not MontaVista) without difficulty. The drivers are in the uClinux public CVS tree. Regards, JohnArticle: 102030
John_H, I don't doubt that one can construct an ultra worst-case scenario. We have done that in the past. Part configures, DONE goes high, and then after a few clocks, the part configures, DONE goes high.... You get the picture. The thump from the switching resets the entire power-on reset circuit, and the part starts all over again. A very, very expensive relaxation oscillator, whose output is the DONE pin. What is far more important is for the customer to know how much system jitter there will be for their PCB, their bypass network, and their bitstream. THAT is a real problem! Austin John_H wrote: > The assumption may have been made (but not thoroughly communicated) that > shifting would be through SRLs to deliver the max power. Transition of > every *register* from 1s to 0s would be one heck of a strain but perhaps not > as much of a strain as half that many SRLs (SLICEM vs SLICEL these days) > feeding an all-1 to all-0 transition on the registers fed by those SRLs. > > "Austin Lesea" <austin@xilinx.com> wrote in message > news:e3qvsb$rel14@xco-news.xilinx.com... > >>lenz, >> >>You really have to draw yourself a picture. >> >>I don't think anyone has really thought this through, unless they are >>doing it in reality. >> >>For example, if I place a 1,1,1,1,... in a shift register, and clock it, I >>get a transition from a 1 to a 1, and no charging or discharging, so no >>current! >> >>If I place 1,0,1,0,1,0 ... in the shift register, then I maximize my >>average dynamic current, as on every clock, I make a node change from 0 to >>1, or 1 to 0. >> >>If I place an isolated 0 to 1 transition I can see the effective impulse >>response for a single transition of 0 to 1. This would have to be done >>with all the DFF tied to the same D input, and not a giant shift register, >>however. >> >>One has to examine how the skew across the global clock will affect the >>outcome (nothing is really synchronous in reality - never all the exact >>same phase). >> >>So, there are many experiments one can perform, and as Peter points out, >>many of them are degenerate cases (unlikely to exist in reality). >> >>These are exacly the kinds of patterns we use in verification and >>charaterization. And, we have been doing this for many years now. >> >>Austin > > >Article: 102031
I'm working on a design that needs to be able to write to a DDR ram at 133MHz but only needs to read back the data at a slower rate. I thought I could greatly ease the design by slowing down the clock on reads to say 66MHz. This really opens up my read timing budget. Doing fast writes is easier because I can use a 90 degree shifted clock to drive the DQS lines. The problem is I'm not sure how to create a constraints file that enforces the timing required for different read and write clock speeds. Anybody have any ideas? I'm using a Spartan-3[E] and ISE 8.1. Thanks, David CarrArticle: 102032
Doesn't DDR have a system clock that runs a DLL? The physical interface will have to maintain the same clock. Your reads can come into DDR IOB registers in bursts of 2 rather than 4 or 8 allowing the input registers to be read at your slower clock speed. Look at the system level clocking that includes the DDR and the FPGA clocks. "DC" <dc@dcarr.org> wrote in message news:1147214548.342501.9200@v46g2000cwv.googlegroups.com... > I'm working on a design that needs to be able to write to a DDR ram at > 133MHz but only needs to read back the data at a slower rate. I > thought I could greatly ease the design by slowing down the clock on > reads to say 66MHz. This really opens up my read timing budget. Doing > fast writes is easier because I can use a 90 degree shifted clock to > drive the DQS lines. > > The problem is I'm not sure how to create a constraints file that > enforces the timing required for different read and write clock speeds. > Anybody have any ideas? I'm using a Spartan-3[E] and ISE 8.1. > > Thanks, > David CarrArticle: 102033
I did some of the same things as your 16b design but with some differences. I used 2 clocks per microcycle at 300MHz or so on V2Pro and <<200MHz on SP3 so that the BlockRam or a 16bit add were the critical paths or 3 levels of Lut logic in the control. The 32b ISA therefore gets 4 ports out of a BlockRam with a large register set up to 64 regs but encodes the ops in variable-length 16b opcodes using prefixes to stretch address widths 3b at a time for 3 Reg fields or to set up a literal in 8b chunks. I was always partial to the old 9900 and 68000. The pipeline is quite long since it runs 4-way threads and therefore has no hazard or forwarding logic. The MTA logic allows the instruction decode to occur over many pipelines since only every 4th pair is per thread. The BlockRam also holds the instruction queue reminiscent of the 8086 per thread and the IFetch unit also has another tiny queue to reassemble 16b words into 1..4 x 16b final opcode. Since the PEs only run RRR codes in 2 clocks and average Bcc or Ld,St in 4 or sometimes 6 clocks, the branch latency covers the cc decode so no prediction is needed. Bcc inside the queue are also 2 clocks. The Ld, St codes go out to an MMU that hashes a 32b address with a 32b object id into RLDRAM on 8-word line blocks. The idea there is that the RLDRAM interleaved throughput can be shared over 10 or so of these PEs, which leaves me with 40-odd threads. The RLDRAM 20ns latency is easily covered by the MTA pipeline so Ld,St have a slight Memory Wall effect over effectively the whole DRAM address space, so no SRAM Dcache is needed. So instead of a Memory Wall, I have a Thread Wall; use em or lose em, ~40*~25mips. The MMU also implements the remainder of the Transputer process scheduler and DMA, message passing and links, so that's where most of the real fun is, and much remains to be done. Of course it only really makes sense if you want to run lots of processes as any Transputer user would. The basic version was done 25yrs ago so I guess it's not a modern processor yet. Post your results when you have some; you might be the 1st to tackle OoO SS, I haven't heard of anything else. John Jakson transputer guyArticle: 102034
Isaac Bosompem wrote: > Are you attempting a 100% hardware solution or are you doing a mix of > both hardware and PC software? Hi Isaac. Yes, as I mentioned to Tom elsewhere in this thread, I have written a 100% Verilog implementation of ECM (as well as Fermat's method and Pollard-Rho). It runs completely standalone and is not connected to anything except power. It would display the answer (if found) on the development board's LCD display. I have no objection to using an FPGA as an accelerator for a program running on a conventional computer, but was hoping to fit the entire thing (ECM) on one or more FPGA development boards because that keeps things fast and simple. For me, it's at least as easy to code in Verilog as it is in a typical assembly language, so I see no reason to clutter things up by going off-board or adding a micro-CPU core. My reasoning is that eventually (hopefully within my lifetime, ha!) FPGAs will become huge and cheap, and I'm hoping that the LUT count of FPGAs increases faster than the performance of traditional computers. That hope may not be justified of course, because some of the same type of technology is used in both CPUs and FPGAs; however, there is no chance that I'll ever be able to afford a cluster of Opterons, but if I can find a way around the exorbitant prices FPGA vendors charge for proprietary software design tools (that can only be used with their own products, no less), I could probably afford a fairly high-performance FPGA. In any case, writing ECM in Verilog has been fun and I've learned a lot about Verilog. I now have a working Verilog ECM design, and I'll spend the time until development s/w gets cheap enough for me to afford on tweaking and improving the design. I've been forced to slow my design down to a crawl in order to try to get it to fit into something I can afford, but whenever FPGAs and the requisite design s/w become plentiful and cheap, I hope to be able to take full advantage of all the opportunities for parallelization and pipelining that ECM offers. Regards, RonArticle: 102035
That's pretty slick! I like it. I've got a blog where I write random ideas and notes about my projects: http://bitstuff.blogspot.com You'll either find it incredibly dull, or it'll pique your interest. Most people I know couldn't care less about this stuff, so I appreciate the dialogue and comments.Article: 102036
When you shift a 1010101 pattern, there is a lot of power consumption, but on each transition half the nodes go Low-to-High, and the adjacent other half goes High-to-Low. I call that benign. If you switch on every even clock cycle from 1111111 to 0000000 and on every odd clock cycle back to 11111111, you do not get the compensation effect, although the total average power consumption is the same. I hope this is clearer. Peter AlfkeArticle: 102037
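For what it's worth, the two stimulus patterns Peter describes are easy to put side by side in RTL. Below is a generic sketch (sizes and names are arbitrary, and a real power-characterization design would add placement and anti-trim constraints): one shift register carries the benign 1,0,1,0,... pattern, while a second register bank slams between all-ones and all-zeros every cycle.

// Two dynamic-power stimulus patterns from the discussion above.
module power_patterns #(
    parameter N = 1024
) (
    input  clk,
    input  rst,
    output toggle_tap,
    output slam_tap
);
    // Pattern 1: 1,0,1,0,... rotating through a shift register. Every
    // flop toggles each clock, but half rise while the adjacent half
    // fall, so the supply sees the "benign" compensated case.
    reg [N-1:0] shifter;
    always @(posedge clk)
        if (rst) shifter <= {N/2{2'b10}};
        else     shifter <= {shifter[N-2:0], shifter[N-1]};

    // Pattern 2: every flop goes 000...0 -> 111...1 -> 000...0, so all
    // transitions line up in the same direction on each clock edge.
    reg [N-1:0] slam;
    reg         phase;
    always @(posedge clk)
        if (rst) begin slam <= {N{1'b0}}; phase <= 1'b0; end
        else     begin slam <= {N{~phase}}; phase <= ~phase; end

    // Reduction XORs keep synthesis from trimming the registers away.
    assign toggle_tap = ^shifter;
    assign slam_tap   = ^slam;
endmodule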
"Luke" <lvalenty@gmail.com> writes: > Number one tradeoff, the execution stage is two stages and there is at > least one delay slot after every instruction before the result can be > used. This version runs at 150MHz. I have another version with not > much bypass hardware that makes it up to 180MHz. I must be confused. Since your two-stage 150 MHz processor needed a delay slot for all access to results, it must not have used any bypassing. So are you telling us that adding some ("not much") bypass hardware sped it up by 20%? That seems counterintuitive. > Number two: 16-bit architecture. Addition actually takes a long time > using ripple-carry in an FPGA, and there's reall no way around it. Actually, I've implemented wide carry-select adders using carry lookahead in the Spartan 3. So there is a "way around it", but it probably won't help for a 16-bit adder. EricArticle: 102038
Joel wrote: > I recently finished my Masters Thesis on Algorithm Acceleration in > FPGA. Part of my research and experiementation was running some > algorithms on the PPC405 core in V2PRO. I used both ISE/EDK 7.1 and > EDK 8.1 (and then developing IP in the FPGA and attaching to PLB). I > got software profiling to work using PIT and I used PLM BRAM memory for > storing the profiling information. Initially I ran into a lot of > problems, but eventually got it to work on 7.1. I was using latest ISE > SP and EDK SP for 7.1. Most of my research was infact done on ISE/EDK > 7.1, and towards the end I repeated same experiements using EDK 8.1. > Profiling worked on EDK 8.1 also for PPC406 using -pg gcc and setting > PIT for software profiling in software platform settings. Thank you for your response. I finally got profiling working with edk 8.1. But since you said edk 7.1 worked, I went back and tried it again, and it worked too! I think I mixed up the compilers since I have edk 6.3 and edk 7.1 installed on the same computer. Alan NishiokaArticle: 102039
I must not have been very clear. The 180MHz version had no bypassing logic whatsoever. It had three delay slots. The 150MHz version did have bypassing logic, it had one delay slot. I read up on carry lookahead for the spartan 3, and you're correct, it wouldn't help for 16-bits. In fact, it's slower than just using the dedicated carry logic.Article: 102040
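Since wide carry-select adders came up a couple of posts back, here is a rough generic sketch of the idea (the block split and names are illustrative, not Eric's implementation; a tuned Spartan-3 version would map each block onto the dedicated carry chain): the upper half is computed twice, once per possible carry-in, and the lower half's carry-out picks the winner, roughly halving the longest ripple path at the cost of a duplicated adder and a mux.

// One-level carry-select adder sketch.
module csel_add #(
    parameter W  = 32,   // total width
    parameter LO = 16    // width of the lower (ripple) block
) (
    input  [W-1:0] a,
    input  [W-1:0] b,
    output [W-1:0] sum,
    output         cout
);
    wire [LO:0]   lo  = {1'b0, a[LO-1:0]} + {1'b0, b[LO-1:0]};
    wire [W-LO:0] hi0 = {1'b0, a[W-1:LO]} + {1'b0, b[W-1:LO]};          // assumes carry-in 0
    wire [W-LO:0] hi1 = {1'b0, a[W-1:LO]} + {1'b0, b[W-1:LO]} + 1'b1;   // assumes carry-in 1
    wire          sel = lo[LO];   // carry-out of the lower block

    assign sum  = { (sel ? hi1[W-LO-1:0] : hi0[W-LO-1:0]), lo[LO-1:0] };
    assign cout = sel ? hi1[W-LO] : hi0[W-LO];
endmodule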
Good point about the DLLs in the RAM itself. In this application (a digital scope) I do all writes in essentially one long burst and then go back and read the acquired waveforms. I could potentially pause while the DLLs relock at the new clock rate for the reads. The physical interface to the DDR RAM itself is presenting the problem. At 133MHz you really need to use the DQS strobes to latch the data. Unfortunately, in the Spartan-3 it's difficult to use the DQS as a latch signal, and as a result most Spartan-3 DDR designs only use the system clock for reads. -DCArticle: 102041
SongDragon wrote: > 1) device driver (let's say for linux 2.6.x) requests some (snip snip) > writes a zero to a register ("serviced descriptor"), telling the PCIe > device the interrupt has been fielded. > I have a number of questions regarding this. First and foremost, is > this view of the transaction correct? Is this actually "bus > mastering"? It seems like for PCIe, since there is no "bus", there is > no additional requirements to handle other devices "requesting" the > bus. So I shouldn't have to perform any bus arbitration (listen in to > see if any of the other INT pins are being triggered, etc). Is this > assumption correct? Your description of events is pretty much correct. The exact registers and sequencing will of course depend on your implementation of a DMA controller. You'll need a source register too unless the data is being supplied by a FIFO or I/O "pipe" on the device. "Bus mastering" is a PCI term and refers to the ability to initiate a PCI transfer - which also implies the capability to request the bus. In PCIe nomenclature, an entity that can initiate a transfer is referred to as a "requestor" and you're right, there's no arbitration involved as such. But this is the equivalent of a PCI bus master I suppose. The target of the request is called the "completer". This is where my knowledge of PCIe becomes thinner, as I'm currently in the process of ramping up for a PCIe project myself. But I have worked on several PCI projects so I think my foundations are valid. For example, using a (bus-mastering) PCI core you wouldn't have to 'worry about' requesting the bus etc - initiating a request via the back-end of the core would trigger that functionality in the core transparently for you. As far as your device is concerned, you have "exclusive" use of the bus - you may just have to wait a bit to get to use it (and you may get interrupted occasionally). Arbitration etc is not your problem. > In PCI Express, you have to specify a bunch of things in the TLP > header, including bus #, device #, function #, and tag. I'm not sure > what these values should be. If the CPU were requesting a MEMREAD32, > the values for these fields in the MEMREAD32_COMPLETION response > would would be set to the same values as were included in the > MEMREAD32. However, since the PCIe device is actually sending out a > MEMWRITE32 command, the values for these fields are not clear to me. This is where I'll have to defer to others... Regards, -- Mark McDougall, Engineer Virtual Logic Pty Ltd, <http://www.vl.com.au> 21-25 King St, Rockdale, 2216 Ph: +612-9599-3255 Fax: +612-9599-3266Article: 102042
Mark McDougall wrote: > SongDragon wrote: > >> 1) device driver (let's say for linux 2.6.x) requests some BTW if you're writing Linux device drivers as opposed to Windows drivers, you're in for a *much* easier ride! :) Regards, -- Mark McDougall, Engineer Virtual Logic Pty Ltd, <http://www.vl.com.au> 21-25 King St, Rockdale, 2216 Ph: +612-9599-3255 Fax: +612-9599-3266Article: 102043
DC wrote: > Good point about the DLLs in RAM itself. In this application (a > digital scope) I do all writes in essentially one long burst and then > go back and read the aquired waveforms. I could potentially pause > while the DLLs relock at the new clock rate for the reads. The > physical interface to the DDR RAM itself is presenting the problem. At > 133MHz you really need to use the DQS strobes to latch the data. > Unfortunately in the Spartan 3, its difficult to use the DQS as a latch > signal and as a result most Spartan 3 DDR designs only use the system > clock for reads. > > -DC By matching the DDR clock to the data, I'm convinced that the read interface is doable without using the DQS, though lane-to-lane skew matching has to tighten up to achieve that goal. Generating the DDR clock with the FPGA and routing a copy of that clock out to your memories and back will give you a matched round trip such that the clock copy can return to the FPGA at the same time as the read data would return. I started the design process and got great timing results but never got to where working silicon showed all the timings were precise. So I'm convinced it's straightforward; I'm just not certain of the numbers I was achieving.Article: 102044
> > The huge DRAM and the fast FPGA seem to make this board ideal for video > > and sound processing. But.. why on earth does the board come with a > > 3bit VGA output? We do not live in the 80ies anymore. Adding a couple > > of resistors to get 8 or 12bit color resolution would hardly have > > changed the BOM. Even a video DAC is not expensive. > > It would, however, have used up more IO pins. I don't know if that was > a consideration, but they do seem to share pins for the devices on the > SPI bus. > > Does anyone know if doing PWM on the VGA output pins would cause Bad > Things to happen to a typical monitor? In some ways this is very similar to dithering. I already used ordered dithering to improve the color resolution from 8 to 14 bits on another board. Unfortunately, 3-bit color resolution is extremely low to start with. One way out could be to use a PAL/NTSC encoder. Is there any free core out there?Article: 102045
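As a sketch of the ordered-dither idea mentioned above, here is what reducing an 8-bit internal channel value to the single bit per colour of a 3-bit VGA port can look like; the 4x4 Bayer matrix and the port names are illustrative, and the pixel/line counters are assumed to exist elsewhere in the design.

// Ordered (Bayer) dithering of one 8-bit colour channel down to 1 bit.
// The threshold depends on the low bits of the pixel position, so the
// average duty cycle of the output pin tracks the 8-bit value and the
// eye integrates it into intermediate shades.
module bayer_dither_1bit (
    input        clk,
    input  [7:0] value,    // internal channel intensity
    input  [1:0] px,       // low bits of the horizontal pixel counter
    input  [1:0] py,       // low bits of the vertical line counter
    output reg   out_bit   // drives one VGA colour pin
);
    reg [3:0] threshold;

    always @(*) begin
        case ({py, px})                       // classic 4x4 Bayer order
            4'd0 : threshold = 4'd0;  4'd1 : threshold = 4'd8;
            4'd2 : threshold = 4'd2;  4'd3 : threshold = 4'd10;
            4'd4 : threshold = 4'd12; 4'd5 : threshold = 4'd4;
            4'd6 : threshold = 4'd14; 4'd7 : threshold = 4'd6;
            4'd8 : threshold = 4'd3;  4'd9 : threshold = 4'd11;
            4'd10: threshold = 4'd1;  4'd11: threshold = 4'd9;
            4'd12: threshold = 4'd15; 4'd13: threshold = 4'd7;
            4'd14: threshold = 4'd13; 4'd15: threshold = 4'd5;
            default: threshold = 4'd0;
        endcase
    end

    always @(posedge clk)
        out_bit <= (value > {threshold, 4'b0000});
endmodule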
Luke wrote: > I must not have been very clear. The 180MHz version had no bypassing > logic whatsoever. It had three delay slots. The 150MHz version did > have bypassing logic, it had one delay slot. > > I read up on carry lookahead for the spartan 3, and you're correct, it > wouldn't help for 16-bits. In fact, it's slower than just using the > dedicated carry logic. I also used a 32b CSA design in the design prior to the one described above. It worked but was pretty expensive; IIRC it gave me 32b adds in the same cycle time as a 16b ripple add, but it used up 7 8b ripple add blocks and needed 2 extra pipe stages to put the CSA select results back together and combine that with the CC status logic. The 4-way MTA though just about took it without too much trouble, but it also needed hazard and forwarding logic. A funny thing I saw was that the doubling of adders added more C load to the pipeline FFs and those had to be duplicated as well, so the final cost of that 32b datapath was probably 2 or 3x bigger than a plain ripple datapath and much harder to floorplan the P/R. What really killed that design was an interlock mechanism designed to prevent the IFetch and IExec blocks from ever running code from the same thread at the same time; that 1 little path turned out to be 3 or 4x longer than the 32b add time and no amount of redesign could make it go away, all that trouble for nought. The lesson learned was that complex architecture with any interlocks usually gets hammered on these paths that don't show up till the major blocks are done. The final state of that design was around 65MHz when I thought I would hit 300MHz on the datapath, and the total logic was about 3x the current design. Not much was wasted though, much of the conceptual design got rescued in simpler form. In an ASIC this is much less of a problem since transistor logic is relatively much faster per clock freq than FPGAs; it would have been more like 20 gates, and of course the major CPU designers can throw bodies at such problems. I wonder how you will get the performance you want without finding an Achilles' heel till the later part is done. You have to finish the overall logic design before committing to design specific blocks, and it ends up taking multiple iterations. When I said 25MHz in the 1st post I meant that to reflect these sorts of critical paths that can't be foreseen till you're done, rather than the datapaths. That's why I went to the extreme MTA solution to abolish or severely limit almost all variables; make it look like a DSP engine and you can't fail. Curiously, how do you prototype the architecture: in cycle C, or go straight to HDL simulation? Anyway have fun John Jakson transputer guy the details are at wotug.org if interestedArticle: 102046
Hi all, I am working on a chip design where the frontend is interfaced to the PCI bus and the backend has asynchronous FIFOs and a UART. I load the data into the FIFO at 33 MHz and then read it at 40 MHz and pass it to the UART module to transmit the data out. The design works great in simulation, but it is giving me entirely different results when I check it through a logic analyzer on the actual chip. I don't know what I am doing wrong. Please advise. Thank you SandeepArticle: 102047
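For the asynchronous FIFO crossing described above (33 MHz writes, 40 MHz reads), here is a generic sketch, not the poster's code, of how a write pointer is usually carried into the read-clock domain: the binary pointer is gray-coded so only one bit changes per increment, then passed through a two-flop synchronizer. Behaviour that is clean in simulation but flaky in hardware is a common symptom when this kind of crossing is missing.

// Write-pointer transfer into the read-clock domain of a dual-clock FIFO.
module wptr_sync #(
    parameter AW = 4                    // FIFO address width
) (
    input             wclk,             // e.g. 33 MHz PCI-side write clock
    input             rclk,             // e.g. 40 MHz UART-side read clock
    input             rst,
    input             winc,             // push strobe
    output reg [AW:0] wptr_bin,         // addresses the FIFO memory
    output     [AW:0] wptr_gray_rclk    // safe to compare in the read domain
);
    reg [AW:0] wptr_gray;
    reg [AW:0] sync1, sync2;

    wire [AW:0] bin_next  = wptr_bin + (winc ? 1'b1 : 1'b0);
    wire [AW:0] gray_next = (bin_next >> 1) ^ bin_next;    // binary -> gray

    always @(posedge wclk or posedge rst)
        if (rst) begin wptr_bin <= 0; wptr_gray <= 0; end
        else     begin wptr_bin <= bin_next; wptr_gray <= gray_next; end

    // Two-flop synchronizer: at most one gray bit is in flight per edge.
    always @(posedge rclk or posedge rst)
        if (rst) begin sync1 <= 0; sync2 <= 0; end
        else     begin sync1 <= wptr_gray; sync2 <= sync1; end

    assign wptr_gray_rclk = sync2;
endmodule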
Luke wrote: > I actually did build a CPU for pure MHz speed. It was a super fast > dual-issue cpu, but in order to get the high-clock rates, I had to make > some serious trade-offs. > > Number one tradeoff, the execution stage is two stages and there is at > least one delay slot after every instruction before the result can be > used. This version runs at 150MHz. I have another version with not > much bypass hardware that makes it up to 180MHz. But with three delay > slots and only 8 registers per issue slot, scheduling becomes a major > issue. > > Number two: 16-bit architecture. Addition actually takes a long time > using ripple-carry in an FPGA, and there's reall no way around it. > 16-bit is pretty easy to add, so that's what it gets. It's also 16-bit > to cut down on utilization. > > Number three: Some arithmetic instructions are split in two. For > example, shift instructions and 32-bit addition is split into two > instructions. I simply could not afford the logic and delay of doing > these with one instruction. > > Number four: 16 bit addressing. Same deal with addition, it takes too > long, and i don't want to extend the delay slots any further, so I have > 16 bit addressing only. Also, instruction sizes were 16 bits to cut > down on logic and keep things "simple". > > So besides being a total pain in the butt to schedule and program, it > really is rocket-fast. It is at it's very worst 2 times faster than a > 32-bit pipleined processor i designed, and at it's best, it is 10 times > faster. With decent scheduling and a superb compiler or hand coding, > it should be able to sustain 5-8 times faster. > > The other advantage is that I could put 12 of them on a spartan 3 1000. > Theoretically, I could get the performance of a really crappy modern > computer with these things. > > And now I come back to reality. It's such a specialized system, and > the memory architecture, ISA and all around whole system is a mess. > Yes, it's super fast, but so what? Well, to some users, that is important. > I would be so much better off just > designing custom pipelined logic to do something rather than this gimp > of a cpu. > > So that's why I'm designing a "modern" processor. It's a general > purpose CPU that could run real software such as Linux. It's that much > more useful ;) Sounds more like a microprocessor, whereas the first one is more like a microcontroller. There is room for both, so don't throw the first one away! With a small/nimble core, you have the option to deploy more than one, and in an FPGA, that's where soft-cpu's can run rings around other solutions. How much Ram/Registers could the 16 bit one access ? -jgArticle: 102048
In article <1147153450.028603.66700@u72g2000cwu.googlegroups.com>, JJ <johnjakson@gmail.com> wrote: > >Phil Tomson wrote: >> In article <1146981253.226901.102660@i39g2000cwa.googlegroups.com>, >> JJ <johnjakson@gmail.com> wrote: >> >I always hated that the PCI cores were so heavily priced compared to >> >the FPGA they might go into. The pricing seemed to reflect the value >> >they once added to ASICs some 10 or 15 years ago and not the potential >> >of really low cost low volume applications. A $100 FPGA in small vol >> >applications doesn't support $20K IP for a few $ worth of fabric it >> >uses. It might be a bargain compared to the cost of rolling your own >> >though, just as buying an FPGA is a real bargain compared to rolling my >> >own FPGA/ASIC too. >> >> That's why OpenCores is so important. (http://opencores.org) As FPGAs >> become cheaper we're going to need an open source ecosystem of cores. >> They've got a PCI bridge design at Open cores, for example. >> >> BTW: it would also be nice to have an open source ecosystem of FPGA >> design tools... but that's a bit tougher at this point. >> >> Phil > >Yes but open source and closed source are also like oil and water esp >together in a commercial environment. If I were doing commercial work I >doubt I'd ever use opencores but I might peek at it for an >understanding of how it might be done or ask someone else to. On a >hobbyist level, What's the hesitation? > well I have mixed feelings about gpl too. There are many more open source licenses besides gpl, though gpl is pretty commonly used. > I suspect the >software world does far better with it since enough people support the >gpl movement and there is a large user base for it. Hardware ultimately >can't be made for free so it can't be the same model. > Hardware itself cannot be made for free, however various cores (such as a PCI bridge that sparked this) can be created for free as it's pretty much the same process as software development: code it up in synthesizable HDL, simulate it to make sure it does what you want, synthesize it and try it out in an FPGA. Computers aren't free either, but there is plenty of open source software being created to run on them. PhilArticle: 102049
I have a fairly large Altera-based design that will soon be updated to Cyclone II and Quartus (from Flex10K and Max+II). Has anyone else been through this migration who would be willing to share any gotchas? Is the migration tool in Quartus worthwhile? Thanks, Keith