Site Home Archive Home FAQ Home How to search the Archive How to Navigate the Archive
Compare FPGA features and resources
Threads starting:
Authors:A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Kolja Sulimma wrote:
> Bevan Weiss wrote:
>> Getting single cycle high speed multipliers is a very challenging
>> prospect, and one which much research is still ongoing.
> Actually, if you cannot do full custom circuit optimizations
> (e.g. because you do standard cell design or because you are using
> LUTs in an FPGA) swapping wires is the only possible structural
> optimization. All other multiplier transformations can be reduced to swaps.
>
> An extremely nice property of swapping wires is, that it can be done
> after placement. This is such a huge advantage that we were able to beat
> sophisticated multiplier generators with a simple greedy algorithm when
> applying it after placement:
> http://eis.eit.uni-kl.de/eis/research/publications/papers/iccd04.pdf
>

I was referring to custom design, not the use of standard cells or FPGAs. It is certainly obvious that if you can't design your cells from scratch then you're just arranging the cells that you have available. I'm not sure if it can be reduced to swapping wires however, though certainly in FPGAs where the entire logic design is already laid out and the only configuration possible is via routing changes then this is the case.

Article: 90826
Antti,

Thanks for all your input. I have done as you suspected and downloaded an evaluation version of ChipScope, and now I have one more data point.

ChipScope Pro cannot find the JTAG chain, but it can find the USB box. It reports that a connection has been made to Platform USB and then reports:

ERROR: Failed detecting JTAG chain.

So, at this point I am VERY suspicious of a communication issue between USB and JTAG. Does anyone know if there's some way to snoop the communication inside the Platform Cable?

Article: 90827
If your .dat file is simply the binary data, convert it straight to .hex; there is no need for .bit. Get the .hex format from Wotsit's Format, and write a half-page of C to convert (maybe even less, depending on exactly what your .dat is).

Subhasri krishnan wrote:
> Hi all,
> I have this xess board which has a tool to initialise an SDRAM with
> data. I have 16-bit numbers that I want to load into the SDRAM. The
> tool needs .hex/.mcs format and I read that I need the promgen to
> convert from .bit to .hex.
>
> My question is how to convert from this .dat file (which is my input
> data file) to the .bit file. Does anyone know of such a utility?
> Otherwise can someone explain what the .bit file contains?
>
> Thanks
>

Article: 90828
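For the direct .dat-to-.hex conversion suggested in the post above, here is a minimal sketch in Python (rather than the half-page of C mentioned). It assumes the .dat file is raw binary loaded at address 0. The function names are made up for illustration, and the use of type-04 (extended linear address) records for data beyond 64K is an assumption: check which address record type your loader expects.

```python
def hex_record(rec_type, address, payload=b""):
    """Build one Intel HEX record of the form :LLAAAATT<data>CC."""
    body = bytes([len(payload), (address >> 8) & 0xFF, address & 0xFF, rec_type]) + payload
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + body.hex().upper() + f"{checksum:02X}"


def to_intel_hex(data, bytes_per_record=16):
    """Return the Intel HEX lines for a bytes object loaded at address 0."""
    if 65536 % bytes_per_record:
        # Keeps every data record inside a single 64K page.
        raise ValueError("bytes_per_record must divide 65536")
    lines = []
    current_page = 0
    for offset in range(0, len(data), bytes_per_record):
        page = offset >> 16
        if page != current_page:
            # Type 04 (assumed here): upper 16 address bits for following records.
            lines.append(hex_record(0x04, 0, page.to_bytes(2, "big")))
            current_page = page
        chunk = data[offset:offset + bytes_per_record]
        lines.append(hex_record(0x00, offset & 0xFFFF, chunk))
    lines.append(hex_record(0x01, 0))  # end-of-file record
    return lines
```

To use it, read the file with open("data.dat", "rb").read() (the file name is a placeholder) and write the returned lines out one per line. If the 16-bit numbers are stored as text rather than binary, they would first need packing into bytes in whatever byte order the SDRAM loader expects.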
>I don't know how they do it this days, but I do know that with a
>whole shitpot load of adders, you could do it in n propagation delays,
>where n is the width of whichever operand you arrange to come in
>sideways. I almost drew a schematic. You have a set of adders as
>wide as operand "A", and its inputs are operand "A" and the "latest"
>partial product - and its outputs go to another bank of adders whose
>other inputs are either "A" again or 0, and so on - the other operand,
>"B", would be presented down the side of the array, deciding which
>partial products get added to and which don't. The LSB, of course,
>gets sent out as "product", and the carry is the MSB of the next
>partial product. They form a parallelogram.

Wouldn't it go faster (log N) if you used a tree rather than a long skinny chain?

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.

Article: 90829
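To put rough numbers on the chain-versus-tree question in the post above, here is a small Python sketch that sums the partial products of a multiply both ways and counts adder levels on the critical path. It is an idealized model: it ignores carry propagation inside each adder and all routing delay, and the function names are only illustrative.

```python
def partial_products(a, b, width):
    """One shifted copy of a (or 0) per bit of b, as in a shift-and-add array."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def sum_chain(terms):
    """Add terms one after another; returns (sum, adder levels in series)."""
    acc, levels = terms[0], 0
    for term in terms[1:]:
        acc += term
        levels += 1
    return acc, levels


def sum_tree(terms):
    """Add terms pairwise in a balanced tree; returns (sum, adder levels in series)."""
    levels = 0
    while len(terms) > 1:
        paired = [terms[i] + terms[i + 1] for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            paired.append(terms[-1])  # odd term passes through to the next level
        terms = paired
        levels += 1
    return terms[0], levels
```

For a 16-bit operand the chain has 15 adders in series and the tree has 4, which is the N versus log N difference being asked about. In a real FPGA the tree's wider adders and longer wires eat into that advantage.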
"Jim Granville" <no.spam@designtools.co.nz> wrote in message news:43583dc8$1@clear.net.nz... > langwadt@ieee.org wrote: >> Tim Wescott skrev: >> >> snip >> >>>The original question was for an under-$2 DSP chip capable of doing >>>audio frequency stuff, including FFTs. I'm not the fellow who asked; it >>>just sparked a tangential thought in my head about why there isn't some >>>intermediate step on the way to a full-speed DSP. >>> >> >> >> I never have to buy stuff so I don't know anything about prices, but >> philips recently announced a couple of 70MHz ARM7TDMIs in the 2$ >> range, it's not DSPs but at 70MHz and one cycle per 8bits of >> 32*32->64bit multiply it'll do some dsp > > TI have just volume-released their 100MHz FLASH controllers, start at sub $5, > so not quite a $2 target, but these have FLASH(not ROM) and include 12 bit > 6Msps ADCs, a 150ps resolution PWM, and CAN bus > > 150ps PWM is a challenge even for FPGA .... > > http://focus.ti.com/docs/pr/pressrelease.jhtml?prelId=sc05231 > > The sub $2 Philips devices have quite low code sizes, but they could do some > 'audio frequency stuff'... Wavefront (formerly Alesis Semiconductor) has an audio-specific DSP. It is <$4 for 50 MIPS. http://www.wavefrontsemi.com/index.php?id=11,13,0,0,1,0 Also, if you open up most any DSP-based Behringer product or many of the cheap DSP-based stomp-boxes, you will find an obsolete 24-bit TI DSP that apparently was never sold in the US. I don't know what they cost, but it must not be much given that a lot of that gear retails for <$100. Unfortunately, you and I can't obtain those parts, AFAIK.Article: 90830
Hi,

Sure, that is exactly what I'm doing (using the IOBUF). I just explained the structure of the IOBUF, as it looks like one OBUF and one IBUF connected together. The problem with the IOBUF is the translation error "ERROR:924" (which made me try various other combinations). I don't even understand that error, and there are no Xilinx answers associated with it.

CMOS

Article: 90831
But I just can do that because I am building this to imitate the ARM pipeline.

Article: 90832
I'm not familiar with ARM cores, so I can't help you emulate their architecture. My advice would be to read datasheets for ARM cores, if they're available, but they're probably not. It seems to me that if they gave away their architecture (the world knew it) and they already "give away" their instruction set they would have no product anymore. I suppose their implementation is probably pretty good, but still architecture is a major part of design.

My question for John though is what do you mean by a threaded architecture? I don't see how adding a second core will make the first one run twice as fast. It seems to me each "thread" needs to be independent enough to have its own pipeline. If each has its own pipeline, register forwarding and hazard logic are going to be needed. If you didn't have those I suppose you could just bubble the pipeline, but that seems pretty wasteful.

The big problem I see with multi-core FPGA based processors is that it's very easy to be memory bound in an FPGA. Fetch from an SDRAM is only so fast. I know you can put several of them in parallel to improve performance and I suppose that would do it, but the limits are definitely close without some good caching schemes. Unfortunately, it seems that associative caches are very expensive to implement in an FPGA.

-Arlen

Article: 90833
On Tue, 18 Oct 2005 14:13:54 -0400, Ray Andraka wrote:

> John McCluskey wrote:
>
>> In principle, anything you can parse in
>>VHDL is fair game, although in practice, I've found the file IO a little
>>fragile, especially when dealing with access types. Read the XST
>>documentation to see how it's done.
>>
>>
> Cool! I didn't know that they had actually implemented it. Now if
> Synplify would follow suit, it would
> surely get filtered into the other FPGA tools over time. The write
> seemed like a logical extension, although
> I'm not sure how useful it is beyond writing a serial number or key to a
> file at compile time. Hmm, it
> may be a (albeit kludgey) way to pass a propagation delay or latency
> back up to a higher level in the design hierarchy.

Alas, I've thought of this already, and it will only work for iterated synthesis. If you think about it for a second, you'll realize that the language semantics force elaboration from the top of the hierarchy down to the bottom. Generics have to be fully calculated before the subcomponents can be elaborated. If you have a generic parameter which depends on a value created during elaboration of the subcomponent, then a dependency loop will be created. This is not necessarily a bad thing, but the programmer will have to be aware of it, and make sure to stop the iteration of the elaboration of the subcomponent (at some point).

I'd really like to work out a design flow where this is possible, but right now, this will require scripting a control structure to iteratively call the synthesis tool to compile the subcomponents with specified generic parameters. I think that right now, XST doesn't accept top level generics as command line arguments :-(

Bottom line: It's really tough to write code that explicitly supports multiple topologies that can automatically be explored at compile time to close timing. VHDL needs to be extended to support design space exploration, as well as to support physical timing feedback into the elaboration control structures. It'll probably take the rest of the decade to get something going with Accellera along these lines.

John McCluskey

Article: 90834
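The scripted control structure described in the post above amounts to a fixed-point iteration: feed a generic in, read back the value the subcomponent reports, and stop when the two agree. A minimal Python sketch of that loop follows. The elaborate callable is purely hypothetical: it stands in for "run the synthesis tool on the subcomponent with this generic and read back the latency it wrote to a file", and no real tool invocation is shown.

```python
def iterate_to_fixed_point(elaborate, initial, max_iterations=10):
    """Re-run elaboration until the reported value matches the generic fed in.

    elaborate: hypothetical callable taking the generic value and returning
               the value reported by the elaborated subcomponent.
    """
    value = initial
    for _ in range(max_iterations):
        reported = elaborate(value)
        if reported == value:
            return value  # dependency loop has settled
        value = reported
    # The post's warning: someone has to stop the iteration at some point.
    raise RuntimeError("elaboration did not converge")
```

The iteration cap is the part that matters: nothing guarantees the loop settles, so the script has to give up explicitly instead of re-running synthesis forever.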
The real problem here is that the H & P comp arch bible, and most books that repeat the same material, don't teach anyone today how to do anything that doesn't look like a DLX, so it's tough if you have to figure it out yourself. This is esp. true for MTA design, a much overlooked technique.

Now MTA (multi-threaded architecture) isn't even new; it goes back to the 50s of the previous century (the one that starts with 1). The idea is really simple and very familiar to DSP people, who do a lot of transposition between parallel & serial DSP, bit-wise v. word-wise.

I will elaborate a simple design that works well for me at 300MHz in V2Pro and still uses fewer than 500 FF/LUT sites (not the 1000 typically needed for 32b work), though it is not complete in some opcode decodes.

I use a single Blockram to hold 4 sets of state, 128 by 32-bit words each. That is further split for each thread, half for register file and half for ICache. The RFile therefore gives 64 regs, and the ICache or queue is 128 16-bit opcodes (or 64 32b opcodes or 256 8b opcodes). Please don't ever do 8b opcodes!

The primary controller is a 3-bit counter counting through 8 states. b0 is used for odd/even for each instruction slot; b1,2 are used to distinguish which thread is in effect. The odd/even bit lets me do 32-bit math over 2 clocks, 16b at a time. It also lets me get 2 operand reads and a later write back paired with an early opcode fetch in 2 cycles, so it's 4-way ported. These reads, writes and Ifetch are for 3 different threads though. All Blockram accesses are 32b wide.

The datapath takes 32b every other cycle for x,y inputs and 5 clocks later returns the 32b z result to Blockram on the opposite phase. On the same phase, another opcode pair is fetched.

The big bang is that the design clocks at the limit of the Blockram, or a 16b add, or 3 LUTs of logic, which is about 2x faster than the usual 32b flat single-thread pipelines. The usual instruction decision logic that is often crammed into 1 pipeline now straddles 8 pipelines, so very little logic is needed between pipes.

Now thread i+0 reads data operands in clock t0 but writes results back at t5, and later, in the next opcode for that thread, reads operands in t8, same for t16 & so on.

Thread i+0 uses t0,t5, t8,t13 etc for reg reads & writes
Thread i+1 uses t1,t6, t9,t14 etc
Thread i+2 uses t2,t7, t10,t15 etc
Thread i+3 uses t3,t8, t11,t16 etc

So all threads stay out of each other's way: no interlocks, no forwarding, no hazards, no branch prediction, but 4 thread states.

I missed out a lot of detail; hey, you have to figure this out on your own nickel if you want this sort of design. The cond codes, PC and other cpu state regs will exist 4 times; these can use SRL16s, a DP RAM or a barrel wheel of 4 states moving on mostly 1 phase.

The ARM is a problem, period: you tend to get chased or sued if you get anything done, esp. if there is any intent to give away or resell. I don't think it is that great anyway; copying any cpu designed for VLSI into FPGA leaves a bad taste. Instead use your own opcode set and look at Jan Gray's site for Lcc hints on porting a compiler etc.

As for associative caches, doing things the regular way with 1 or 2 way set assoc is very expensive; instead I use hashing, and that makes things look very associative. I also expect to use RLDRAM, but that's another story. One nice thing about 4-way and esp. 2-phase design is that every opcode takes 8 * 1 or sometimes 2,3 actual cycles. RLDRAM can clock at 300MHz and has a latency of 8 cycles per threaded bank. So my DRAM is faster than my min opcode sequence for load/store, so I don't need a DCache. The ICache is there to help the much more predictable I flow but isn't really associative, since it's just a queue of opcodes near the PC value. All 4 threads over many processor copies see their own private DRAM shared in 1 device.

As for multi core, this design is intended to be replicated a few times to combine with 1 MMU dispensing RLDRAM bandwidth amongst 4N threads. Since there is no memory wall, each thread compares with a scalar x86 at 2GHz/8/4, so 8 PEs come out about the same. Deal with 4N threads and no cache misses, or deal with a broken serial model that dare not miss any cache.

The SDRAM is not actually too slow; it is only 2-3x slower than RLDRAM as latency goes. The problem is that it has no concurrency, so only 1 bank is in flight v. 8, so RLDRAM gets 20x more work done. Threaded DRAM goes with threaded processor.

Think I said enough for now.

John

Article: 90835
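The register-file timing listed in the post above (thread i+0 at t0,t5,t8,t13 and so on) follows one rule: each thread reads operands every 8 clocks, offset by its thread number, and writes back 5 clocks after each read. A short Python sketch reproduces that table from the rule. The constant and function names are illustrative only, and this models nothing beyond the slot numbers given in the post.

```python
PERIOD = 8       # clocks between successive opcodes of one thread
WRITE_DELAY = 5  # clocks from operand read to result write-back
THREADS = 4


def register_slots(thread, opcodes):
    """Clock numbers at which one thread touches the register file."""
    slots = []
    for n in range(opcodes):
        read = thread + PERIOD * n
        slots.extend([read, read + WRITE_DELAY])
    return slots


def phases_are_distinct():
    """True if no two threads read in the same slot and no two write in the same slot."""
    reads = {t % PERIOD for t in range(THREADS)}
    writes = {(t + WRITE_DELAY) % PERIOD for t in range(THREADS)}
    return len(reads) == THREADS and len(writes) == THREADS


for thread in range(THREADS):
    print(f"Thread i+{thread}:", register_slots(thread, 2))
```

Note that t8 shows up both as a write for thread i+3 and as a read for thread i+0. The check above only confirms that reads never collide with reads and writes never collide with writes; whether a same-cycle read and write land on separate Blockram ports is not stated in the post.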
Subhasri krishnan wrote:
> Hi all,
> I have this xess board which has a tool to initialise an SDRAM with
> data. I have 16-bit numbers that I want to load into the SDRAM. The
> tool needs .hex/.mcs format and I read that I need the promgen to
> convert from .bit to .hex.
>
> My question is how to convert from this .dat file (which is my input
> data file) to the .bit file. Does anyone know of such a utility?
> Otherwise can someone explain what the .bit file contains?
>
> Thanks

PROMGEN is only used to convert Xilinx bitstreams into a .hex/.mcs format that can be loaded into a flash device. Since you have data (and not a bitstream), it would take less effort to convert it directly to .hex/.mcs rather than trying to convert it to the .bit format and then use PROMGEN. Here is the spec for the .hex/.mcs file format: http://www.xess.com/faq/intelhex.pdf. If your data file is "large" (i.e. more than 64K), then you will have to insert page address records into the .hex/.mcs file since its standard data record is limited to a 16-bit address.

It may be easier to convert your data to the .xes format that is also supported by the GXSLOAD/XSLOAD tool. Here is a description of the .xes format:

The .xes file formats are simple. Each line is a data record. Each data record is structured as follows:

- An initial letter indicates the length of the starting address for the data:
    '-' indicates a 16-bit address is used
    '=' indicates a 24-bit address is used
    '+' indicates a 32-bit address is used

- Next, a two-digit hexadecimal number indicates the number of bytes in the data record, N.

- Next, the starting address for the data is given as a 16, 24 or 32-bit hexadecimal number.

- The remainder of the record is composed of N two-digit hexadecimal numbers for the data.

- There is no checksum.

Here are some example data records in the XES-16, XES-24 and XES-32 file formats:

- 10 0000 83 2C 4F 88 F2 2B B3 39 7E 1F 15 63 46 5E FB 89
= 10 000010 C4 A5 C4 C7 D2 26 A0 50 58 EA 85 66 9B C9 EE DE
+ 10 00000020 DD AC C2 94 63 5B 33 D3 6A 76 FA 20 36 F5 BC 68

Article: 90836
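The .xes record layout described above is simple enough to generate in a few lines. The following Python sketch builds one record from a start address and a run of bytes. The function name is made up, and the single-space separation between fields is inferred from the example records rather than from any formal spec.

```python
PREFIX = {16: "-", 24: "=", 32: "+"}  # address width in bits -> record prefix


def xes_record(address, payload, address_bits=16):
    """Format one .xes data record: prefix, byte count, address, data bytes."""
    fields = [
        PREFIX[address_bits],
        f"{len(payload):02X}",                # N, number of data bytes
        f"{address:0{address_bits // 4}X}",   # 4, 6 or 8 hex digits
    ]
    fields.extend(f"{byte:02X}" for byte in payload)
    return " ".join(fields)  # no checksum in this format
```

For 16-bit numbers, each value has to be split into two bytes before being passed in. Which byte comes first depends on how the SDRAM loader maps bytes to words, which the description above does not say.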
Dear all,

I have a question on how to get the maximum clock frequency of real hardware. I am using XST and ISE 6.3.

Consider the following data, obtained from RTL synthesis in XST:

-------
Minimum period: 6.608ns (Maximum Frequency: 151.332MHz)
Minimum input arrival time before clock: 4.990ns
Maximum output required time after clock: 3.442ns
Maximum combinational path delay: No path found
-------

The problem is that this information is just an estimate, so I am trying to get the numbers after Place and Route.

What I am doing is putting the following constraint in the UCF file:

-----
TIMESPEC "TS_clk" = PERIOD "clk" 6.608ns HIGH 50 %;
-----

Is this the right way to get the max. frequency? If not, let us know some rule (of thumb)... :)

Thank you in advance

Article: 90837
The DSP is a TMS57002, which as of now is not obsolete. It's a 24-bit fixed-point DSP which is sold in the US and is also used by Line6, Zoom and others. Behringer is using it in older designs, but recently I have seen a lot of the Motorola 56364 and very powerful SHARC processors in their products. I also heard they designed their own DSPs, which are used in the current stomp boxes.

Article: 90838
Pasacco wrote:
> dear all
>
> I have a question on how to get maximum clock frequency of real
> hardware. i am using XST and ISE6.3
>
> Consider following data is obtained from RTL synthesis in XST
>
> -------
> Minimum period: 6.608ns (Maximum Frequency: 151.332MHz)
> Minimum input arrival time before clock: 4.990ns
> Maximum output required time after clock: 3.442ns
> Maximum combinational path delay: No path found
> -------
>
> Problem is that those information are just an estimation. So I am
> trying to getting information after Place and Route.
>
> What I am doing is to put following constraint in UCF file
>
> -----
> TIMESPEC "TS_clk" = PERIOD "clk" 6.608ns HIGH 50 %;
> -----
>
> Is it a right way to get Max. frequency ?

Close. Before the above line, you probably need to add

NET "clk" TNM_NET = "clk";

Have fun,

Marc

Article: 90839
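The period in the TIMESPEC and the frequency in the XST report are two views of the same number, f = 1/T. A small Python helper, with names chosen only for illustration, makes the conversion explicit, which is handy when trying other PERIOD values to see what Place and Route can actually meet.

```python
def max_freq_mhz(period_ns):
    """Clock frequency in MHz for a given period in ns (f = 1/T)."""
    return 1000.0 / period_ns


def period_ns(freq_mhz):
    """Clock period in ns for a given frequency in MHz."""
    return 1000.0 / freq_mhz


# The XST estimate quoted above: 6.608 ns period.
print(f"{max_freq_mhz(6.608):.3f} MHz")
```

The post-route figure comes from the timing report after PAR, not from this arithmetic. The helper only translates between the period written in the UCF and the frequency the tools print.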
>> 150ps PWM is a challenge even for FPGA ....

Seems reasonable to me. Use the DCM clock shifter to get a fraction of a clock cycle. 10ns/256 is about 39 ps.

I can't quite understand the fine print well enough to work out a design on the fly. Maybe Peter will take it as a challenge.

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.

Article: 90840
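The arithmetic behind the DCM suggestion in the post above is just the clock period divided by the number of phase-shift steps. A tiny Python sketch follows, taking the period/256 granularity from the post as given. The function name is illustrative, and the model ignores any limit set by the DCM's actual delay-tap size.

```python
def phase_step_ps(clock_period_ns, steps=256):
    """Size of one phase-shift step in picoseconds, assuming period/steps granularity."""
    return clock_period_ns * 1000.0 / steps


step = phase_step_ps(10.0)  # 10 ns clock, as in the post
print(f"one step = {step} ps, steps per 150 ps = {150.0 / step:.2f}")
```

On these numbers a 150 ps PWM resolution corresponds to a little under four phase-shift steps of a 100 MHz clock, which is why the post calls it reasonable. Whether the hardware's fine print allows it is a separate question.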
Hi Pete,

Check out the Xilinx Gigabit System Reference Design (GSRD) at http://www.xilinx.com/esp/wired/optical/xlnx_net/gsrd.htm

It uses the embedded Virtex4 TEMAC in an EDK project.

Paul

Pete wrote:
>
> Hello
>
> I want to do a little EDK design that uses the embedded Tri-mode Ethernet MAC
> (TEMAC) of the Virtex4 FX parts. EDK offers several options for Ethernet MAC
> type but they are all soft MACs. The embedded MAC is a major selling point
> for me because of the logic saved and because compiling the soft MACs takes
> a long time. I will be connecting to a 10/100 switch using the MII port.
>
> Is there a convenient way to incorporate the embedded MAC into an EDK
> project?
>
> Thank you for any suggestions.
>
> Pete Dudley

Article: 90841
Hi Lionel,

Sounds like you could use the new Xilinx PlanAhead floor planning tool: http://www.xilinx.com/ise/optional_prod/planahead.htm

Since you are a doctoral student, contact the Xilinx University Program (http://www.xilinx.com/univ/) to see if they could donate it to you. This looks to be exactly the sort of application that PlanAhead was developed for. I would be interested in hearing about the results here on c.a.f. should you go that route.

Good Luck!

Paul

Kolja Sulimma wrote:
>
> Mike Lewis wrote:
> >Lionel Damez wrote:
> > > Is it possible to do that with the EDK interface, or do I have to export
> > > my design to Project Navigator (ISE)?
> > The tool is telling you that 16 won't fit ... I doubt there is much you can
> > do other than reduce the number of instances of the processor.
> >
> > Mike
>
> - The logic for 16 microblazes fits.
> - A single microblaze is routable.
> - The communication between the processors is systolic.
>
> From that information I would say that floorplanning is very
> likely to yield a routable design.
> This means that you tell the placer beforehand in which
> region of the chip it should put each processor and the corresponding memory.
>
> You can not do that from EDK AFAIK.
>
> Kolja Sulimma

Article: 90842
The "Create and Import Peripheral Wizard" (chapter 4: http://www.xilinx.com/ise/embedded/est_rm.pdf) is a good starting point. Use the skeleton files it creates as a starting point for your own custom core. An example device driver to access the custom core from the processor is generated as well.

Paul

Athena wrote:
>
> Hi all,
>
> Thank you for all your suggestions.
>
> I have both ISE and EDK7.0.1. At first, I don't have any conception about how to connect my IP CPRE with the plb or opb bus. I just know one side is the ip core written in VHDL, and the other side should be some driver written in C.
>
> I have read the drivers of Uartlite and spi, but I still couldn't be clear about how to connect the two sides. How to get the .tcl files?
>
> The articles suggested by Kunal are very good. I am reading them at present. I will discuss my thinking with you later.
>
> Thanks to all of you! Athena

Article: 90843
Yes, I added it of course, as below.
I find a 5% - 15% performance degradation between before PAR and after PAR...

Wondering how to reduce the gap....
Thank you.

--------V2pro
NET "clk" LOC = "AJ15";
NET "clk" TNM_NET = "clk";
TIMESPEC "TS_clk" = PERIOD "clk" 6.608ns HIGH 50 %;
NET "rst" LOC = "AE10" ;
NET "out1" LOC = "AB4" ;
-----

Article: 90844
Pasacco wrote:
> yes i added of course as below.
> I find that 5% - 15% performance degradation before PAR and after
> PAR...
>
> Wondering how to reduce the gap....
> Thankyou.
>
>
> --------V2pro
> NET "clk" LOC = "AJ15";
> NET "clk" TNM_NET = "clk";
> TIMESPEC "TS_clk" = PERIOD "clk" 6.608ns HIGH 50 %;
> NET "rst" LOC = "AE10" ;
> NET "out1" LOC = "AB4" ;
> -----

With no information on what is failing timing, I can only make general suggestions:

There are effort levels in PAR - have you played with those?

Timing based mapping also makes a big difference.

If those don't get you anywhere, you may need to reduce levels of logic.

Good luck,

Marc

Article: 90845
I think you have to set the path: EDK -> Option -> Project Option -> HDL and simulation

> hi,,
>
> I am trying to generate simulation for a edk project. I download the newest MXE files from the xilinx.com and ran Generate Simulation HDL files.
>
> I get the following error msg
>
> C:\EDK\hw\XilinxProcessorIPLib\pcores\microblaze_v4_00_a\hdl\vhdl\microblaze_isa_be_pkg.vhd is distributed by Xilinx encrypted and will not be read by any simulator. Please use compedklib to setup the EDK precompiled libraries and provide the path to them using the -E switch.
>
> But when I try manual compilation using the compedklib, it says for ModelSim XE, use the precompiled libraries..
>
> Any suggestions to get around this?
>
> My setup is EDK7.1.2 + Modelsim Xe starter 6.0a
>
> Cheers Shakith

Article: 90846
On a sunny day (22 Oct 2005 03:16:49 -0700) it happened devb@xess.com wrote in <1129976209.609974.106780@g44g2000cwa.googlegroups.com>:

>- An initial letter indicates the length of the starting address for
>the
>data:
> '-' indicates a 16-bit address is used
> '=' indicates a 24-bit address is used
> '+' indicates a 32-bit address is used
>
>- Next, a two-digit hexadecimal number indicates the number of bytes in
>the data record, N.
>
>- Next, the starting address for the data is given as a 16, 24 or
>32-bit
>hexadecimal number.
>
>- The remainder of the record is composed of N two-digit hexadecimal
>numbers for the data.
>
>- There is no checksum.
>
>Here are some example data records in the XES-16, XES-24 and XES-32
>file
>formats:
>
>- 10 0000 83 2C 4F 88 F2 2B B3 39 7E 1F 15 63 46 5E FB 89
>= 10 000010 C4 A5 C4 C7 D2 26 A0 50 58 EA 85 66 9B C9 EE DE
>+ 10 00000020 DD AC C2 94 63 5B 33 D3 6A 76 FA 20 36 F5 BC 68

So, in that format the '-' '=' and '+' prefixes are redundant, as leading zeros in the address show how many bits the address space is.
Clever format!

Article: 90847
Hi all,

I'm looking for proper tools to write to a DG834GT modem. I've been using debricks soft, but for some reason I can't make it write; it will only read the content of the Broadcom chip. Can anybody help out here?

Article: 90848
I understand the basic idea. I can see how it solves a lot of problems because the time between cycles for an individual thread is long enough that you don't have to deal with forwarding or hazards or branch prediction or anything like that. Each thread is something of a multicycle architecture.

Unfortunately it seems that a multi-threaded architecture definitely needs a new programming paradigm. I don't think your standard C program would map well onto that. (If you were running 4 C programs, however, I could see it working quite well). But I suppose that is a different sort of problem to face.

Thanks for the info. I may very well look into an architecture like this at some point.

-Arlen

Article: 90849
Hi,

Please see the errata for that particular device. We had a similar problem, which was due to power supply sequencing: the 1.2 V supply should not be the last one to come up. This applies only to some particular batches of Spartan-3 chips from their new fab.