On Jun 24, 1:14 pm, "Amontec, Larry" <laurent.ga...@ANTI-SPAMamontec.com> wrote:
> Antti wrote:
> > On 21 Jun., 18:41, "Amontec, Larry" <laurent.ga...@ANTI-SPAMamontec.com> wrote:
> >> Antti wrote:
> >>> On 21 Jun., 17:47, cs_post...@hotmail.com wrote:
> >>>> On Jun 21, 9:54 am, Antti <Antti.Luk...@googlemail.com> wrote:
> >>>>> and too bad that the "source code" published is 100% useless, as all
> >>>>> the actual JTAG handling is hidden in a DLL and there is no source
> >>>>> code available for it.
> >>>>
> >>>> Strongly disagree.
> >>>>
> >>>> First you can use it as intended.
> >>>>
> >>>> Then you can use the functions in the provided header file to
> >>>> accomplish various other jtag tasks.
> >>>>
> >>>> And if you really want to understand it, well, you have a header file
> >>>> for Larry's DLL, and Larry's DLL calls the FTD2xx.dll. So you make up
> >>>> a fake version of the latter, and see what a given trial call to
> >>>> Larry's DLL produces in terms of FTD2xx operations... Yeah, reverse
> >>>> engineering, but simpler than reverse engineering the xilinx stuff,
> >>>> and people have done that!
> >>>
> >>> eh there is absolutely no sense to RE Larry's DLLs ;)
> >>> there is nothing magical to be found there.
> >>>
> >>> the "functions provided" did look like a primitive replacement for
> >>> something called "command line parameter passing" - but well I only
> >>> looked 2 minutes, maybe there is something more to see. But what I
> >>> did see looked useless. I would prefer to just run from a batch file
> >>> rather than use this customization API
> >>>
> >>> Antti
> >>
> >> Electronic is not Magic but Logic. Only Physic is Magic!
> >> True Random Number Generator is Magic and Physic but uses some Electronics!
> >>
> >> It is very simple to talk and think about True Random Numbers but you
> >> need more than 2 minutes to develop a True Random Number Generator!
> >>
> >> Bla - bla ... as your bla - bla Antti.
> >>
> >> Antti, you CANNOT take 2 minutes and then summarize with a "too bad"
> >> and a "100% useless".
> >>
> >> Laurent
> >
> > bla-bla, BTW how did you test that your DLL can handle an "infinite"
> > length chain?
> > if I make an SVF file that does a masked compare and has a single chain
> > length of 33 Gbit, would this then be executed ok? 33G is definitely
> > less than infinite I think?
> >
> > and if I look at 2 pages of source code, then 2 minutes can tell a lot
> > already
> >
> > Antti
>
> Our JTAG HAL (Amontec X Hardware Abstraction Layer) was designed for
> infinite SCANS ! And it is ...
>
> We can do a 33 Giga scan length via our AMTXHAL. Very easy.
> If you have a REAL application with 33 Giga bits please call me. And we
> will try !
>
> Yes, two minutes can tell a lot for me too, but this is not enough for
> publishing your comments as a "too bad" and a "100% useless".
>
> Laurent

Larry,

Easy - if my comments are too heavy, well, I am hard to please - and the
fact that I am interested in checking out your products and verifying the
claims made in your announcements should actually make you happy - it
should generate some more interest in your products.

Any extra publicity and public noise can eventually increase your sales.
Total silence and no replies are worse than hyper-critical comments.

Antti
Article: 121076
On Sun, 24 Jun 2007 12:12:25 -0700, Mike Treseler wrote:

>commiserate about the general problem of where to
>put cpu io registers/decoders that cover multiple modules.

can I join in please? :-)

>For now, I prefer to distribute a logical bus to
>all the modules with the following fields.
>
>address, writeData, write_stb, readData, read_stb

Which, interestingly, means that you are specifying
your own bus protocol - albeit a (quite properly) very
simple one - specialised to the needs of device registers.

>I infer registers and decoders in the module
>that uses them, and keep track of address
>and bit allocations using global constants.

Indeed. But how, pray, do you multiplex the various
readback data values? Do you like to imply tri-state
drivers, and hope that your synthesis tools will map
the tri-states (across many instances) to muxes? Or
do you, as I generally do, take care to ensure that
all the data outputs are held to zero whenever they
are not being read, and then OR them all together
somehow? On one recent project I got so fed up with
this problem that I arranged for each readable register
to have both input and output ports for read-data:

    ...
      if (someone is trying to read from me) then
        my_register_read_data <= my_register_contents;
      else
        my_register_read_data <= (others=>'0');
      end if;
    end process;
    read_data_out <= read_data_in or my_register_read_data;
  end architecture;

Thus I get a long ripple-OR structure on the readback data,
and the tools seem to do quite a good job of turning that
into an OR tree. Address decoding is now localised to each
register, and I don't need any global readback mux block.

The one solution that I really DON'T like is to
describe the muxes explicitly - that usually requires
me to express the address decoding information in at
least two places: at the register itself, for write
enabling; and in the (horrid) global mux, for readback.
And the mux block is extremely nasty to extend when you
add some more registers.

Any better ideas, anyone?
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
Article: 121077
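The ripple-OR readback scheme in the post above (each register drives zero unless it is being read, and ORs its contribution into the data arriving from the previous register) can be checked with a small behavioural model. This is only a Python sketch, not the poster's VHDL: the `Register` class, its fields, the `read_chain` helper and the example addresses and contents are all hypothetical, and the data bus is modelled as plain integers.

```python
from dataclasses import dataclass


@dataclass
class Register:
    """Hypothetical stand-in for one readable register in the chain."""
    address: int
    contents: int

    def read_data_out(self, address: int, read_stb: bool, read_data_in: int) -> int:
        # Contribute our contents only when we are the one being read;
        # otherwise contribute all zeros.
        mine = self.contents if (read_stb and address == self.address) else 0
        # OR our contribution into whatever arrived from the previous stage.
        return read_data_in | mine


def read_chain(registers, address, read_stb=True):
    """Ripple the read data through every register, starting from zero."""
    data = 0
    for reg in registers:
        data = reg.read_data_out(address, read_stb, data)
    return data


# Example addresses and contents are made up for illustration.
regs = [Register(0x10, 0xDEAD), Register(0x14, 0xBEEF), Register(0x18, 0x1234)]
print(hex(read_chain(regs, 0x14)))
```

Because at most one register matches a given address, the OR never mixes two values; an unmapped address simply reads back as zero.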
Dear B79,

<bwilson79@gmail.com> wrote in message
news:1182750507.391910.307350@o11g2000prd.googlegroups.com...
> I'm currently working on a design where there isn't an available GC
> pin to bring a clock in (bad design, not mine).
>
What's bad about that?
>
> It's an 80MHz single-ended clock, but the design requires an 80MHz
> and 120MHz clock, so I
> clocks wrt the input. This design has an internal loopback mode in
> which case the skew on these clocks should not matter because they are
> no longer being used to clock in/out IOB data. Even in this mode,
> intermittent failures are seen, which leads me to believe it's not a
> skew issue, but rather just an intermittently dirty clock.
>
What sort of failures do you intermittently have? Describe what happens.

Cheers, Syms.

p.s. Are you the bwilson out of the Beach Boys? You were very good last
year at the Bridge School Benefit! I expect you'll be adding "Surfin' FPGA"
to your repertoire.
Article: 121078
OK, thanks for your response. I hoped for a reply sooner; is this ng always dead on weekends?

I found this zoom bug, and now I have managed to place an "and" gate. What do I do next? Do I now compile the project?
Article: 121079
We do an Ethernet module for a Raggedstone1 development board. The only
problem is that we are out of stock of the Ethernet PHY module. We expect
these back in about 4 weeks. Raggedstone1 details are here:
http://www.enterpoint.co.uk/moelbryn/raggedstone1.html

We also have other modules coming that will offer some high speed link
capability to other boards. These will be announced as soon as we get
them built and tested.

John Adair
Enterpoint Ltd.

On 24 Jun, 18:10, PFC <l...@peufeu.com> wrote:
> Hi guys.
> First post to this newsgroup, let's pop the champagne ;)
>
> I need your advice on selecting an FPGA module for my needs.
>
> Anyway. I need to stream data from a PC to a device, and back. Around
> 60-80 Mbps. I want to use Ethernet, 100 Mbps should do the job.
> I need an FPGA somewhere too, to format the data, handle the IO, and do
> some processing/FIR filtering.
>
> For now this will have two applications : streaming many channels of high
> bandwidth audio (24/192), and instrumentation (DSO). More later.
>
> I have a working prototype with the Atmark Techno Suzaku FPGA module. It
> handles the full 100 Mbps bandwidth (barely) but not in full duplex.
> However I do not want to use this module in the final design : it isn't
> really suited.
>
> After experimenting with the Suzaku which has a MAC chip, and an Atmel
> NGW100 which has no FPGA but a decent CPU/MAC with smart DMA I have come
> to the following conclusions :
>
> - I'd like to use the OpenCores MAC core to drive a PHY and DMA the data
>   to a fast memory buffer.
> - The FPGA will then forward the buffered data to the IO channels in due
>   time
> - I need a CPU, but it will not touch the data, only parse the packet
>   headers (I use UDP which is nice for this application).
> - CPU will also handle other stuff like LCD and GPIO
> - Microblaze is OK
> - No OS will be used
>
> So, I'd need an FPGA module with the following :
>
> - 100BaseT Ethernet PHY connected to FPGA
> - FPGA with enough gates to instantiate OpenCores MAC core, Microblaze,
>   some BRAM buffers, some FIR filters, etc.
> - 1MB fast memory (SDRAM or SRAM) where I can put my buffers and also my
>   executable code. More is also good !
> - Should be small, and not radiate EMI like a radio station
> - Max cost $200, cheaper is better obviously !
> - It should work ;)
>
> I'd like something like this :
>
> http://www.ixxat.de/powerlink_module_en,18116,5873.html
>
> But no one sells it apparently, and where are the docs ? I dunno.
> I could also use a module with a hard CPU if there is a fast access (ie.
> bus) from CPU to FPGA.
> I know the Xilinx tools, but I am not a Xilinx fanatic...
>
> So, what is your advice ?
> I'd rather buy a module instead of having to build it...
Article: 121080
Hi,

You are of course copying the default project that comes on the CD back
to the FPGA? DE1_USB_API.pof/sof? (Or whatever Altera call it on their
CD; I bought mine direct from TerASIC.)

The control panel will "see" the USB port, but will only "talk" to the
default project (or something that emulates it).

Red

"mitshek" <noal@ajkl.com> wrote in message
news:e3Afi.8490$c06.4187@newssvr22.news.prodigy.net...
> I'm using Altera's "Cyclone II Starter Kit", and while the board seems to
> work fine, I can't figure out one of the bundled utility programs.
>
> I installed the Cyclone II Starter Kit CD on my Windows/XP machine. I'm
> able to use Quartus II 7.1 (web) to compile and then download projects to
> the board -- That works fine.
>
> However, I cannot get the "Control Panel - Starter II Kit" application to
> work. I launch it, and the application window comes up. It appears to work
> fine -- I can 'open USB' connection to my board, then click on the various
> tabs (sram, sdram, led, etc.) and toggle the controls. However, when I
> click 'set' on anything, nothing happens. I see the blue-light on my Kit
> board flicker momentarily, so it's getting some kind of traffic from the
> PC. But otherwise, nothing happens.
>
> For example, I tried the 'SRAM' tab, and wrote several unique values
> (0x0123, 0x4567) to the SRAM at address 0x00 and 0x01. But, when I read
> them back, they always come back 0x0000. The Control Panel's built-in
> self-test also fails on the sdram, sram, and flash.
>
> What am I doing wrong? I tried toggling the RUN/PROG switch (SW12) back
> and forth, then cycling power to the board. With either position, Control
> Panel applet still doesn't work.
>
Article: 121081
<hitsx@hit.edu.cn> wrote in message
news:1182739659.854107.45200@e9g2000prf.googlegroups.com...
>I posted a topic several days ago, and here is the link.
> http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/ee8cd744f6c3c10b?tvc=2#
>
> The total computation is described below:
> integer add     2442 Giga operations
> float add        814 Giga operations
> float subtract  2424 Giga operations
> float multiply  1610 Giga operations
> float divide     809 Giga operations
>
> And I need these operations done in 1 ~ 3 minutes, so what kind of
> FPGA is needed? And
> should I use multiple FPGAs to finish the computation?

Well, it depends on the amount of data to be processed, and whether it is
possible to pipeline the computation.

Could you describe the algorithm, without going into too many details?

-Michael.
Article: 121082
> I'm working with the xilinx coregen cic v3.0. I'm finding that to get a
> decent rejection in the images (60 dB) I need about 4 stages. My input is
> only 10 bit and I still end up with a 66 bit output, 50 of which are thrown
> away. As a result my design won't fit in my device.

If they are truly thrown away, that shouldn't be the reason you aren't
fitting into the device.

> 1. My coregen says it doesn't support V4 for the cic so I've been compiling
> for V2. Seems like the DSP48 with the large accumulator is ideal for CICs?

Why do you need DSP48s? Isn't the whole point of a CIC that it doesn't
use multiplies?

> 2. Looks like the exponential bit growth is from the number of stages.
> Since no one uses more than 16 bits at the output why can't the output of the
> first integrator be trimmed back to 16 bit before feeding the next and so
> on?

Aren't all the integrators cascaded together, then followed by all the
combs?

Cheers,
Jon
Article: 121083
On Jun 25, 12:34, Marc Randolph <m...@my-deja.com> wrote:
> Howdy Perry,
> Unfortunately your description doesn't really provide enough info for
> people to help you. You said that you looked at some signals
> with a scope, but you didn't describe what you saw in detail. Solving
> problems like this is ALL about detail:
> 1. What voltage swing (and offset from ground) did you see?

As seen on the scope, the voltages are not very stable. The voltage
swing ranges from about 800 mV to 1.2 V. However, I am not sure I have
answered this question correctly, as I am not yet familiar with using
oscilloscopes.

> 2. Is your .ucf file correct (pin numbers and voltage type set
> correctly for both input and outputs signals?)

Yes, I have checked that.

> 3. Exactly how do you have signals connected in the design (posting
> the HDL for your clock tree is the best way to have that checked)

Here is the instance of the DCM, copied from the .mhs file of the EDK
project. The input frequency is 100MHz, and clk_200mhz_s is the
problematic output clock.

BEGIN dcm_module
 PARAMETER INSTANCE = dcm_0
 PARAMETER HW_VER = 1.00.c
 PARAMETER C_CLK0_BUF = TRUE
 PARAMETER C_CLK2X_BUF = TRUE
 PARAMETER C_CLKDV_BUF = TRUE
 PARAMETER C_CLKDV_DIVIDE = 2.0
 PARAMETER C_CLKFX_BUF = TRUE
 PARAMETER C_CLKFX_DIVIDE = 1
 PARAMETER C_CLKFX_MULTIPLY = 3
 PARAMETER C_CLKIN_PERIOD = 10.000000
 PARAMETER C_CLK_FEEDBACK = 1X
 PARAMETER C_DFS_FREQUENCY_MODE = HIGH
 PARAMETER C_DLL_FREQUENCY_MODE = LOW
 PARAMETER C_EXT_RESET_HIGH = 1
 PORT CLKIN = dcm_clk_s
 PORT CLK0 = sys_clk_s
 PORT CLK2X = clk_200mhz_s
 PORT CLKFB = sys_clk_s
 PORT RST = net_gnd
 PORT LOCKED = dcm_0_lock
 PORT CLKFX = proc_clk_s
 PORT CLKDV = dcm_0_CLKDV
END

> Good luck,
> Marc

Thanks again for your kind concern :-)
Article: 121084
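As a cross-check on the DCM parameters quoted in the post above, the nominal output frequencies can be worked out from the usual DCM relationships (CLK2X = 2 x CLKIN, CLKFX = CLKIN x MULTIPLY / DIVIDE, CLKDV = CLKIN / CLKDV_DIVIDE). The sketch below is Python arithmetic only; the function name is hypothetical, and it says nothing about whether each frequency is legal for the selected DLL/DFS frequency modes, which has to be checked against the device data sheet.

```python
def dcm_outputs_mhz(clkin_mhz, clkfx_multiply, clkfx_divide, clkdv_divide):
    """Nominal DCM output frequencies in MHz for the given settings."""
    return {
        "CLK0": clkin_mhz,                                   # same as the input
        "CLK2X": 2 * clkin_mhz,                              # doubled clock
        "CLKFX": clkin_mhz * clkfx_multiply / clkfx_divide,  # synthesised clock
        "CLKDV": clkin_mhz / clkdv_divide,                   # divided clock
    }


# Values taken from the .mhs fragment: 100 MHz in (C_CLKIN_PERIOD = 10 ns),
# C_CLKFX_MULTIPLY = 3, C_CLKFX_DIVIDE = 1, C_CLKDV_DIVIDE = 2.0.
outs = dcm_outputs_mhz(100.0, 3, 1, 2.0)
for name, mhz in outs.items():
    print(f"{name}: {mhz} MHz")
```

With these settings the model gives 200 MHz on CLK2X (clk_200mhz_s), 300 MHz on CLKFX (proc_clk_s) and 50 MHz on CLKDV.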
"Jonathan Bromley" <jonathan.bromley@MYCOMPANY.com> wrote in message news:20tu73do02v645fm8aoo57oqlltq225kh8@4ax.com... > On Sun, 24 Jun 2007 12:12:25 -0700, > Mike Treseler wrote: > >>commiserate about the general problem of where to >>put cpu io registers/decoders that cover multiple modules. > > can I join in please? :-) > Me too...:-) > > The one solution that I really DON'T like is to > describe the muxes explicitly - that usually requires > me to express the address decoding information in at > least two places: at the register itself, for write > enabling; and in the (horrid) global mux, for readback. > And the mux block is extremely nasty to extend when you > add some more registers. > > Any better ideas, anyone? > For readback, how about muxing (a case statement on the address) within each module that registers appear in, but include the 'Bromley zero' on the way out of the module so that you can 'or' all the module outputs together at the 'top level'. That way you get to utilise the purpose built muxes that come 'for free' in the CLBs, but keep the simplicity of the 'or' gate at the 'top level'. (BTW, I would hope that a good synthesis tool would build the 'or' with a carry chain.) Or maybe that's what you're suggesting. Cheers, Syms.Article: 121085
I'm currently developing a design for the XUP development board. The
development software is Xilinx EDK 8.2.

The system requires several frequencies:

Power PC : 100 MHz
PLB      : 50 MHz
User IP  : 50 MHz, 2.5 MHz

The EDK uses DCM_0 to divide the 100 MHz by 2. I use 2 cascaded DCMs to
generate the 2.5 MHz (first divides by 2, second divides by 10).

But when I want to generate the bitstream, the following messages and
errors occur:

INFO:NgdBuild:889 - Pad net 'plb_bram_if_cntlr_1_port_BRAM_Clk' is not
   connected to an external port in this design. A new port
   'plb_bram_if_cntlr_1_port_BRAM_Clk' has been added and is connected to
   this signal.
INFO:NgdBuild:889 - Pad net
   'board1_unit_0/board1_unit_0/USER_LOGIC_I/eth_mac_if_3_1/CLK_DIV/clkdv_dcm1'
   is not connected to an external port in this design. A new port
   'board1_unit_0/board1_unit_0/USER_LOGIC_I/eth_mac_if_3_1/CLK_DIV/clkdv_dcm1'
   has been added and is connected to this signal.

Applying constraints in "xup_morpheus5.ucf" to the design...

Checking timing specifications ...
INFO:XdmHelpers:851 - TNM "sys_clk_pin", used in period specification
   "TS_sys_clk_pin", was traced into DCM instance
   "dcm_0/dcm_0/Using_Virtex.DCM_INST". The following new TNM groups and
   period specifications were generated at the DCM output(s):
   CLK2X: TS_dcm_0_dcm_0_CLK2X_BUF=PERIOD dcm_0_dcm_0_CLK2X_BUF
   TS_sys_clk_pin/2 HIGH 50%
   CLKDV: TS_dcm_0_dcm_0_CLKDV_BUF=PERIOD dcm_0_dcm_0_CLKDV_BUF
   TS_sys_clk_pin*2 HIGH 50%
INFO:XdmHelpers:851 - TNM "dcm_0_dcm_0_CLKDV_BUF", used in period
   specification "TS_dcm_0_dcm_0_CLKDV_BUF", was traced into DCM instance
   "board1_unit_0/board1_unit_0/USER_LOGIC_I/eth_mac_if_3_1/CLK_DIV/dcm_10mbit_I/DCM_INST".
   The following new TNM groups and period specifications were generated at
   the DCM output(s):
   CLKDV:
   TS_board1_unit_0_board1_unit_0_USER_LOGIC_I_eth_mac_if_3_1_CLK_DIV_dcm_10mbit_I_CLKDV_BUF=PERIOD
   board1_unit_0_board1_unit_0_USER_LOGIC_I_eth_mac_if_3_1_CLK_DIV_dcm_10mbit_I_CLKDV_BUF
   TS_dcm_0_dcm_0_CLKDV_BUF*2 HIGH 50%
INFO:XdmHelpers:851 - TNM
   "board1_unit_0_board1_unit_0_USER_LOGIC_I_eth_mac_if_3_1_CLK_DIV_dcm_10mbit_I_CLKDV_BUF",
   used in period specification
   "TS_board1_unit_0_board1_unit_0_USER_LOGIC_I_eth_mac_if_3_1_CLK_DIV_dcm_10mbit_I_CLKDV_BUF",
   was traced into DCM instance
   "board1_unit_0/board1_unit_0/USER_LOGIC_I/eth_mac_if_3_1/CLK_DIV/dcm_10mbit_2_I/DCM_INST".
   The following new TNM groups and period specifications were generated at
   the DCM output(s):
   CLKDV:
   TS_board1_unit_0_board1_unit_0_USER_LOGIC_I_eth_mac_if_3_1_CLK_DIV_dcm_10mbit_2_I_CLKDV_BUF=PERIOD
   board1_unit_0_board1_unit_0_USER_LOGIC_I_eth_mac_if_3_1_CLK_DIV_dcm_10mbit_2_I_CLKDV_BUF
   TS_board1_unit_0_board1_unit_0_USER_LOGIC_I_eth_mac_if_3_1_CLK_DIV_dcm_10mbit_I_CLKDV_BUF*10
   HIGH 50%

ERROR:NgdBuild:455 - logical net 'plb_bram_if_cntlr_1_port_BRAM_Clk' has
   multiple driver(s):
     pin PAD on block
     plb_bram_if_cntlr_1_bram/plb_bram_if_cntlr_1_bram/plb_bram_if_cntlr_1_port_BRAM_Clk
     with type PAD,
     pin O on block dcm_0/dcm_0/Using_BUGF_for_CLKDV.CLKDV_BUFG_INST with
     type BUFG
ERROR:NgdBuild:924 - input pad net 'plb_bram_if_cntlr_1_port_BRAM_Clk' is
   driving non-buffer primitives:
     pin C on block reset_block/reset_block/core_cnt_en with type FD,
     pin C on block reset_block/reset_block/Bus_Struct_Reset_0 with type FD,
     pin C on block reset_block/reset_block/Rstc405resetchip with type FD,
     pin C on block reset_block/reset_block/Peripheral_Reset_0 with type FD,
     pin C on block reset_block/reset_block/Rstc405resetsys with type FD,
     pin C on block reset_block/reset_block/Core_Reset_Req_d3 with type FD,
     pin C on block reset_block/reset_block/CORE_RESET/q_int_0 with type FDRE,
     pin C on block reset_block/reset_block/CORE_RESET/q_int_1 with type FDRE,
     pin C on block reset_block/reset_block/CORE_RESET/q_int_2 with type FDRE,
     pin C on block reset_block/reset_block/CORE_RESET/q_int_3 with type FDRE,
     pin C on block reset_block/reset_block/SEQ/pr_dec_0 with type FDR,
     pin C on block reset_block/reset_block/SEQ/pr_dec_1 with type FDR,
     pin C on block reset_block/reset_block/SEQ/chip_dec_0 with type FDR,
     pin C on block reset_block/reset_block/SEQ/chip_dec_2 with type FD,
     pin C on block reset_block/reset_block/SEQ/pr_dec_2 with type FD,
     pin C on block reset_block/reset_block/SEQ/chip_dec_1 with type FDR,
     pin C on block reset_block/reset_block/SEQ/bsr_dec_0 with type FDR,
     pin C on block reset_block/reset_block/SEQ/bsr_dec_2 with type FD,
     pin C on block reset_block/reset_block/SEQ/seq_clr with type FDR,
     pin C on block reset_block/reset_block/SEQ/ris_edge with type FDR
ERROR:NgdBuild:455 - logical net
   'board1_unit_0/board1_unit_0/USER_LOGIC_I/eth_mac_if_3_1/CLK_DIV/clkdv_dcm1'
   has multiple driver(s):
     pin O on block
     board1_unit_0/board1_unit_0/USER_LOGIC_I/eth_mac_if_3_1/CLK_DIV/dcm_10mbit_I/CLKDV_BUFG_INST
     with type BUFG,
     pin PAD on block
     board1_unit_0/board1_unit_0/USER_LOGIC_I/eth_mac_if_3_1/CLK_DIV/clkdv_dcm1
     with type PAD
ERROR:NgdBuild:924 - input pad net
   'board1_unit_0/board1_unit_0/USER_LOGIC_I/eth_mac_if_3_1/CLK_DIV/clkdv_dcm1'
   is driving non-buffer primitives:
     pin O on block
     board1_unit_0/board1_unit_0/USER_LOGIC_I/eth_mac_if_3_1/CLK_DIV/dcm_10mbit_I/CLKDV_BUFG_INST
     with type BUFG

Does anybody know this problem? I did not apply any changes to the
PLB_BRAM_IF_CNTL. Do I have to specify the new clock lines in one of the
EDK files?

Thanks in advance

Sebastian Goller
Article: 121086
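The derived period constraints in the log above chain multiplicatively (TS_sys_clk_pin*2, then *2, then *10), which can be followed with a few lines of Python. This is a sketch only: it assumes that `sys_clk_pin` is the 100 MHz clock mentioned in the post (a 10 ns period), which the log itself does not state, and the helper name is hypothetical.

```python
def derived_periods_ns(base_period_ns, factors):
    """Apply each PERIOD multiplier in turn, returning every period in the chain."""
    periods = [base_period_ns]
    for factor in factors:
        periods.append(periods[-1] * factor)
    return periods


# Assumed 10 ns base period (100 MHz); multipliers read from the log:
# dcm_0 CLKDV (*2), dcm_10mbit_I CLKDV (*2), dcm_10mbit_2_I CLKDV (*10).
periods = derived_periods_ns(10.0, [2, 2, 10])
freqs_mhz = [1000.0 / p for p in periods]
for p, f in zip(periods, freqs_mhz):
    print(f"{p} ns -> {f} MHz")
```

Under that assumption the chain is 100 MHz, 50 MHz, 25 MHz and 2.5 MHz, which matches the frequencies described in the post.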
Thank you both. For others who had a memory lapse like me - it cannot be
done interactively, so make sure you do 'quit -sim'.
Scope of visibility can be narrowed down by doing log -r /tb/dut/module1/*, etc.
Article: 121087
Jonathan Bromley wrote:
> Mike Treseler wrote:
>> commiserate about the general problem of where to
>> put cpu io registers/decoders that cover multiple modules.
>
> can I join in please? :-)
>
>> For now, I prefer to distribute a logical bus to
>> all the modules with the following fields.
>>
>> address, writeData, write_stb, readData, read_stb
>
> Which, interestingly, means that you are specifying
> your own bus protocol - albeit a (quite properly) very
> simple one - specialised to the needs of device registers.

Yes.

>> I infer registers and decoders in the module
>> that uses them, and keep track of address
>> and bit allocations using global constants.
>
> Indeed. But how, pray, do you multiplex the various
> readback data values?

A big mux on the CPU interface.
Only the module interface looks clean.

> Do you like to imply tri-state
> drivers, and hope that your synthesis tools will map
> the tri-states (across many instances) to muxes?

No. Readable code has priority over LUTs for me.

> Or
> do you, as I generally do, take care to ensure that
> all the data outputs are held to zero whenever they
> are not being read, and then OR them all together
> somehow?

Great idea. Thanks.
Andy has posted here about mutual exclusion
and I could never quite follow it.
Maybe this is what he was on about.

> On one recent project I got so fed up with
> this problem that I arranged for each readable register
> to have both input and output ports for read-data:
> ...
>       if (someone is trying to read from me) then
>         my_register_read_data <= my_register_contents;
>       else
>         my_register_read_data <= (others=>'0');
>       end if;
>     end process;
>     read_data_out <= read_data_in or my_register_read_data;
>   end architecture;

> Thus I get a long ripple-OR structure on the readback data,
> and the tools seem to do quite a good job of turning that
> into an OR tree. Address decoding is now localised to each
> register, and I don't need any global readback mux block.

I'll give that a try. Thanks.

-- Mike Treseler
Article: 121088
On 23 Jun., 19:30, "cpope" <cep...@nc.rr.com> wrote:

> 3. If the cic is just a box car filter wouldn't it be easier to implement as
> a single subtractor/accumulator whose inputs are the current sample and the
> sample delayed by R? At least for reasonable R (< 8192) seems like it should
> fit in block ram okay.

It's all here:
http://www.phptr.com/articles/article.asp?p=361985&rl=1

If you sum up R values, you have a gain of R, independently of your
implementation.
If you do that k times, you have a gain of R^k.

The CIC implementation has exactly the same cost as the boxcar minus
the RAM.
So indeed, if you can afford the RAM you can use the boxcar.

Kolja Sulimma
Article: 121089
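The single subtractor/accumulator boxcar discussed above, and its gain of R, can be illustrated with a short behavioural model. This is a Python sketch rather than anything from the thread: the function name is hypothetical, and the `deque` stands in for the R-deep delay line that would live in block RAM.

```python
from collections import deque


def boxcar_running_sum(samples, length):
    """Length-R moving sum computed as y[n] = y[n-1] + x[n] - x[n-R]."""
    delay = deque([0] * length, maxlen=length)  # models the R-sample delay line
    acc = 0
    out = []
    for x in samples:
        oldest = delay[0]      # the sample from R steps ago (zero at start-up)
        delay.append(x)        # pushing the new sample drops the oldest one
        acc += x - oldest      # one add and one subtract per input sample
        out.append(acc)
    return out


# A constant input of 1 settles to R, i.e. the DC gain of one stage is R.
print(boxcar_running_sum([1] * 6, 4))
```

Cascading k such stages multiplies the DC gains, which is where the R^k figure in the post comes from.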
"hitsx@hit.edu.cn" <hitsx@hit.edu.cn> wrote: >I have post an topic serveral days ago, and there is the link. >http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/ee8cd744f6c3c10b?tvc=2# > >The total computation is described below: >integer add 2442 Giga operations >float add 814 Giga operations >float substract 2424 Giga operations >float muliply 1610 Giga operations >float divide 809 Giga operations > >And I need these operations done in 1 ~ 3 minutes, so what kind of >FPGA is needed? And >should I use multiple FPGAs to finish the computation? I think a fast PC can do this easely. 2.5G operations in 3 minutes is 14M operations per second. -- Reply to nico@nctdevpuntnl (punt=.) Bedrijven en winkels vindt U op www.adresboekje.nlArticle: 121090
[On multiplexing the readback values from numerous addressable
registers, without the HDL code becoming a dog's dinner]

Mike's and Symon's responses got me thinking some more
(a rare occurrence these days) and I came up with a
couple of ideas that are probably well-known to half
the population of comp.arch.fpga but are new to me.

Idea 1: Wide, extensible readback mux.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am pretty sure that, for FPGA architectures at least,
it's more efficient to do a wide OR than a wide MUX.
In other words, as I suggested earlier, make sure that
all deselected registers jam their data outputs to zero,
and then OR together all the data outputs.

Here's a rather neat solution to the wide OR gate - much
nicer than the ripple thing I suggested earlier. It
depends on the use of an unconstrained array port of
record types, so you'll need to check it works with
your chosen synthesis tool.

  -- Step 1: Package to define some types and constants.
  -- We define a record "T_gated_databus" to reflect the
  -- readback data coming out of a register. The "data"
  -- element is, of course, the data; "enable" is a
  -- single-bit enable signal that's asserted when the
  -- register is addressed.
  --
  library ieee;
  use ieee.std_logic_1164.all;
  package P_databus is
    --
    constant databus_width : positive := 32;
    subtype T_databus is
      std_logic_vector(databus_width-1 downto 0);
    --
    type T_gated_databus is record
      enable : std_logic;
      data   : T_databus;
    end record;
    constant unused_databus: T_gated_databus :=
      ( enable => '0'
      , data => (others => '0')
      );
    --
    type A_gated_databus is
      array(natural range <>) of T_gated_databus;
  end;

  -- Step 2: Make an arbitrarily wide OR structure using
  -- an unconstrained array input port. Feed it as many
  -- T_gated_databus records as you have registers.
  -- At most one of those will have its enable asserted
  -- at any given time. The output y.enable is asserted
  -- when one of the input enables is asserted.
  --
  library ieee;
  use ieee.std_logic_1164.all;
  use work.P_databus.all;
  entity radialmux is
    port
      ( d: in  A_gated_databus
      ; y: out T_gated_databus
      );
  end;
  architecture RTL of radialmux is
  begin
    process (d)
      variable vy: T_gated_databus;
    begin
      vy := unused_databus;
      for i in d'range loop
        if d(i).enable = '1' then
          vy.enable := '1';
          vy.data := vy.data or d(i).data;
        end if;
      end loop;
      -- drive the output once, after every input has been merged
      y <= vy;
    end process;
  end;

  -- Step 3: Build your registers.
  -- Each register decodes its own address
  -- (you've configured its address with a generic,
  -- I hope). The readback part of each register
  -- works a bit like this...
  --
  entity SOME_REGISTER is
    port (...
      readback: out T_gated_databus);
  ....
    process (address, read_enable, register_contents)
    begin
      if (address = MY_ADDRESS) and (read_enable = '1') then
        readback <= (enable => '1', data => register_contents);
      else
        readback <= (enable => '0', data => (others => '-'));
      end if;
  ...

Idea 2: How to use that in a bigger design
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, here's the payoff. You distribute address, write enable,
write data, read enable to all your registers. Each register
has its own output, of type T_gated_databus. Now you plug
those outputs into the unconstrained input port of your
readback mux entity:

  signal regA_output, regB_output, ... : T_gated_databus;
  signal CPU_readback: T_gated_databus;
  ...
  Readback_Mux: entity work.radialmux
    port map
      ( d(1) => regA_output
      , d(2) => regB_output
      , y => CPU_readback
      );

Addresses are applied once only, as generics on the instances
of register-containing entities. If you add another
register-containing entity, you simply add another signal to
the top level architecture and bolt it in to the port map
of Readback_Mux, which then grows wider to suit the extended
port map. You don't even need the numbering to be contiguous:
I tried this...

  Readback_Mux: entity work.radialmux
    port map
      ( d(0 to 45) => (others => unused_databus)
      , d(46) => regA_output
      , d(47 to 62) => (others => unused_databus)
      , d(63) => regB_output
      , y => CPU_readback
      );

So you can make the port subscripts match up with your
register addresses, if it makes you feel better. Because
"unused_databus" is an all-zero constant, synthesis
optimises away the zero inputs.

Your mileage may vary, but I think this shows promise. The
synth tool I tried made a really excellent job of this,
using a tree of LUTs in the obvious optimal way. It didn't,
though, use carry chains - sorry Symon!
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
Article: 121091
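The behaviour of the gated-databus readback mux described above can be explored with a small Python model before committing to the VHDL. This is only a sketch: the names mirror the post (`GatedDatabus` for T_gated_databus, `radial_mux` for radialmux), but `register_readback`, `build_port_map`, `cpu_read` and the register contents are hypothetical, and data is modelled as plain integers rather than std_logic_vector.

```python
from typing import Dict, List, NamedTuple


class GatedDatabus(NamedTuple):
    enable: bool
    data: int


UNUSED_DATABUS = GatedDatabus(enable=False, data=0)


def register_readback(my_address: int, contents: int,
                      address: int, read_enable: bool) -> GatedDatabus:
    """Each register decodes its own address and gates its own data."""
    if read_enable and address == my_address:
        return GatedDatabus(True, contents)
    return UNUSED_DATABUS


def radial_mux(d: List[GatedDatabus]) -> GatedDatabus:
    """OR together the data of every enabled input; enable is the OR of enables."""
    enable, data = False, 0
    for item in d:
        if item.enable:
            enable = True
            data |= item.data
    return GatedDatabus(enable, data)


def build_port_map(slots: int, connected: Dict[int, GatedDatabus]) -> List[GatedDatabus]:
    """Sparse port map: unconnected subscripts default to the all-zero constant."""
    d = [UNUSED_DATABUS] * slots
    for index, bus in connected.items():
        d[index] = bus
    return d


def cpu_read(address: int) -> GatedDatabus:
    # Two hypothetical registers at subscripts 46 and 63, as in the sparse example.
    reg_a = register_readback(46, 0xCAFE, address, True)
    reg_b = register_readback(63, 0x00FF, address, True)
    return radial_mux(build_port_map(64, {46: reg_a, 63: reg_b}))


print(cpu_read(46), cpu_read(63), cpu_read(5))
```

Adding a register in this model means adding one entry to the dictionary, which mirrors adding one association to the port map.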
"comp.arch.fpga" <ksulimma@googlemail.com> wrote in message news:1182783858.357110.273650@p77g2000hsh.googlegroups.com... > On 23 Jun., 19:30, "cpope" <cep...@nc.rr.com> wrote: > > > 3. If the cic is just a box car filter wouldn't it be easier to implement as > > a single subtractor/accumulator whose inputs are the current sample and the > > sample delayed by R? At least for reasonable R (< 8192) seems like it should > > fit in block ram okay. > > It's all here: > http://www.phptr.com/articles/article.asp?p=361985&rl=1 > > If you sum up R values, you have a gain of R, independently of your > implementation. > If you do that k times, you have a gain of R^k. > > The CIC implementation has exactly the same cost as the boxcar minus > the RAM. > So indeed, if you can afford the RAM you can use the boxcar. > > Kolja Sulimma > Thanks, I had a colleague forward me to a similar article that included information on how to trim the bits between the integrator sectrions. I guess my point with the ram is in V4 the bram and dsp48 are designed to be efficiently integrated so it might be possible to just implement a cascade of N boxcar filters using just N dsp48/BRAM pairs rather than doing the very high bitwidth integrators in the fabric. Should be lower power and run faster. -ClarkArticle: 121092
Hello everybody, I am trying interface to the J5 expansion port of the Digilent XUP2VP board through EDK. I read a previous post in the group and tried adding the following to the .xbd file for the board: BEGIN IO_INTERFACE ATTRIBUTE IOTYPE = XIL_GPIO_V1 ATTRIBUTE_INSTANCE = Exp_Conn_J5 PARAMETER num_bits = 32, IO_IS=num_bits PARAMETER is_dual=0, IO_IS=is_dual PARAMETER bidir_data=0, IO_IS=is_bidir # bidir data pins PARAMETER all_inputs=1, IO_IS=all_inputs PORT J5_4 = CONN_J5_4, IO_IS = gpio_io[0] PORT J5_5 = CONN_J5_5, IO_IS = gpio_io[1] PORT J5_6 = CONN_J5_6, IO_IS = gpio_io[2] PORT J5_34 = CONN_J5_34, IO_IS = gpio_io[30] PORT J5_35 = CONN_J5_35, IO_IS = gpio_io[31] END and to the FPGA section : # Expansion connector J5 # PORT J5_4 = CONN_J5_4, UCF_NET_STRING=("LOC=N6") PORT J5_5 = CONN_J5_5, UCF_NET_STRING=("LOC=N5") PORT J5_6 = CONN_J5_6, UCF_NET_STRING=("LOC=L5") PORT J5_34 = CONN_J5_34, UCF_NET_STRING=("LOC=L4") PORT J5_35 = CONN_J5_35, UCF_NET_STRING=("LOC=M2") END After this BSB shows up a blank peripheral with no options of connecting to PLB/OPB bus. Finally when I try to build the xps project thru XPS it crashes. I was wondering if somebody could provide me some pointers on how to connect the expansion ports through EDK. Any help would be really appreciated. Thanks, KoustavArticle: 121093
"Jon Beniston" <jon@beniston.com> wrote in message news:1182773495.080303.224560@q69g2000hsb.googlegroups.com... > > I'm working with the xilinx corgen cic v3.0. I'm finding that to get a > > decent rejection in the images (60 dB) I need about 4 stages. My input is > > only 10 bit and I still end up with a 66 bit output, 50 of which are thrown > > away. As a result my design won't fit in my device. > > If they are truely thrown away, it shouldn't be the cause of why you > aren't fitting into the device. > I throw them away at the output, I'm not convinced that the compiler trims them all the way back through the integrators. In fact I'm pretty sure it doesn't because I found a colleague that had to implement his own CIC that uses significantly less resources then the coregen block because he was able to trim the widths of the integrator sections. > > 1. My coregen says it doesn't support V4 for the cic so I've been compiling > > for V2. Seems like the DSP48 with the large accumulator is ideal for CICs? > > Why do you need DSP48s? Isn't the whole point of a CIC that it doesn't > use multiplies? I have them available. Should be less power and faster speed than implementing a 48 bit accumulator in slices right? > > > 2. Looks like the exponential bit growith is from the number of stages. > > Since noone uses more than 16 bits at the output why can't the output of the > > first integrator be trimmed back to 16 bit before feeding the next and so > > on? > > Aren't all the integrators cascaded together, then followed by all the > combs? Yes. My point is the width of the integrators and combs don't seem to be optimized at all. For example, If I'm only using 16 bits at the output why would the combs need to be more than say 16+N*2 wide? My coregen sets them at 66. Similarly, the first integrator should only need input width plus log2(R) width, the second needs input width + 2*log2(R) and so on. And that's only if you really need full precision which I suspect you don't. 
At any rate the coregen doesn't seem to employ any of these optimizations? > > Cheers, > Jon > >Article: 121094
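As a worked check on the bit widths discussed in the post above: the usual full-precision bound for an N-stage CIC with rate change R and differential delay M is input width plus ceil(N*log2(R*M)) bits. The Python sketch below just evaluates that bound; the function names are invented for this example, and the rate R = 16384 is an assumption chosen because it reproduces the 66-bit figure (the thread never states R). It does not model the per-stage trimming the poster proposes.

```python
def cic_dc_gain(rate, stages, diff_delay=1):
    """DC gain of an N-stage CIC: (R*M)**N."""
    return (rate * diff_delay) ** stages


def cic_output_width(input_bits, stages, rate, diff_delay=1):
    """Full-precision width: B_in + ceil(N * log2(R*M)).

    ceil(log2(g)) is computed exactly with integers as (g - 1).bit_length().
    """
    gain = cic_dc_gain(rate, stages, diff_delay)
    return input_bits + (gain - 1).bit_length()


if __name__ == "__main__":
    # Assumed rate change; 10-bit input and 4 stages are from the thread.
    print(cic_output_width(input_bits=10, stages=4, rate=16384))
    for n in range(1, 5):
        print(n, cic_output_width(10, n, 16384))
```

Each added stage contributes another log2(R*M) bits, so with a large rate change the stage count dominates the register width, which matches the "bit growth is from the number of stages" observation.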
I have some hard-to-debug issues in my FPGA image processing project when I read the memory from the Virtex 4 chip at a 15 kHz rate (rather low). There are some undefined delays that are semi-random: sequential reading of the memory has delays after every 32 accesses. I suspect that the "after 10ns" constraint is not accurate enough. I use that constraint to make sure that the read memory clock is delayed relative to the clock of the process that generates the read address.
Article: 121095
"EEngineer" <maricic@gmail.com> wrote in message news:1182792188.946838.209360@p77g2000hsh.googlegroups.com... >I have some hard to debug issues in my FPGA image processing project: > when I read the memory from the Virtex 4 chip at 15KHz rate (rather > low). There are some undefined delays that are semi random: sequential > reading of the memory has delays after every 32 accesses. I suspect > that "after 10ns" constraint is not accurate enough. I use that > constraint to make sure that the read memory clock is delayed from the > process' clock that generates the read address. The only "after 10ns" style constraint I use is for "OFFSET IN AFTER 10 ns" which I don't actually use or "OFFSET OUT AFTER 10 ns" which I do. These are timing constarints to verify that your input setup times are within your needs or the clock-to-out times are acceptable. If you're trying to delay an output clock by 10 ns, your approach is incorrect. The output might be delayed through phase shifting of the Digital Clock Manager (DCM) but you have to pay attention to clock domain crossings. If you have a faster clock - such as a 100 MHz clock - you can delay an output by another 10 ns by just adding one more clock of delay into your generated signal. So - since "after 10ns" isn't an actual constraint - what are you trying to do? - John HandworkArticle: 121096
On Jun 1, 7:27 am, ashes....@gmail.com wrote:
> On Jun 1, 2:54 pm, ashes....@gmail.com wrote:
>
> > On May 22, 3:50 am, Ligeti <jlls...@gmail.com> wrote:
>
> > > Hello
> > > I get the same problem working with Spartan 3 and Virtex II Pro, it
> > > started when I was trying the ISE 9.1i Quick Start Tutorial (am new to
> > > ISE in general), when it comes to assigning the pins, PACE gives
> > > me this message:
> > > "PACE was unable to parse the HDL source file 'C:\...\counter.vhd' "
> > > and after that PACE shows this (whatever you call it):
>
> > > Loading device for application Rf_Device from file '3s200.nph' in
> > > environment C:\Xilinx91i.
> > > ERROR:HDLParsers:3562 - pepExtractor.prj line 1 Expecting 'vhdl' or
> > > 'verilog' keyword, found 'work'.
>
> > > I searched and searched, and the only result was this topic ... so I
> > > sent an email to mludwig hoping that he knows by now an answer for
> > > this, but he didn't answer me :-(
> > > So I am trying to refresh the topic ... That's all for the moment,
> > > thank you!
>
> > > note: sorry for my bad English.
>
> > The problem is in the pepExtractor.prj file that ISE generates before
> > calling PACE. I don't know what generates this file, but in a project
> > that is OK the file does not exist. If you delete it, ISE just
> > regenerates it. I am sure if you can fix the generation of this file
> > all problems will go away! The contents of this file look like:
>
> > work C:/Repository Working Copies/Link_Peak_and_Hold/
> > top_level_schematic.vhd
>
> > Notice the 'work' keyword it is complaining about at the start...
>
> OK HERE IS THE ANSWER ... IF THERE ARE SPACES IN THE DIRECTORY NAMES
> IN THE PATH THEN THIS PROBLEM OCCURS. Make sure all directory names
> right back to the root directory have no spaces ... sheesh, that took
> some working out!!! Thanks for all the useful information on the
> diagnostic code xilinx!!!

You are a genius. Thank you very much... I've spent the last 2 days pulling my hair out over that one. I was even about to throw my computer out the window at one stage... and I live on the 5th floor!!! You'd think Xilinx would have more information about such a problem... thanks again though :)
Article: 121097
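For anyone hitting the same PACE error, the fix described in the post above (no spaces in any directory name back to the root) is easy to check before creating a project. A small Python sketch, using the path quoted in the thread as the example; the helper name is made up, and it only inspects the path string, it does not touch ISE or the pepExtractor.prj file.

```python
from pathlib import PureWindowsPath


def path_parts_with_spaces(path):
    """Return every component of a Windows-style path that contains a space."""
    return [part for part in PureWindowsPath(path).parts if " " in part]


if __name__ == "__main__":
    project = "C:/Repository Working Copies/Link_Peak_and_Hold/top_level_schematic.vhd"
    offenders = path_parts_with_spaces(project)
    if offenders:
        print("Rename these before running PACE:", offenders)
    else:
        print("No spaces found in the path.")
```

PureWindowsPath only parses the string, so the check behaves the same on any operating system.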
On Jun 25, 1:47 pm, "John_H" <newsgr...@johnhandwork.com> wrote:
> "EEngineer" <mari...@gmail.com> wrote in message
>
> news:1182792188.946838.209360@p77g2000hsh.googlegroups.com...
>
> >I have some hard to debug issues in my FPGA image processing project:
> > when I read the memory from the Virtex 4 chip at 15KHz rate (rather
> > low). There are some undefined delays that are semi random: sequential
> > reading of the memory has delays after every 32 accesses. I suspect
> > that "after 10ns" constraint is not accurate enough. I use that
> > constraint to make sure that the read memory clock is delayed from the
> > process' clock that generates the read address.
>
> The only "after 10ns" style constraint I use is for "OFFSET IN AFTER 10 ns"
> which I don't actually use or "OFFSET OUT AFTER 10 ns" which I do. These
> are timing constraints to verify that your input setup times are within your
> needs or the clock-to-out times are acceptable.
>
> If you're trying to delay an output clock by 10 ns, your approach is
> incorrect.
>
> The output might be delayed through phase shifting of the Digital Clock
> Manager (DCM) but you have to pay attention to clock domain crossings. If
> you have a faster clock - such as a 100 MHz clock - you can delay an output
> by another 10 ns by just adding one more clock of delay into your generated
> signal.
>
> So - since "after 10ns" isn't an actual constraint - what are you trying to
> do?
>
> - John Handwork

I did not use the UCF constraints file for the delay. Here is the actual line of code I am using:

clock_rd <= clock after 20ns WHEN frame_done = '0' ELSE clock_dby8_logic;

This works fine for one part of my design but not for the other. A second issue may be the clock "clock_dby8_logic", which I generate in one of the processes. I may need to use the clock component (DCM) instead. Also, after the programming file has been generated there is a warning about a gated clock (because of the WHEN condition on the clock, I guess) which may cause problems. I am not sure if this is affecting my design.

Thanks,
Dan
Article: 121098
EEngineer wrote:
...
>>> I suspect
>>> that "after 10ns" constraint is not accurate enough.
...
> I did not use the ucf constraints file for the delay. Here is the
> actual line of code I am using:
> clock_rd <= clock after 20ns WHEN frame_done = '0' ELSE
> clock_dby8_logic;
>
> This works fine for one part of my design but doesn't for the other.

The "after 20ns" is ignored for synthesis.

 -- Mike Treseler
Article: 121099
On Jun 25, 4:08 pm, Mike Treseler <mike_trese...@comcast.net> wrote:
> EEngineer wrote:
>
> ...
>
> >>> I suspect
> >>> that "after 10ns" constraint is not accurate enough.
> ...
> > I did not use the ucf constraints file for the delay. Here is the
> > actual line of code I am using:
> > clock_rd <= clock after 20ns WHEN frame_done = '0' ELSE
> > clock_dby8_logic;
>
> > This works fine for one part of my design but doesn't for the other.
>
> The "after 20ns" is ignored for synthesis.
>
> -- Mike Treseler

Why is it ignored? It seems that it works fine in the other design.

Dan
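On the delay question in this thread: the "after" clause describes a simulation delay, so a synthesized design has to get its delay from real hardware, such as the extra clock of delay or the DCM phase shift that John_H suggested earlier. As a rough illustration of the register approach only, here is a small Python model (not HDL; the function names and the reset value of 0 are assumptions made for this sketch) that works out how many register stages a delay needs at a given clock rate and shows the shifted output.

```python
def cycles_for_delay(delay_ns, clock_mhz):
    """Whole clock cycles needed for delay_ns at clock_mhz.

    Raises ValueError if the delay is not a multiple of the clock period,
    since registers can only delay by whole cycles.
    """
    period_ns = 1000.0 / clock_mhz
    cycles = delay_ns / period_ns
    if abs(cycles - round(cycles)) > 1e-9:
        raise ValueError("delay is not a whole number of clock periods")
    return round(cycles)


def delay_line(samples, cycles, reset_value=0):
    """Model a chain of `cycles` registers clocked once per sample."""
    regs = [reset_value] * cycles
    out = []
    for s in samples:
        regs.append(s)            # new value enters the chain
        out.append(regs.pop(0))   # oldest value leaves it
    return out


if __name__ == "__main__":
    n = cycles_for_delay(10, 100)   # 10 ns at 100 MHz
    print(n, delay_line([1, 0, 1, 1], n))
```

A delay that is not a multiple of the clock period cannot be built this way, which is where a DCM phase shift would come in.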