Messages from 61750

Article: 61750
Subject: Re: Input capacitance
From: brimdavis@aol.com (Brian Davis)
Date: 9 Oct 2003 19:28:28 -0700

Austin,

>
> Brian just seems to be stuck, and is unwilling to
>grant that there are ways to make it work just fine
>

  Exactly what part of "requires external back termination and/or
input matching scheme when driving FPGA inputs from a modern high
speed LVDS driver" didn't you read?

 The way you "make it work just fine" with a high speed driver
is by adding "external back termination and/or input matching" -
that's what I've been saying, repeatedly, since my first post.
 
 Item 13 from my original post:
>
>13) Massive 8pf IBIS C_COMP input capacitance value for the
>   V2 LVDS inputs requires external back termination and/or
>   input matching scheme to achieve reasonable signaling when
>   driving FPGA inputs from a modern high speed LVDS driver
>

 Why did I feel it necessary to include this item on the list?

 Because inexperienced designers wouldn't know any better, and 
even experienced designers with ECL/GaAs/SiGe high speed digital
components may be caught off guard by such a high Cin spec - when
first reading the Virtex2 datasheet, I thought it was a tester
specification limit until I did initial system SPICE modeling
and real world driver/TDR testing on a Virtex2 prototype board.
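To put a rough number on the concern, here is a first-order sketch (not from either poster) that treats C_COMP as a lumped capacitor at the end of a matched line. It assumes a 50 ohm single-ended trace per leg and a 100 ohm differential termination (50 ohm per leg to the virtual ground), so the capacitor sees about 25 ohm. Package inductance, the driver's own edge rate and reflections are all ignored, so treat the result as an order-of-magnitude estimate only.

```python
import math

def rc_edge(r_ohms: float, c_farads: float) -> tuple[float, float]:
    """Time constant and 10%-90% rise time of a first-order RC low-pass."""
    tau = r_ohms * c_farads
    # The RC step response crosses 10% at tau*ln(10/9) and 90% at tau*ln(10),
    # so the 10-90 rise time is tau*ln(9), roughly 2.2*tau.
    return tau, tau * math.log(9)

# Assumed Thevenin resistance seen by the pin: 50 ohm line || 50 ohm termination leg.
R_EFF = 50.0 * 50.0 / (50.0 + 50.0)

for c_pf in (8.0, 0.5):  # the 8 pF IBIS C_COMP versus the 0.5 pF figure from the thread
    tau, t_rise = rc_edge(R_EFF, c_pf * 1e-12)
    print(f"{c_pf:4.1f} pF: tau = {tau * 1e12:6.1f} ps, 10-90% edge ~ {t_rise * 1e12:6.1f} ps")
```

Under these assumptions 8 pF gives a time constant of 200 ps and a 10-90% edge of roughly 440 ps, while 0.5 pF gives about 12.5 ps and 27 ps.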


Brian


Austin Lesea <Austin.Lesea@xilinx.com> wrote in message news:<3F8578D1.AAB14B8B@xilinx.com>...
> Rick,
> 
> Now you, I can have a discussion with.
> 
> Anything that is unclear or still in doubt about the input C issue that I might explain?
> 
> After all, many posts ago I explained why the C was what it was, how it is documented, and made
> comment that there are ways to deal with it, but Brian just seems to be stuck, and is unwilling to
> grant that there are ways to make it work just fine, and that perhaps there are valid reasons why
> the C input can not be 0.5pF.
> 
> Do you have, or have you run an IBIS simulation of the ORCA-4 IOB and looked at how its input C
> affects the signal?
> 
> Austin
>

Article: 61751
Subject: Re: MICROBLAZE: Using external instruction memory
From: John Williams <jwilliams@itee.uq.edu.au>
Date: Fri, 10 Oct 2003 12:56:45 +1000

Hi Arkaitz,

arkaitz wrote:
> Hi Antti,
> 
> I've written a flash loader, but I don't know which file I have to
> store in flash to be able to execute it.
> 
> I've tried storing the "executable.elf" file, which contains the
> crt0.o initialization code linked in, and then jumping to that address
> from a program stored in the Block RAMs, but as I suspected, it doesn't work.

re-reading your messages, it occurs to me - are you hoping to execute 
the code directly from the flash?  In that case, you will need a custom 
link script, because otherwise your data segment (read/write) will also 
be located in the flash address space, and of course that won't work at all!

In my applications I simply use the flash as somewhere to store the 
image when the power is off - at bootup I copy the image from flash, 
down into RAM to the address at which it was originally linked, then 
jump to it.  You will need to modify this sequence somewhat...
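A minimal sketch of the copy-then-jump boot sequence John describes, modelled in Python rather than MicroBlaze code. The flash image format (a big-endian header holding load address, length and entry point, followed by the raw bytes) and the addresses are hypothetical; a real loader would take these values from however the linked image was converted for storage, and would finish by jumping to the entry address instead of returning it.

```python
import struct

# Hypothetical flash image header: load address, payload length, entry point.
# MicroBlaze is big-endian, so the fields are stored big-endian here.
HEADER = struct.Struct(">III")

def boot_copy(flash: bytes, ram: bytearray, ram_base: int) -> int:
    """Copy the image stored in flash into RAM at its link address; return the entry point."""
    load_addr, length, entry = HEADER.unpack_from(flash, 0)
    payload = flash[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("flash image is truncated")
    offset = load_addr - ram_base
    if offset < 0 or offset + length > len(ram):
        raise ValueError("image does not fit in the RAM window")
    ram[offset:offset + length] = payload  # code and data now live in writable RAM
    return entry                           # a real loader would branch here

# Usage with made-up addresses: a 4-byte image linked at 0x80000100.
RAM_BASE = 0x8000_0000
image = HEADER.pack(RAM_BASE + 0x100, 4, RAM_BASE + 0x100) + b"\x11\x22\x33\x44"
ram = bytearray(1024)
entry = boot_copy(image, ram, RAM_BASE)
```

Copying to the link address is what keeps the read/write data segment out of flash; executing in place would instead need the custom link script John mentions.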

Regards,

John

Article: 61752
Subject: Inferring an accumulator using Verilog on Xilinx Spartan 2e
From: yk00001@hotmail.com (Y K)
Date: 9 Oct 2003 20:03:46 -0700

I need to infer an 8-bit accumulator (acc8) using Verilog on the
Xilinx WebPACK.
The Libraries Guide seems to contain syntax errors.
I could not get the tool to infer a loadable accumulator, no matter
how I play around with the implementation. I get an adder using 10
slices, instead of the 5 slices I should get when an accumulator is
inferred.
Does anybody know the solution?

Article: 61753
Subject: FPGA/PLD Reliability: High Speeds and Advanced Processes
From: "Richard B. Katz" <richard.b.katz@nospamplease.nasa.gov>
Date: 10 Oct 2003 03:30:40 GMT

Hi,

I am interested in the reliability of modern FPGA/PLD hardware and 
am surveying groups of users for their experience along with 
studying reliability data provided by various manufacturers.  So, 
two basic questions:

   1. Are there reliability issues for modern devices with the
      higher clock speeds that we are using today?  I shall set,
      for the sake of discussion, an artificial boundary of 100 MHz
      clock frequency for the dividing line between high and not
      high speed.

   2. Are there handling/assembly/application issues for modern
      devices as compared to say devices from 5 years ago?  That
      is, are there observed changes in sensitivity to conditions
      such as ESD, input voltage excursions, transients on the 
      power supplies, etc.

Please categorize the application environment in terms of 
commercial, industrial, or mil/aerospace and specify clock 
frequency.  If possible, quantities of devices might be helpful for 
evaluating trends.

Posts to the newsgroup are of course fine.  If you wish to be 
anonymous, please demunge and use the e-mail address in the header.

Thanks,

Richard B. Katz
NASA

Article: 61754
Subject: Re: Digesting runs of ones or zeros "well"
From: Ray Andraka <ray@andraka.com>
Date: Fri, 10 Oct 2003 00:15:27 -0400

Depends on the HDL.  VHDL certainly is.  There is a link on my links
page of my website to an example of some VHDL that does exactly this,
which IIRC is a function call.
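Ray's point is that the INIT value can be computed from the Boolean expression instead of being typed in by hand. The same idea as a small Python sketch (the example Ray links to is VHDL; this is a stand-in, not his code). It assumes the usual Xilinx LUT4 convention that INIT bit k holds the output for the input combination whose binary value {I3, I2, I1, I0} equals k; check the library guide for the primitive you actually use.

```python
from typing import Callable

def lut4_init(f: Callable[[int, int, int, int], int]) -> int:
    """Build a 16-bit LUT4 INIT value from a Boolean function f(i3, i2, i1, i0).

    Assumption: INIT bit k is the LUT output when {I3, I2, I1, I0} == k.
    """
    init = 0
    for k in range(16):
        i3, i2, i1, i0 = (k >> 3) & 1, (k >> 2) & 1, (k >> 1) & 1, k & 1
        if f(i3, i2, i1, i0):
            init |= 1 << k
    return init

xor4 = lut4_init(lambda a, b, c, d: a ^ b ^ c ^ d)
print(f"INIT = 16'h{xor4:04X}")
```

For the four-input XOR this yields 16'h6996; keeping the expression in the source means the INIT value can never drift out of sync with it.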

Tim wrote:

> Ray Andraka wrote:
> > I'd rather use a function or procedure within the HDL so that the
> > boolean expression is in the code and is used directly to generate
> > the init value.
>
> Not possible for the HDL which is not a complete
> programming language ;-)

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 61755
Subject: Re: Digesting runs of ones or zeros "well"
From: jeremywebb@ieee.org (Jeremy Webb)
Date: 9 Oct 2003 21:22:10 -0700

John,

This was only a snippet of my code.  In my final design I accounted
for all possible rotations of the code.  Using the casez worked quite
well in my application.  Sorry for any confusion.

Sincerely,

Jeremy

"John_H" <johnhandwork@mail.com> wrote in message news:<9Sjhb.18$jU3.8636@news-west.eli.net>...
> I'm afraid I'm lost.  The example you give shows a single alignment for a
> 6-ones check.  There are 10 total alignments that can apply.  Once detected,
> there doesn't seem to be an indication THAT the detection occurred except
> that one of the bits from the SERDES is now a one (by definition of your
> pattern, it would have been a zero).
> 
> Were you suggesting that, in general, a casez might produce good results
> from the synthesizer for run detection?
> 
> "Jeremy Webb" <jeremywebb@ieee.org> wrote in message
> news:4d807c8a.0310091158.5a0ba215@posting.google.com...
> > John,
> >
> > I did something similar to this in a Spartan II.  I was searching
> > through a 2^7-1 PRBS pattern (at the output of a SERDES, data bus is
> > 10-bits wide) for the longest string of zeros.  Granted the longest
> > string of zeros in a 2^7-1 PRBS pattern is 6, the idea could be
> > extrapolated to longer strings like 9 in a 65-bit wide bus.
> >
> > Here's an example of what I did for searching for 6 zeros in a row.
> > You'll notice that in my casez statement, I'm actually searching for 6
> > ones in a row.  This is because the BERT that I was using inverted
> > its output PRBS pattern.
> >
> >           always @(posedge clock1)
> >           begin
> >             casez (datasi[9:0])
> >               10'b???111111? : Q[9:0] <= {datasi[9:8], 7'b1111111, datasi[0]};
> >               default        : Q[9:0] <= datasi[9:0];
> >             endcase
> >           end
> >
> > Once you find the string that you're looking for, you can do what ever
> > you'd like.
> >
> > Hope this helps,
> >
> > Jeremy
> >
> > johnhandwork@mail.com (John_H) wrote in message
> > news:<6c803f5f.0310060552.267dc963@posting.google.com>...
> > > "Morten Leikvoll" <m-leik@online.nospam> wrote in message
> > > news:<5z9gb.28389$os2.397003@news2.e.nsc.no>...
> > > > I just started reading this thread. Am I correct that you really want
> > > > to detect 9 EQUAL bits in a row from a stream?
> > > > Could you not do this just with a 4-bit counter and a comparator/zero
> > > > detector?
> > >
> > > Correct, I need "equal" bits, either 9'h000 or 9'h1ff, starting from
> > > 0, 8, 16, ... 56.
> > >
> > > The input is 65 bits per clock with a fast clock, output from BlockRAM
> > > which was loaded at full width.
> > >
> > > Counters require more than one clock.
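A bit-level model of the check John_H describes: a 65-bit word per clock, with a run of nine equal bits tested at offsets 0, 8, ..., 56. Every window is evaluated independently, which is why hardware can do it in a single clock (one 9-input AND and one 9-input NOR per window) where a counter cannot. The LSB-first bit numbering and the function name are assumptions for illustration.

```python
def equal_runs(word: int, width: int = 65, run: int = 9, step: int = 8) -> list[int]:
    """Return the window start offsets where `run` consecutive bits are all 0 or all 1.

    Bit 0 is taken as the least significant bit of `word` (an assumption).
    """
    ones = (1 << run) - 1
    hits = []
    for start in range(0, width - run + 1, step):  # 0, 8, ..., 56 for the defaults
        field = (word >> start) & ones
        if field == 0 or field == ones:             # 9'h000 or 9'h1ff
            hits.append(start)
    return hits
```

For example, a word with only bits 8 through 16 set reports a run of ones at offset 8 and runs of zeros at offsets 24 through 56.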

Article: 61756
Subject: Re: Inferring an accumulator using Verilog on Xilinx Spartan 2e
From: Ray Andraka <ray@andraka.com>
Date: Fri, 10 Oct 2003 00:25:15 -0400

You have to be a little careful because the carry logic in the slice
follows the LUT, so in order to get it in one level of logic you need to
visualize the load preceding the add.  To do that, one input of the
adder has an AND gate so that it is forced to zero when load is active,
and the other input is a mux that selects your load value or the addend
(these could be the same, depending on your design).  Note that in this
case there is logic in front of both add inputs.  That same logic needs
to precede the carry mux DI input, which does have an AND gate available
(the MULT_AND).  In order to use that, your load signal has to be active
low.  Some synthesis tools will infer the right structure as long as it
is realizable in the hardware, while others need you to be more explicit.
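A cycle-level Python model of the structure described above, useful for checking that folding the load into the adder inputs gives the same behaviour as a separate load mux. It is a sketch, not synthesizable code: the active-low load follows the note about the MULT_AND, while letting load override the clock enable mirrors the original poster's code and is an assumption.

```python
MASK = 0xFF  # 8-bit accumulator

def acc_next(q: int, b: int, d: int, load_n: int, ce: int) -> int:
    """Next register value when the load is folded into the adder inputs."""
    loading = load_n == 0            # active-low load
    feedback = 0 if loading else q   # AND gate zeroes the feedback input
    addend = d if loading else b     # mux picks the load value or the addend
    enable = ce or loading           # assumption: load overrides CE
    return (feedback + addend) & MASK if enable else q

def acc_reference(q: int, b: int, d: int, load_n: int, ce: int) -> int:
    """Plain behavioural description: load, else accumulate when enabled."""
    if load_n == 0:
        return d
    return (q + b) & MASK if ce else q
```

Because a load becomes "0 + D", the whole bit fits in the LUT plus carry chain, which is what makes the one-LUT-per-bit packing possible.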


Article: 61757
(removed)

Article: 61758
(removed)

Article: 61759
Subject: Re: Inferring an accumulator using Verilog on Xilinx Spartan 2e
From: Y K <yksp0a0m01@hotmail.com>
Date: Fri, 10 Oct 2003 04:52:23 GMT

This pointed me at a workaround:
The problem is indeed around the load input. The following code shows
three versions: the Xilinx "Verilog" version (self-explanatory), the
same written in valid Verilog, and one where reset and load are OR'ed
together. Only the third one works.
I prefer a pure Verilog solution for future portability reasons. The
code is going into a product line that may live for 20 years, and I don't
want to rewrite it every time I have to design a board using a new FPGA
family.
The workaround is good enough for me, but perhaps Xilinx can fix the
documentation and the XST code too?

Thank you
Yishai Kagan
yk_four_zeroes_one@hotmail.com
(convert to digits to get the correct e-mail).

Here it is:
module accumulator8(input C, R, L, CE, ADD,
                    input [7:0] B, D,   // D must be 8 bits wide to load Q
                    output reg [7:0] Q);
/* The following does not work for obvious reasons.
// Verilog Inference Code copied from the Libraries Guide Page 163
always @ (posedge C)
begin
    if (R)
       Q <= 0;
    else if (L)
       Q <= D;
    else if (CE)
       end
    if (ADD)
       Q <= Q + B;
    else
       Q <= Q - B;
end
*/
// The following does not infer an accumulator for less obvious reasons:
/*
always @ (posedge C)
begin
    if (R)
       Q <= 0;
    else if (~L)
       Q <= D;
    else if (CE)
    begin
       if (ADD)
          Q <= Q + B;
       else
          Q <= Q - B;
    end
end
*/
// The problem is in L, as you claim.
// The following code does infer an accumulator,
// at the expense of losing the distinct reset:
wire RorL = R || L;
always @ (posedge C or posedge RorL)
begin
    if (RorL)
       Q <= D;
    else if (CE)
    begin
       if (ADD)
          Q <= Q + B;
       else
          Q <= Q - B;
    end
end


endmodule



Article: 61760
Subject: Problems with PCI-CardbusCard (interface is an FPGA) on Windows
From: "Joachim Mann" <jogges@web.de>
Date: Fri, 10 Oct 2003 08:21:35 +0200

Dear all,
I have one big problem with my cardbus PC-card.
I developed this card on my own and a collegue developed the driver. The
CardBus-interface is included in an FPGA APEX20K100E.
Since there were some statements in the Specifications about burst read, I
adapted
everything on this card for this burst read. But when I insert the card in a
Notebook and read my memory space, no burst happens. All what is done are
normal single accesses. What do I have to do to perform a burst read? Are
there any settings which have to be made to enable such a burst?
My PC-Card is burst-read capable and on the PCI-to-CARDBUS-Bridge I also set
the MBURSTUP and MBURSTDN-Bit to enable such burst. The sys-driver itself
should also be burst-capable I think.

I would be very pleased if anyone could help me solving this problem.
Thanks in advance

Joachim

Article: 61761
Subject: Re: Why no synthesis?
From: andres.vazquez@gmx.de (Vazquez)
Date: 9 Oct 2003 23:58:05 -0700

Dear Mr Treseler,

thank you for your answer. I have looked at the PDF file you recommended.
On page 50 there is the VHDL description of a single-clock synchronous RAM,
but I use two clocks.
The problem seems to be the signal "writing":
when I do not use it, the compiler infers RAM.
It should not be such a big problem to combine the write signal with the
"writing" signal, but the compiler seems to have trouble recognizing the
RAM structure when I do so.
I am trying to find out why, but without any success yet.

Thanks.

Kind regards

Andres Vazquez
G&D System Development

Mike Treseler <mike.treseler@flukenetworks.com> wrote in message news:<3F85DCD8.60208@flukenetworks.com>...
> Vazquez wrote:
> 
> > Why does QuartusII not synthesize it as a RAM structure using
> > the memory bits of Cyclone?
> 
> see pg. 50
> http://www.altera.com/literature/an/an238.pdf
> 
>   -- Mike Treseler

Article: 61762
Subject: Re: Xilinx dedicated multipliers vs multipliers in slice fabric
From: "Ken" <aeu96186_MENOWANTSPAM@yahoo.co.uk>
Date: Fri, 10 Oct 2003 09:35:18 +0100


Great answer Ray - thanks very much.

Ken


"Ray Andraka" <ray@andraka.com> wrote in message
news:3F85D4D1.EFC91BDC@andraka.com...
> Bzzzt.  The 'pipeline' register in the multiplier is in the middle.  The
> setup and clock to Q of the 'pipelined' multiplier is substantial.  In
> order to get the data sheet max performance, you need to add CLB registers
> to the multiplier I/O AND you need to place them in the slices where there
> are direct connects to the multiplier.  If you do this, and as long as you
> don't have 'stepping 0' parts, the embedded multipliers can be clocked
> faster than an 18-bit carry chain.  The advantage of in-fabric multipliers
> is that you can make them whatever size you need, and put them where they
> are convenient rather than being restricted to the mult/BRAM columns.  In
> the fabric, you can also take advantage of cases where you have multiple
> clocks per sample to reduce the size of the multiplier.  I look at the FPGA
> sort of like a bin of different Legos (tm).  You use what you have in the
> box to the best advantage for your particular project.  Sometimes there are
> more multipliers than you need, so you can use them for things like
> shifters or muxes if you get real cute about it.  Other times, there are
> not enough, so you pick and choose what goes where.
>
> Ken wrote:
>
> > <snip>
> >
> > > 3: Use of the BlockRAMs.  Since the BlockRAMs and multipliers share
> > > interconnect, there are limits on when they can be used
> > > simultaneously.
> > >
> > > 4: Pipelined, throughput-optimized performance.  The fixed multipliers
> > > are unpipelined or single-stage; a LUT multiplier can be much more
> > > finely pipelined (higher throughput).
> >
> > OK - it is my understanding that there are registers just before and
> > just after the dedicated multipliers that can be used to speed them up.
> >
> > But what you are saying is that the LUT multipliers will have a higher
> > max MHz when both solutions are as pipelined as they can be?
> >
> > Thanks for your time,
> >
> > Ken
>
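Ray's remark (quoted above) about using multiple clocks per sample to shrink an in-fabric multiplier can be illustrated with a digit-serial shift-and-add model. This is a generic technique sketch, not something from the thread: each clock multiplies the full-width operand by only `digit_bits` of the other operand, so the hardware needs a narrower partial-product array plus an accumulator, at the cost of more clocks per result.

```python
def digit_serial_multiply(a: int, b: int, b_bits: int, digit_bits: int) -> tuple[int, int]:
    """Unsigned a * b, consuming `digit_bits` of b per clock. Returns (product, clocks)."""
    mask = (1 << digit_bits) - 1
    acc = 0
    clocks = 0
    for shift in range(0, b_bits, digit_bits):
        digit = (b >> shift) & mask
        acc += (a * digit) << shift  # one narrow a-by-digit partial product per clock
        clocks += 1
    return acc, clocks
```

For instance, an 18 x 18 product taken 6 bits at a time needs 3 clocks but only an 18 x 6 partial-product array; the accumulator must still be wide enough for the full 36-bit result.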

Article: 61763
Subject: Re: MICROBLAZE: Using external instruction memory
From: arkagaz@yahoo.com (arkaitz)
Date: 10 Oct 2003 02:58:42 -0700

Hi John,

My first idea was to execute it directly from flash memory, but now I
will first try copying it to SRAM before executing it.

I will use a specific linker script later to execute it directly.

I have solved the problem: I was creating the ELF file in
XMDSTUB mode instead of EXECUTABLE mode.

Thanks a lot for your time.

Arkaitz.


Article: 61764
Subject: Re: Floorplanning, Routing, FPGA Editor
From: "Martin Euredjian" <0_0_0_0_@pacbell.net>
Date: Fri, 10 Oct 2003 10:28:54 GMT

"Ray Andraka" wrote:

> FWIW, you need to put those registers in those spots around the
> multipliers in order to achieve the data sheet max performance.

Right.  I experimented with the XAPP636 placement and studied the routing in
and out of the multiplier with FPGA Editor.  Makes sense.  Can't see a
faster way to lay it out.

Funny enough, if you let the tools do a layout they will be exceedingly
happy to put FF's so far away from multipliers that a monkey with a dart
might be able to do better.  This, I don't really understand.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Martin Euredjian

To send private email:
0_0_0_0_@pacbell.net
where
"0_0_0_0_"  =  "martineu"

Article: 61765
Subject: Re: pci-x133 to parallel pci-66
From: chadb@beardendesigns.com (Chad Bearden)
Date: 10 Oct 2003 03:45:18 -0700

If you mean putting both Tundra 310 bridges on a single pci-x133 bus I
don't think this is electrically supported.  As I understand it you
can only have one load on pci-x133 bus. Please correct me if I have
mis-stated your intention.

chad.

> If you're looking for an existing silicon solution I believe you could
> do it with two Tundra Tsi310 parts.
> 
> 	-hpa

Article: 61766
Subject: Re: FPGA/PLD Reliability: High Speeds and Advanced Processes
From: Allan Herriman <allan.herriman.hates.spam@ctam.com.au.invalid>
Date: Fri, 10 Oct 2003 20:45:23 +1000

You also might like to consider packaging.  The high performance we
achieve today owes as much to the packaging as to the silicon.  New
packaging (e.g. BGA) will have new ways to fail.

High performance also means lots of power, and the thermal aspects may
influence reliability.

Regards,
Allan.

Article: 61767
Subject: Re: pci-x133 to parallel pci-66
From: chadb@beardendesigns.com (Chad Bearden)
Date: 10 Oct 2003 03:56:30 -0700

Eric Crabill <eric.crabill@xilinx.com> wrote in message news:<3F85E01C.A0E99EFA@xilinx.com>...
> Hi,
> 
> Logically, what you described can be built with three
> PCI-X to PCI-X bridges.
> 
> You can take bridge #1 from PCI-X 133 to PCI-X 66.  

Aren't you cutting your bandwidth in half?  I would like to have the
pci66 busses be able to run at full speed to access the host's memory
(primary side of the pcix133 bridge #1).  If you drop to 66 MHz here,
my two secondary busses can each only run at half their bandwidth _if_
they try to access host memory at the _same_ time.


> On
> that PCI-X 66 bus segment, you put bridge #2a and #2b,
> both of which bridge from PCI-X 66 to PCI 66.  So, you
> can actually go buy three of these ASSPs and build
> exactly what you want.
> 
> I wouldn't want to turn you away from a Xilinx solution.
> A Xilinx solution could be a one-chip solution, offer
> lower latency, and provide you with the opportunity to
> customize your design in a way you cannot with ASSPs.
> However, you would want to carefully weigh the benefits
> with the downsides -- you will need to put in some design
> effort.  Another thing to consider is cost, which will
> be a function of the size of your final design.
> 
> Good luck,
> Eric
>
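The arithmetic behind Chad's "half the bandwidth" concern, as a quick sketch of theoretical peak figures only. It assumes 64-bit buses at nominal 133.33 MHz (PCI-X 133) and 66.67 MHz (PCI-X 66 and PCI 66), one data phase per clock and no protocol overhead, which, as Eric points out later in the thread, real systems never achieve.

```python
def peak_mbytes(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak transfer rate in Mbytes/s: bytes per data phase times clock rate."""
    return width_bits / 8 * clock_mhz

PCIX_133 = peak_mbytes(64, 400 / 3)  # ~1066.7 Mbytes/s to the host
PCIX_66 = peak_mbytes(64, 200 / 3)   # ~533.3 Mbytes/s
PCI_66 = peak_mbytes(64, 200 / 3)    # ~533.3 Mbytes/s per secondary bus

demand = 2 * PCI_66                  # both secondary buses streaming at once
via_x66 = min(demand, PCIX_66)       # trunk through a PCI-X 66 segment
via_x133 = min(demand, PCIX_133)     # trunk straight onto PCI-X 133
print(f"per-bus share via PCI-X 66:  {via_x66 / 2:.1f} Mbytes/s")
print(f"per-bus share via PCI-X 133: {via_x133 / 2:.1f} Mbytes/s")
```

On paper each secondary bus drops to about 266.7 Mbytes/s behind a PCI-X 66 trunk versus 533.3 Mbytes/s behind PCI-X 133, but only when both are saturating the link simultaneously.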

Article: 61768
Subject: Re: Floorplanning, Routing, FPGA Editor
From: Ray Andraka <ray@andraka.com>
Date: Fri, 10 Oct 2003 07:45:02 -0400

The tools do the same thing with pipeline registers added to BRAMs.  They don't
seem to do very well with placement of and around the multipliers and BRAMs.


Article: 61769
Subject: Re: Problems with PCI-CardbusCard (interface is an FPGA) on Windows
From: "Nial Stewart" <nial@spamno.nialstewart.co.uk>
Date: Fri, 10 Oct 2003 16:20:54 +0100


Joachim,

I've read that PC hosts won't perform burst reads from target PCI cards.
If you want a burst transfer you've got to implement a PCI master/target
in your interface so the master can perform the burst transfer.
I don't think this is mentioned in the PCI spec.

I presume this is the same for Cardbus.

Can anyone else confirm this?


Nial.

------------------------------------------------
Nial Stewart Developments Ltd
FPGA and High Speed Digital Design
www.nialstewartdevelopments.co.uk

Article: 61770
Subject: Re: pci-x133 to parallel pci-66
From: Eric Crabill <eric.crabill@xilinx.com>
Date: Fri, 10 Oct 2003 09:52:52 -0700


Hi,

Perhaps I am a bit jaded, but I think you will never
actually realize anything close to "full speed" using
PCI.  (PCI-X has some improvements in protocol).  Your
statement assumes that both the data source and the
data sink have an infinitely sized buffer, nobody uses
retries with delayed read requests, and you have huge
(kilobytes at a time) bursts.
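Eric's point about short bursts can be made concrete with a toy efficiency model. The fixed per-transaction overhead (address phase, target decode latency, turnaround) is an assumed lump of clocks chosen purely for illustration, not a figure from the PCI specification; retries and wait states are ignored.

```python
def bus_efficiency(data_phases: int, overhead_clocks: int) -> float:
    """Fraction of bus clocks that move data, assuming one data phase per clock."""
    return data_phases / (data_phases + overhead_clocks)

def effective_mbytes(peak_mbytes: float, data_phases: int, overhead_clocks: int) -> float:
    """Peak rate scaled by the fraction of clocks that actually carry data."""
    return peak_mbytes * bus_efficiency(data_phases, overhead_clocks)

PEAK_64_66 = 64 / 8 * 200 / 3   # ~533 Mbytes/s theoretical for 64-bit, 66 MHz PCI
ASSUMED_OVERHEAD = 4            # hypothetical clocks lost per transaction

for phases in (1, 8, 64, 512):
    rate = effective_mbytes(PEAK_64_66, phases, ASSUMED_OVERHEAD)
    print(f"{phases:4d} data phases per burst -> ~{rate:6.1f} Mbytes/s")
```

With this assumed overhead, single accesses reach only about 107 Mbytes/s, while long bursts approach the peak, which is why the burst length dominates real throughput.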

> Aren't you cutting your bandwidth in half?

It depends -- are you talking about "theoretical"
bandwidth, or bandwidth you are likely to achieve?

If you are designing under the assumption that you will
achieve every last byte of 533 Mbytes/sec on a PCI64/66
bus, you will have some disappointment coming.  :)

PCI and PCI-X are not busses that provide guaranteed
bandwidth.  I've seen bandwidth on a PCI 64/66 bus fall
to 40 Mbytes/sec during certain operations because the
devices on it were designed poorly (mostly for the
reasons I stated in the first paragraph).

> like to have the pci66 busses be able to run at full
> speed to access the host's memory (primary side of
> the pcix133 bridge #1).  If you drop to 66 MHz here,
> my two secondary busses can each only run at half their
> bandwidth _if_ they try to access host memory at the
> _same_ time.

While the point you raise is theoretically valid, you
must consider that the bandwidth you achieve is going to
be no greater than the weakest link in the path.  What
is the actual performance of the PCI-X 133 Host?  How
about your PCI 66 components?  The bridge performance
may be moot.

An interesting experiment you could conduct would be
to plug your PCI 66 component into a PCI 66 host, and
see how close to "full speed" you can really get using
a PCI/PCI-X protocol analyzer.

Then, you could buy two bridge demo boards from a
bridge manufacturer (PLX/Hint comes to mind...) and
see what you get behind two bridges, configured as I
described.

I would certainly conduct this experiment as a way to
justify the design time and expense of a custom bridge
to myself or my manager.  While I suspect you won't get
half of "full speed" in either case, I am very often
wrong.  That's why I'm suggesting you try it out.

I'm not trying to discourage you from using a Xilinx
solution.  However, I'd prefer that potential customers
make informed design decisions that result in the best
combination of price/performance/features.

Good luck,
Eric

Article: 61771
Subject: Re: Problems with PCI-CardbusCard (interface is an FPGA) on Windows
From: Eric Crabill <eric.crabill@xilinx.com>
Date: Fri, 10 Oct 2003 09:57:39 -0700


Hi,

For both PCI and CardBus (which is basically point-to-point
3.3v, 33 MHz PCI) the host bridges are traditionally very
good targets but not good initiators.

This means that if you want to move lots of data, the way
you need to do it is by making your add-in card become a
bus master, and then read/write host memory.

If you are trying to have the "CPU" read/write the add-in
card, you'll get poor performance.  As the original poster
noted, the host won't even burst...

Eric

Article: 61772
Subject: Xilinx XC2S50: Unable to configure through slave serial mode
From: do_not_reply_to_this_addr@yahoo.com (Sumit Gupta)
Date: 10 Oct 2003 10:45:54 -0700

Hi

I am trying to build a prototype Spartan-II board.

1. I am using a XC2S50 TQ144 part with all the mode pins tied to
VCCINT
2. I am using Xilinx webpack 4.2
3. The parallel cable is from insight electronics (Model IJC-2)
4. I am using a general purpose PCB and a QFP144 adapter from
adapters.com to connect the FPGA to the PCB.

Now when I compile a simple test module and try to download the bit
file through Xilinx iMPACT tool, it gives an error saying
"Configuration failed: done pin did not go high".

What could be the cause, and how can I debug it?

Thanks
Sumit

Article: 61773
Subject: VCC's HOTman
From: machosri@yahoo.com (Sriram)
Date: 10 Oct 2003 11:12:53 -0700

Hi,
I downloaded HOTman from Virtual Computing Corporation around April '03,
but when I tried to open the console now it didn't work. I also went
through the procedure again for running HOTman the first time and
still couldn't get the GUI to appear. I got the error message "Could not
find main class" when I double-clicked on the Hotman.jar file.

Is this a problem with the JRE, or is the HOTman software no longer working?
I also tried to download the evaluation edition again from the VCC
website and couldn't (I got an internal server error). Have
they closed the site?

Has anybody faced similar problems with HOTman? 
Also, to implement programs from C directly on an FPGA, would Celoxica's
Handel-C based DK1 Design Suite be the next best option if I can't
get HOTman working?


Kindly do help me out on the above. 

Thanks ,
Sriram

Article: 61774
Subject: Re: Counting ones
From: john.l.smith@titan.com (John)
Date: 10 Oct 2003 11:49:17 -0700
Links: << >> << T >> << A >>

"John_H" <johnhandwork@mail.com> wrote in message news:<pDYdb.14$XP3.1342@news-west.eli.net>...
> <snip>
> In either the Xilinx or Altera architecture, it's probably most efficient to
> pre-add in groups of 4 bits then add these results in an adder tree.  For 32
> bits (for instance) you can get 8 values with counts of 0-4 with simple
> LUTs.
> <snip>
> - John_H

I love coming back to a subject and gaining a
little more insight!

  In the current discussion (and the thread from
3 years ago) I made a mistake (and so did
everyone else contributing, a surprising
event on usenet). There is a seeming paradox
in that it is more efficient to use the LUTs
to do a three-input sum at the leaves of the adder
tree than to do a four-input sum. Below I'll show the
most efficient (area-wise) FPGA implementations that
I know (for 16 bits, and the OP's 30 bits), and challenge
anyone to come back and show an even better way.
After that, some simple words about the 'paradox'
of why the three-input/LUT solution is better than
the four-input/LUT one in this case.

  Look first at a _LUT_only_ implementation (16 bits),
which applies to any LUT based FPGA...

  A full adder (FA3) can be implemented in 2 LUTs.
These are 3-LUTs (!), not 4-LUTs, although in practice
4-LUTs are used because that is what is available.
It is also called a (3,2) counter.
See L. Dadda's papers, including "Pipelined Adders"
in IEEE Transactions on Computers, Mar 1996,
for a good discussion of adders, or Paterson et al.
on "Optimal Carry Save Networks", available at CiteSeer,
for better/deeper discussions than I can give.

This is the full adder:
    ___
-0-| F |
-0-| A |-1-
-0-|_3_|-0- 

(the numbers are powers of 2 at a position)
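The claim that a full adder needs exactly two 3-LUTs can be made concrete. Below is a small Python sketch, my own illustration rather than anything from the post. It builds each LUT of the FA3 as an 8-entry truth table and checks it exhaustively. The helper names (`make_lut3`, `lut3`, `SUM_LUT`, `CARRY_LUT`) are hypothetical. The table index ordering (4*a + 2*b + c) is an assumption and is not meant to match any Xilinx INIT convention.

```python
from itertools import product

def make_lut3(func):
    """Truth table of a 3-input LUT; entry i holds func(a, b, c)
    for i = 4*a + 2*b + c (assumed ordering, illustration only)."""
    return [func(a, b, c) for a, b, c in product((0, 1), repeat=3)]

def lut3(table, a, b, c):
    """Evaluate a 3-LUT by indexing its truth table."""
    return table[4 * a + 2 * b + c]

# The two 3-LUTs of one FA3 / (3,2) counter
SUM_LUT = make_lut3(lambda a, b, c: a ^ b ^ c)                      # position 0 output
CARRY_LUT = make_lut3(lambda a, b, c: (a & b) | (a & c) | (b & c))  # position 1 output

def fa3(a, b, c):
    """Return (carry, sum): the position-1 and position-0 outputs."""
    return lut3(CARRY_LUT, a, b, c), lut3(SUM_LUT, a, b, c)

# Exhaustive check: 2*carry + sum equals the number of ones among a, b, c
for a, b, c in product((0, 1), repeat=3):
    carry, s = fa3(a, b, c)
    assert 2 * carry + s == a + b + c

print("SUM_LUT  ", SUM_LUT)    # [0, 1, 1, 0, 1, 0, 0, 1]
print("CARRY_LUT", CARRY_LUT)  # [0, 0, 0, 1, 0, 1, 1, 1]
```

Each table depends on only three inputs, so the fourth LUT input goes unused when this maps onto a 4-LUT fabric.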
  Repeating: The FA3 sums 3 bits, using 2 3-LUTs.
  Next, 4 full adders are arranged to sum 7 bits,
using 8x 3-LUTs:
    ___
0--| F |
0--| A |-1--------+
0--|_3_|-0---+    |        ___
             |    +-----1-| F |
    ___   +--|----------1-| A |---2
0--| F |  |  |   ___   +1-|_3_|---1
0--| A |-1+  +0-| F |  |
0--|_3_|-0----0-| A |-1+
0-------------0-|___|-0-----------0              

   Call the above a 7 Adder (7Add):
     ___
    | 7 |
  7 | A |-2-
0-/-| d |-1-
    |_d_|-0-

(a more correct term might be '(7,3) counter',
but '7Add' fits in the ascii drawing)
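The 7Add wiring in the diagram can be checked the same way. This Python sketch is a behavioral model I wrote for illustration; `fa3` and `add7` are hypothetical names. It follows the drawing:

1. Two leaf FAs take six of the inputs.
2. A third FA combines their two sums with the seventh bit.
3. A fourth FA combines the three carries.

The sketch then verifies all 128 input patterns.

```python
from itertools import product

def fa3(a, b, c):
    """FA3 / (3,2) counter (two 3-LUTs): returns (carry, sum)."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c

def add7(x):
    """7Add / (7,3) counter from four FA3s, wired as in the diagram.
    Returns the output bits (b2, b1, b0) for positions 2, 1, 0."""
    c_a, s_a = fa3(x[0], x[1], x[2])  # first leaf FA
    c_b, s_b = fa3(x[3], x[4], x[5])  # second leaf FA
    c_c, b0 = fa3(s_a, s_b, x[6])     # position 0: two sums plus the 7th bit
    b2, b1 = fa3(c_a, c_b, c_c)       # position 1: the three carries
    return b2, b1, b0

FA_COUNT = 4
LUT3_COUNT = 2 * FA_COUNT  # 8x 3-LUTs, as stated in the text

# Exhaustive check over all 2**7 input patterns
for x in product((0, 1), repeat=7):
    b2, b1, b0 = add7(x)
    assert 4 * b2 + 2 * b1 + b0 == sum(x)
```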

  Two 7Adds can be used to sum 14 bits into
two 3-bit numbers, using 16x 3-LUTs.

  3x 4-LUTs are used to sum 4 bits to a 3-bit
result in a 4Add:
     ___
    | 4 |
  4 | A |---2
0-/-| d |---1
    |_d_|---0

  Produce the final result from the output of the
two 7Adds plus the remaining two bits, using
two 4Adds and an FA:
     ___
    | 7 |
  7 | A |-2-------------------------------+    ___
0-/-| d |-1-------------------+           +-2-| 4 |
    |_d_|-0-----+     +-------|-------------2-| A |--4
                |     |     +-|-------------2-| d |--3
                |     |     | |    ___    +-2-|_d_|--2
     ___    +---|-----+     | +-1-| F |   |
    | 7 |   | +-|-----------|---1-| A |-2-+
  7 | A |-2-+ | |    ___    | +-1-|___|--------------1
0-/-| d |-1---+ +-0-| 4 |   | |
    |_d_|-0-------0-| A |-2-+ |
0-----------------0-| d |-1---+ 
0-----------------0-|_d_|----------------------------0

  Total LUT count is 24 (3 Virtex-II CLBs of 8 LUTs each),
using 6x 4-LUTs and 18x 3-LUTs. This is the minimum
I know of using 3-LUT and 4-LUT based logic alone.
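As a sanity check on the 16-bit diagram, here is a Python behavioral model, again my own sketch with hypothetical names (`add4`, `count16`). It models a 4Add as three 4-LUTs, each producing one bit of a 0..4 sum, and wires the stages as drawn:

- Position 0 gets both 7Add LSBs plus the two leftover bits, which feed a 4Add.
- Position 1 gets three bits, which feed an FA3.
- Position 2 gets four bits, which feed the second 4Add.

The model is then compared against a plain popcount for all 65536 inputs.

```python
def fa3(a, b, c):
    """FA3: two 3-LUTs -> (carry, sum)."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c

def add4(a, b, c, d):
    """4Add: three 4-LUTs, each producing one bit of a+b+c+d (0..4)."""
    n = a + b + c + d
    return (n >> 2) & 1, (n >> 1) & 1, n & 1

def add7(x):
    """7Add: four FA3s -> (b2, b1, b0)."""
    c_a, s_a = fa3(x[0], x[1], x[2])
    c_b, s_b = fa3(x[3], x[4], x[5])
    c_c, b0 = fa3(s_a, s_b, x[6])
    b2, b1 = fa3(c_a, c_b, c_c)
    return b2, b1, b0

def count16(x):
    """16-bit ones counter: two 7Adds, two 4Adds and one FA3."""
    a2, a1, a0 = add7(x[0:7])
    b2, b1, b0 = add7(x[7:14])
    p2, p1, r0 = add4(a0, b0, x[14], x[15])  # position 0: four bits
    q2, r1 = fa3(a1, b1, p1)                 # position 1: three bits
    r4, r3, r2 = add4(a2, b2, p2, q2)        # position 2: four bits
    return 16 * r4 + 8 * r3 + 4 * r2 + 2 * r1 + r0

LUT3_COUNT = 2 * 8 + 2  # two 7Adds plus the FA3
LUT4_COUNT = 2 * 3      # two 4Adds

# Exhaustive check over all 65536 input patterns (takes a second or so)
for v in range(1 << 16):
    bits = [(v >> i) & 1 for i in range(16)]
    assert count16(bits) == bin(v).count("1")

print(LUT3_COUNT, LUT4_COUNT, LUT3_COUNT + LUT4_COUNT)  # 18 6 24
```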

  Now look at how the VirtexII carry logic may
improve things...

  At the leaves of the tree (left hand side), two 3-LUTs
are still used to build full adders. The next step is
building a 7Add, which sums the output of 2 FAs plus
one more bit. Here, either carry logic can be used,
at a total cost of 8 LUTs, or a LUT only solution, also
costing 8 LUTs. A 15Add sums two 7Adds and another bit.
This costs two 7Adds (16 LUTs), plus a standard 3 bit
carry logic based adder with carry-in and carry-out, another
5 LUTs. Adding in the last bit is a 4 bit increment with
carry-out, costing 6 LUTs using carry logic. Total: 27 LUTs!
Three more than the LUT only circuit. Frustrating, isn't it?
  For this size, a sum of 16 bits, trying to use carry logic
does worse than a LUT only implementation. For a 15Add, carry
logic allows a marginal improvement of one LUT, using only 
21 LUTs instead of 22. For larger bit counts, the carry logic
becomes increasingly important, as will be explained below.
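The LUT tallies in the two paragraphs above reduce to simple arithmetic. This Python sketch restates only the per-stage costs given in the text. The one assumption of mine is how the 22-LUT LUT-only 15Add is built: I take it to combine the two 7Add outputs and the extra bit with three rippled FA3s at 2 LUTs each.

```python
FA = 2                                   # full adder: two 3-LUTs
ADD7 = 4 * FA                            # 8 LUTs, with or without carry logic
ADD15_LUT_ONLY = 2 * ADD7 + 3 * FA       # assumption: three rippled FA3s -> 22
ADD15_CARRY = 2 * ADD7 + 5               # 3-bit carry-chain adder with carry-in/out -> 21
ADD16_CARRY = ADD15_CARRY + 6            # 4-bit increment with carry-out -> 27
ADD16_LUT_ONLY = 2 * ADD7 + 2 * 3 + FA   # the 24-LUT circuit from the diagram

print(ADD16_LUT_ONLY, ADD16_CARRY, ADD15_LUT_ONLY, ADD15_CARRY)  # 24 27 22 21
```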

  Finally, a few simple-minded words about the 'paradox'...

  Many operations can be viewed as an exercise in compression,
reducing a number of inputs to a smaller number of outputs
through some sort of multi-level tree structure. In this case,
inputs are the bits to count, and outputs are the count. At any
tree level, a simple, naive figure of merit qualifies a circuit:
	(InputBits/OutputBits) * (InputBits/LUTs)
  The larger this number, the better the circuit is for that level.
The first ratio helps the tree converge faster, the second reduces
the LUT count. For 4-LUT based logic, the highest possible figure
of merit would be 16, indicating four input bits produce a single
output bit, with a single LUT. Parity trees, "and" trees, and "or"
trees achieve 16.
  Implementing the leaf side initial level of bit counting
with 4-LUTs gives (4/3)*(4/3)=1.778. Implementing the initial bit
counting with 3-LUTs gives (3/2)*(3/2)=2.25. At any stage where
bits of the same weight are aggregated and compressed, the 3-LUT
implementation is better if they can be grouped by threes.
At the end of the 16Add tree, some 4-LUTs are used to advantage
because there are 4 signals to combine that cannot be cleanly
split into groups of three. For every other location in the tree,
the 3-LUTs work out better.
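The figure of merit is easy to evaluate mechanically. Below is a minimal Python version, where `merit` is a hypothetical helper name. It uses exact fractions so that ties between circuits show up exactly, and it reproduces the three values quoted above.

```python
from fractions import Fraction

def merit(in_bits, out_bits, luts):
    """Naive figure of merit: (InputBits/OutputBits) * (InputBits/LUTs)."""
    return Fraction(in_bits, out_bits) * Fraction(in_bits, luts)

examples = {
    "4-input parity (one 4-LUT)": merit(4, 1, 1),  # 16
    "4Add leaf (three 4-LUTs)": merit(4, 3, 3),    # 16/9, about 1.778
    "FA3 leaf (two 3-LUTs)": merit(3, 2, 2),       # 9/4 = 2.25
}
for name, m in examples.items():
    print(f"{name:28s} {float(m):.3f}")
```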
   When carry logic is used at a combining stage, uniting
two addends of the same size plus an LSB (2*Size+1 inputs,
Size+1 outputs, Size+2 LUTs), the metric becomes:
((2*Size+1)/(Size+1))*((2*Size+1)/(Size+2))
where Size is the number of bits in each addend.
This gives:
  Size  Merit
   1    1.5
   2    2.083
   3    2.45
   4    2.7
   ... approaching 4
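The carry-logic table can be regenerated directly from the formula. In this Python sketch, `carry_merit` is a hypothetical name. The LUT cost of Size+2 is read off the formula's denominator and agrees with the 5-LUT 3-bit carry adder quoted earlier.

```python
from fractions import Fraction

def carry_merit(size):
    """Carry-chain stage: two size-bit addends plus an LSB.
    2*size+1 inputs, size+1 outputs, size+2 LUTs."""
    n_in = 2 * size + 1
    return Fraction(n_in, size + 1) * Fraction(n_in, size + 2)

for size in range(1, 5):
    print(size, round(float(carry_merit(size)), 3))
# 1 1.5
# 2 2.083
# 3 2.45
# 4 2.7
```

For large Size both ratios tend to 2, which is where the limit of 4 comes from.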

  When 3-LUT based full adders are cascaded to add
two numbers and a carry-in (2*Size+1 inputs, Size+1 outputs,
2*Size LUTs), the metric becomes:
((2*Size+1)/(Size+1))*((2*Size+1)/(2*Size))
Giving:
 Size   Merit
  1     2.25
  2     2.083
  3     2.042
  4     2.025
  ... approaching 2

  Comparing the tables: for combining three equal-weight
bits (Size 1), the LUT solution is better (2.25 vs 1.5); for going
from full adder to 7Add (Size 2) the two circuits have equal LUT
count; for going from 7Add to 15Add (Size 3) carry logic should be
used (2.45 vs 2.042). The optimum (30,5) counter (that I know of)
uses a mix of 3-LUTs and carry logic, and no 4-LUTs.
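The stage-by-stage comparison can be automated. This Python sketch, with hypothetical names `carry_merit` and `ripple_fa_merit`, takes the ripple-FA cost as 2*Size LUTs, one full adder per bit position. That assumption reproduces the tabulated values 2.25, 2.083, 2.042 and 2.025. Exact fractions make the Size 2 tie come out exactly.

```python
from fractions import Fraction

def carry_merit(size):
    """Carry-chain stage: 2*size+1 inputs, size+1 outputs, size+2 LUTs."""
    n = 2 * size + 1
    return Fraction(n, size + 1) * Fraction(n, size + 2)

def ripple_fa_merit(size):
    """size cascaded 3-LUT full adders: 2*size+1 inputs, size+1 outputs, 2*size LUTs."""
    n = 2 * size + 1
    return Fraction(n, size + 1) * Fraction(n, 2 * size)

for size in range(1, 5):
    fa, cy = ripple_fa_merit(size), carry_merit(size)
    better = "LUT-only" if fa > cy else ("carry" if cy > fa else "tie")
    print(f"{size}  FA {float(fa):.3f}  carry {float(cy):.3f}  -> {better}")
# 1  FA 2.250  carry 1.500  -> LUT-only
# 2  FA 2.083  carry 2.083  -> tie
# 3  FA 2.042  carry 2.450  -> carry
# 4  FA 2.025  carry 2.700  -> carry
```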

  Whew! Talk about being over-anal(ytical). Hope I haven't
bored anyone, just wanted to get the 'best' circuits public
(in hopes someone else has a better one), and show something
not immediately obvious about the leaves of the adder tree.
Peter's BlockRam implementation is also slick!

   I'll finish with a question:
Just what are the best/worst results from HDL synthesis
tools that folks get with this function? I.e., barring
forcing the mapping, and letting the tool optimize from
something like:
Count <= Bit0 + Bit1 + Bit2 +...

Regards,
John

p.s. The original poster asked about 30 bits... here's the
most compact way I know, but I haven't spent much
time on 30 bits. Here the carry logic helps. Note
that none of the LUTs are configured as 4-LUTs.
   ___ 
0-| F |        
0-| A |-1-------+     
0-|_3_|-0-----+ |    
   ___        | |    ___
0-| F |       | +-1-| F |
0-| A |-1-----|---1-| A |-2-----------------+
0-|_3_|-0-+ +-|---1-|_3_|-1---------------+ |
   ___    | | |      ___                  | |
0-| F |   +-|-|---0-| F |                 | |
0-| A |-1---+ +---0-| A |-1---+           | |
0-|_3_|-0---------0-|_3_|-0---|---------+ | |
   ___                        |         | | |
0-| F |                       |    ___  | | |
0-| A |-1-------+             +   | F | | | |
0-|_3_|-0-----+ |           +---1-| A |-|-|-|-2---+
   ___        | |    ___    | +-1-|_3_|-|-|-|-1-+ |
0-| F |       | +-1-| F |   | |         | | |   | |
0-| A |-1-----|---1-| A |-2-|-|-------+ | | |   | |         ___
0-|_3_|-0-+ +-|---1-|_3_|-1-+ |       | | | |   | |  '0'-4-|   |
   ___    | | |      ___      |       | | | |   | |  '0'-3-|   |
0-| F |   +-|-|---0-| F |     |       | | | |   | +------2-| C |
0-| A |-1---+ +---0-| A |-1---|-----+ | | | |   +--------1-| Y |-4
0-|_3_|-0---------0-|_3_|-0---|---+ | | | | |        '0'-0-| A |-3
   ___                        |   | | | | | |       ___    | d |-2
0-| F |                       |   | | | | | |'0'-3-|   |-4-| d |-1
0-| A |-1-------+             |   | | | | | +----2-| C |-3-|   |-0
0-|_3_|-0-----+ |             |   | | | | +------1-| Y |-2-|   |
   ___        | |    ___      |   | | | +--------0-| A |-1-|   |
0-| F |       | +-1-| F |     |   | | |     ___    | d |-0-|___|
0-| A |-1-----|---1-| A |-2---|-+ | | +--2 | C |-3-| d |     |
0-|_3_|-0-+ +-|---1-|_3_|-1---+ | | +----1-| Y |-2-|   |     0
   ___    | | |      ___        | +------0-| A |-1-|   |     |
0-| F |   +-|-|---0-| F |       +--------2-| d |-0-|___|     |
0-| A |-1---+ +---0-| A |-1--------------1-| d |     |       |
0-|_3_|-0---------0-|_3_|-0--------------0-|___|     0       |
                                             |       |       |
                                             0       |       |
0--------------------------------------------+       |       |
0----------------------------------------------------+       |
0------------------------------------------------------------+

LUT Count:
    18       +        12          +  2   +   5   +   6   +   6 = 49

49 LUTs = 6.125 Virtex-II CLBs (8 LUTs per CLB)
(Pipeline to taste)
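The 30-bit LUT total can be tallied as follows. The grouping in this Python sketch is my reading of the drawing:

- 9 leaf FA3s.
- 6 second-level FA3s.
- 1 lone FA3.
- Three carry-chain adders at 5, 6 and 6 LUTs.

The CLB figure assumes 8 LUTs per Virtex-II CLB (4 slices of 2 LUTs).

```python
LUTS_PER_FA = 2
stage_luts = [
    9 * LUTS_PER_FA,  # leaf FA3s: 18
    6 * LUTS_PER_FA,  # second-level FA3s: 12
    1 * LUTS_PER_FA,  # lone FA3: 2
    5, 6, 6,          # the three carry-chain adders, as listed
]
LUTS_PER_CLB = 8      # assumption: Virtex-II CLB = 4 slices x 2 LUTs

total = sum(stage_luts)
print(total, total / LUTS_PER_CLB)  # 49 6.125
```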
HTH
(apologies for being late to the thread; I suffered a disk crash
recently and could not post)
