Messages from 45575

Article: 45575
Subject: Re: Making my own software
From: "Martin Schoeberl" <martin.schoeberl@chello.at>
Date: Sat, 27 Jul 2002 10:57:42 GMT

Some time ago I wrote a small program that configures the device in serial
mode (not JTAG) through the printer port. If you're interested, drop me a note.
Martin

"Juha Pajunen" <juha.pajunen@bitboys.com> wrote in message
news:4b980638.0207190558.37890b2d@posting.google.com...
> Hi All,
>
> I am planning to write my own stand-alone software
> that programs Altera APEX devices via ByteBlasterMV
> with *.SOF and *.POF (FlexChain and JTAG)
> files, so that I do not need Altera QuartusII
> to send data to the device. (I could then use my HW design
> w/o the huge QuartusII software...)
>
> So, I have been looking all over the WWW for
> information about how Quartus does the programming
> and how the signals act on the ByteBlasterMV
> cable.
>
> Can you help me find some kind of timing
> waveform / wave diagram where I can start to learn
> the ByteBlasterMV "protocol"?
>
> Is it possible to do that kind of program...?
>
> If there is existing software, I am also interested
> in those *.EXE files.
>
> Thank you very much and have a nice weekend :-)
>
> Sincerely,
> Juha Pajunen, Hw Engineer

Article: 45576
Subject: Re: logic elements v/s logic cells
From: rickman <spamgoeshere4@yahoo.com>
Date: Sat, 27 Jul 2002 09:55:54 -0400

Interesting idea.  IIRC, the Altera cascade chain was inferred by
Synplicity pretty well if we used predecoded enables.  

I am curious about the rotator you mention.  You said that you could
implement a 64 bit rotator in 192 slices (384 LUTs?) with a standard
method and 66 slices (132 LUTs) with an optimal technique.  I can only
picture a N/2 x log(N) array of 2:1 muxes where N is 64 bits.  This
gives 256 LUTs which is neither of your answers.  Even if you find a way
to use an extra embedded 2:1 mux in the slice, that would only bring it
down to about 171 LUTs and would not change the architecture at 8 levels
of logic.  

Care to share your techniques, both the large and the small one?


John_H wrote:
> 
> The carry chain in the Xilinx part can do the same thing as the Altera cascade
> chain if I recall correctly.  If the Xilinx MUXCY element passes a 1 on the carry
> and a zero when the LUT result is false, you get a wide AND cascade chain.  Wide
> word muxes can still take N/2 LUTs in the Xilinx architecture independent of which
> method you use.  The cascade chain would probably need a manual instantiation in
> Xilinx, possibly in Altera.  A 4-1 mux ends up being the same in either
> architecture, really:  2 LUTs.  The rotator I was talking about ends up beating
> out the cascade approach significantly in either architecture.
> 
> rickman wrote:
> 
> > I am reaching back now, but I seem to remember that when it came to
> > implementing muxes the Altera parts (maybe only the 10K parts) have a
> > "cascade" backbone in each group of LEs that allows them to do very fast
> > muxes as well as AND-OR or just wide AND type logic.  The cascade logic
> > is a two input AND (or is it an OR?) gate that combines the cascade
> > input with the LUT output.  Although the delays are additive, they are
> > very short like a carry chain and can frequently beat an equivalent tree
> > mux.
> >
> > But to use the cascade chain for a mux you need to change your logic to
> > use decoded enables rather than encoded selects.  The number of LEs for
> > the mux then becomes N/2 where N is the number of inputs.  This can be
> > very optimal for wide word muxes where decoding the enables uses
> > much less logic than what is saved in the mux.
> >
> > I don't remember Synplify doing a great job of synthesis with these
> > structures.  It may have worked well if you used a particular coding
> > style.  But otherwise it would only use two LEs instead of the four or
> > five that were optimal.

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 45577
Subject: Re: ALU in VHDL and a bunch of questions
From: "MikeJ" <mikejNO SPAM@freeuk.com>
Date: Sat, 27 Jul 2002 15:38:10 +0100

Hi,

This is the add/sub part of the ALU from the risc5x core on opencores, where
you can also get the package and the generic vhdl / simulation model.

Output is A + B, A - B or A.

The trick is to force a one on the carry in when doing a subtract.
Logic usage : 1 slice for every 2 bits.

Some people argue (rightly so) that this level of code is unreadable. True,
it is, but you can build up a library of these things which have been
simulated to death (I have simulation models of LUT4, MUXCY etc) and then
just use them when you need them.

hope this helps,
Mike.
--
-- Risc5x
-- www.OpenCores.Org - November 2001
--
--
-- This library is free software; you can distribute it and/or modify it
-- under the terms of the GNU Lesser General Public License as published
-- by the Free Software Foundation; either version 2.1 of the License, or
-- (at your option) any later version.
--
-- This library is distributed in the hope that it will be useful, but
-- WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-- See the GNU Lesser General Public License for more details.
--
-- A RISC CPU core.
--
-- (c) Mike Johnson 2001. All Rights Reserved.
-- mikej@<NOSPAM>opencores.org for support or any other issues.
--
-- Revision list
--
-- version 1.0 initial opencores release
--
library ieee;
  use ieee.std_logic_1164.all;
  use ieee.std_logic_arith.all;
  use ieee.std_logic_unsigned.all;

--
-- op <= A +/- B or A
--
entity ADD_SUB is
  generic (
    WIDTH         : in  natural := 8
    );
  port (
    A             : in  std_logic_vector(WIDTH-1 downto 0);
    B             : in  std_logic_vector(WIDTH-1 downto 0);

    ADD_OR_SUB    : in  std_logic; -- high for DOUT <= A +/- B, low for DOUT <= A
    DO_SUB        : in  std_logic; -- high for DOUT <= A   - B, low for DOUT <= A + B

    CARRY_OUT     : out std_logic_vector(WIDTH-1 downto 0);
    DOUT          : out std_logic_vector(WIDTH-1 downto 0)
    );
end;

use work.pkg_xilinx_prims.all;
library ieee;
  use ieee.std_logic_1164.all;
  use ieee.std_logic_arith.all;
  use ieee.std_logic_unsigned.all;

architecture VIRTEX of ADD_SUB is

    signal lut_op       : std_logic_vector(WIDTH-1 downto 0);
    signal mult_and_op  : std_logic_vector(WIDTH-1 downto 0);
    signal carry        : std_logic_vector(WIDTH   downto 0);
    signal op_int       : std_logic_vector(WIDTH-1 downto 0);

    function loc(i : integer) return integer is
    begin
      return (((WIDTH+1)/2)-1) - i/2;
    end loc;

begin
  carry(0) <= DO_SUB;
  INST : for i in 0 to WIDTH-1 generate
    attribute RLOC of u_lut  : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute RLOC of u_1    : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute RLOC of u_2    : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute RLOC of u_3    : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute INIT of u_lut  : label is "C66C";
    begin
      u_lut :  LUT4
      --pragma translate_off
      generic map (
        INIT => str2slv(u_lut'INIT)
        )
      --pragma translate_on
      port map (
        I0 => ADD_OR_SUB,
        I1 => A(i),
        I2 => B(i),
        I3 => DO_SUB,
        O  => lut_op(i)
        );

      u_1 : MULT_AND
      port map (
        I0 => ADD_OR_SUB,
        I1 => A(i),
        LO => mult_and_op(i)
        );

      u_2 : MUXCY
      port map (
        DI => mult_and_op(i),
        CI => carry(i),
        S  => lut_op(i),
        O  => carry(i+1)
        );

      u_3 : XORCY
      port map (
        LI => lut_op(i),
        CI => carry(i),
        O  => op_int(i)
        );

  end generate;
  CARRY_OUT <= carry(WIDTH downto 1);
  DOUT <= op_int;
end Virtex;
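
The LUT contents and carry logic of the ADD_SUB listing above can be cross-checked with a small software model. What follows is a minimal Python sketch, not part of the risc5x sources. It assumes the usual behaviour of the Xilinx primitives (LUT4 output is the INIT bit selected by I3 I2 I1 I0, MULT_AND is a 2-input AND, MUXCY passes CI when S is high and DI otherwise, XORCY is LI xor CI). The function names `lut4` and `add_sub` are made up for illustration.

```python
INIT = 0xC66C  # same INIT string as u_lut in the VHDL listing


def lut4(init, i0, i1, i2, i3):
    """Return the INIT bit addressed by the four inputs (I3 is the MSB)."""
    return (init >> ((i3 << 3) | (i2 << 2) | (i1 << 1) | i0)) & 1


def add_sub(a, b, add_or_sub, do_sub, width=8):
    """Bit-serial model of the carry chain; returns (DOUT, CARRY_OUT)."""
    carry = do_sub          # carry(0) <= DO_SUB: forces +1 when subtracting
    dout = 0
    carry_out = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = lut4(INIT, add_or_sub, ai, bi, do_sub)   # u_lut
        di = add_or_sub & ai                         # u_1 (MULT_AND)
        dout |= (s ^ carry) << i                     # u_3 (XORCY)
        carry = carry if s else di                   # u_2 (MUXCY)
        carry_out |= carry << i                      # CARRY_OUT(i) = carry(i+1)
    return dout, carry_out


if __name__ == "__main__":
    print(add_sub(200, 100, 1, 0))  # add
    print(add_sub(200, 100, 1, 1))  # subtract
    print(add_sub(200, 100, 0, 0))  # pass A through
```

In this model bit 3 of CARRY_OUT is the carry out of the low nibble and bit 7 is the carry out of the whole byte (high means "no borrow" for a subtract). DO_SUB is assumed to be low whenever ADD_OR_SUB is low, since carry(0) is tied to DO_SUB.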

<SNIP>

Article: 45578
Subject: Re: ALU in VHDL and a bunch of questions
From: dmitrik@mailandnews.com (Dmitri Katchalov)
Date: 27 Jul 2002 07:54:01 -0700

Thank you guys for your valuable comments.

Dmitri

dmitrik@mailandnews.com (Dmitri Katchalov) wrote in message news:<3db7c986.0207250834.7ae051c6@posting.google.com>...

> I'm trying to synthesize a simple ALU.

Article: 45579
Subject: Re: ALU in VHDL and a bunch of questions
From: rickman <spamgoeshere4@yahoo.com>
Date: Sat, 27 Jul 2002 12:01:50 -0400

A couple of comments for points that were not fully addressed.  

Dmitri Katchalov wrote:
> 
> Hi,
> 
> I'm new to FPGA. I'm trying to replicate the PIC16Fxxx core as an exercise
> (any real programmer should write at least one OS and compiler :)
> 
> I'm trying to synthesize a simple ALU. I'm using VHDL and XST (WebPack).
> Target is SpartanIIE. It sort of works but is rather inefficient.
> At first I tried a big case statement for all ALU operations.
> XST happily infers lots of built-in macros (one for each ALU op)
> and a huge output mux. For example it produces 6 carry-chain adders
> (one for each ADD, SUB, INC, DEC and another two to get the
> half-carry bit for ADD/SUB) where I would think one is enough.
> 
> I've narrowed the problem down to a simple adder/subtractor:
> 
>    if add='1' then
>          Y <= A + B;
>    else
>          Y <= A - B;
>    end if;
> 
> This works fine, produces a single 8-bit adder/subtractor. 4 slices in total.
> But this does not give me carry/borrow bit.
> 
>    if add='1' then
>          Y <= ('0' & A) + ('0' & B);
>    else
>          Y <= ('0' & A) - ('0' & B);
>    end if;
> 
> produces 8bit adder with carry-out, a separate 9bit subtractor and
> a 9bit 2x1 mux. 9 slices. I tried different variations of the above
> with the same results.
> 
> Finally I have come up with the following code.
> It uses the fact that A-B = A +(-B) = A + ((not B) + 1).
> 
>   variable tmp: integer;
>   variable cin: std_logic;
> 
>   if op = '1' then
>       tmp := conv_integer(B);
>       cin := '0';
>   else
>       tmp := conv_integer(not B);
>       cin := '1';
>   end if;
> 
>   Y <= conv_std_logic_vector(conv_integer(A) + tmp + conv_integer(cin),9);
> 
> This infers 1 "9bit adder carry in" and 8 2x1 muxes and takes only 4 slices.
> Much better. One small detail: if I declare cin as integer instead
> of converting it from std_logic at the last step, I'm back to 9 slices.
> 
> Now the questions.
> 
> * Am I on the right track?

Yes, but this will be somewhat compiler dependent.  

> * I'm trying to describe purely combinatorial logic here. The output
> is supposed to be the same fixed boolean function of inputs no matter
> how it is described. Why such big variations (more than 2 times the area)?
> Is this a problem with the tool or are they all like that?

As you said, "any real programmer should write at least one OS and
compiler".  Try writing code to translate this stuff into hardware, or
even just figuring out how.  Not so easy.  Compilers are simple
in comparison.  

> * Should I be tweaking XST settings instead? Is there a magic setting
> like "Do what I mean not what I say" :)

No, issues with carry and the like are not easy since different chips
deal with them differently.  So the compiler needs to be able to map to
different architectures.  

> * Xilinx lib has "8bit adder carry out" but it doesn't seem to have
> "8bit subtractor borrow out". Is this right?

Don't know, but as you found, an adder and a subtractor are the same
thing with inverters on one input and the carries.  

> * How do I get the half-carry bit out of the 8bit adder? I guess I can
> instantiate/infer two separate 4bit adders. Is there a better way?

The last time I tried to get a carry out of the middle of a carry chain,
I found that the Xilinx architecture does not support that without
breaking the carry chain.  So it will need to be done with two 4 bit
adders, as you say. 

> * What's the story with IEEE.std_logic.SIGNED vs .UNSIGNED? I heard that
> they are mutually exclusive and math operations produce different
> results depending on which one is in use. Webpack automatically inserts
> IEEE.STD_LOGIC_UNSIGNED.ALL at the beginning of every VHDL source it
> creates. Should I always use UNSIGNED?

Neither of these packages is an IEEE standard.  They are Synopsys
proprietary IIRC.  So avoid using them and use the "numeric_std" package
instead.  

use IEEE.NUMERIC_STD.all;

> * Is there a decent on-line reference for all those IEEE.* libraries?
> I've found several good VHDL tutorials but none of them covers
> std_logic in details.

If you find one, let us all know.  Type conversion is the only thing I
have trouble with in VHDL.  I recently worked with some Verilog people
and could not convince them that VHDL was even viable because of all the
issues created by strong typing.  Verilog is much like C and lets you do
anything you want, no matter how stupid or wrong.  But then in a year of
coding, I only made two mistakes from that and it was the same mistake
twice!  Sometimes I am a little slow to learn  :)

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 45580
Subject: Re: can 555 be used as clock input to cplds
From: "Leon Heller" <leon_heller@hotmail.com>
Date: Sat, 27 Jul 2002 16:32:03 +0000 (UTC)


"Falk Brunner" <Falk.Brunner@gmx.de> wrote in message
news:ahtt4h$vv1v5$1@ID-84877.news.dfncis.de...
> "suchitra" <ssbhide@rediffmail.com> wrote in message
> news:110cc2fe.0207262205.2e37143c@posting.google.com...
> > hello all
> > i just wanted to know that can 555 be used for cplds as clock input if
> > the frequency is very low something like 1 hz.
>
> Hmm, the datasheet says something about 100..300ns rise/fall time. This is
> terribly slow for an FPGA. You should use a 74xx14 (Schmitt trigger) after
> the NE555 to get fast edges.

An RC oscillator can be implemented with a 74HC14 inverter, and followed
with another of the inverters as a buffer.

Leon
--
Leon Heller, G1HSM
leon_heller@hotmail.com
http://www.geocities.com/leon_heller

Article: 45581
Subject: Re: 32-bit PCI Target core
From: "cfk" <cfk_alter_ego@pacbell.net>
Date: Sat, 27 Jul 2002 16:42:30 GMT

First to Jeff:
    Jeff, I have been working with the opencore PCI device at
www.opencores.org. It consists of about 50 Verilog modules and contains both
a target and master implementation. It is also a bridge between PCI and
something called Wishbone which is a SoC (SystemOnChip) type of internal
bus. I have gotten it to synthesize with ISE 4.2 just fine and load into a
VirtexE where I am currently reading/writing configuration and memory
spaces. It looks like a very reasonable way to implement a PCI target.

Dear Kevin:
    If you don't mind, I would be most appreciative if you would e-mail the
Verilog code you described in your last post to me. It would be interesting
to compare it with the PCI opencore implementation to see differences in
design philosophy. Having two PCI implementations to compare strikes me as
very useful in trying to understand this somewhat complicated concept. My
e-mail address is cfk@pacbell.net.

Charles Krinke


"Kevin Brace" <killspam4kevinbraceusenet@killspam4hotmail.com> wrote in
message news:ahn2ha$9if$1@newsreader.mailgate.org...
>
>
> Jeff Reeve wrote:
> >
> > I'm looking for a synthesizable 32-bit 33MHz PCI Target only design to be
> > placed into a FPGA or large CPLD. Minimal implementation is fine. Does
> > anybody know if such a thing is available in VHDL or Verilog and is open
> > sourced? I seem to recall Xilinx publishing a target only design quite some
> > time ago but I can no longer find it on their web site.
> >
> > Any help is much appreciated!
> > Jeff
>
>
>         This is what you are probably talking about.
>
> ftp://ftp.xilinx.com/pub/applications/pci/
> ftp://ftp.xilinx.com/pub/applications/pci/00_index.htm
>
>
> For some reason, a Verilog version of the reference design is missing,
> but if you want it I can E-mail it to you (Some kind, long time Xilinx
> user sent it to me.).
> I believe Lattice Semiconductor and Quicklogic also have their own
> PCI reference design (I know the Lattice one is written in Verilog, but
> not sure about the Quicklogic one.).
>         However, here is a caveat of using reference designs offered by
> device manufacturers.
> Even if the design is written in a device independent form (Uses generic
> Verilog or VHDL statements, and no vendor specific primitives.), when
> using reference designs offered by device manufacturers, you are often
> legally required to use the reference designs on their devices.
>         Opencores.org also has a free PCI IP core, but it is a lot more
> complex (Supports initiator and target transfers.) than any of the above
> mentioned reference designs, so I feel like you will likely have a hard
> time modifying it to suit your own needs.
>         When modifying a PCI interface, PCI specification Appendix B's
> state machine examples and the following article may be helpful.
>
> http://www.eedesign.com/editorial/1995/fpgafeature9502.html
>
>
>
> Kevin Brace (In general, don't respond to me directly, and respond
> within the newsgroup.)

Article: 45582
Subject: Re: Problem with mapping
From: "cfk" <cfk_alter_ego@pacbell.net>
Date: Sat, 27 Jul 2002 18:50:21 GMT

Dear Broto:
    I can definitely tell you that the top.v that comes with the opencore
PCI interface will synthesize with all of its sub-modules and load into both
a Spartan and a VirtexE as I have done both. I have seen this "Unable to
combine" message a couple of times in the last month or so, and it
invariably had to do with my defining either two gates trying to drive the
same IOB or both a GCK input and a normal IOB input trying to come from the
same pin. Go back to the original TOP.v that came with Opencore's PCI
interface, synthesize that and then add your changes. Somewhere along the
way, the problem will become obvious.

Charles

>
> BROTO Laurent wrote:
> >
> > Hi!
> > I've succeeded in synthesizing the opencore PCI IP Core and now I am trying
> > to build a top level with this core and another one.
> > I can synthesize without problem, but when WebPACK tries to map this top, I get
> > the following error:
> >
> > ERROR:Pack:1107 - Unable to combine the following symbols into a single IOB
> >    component:
> >     PAD symbol "CLK" (Pad Signal = CLK)
> >     BUF symbol "CLK_IBUF" (Output Signal = CLK_IBUF)
> >    Each of the following constraints specifies an illegal physical site for a
> >    component of type IOB:
> >     Symbol "CLK" (LOC=C11)
> >    Please correct the constraints accordingly.
> > Problem encountered during the packing phase.
> >
> > I would like to know how can I solve this problem.
> >
> > Thanks,
> >
> > BROTO Laurent

Article: 45583
Subject: Re: ALU in VHDL and a bunch of questions
From: Ray Andraka <ray@andraka.com>
Date: Sat, 27 Jul 2002 18:57:41 GMT



rickman wrote:

> A couple of comments for points that were not fully addressed.
>
>

I think there is an adder/subtractor in the coregen, if you insist on using a
generated core.


>
> > * Xilinx lib has "8bit adder carry out" but it doesn't seem to have
> > "8bit subtractor borrow out". Is this right?
>
> Don't know, but as you found, an adder and a subtractor are the same
> thing with inverters on one input and the carries.
>
> > * How do I get the half-carry bit out of the 8bit adder? I guess I can
> > instantiate/infer two separate 4bit adders. Is there a better way?

It can be done, but it takes a little mind-bending.  Basically, you need to turn
your 8 bit adder into a 9 bit one with bit 4 being a dummy so that you can pull out
the carry out through the bit.  It takes a bit of caressing the tools to make them
infer it.
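
One possible reading of the dummy-bit trick, sketched as a behavioural Python model (an interpretation for illustration, not Ray's actual code or anything a synthesis tool is guaranteed to infer): widen both operands by one bit at position 4, feed that bit 1 on one operand and 0 on the other so it always propagates the incoming carry, and read the half carry as the inverse of the dummy sum bit. The name `add_with_half_carry` is hypothetical.

```python
def add_with_half_carry(a, b, cin=0):
    """8-bit add done as one 9-bit add with a dummy bit at position 4.

    Returns (sum, half_carry, carry_out).
    """
    # Spread each operand: bits 3..0 stay put, bits 7..4 move up to 8..5.
    wide_a = ((a & 0xF0) << 1) | 0x10 | (a & 0x0F)  # dummy bit of A = 1
    wide_b = ((b & 0xF0) << 1) | (b & 0x0F)         # dummy bit of B = 0

    total = wide_a + wide_b + cin                   # a single carry chain

    # Dummy position adds 1 + 0 + c: sum bit is not c, carry out is c,
    # so the chain is unbroken and c is visible on the sum output.
    half_carry = ((total >> 4) & 1) ^ 1
    result = ((total >> 1) & 0xF0) | (total & 0x0F)  # squeeze the dummy out
    carry_out = (total >> 9) & 1
    return result, half_carry, carry_out


if __name__ == "__main__":
    print(add_with_half_carry(0x0F, 0x01))
    print(add_with_half_carry(0xF0, 0x10))
```

The point of the model is only that the carry out of bit 3 can be observed on a sum output without splitting the chain into two 4-bit adders.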

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 45584
Subject: Re: logic elements v/s logic cells
From: John_H <johnhandwork@mail.com>
Date: Sat, 27 Jul 2002 19:09:27 GMT

Maybe you got your math wrong and you really do envision the large solution.  64/2 x
log2(64) = 192, not 256.

The large rotator is the standard mux arrangement using 4:1 multiplexers, requiring 2
LUTs per element (either in the Altera or Xilinx) for 128 LUTs (64 slices) per stage.
Since two bits of the rotation are taken care of at each stage, it takes 3 stages to
accommodate the full 64 bit rotation.  Three stages of 4:1 mux give a 64:1 effective
multiplexer.  At any stage of a rotator, N bits input map to N bits output so - in this
case - the 64 bit width is maintained in the interim stages as well as the output.
Three stages result in 3x128=384 LUTs.  The 4:1 muxes are inferred very nicely with
MUXF5 elements in the Synplify Xilinx flow.  The 6 address bits are used directly so
there is no replication required unless the three stages are pipelined.

As I was starting to put together the smaller solution, I realized I goofed once again.
The cross-coupled MUXF6 elements don't give me the 4 outputs per CLB for an 8 bit rotate
that I "remembered" but indeed just 2 outputs.  The number I supplied should have been
76, not 66, in the first place but to compound things the real value ends up being 140
slices (280 LUTs) because of my forgotten efficiency.  So, onto the (not as spectacular)
implementation . . .

In the Virtex(E)/Spartan-II(E) devices, the 4 LUTs can implement an 8:1 mux or two 4:1
muxes with the MUXFn elements and a 2 element select in each LUT.  The classic 8:1 mux
would have one select bit go to all the LUTs, another select bit to the two MUXF5s and
the third select bit to the MUXF6.  Interesting thing is there's still an unused MUXF6
in the CLB.  This extra MUXF6 can be tied to the same select control as the MUXF6 in the
standard 8:1 mux and the result is a bit that's 180 degrees out of phase for an 8 bit
rotate (assuming the 4 LSbits are in one slice and the 4 MSbits in the other).

Using simple 8:1 muxes in the 64 bit rotator would require 4 LUTs per bit per stage for
2 stages or 512 LUTs (256 slices).  8 unique inputs would be required for each bit at
each multiplex stage.  If rotators are used instead, the 8 inputs don't have to be
unique allowing us to take advantage of the other MUXF6 in the CLB.  Two stages of 8 bit
rotators don't quite make a 64 bit rotator without a little help.  If the first stage is
rotated in the simple sense, the second stage can be rotated partially by the "simple"
value and the rest rotated by the "simple" value plus one.  A rotate of 37,
given the original ordering in the grid below, would be a rotate within the rows of 5
(37 mod 8) followed by a rotate between the rows in the same column of either 4 (37\8)
or 5 (37\8+1) where "\" indicates integer divide.  Be sure to view this in fixed space
font.

Original:
 3f 3e 3d 3c 3b 3a 39 38
 37 36 35 34 33 32 31 30
 2f 2e 2d 2c 2b 2a 29 28
 27 26 25 24 23 22 21 20
 1f 1e 1d 1c 1b 1a 19 18
 17 16 15 14 13 12 11 10
 0f 0e 0d 0c 0b 0a 09 08
 07 06 05 04 03 02 01 00

Rotate left 5:
 3a 39 38 3f 3e 3d 3c 3b
 32 31 30 37 36 35 34 33
 2a 29 28 2f 2e 2d 2c 2b
 22 21 20 27 26 25 24 23
 1a 19 18 1f 1e 1d 1c 1b
 12 11 10 17 16 15 14 13
 0a 09 08 0f 0e 0d 0c 0b
 02 01 00 07 06 05 04 03

Rotate up 4 or 5:
 _4 _4 _4 _5 _5 _5 _5 _5
 1a 19 18 17 16 15 14 13
 12 11 10 0f 0e 0d 0c 0b
 0a 09 08 07 06 05 04 03
 02 01 00 3f 3e 3d 3c 3b
 3a 39 38 37 36 35 34 33
 32 31 30 2f 2e 2d 2c 2b
 2a 29 28 27 26 25 24 23
 22 21 20 1f 1e 1d 1c 1b
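
The grids above can be reproduced with a short behavioural model. This is a Python sketch of the two-stage idea only, with hypothetical function names; it says nothing about how the LUTs and MUXF5/MUXF6 elements are actually packed. Bit i sits in row i\8, column i mod 8. Stage 1 rotates every row left by n mod 8. Stage 2 moves each column up by n\8 rows, or by n\8+1 rows for the columns whose bits wrapped around in stage 1.

```python
MASK64 = (1 << 64) - 1


def rotl64(x, n):
    """Reference 64-bit rotate left."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def rotl64_two_stage(x, n):
    """Rotate left by n (0..63) as a row rotate followed by a column rotate."""
    r, q = n % 8, n // 8

    # Stage 1: rotate each 8-bit row left by r.
    stage1 = 0
    for row in range(8):
        byte = (x >> (8 * row)) & 0xFF
        byte = ((byte << r) | (byte >> (8 - r))) & 0xFF
        stage1 |= byte << (8 * row)

    # Stage 2: rotate each column upward. Columns 0..r-1 hold bits that
    # wrapped in stage 1, so they move one extra row (n\8 + 1).
    out = 0
    for col in range(8):
        up = q + (1 if col < r else 0)
        for row in range(8):
            bit = (stage1 >> (8 * row + col)) & 1
            out |= bit << (8 * ((row + up) % 8) + col)
    return out


if __name__ == "__main__":
    x = 0x0123456789ABCDEF
    print(hex(rotl64_two_stage(x, 37)), hex(rotl64(x, 37)))
```

For n = 37 the five rightmost columns (0 to 4) take the "+1" path and the three leftmost take the plain n\8 path, matching the `_4 _4 _4 _5 _5 _5 _5 _5` header row in the last grid.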

The replication of the address bits for the control over n\8 vs n\8+1 needs to be done
for 7 of the 8 columns (the leftmost is always n\8 or the upper 3 bits).  This decision
and replication increased the 128 slices to about 140.

A full 64 bit rotate in 2 stages with 73% of the resources.  Not quite the gains I
claimed but pretty respectable.
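
The resource figures quoted in this post can be tallied in a few lines. The Python sketch below only repeats the arithmetic from the text (2 LUTs per slice, 2 slices per CLB, 2 rotator outputs per CLB); the 140-slice total is taken from the post as given, not derived.

```python
BITS = 64
LUTS_PER_SLICE = 2
SLICES_PER_CLB = 2

# Standard approach: 4:1 muxes, 2 LUTs per bit, 3 stages.
mux4_luts = BITS * 2 * 3
mux4_slices = mux4_luts // LUTS_PER_SLICE

# Plain 8:1 muxes: 4 LUTs per bit, 2 stages.
mux8_luts = BITS * 4 * 2
mux8_slices = mux8_luts // LUTS_PER_SLICE

# 8-bit rotators sharing the spare MUXF6: 2 outputs per CLB, 2 stages.
rot8_slices_base = (BITS // 2) * SLICES_PER_CLB * 2
rot8_slices_total = 140  # figure from the post, after address replication

ratio = rot8_slices_total / mux4_slices

if __name__ == "__main__":
    print(mux4_luts, mux4_slices)      # 384 192
    print(mux8_luts, mux8_slices)      # 512 256
    print(rot8_slices_base)            # 128
    print(round(ratio * 100))          # 73
```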

The technique can be applied to 4 bit rotators instead of 8 bit rotators (don't use one
LUT in each slice of the CLBs with the cross coupled MUXF6s) for 16 and 32 bit rotators
with good resource savings.

The resource savings might not be worth the trouble for many designs but there are gains
in speed due to reduced fanout and fewer stages of decode.

- John_H

rickman wrote:

> Interesting idea.  IIRC, the Altera cascade chain was inferred by
> Synplicity pretty well if we used predecoded enables.
>
> I am curious about the rotator you mention.  You said that you could
> implement a 64 bit rotator in 192 slices (384 LUTs?) with a standard
> method and 66 slices (132 LUTs) with an optimal technique.  I can only
> picture a N/2 x log(N) array of 2:1 muxes where N is 64 bits.  This
> gives 256 LUTs which is neither of your answers.  Even if you find a way
> to use an extra embedded 2:1 mux in the slice, that would only bring it
> down to about 171 LUTs and would not change the architecture at 8 levels
> of logic.
>
> Care to share your techniques, both the large and the small one?
>
> John_H wrote:
> >
> > The carry chain in the Xilinx part can do the same thing as the Altera cascade
> > chain if I recall correctly.  If the Xilinx MUXCY element passes a 1 on the carry
> > and a zero when the LUT result is false, you get a wide AND cascade chain.  Wide
> > word muxes can still take N/2 LUTs in the Xilinx architecture independent of which
> > method you use.  The cascade chain would probably need a manual instantiation in
> > Xilinx, possibly in Altera.  A 4-1 mux ends up being the same in either
> > architecture, really:  2 LUTs.  The rotator I was talking about ends up beating
> > out the cascade approach significantly in either architecture.
> >
> > rickman wrote:
> >
> > > I am reaching back now, but I seem to remember that when it came to
> > > implementing muxes the Altera parts (maybe only the 10K parts) have a
> > > "cascade" backbone in each group of LEs that allows them to do very fast
> > > muxes as well as AND-OR or just wide AND type logic.  The cascade logic
> > > is a two input AND (or is it an OR?) gate that combines the cascade
> > > input with the LUT output.  Although the delays are additive, they are
> > > very short like a carry chain and can frequently beat an equivalent tree
> > > mux.
> > >
> > > But to use the cascade chain for a mux you need to change your logic to
> > > use decoded enables rather than encoded selects.  The number of LEs for
> > > the mux then becomes N/2 where N is the number of inputs.  This can be
> > > very optimal for wide word muxes where decoding the enables uses
> > > much less logic than what is saved in the mux.
> > >
> > > I don't remember Synplify doing a great job of synthesis with these
> > > structures.  It may have worked well if you used a particular coding
> > > style.  But otherwise it would only use two LEs instead of the four or
> > > five that were optimal.
>
> --
>
> Rick "rickman" Collins
>
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY
> removed.
>
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design      URL http://www.arius.com
> 4 King Ave                               301-682-7772 Voice
> Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 45585
Subject: Re: I want to buy 4 Xilinx FPGA
From: "Erik" <vikinger@uni.de>
Date: Sat, 27 Jul 2002 21:00:22 +0100

Hi Hal Murray,

> I think the answer depends upon how many you want to build
> and who is going to be using them.

Only one device, for a private project.


> Several years ago, I had the same problem.

Is the PCI core design better in a faster chip?
Do you have fewer problems with the timing?


> I put a scope on the system I was interested in running in.
> I didn't see any reflections significantly over 3 V.
>
> We decided it was a risk we were willing to take.

I get the same results on my board.


Greetings,
Erik

Article: 45586
Subject: Re: Design Techniques for Memory Mapped Registers.
From: hmurray@suespammers.org (Hal Murray)
Date: Sat, 27 Jul 2002 20:53:38 -0000


>, or read/write config registers.  I guess there is a concern of
>metastability when a status bit is changing during a read, is this a problem
>I should be concerned with?

Metastability is evil.  Far better to avoid it with clean design
(even if it looks like overkill) than to have to track it down.

I/we got bitten by a case like the one you are describing on a PCI
bus that computes parity.  A junk status bit was changing during the
read cycle.

-- 
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's.  I hate spam.

Article: 45587
Subject: Re: can 555 be used as clock input to cplds
From: Jim Granville <jim.granville@designtools.co.nz>
Date: Sun, 28 Jul 2002 09:13:44 +1200

suchitra wrote:
> 
> hello all
> i just wanted to know that can 555 be used for cplds as clock input if
> the frequency is very low something like 1 hz.
> regards

 Probably, but you might prefer to look at
- Tiny Logic Schmitt gates ( X14 ), in SOT23 packages
- XX4060 counter chains, in SO16/TSSOP16, that have 2^14 dividers,
so allow more precise and smaller/cheaper RC components, as well as 
a fast test mode.
 1Hz in a NE555 will be something of a lottery :)
-jg

Article: 45588
Subject: Re: ALU in VHDL and a bunch of questions
From: Eric Smith <eric-no-spam-for-me@brouhaha.com>
Date: 27 Jul 2002 14:37:53 -0700

> * How do I get the half-carry bit out of the 8bit adder? I guess I can
> instantiate/infer two separate 4bit adders. Is there a better way?

Ray Andraka <ray@andraka.com> writes:
> It can be done, but it takes a little mind-bending.  Basically, you
> need to turn your 8 bit adder into a 9 bit one with bit 4 being a
> dummy so that you can pull out the carry out through the bit.  It
> takes a bit of caressing the tools to make them infer it.

Is there any advantage to doing that rather than two four-bit adders?
For instance, with two four-bit adders, does the synthesizer not
recognize that it can continue the carry chain between them?  Or
does the FPGA not allow you to tap the carry from intermediate stages
of the chain?

Article: 45589
Subject: Programming FLASH with Xilinx Parallel Cable III
From: ndesi@talk21.com (ndesi)
Date: 27 Jul 2002 14:42:46 -0700

Hello :)

I want to configure my Virtex II parts from FLASH (Atmel or Intel).
I found an app note on how to do that,
but the question is how to program the FLASH.

I am thinking of using a CPLD, with the CPLD logic converting the "Xilinx
Parallel Cable III" output to program the FLASH.

Is it possible?? 
Any other ideas??

I want to use Xilinx software and their cable, but I can add only a CPLD
to my board.

Thanks In Advance

Article: 45590
Subject: SFL2VL now output compatible verilog with Exemplar
From: _nospam_nshimizu_at_bosei_cc_@bosei.cc.u-tokai.ac.jp
Date: 28 Jul 2002 03:16:23 GMT
Links: << >> << T >> << A >>

I have updated SFL2VL, a conversion program from
SFL (Structured Functional Language) to Verilog.
For a quick review of SFL, see Jan Gray's article at
http://www.fpgacpu.org/

The program's output is now compatible with Exemplar Leonardo.
On the web site below, I have placed the program together with
some test suites, such as:

m65: 6502 compatible processor
mz80: Z80 semi-compatible processor
my88: i8088 semi-compatible processor

The SFL2VL is free to use and redistribute, feel free to download it.

http://shimizu-lab.dt.u-tokai.ac.jp/pgm/sfl2vl/index.html

Enjoy.
---------------------------------->--------------------------->>
Naohiko Shimizu

Department of Communications Engineering,
School of Information Technology and Electronics, Tokai University
1117 Kitakaname Hiratsuka 259-1292 Japan
TEL.+81-463-58-1211(ext. 4084) FAX.+81-463-58-8320
http://shimizu-lab.dt.u-tokai.ac.jp/
<<--------------------------------<-----------------------------

Article: 45591
Subject: Re: Translate the design from FPGA to Custom IC
From: dudu <dudu@dudu.com>
Date: Sun, 28 Jul 2002 05:20:06 GMT
Links: << >> << T >> << A >>

> I have one more question:
> Our company has ModelSim and Tanner L-Edit.
> What other tools do I need for complete IC development?
>
> Which of the tools are free and which must be bought?
> (Personally, I am interested in designing a chip for practice, so I do not
> need powerful tools for myself.)

Well if you're designing an ASIC with just digital-logic (no analog
blocks or other 'custom IP', like a custom-layout multiplier block),
and you want to carry the design all the way through the 'backend'
process, at a minimum you need the following:

  #1) synthesis tool (example, Synopsys Design Compiler)
  #2) place&route tool (example, Cadence PKS)
  #3) clock-tree insertion (not sure, could be part of #1 or #2?!?)
  #4) design rule-check, layout verification?!? (not sure)

I'm not aware of any "free" development tools.  The ones I list
above are all commercial, and range in cost (for 1 year license)
from $90,000 USD upward of $1 million USD.

Unfortunately, I'm not that familiar with the backend process, so
I'm not 100% certain about the tools.  There's some overlap in
capability among the vendors.  For example, Cadence has a synthesis
tool (Buildgates), which you can acquire along with their PKS tool.

Synopsys is in the process of acquiring Avanti, so when all is said
and done, Synopsys will offer a place&route tool, too.

You are better off posting this sort of question in comp.cad.cadence
(where you'll get answers heavily biased toward Cadence's tools!),
or comp.lang.verilog and comp.lang.vhdl.

Article: 45592
(removed)

Article: 45593
Subject: Re: ALU in VHDL and a bunch of questions
From: Ray Andraka <ray@andraka.com>
Date: Sun, 28 Jul 2002 14:20:38 GMT
Links: << >> << T >> << A >>

In order to tap the carry chain you need to add an extra bit in the carry
chain.  The synthesis tools won't do that for you, and in fact will not
infer a carry chain for less than about 7 bits.  Using two 4-bit counters
you incur the delay to get off and then onto the second chain, whereas with a
single chain you only incur ~100ps.  With two 4-bit counters, it is likely not
your worst case path anyway, so for the sake of simplicity, readability and
maintainability of the code, it is probably better to just infer them as
separate counters.  My point was that what you asked about could be done,
but it is not done automatically by the tools and it takes a bit of
finagling to make it work.
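
One way to read the "extra bit in the carry chain" idea is sketched below as
a plain Python bit-level model. This is an illustration only, not code from
the thread and not a description of what any synthesis tool infers. It
assumes the dummy bit sits at position 4, is tied to '1' in one operand and
'0' in the other, so the carry from bit 3 passes through unchanged and the
dummy sum bit comes out as its inverse. The function name is hypothetical.

```python
def add8_with_half_carry(a, b, cin=0):
    """Model an 8-bit add as a single 9-bit add with a dummy bit 4."""
    # Bits 3..0 stay in place, bits 7..4 move up to 8..5.
    # Dummy bit 4 is 1 in wide_a and 0 in wide_b: it propagates the
    # incoming carry to bit 5 and its sum bit is NOT(carry from bit 3).
    wide_a = ((a & 0xF0) << 1) | 0x10 | (a & 0x0F)
    wide_b = ((b & 0xF0) << 1) | (b & 0x0F)
    total = wide_a + wide_b + cin              # 9-bit sum plus carry out
    half_carry = ((total >> 4) & 1) ^ 1        # invert the dummy sum bit
    result = ((total >> 1) & 0xF0) | (total & 0x0F)
    carry = (total >> 9) & 1
    return result, carry, half_carry


if __name__ == "__main__":
    # 0x0F + 0x01 carries out of the low nibble but not out of the byte.
    print(add8_with_half_carry(0x0F, 0x01))
```

The model only checks the arithmetic; whether the tools keep such a structure
on one carry chain is exactly the part that takes the finagling.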

Eric Smith wrote:

> > * How do I get the half-carry bit out of the 8bit adder? I guess I can
> > instantiate/infer two separate 4bit adders. Is there a better way?
>
> Ray Andraka <ray@andraka.com> writes:
> > It can be done, but it takes a little mind-bending.  Basically, you
> > need to turn your 8 bit adder into a 9 bit one with bit 4 being a
> > dummy so that you can pull out the carry out through the bit.  It
> > takes a bit of caressing the tools to make them infer it.
>
> Is there any advantage to doing that rather than two four-bit adders?
> For instance, with two four-bit adders, does the synthesizer not
> recognize that it can continue the carry chain between them?  Or
> does the FPGA not allow you to tap the carry from intermediate stages
> of the chain?

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

Article: 45594
(removed)

Article: 45595
(removed)

Article: 45596
Subject: Bit serial arithmetic Vs Digit serial Arithmetic
From: hristostev@yahoo.com (hristo)
Date: 28 Jul 2002 07:53:22 -0700
Links: << >> << T >> << A >>

Hello,
Maybe a basic question.
If someone has to implement an FIR using bit-serial arithmetic, he has to look
at the output wordlength, and thus the FIR bit growth. Then he needs to extend
the input data with zeros to have a regular wordlength through the structure.

In parallel we do not have to do that.

What about digit serial: do we still need to extend the input data
with zero digits?

Many thanks
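
The word-length bookkeeping described in this question can be sketched in
Python as below. This is a minimal illustration under stated assumptions,
not an answer from the thread: it assumes two's-complement samples and
integer coefficients, so the padding is done with copies of the sign bit
(which is the same as zero padding only for non-negative data), and it
assumes the bit-serial stream is sent LSB first. Both function names are
hypothetical.

```python
def fir_output_bits(coeffs, input_bits):
    """Smallest two's-complement width that holds every possible output."""
    x_min = -(1 << (input_bits - 1))
    x_max = (1 << (input_bits - 1)) - 1
    # Worst-case output extremes over all possible input sequences.
    hi = sum(c * (x_max if c > 0 else x_min) for c in coeffs)
    lo = sum(c * (x_min if c > 0 else x_max) for c in coeffs)
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def sign_extend_lsb_first(value, input_bits, total_bits):
    """Bit stream, LSB first, padded to total_bits with the sign bit."""
    # Past the input width, keep re-sending the sign bit.
    return [(value >> min(i, input_bits - 1)) & 1 for i in range(total_bits)]


if __name__ == "__main__":
    width = fir_output_bits([1, 2, 1], 8)
    print(width, sign_extend_lsb_first(-3, 8, width))
```

Every sample entering the serial structure is padded to the output width, so
that each word occupies the same number of clock cycles throughout.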

Article: 45597
Subject: Re: How to implement efficient wide word comparator?
From: "Daryl" <caoqh@engineer.com.cn>
Date: Mon, 29 Jul 2002 13:50:39 +0800
Links: << >> << T >> << A >>

Thanks a lot, John_H.
It is very useful to me.

Best Regards,
Daryl
"John_H" <johnhandwork@mail.com> wrote in message news:3D3E3E00.2796573F@mail.com...
> You're trying to @(posedge clk) increment the counter and provide a
> comparison value on... the new value?  The old value?  In the telecom
> stuff I worked with, there were typically frame counters to track the
> bytes and provide gates for various operations.  If you only need one
> gate signal, things are too simple.  If you need a separate gate for
> each of 20 bit positions, it's a little tougher but your speeds should
> be extreme with a little care.
>
> If you're doing an equality compare for each gate, there are two ways to
> do it, with a tree or a carry chain.  I'll be playing with my first
> Virtex-II in a week or two but I've heard the carry chains aren't as
> effective as they were in the Virtex-E parts but they should still
> provide excellent results.
>
> A 14 bit *constant* equality compare in a tree would require 3.5 LUTs
> for the first level of comparison and another LUT to assemble all those
> together.  Since there are 4 slices (8 LUTs) in one Virtex-II CLB, this
> should scream!  If it's a variable equality compare, the 7 LUTs feeding
> 2 LUTs feeding 1 final LUT isn't as clean but you should still get great
> speed.  One of the key factors is that the *registered* count value
> needs to be compared to a constant or a *registered* comparison value.
>
> The carry chain is probably better for a 14 bit equality compare since
> the 7 LUTs can cascade into one carry chain.  If you want to do a 98 bit
> equality compare, you could assemble the 7 bit carry chains into a
> series of (horizontal) cascade ORs (if that's what they're called - I
> won't look it up now).
>
> The point is, things should scream in either format compared to the
> speeds you're getting.
>
> Check out your logic and routing delays to see how your timing goes from
> source register to destination.  Ask yourself if some of the stages can
> be pipelined.  One of the beautiful things about counters is that they
> increment predictably!  (Unless they decrement)
>
> You could assemble a huge comparison tree and register each level to
> attain outrageous pipelined speeds.  Look at your requirements and
> figure out what you can back into a previous pipeline stage.  Very good
> things should come together with nice design work.
>
> An example of a counter with a single compare output (apologies if
> you're VHDL):
>
> always @(posedge clk)
>   if( count == max_count ) count <= 0 + ena;
>   else                     count <= count + ena;
> assign out_gate = (count == max_count);
>
> The structure above isn't very efficient because a wide compare is
> needed in the logic while it isn't needed in the design.  The logic may
> not synthesize into a simple counter, either, requiring two stages of
> logic for the counter to add to the compare.
>
> You could use a registered compare of
>
>   out_gate <= (count == max_count - 1) & ena;
>
> which (in the always block) has the gate go active when you want it.
>
> But you could do better by resetting your counter with a different
> value:
>
> always @(posedge clock)
>   if( out_gate )  {out_gate,count} <= {1'b0,-max_count} + ena;
>   else            {out_gate,count} <= count + ena;
>
> Note that the gate is now synchronous and there is NO compare required.
> (Apologies that things look a little strange... the constant "max_count"
> should be dimensioned the same as the "count" vector so the out_gate
> initializes properly false)
>
> The structure can be made "synthesis friendly" to use one level of
> synthesized logic (if it doesn't already) by using an equation that's
> more friendly to the Xilinx carry chain configuration:
>
> always @(posedge clock)
>   {out_gate,count} <= (out_gate ? {1'b0,-max_count} : count) + ena;
>
> The conditional operator works in place of the if/else construct and
> "fits" in the carry structure.
>
> Many things to do.  Happy coding!
>
> - John_H
>
>
>
>
> Sniper Daryl wrote:
> >
> > Here,
> >
> >    I am Daryl and I have to trouble you. :-)
> >
> >    When I design a chip used for optical networks, a lot of effort must
> > be made to increase the clock speed and reduce the chip resource cost.
> > In a timing interface module, there is a counter with 14-bit width to
> > provide timing to the outgoing frame, and a comparator is used to compare
> > the counter word with a series of registers set by the controller.
> > I've noticed that the slice cost increases seriously and the maximum
> > clock speed decreases a lot when the counter and the comparator get
> > wider.
> >
> >    Troubled by this, I first tried a wider counter (14-bit) and a
> > narrower comparator (4-bit) and got a 20MHz speed increase and more
> > than 20 slices saved. Then I tried a 4-bit counter and a 14-bit comparator,
> > with a result of a 10MHz increase and about 10 slices saved. So I think
> > the critical factor is the wide comparator. This is confirmed by studying
> > the reports and schematics from the synthesis tools (FCII 3.6.1 and
> > Synplify Pro with Amplify).
> >
> >    To improve the performance, I tried to use the CoreGen tool to
> > generate a comparator core. But after implementation, the result is no
> > better than that from my own code.
> >
> >    The synthesis tool I used is FCII 3.6.1, the device is a
> > Virtex-II 1000, implemented with ISE 4.2 SP3. Here are the results of my
> > trials:
> >
> >       14-bit counter,  14-bit comparator and other logic :  63 slices used (36 FFs and 105 LUTs);  95MHz
> >
> >       4-bit counter,   14-bit comparator and other logic :  50 slices used (26 FFs and 85 LUTs);  115MHz
> >
> >       14-bit counter,  4-bit comparator and other logic :   41 slices used (26 FFs and 62 LUTs);  127MHz
> >
> >    Would you give me some advice about it from your experience? Or
> > some resource to study?
> >
> >
> >
> > Thanks in advance for your time!
> >
> > Daryl
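
The compare-free counter from John_H's quoted reply (reload with -max_count
and use the carry out as the gate) can be sanity-checked with a small
cycle-by-cycle Python model. This is only a sketch of the quoted one-line
Verilog, not synthesizable code. The function name, the explicit `width`
parameter and the starting state (gate asserted, count zero) are assumptions
made for the illustration.

```python
def run_gate_counter(max_count, width, enables):
    """Model {out_gate,count} <= (out_gate ? -max_count : count) + ena."""
    mask = (1 << width) - 1
    reload = (-max_count) & mask      # -max_count as a width-bit value
    out_gate, count = 1, 0            # assumed state right after a gate
    gates = []
    for ena in enables:
        base = reload if out_gate else count
        total = base + ena            # width+1 bits: {out_gate, count}
        out_gate = (total >> width) & 1   # carry out becomes the gate
        count = total & mask
        gates.append(out_gate)
    return gates


if __name__ == "__main__":
    # With ena held high, the gate should pulse once every max_count clocks.
    print(run_gate_counter(max_count=5, width=4, enables=[1] * 20))
```

In this model the only wide operation is the add itself, so the gate is a
registered carry out and no equality comparison appears anywhere.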

Article: 45598
Subject: secure FPGA
From: "Thomas Wollinger" <wollinger@crypto.ruhr-uni-bochum.de>
Date: Mon, 29 Jul 2002 08:50:02 +0200
Links: << >> << T >> << A >>

Hi everybody,

I am looking for an FPGA that I have to use in a secure manner. I have to do
a project in which the FPGA is used in a 'military'-like environment.

Can somebody tell me which vendors and FPGA families are out there with
'special' security features, which security features those are, and what
they are good for?

The FPGA can use any technology (SRAM, antifuse, flash ...), but it has to
be secure against as many attacks as possible.

Thanks everybody for your time.

I really appreciate your help

Thomas



P.S: If you like to email me, just delete XY in the following email-address:
wollingerXY@crypto.ruhr-uni-bochum.de

(I do not know if this is the right newsgroup to post this question - if not
please could somebody let me know where is a better place to post.)

Article: 45599
Subject: Re: ALU in VHDL and a bunch of questions
From: dmitrik@mailandnews.com (Dmitri Katchalov)
Date: 29 Jul 2002 01:42:35 -0700
Links: << >> << T >> << A >>

Thanks again everyone.

Using your suggestions I've managed to implement PIC-style 
ADD/SUB/INC/DEC with carry and half-carry out in just 4 slices, see code below.
I'm not sure about the polarity of the borrow bit though.

Synthesis infers two 5-bit adders, later optimised into 4-bit
adders with carry in/out. P&R places them in one column, one immediately
on top of the other (in an otherwise empty FPGA). I don't have sufficient
knowledge to tell from all the reports whether the carry chain
is broken or continues over. It does seem to continue over.

Here is the code, comments appreciated.

Regards,
Dmitri

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity alu_adder is
  Port ( A, B:      in  std_logic_vector(7 downto 0);
         op:        in  std_logic_vector(1 downto 0);
         Y:         out std_logic_vector(7 downto 0);
         carry_out: out std_logic;
         dc_out:    out std_logic );
  constant ADD : std_logic_vector(1 downto 0) := "00";
  constant SUB : std_logic_vector(1 downto 0) := "01";
  constant DEC : std_logic_vector(1 downto 0) := "10";
  constant INC : std_logic_vector(1 downto 0) := "11";
end entity alu_adder;

architecture Behavioral of alu_adder is
begin
  process( A, B, op )
     variable tmp: std_logic_vector(7 downto 0);
     variable lo_nibble, hi_nibble: unsigned(5 downto 0);
     variable cin: std_logic;
  begin
    case op is
      when INC =>	tmp :=  (others => '0'); cin := '1';
      when DEC =>	tmp :=  (others => '1'); cin := '0';
      when SUB =>	tmp := not B;	cin := '1';
      when ADD =>	tmp := B;	cin := '0';
      when others =>	tmp := (others => '-'); cin := '-';
    end case;

    lo_nibble := unsigned('0' & A(3 downto 0) & cin ) + 
                 unsigned('0' & tmp(3 downto 0) & cin );

    hi_nibble := unsigned('0' & A(7 downto 4) & lo_nibble(5) ) + 
                 unsigned('0' & tmp(7 downto 4) & lo_nibble(5) );

    Y <= std_logic_vector( hi_nibble(4 downto 1) & lo_nibble(4 downto 1));
    dc_out    <= lo_nibble(5);
    carry_out <= hi_nibble(5);
  end process;
end architecture Behavioral;
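
On the borrow polarity question, a bit-level Python model of the VHDL above
may help. It is a sketch written for this purpose, not simulator output, and
it mirrors only the arithmetic of the process (the don't-care branch is left
out). The Python function and constant names simply reuse the VHDL names.

```python
ADD, SUB, DEC, INC = 0b00, 0b01, 0b10, 0b11


def alu_adder(a, b, op):
    """Return (Y, carry_out, dc_out) as computed by the VHDL process."""
    if op == INC:
        tmp, cin = 0x00, 1
    elif op == DEC:
        tmp, cin = 0xFF, 0
    elif op == SUB:
        tmp, cin = (~b) & 0xFF, 1
    else:                                   # ADD
        tmp, cin = b, 0
    # Each nibble is widened to 6 bits: '0' & nibble & carry_in.
    lo = (((a & 0xF) << 1) | cin) + (((tmp & 0xF) << 1) | cin)
    dc = (lo >> 5) & 1                      # carry out of the low nibble
    hi = ((((a >> 4) & 0xF) << 1) | dc) + ((((tmp >> 4) & 0xF) << 1) | dc)
    y = (((hi >> 1) & 0xF) << 4) | ((lo >> 1) & 0xF)
    return y, (hi >> 5) & 1, dc


if __name__ == "__main__":
    # For SUB, carry_out = 1 means "no borrow" (A >= B) in this model.
    print(alu_adder(5, 3, SUB), alu_adder(3, 5, SUB))
```

In this model, SUB gives carry_out = 1 when A >= B and 0 when a borrow
occurs, and dc_out behaves the same way for the low nibbles. As far as I
know that is the same sense as the PIC's C and DC flags after a subtract.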



Ray Andraka <ray@andraka.com> wrote in message news:<3D43FE74.BC6780AD@andraka.com>...
> In order to tap the carry chain you need to add an extra bit in the carry
> chain.  The synthesis tools won't do that for you, and in fact will not
> infer a carry chain for less than about 7 bits.  Using two 4-bit counters
> you incur the delay to get off and then onto the second chain, whereas with a
> single chain you only incur ~100ps.  With two 4-bit counters, it is likely not
> your worst case path anyway, so for the sake of simplicity, readability and
> maintainability of the code, it is probably better to just infer them as
> separate counters.  My point was that what you asked about could be done,
> but it is not done automatically by the tools and it takes a bit of
> finagling to make it work.
> 
> Eric Smith wrote:
> 
> > > * How do I get the half-carry bit out of the 8bit adder? I guess I can
> > > instantiate/infer two separate 4bit adders. Is there a better way?
> >
> > Ray Andraka <ray@andraka.com> writes:
> > > It can be done, but it takes a little mind-bending.  Basically, you
> > > need to turn your 8 bit adder into a 9 bit one with bit 4 being a
> > > dummy so that you can pull out the carry out through the bit.  It
> > > takes a bit of caressing the tools to make them infer it.
> >
> > Is there any advantage to doing that rather than two four-bit adders?
> > For instance, with two four-bit adders, does the synthesizer not
> > recognize that it can continue the carry chain between them?  Or
> > does the FPGA not allow you to tap the carry from intermediate stages
> > of the chain?
