Alco,

Are you using an implementation of the XAPP058 XSVF player code? My answers to your original questions are inserted in the quoted text below... Send me direct email if you need more help.

Randal

alco wrote:
> Hello,
>
> I have been using an 8051 controller to program an XC9536 CPLD using the JTAG
> interface. I have programmed several hundred units in the last couple of
> years without any problems, but now programming the CPLD by the 8051 fails
> for more than half the units produced. The PCBs do not show shorts, open
> connections, or other production errors. With some care the CPLD (while on
> the PCB) was connected to a MultiLINX programming device, which successfully
> programs the CPLD.
>
> The programming algorithm on the 8051 is based on Xilinx app notes XAPP058
> and XAPP067. The CPLD fails to generate the correct TDO output for
> verification after a program or erase instruction. The scope shows
> appropriate XRUNTEST idle times for these instructions (640us/1300ms). The TCK
> period is larger than 2us.
>
> On the scope I compared the JTAG output from the 8051 and the MultiLINX device.
> The only real difference is that the MultiLINX does not instruct the CPLD to
> return to Run-Test/Idle mode after Update-IR (see figure 7 in XAPP058) for
> the instructions 'ISP enable', 'erase', and 'program', which are then
> immediately followed by an SDR instruction.
>
> Questions:
>
> - Has anything changed recently in the JTAG interface for the XC9536 that
> might cause a microcontroller to fail programming the CPLD?

Nothing has recently changed with the physical device. Many (2-3) years ago, the XC9536 fabrication facility was changed. Software changes have continuously occurred:

1. The XAPP058 SVF2XSVF translator and XSVF player have been updated (to v4.xx) to support new additions to the Xilinx family.
The SVF2XSVF v4.xx translator output may not be compatible with the old XSVF player. You should continue using the old SVF2XSVF translator if you have an older XSVF player implementation.

2. The new download software, called iMPACT, was released in the Xilinx 4.1i software package. The iMPACT SVF output is not compatible with the older XSVF tools.

> - Where do I find detailed information on JTAG timing and XC9536 timing
> during programming? It is not found in the Xilinx data book.

The only details are in the combination of the app notes you listed, the XC9536 data sheet, and the iMPACT/JTAG Programmer SVF output file for the XC9536.

> - Are there timing constraints associated with the JTAG interface or XC9536
> other than the TCK period and XRUNTEST idle times?

Since you have read XAPP067 and XAPP058, you are aware of the conditional "retry" loop. On a TDO test failure, the retry is invoked and the prior XRUNTEST time should be applied again. Actually, it's not guaranteed that the device will erase/program within the given XRUNTEST time. When the retry loop is invoked, it is safer to account for the longer device timing by incrementing the applied XRUNTEST time by 10-25% on each retry attempt.

> - Besides the xc9536.bsd file there exists an xc9536_v2.bsd file. The device
> marking does not say so, but could the device be a v2 type that must be used
> with the other BSDL file? And if so, how do I use that file? I cannot
> select a version-2 9536 as a target device.

The v2 BSDL file is for the XC9536 that is currently being made: the revision 2 device. The switch to this revision occurred with the fabrication facility change about 2-3 years ago. The v2 XC9536 is compatible with the original device. JTAG Programmer always generates SVF programming vectors for the original XC9536 which, as I mentioned, is forward compatible with the newer revisions. iMPACT generates SVF only for the XC9536 v2 version.
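Randal's retry advice is easy to get wrong in a player implementation. The sketch below shows the intended shape of the loop; it is plain Python pseudologic, not the actual XAPP058 C source, and the function names and the 15% growth factor are my own illustration of the "10-25% per retry" guidance.

```python
def program_with_retry(apply_xruntest_and_check, base_us: float,
                       max_retries: int = 32, growth: float = 1.15):
    """Erase/program retry loop per the advice above.

    apply_xruntest_and_check(wait_us) models shifting the instruction,
    idling in Run-Test/Idle for wait_us, then checking TDO; it returns
    True on a TDO match. Each retry lengthens the idle time by ~15%,
    because the nominal XRUNTEST time is not guaranteed to be enough.
    """
    wait = base_us
    for _ in range(max_retries + 1):
        if apply_xruntest_and_check(wait):
            return True
        wait *= growth  # back off: give the device more time
    return False

# Toy device that only succeeds once the idle time reaches 900 us:
ok = program_with_retry(lambda us: us >= 900.0, base_us=640.0)
print(ok)  # True (640 -> 736 -> 846.4 -> 973.4 us)
```

The key point is that the wait time carries over and grows across retries, rather than resetting to the nominal XRUNTEST value on each attempt.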
Randal

Article: 37626
Rick Filipkiewicz wrote:
> [essential ... helpful ... difficult to use ... waste of time & gates] ?

A vote for "Helpful". My latest design couldn't use it, as I use all of the block RAMs, but the gentlemen in the next office over did use it, and while they did swear at it, they also seemed to find it quite useful. I would hope that some of the bugs that bit them are now fixed, so I would think it's probably worth trying.

--
Phil Hays

Article: 37627
> > Last year, I used Xilinx Foundation Express 3.3i to develop
> > for a Virtex300 part. I recently went to Xilinx's homepage,
> > and found that 'Foundation ISE' has replaced the older
> > Foundation (non-ISE).
> >
> > Does this mean:
> >
> > 1) goodbye old Windows 16-bit legacy code
> > (3.3i would crash on average every 4-6 compiles, and
> > sometimes take down my NT4 workstation)
>
> What the heck were you doing with your computer that it would cause it
> to crash? I ran NT 4 SP6 with the Xilinx service packs, and it never
> crashed the computer.

Well, the crashes seem to center around the 'project manager' (the top-level Foundation application). When I clicked the "compile" button, it would ask me if I wanted to update some "libraries" (I'm sorry, I don't remember the exact dialogs; this was over a year ago). After clicking yes, I'd take a deep breath and cross my fingers. Sometimes the app would crash here, with a Win16 dialog box. (I have Visual Studio 6 installed on my system, and every other app crash has a 'debug' button. The Foundation Project Manager, Schematic Capture, and a few other Xilinx things did NOT.) Once the 'implementation' window pops up, which shows the progress of the compile, map, par, etc., the application would always complete successfully. Once I get a crash, I can quit the program. If I try to re-run it and recompile, the whole program just hangs, so I have to reboot.

Article: 37628
On Mon, 17 Dec 2001 10:31:33 -0500, "Pallek, Andrew [CAR:CN34:EXCH]"
<apallek@americasm01.nt.com> wrote:

> If you just want to divide by 64, shift right by 6 places. The modulo
> is what was shifted out.

What if the dividend is negative?

Muzaffer Kal
http://www.dspia.com
DSP algorithm implementations for FPGA systems

Article: 37629
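Muzaffer's caveat is real: for a negative two's-complement dividend, an arithmetic right shift rounds toward negative infinity, while the usual "divide" convention (e.g. C's `/`) rounds toward zero, so the shifted-out bits are not the remainder you might expect. A quick sketch (plain Python, not from the original thread):

```python
def shift_divmod(x: int, k: int) -> tuple[int, int]:
    """Divide by 2**k with an arithmetic right shift.

    Python's >> on ints is an arithmetic shift, and the masked-off
    low bits are the remainder -- both round toward -infinity.
    """
    return x >> k, x & ((1 << k) - 1)

def trunc_divmod(x: int, d: int) -> tuple[int, int]:
    """C-style division: quotient rounds toward zero."""
    q = int(x / d)          # truncate toward zero
    return q, x - q * d

# For a positive dividend the two agree:
assert shift_divmod(100, 6) == trunc_divmod(100, 64) == (1, 36)

# For a negative dividend they differ:
print(shift_divmod(-100, 6))   # (-2, 28)  floor division
print(trunc_divmod(-100, 64))  # (-1, -36) truncating division
```

So a bare shift-and-mask is only a drop-in replacement for divide/modulo when the dividend is unsigned, or when floor semantics are acceptable.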
Some hardware questions on FPGAs:

1) What's the difference between a part with speed grade -3 and another with speed -4? Is the number the number of metal layers?

2) I read the data sheets of Virtex and Virtex-E, and I didn't find much difference. Can you explain which is better and why?

Thanks

Article: 37630
I'd like to hear more inputs from current users, so I am posting this whole thing from the FAQ here. Please extract only the portions you want to comment on. Thanks.

===========================================================

FPGA-FAQ 0014  How do I choose between Altera and Xilinx?

Vendor: Both
FAQ Entry Author: Martin Thompson
FAQ Entry Rebuttal/Commentary: Anonymous Altera Fan
FAQ Entry Additional Analysis: Ray Andraka
FAQ Entry Editor: Philip Freidin
FAQ Entry Date: 7 June 2001

One of our many readers suggested that the only way to read this particular page was to do it listening to Arlo Guthrie's Alice's Restaurant, and I agree. Here it is!

Q. How do I choose between Altera and Xilinx?

A. Here's a quick (well, actually it's not as quick as I expected) description of how I chose between Altera and Xilinx. I was comparing Altera's Flex/APEX with Xilinx's XC4000/Virtex families. Some comments are thrown in about Virtex and Apex and so on, but I will be the first to admit I haven't done any more than read the datasheets. The Flex10KE I have used in anger. A lot of what I have learnt about the two families has come from reading comp.arch.fpga, and in particular a discussion I had with Ray Andraka. Most of the rest came from reading datasheets and appnotes. Due to the discursive nature of this bit, it is indented, with later comments being more indented...

Logic

The structure of the Xilinx logic cells is well suited to arithmetic structures, compared to the Altera Flex/Apex structure, due to the ability to generate both output and carry from one logic cell. Altera's 4-LUT is divided into two 3-LUTs for arithmetic.

I think you misunderstand the basic Altera logic element. The carry and sum outputs are implemented in a single logic cell. Just as a Xilinx logic cell can be reconfigured to act as a 16-bit memory cell, the Altera logic cell has several configurations optimized for arithmetic, counters, or misc logic.
There is no speed penalty for using these modes, and there is no inherent advantage to the Xilinx logic cell with regard to arithmetic. I hear all kinds of claims about Xilinx architectural advantages, but I have never heard even the most ardent Xilinx user claim that the Xilinx architecture has an arithmetic advantage in the logic cells.

I think the point is that Altera can only do a 2-bit arithmetic operation with carry in and out - in Flex at least (ref. figure 11 in the 10KE datasheet). Also, with Flex, a CE has to take the place of an arithmetic input as well, leading to a one-input function (without using cascade chains to make it wider, with their, admittedly small, routing delay).

Xilinx does have a distinct advantage over Altera when it comes to arithmetic circuits. For arithmetic, the Altera 4-LUT does indeed get partitioned into a pair of 3-LUTs: one for the 'sum' function and one for the 'carry' function. One input to each of these is connected to the carry out from the previous bit. As a result, you are limited to a two-input arithmetic function if you wish to stay in one level of logic. Arithmetic functions with more than two inputs, such as adder-subtractors, multiplier partial products, mux-adds, and accumulators with loads or synchronous clears (this last one is addressed by improvements in the 20K family), require two levels of logic to implement. The Xilinx logic cell does not use the LUT for the carry function; it has dedicated carry logic. The 4K/Spartan families use one LUT input to connect the carry chain, leaving three inputs for your function. Virtex, Virtex-II, and Spartan-II have a dedicated XOR gate after the LUT to do this, so these devices can handle 4-input arithmetic functions without having to go to two levels. The relatively limited arithmetic function of the Altera parts means as many as twice the LUTs are used in heavily arithmetic applications.
Two levels of logic also equates to a significant performance penalty, everything else being equal.

Xilinx's logic cells can also be used as 16-bit shift registers or 16x1 SRAMs for small amounts of storage. In addition, in Virtex there are BlockRAMs, which are larger blocks of dual-ported memory. Altera only has large blocks of RAM called EABs, which are configurable between 256x16 bits and 4096x1 bit. They are also only partially dual-ported (one read and one write port).

The ability to convert the logic cell into memory is a neat feature. This is one of the key differences in the architectures. My only comment on that is that it isn't used as much as you might think. Xilinx parts have a much lower logic cell count relative to device size since they include so much RAM (example: XCV600E: 13.8K logic cells, 288 Kbits RAM; 20K400E: 16.6K logic cells, 208 Kbits RAM). Because of this, it doesn't usually make sense to take away your less abundant resource (logic cells) to create more of something you already have lots of (memory). Nonetheless, it is a neat and sometimes quite useful feature.

For DSP designs, the CLB RAM capability is another significant advantage over the Altera offerings. DSP designs tend to have many small delay queues (filter tap delays, for example) which use up a lot of logic cells if implemented as flip-flops, or severely under-utilize block memories if done there. By using the CLB RAMs (or, in the case of Virtex, the shift register mode), you get up to a 17:1 area reduction over using LUT flip-flops. Similar reductions come into play for designs having register files and small FIFOs. The Virtex SRL16 primitive also gives you the capability to reload LUT contents without reconfiguring the device. This makes it possible to have reprogrammable coefficients in a distributed arithmetic filter, for instance. There is simply no equivalent capability in the Altera devices. My Virtex designs typically have more than half of the LUTs configured as SRL16s.
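The SRL16 mentioned above is simply a 16-deep shift register built inside one LUT; the 17:1 figure comes from replacing up to 16 discrete flip-flops of delay with a single cell. A behavioral sketch of such a tap delay line (plain Python, class and method names are mine):

```python
from collections import deque

class SRL16:
    """Behavioral model of a 16-deep LUT shift register (SRL16).

    One 4-LUT holds 16 bits, and a 4-bit address selects the tap,
    so a single cell provides up to 16 clocks of delay.
    """
    def __init__(self, depth: int = 16):
        assert 1 <= depth <= 16
        self.regs = deque([0] * depth, maxlen=depth)

    def clock(self, d: int) -> int:
        q = self.regs[-1]        # oldest bit = selected tap (depth-1)
        self.regs.appendleft(d)  # shift the new bit in
        return q

# A 4-cycle delay line: input bits reappear 4 clocks later
line = SRL16(depth=4)
out = [line.clock(b) for b in [1, 0, 1, 1, 0, 0, 0, 0]]
print(out)  # [0, 0, 0, 0, 1, 0, 1, 1]
```

Implemented in flip-flops, the same 4-cycle delay would cost four registers per bit of data width; in the LUT-RAM/SRL16 form it costs one LUT per bit regardless of depth (up to 16).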
(This is comparing marketing gate counts (600E vs 400E) against actual logic cells: Xilinx actually claims 15.5K, but 13.8K is the actual number of 4-LUTs. The 288 Kbits of RAM in Xilinx is the block RAM; there can be up to 216 Kbits more in the LUTs (which would leave zero for logic). The 208K for Altera is for block RAM only. For each user, a better measure might be to find the product in each vendor's product line that can hold a given design, and compare actual price. This gets away from inflated gate and RAM claims, and whether or not it makes sense to trade logic for RAM.)

Precisely - as with many of the architectural differences, if you need the feature, it's brilliant; otherwise, it has no (or even a negative) impact.

As far as the memory blocks go, the Altera blocks have built-in circuitry to allow them to be used as CAM (content addressable memory). The Altera CAMs have a huge performance advantage over trying to implement CAMs in Xilinx devices using memory blocks and some logic. The Altera memory blocks can also be used to implement fast, wide, product-term logic. (Xilinx block RAMs can too.) This is useful, for example, for implementing a wide address decode in few levels of logic. With that said, I will agree that the Xilinx dual-port mode is more full-featured than the APEX 20KE dual-port (although the advantage disappears when you compare Virtex-II vs. Apex II).

This is with APEX and later families. As far as I can tell, the Flex devices don't have this ability. Again, great if you need it!

On the subject of block memories, the advantages of one over the other are not as clear. Xilinx does have a true dual-port capability, where Altera's memory is at best (depending on the family) a read-only port and a write-only port through the 20K family. This is fine for many designs, so unless you need it, not having it is not a problem. Altera does have two very nice unique capabilities in the 20K memories: a CAM mode and a product-term mode.
The CAM is more than nice to have for network apps and places where you need to sort data. While you can do a CAM in Xilinx, the design is neither trivial nor particularly fast (either the fetch or the write operation has to take multiple clocks; see the Xilinx app notes for details). The product-term capability is reminiscent of a CPLD, which is very handy when dealing with big combinatorial functions such as address decodes.

The flip-flops in the logic cells differ in that the Xilinx logic cell has a dedicated clock enable input, whereas Altera uses one of the inputs to the LUT to create a CE signal. In addition, the Altera flip-flops only have a clear input. If you want a preset, the tools will put NOT gates on the input and output of the DFF. This means that you can't have a preset flip-flop implemented in the I/O cell - therefore your Tco can suffer badly. The diagram in the datasheet implies a preset input, but on reading the text you discover the truth!

The Altera part does have a true clock enable on the LE flip-flop, but (except for the 20K) it shares an input to the LE with one of the LUT inputs, so using the clock enable reduces the available functionality of the LUT. In the case of arithmetic logic, using the CE limits you to a single input for one level of logic.

FLEX 8000: No clock enable. Software emulates the clock enable by building it into the logic.
FLEX 6000: No clock enable. Software emulates the clock enable by building it into the logic.
FLEX 10K, 10KE, ACEX 1K: Clock enable uses one of the LUT's data inputs (per the author's original comment).
APEX 20K, 20KE, Mercury, APEX II: Regular clock enable.

The logic cells allow you to implement EITHER an asynchronous clear OR an asynchronous preset. You can't do both without using additional logic cells, but you can implement either, even in the I/O cell. By the way, the Tco increases by only 0.233ns when using a register near the periphery rather than a register in the I/O cell (APEX EP20K30ETC144-1).
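To return to the CAM point above and make it concrete: a CAM answers "which addresses hold this value?" in a single lookup, the inverse of a RAM. A toy behavioral model (plain Python, not vendor code; the class and method names are mine):

```python
class TinyCAM:
    """Toy content-addressable memory: value -> matching addresses.

    A hardware CAM compares the search key against every stored entry
    in parallel in one clock; the RAM-plus-logic workaround on Xilinx
    needs extra circuitry and multiple clocks per write, as noted above.
    """
    def __init__(self, depth: int):
        self.mem = [None] * depth

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value

    def match(self, value: int) -> list[int]:
        # Models the parallel compare across all entries.
        return [a for a, v in enumerate(self.mem) if v == value]

cam = TinyCAM(depth=8)
cam.write(2, 0xBEEF)
cam.write(5, 0xBEEF)
cam.write(6, 0xCAFE)
print(cam.match(0xBEEF))  # [2, 5]
print(cam.match(0x1234))  # []
```

The list comprehension in `match` is what the dedicated CAM circuitry does in one cycle; building the same parallel compare out of RAM blocks and general logic is where the Xilinx workaround loses speed.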
Provided you can get the register to be consistently located adjacent to the IOB (which can be difficult as the device gets full). Depending on registers placed in the core rather than in the IOB leads to external timing being a function of the place and route solution... not a good thing. Incidentally, this is also a problem in the 10K if you need bidirectional I/O, since there is only one flip-flop in the IOB.

If I can return to the Flex architecture, which is what I began the article comparing: according to the 10KE datasheet, an async preset is implemented in one of two ways. The first is using the clear and inverting both input and output: inverting the input is 'free', but inverting the output requires a LUT between your register and the pin. Hence, it's not just a case of the register being outside the I/O element; there's extra logic to consider. Admittedly, I missed the other way of doing it, which is to use one of the LUT inputs as a preset. But then you've lost a LUT input, so that's not always possible either.

Altera's new Mercury family has a different logic structure, including two carry chains, so the arguments are probably different. I haven't had the time/inclination to do any detailed analysis.

I/O

Both families offer similar I/O capabilities. The biggest difference is that the Altera I/O cell has a single register, which can be used as an output, input, or OE register. The Xilinx I/O has all three available for use. Note that the diagrams in the Altera datasheet imply that they have the same capability, but on reading the text you find that the picture shows all the possibilities at once!

You're right about that diagram in the datasheet. Also, you can't use the register in the I/O cell for the OE either - just input or output. However, note the comment above regarding using nearby registers not in the I/O cell. The performance penalty in most cases is less than 1ns for using a non-I/O cell register.
Fair comment - I admit to being a bit bitter about what I consider to be misrepresentation of the truth in the diagram - still, I've learned not to trust the pictures and to read the words now!

This is not true for Mercury, which has three flip-flops in the I/O cell, and Apex II, which has six, for DDR applications.

The Virtex/Apex comparison of their respective LVDS implementations is interesting. As far as I can gather, the SerDes function is implemented in the FPGA fabric for Virtex, and in custom silicon for Apex. This means that you only get proper SerDes LVDS support with the larger Apex devices.

The dedicated SerDes circuitry in the APEX devices allows you to move data around inside the device at 105 MHz and drive it out of the LVDS drivers at 840 Mbps. The Xilinx solution requires routing data and clocks around internally at 320 MHz (not simple), and it uses both edges of the clock to drive data at 640 Mbps. Also, the LVDS drivers in the Altera part are balanced (equal rise and fall times), providing a much better eye diagram than what you get from the unbalanced drivers in the Xilinx device. The Xilinx solution also requires an external resistor network to get the right LVDS voltage levels. Finally, the Apex 20KE devices have dedicated de-skew circuitry in the LVDS receivers. This saves the board designer from having to make all the signal traces exactly the same length. It's hard to argue that the Altera LVDS solution is significantly superior (Apex 20KE vs. Virtex-E), but I do have to admire the fact that Xilinx was able to coax 640 Mbps LVDS out of drivers that were never intended to do LVDS. Altera's general-purpose I/Os have trouble making it to 200 Mbps with a Xilinx-type solution. As far as Apex II and Virtex-II go, I have yet to see details on the Virtex-II LVDS. Apex II increased LVDS performance to 1 Gbps and put it on more channels.
Apex II also improved the clock de-skew circuitry to further reduce the need to carefully hand-route the board-level LVDS signals.

Also good comments, from someone who has actually done it, rather than simply my reading of those datasheets and appnotes!

Routing

The routing structures are also different. Altera's main routing strategy is to have many lines connecting the entire chip together. This is in contrast to the Xilinx approach, which consists of a hierarchy of short, medium, and long connections. This makes the job of the place and route tool harder in the Xilinx devices, unless it is guided. The downside for Altera is that larger devices get slower, as there is more capacitance to drive.

The routing structures of the Xilinx and Altera families are very different; each has different abilities. The Altera structure is a hierarchical structure akin to that of a CPLD. At the lowest level, there are very fast connections between the logic elements (LEs, which consist of a flip-flop and a 4-LUT each) within a LAB (logic array block, with 8 to 10 LEs). These connections are great for very fast state machines, but are useless for arithmetic because the carry chain also runs through the LAB. The next level up in the routing hierarchy connects the LABs in a row together. The row routes run halfway or all the way across the chip in 10K, with switches connecting to selected LABs. The rows are then interconnected by column routes. A LAB can drive a row or column route directly, but can only receive input from a row route. This structure has the advantage of having uniform delays for any connections using similar hierarchical resources. That in turn makes placement less critical. Unfortunately, it also means even local connections incur the delay associated with a cross-chip connection. A bigger problem appears with heavily arithmetic designs, because the routing in and out of every arithmetic LE is forced onto the row routing.
There are only six row routes for every eight LEs in a row, so even with perfect routing in a heavily arithmetic data-flow design, the row can only be 75% occupied. The row interconnect matrix is sparsely populated (any one LAB can only directly connect to a fraction of the LABs on the same row). As the row fills up, some of the connections have to be made via a third LAB, adding to the delay and further congesting the row routes. In a math-intensive design, system performance often falls off sharply at 50 to 60% device utilization. The global nature of the row and column routes also means that performance degrades with increasing device size.

The 20K architecture fixes many of the routing problems of the earlier families cited above. Another hierarchical layer is added between the row route and the LAB, which has the effect of localizing connections that previously had to go on the row tracks. Since those connections don't have to cross the chip, they are faster. To fix the arithmetic connections, direct connections have been added from each LAB to the LEs in the adjacent LABs in the so-called MegaLAB.

The Xilinx routing structure is a mix of different-length wires connected by switches. For the more local connections, very fast single-length connections are used. Longer connections use the longer wires to minimize the number of switch nodes traversed. The routing delays have a strong dependence on the connection distance, so placement is critical to performance. This can make performance elusive to the novice user, but on the other hand, the segmented routing means extreme performance is available if you are willing to do some work to get it.

Bottom line: the Altera routing is more forgiving for moderate designs at moderate densities, which makes it easier for users and tools alike. However, the same things that make it easier for those designs are roadblocks for higher performance.

Tools

Both vendors now ship FPGA Express for compiling/synthesising.
Altera also offers Leonardo Spectrum, which in my opinion is vastly better than the Synopsys tool. Synplify would still be my synthesiser of choice, but that isn't likely to be free any time soon!

Altera-specific versions of FPGA Express and Leonardo Spectrum are offered FREE on the Altera web site. You do not need a subscription to get them. However, if you do get a subscription, you also get ModelTech's ModelSim program.

The place and route (Xilinx) and fitter (Altera) tools both accomplish the same job. At the time of my investigations (1999), the design I was benchmarking would take several hours to place and route for Xilinx, rather than several minutes with the Altera tools. This is mainly due to the difficulties the Xilinx architecture poses for the tools. Note that no effort was made to guide the tools, other than providing timing constraints, as the environment I work in places a high priority on speed of turnaround. I'm told (by Xilinx) that things are much improved with the new tools, but I haven't been able to compare. It's quite possible that I could have done the job in a smaller/cheaper Xilinx part, but our production volumes were extremely small, so the time taken to create/debug the design on the bench was a priority.

Other bits

Xilinx have DLLs, Altera have PLLs. Altera claim PLLs are better because they give you proper 'analogue' control over the timing of your clocks. Xilinx claim DLLs are better because they are not analogue and are therefore easier to deal with. Xilinx have an interesting appnote comparing the two, but they have subtracted the jitter of their source clock from the Xilinx numbers and not from the Altera measurements. They didn't measure the jitter of the Altera input, so it's difficult to judge whether the PLLs are the cause of the jitter they measure or not. In the interests of fairness, you can look at Altera's jitter comparison - however, it seems to have a lot less experimental detail to it.
I feel I could reproduce the Xilinx experiment to verify the results if I wanted to!

One significant difference between the PLLs and the DLLs that you missed is the ability of the PLLs to create non-integer multiples of the input clock. In fact, the Altera PLL can multiply the input clock by m/(n*k), where m is any value from 1 to 160 and (n*k) is also any value from 1 to 160. Check out App Note 115 for details on the PLLs.

Summary

Xilinx:
- Potentially smaller and cheaper devices
- Good at arithmetic functions
- Flexible I/Os
- Longer compile times
- More complex tools
- More capable tools for the power user
- Both small and large blocks of embedded RAM
- Proper dual-port RAM

Altera:
- Quick compile
- Simple tools
- Less flexible tools for the power user
- Flex and Apex make it tricky to make fast bidirectional I/O
- Less capable arithmetic
- No small blocks of embedded RAM
- RAM has one read and one write port, not proper dual-ported

The conclusion about compile times does not hold for all designs. The compile time for dense arithmetic designs in Altera can literally take days, where a similar design in Xilinx can finish in under an hour with decent floorplanning. Floorplanning in Altera is not well supported and frankly won't provide as much as it does with Xilinx.

Because of Altera's row/column architecture, Altera has been able to design in redundant rows and columns. If a fab defect is found, a redundant row can be switched in and the die is saved rather than thrown away. Since the biggest cost-driver is die size and yield, I would have to dispute the "potentially... cheaper" devices claim. As far as smaller goes, I would have to agree that Xilinx has a wider product offering at the small end of the FPGA size spectrum.
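On the PLL m/(n*k) point above: because both m and the combined divider can range over 1..160, the PLL can land very close to arbitrary non-integer ratios. A small search sketch (plain Python; the function and the example target frequency are my own illustration, ranges as stated above):

```python
from fractions import Fraction

def best_pll_ratio(target: Fraction, max_m: int = 160, max_nk: int = 160):
    """Find m/(n*k) closest to target with m and n*k each in 1..160,
    per the APEX PLL description above. Returns (ratio, m, n*k)."""
    best = None
    for nk in range(1, max_nk + 1):
        # round m to the nearest legal integer for this divider
        m = min(max_m, max(1, round(target * nk)))
        ratio = Fraction(m, nk)
        if best is None or abs(ratio - target) < abs(best[0] - target):
            best = (ratio, m, nk)
    return best

# e.g. synthesize 77.76 MHz from a 33 MHz input: target ratio 77.76/33
ratio, m, nk = best_pll_ratio(Fraction(7776, 3300))
print(m, nk, float(ratio))
```

A DLL, by contrast, is limited to the fixed multiply/divide factors built into its delay-line architecture, which is the crux of the flexibility argument here.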
(The reality of whether one vendor's parts are cheaper than the other is independent of whether the device includes redundancy logic. The efficiency of the architecture (gates per some metric of silicon usage, such as area or transistors), the implementation geometry, test costs, volume, package type, and many other factors all affect the manufacturing cost. The user pays a "price", not a "cost", and this price depends on the cost, as well as the supplier's profit margin, and how good you are at negotiating lower prices :-) . While redundancy may help reduce the cost, what matters in the end to the end user is the price they pay for a device that meets their needs.)

Quite right, Philip. And there's more than the piece price to think about. If the tools/architecture/whatever allow you to get to market quicker, or your volumes are so low that the development costs outweigh the FPGA price (as they do in my particular application), different things become more important. Regarding "potentially... cheaper", maybe it would be better to say "in some applications, potentially cheaper". And therefore the same should apply to Altera!

I'm also going to have to raise an issue with "More capable tools for the power user". Just because Altera's tools have a nicer GUI doesn't mean that the tools are not for the power user. Quartus II has a built-in Tcl console for creating scripts that can do everything that you can do in the Xilinx tools.

Well... no! Show me where in their tools you can look at and edit individual wires in the device. You can do that in Xilinx's FPGA Editor. How about specifying placement in your source (the EDIF netlist)? It sure would be nice to be able to constrain the two-level arithmetic logic and the registers driving it to lie in the same row. Cliques gave the tools a *HINT* that you wanted to keep stuff together in the MAX+PLUS tools, but only if there was a small number of them. Last I checked, Quartus still could not use cliques.
If you don't like to use the menus, ask your local Altera FAE and he can provide you with a library of Tcl functions (ask for the PowerKit) that will allow you to create constraints like "Real Men" do, rather than use the GUI.

This is probably my fault - I was referring to MAX+PLUS II, which I have consistently failed to get to do what I want with placing certain logic cells - due to the Quartus fitter ignoring all my assignments - and the older fitter not being able to get close to my timing requirements. Approaches to our local FAE, Altera direct, and the c.a.f newsgroup all hit a brick wall. My cursory inspection of Quartus a while ago did lead me to the idea that it was much more capable in this area, but as I've not gone beyond 10K, I have no 'real world' comments to make. I do use emacs to enter my constraints in the .acf file, though :-)

I can only encourage you to check out the literature and talk to the FAEs from both Altera and Xilinx to get a more balanced view of the strengths and weaknesses of the two architectures.

I have read the literature and spoken to FAEs from both companies. I think much of our misunderstanding probably stems from the fact that I initially wrote this piece based on 10K compared with 4000, with comments thrown in about other architectures just to confuse the issue! Sorry about that!

Amazing as it may seem, other people have asked to contribute to this page, and editing each person's input (I am expecting more) is getting to be a bit much, so for your enjoyment, here are comments from others on the topic. Good luck with selecting a vendor of FPGAs :-)

Anonymous Designer:

1. TOOLS

Altera supports AHDL, which is more powerful than ABEL, but much easier to learn than VHDL/Verilog. The MAX+PLUS II tool allows you to target anything between a 7032 and a 10K200, almost seamlessly. When we were just getting started, this was a big advantage.

2.
SUPPORT Altera data sheets have to be read most carefully to check that the device has the features you want, in the package you want. E.g. only some 10K series have PLLs. When Xilinx says a family has DLLs, then the whole family has them. The summary data sheet for Altera's APEX 20KE series states that LVDS is supported, but does not mention that it is only really supported in the 20K400E and larger devices. There is no mention on the summary front page; you have to look really hard in the datasheet to find this. 3. Altera appears to be targeting the router and other network hardware market at the moment. Xilinx seems to be going towards DSP. 4. Some people are of the opinion that Xilinx appears to be far more innovative and open: there is code to make your FPGA into a DAC with one resistor and one capacitor! The Altera app notes amount to "Yes, we did it" but do not give sufficient detail for ME to do it. Xilinx app notes are far more helpful, and they will respond to postings in comp.arch.fpga. Altera NEVER do. An update from Anonymous Altera Fan: The product-term mode of the Altera memory blocks is something other than just using the RAMs as a big LUT. A single memory block can be configured to provide 16 product-term outputs based on 32 inputs. Although this can be duplicated using a generic RAM block as a big LUT, it would take an extremely large memory block (32 address lines = 2^32 locations of 16-bit memory cells) to do it in the brute-force manner. Note that this is only a feature in Apex, Apex-E, Mercury, and Apex II devices.
Article: 37631
Hi! > I'm curious to know if anyone out there knows where there are some examples > of an SPI interface coded in VHDL. Which type of interface? There is technical information available for the commercial Xilinx and Altera SPI cores, but of course they don't go into detail. > Just curious as I have to code one in the > near future and I always like to compare the various approaches taken by > others. Well, I have to code an SPI-4 (according to the standard, phase 2) interface. Maybe we can exchange some information. regards, Patrick
Article: 37632
hi: I am an FPGA beginner, and now I have a small design. Can you advise me how to implement it? There are 8 data words in a FIFO (16x255). They must be distinguished and divided when they are read out from the FIFO according to the clock, so that I can operate on any one of them later. For example: the first data word is Data0, the second is Data1, ..., and the eighth is Data7. At the beginning I wanted to implement it with a shift register, state machine, or counter, but I can't finish it alone because of my poor digital circuit skills. It would be better if you could write out Verilog source code for me. thanks!
Article: 37633
Hi, Does anyone know what the tradeoff is, or how we should decide whether to let Core Generator create an RPM or not? Also, what are the advantages/disadvantages of using pipelining when creating a DDS in CoreGen? thanks enny
Article: 37634
Now I want to implement it by counter control. Is it OK?

/* counter[2:0] runs while read enable is asserted; data are steered by the counter */
always @(posedge NA_Clock or negedge Rst)
begin
  if (!Rst)                    // active-low async reset, to match "negedge Rst"
    NA_Count <= 0;
  else if (NA_Read_Enable)
    NA_Count <= NA_Count + 1;
  else
    NA_Count <= 0;
end

/* data read out from the FIFO are distributed to NA_Des_Data0..7 individually
   NA_Data_Out[15:0]: FIFO data out
   NA_Des_Data0..NA_Des_Data7, each [15:0]: destination registers */
always @(posedge NA_Clock or negedge Rst)
begin
  if (!Rst)
    begin
      NA_Des_Data0 <= 16'b0;
      NA_Des_Data1 <= 16'b0;
      NA_Des_Data2 <= 16'b0;
      NA_Des_Data3 <= 16'b0;
      NA_Des_Data4 <= 16'b0;
      NA_Des_Data5 <= 16'b0;
      NA_Des_Data6 <= 16'b0;
      NA_Des_Data7 <= 16'b0;
    end
  else
    case (NA_Count)
      3'b000: NA_Des_Data0 <= NA_Data_Out;
      3'b001: NA_Des_Data1 <= NA_Data_Out;
      3'b010: NA_Des_Data2 <= NA_Data_Out;
      3'b011: NA_Des_Data3 <= NA_Data_Out;
      3'b100: NA_Des_Data4 <= NA_Data_Out;
      3'b101: NA_Des_Data5 <= NA_Data_Out;
      3'b110: NA_Des_Data6 <= NA_Data_Out;
      3'b111: NA_Des_Data7 <= NA_Data_Out;
      default:
        begin
          NA_Des_Data0 <= 16'b0;
          NA_Des_Data1 <= 16'b0;
          NA_Des_Data2 <= 16'b0;
          NA_Des_Data3 <= 16'b0;
          NA_Des_Data4 <= 16'b0;
          NA_Des_Data5 <= 16'b0;
          NA_Des_Data6 <= 16'b0;
          NA_Des_Data7 <= 16'b0;
        end
    endcase
end

is it OK? Thanks
Article: 37635
Rick, I've been using ChipScope for the past year. I find it very useful for system debug (it replaces the need for probing internal nodes). A nice feature I use a lot is the "Trigger in"/"Trigger out" pair that lets you sync to/from external equipment (LA / scope) with the ChipScope trigger while modifying the trigger settings through the JTAG port. The GUI is quite primitive compared to a commercial LA. Note that you need to have unused block RAMs in your chip in order to be able to use ChipScope. The amount of RAM required depends on the size of the capture buffer you need. Rotem Gazit Design Engineer High-speed board & FPGA design MystiCom LTD mailto:rotemg@mysticom.com http://www.mysticom.com/ Rick Filipkiewicz <rick@algor.co.uk> wrote in message news:<3C1E2EAB.2CA43896@algor.co.uk>... > As usual I'm in the position of trying to shut the stable door when the > horse is already 2 counties away and accelerating fast but ... > > Has anyone on c.a.f used the ChipScope ILA stuff ? > > Does it work as advertised ? > > Had successes/failures ? > > Does it take up a lot of space per embedded analyser ? > > In short where does it lie in the spectrum > > [essential ... helpful ... difficult to use ... waste of time & gates] ?
Article: 37636
The Altera NIOS soft-core processor comes with a flexible, parameterizable SPI interface module in VHDL or Verilog. The complete NIOS license with all tools, board, and of course SPI is US-$995. Check out: http://www.altera.com/literature/ds/ds_niosspi.pdf - Wolfgang "Jason Berringer" <jberringer@trace-logic.com> wrote in message news:S5bT7.2519$NC5.476993@news20.bellglobal.com... > Hello again > > I'm curious to know if anyone out there knows where there are some examples > of an SPI interface coded in VHDL. Just curious as I have to code one in the > near future and I always like to compare the various approaches taken by > others. > > Thanks > > Jason
Article: 37638
Muzaffer Kal wrote: > > On Mon, 17 Dec 2001 10:31:33 -0500, "Pallek, Andrew [CAR:CN34:EXCH]" > <apallek@americasm01.nt.com> wrote: > > >If you just want to divide by 64, shift right by 6 places. The modulo is what was shifted > >out. > > what if the dividend is negative ? The shift_right() function in ieee.numeric_bit operates on signed numbers by maintaining the sign bit. For an unsigned number, zeros are shifted in.
Article: 37639
Hi all. We're using Xilinx Virtex FPGAs and Xilinx Foundation 3.1i tools. We're currently trying to generate an FPGA configuration where certain parts of the FPGA remain completely unused. While it is possible to place all the logic in certain areas of the chip using placement constraints, it seems more difficult to influence the routing. Is it possible to (completely) prohibit the use of routing resources in a specific area of the FPGA? Regards, Christian
Article: 37640
Hi all, We've got a problem with FPGA express (FPGAexpress 3.6.6613 (attached bij Xilinx ISE 4.1)) and bidir pins with a Xilinx device: I've made two blocks and each block has control signals and one bidirectional pin (tri-state buffered). On the upper layer, this two signals are routed to the same output pin. (See attachments) The problem is a warning from FPGA express: "FPGA-pmap-18 (1 Occurrence) Warning: The port type of port '/TryOutBiDir-1/BiDirPin' is unknown. An output pad will be inserted" and FPGA express insert a Outputbuffer instead of a bidir buffer. Internal the signal is bidirectional, to the outside it's unidirectional. I want a bidirectional output pin !! Can somebody help me?? Thanks, Wilco wilco@cardiocontrol.com begin 666 upper.vhd M#0IL:6)R87)Y($E%144[#0IU<V4@245%12YS=&1?;&]G:6-?,3$V-"YA;&P[ M#0H-"F5N=&ET>2!4<GE/=71":41I<B!I<PT*("!P;W)T#0H@("@-"B @("!7 M<FET93)296%D960Q.B!I;B!35$1?3$]'24,[#0H@(" @5W)I=&4R4F5A9&5D M,CH@:6X@4U1$7TQ/1TE#.PT*(" @($AI9VA:,3H@:6X@4U1$7TQ/1TE#.PT* M(" @($AI9VA:,CH@:6X@4U1$7TQ/1TE#.PT*(" @(%)E861E9#$Z(&]U="!3 M5$1?3$]'24,[#0H@(" @4F5A9&5D,CH@;W5T(%-41%],3T=)0SL-"B @("!" 
M:41I<E!I;CH@:6YO=70@4U1$7TQ/1TE##0H@("D[#0IE;F0@5')Y3W5T0FE$ M:7([#0H-"F%R8VAI=&5C='5R92!4<GE/=71":41I<E]A<F-H(&]F(%1R>4]U M=$)I1&ER(&ES#0H@( T*("!C;VUP;VYE;G0@9')I=F5R#0H@(" @<&]R= T* M(" @("@-"B @(" @(%=R:71E,E)E861E9#H@:6X@(" @4U1$7TQ/1TE#.PT* M(" @(" @:&EG:%HZ(" @(" @("!I;B @("!35$1?3$]'24,[#0H@(" @("!2 M96%D960Z(" @(" @(&]U=" @(%-41%],3T=)0SL-"B @(" @($11.B @(" @ M(" @(" @:6YO=70@4U1$7TQ/1TE##0H@(" @*3L-"B @96YD(&-O;7!O;F5N M=#L-"B @#0H@(&)E9VEN#0H-"B @1')I=F5R7S$@(#H@1')I=F5R("!P;W)T M(&UA<" H5W)I=&4R4F5A9&5D,2Q(:6=H6C$L4F5A9&5D,2Q":41I<E!I;BD[ M#0H@($1R:79E<E\R(" Z($1R:79E<B @<&]R="!M87 @*%=R:71E,E)E861E M9#(L2&EG:%HR+%)E861E9#(L0FE$:7)0:6XI.PT*(" -"B @#0IE;F0@5')Y 03W5T0FE$:7)?87)C:#L-"@`` ` end begin 666 driver.vhd M;&EB<F%R>2!)145%.PT*=7-E($E%144N<W1D7VQO9VEC7S$Q-C0N86QL.PT* M#0IE;G1I='D@9')I=F5R(&ES#0H@('!O<G0-"B @* T*(" @(%=R:71E,E)E M861E9#H@:6X@(" @4U1$7TQ/1TE#.PT*(" @(&AI9VA:.B @(" @(" @:6X@ M(" @4U1$7TQ/1TE#.PT*(" @(%)E861E9#H@(" @(" @;W5T(" @4U1$7TQ/ M1TE#.PT*(" @($11.B @(" @(" @(" @:6YO=70@4U1$7TQ/1TE##0H@("D[ M#0H@(&5N9"!D<FEV97([#0H-"F%R8VAI=&5C='5R92!D<FEV97)?87)C:"!O M9B!D<FEV97(@:7,-"@T*("!B96=I;@T*#0H@(%)E861E9" \/2!$42!W:&5N M(%=R:71E,E)E861E9" ]("<Q)R!E;'-E("<P)SL-"B @1%$@/#T@)UHG('=H M96X@2&EG:%H@/2 G,2<@96QS92 G,2<[#0H@( T*96YD(&1R:79E<E]A<F-H #.PT* ` endArticle: 37641
When starting IDS 7.5 in Windows XP Home Edition, the following error description appears: vw25.exe has detected an error and has to be closed. (The message is in German because of the German installation.) AppName: vw25.exe AppVer: 0.0.0.0 ModName: ntdll.dll ModVer: 5.1.26.00.0 Offset: 0000222c Does anybody know a solution for this problem? THX a lot Elmar
Article: 37642
Hey there Rob, try

macro proc RandomGen(Random1, Random2)
{
    /* initialise random numbers */
    Random1 = 0x1234;
    Random2 = 0xabcd;

    /* random process */
    while (1)
    {
        par
        {
            Random1 = (Random1 <- 22) @ (Random1[19] ^ Random1[13] ^ ~Random1[22]);
            Random2 = (Random2 <- 22) @ (Random2[19] ^ Random2[13] ^ ~Random2[22]);
        }
    }
}

Noel robquigley@hotmail.com (rob) wrote in message news:<c48eed90.0112140543.7aa78fbc@posting.google.com>... > Hi folks, > > I was wondering if anyone knew if there is a random number generator > facility in Handel-C. I'm using version 3.0. > > Any help would be much appreciated, > > Cheers and THX, > > Rob.
Article: 37643
Hello All, My company is currently comparing 66MHz PCI core solutions from Xilinx and Altera, as well as debating using a home-spun core. One issue I've come upon is the PCI requirement for a MAX clock-to-out time of 6 ns and a MIN clock-to-out time of 2 ns. Both the Xilinx ISE and Altera Quartus II tools seem very helpful in supplying MAX (worst-case) Tco times, but I don't see any info on best-case times. Apparently the SDF files for back-annotated timing sim have the same worst-case numbers repeated 3 times, resulting in the same simulation regardless of case selection. My question is: how is anyone (FPGA vendors included) guaranteeing a MIN Tco of 2 ns across all conditions and parts if the design tools don't even yield that information? Thank You, Stephen Byrne
Article: 37644
"Jay Berg" <admin@eCompute.org> wrote in message news:3c1cfff8$0$34821$9a6e19ea@news.newshosting.com... > After making the mistake of getting involved in the current ECCp109 > distributed computing project (see URL below), I'm now casting around to > determine if there's a possibility of finding a PCI board with an FPGA > co-processor capable of handling a small set of modular math functions. > > http://www.nd.edu/~cmonico/eccp109/ > *************** I want to thank everyone (especially Larry) for helping refine an initial design for the problem I posed a couple of days ago regarding modulo math. I have a synopsis of the discussions below. - - - - - One of the biggest design points that fell out of the discussions is that I was looking at the problem with too fine a granularity. Rather than looking at simply providing a modulo multiply, it was strongly suggested that I look at replacing larger sections of logic with FPGA logic. By re-examining the client, it becomes obvious that it is possible to extract substantial math from the client within the very most inner loop. There are three paths in the inner most loop. These are examined below. Each of the first two paths require 5 inputs with two of the inputs being constants with the third path requireing 4 inputs with one being a constant. Thus each path requires in actuality a total of 3 (variable) inputs and each path producing a single result. I've been told that it would be "quite easy" to reduce all three paths to FPGA logic. Where the CPU provides the initial inputs and then selects the logic path to execute. I am now trying to decide whether I can learn enough about FPGAs to do the work myself, or whether I can find someone willing to donate the time in return for a couple of FPGA PCI boards. 
- - - - -

Path 1 - Total of input parameters needed: 5
  PY (constant value)
  PX (constant value)
  y[i]
  x[i]
  needInvert[i<<2]

  submod_p109 (lambda, PY, y[i]);
  mulmod_p109 (lambda, lambda, &needInvert[i << 2]);
  addmod_p109 (temp_ul, x[i], PX);
  mulmod_p109 (temp2_ul, lambda, lambda);
  submod_p109 (tempx, temp2_ul, temp_ul);
  submod_p109 (temp_ul, x[i], tempx);
  mulmod_p109 (temp_ul, lambda, temp_ul);
  submod_p109 (res_list[i].y, temp_ul, y[i]);

Path 2 - Total of input parameters needed: 5
  QY (constant value)
  QX (constant value)
  y[i]
  x[i]
  needInvert[i<<2]

  submod_p109 (lambda, QY, y[i]);
  mulmod_p109 (lambda, lambda, &needInvert[i << 2]);
  addmod_p109 (temp_ul, x[i], QX);
  mulmod_p109 (temp2_ul, lambda, lambda);
  submod_p109 (tempx, temp2_ul, temp_ul);
  submod_p109 (temp_ul, x[i], tempx);
  mulmod_p109 (temp_ul, lambda, temp_ul);
  submod_p109 (res_list[i].y, temp_ul, y[i]);

Path 3 - Total of input parameters needed: 4
  A (constant value)
  y[i]
  x[i]
  needInvert[i<<2]

  mulmod_p109 (temp_ul, x[i], x[i]);
  addmod_p109 (temp2_ul, temp_ul, temp_ul);
  addmod_p109 (temp2_ul, temp2_ul, temp_ul);
  addmod_p109 (lambda, temp2_ul, A);
  mulmod_p109 (lambda, lambda, &needInvert[i << 2]);
  mulmod_p109 (temp_ul, lambda, lambda);
  submod_p109 (temp_ul, temp_ul, x[i]);
  submod_p109 (tempx, temp_ul, x[i]);
  submod_p109 (temp_ul, x[i], tempx);
  mulmod_p109 (temp_ul, lambda, temp_ul);
  submod_p109 (res_list[i].y, temp_ul, y[i]);

- - - - -

A few side notes:

- - - - -

1. The following values are constants.
   a. PX = 000004CC974EBBCBFDC3636FEB9F11C7
   b. PY = 000007611B0EB1229C0BFC5F35521692
   c. QX = 00000233857E4E8B5F0055126E7D7B7C
   d. QY = 000019C8C91063EB4276371D68B6B4D9
   e. A = 00000FD4C926FD178E9805E663021744
   f. P = 00001BD579792B380B5B521E6D9FB599
   Note that P is the modulo value that all functions use in reducing results.
2. All math is 109-bit.
3. All routines reduce the result modulo P prior to storing the result.
4. All functions are in the form of (result, op1, op2).
Where the result of the operation is stored to 'result'. 5. Each of the three paths results in a single value. 6. The math functions each require between 25 (add and subtract) and 325 (multiply) CPU instructions. Using that estimate of function lengths, the three paths are approximately 1,000 CPU instructions each. 7. The SW seems to lend itself well to parallelism. Currently it appears that the SW is set up to provide calculations in groups of 128 at a time. I've been told that this would aid in pipelining within the FPGA. I am still waiting for confirmation from the SW author as to the exact behavior of the SW.
Article: 37645
You could simulate it, and find out for yourself if it is OK. OK? "chensw20hotmail.com" wrote: > > Now, i want to implement it by counter controlling. is it OK? > > [Verilog code from the previous article snipped] > > is it OK? > Thanks
Article: 37646
Patrick Loschmidt <Patrick.Loschmidt@gmx.net> wrote in message news:<3C1F002D.4060502@gmx.net>... > Hi! > > > I'm curious to know if anyone out there knows where there are some examples > > of an SPI interface coded in VHDL. > > > Which type of interface? There is technical information available for > the commercial Xilinx and Altera SPI cores, but of course they don't go > into detail. Modelware (www.modelware.com) also makes an SPI-4 core for Xilinx. Xilinx re-sells their core. You get much better support if you buy it directly from Modelware.
Article: 37647
"Wilco Vahrmeijer" <wilco@cardiocontrol.com> schrieb im Newsbeitrag news:9vnl4h$1j1i$1@news.versatel.net... > Hi all, > > We've got a problem with FPGA express (FPGAexpress 3.6.6613 (attached bij > Xilinx ISE 4.1)) and bidir pins with a Xilinx device: > > I've made two blocks and each block has control signals and one > bidirectional pin (tri-state buffered). On the upper layer, this two signals > are routed to the same output pin. (See attachments) > > The problem is a warning from FPGA express: > "FPGA-pmap-18 (1 Occurrence) Warning: The port type of port > '/TryOutBiDir-1/BiDirPin' is unknown. An output pad will be inserted" > > and FPGA express insert a Outputbuffer instead of a bidir buffer. Internal > the signal is bidirectional, to the outside it's unidirectional. > > I want a bidirectional output pin !! Can somebody help me?? To have a bidirectional bus inside AND outside the FPGA you have to isolate them. entity tristate is port ( BiDirPin: inout STD_LOGIC ); end TryOutBiDir; architecture TryOutBiDir_arch of TryOutBiDir is component driver port ( Write2Readed: in STD_LOGIC; highZ: in STD_LOGIC; Readed: out STD_LOGIC; DQ: inout STD_LOGIC ); end component; begin Driver_1 : Driver port map (Write2Readed1,HighZ1,Readed1,BiDirPin_int); Driver_2 : Driver port map (Write2Readed2,HighZ2,Readed2,BiDirPin_int); BiDirPin<=BidirPin_int when con='1' else 'Z'; end TryOutBiDir_arch; This code ist not complete, the signal declarations are missing. You also need to generate the con signal, which controls the Tristate driver of the IO Pin. -- MfG FalkArticle: 37648
"Christian Plessl" <plessl@remove.tik.ee.ethz.ch> schrieb im Newsbeitrag news:3c1f4e5f@pfaff.ethz.ch... > Is possible to (completely) prohibit the use of routing ressources on a > specific area of the FPGA? Why do you want to do so? -- MfG FalkArticle: 37649
Hi, I had 2 projects, entities A and B. They both fit into separate XC9572s. Then I wanted to know how much resource headroom I would have if I combined them into one XC95144XL, so I created another project, and in its main architecture I used entities A and B, connected them with a few signals, and copied the sources into this new project. I specified the XC95144XL chip and looked at the fitter report. So far so good. But my prototype board is built using XC9572s, so I changed the target chip to XC9572, selected entity A in the project window, and created a bit file. I got no visible warnings that I shouldn't do this. And when I programmed the XC9572 with the file that was supposed to be made for the XC9572, the chip let the magic smoke out. Tried twice on both CPLDs, got 4 fried chips. When I created two separate projects again with entities A and B in them, the new chips programmed fine. Any ideas?