Messages from 61350

Article: 61350
Subject: Re: Parameterized Multiplier in Xilinx FPGA
From: yog_aga@yahoo.co.in (ykagarwal)
Date: 2 Oct 2003 02:46:58 -0700

If your synthesis tool (e.g. Amplify) supports it, you can precisely
control the pipelining levels and style of the multiplier inferred by
the * operator.

regards
--yka

Article: 61351
Subject: Evaluation time of Emac Core?
From: "Don" <nono@spam.dk>
Date: Thu, 2 Oct 2003 12:15:09 +0200

Hi, I am using EDK 3.2 to build a MicroBlaze design with an Ethernet MAC
attached. When I synthesize my design, it gives me a warning: "Emac Licensing
in effect".
My question is, how long (in hours) can I evaluate the core before the FPGA
needs to be reloaded?

Best Regards
Don

Article: 61352
Subject: Re: Wirelessly Connecting two FPGA development boards (Celoxica RC100 boards)
From: "Patrick MacGregor" <patrickmacgregor@comcast.net>
Date: Thu, 2 Oct 2003 07:41:55 -0400

Looks like you have an equivalent serial rate of 320 Mbps, plus some sort of
framing overhead so that you can find bit/byte boundaries.  You can certainly
do this with an electrical interface over 10m and save a lot of money over
optical.
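
The 320 figure is just bus width times word rate, using the numbers from the quoted post (32 bits every 100 ns). A minimal sketch of that arithmetic follows; the 8b/10b-style coding ratio is purely an assumption chosen to illustrate framing overhead, not something stated in the thread.

```python
# Back-of-the-envelope rate for serializing the ribbon-cable data.
BUS_WIDTH_BITS = 32      # from the original post: 32-bit bus
WORD_RATE_HZ = 10e6      # from the original post: data changes every 100 ns


def serial_rate(bus_width_bits, word_rate_hz, payload_bits=8, coded_bits=10):
    """Return (payload, line) rates in bit/s.

    payload_bits/coded_bits model a hypothetical line code (assumed 8b/10b).
    """
    payload = bus_width_bits * word_rate_hz
    line = payload * coded_bits / payload_bits
    return payload, line


payload_bps, line_bps = serial_rate(BUS_WIDTH_BITS, WORD_RATE_HZ)
print(f"payload: {payload_bps / 1e6:.0f} Mbit/s, line: {line_bps / 1e6:.0f} Mbit/s")
```

With a different framing scheme only the ratio changes; the 320 Mbit/s payload is fixed by the camera data.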

What time frame are you looking at to implement the solution?  Reason I ask
is that a small company called Core Foundry is finalizing a new FPGA
development system called PROTEUS.  Their intent is to create a modular FPGA
development platform that can be customized to an application simply by
changing removable plug-in modules.  The first incarnation is a small
aluminum box, about 7"w x 9"d x 1.5"h that has room for 4 I/O modules out
the front.  I've seen modules for dual T1/E1 (so up to 8 ports can fit in a
box), a T3/E3/STS-1 single port electrical interface, and a multirate SFP
module with CDR that can handle 30Mbps through 3.2 Gbps.  This might be the
module you care about as it will accept any SFP plug-in, including the
copper GbE SFP modules from Molex and others.

The I/O modules plug into a main board that has a simple serial interface on
it to the outside world.  Behind the I/O modules are connectors to accept
FPGA boards.  A single-wide board connects one I/O module to one FPGA.  A
double-wide board connects one FPGA to two I/O modules, and a quad-wide
board connects one FPGA to 4 I/O modules.  Behind that are connectors for a
more powerful microcontroller board that is optional.  This will have the
10/100, USB and more serial ports on it.  The first uC board available will
be based on soft FPGA cores as opposed to hard uC chips.

The initial FPGA boards will be Cyclones by Altera, as they are readily
available.  All parts sizes will be available.  Spartan 3 boards will most
likely follow next year whenever they are production qualified and readily
available.  Beyond that they've talked about plans for Stratix and Virtex
FPGA boards, and perhaps some non-FPGA boards with DSP chips, or maybe
combinations of DSP and FPGA.

For your situation, a single SFP I/O module and a single Cyclone 1C3 FPGA
module should suffice.  Probably wouldn't need the uC board at all.  The
FPGA board has ribbon cable connectors on top so that you could cable into
it (with the box lid removed).  Then serialize and format the data and shoot
it out the SFP optical port (or maybe an SFP GbE port, but you'd have to do
more work on the data up-front to make the transceivers happy).  Mapping
your data into an OC-12 payload would be trivial, on the other hand.

The PROTEUS system is in hardware testing now, and I don't know when it will
become available, or at what price points.  They don't have any info posted
on their website yet either, although they say that info will be posted in
October.  I'm looking to be an early customer myself.

I'm not a wireless expert by any stretch, but Infineon makes a nice looking
BT module, ROK104001, that looks like it would make BT simple.  I think in
small qty they are $20 or so.  I heard about it from a local Insight sales
rep.


"Patrick Twomey" <patrickt@rennes.ucc.ie> wrote in message
news:1d183274.0309300805.472f07a1@posting.google.com...
> Thanks for replying to my post. To answer your first question I want to
> mate an optical or wireless communication interface to the Celoxica
> RC100 boards.
> Set up so far is as follows:
>
> Camera -> Celoxica Board -> Ribbon Cable -> Celoxica Board -> Monitor
>
> The Ribbon cable is connected to the Celoxica boards using the expansion
> header on the celoxica boards. This expansion header allows digital
> communication in and out of the FPGA on the Celoxica board. The data on
> the ribbon cable has a bus width of 32 (i.e. is 32 bits wide) and the
> data changes every 100 ns (10 MHz). All I want to do is remove the
> ribbon cable and replace it with an optical or wireless communication
> system (preferably a wireless system). So system would be:
>
> Camera -> Celoxica Board -> Wireless transmitter -> Receiver -> Celoxica
> board -> Monitor
>
> One board and the transmitter would be at one end of a room, the other
> board and receiver at other end of room. The transmission range is to be
> small e.g. max of 5-8 meters. The data rate is fairly high so not sure
> if a wireless system would be up to the task. Hope this has clarified my
> situation.
>
>
> "Patrick MacGregor" <patrickmacgregor@comcast.net> wrote in message
> news:<Vc6dncJWbLocWOWiXTWJhg@comcast.com>...
> > Can you explain a bit more?  Are you planning on looking to replace the
> > Celoxica boards with something else, or do you want to mate the Celoxica
> > boards to some optical or wireless transmission system?  If so, how would
> > you want to transfer data to/from the optical or wireless interface boards?
> >
> > "Patrick Twomey" <patrickt@rennes.ucc.ie> wrote in message
> > news:1d183274.0309290336.5aa14a7c@posting.google.com...
> > > I am trying to connect two FPGA development boards together. The boards
> > > in question are two Celoxica RC100 development boards. Video in is from
> > > an analog camera. The video data is converted to digital and stored on
> > > SRAM. There is an expansion header for inter-connectivity. On the other
> > > board video out to a monitor occurs after reading data from the SRAM on
> > > this board. Have connected to two boards via a ribbon cable connected to
> > > the expansion headers. Want to replace this cable with wireless or
> > > optical transmission. Is there any development boards available for
> > > this. The pixel clock is 10 MHz and there are at least 16 bits per pixel
> > > (32 after error correction encoding). Access to an 80 MHz on board clock
> > > is available. Any help would be much appreciated.

Article: 61353
Subject: Re: Automatic I/O voltage sensing (as XILINX ParallelCable IV)
From: "Amontec Team, Laurent Gauch" <laurent.gauch@amontecDELETEALLCAPS.com>
Date: Thu, 02 Oct 2003 13:55:14 +0200

Stephan Buchholz wrote:
> Laurent
> 
>     The Philips Semiconductor GTL2010 10-bit bi-directional low voltage
> translator might help
> Steve Buchholz
> "Amontec Team, Laurent Gauch" <laurent.gauch@amontecDELETEALLCAPS.com> wrote
> in message news:3f7ac650$1@news.vsnet.ch...
> 
>>Hi all,
>>
>>Sorry to ask an analog question, but it's related to FPGAs too.
>>
>>Do you know a schematic to do 'Automatic I/O voltage sensing' as XILINX
>>does with the ParallelCable IV.
>>
>>I am designing a new JTAG interface (USB), and I want to be able to
>>drive correctly the target JTAG signals (3.3V, 2.5V, 1.8V, 1.2V).
>>
>>Is there an LVTTL level-shifter device that can do this job?
>>
>>Thanks for your advice.
>>
>>Laurent Gauch
>>www.amontec.com
>>
> 
> 
> 

Thanks,

That confirms what I was thinking of using.

For your info, the GTL2010 corresponds to the TVC family from Texas Instruments.
I will try this!

Laurent
-> www.amontec.com

Article: 61354
Subject: Re: Good VHDL/Verilog editor?
From: "Valentin Tihomirov" <valentin@abelectron.com>
Date: Thu, 2 Oct 2003 15:04:47 +0300

> I still like Aldec for design entry.  Editor is very much studio editor
> like, plus you can run sims right there as well as integrate in the rest
> of your tool flow.  For the price, I think it is a great value.

Heh, I appreciate Aldec because it is not studio-like, as opposed to Xilinx's
WebPack. It is also more feature-rich, with better integration of different
tools (like jumping to the erroneous code line and more).

Article: 61355
Subject: CUPL documentation?
From: ge <e_c_l_e_s@a-znet.com>
Date: Thu, 02 Oct 2003 08:31:38 -0400

I am looking at using CUPL to implement some simple-ish functions in
the Atmel ATF150x series parts.  I have WinCUPL, and its manual, but I
feel like there's something missing in the documentation:  

I've found some use of a "property" directive in CUPL, which seems to
be used to switch device/vendor-specific functions - things like (for
Atmel) pin-keeper, JTAG on/off, etc. 

So, two questions:

First, I don't find a "PROPERTY" directive in the WinCUPL manual.  Am
I missing it, or is there some more complete CUPL language reference
available?

Second, it seems like the available "properties" and syntax for the
"property" directive would need to be identified for each device.  The
functions are described in the spec sheets, but I find no information
on how to activate them with CUPL.  It seems like I'm missing a layer
of documentation.  Any pointers would be appreciated.

TIA,
George


-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
-----==  Over 100,000 Newsgroups - 19 Different Servers! =-----

Article: 61356
Subject: Re: CUPL documentation?
From: ge <e_c_l_e_s@a-znet.com>
Date: Thu, 02 Oct 2003 09:05:49 -0400

Never mind.  Once again, as soon as I ask the question, I find the
answer.  It's in the fitter help, and Atmel's FAQ.





Article: 61357
Subject: Re: Good VHDL/Verilog editor?
From: "jakab tanko" <jtanko@ics-ltd.com>
Date: Thu, 2 Oct 2003 09:07:16 -0400

Have a look at  http://www.ultraedit.com
---
jakab

Article: 61358
Subject: Re: Good VHDL/Verilog editor?
From: Bob Perlman <bobsrefusebin@hotmail.com>
Date: Thu, 02 Oct 2003 14:28:32 GMT

On Thu, 2 Oct 2003 09:07:16 -0400, "jakab tanko" <jtanko@ics-ltd.com>
wrote:

>Have a look at  http://www.ultraedit.com
>---
>jakab
>

I use UltraEdit, too.  I also run emacs Verilog mode from inside
UltraEdit, in batch mode, to automatically generate port lists, etc.
And I store commonly-used Verilog code snippets as UltraEdit
templates.

Verilog (and VHDL) syntax coloring files are available on the
UltraEdit web site.  I also have a Xilinx UCF syntax coloring file; if
anyone needs it, just send e-mail to bp <at> cambriandesign <dot> com.

Bob Perlman
Cambrian Design Works

Article: 61359
Subject: Re: LVDS_25_DCI : Top Ten List
From: Austin Lesea <Austin.Lesea@xilinx.com>
Date: Thu, 02 Oct 2003 07:46:50 -0700

Brian,

Excellent list.

But I have one correction, the capacitance to ground is ~ 8pf, thus the
differential capacitance is 4 pf (two 8 pf in series).  Unfortunately,
to meet ESD, and have the IOB also do the other 35 standards, the
capacitance is not as low as everyone would like.  Simulations at the
die, however, show a very nice waveform, even though it may look
questionable at the pins of the device (due to the t-line effects).
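
The "two 8 pF in series" figure can be checked with the usual series-capacitance formula. This is only a sketch of the arithmetic; the function name is made up for illustration and the 8 pF value is the approximate number quoted above.

```python
def series_capacitance(c1_pf, c2_pf):
    """Equivalent capacitance (pF) of two capacitors in series."""
    return (c1_pf * c2_pf) / (c1_pf + c2_pf)


C_PIN_TO_GROUND_PF = 8.0   # approximate per-pin value quoted above
c_diff_pf = series_capacitance(C_PIN_TO_GROUND_PF, C_PIN_TO_GROUND_PF)
print(c_diff_pf)
```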

Nothing beats an on die 100 ohm termination.

LVDS_25_DCI was never intended to replace a simple 100 ohm external
termination.  That was reserved for the improved input terminator (a
simple 100 ohms) that was added to Virtex 2 Pro.  It was also an
afterthought, that was suggested to us by a customer, when they messed
up, and forgot all the resistors.  It is VERY ugly in the power
department, and we did not realize that the power could be as high as
~85 mW per pair due to the way the DCI circuit operates.  Also, freezing
DCI does mean that you might be trying to measure the 25 ohm termination
voltage with the reference resistors, so the current in them does
increase, too.

If I may suggest, use LVDS_25_DCI only for clock inputs, or a few
signals.  Always use DCI_Freeze to reduce the jitter.  Also look at what
happens when you do not have a 100 ohm termination.  For some signals,
and lengths of pcb, it may not be required.  And we will check out the
IBIS model issue.

As for allowing the power estimator, spreadsheet, answers, etc. to all
catch up with all of the "top ten" list:  that is just tough to do, but
you are right, we should do it (and will).

Spartan 3 addresses a different market than Virtex II, or II Pro, and
was never intended to replace them.  We reserve the right to
differentiate product lines by having different features.  I am sure
everyone would like to have a Spartan 3 that could replace a Virtex II
or II Pro, but that was a) not the market we were after, and b) not
possible with the process/design/technology we chose.

The Spartan folks are busily planning and designing their next chip(s),
and we in the Virtex camp are busy with our next product offering.

Thanks for your comments,

Austin

Brian Davis wrote:

> Top Ten Things I wish I never had needed to learn about LVDS_25_DCI:
>
>  1) Parallel DCI input standards in Virtex2 continuously modulate
>    the input termination offset voltage unless you enable bitgen's
>    FreezeDCI option
>
>  2) With FreezeDCI on, the entire bottom half of 2V40, 2V80, and
>    any CS144 packages are unavailable for LVDS_25_DCI inputs (this
>    includes half the global clock inputs to the chip) due to DCI
>    unavailability in banks having only ALT_VRP/N pins
>
>  3) With FreezeDCI on, dual purpose config pins cannot be used as
>    LVDS_25_DCI inputs
>
>  4) 5.2i S/W doesn't catch illegal pin assignments due to #2 and #3
>
>  5) With FreezeDCI on, input terminator accuracy for 2R values
>    degrades to +/-20%
>
>  6) With FreezeDCI on, each bank will have a (different) random
>    input offset voltage due to split terminator 2R variations
>
>  7) LVDS_25_DCI terminator overhead power per input pair far exceeds
>    the theoretical 62.5 mW number published in Answer Record 15633
>
>  8) With FreezeDCI on, worst case VCCO power overhead per
>    LVDS_25_DCI input pair approaches 100 mW
>
>  9) With FreezeDCI on, worst case DCI VRP/N VCCO power overhead
>    per I/O bank approaches 200 mW
>
> 10) 5.2i Xpower incorrectly assigns DCI power to the 1.5V VCCINT
>    supply, and it doesn't use the worst case DCI power numbers
>
> 11) V2 Power Estimator spreadsheet doesn't support LVDS_25_DCI,
>    but if you fake it by using two single ended DCI 2R split
>    terminated inputs per actual LVDS pair, it also uses the
>    wildly optimistic power numbers
>
> 12) LVDS_25_DCI IBIS models don't work in HyperLynx
>
> 13) Massive 8pf IBIS C_COMP input capacitance value for the
>    V2 LVDS inputs requires external back termination and/or
>    input matching scheme to achieve reasonable signaling when
>    driving FPGA inputs from a modern high speed LVDS driver
>
> Interesting Answer Database Search Keywords:
>
>   FreezeDCI
>   LVDS AND DCI AND termination
>   DCI AND power
>   IBIS AND Hyperlynx  ( in answer archive )
>
> Suggestions to Xilinx:
>
>   -  Have somebody document the plethora of V2 DCI hardware
>     and software problems ('challenges'? 'features'?) in one
>     place ( a detailed application note? ) ASAP.
>
>   -  Hiding the FPGA IOB/CLB/FF/interconnect power consumption
>     numbers within an encrypted spreadsheet and buggy SW makes
>     it impossible to cross-check the resulting power calculations.
>
>   -  Please take a look at page 145 of the ORCA-4 datasheet
>     ("Package Parasitics"): there, in human readable form, is a
>     usable package model that can be simulated in any SPICE.
>
>   -  Also note that the ORCA-4 IBIS C_COMP value for the general
>     purpose LVDS inputs is a much more reasonable 2 pf.
>
>   -  Real differential LVDS input terminators are quite wonderful
>     (no VCCO power hit, no split terminator DC offset problems).
>
>       Making them available (LXXX_25_DT) only in the V2Pro, and
>     not in the Spartan3, is an exceptionally HUGE mistake.
>
>
> Brian

Article: 61360
Subject: Re: Frustrations with Marketing
From: rickman <spamgoeshere4@yahoo.com>
Date: Thu, 02 Oct 2003 11:38:45 -0400

Tom Seim wrote:
> 
> Xilinx's marketing is about as bad as it gets. Frankly, I'm surprised
> that they are the largest FPGA vendor. I have had bad experiences with
> them in the (far) past. In particular, when they changed vendors for
> the serial proms. They cut off the old vendor with the (wishful)
> thinking that the new one would take over. Well, the new one choked
> big time and us users were left holding the bag. At the time I was
> running my own company and desperately needed those parts. Good
> luck!!! I was F**KED!!! Peter took exception the last time I mentioned
> this. In private e-mail I reminded him that if Xilinx doesn't ship
> product he still collects his pay check - as a private business owner
> if I didn't ship product the revenue stopped.
> 
> My latest run in with brand X shouldn't have happened. I thought I was
> doing them a favor by ordering a license renewal for $4K. Guess what?
> XILINX SCREWED UP!!! We have a year end deadline (Sep 25); did Xilinx
> care? NO!!!! Only by Herculean effort did I manage to get the order
> placed (after I started a week and a half before the deadline). I got
> an apology from them. But SO WHAT!!
> 
> I think they have gotten full of themselves and don't really care.
> They know us suckers have to deal with them no matter what. Well,
> maybe we do. Doesn't make me feel any better.

There are a great many aspects of this line of work that put the small
business owner at a great disadvantage.  Allocation is one of the big
ones.  Right now everyone is trying to get my business even though it is
not very large at this time.  But as soon as the market starts growing
again I am sure I will be back at the bottom of the "call" list.  

I won't say your experiences are unique, but I don't think Xilinx is in
the habit of ignoring their customers either.  But I do agree that the
growth of a company makes it much harder to do business with in an
efficient way.  In that regard, Xilinx is no exception.  

A larger company has the option of redesigning a product with an
alternate FPGA if a vendor switches to the "dark side".  But the small
company with lower volumes does not have that luxury until the problem
becomes untenable.  Even then schedules may preclude such a change.  In
those cases, the small business is just SOL.  That is why all new boards
use as few parts as possible that can not be replaced.  I much prefer to
not use serial proms of any kind and like to keep the FPGAs as generic
as possible.  The Xilinx parts would have had an advantage on our new
board, but they are not supporting modular configuration on the Spartan
3s and so they are the same to us as the Cyclone chips at this point.  

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 61361
Subject: High-performance workstation
From: "DK" <dknews@ueidaq.com>
Date: Thu, 2 Oct 2003 11:43:02 -0400


Hi,

Could somebody suggest a high-performance workstation I can use to compile my
designs as fast as possible?

I use Altera Quartus II and an EP1C12Q240 device. It is 90% full, with most of
the memory used. My Athlon XP 1800+ on an ASUS A7V266-C motherboard with 512M
of RAM takes 8/23 minutes to compile/fit. I tried a dual Xeon 2200 on a TYAN MB
and it was only 5% faster.

Has anybody tried the 64-bit Athlon processor?

-- 
Dennis

Article: 61362
Subject: Re: ISE WebPack 6.1 Impact problem
From: "Andras Tantos" <andras_tantos@tantos.yahoo.com>
Date: Thu, 2 Oct 2003 08:48:51 -0700

> >Could be a typing error, I already found one. In ECS 6.1i (the schematic
> >editor), if you put the USELOWSKEWLINES=TRUE attribute on a signal, the
> >resulting .vhf file contains a spelling error that keeps the design from
> >synthesizing (SIGNAL is spelled "SIGANL").
>
> Interesting.  I wonder how that one got past testing.  I'll bet it
> gets tested now/tomorrow.
>
> Many years ago, a friend described a neat trick for testing software.
> The idea was to make sure that each line of code was at
> least exercised in order to find the gross bugs.  He started with
> a clean listing.  Install breakpoints.  When you get there, mark
> that line and the rest of the block, that is until the next
> branch, skip, return or such.  Eventually, you have marked off
> all the easy stuff and now you have to generate test cases
> to tickle the hard/obscure ones.
>
> This is just the software version of making sure your test bench
> wiggles all the signals at least once.
>

If only software testing were that easy. Actually there are tools (code
coverage tools) that are there for this very purpose: measure the percentage
of blocks (lines) that were executed during a test.

However it's not that easy at all to reach 100% coverage. You might need
fault injection and other techniques, and even with that some SW constructs
cannot be covered 100% (think about testing a SW for *all possible*
out-of-memory conditions or for *all possible* floating-point exceptions).

And in many cases the problem is not the branches that are there (and thus
can be tested by code-coverage) but the ones that are missing: unchecked
special conditions, return values, etc.
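
A small illustration of that last point, as a sketch only: the function and test below are hypothetical, not taken from any real tool. The single test executes every line, so a line-coverage report would show 100%, yet the unchecked special condition is still there.

```python
def average(values):
    """Mean of a list; note there is no branch handling the empty list."""
    total = 0
    for v in values:
        total += v
    return total / len(values)


# This one test executes every line of average(): full line coverage.
assert average([2, 4, 6]) == 4

# The missing check only shows up with an input that coverage never asked for.
try:
    average([])
    missing_branch_found = False
except ZeroDivisionError:
    missing_branch_found = True

print(missing_branch_found)
```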

Anyway, I'm sure you all know all this...

This bug should have been found though.

Regards,
Andras Tantos

Article: 61363
Subject: Re: Digesting runs of ones or zeros "well"
From: johnhandwork@mail.com (John_H)
Date: 2 Oct 2003 08:51:34 -0700

rickman <spamgoeshere4@yahoo.com> wrote in message news:<3F7B3F59.191901A4@yahoo.com>...
> My experience has been that it does not much matter how you code
> combinatorial logic like this.  The tools run it through a grinder and
> produce an optimal version (in its own mind).  When I want to optimize
> like this, I either use a "keep" attribute on the wire, or sometimes you
> can instantiate primitives.  For logic I don't think primitives work
> since gates just get remapped.

I overuse the syn_keep attribute and I hate the idea of instantiating
LUTs.  My Karnaugh map skills aren't exactly used regularly.

> But I still don't understand your code.  Why does the outer loop range
> over 64 values.

I've had problems with bit ranges in the past where [i+4:i] draws a
complaint from the tools.  Perhaps this isn't an issue with for loops but I've
learned to avoid them in general logic.  They do work fine in generate
blocks, however.  I stepped through every bit to make a comparison to
the adjacent bit; 3 adjacent comparisons lumped into one variable
(with an eventual syn_keep) would give me 4-input functions that
should pack into LUTs.  The complex end of the inside loop is so that
the three "LUTs" per byte are 4-input, 4-input, and 3-input functions.

> I would code two nested loops where the outer loop
> ranges over the 8 outputs and the inner loop ranges over the 9 inputs
> for each output.  Or just skip the inner loop and use two outputs from
> two sets of four inputs feeding a 3 input function and use keeps on the
> first two output arrays.  Maybe that is what you are doing, but I can't
> figure out the code easily.  
> 
> I see you are incrementing the i variable by j and ranging j in the
> second loop by some complex control expression.  Can't you just
> increment i by 8?  
> 
> for( i=0; i<64; i=i+8 ) begin
>   k = i / 8;  // byte index 0..7
>   for( j=0; j<4; j=j+1 ) begin
>     runBitsA_[k] = runBitsA_[k] & bytePlus1[i+j];
>     runBitsB_[k] = runBitsB_[k] & bytePlus1[i+j+4];
>   end
>   runByte_[k] = runBitsA_[k] & runBitsB_[k] & bytePlus1[i+8];
> end
> 
> Put the keep on runBitsA_ and runBitsB_ and you should get your two
> level structure.  

This works very well for runs of ones only.  I need to identify runs
of ones or runs of zeros.  The technique can be expanded to my needs
resulting in runBitsA, B, and C where one of them needs to cover 2
comparisons, not 3 like the others.  ...which really is the
approach I was coding but using consecutive bits in a vector rather
than {A,B,C} and using the one statement rather than 3 to make the
assignments, dealing with the 2 comparison exception by terminating
the inside loop early.

Thanks for the help.

Article: 61364
Subject: Re: Digesting runs of ones or zeros "well"
From: johnhandwork@mail.com (John_H)
Date: 2 Oct 2003 08:58:47 -0700

"Vinh Pham" <a@a.a> wrote in message news:<XcSeb.39218$5z.21702@twister.socal.rr.com>...
> Whoopsy, brain-fart.  My previous code will create 3 levels of logic.  If we
> didn't have to detect both nine 1s or nine 0s, then it'd work okay.

Thanks for noticing :-)

I like the code below with respect to its symmetry - it's a lot easier
to read than the stuff I generated.  The four 3-input LUTs feed a
single 4-input LUT with (only a) little argument from the
synthesizer, I'm sure.  It can be done with fewer LUTs by using
4-input LUTs covering 3 compares each but then the symmetry gets lost
and the coding gets unpleasant.

I think I have an acceptable solution together that gives me good
speed and good utilization which I'll post separately.

Thanks for your thoughts on this.

> Here's an idea for one that should generate 2 levels, but it looks uglier.
> Definitely not as compact as rickman's.
> 
> 
> data[64:0]  -- input signal
> ninth_bit[7:0] -- intermediate signal
> run_dibble[31:0] -- intermediate signal
> run_byte[7:0] -- output signal
> 
> 
> for byte 0...7
> 
>     ninth_bit[byte] = data[(byte+1)*8]
> 
>     for dibble 0...3
> 
>         lsb = byte*8 + dibble*2
>         msb = byte*8 + dibble*2 + 1
> 
>         if data[lsb] = ninth_bit[byte] AND data[msb] = ninth_bit[byte] then
>             run_dibble[byte*4 + dibble] = 1
>         else
>             run_dibble[byte*4 + dibble] = 0
>         end
> 
>     end loop
> 
>     lsb = byte*4
>     msb = byte*4 + 3
> 
>     if run_dibble[msb:lsb] = "1111" then
>         run_byte[byte] = 1
>     else
>         run_byte[byte] = 0
>     end
> 
> end loop
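
For anyone who wants to check the quoted pseudocode in software first, here is a rough Python rendering of it. It is a behavioral sketch only: the function name is made up, and the 65-bit input is assumed to be packed into a Python int with bit 0 as data[0].

```python
def run_bytes_dibble(data):
    """Return 8 flags (bit per byte): 1 if the byte and its ninth bit all match."""
    run_byte = 0
    for byte in range(8):
        ninth_bit = (data >> ((byte + 1) * 8)) & 1
        run_dibble = 0
        for dibble in range(4):
            lsb = byte * 8 + dibble * 2
            msb = lsb + 1
            # first level: one 3-input function per dibble
            if ((data >> lsb) & 1) == ninth_bit and ((data >> msb) & 1) == ninth_bit:
                run_dibble |= 1 << dibble
        # second level: one 4-input AND per byte
        if run_dibble == 0b1111:
            run_byte |= 1 << byte
    return run_byte


print(hex(run_bytes_dibble(0x1FF)))
```

With bits 0 to 8 set and everything else clear, byte 1 is the only one that straddles the transition, so it is the only flag that drops.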

Article: 61365
Subject: Re: Digesting runs of ones or zeros "well"
From: johnhandwork@mail.com (John_H)
Date: 2 Oct 2003 09:11:10 -0700

"Martin Euredjian" <0_0_0_0_@pacbell.net> wrote in message news:<FALeb.6677$fB4.1788@newssvr29.news.prodigy.com>...
> John,
> 
> 1- How many 65 bit words per second (ms, ns?) do you have to process?
  This run detection is one part of a 100MHz-200MHz mechanism.
> 2- Where do the 65 bits come from?  (internal, external)
  Internal, blindsided from BlockRAMs with a new value per cycle.
> 3- Do they get into the FPGA in parallel or serially?
  Entirely parallel, into the BlockRAMs at full width.
> 4- Why are you saying that you need two levels of logic? (trying to control
> delay with combinatorial logic is not a great idea).
  If I go from BlockRAM to registers, I have the (relatively) long
Tcko delay for the BlockRAM read and associated routing leaving little
time to manipulate the data within the period.  If I register the data
from the BlockRAM, it's best to generate and use the run values in the
next cycle requiring more logic after I flag the runs, suggesting
minimum delay is best.
> 5- Why fight with inference?  Instantiate what primitives you need.
  The logic primitives are what I've avoided.  I don't want to use
LUT4 primitives with INIT attributes since I might mess up the Karnaugh
map.  This is why the inference has been broken down into bits that
can be retained (with syn_keep or other method).

> Two logic levels?
> 
> Two LUT's to look at two consecutive nibbles.
> One LUT to AND the output of the above with the next most significant bit
> (the ninth bit).
> That's it.  Two levels.  24 LUT's.
> Is that what you wanted?

Almost.  The LUTs can't look at full nibbles.  Since I need to make
sure all bits are equal to each other, there's a "smear."  One attempt
was to XOR adjacent bits, then to do the 8-wide AND of the result,
letting the synth give me the "best" results.  It didn't.  Thinking
about the XOR-to-AND progression, 4 bits are needed to implement 3
bits of the AND, so two 4-bit LUTs and one 3-bit LUT are needed,
feeding a 3-input AND.

  That's it.  Two levels.  32 LUTs.
  But the synth doesn't like my inferences.
  I think I have a solution that "works."
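
The partition described above (3 + 3 + 2 adjacent compares per byte, then a 3-input AND) can be sanity-checked exhaustively in software. This is a behavioral sketch in Python, not synthesizable code; the function names are made up, and each "LUT" is modelled simply as an all-bits-equal check on its inputs.

```python
from itertools import product


def all_equal(bits):
    """1 if every bit in the sequence has the same value, else 0."""
    return int(len(set(bits)) == 1)


def run_two_level(bits9):
    """bits9: 9 bits (a byte plus the next bit), modelled as two logic levels."""
    lut_a = all_equal(bits9[0:4])   # 4-input LUT: 3 adjacent compares
    lut_b = all_equal(bits9[3:7])   # 4-input LUT: 3 adjacent compares
    lut_c = all_equal(bits9[6:9])   # 3-input LUT: 2 adjacent compares
    return lut_a & lut_b & lut_c    # second level: 3-input AND


# Compare against the flat definition for all 512 input patterns.
mismatches = sum(run_two_level(b) != all_equal(b) for b in product((0, 1), repeat=9))
luts_total = 8 * 4                  # 4 LUTs per byte, 8 bytes
print(mismatches, luts_total)
```

The groups overlap by one bit (bits 3 and 6), which is what lets three local checks stand in for the full 9-bit comparison.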

Article: 61366
Subject: Re: Is Xilinx Webpack 6.1 help crippled?...
From: "MM" <mbmsv@yahoo.com>
Date: Thu, 2 Oct 2003 12:17:24 -0400

For those who have the same problem, here is a reply from Xilinx tech
support:
----------------------------------------------------------------------------
This is a recent issue that just came up.  It looks like the "books"
directory is not installed with the Webpack download.  I've put one up on
the ftp site.
It goes in the %WebPACK%\doc\usenglish\ directory.  Here is the link to
the books.zip:

ftp://customer:xilinx@ftp.xilinx.com/download/books.zip
----------------------------------------------------------------------------

The help browser though seems to be a different issue...

/Mikhail

Article: 61367
Subject: Re: Digesting runs of ones or zeros "well"
From: johnhandwork@mail.com (John_H)
Date: 2 Oct 2003 09:33:20 -0700

For anyone interested in how I got things together, I ended up using
one generate loop to instantiate 8 MUXF5s.  Why MUXF5s?

1) One can make an 8-input AND with 2 LUTs and a little extra delay by
having the first 4-bit AND feed the select and the sel==0 input - if
the AND is false the result is false, if the AND is true, the result
is the other AND.

2) By using a primitive, the logic feeding the primitive's pin isn't
optimized across the primitive.
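
Point 1 is easy to convince yourself of with a truth table. A quick behavioral sketch in Python (the function names are illustrative; only the 2:1 mux behavior of MUXF5, output = I1 when S is 1 and I0 otherwise, is modelled):

```python
def muxf5(i0, i1, s):
    """Behavioral 2:1 mux: returns i1 when s is 1, otherwise i0."""
    return i1 if s else i0


def and_via_mux(a, b):
    # a drives both the select and the sel==0 input; b drives the other input
    return muxf5(i0=a, i1=b, s=a)


truth_table = [(a, b, and_via_mux(a, b)) for a in (0, 1) for b in (0, 1)]
print(truth_table)
```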

The synthesizer will produce a nice 2-level implementation for 5
compares but not 8 so splitting it up into 5 compares and 3, the MUXF5
used as an AND can give a nice balance of delays.  It's slightly more
than 2 LUTs of delay, but very slightly.  The code looks cleaner and
the implementation is tight.

===============================================================
module testRun ( input             clk
               , input      [64:0] bytesPlus1
               , output reg [ 7:0] runByte
               );

wire [ 7:0] runMux;
wire [63:0] xnorBits = bytesPlus1[63:0] ^~ bytesPlus1[64:1];
// the result of a bit compare a==b is the same as a^~b

genvar h;
generate
  for( h=0; h<8; h=h+1 )
  begin : run
    MUXF5 mux ( .O(runMux[h]), .S ( & xnorBits[h*8+2:h*8+0] )
                             , .I1( & xnorBits[h*8+7:h*8+3] )
                             , .I0( & xnorBits[h*8+2:h*8+0] ) );
  end
endgenerate

always @(posedge clk)  runByte <= runMux;

endmodule
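
For checking simulation results, a software reference model of the combinational part of this module may be handy. The following is a Python sketch under the assumption that the 65-bit input is packed into an int with bit 0 as bytesPlus1[0]; the output register is not modelled and the function name is made up.

```python
def run_byte_model(bytes_plus1):
    """Return the 8-bit value that runMux would carry for a 65-bit input."""
    mask64 = (1 << 64) - 1
    lo = bytes_plus1 & mask64            # bytesPlus1[63:0]
    hi = (bytes_plus1 >> 1) & mask64     # bytesPlus1[64:1]
    xnor_bits = ~(lo ^ hi) & mask64      # 1 where adjacent bits match
    result = 0
    for h in range(8):
        group = (xnor_bits >> (h * 8)) & 0xFF
        low3 = (group & 0x07) == 0x07    # & xnorBits[h*8+2:h*8+0]
        high5 = (group & 0xF8) == 0xF8   # & xnorBits[h*8+7:h*8+3]
        mux_out = high5 if low3 else low3  # mux: S and I0 are low3, I1 is high5
        result |= int(mux_out) << h
    return result


print(hex(run_byte_model(0x1FF)))
```

Feeding the same vectors to the model and to a testbench gives a quick cross-check of the generate loop's bit slicing.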

Article: 61368
Subject: Re: Graphics rendering -- use a BRAM line buffer
From: "Jan Gray" <jsgray@acm.org>
Date: Thu, 02 Oct 2003 16:39:12 GMT

A simple and workable approach is to use one or more BRAMs as a LINE buffer
and a LINE Z buffer, perhaps double buffered for simplicity. Then by sorting
(and incrementally updating on the fly) your display list of graphics
primitives by Y coordinate and then by X coordinate, you can iterate over
them and render them into the line buffer. Works fine for Gouraud shaded
filled primitives like trapezoids too. (Textures will require more memory
ports, perhaps on-chip, perhaps not.)

So long as you can render a line worth of graphics faster than you shift out
the previously rendered line, you're looking good. No frame buffer, no high
bandwidth frame buffer memory, no frame buffer memory I/Os. Just pretty
raster graphics.
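
A minimal sketch of the double-buffered line buffer idea, assuming a single
clock, 8-bit pixels and a line of up to 1024 pixels (the module name, port
names and widths are all hypothetical).  The span filler writes into one half
of the memory while the other half is read out to the display; the halves
trade places at the end of each scan line.  The Z buffer and the clearing of
the freshly swapped-in half are left out.

```verilog
// Hypothetical ping-pong line buffer: render into one half while the
// other half is scanned out.  All names and widths are assumptions.
module lineBuf #( parameter AW = 10, DW = 8 )
               ( input               clk
               , input               swap     // pulse at the end of each scan line
               , input               wrEn     // span filler write enable
               , input      [AW-1:0] wrAddr   // X coordinate being rendered
               , input      [DW-1:0] wrData
               , input      [AW-1:0] rdAddr   // X coordinate being displayed
               , output reg [DW-1:0] rdData
               );

reg          sel = 1'b0;                // which half is being rendered into
reg [DW-1:0] mem [0:(2<<AW)-1];         // two lines; intended to map to block RAM

always @(posedge clk)
begin
  if( swap ) sel <= ~sel;
  if( wrEn ) mem[{ sel, wrAddr}] <= wrData;   // render side
  rdData <= mem[{~sel, rdAddr}];              // display side, one clock of latency
end

endmodule
```

The renderer has to finish all of its writes for a line before the next swap
pulse, which is the "render faster than you shift out" condition above.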

With a soft CPU core to do display list management, and a simple hardware
span-filler coprocessor on the interface to the line buffer, scenes of
limited complexity seem quite doable in even a rather spartan FPGA.  For
more complexity and more performance, move the display list manager and span
edge DDAs to hardware.

See also my 1995 article on an FPGA-based rendering coprocessor:
http://fpgacpu.org/usenet/render.html.

Jan Gray, Gray Research LLC

Article: 61369
Subject: Re: High-performance workstation
From: nweaver@ribbit.CS.Berkeley.EDU (Nicholas C. Weaver)
Date: Thu, 2 Oct 2003 16:50:26 +0000 (UTC)

In article <3f7af52e_1@newsfeed.slurp.net>, DK <dknews@ueidaq.com> wrote:
>could somebody suggest hi-performance workstation I can use to compile my
>designs in a fastest possible way?
>
>I use Altera Quartus II and EP1C12Q240 device. It is 90% full with most of the
>memory used. My Athlon XP 1800+ on ASUS A7V266-C motherboard with 512M of
>RAM takes 8/23 minutes to compile/fit. I did try dual Xeon 2200 on TYAN MB
>and it did only 5% faster..

If it doesn't swap, getting the latest processor gives a modest but
noticeable speed increase, but you aren't going to see an order of
magnitude faster anytime soon.

One of the problems is just that many of the techniques are memory
latency bound, and memory latency is not getting better.  Others are
cache bound, and the Athlons have pretty good cache, but cache is
still a bottleneck.

AFAIK, none of the programs are yet dual-processor or SMT optimized.
If they were, a dual SMT P4 machine would be good, but currently they
are not.

>Did somebody try Athlon 64-bit processor?

The 64-bit Athlon's real improvement is going to be in address space
size, which will matter on the largest designs, not in performance.
-- 
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

Article: 61370
Subject: Re: Digesting runs of ones or zeros "well"
From: Goran Bilski <goran@xilinx.com>
Date: Thu, 02 Oct 2003 19:10:37 +0200

Hi,

Why not use the carry-chain?

You can do any kind of detection on that primitive, and it will save you LUTs.
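
As a rough illustration only (not tested in hardware, and not necessarily the
mapping meant here), the same 8-compare AND could be built as a chain of three
MUXCY primitives per byte.  The module name is made up; the MUXCY behaviour
assumed is O = S ? CI : DI, so with DI tied low each stage passes the carry
only while its LUT output is true.  Whether this really uses fewer LUTs than
the MUXF5 version depends on how the tools pack the select functions.

```verilog
// Sketch: carry-chain AND of the 8 adjacent-bit compares for each byte.
module testRunCarry ( input             clk
                    , input      [64:0] bytesPlus1
                    , output reg [ 7:0] runByte
                    );

wire [63:0] xnorBits = bytesPlus1[63:0] ^~ bytesPlus1[64:1];
wire [ 7:0] runCy;

genvar h;
generate
  for( h=0; h<8; h=h+1 )
  begin : run
    wire cy0, cy1;
    // Each select depends on at most 4 input bits, so it fits one 4-input LUT.
    MUXCY c0 ( .O(cy0),      .CI(1'b1), .DI(1'b0), .S( & xnorBits[h*8+2:h*8+0] ) );
    MUXCY c1 ( .O(cy1),      .CI(cy0),  .DI(1'b0), .S( & xnorBits[h*8+5:h*8+3] ) );
    MUXCY c2 ( .O(runCy[h]), .CI(cy1),  .DI(1'b0), .S( & xnorBits[h*8+7:h*8+6] ) );
  end
endgenerate

always @(posedge clk)  runByte <= runCy;

endmodule
```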

Göran

John_H wrote:

>For anyone interested in how I got things together, I ended up using
>one generate loop to instantiate 8 MUXF5s.  Why MUXF5s?
>
>1) One can make an 8-input AND with 2 LUTs and a little extra delay by
>having the first 4-bit AND feed the select and the sel==0 input - if
>the AND is false the result is false, if the AND is true, the result
>is the other AND.
>
>2) By using a primitive, the logic feeding the primitive's pin isn't
>optimized across the primitive.
>
>The synthesizer will produce a nice 2-level implementation for 5
>compares but not 8 so splitting it up into 5 compares and 3, the MUXF5
>used as an AND can give a nice balance of delays.  It's slightly more
>than 2 LUTs of delay, but very slightly.  The code looks cleaner and
>the implementation is tight.
>
>===============================================================
>module testRun ( input             clk
>               , input      [64:0] bytesPlus1
>               , output reg [ 7:0] runByte
>               );
>
>wire [ 7:0] runMux;
>wire [63:0] xnorBits = bytesPlus1[63:0] ^~ bytesPlus1[64:1];
>// the result of a bit compare a==b is the same as a^~b
>
>genvar h;
>generate
>  for( h=0; h<8; h=h+1 )
>  begin : run
>    MUXF5 mux ( .O(runMux[h]), .S ( & xnorBits[h*8+2:h*8+0] )
>                             , .I1( & xnorBits[h*8+7:h*8+3] )
>                             , .I0( & xnorBits[h*8+2:h*8+0] ) );
>  end
>endgenerate
>
>always @(posedge clk)  runByte <= runMux;
>
>endmodule
>  
>

Article: 61371
Subject: Re: High-performance workstation
From: Mike Treseler <mike.treseler@flukenetworks.com>
Date: Thu, 02 Oct 2003 10:12:03 -0700

DK wrote:
> Hi,
> 
> could somebody suggest hi-performance workstation I can use to compile my
> designs in a fastest possible way?
> 
> I use Altera Quartus II and EP1C12Q240 device. It is 90% full with most of the
> memory used. My Athlon XP 1800+ on ASUS A7V266-C motherboard with 512M of
> RAM takes 8/23 minutes to compile/fit. I did try dual Xeon 2200 on TYAN MB
> and it did only 5% faster..

Your machine is fine. 23 minutes is not bad.
Consider using more simulation before synthesis.
A sim is 10x faster than a synth.
Consider loading SUSE or Red Hat Linux and running Quartus under Linux
(you can dual boot Windows/Linux if you like).

  --Mike Treseler

Article: 61372
Subject: Re: Digesting runs of ones or zeros "well"
From: rickman <spamgoeshere4@yahoo.com>
Date: Thu, 02 Oct 2003 13:53:06 -0400

John_H wrote:
> 
> rickman <spamgoeshere4@yahoo.com> wrote in message news:<3F7B3F59.191901A4@yahoo.com>...
> > My experience has been that it does not much matter how you code
> > combinatorial logic like this.  The tools run it through a grinder and
> > produce an optimal version (in its own mind).  When I want to optimize
> > like this, I either use a "keep" attribute on the wire, or sometimes you
> > can instantiate primitives.  For logic I don't think primitives work
> > since gates just get remapped.
> 
> I overuse the syn_keep attribute and I hate the idea of instantiating
> LUTs.  My Karnaugh skills aren't exactly used regularly.

Actually, I don't think logic primitives will work since the back end
mapper can redo logic at will.  The keep attribute is what is required
to define the LUTs and even that is not guaranteed since it only results
in a wire being kept; the LUT can still be split if other logic uses the
same inputs.  


> > But I still don't understand your code.  Why does the outer loop range
> > over 64 values?
> 
> I've had problems with bit ranges in the past where [i+4:i] is a
> complaint.  Perhaps this isn't an issue with for loops but I've
> learned to avoid them in general logic.  They do work fine in generate
> blocks, however.  I stepped through every bit to make a comparison to
> the adjacent bit; 3 adjacent comparisons lumped into one variable
> (with an eventual syn_keep) would give me 4-input functions that
> should pack into LUTs.  The complex end of the inside loop is so that
> the three "LUTs" per byte are 4-input, 4-input, and 3-input functions.

I don't really see what problem you are trying to solve with that, but
then I am not as well versed in Verilog as I am in VHDL.


> > I would code two nested loops where the outer loop
> > ranges over the 8 outputs and the inner loop ranges over the 9 inputs
> > for each output.  Or just skip the inner loop and use two outputs from
> > two sets of four inputs feeding a 3 input function and use keeps on the
> > first two output arrays.  Maybe that is what you are doing, but I can't
> > figure out the code easily.
> >
> > I see you are incrementing the i variable by j and ranging j in the
> > second loop by some complex control expression.  Can't you just
> > increment i by 8?
> >
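> > for( i=0; i<64; i=i+8 ) begin
> >   k = i / 8;               // output index 0..7 (i % 8 would always be 0)
> >   runBitsA_[k] = 1'b1;     // start each AND reduction from 1
> >   runBitsB_[k] = 1'b1;
> >   for( j=0; j<4; j=j+1 ) begin
> >     runBitsA_[k] = runBitsA_[k] & bytePlus1[i+j];
> >     runBitsB_[k] = runBitsB_[k] & bytePlus1[i+j+4];
> >   end
> >   // the ninth input of this group is bit i+8
> >   runByte_[k] = runBitsA_[k] & runBitsB_[k] & bytePlus1[i+8];
> > end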
> >
> > Put the keep on runBitsA_ and runBitsB_ and you should get your two
> > level structure.
> 
> This works very well for runs of ones only.  I need to identify runs
> of ones or runs of zeros.  The technique can be expanded to my needs
> resulting in runBitsA, B, and C where one of them needs to cover 2
> comparisons, not 3 like the others.  ...which really is the
> approach I was coding but using consecutive bits in a vector rather
> than {A,B,C} and using the one statement rather than 3 to make the
> assignments, dealing with the 2 comparison exception by terminating
> the inside loop early.

Again, I may not completely understand your problem.  This was intended
to show you how to solve the problem.  To cover the adjacent zeros, you
just do the same logic using the OR operator and invert the result.  
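
A minimal sketch of that idea, written flat and without the keep attributes,
so it shows the function only and not the LUT mapping.  The module and wire
names are made up for illustration; the input follows the 65-bit bytesPlus1
vector from the earlier post.

```verilog
// For each byte, look at its 8 bits plus the next bit (9 bits in total).
// AND reduction finds a run of ones; OR reduction, inverted, finds a run
// of zeros.
module runDetect ( input  [64:0] bytesPlus1
                 , output [ 7:0] runByte
                 );

genvar k;
generate
  for( k=0; k<8; k=k+1 )
  begin : det
    wire [8:0] nine     = bytesPlus1[k*8+8:k*8];
    wire       allOnes  =  & nine;    // every bit is 1
    wire       allZeros = ~| nine;    // no bit is 1
    assign runByte[k] = allOnes | allZeros;
  end
endgenerate

endmodule
```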

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 61373
Subject: Re: Good VHDL/Verilog editor?
From: tlee@post.com (lee)
Date: 2 Oct 2003 11:06:15 -0700

http://www.crimsoneditor.com/

and it's free!

--Lee

"Valentin Tihomirov" <valentin@abelectron.com> wrote in message news:<3f7c1470_1@news.estpak.ee>...
> > I still like Aldec for design entry.  Editor is very much studio editor
> > like, plus you can run sims right there as well as integrate in the rest
> > of your tool flow.  For the price, I think it is a great value.
> 
> Heh, I appreciate Aldec because it is not studio-like, as opposed to
> Xilinx's WebPack. It is also more feature-rich, with better integration of
> different tools (like jumping to the erroneous code line and more).

Article: 61374
Subject: Re: Graphics rendering -- use a BRAM line buffer
From: "Martin Euredjian" <0_0_0_0_@pacbell.net>
Date: Thu, 02 Oct 2003 18:19:45 GMT

Thanks, I'll go look at your article.


-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Martin Euredjian

To send private email:
0_0_0_0_@pacbell.net
where
"0_0_0_0_"  =  "martineu"


"Jan Gray" <jsgray@acm.org> wrote in message
news:QwYeb.11170$RW4.4195@newsread4.news.pas.earthlink.net...
> A simple and workable approach is to use one or more BRAMs as a LINE buffer
> and a LINE Z buffer, perhaps double buffered for simplicity. Then by sorting
> (and incrementally updating on the fly) your display list of graphics
> primitives by Y coordinate and then by X coordinate, you can iterate over
> them and render them into the line buffer. Works fine for Gouraud shaded
> filled primitives like trapezoids too. (Textures will require more memory
> ports, perhaps on-chip, perhaps not.)
>
> So long as you can render a line worth of graphics faster than you shift out
> the previously rendered line, you're looking good. No frame buffer, no high
> bandwidth frame buffer memory, no frame buffer memory I/Os. Just pretty
> raster graphics.
>
> With a soft CPU core to do display list management, and a simple hardware
> span-filler coprocessor on the interface to the line buffer, scenes of
> limited complexity seem quite doable in even a rather spartan FPGA.  For
> more complexity and more performance, move the display list manager and span
> edge DDAs to hardware.
>
> See also my 1995 article on an FPGA-based rendering coprocessor:
> http://fpgacpu.org/usenet/render.html.
>
> Jan Gray, Gray Research LLC
