Eric Smith wrote: > Rudolf Usselmann <russelmann@hotmail.com> writes: >> Unfortunately I am using a "unsupported OS" (FC3) ... >> So I guess I am out of luck using 7.1 64 bit ... >> >> I have tried making sure all xilinx environment variables >> are not set, and that the installation directory is empty - >> I am still getting seg. fault ... > > Strange, I'm using FC3 and it works for me, with one inconsequential > problem. It needed older versions of some libraries before it would > load, but after I installed those it was fine. I posted about that on > March 14: > > http://groups-beta.google.com/group/comp.arch.fpga/msg/4b592cb14bad823f Eric, I read your post and did try the things you suggest, but like you, I too was NOT successful, installing the 64 bit version of ISE 7.1. Best Regards, rudi ============================================================= Rudolf Usselmann, ASICS World Services, http://www.asics.ws Your Partner for IP Cores, Design, Verification and SynthesisArticle: 82551
You can find everything that you want to know at the Philips site: http://www.semiconductors.philips.com/markets/mms/protocols/i2c/ I recommend that you download and read the specification first. You can probably safely ignore anything related to multimaster or high-speed mode, but be aware that these exist. They also have discussion forums where you can ask questions. -KeithArticle: 82552
Anthony Mahar wrote: > Nju Njoroge wrote: > > Interesting question for the "Monitoring Capsule Design" paper... they > state they monitor behavior "between the CPU and L1 Dcache." Did they > explain how they were able to do this, since the PPC405 and L1 are part > of the same hard core? > You are right--the CPU and the L1 cache are in the same hard core, so we don't have access to the interface between the CPU core and the cache. As I described in my previous post, they placed their monitor at the interface of the L1 cache ports that are usually connected to the PLB. Thus, instead of connecting their CPU to the PLB bus, they connected the PPC core to their monitor, which is then connected to the PLB. NNArticle: 82553
On Wed, 13 Apr 2005 23:14:58 -0400, Mark Jones wrote: > James Beck wrote: >> In article <upWdnTeAPOxSqcDfRVn-sA@buckeye-express.com>, abuse@127.0.0.1 >> says... >> >>>praveen.kantharajapura@gmail.com wrote: >>> >>>>Hi all, >>>> >>>>This is a basic qustion regarding SDA and SCL pins. >>>>Since both these pins are bidirectional these should pins need to be >>>>tristated , so that the slave can acknowledge on SDA. >>> >>> >>> No, both pins are not bidirectional. Only the master device drives the SCK >>>line, and all slaves must leave their SCK's as input. >> >> >> Not true, a slave device can extend a cycle through clock stretching and >> the only way to do that is for the slave device to be able to hold the >> clock line low. >> >> http://www.i2c-bus.org/clockstretching/ >> > > Explain that to a Noob. > > Please. A slave device can use the clock as a primitive flow control mechanism. If the slave takes some amount of time to process a byte, it can prevent the master from starting the next byte by simply holding the clock low. The master cannot clock the data until the clock is released by the slave. I2C clocks are not really clocks, they are 'data valid' signals. They don't have to go at any particular rate, and aren't constrained by anything except setup and hold times for the devices. Regards, Bob MonsenArticle: 82554
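[A master that honours clock stretching has to sample SCL after releasing it, and wait until the wire has actually gone high before treating the bit as clocked. A minimal Verilog sketch of that wait; signal names are made up for illustration and are not from any of the posts above:]

```verilog
// Open-drain SCL: the master drives 0, or releases to Z and lets the
// external pull-up raise the line. A stretching slave can keep the
// wire low even after the master has released it.
module scl_stretch_wait (
    input  wire clk,            // system clock, much faster than SCL
    input  wire release_scl,    // master wants SCL high for this bit
    inout  wire scl_pin,        // bidirectional SCL pad
    output reg  scl_is_high     // safe to time the high phase / advance
);
    // The master never drives SCL to a hard 1.
    assign scl_pin = release_scl ? 1'bz : 1'b0;

    // Wait until the wire itself reads back high - if a slow slave is
    // stretching, scl_pin stays 0 and scl_is_high stays deasserted.
    always @(posedge clk)
        scl_is_high <= release_scl && (scl_pin == 1'b1);
endmodule
```

[Only when scl_is_high asserts does the master start timing the high period of the clock; a slow slave simply delays that assertion for as long as it needs.]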
Anthony Mahar wrote: > Nju Njoroge wrote: > > Anthony Mahar wrote: > > > >>Hello, > >> > >>Is there a way to do performance monitoring on the PPC405 in the > > > > Virtex > > > >>II Pro? I am specifically interested in cache hits. > >> > >>I have wedged my own device between the CPU's instruction and data > > > > PLB > > > >>interfaces and can currently get cache misses. But I need to find a > > > > way > > > >>to determine cache hits of an application running under an operating > >>system. > >> > >>If it was stand alone I could figure that information out by the > > > > number > > > >>of load and store instructions, but this is an operating system with > >>context switches, interrupt handlers, etc. > >> > >>Is there a way to gather this information? There did not seem to be > > > > any > > > >>performance monitoring registers as seen with newer PowerPC and x86 > >>systems. Can the trace port be used to passively monitor execution > > > > for > > > >>load/store instructions? > > > > > > Unfortunately, I have few answers to your questions. However, I know of > > a research group in Georgia Tech that is designing/designed a memory > > access monitor, which sounds similar to yours. You may want to > > correspond with them to exchange notes. I learned of their monitor at > > the HPCA 2005 FPGA workshop. Here is a link to the workshop > > http://cag.csail.mit.edu/warfp2005/. A link to the workshop > > presentations is here at > > http://cag.csail.mit.edu/warfp2005/program.html. Their presentation was > > titled "Evaluating System wide Monitoring Capsule Design using Xilinx > > Virtex II Pro FPGA". Their paper has their contact information. > > > > As for the trace port, I have used it with an IBM/Agilent RISCWatch (RW) > > box, which collects a dynamic trace of the instructions over 8 million > > CPU cycles. The main limitation is that it only works for stand alone > > apps. 
When you have virtual memory enabled (while running Linux for > > instance), RW uses the TLB to conduct the virtual to physical address > > translations. This is great for regular code. However, when an > > interrupt is detected, the CPU converts to using physical addresses for > > the interrupt handler. Unfortunately, RW continues to use the TLB so it > > tries to translate physical addresses, for which no "translations" > > exist, so RW is unable to resolve interrupt handler instructions. > > After this point, the trace is corrupted. In any case, if you are > > interested in learning more about RW, you can refer to this appnote > > http://direct.xilinx.com/bvdocs/appnotes/xapp545.pdf. It has links to > > all manuals for the RW box and its tools. > > > > Lastly, for my own curiosity, how difficult was it to design and debug > > your monitor? The guy I spoke to from Georgia Tech at the workshop said > > they used Chipscope to learn the protocol (along with IBM's PLB spec). > > He claims that this was a painstaking process. > > > > NN > > > > Thank you Nju, > > I am going to dig into those docs right now. > > My design was not intended to be a monitor, but an active bus > transaction modifier. On certain transactions, I have to perform > certain operations on the data going to the PPC405. This means I > selectively pass data through, or perform some higher latency operations. > > Since I am currently interested in cache-miss performance, I only count > the number of transaction requests from L1 cache. Because it is an > individual word that caused the instruction miss, all other words > retrieved in the transaction are, of course, not considered as a miss. > This makes it extremely easy to monitor the number of transaction > requests. > > While the module is an active component between the CPU and PLB, it is > very easy to add a passive monitor once you have a way to have the EDK > inject the monitor in the middle. 
For myself, It required some time to > understand the EDK .mpd format and effectively create a PLB-PLB bridge > (no logic, pure pass through), and there may be better ways with the > "transparent" bus format that I haven't had time to look into. But at > the time it was also my first EDK peripheral. > If I understand correctly, you are saying that your transaction modifier acts as a PLB Bus to PLB Bus bridge. So, in the EDK project, you connected the CPU to a PLB bus, then connected your module to that PLB bus and then connected another PLB bus on the other side of your pcore? CPU <->PLB Bus -> your pcore <-> PLB BUS <-> Memory (Cache/BRAM) If my understanding is correct, you in essence designed a PLB-PLB bridge, like the PLB-OPB bridge, right? In our research, we also designed a PLB to PLB bridge. Our pcore was initially a pass-through in between the two buses, then we placed our real module when we got the pass-through running. The guys from Georgia Tech, however, interfaced their monitor module directly with PPC's PLB ports, so they couldn't use EDK's abstraction of the bus protocol through the PLB IPIF module. In fact, they had to synthesize their project in ISE since EDK wouldn't support what they were trying to do. That's why they had to use ChipScope to really see what the processor does. > As for 'learning' the PLB system, I found the IBM CoreConnect Bus > Functional Model (BFM) for the PLB, with the PLB doc, to be instrumental > in observing every kind of transaction I had to handle. I think the BFM > would be far easier than using ChipScope/Docs alone. The BFM allows the > generation of almost any kind of cycle-accurate PLB transaction a master > and slave can use. > > One other model I would like to begin using is the Xilinx provided > PPC405 swift model, which will allow the same code used by the real > processor to run on the simulation swift model simulation. 
This will > cause PLB transactions to occur in the same way they will on the real > system, i.e. cache line fills based on the PPC405 MMU's state, etc. > In designing our pass-through, we used the swift models. I definitely recommend learning how to use them. The swift models allow you to conduct full-system simulations. As for the BFM's, we weren't able to use them for our pcore since EDK 6.3i IPIF Create/Import wizard didn't support the use of Verilog modules (7.1 now supports this). We could have hacked this by using a netlist, but you cannot pass parameters/generics into a netlist, which is a feature that is required for our pcore. I have used the BFM's for a VHDL module I worked on in the past and I agree that they too were helpful. NNArticle: 82555
On Wed, 13 Apr 2005 05:30:39 -0700, praveen.kantharajapura wrote: > Hi all, > > This is a basic qustion regarding SDA and SCL pins. > Since both these pins are bidirectional these should pins need to be > tristated , so that the slave can acknowledge on SDA. > > But i have seen in some docs that a '1' need to be converted to a 'Z' > while driving on SDA and SCL, what is the reason behind this???? > > Thanks in advance, > Praveen You need a primer in I2C. If you are the only master, and drive the SDA wire to 1, nothing bad will happen unless, the slave thinks it's supposed to ack at the wrong time, at which time you'll get a short between the master and slave. The standard specifies resistors you can add to keep the devices from getting damaged in this case. However, you CANNOT drive SCL to 1, because a slave is allowed to hold you off by driving it low. You have to notice this, so driving it high is not going to work unless you make the clock slow enough so that the slave will never hold you off. In most I2C applications, the bus and clock should only go high when nobody is pulling it low. There are 'fast' modes of I2C which may not obey this restriction. However, a device starts off in the pullup mode, and then switches over, I believe. ---- Regards, Bob MonsenArticle: 82556
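[The '1'-becomes-'Z' rule from the original question is just the open-drain convention Bob describes: a device only ever pulls a wire low or lets go of it, and the external pull-up supplies the high level. In an FPGA that is the classic tristate idiom; the sketch below is purely illustrative, with made-up names:]

```verilog
// Open-drain I2C pads: a logical 1 is driven as Z and the external
// pull-up resistor makes the wire high. Driving a hard 1 would fight
// any other device that is legitimately pulling the wire low.
module i2c_open_drain (
    input  wire sda_out,   // bit the core wants on SDA (1 = release)
    input  wire scl_out,   // bit the core wants on SCL (1 = release)
    inout  wire sda_pin,   // bidirectional SDA pad
    inout  wire scl_pin,   // bidirectional SCL pad
    output wire sda_in,    // wire state read back (ACKs, arbitration)
    output wire scl_in     // wire state read back (clock stretching)
);
    assign sda_pin = sda_out ? 1'bz : 1'b0;
    assign scl_pin = scl_out ? 1'bz : 1'b0;
    assign sda_in  = sda_pin;
    assign scl_in  = scl_pin;
endmodule
```

[Because every device follows this rule, the bus is a wired-AND: a wire is high only when nobody is pulling it low, which is exactly what makes acknowledges, multi-master arbitration and clock stretching possible.]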
Ralf Duschef wrote: > There are plenty of issues in 7.1 SP1. yes, there are :-( > Handle with care! good advice - use it, but carefully Austin Lesea wrote: > > > Have you logged this into the hotline as a case? Best way to address > > > new software glitches is to report them. Hopefully, it's not only me reporting 'glitches' (!?!) to Xilinx... Maybe SP2 has 'stabilized'. b.t.w. - heard about 'glitches' only in HW design before !!! JochenArticle: 82557
Anthony Mahar wrote: > Nju Njoroge wrote: > > Anthony Mahar wrote: > > > >>Hello, > >> > >>Is there a way to do performance monitoring on the PPC405 in the > > > > Virtex > > > >>II Pro? I am specifically interested in cache hits. > >> > >>I have wedged my own device between the CPU's instruction and data > > > > PLB > > > >>interfaces and can currently get cache misses. But I need to find a > > > > way > > > >>to determine cache hits of an application running under an operating > >>system. > >> > >>If it was stand alone I could figure that information out by the > > > > number > > > >>of load and store instructions, but this is an operating system with > >>context switches, interrupt handlers, etc. > >> > >>Is there a way to gather this information? There did not seem to be > > > > any > > > >>performance monitoring registers as seen with newer PowerPC and x86 > >>systems. Can the trace port be used to passively monitor execution > > > > for > > > >>load/store instructions? > > > > > > Unfortunately, I have few answers to your questions. However, I know of > > a research group in Georgia Tech that is designing/designed a memory > > access monitor, which sounds similar to yours. You may want to > > correspond with them to exchange notes. I learned of their monitor at > > the HPCA 2005 FPGA workshop. Here is a link to the workshop > > http://cag.csail.mit.edu/warfp2005/. A link to the workshop > > presentations is here at > > http://cag.csail.mit.edu/warfp2005/program.html. Their presentation was > > titled "Evaluating System wide Monitoring Capsule Design using Xilinx > > Virtex II Pro FPGA". Their paper has their contact information. > > > > As for the trace port, I have used it with an IBM/Agilent RISCWatch (RW) > > box, which collects a dynamic trace of the instructions over 8 million > > CPU cycles. The main limitation is that it only works for stand alone > > apps. 
When you have virtual memory enabled (while running Linux for > > instance), RW uses the TLB to conduct the virtual to physical address > > translations. This is great for regular code. However, when an > > interrupt is detected, the CPU converts to using physical addresses for > > the interrupt handler. Unfortunately, RW continues to use the TLB so it > > tries to translate physical addresses, for which no "translations" > > exist, so RW is unable to resolve interrupt handler instructions. > > After this point, the trace is corrupted. In any case, if you are > > interested in learning more about RW, you can refer to this appnote > > http://direct.xilinx.com/bvdocs/appnotes/xapp545.pdf. It has links to > > all manuals for the RW box and its tools. > > > > Lastly, for my own curiosity, how difficult was it to design and debug > > your monitor? The guy I spoke to from Georgia Tech at the workshop said > > they used Chipscope to learn the protocol (along with IBM's PLB spec). > > He claims that this was a painstaking process. > > > > NN > > > > Thank you Nju, > > I am going to dig into those docs right now. > > My design was not intended to be a monitor, but an active bus > transaction modifier. On certain transactions, I have to perform > certain operations on the data going to the PPC405. This means I > selectively pass data through, or perform some higher latency operations. > > Since I am currently interested in cache-miss performance, I only count > the number of transaction requests from L1 cache. Because it is an > individual word that caused the instruction miss, all other words > retrieved in the transaction are, of course, not considered as a miss. > This makes it extremely easy to monitor the number of transaction > requests. > > While the module is an active component between the CPU and PLB, it is > very easy to add a passive monitor once you have a way to have the EDK > inject the monitor in the middle. 
For myself, It required some time to > understand the EDK .mpd format and effectively create a PLB-PLB bridge > (no logic, pure pass through), and there may be better ways with the > "transparent" bus format that I haven't had time to look into. But at > the time it was also my first EDK peripheral. > If I understand correctly, you are saying that your transaction modifier acts as a PLB Bus to PLB Bus bridge. So, in your XPS project, you connected the CPU to a PLB bus, then connected your module to that PLB bus and then connected another PLB bus on the other side of your pcore? I assume you also used Create/Import IPIF Wizard, right. CPU <->PLB Bus -> your pcore <-> PLB BUS <-> Memory (Cache/BRAM) If my understanding is correct, you in essence designed a PLB-PLB bridge, as in the diagram above. In our research, we also designed a PLB to PLB bridge. Our pcore was initially a pass-through in between the two buses, then we placed our real RTL when we got the pass-through working. The guys from Georgia Tech, however, interfaced their monitor module directly with PPC's PLB ports, so they couldn't use EDK's abstraction of the bus protocol through the PLB IPIF module. In fact, they had to synthesize their project in ISE since EDK wouldn't support what they were trying to do. That's why they had to use ChipScope to really see what the processor does. > As for 'learning' the PLB system, I found the IBM CoreConnect Bus > Functional Model (BFM) for the PLB, with the PLB doc, to be instrumental > in observing every kind of transaction I had to handle. I think the BFM > would be far easier than using ChipScope/Docs alone. The BFM allows the > generation of almost any kind of cycle-accurate PLB transaction a master > and slave can use. > > One other model I would like to begin using is the Xilinx provided > PPC405 swift model, which will allow the same code used by the real > processor to run on the simulation swift model simulation. 
This will > cause PLB transactions to occur in the same way they will on the real > system, i.e. cache line fills based on the PPC405 MMU's state, etc. > In designing our pass-through, we used the swift models. I definitely recommend learning how to use them. The swift models allow you to conduct full-system simulations. As for the BFM's, we weren't able to use them for our pcore since EDK 6.3i IPIF Create/Import wizard didn't support the use of Verilog modules (7.1 supports this now). We could have hacked this by using a netlist, but you cannot pass parameters/generics into a netlist, which is a feature we require for our pcore. I have used the BFM's for a VHDL module I worked on in the past and I agree that they too were helpful. NNArticle: 82558
"Marc Randolph" <mrand@my-deja.com> schrieb im Newsbeitrag news:1113444520.108760.133130@l41g2000cwc.googlegroups.com... > > Antti Lukats wrote: > > "Stephane" <stephane@nospam.fr> schrieb im Newsbeitrag > > news:d3jggu$kpo$1@ellebore.extra.cea.fr... > > > Antti Lukats wrote: > > > > "Stephane" <stephane@nospam.fr> schrieb im Newsbeitrag > > > > news:d3j43r$e32$1@ellebore.extra.cea.fr... > > > > > > > I don't agree with you: here are the 32 configuration data bits: > > > > > > PAD209 X27Y127 IOB_X1Y127 F14 1 IO_L1P_D31_LC_1 > > > > those are Local Clock, the SelectMAP is 8 bit wide !!!! > > Actually the OP is correct - that IS supposed to be a 32-bit SelectMAP > interface... the ug075.pdf pinout document discusses it briefly. I > don't blame everone for being confused about it though - Xilinx makes > just enough mention of it that you wonder if it might work, but when I > asked my trusty FAE about it a few months ago, he said it is not > supported at this time. wopla! I did see the paramter of bus width on the ICAP V4, but in ALL DOCs the selectmap is defined as 8 bit, that is on ALL DOCs except the pinouts docs! > Also, _LC pins are Low Capacitance pins (can't do LVDS output). Local > clock pins are called _CC (for Clock Capable). Global clocks are > thankfully _GC. ah I was looking at the list of pins that contained _LC and _CC mixture so I messed the two > > > >>so the minimum reconfiguration time for this part should be a > little bit > > > >>more than 7.4/100/32 = 2.3ms > > 7.4/100/8 = 9.25 ms, plus a little at the beginning and end. I'd > budget at least 10ms, maybe a few more. > > Have fun! > > Marc >Article: 82559
<praveen.kantharajapura@gmail.com> schrieb im Newsbeitrag news:1113452931.790055.285850@f14g2000cwb.googlegroups.com... > > Antti Lukats wrote: > > <praveen.kantharajapura@gmail.com> schrieb im Newsbeitrag > > news:47cf10b7.0504130430.9a34497@posting.google.com... > > > Hi all, > > > > > > This is a basic qustion regarding SDA and SCL pins. > > > Since both these pins are bidirectional these should pins need to > be > > > tristated , so that the slave can acknowledge on SDA. > > > > > > But i have seen in some docs that a '1' need to be converted to a > 'Z' > > > while driving on SDA and SCL, what is the reason behind this???? > > > > > > Thanks in advance, > > > Praveen > > > > well in order to drive '1' (EXTERNAL RESISTIVE PULLUP) you need to Z > the wire > > e.g. tristate it. > > 0 is driven as 0 > > 1 is driven (or released) as Z, ext pullup will pull the wire high > > In order to drive a '1' , i will not tristate it to 'Z' i will drive a > '1' only. > Any issues(Hardware malfunction) if i drive a'1' instead of 'Z' > > > > > Antti YES - see the other poster's reply. The master should drive 0 and Z at least on the SDA pin, and only in the case that there is no multimastering and no clock stretching is it OK to drive 0 and 1 on the SCL anttiArticle: 82560
Kelly Hall <khall@acm.org> wrote in message news:<QA07e.2350$dT4.172@newssvr13.news.prodigy.com>... > Delbert Cecchi wrote: > > > I was referring to the US Electronic Intelligence or something plane > > that got kidnapped out of international airspace near china and forced > > to land. Got the crew back in a while. As I recall we got the airframe > > back in boxes. It was rumored the crew didn't have enough time to > > destroy all. Probably within last 10 or so years. Google should turn > > it up. EC137 may have been the aircraft type. > > A Chinese F-8 and a US EP-3 collided during an intercept; the F-8 was > lost and the EP-3 performed an emergency landing at Hainan airfield. A > fairly standard cock-up between great powers. > > Kelly the theme for this episode of Jag: http://www.tvtome.com/tvtome/servlet/GuidePageServlet/showid-242/epid-99581/ though the ending is a bit different ;) -LasseArticle: 82561
One thing simulation isn't good at is creating random inputs.. for example... I've been working on a telephone port.. and the FPGA simulation is good.. but there are other chips, and they didn't always function as expected. This caused the real FPGA to lock up and do strange, unexpected things. Also, pins (accidentally) weren't locked down by the original designer, so some features were by accident rather than by design. The simulator also won't pick up metastability issues... I had that one bite me too. But a successful simulation is a milestone. I've taken a simulation to a working prototype PCB in less than a week.. Mind you .... I've spent the last 2 weeks fixing up "unexpected" glitches.. not to do with the FPGA.. but due to real world timings when the FPGA interacts with the outside world.. but the board did work exactly as expected. Simon "Ankit Raizada" <ankit.raizada@gmail.com> wrote in message news:1113393997.772874.97950@l41g2000cwc.googlegroups.com... > I am just wondering if i simulate a design given in verilog using a > test fixure in a modern simulator like ModelSim and the outputs are > verified, what are the chances that the design will still not work in > the actual FPGA assuming it fits and Place and Route is successful. > > What are the factors that make this difference and how can i catch them > in the design cycle. > > I am actually creating few designs for DSP algos for my acadmic > project, and being a beginnner in this whole DSP over FPGA I find it > rather difficult to decide wather to call a successful simulation a > milestone in the design cycle or not. > > Please share your experiences and ideas on this >Article: 82562
On Wed, 13 Apr 2005 13:00:07 -0700, Shalin Sheth wrote: Is this some sort of FAQ reply for people who want more speed from a MicroBlaze? I've never used a MicroBlaze or Xilinx (I use Nios II on Altera chips), but it looks like you almost completely missed the OP's point - he is not (yet !) interested in the quality of code generated by the compiler, but is suffering from 24-cycle memory reads on the SDRAM. This is most likely a problem with the SDRAM controller or its setup. Perhaps you are getting a full bank + row + column select for every access, although even then 24 cycles is way too long. I don't know what sort of tools Xilinx has, and how they compare to Altera's SOPC Builder, but when I had trouble with my SDRAM (it took 2 cycles per access instead of 1, during bursts), I tested with a simple setup of a Nios II running from internal FPGA memory, a DMA component (to easily generate burst sequences), and the SDRAM controller. Using the debugger, I manually set the DMA to burst read or write and used SignalTap (ChipScope on Xilinx?) to view what was happening. That way you are simplifying things as much as possible to concentrate on the specific problem. > Vladmir, > > Interesting data point. How much did his performance increase after > enabling caches? > > First, check to make sure that you have compiler optimization enabled. > This does make a hugh difference in optimizing your software code (2-3x > in some instances). I would suggest using the latest EDK 7.1 GNU > compiler here. > > Second, in EDK 7.1 a new MCH_OPB_SDRAM memory controller was released > that connects to the Xilinx CacheLink interface of MicroBlaze v4.0. > This also greatly improves performance when using caches. > > Finally, you may want to use tools like xil_profile to see where the > processor is spending a lot of its time. You may be able to improve the > performance by enabling hardware features such as multiplier, divider or > barrel shifter. 
> > Cheers, > Shalin- > > v_mirgorodsky@yahoo.com wrote: >> Hi, ALL! >> >> Recently one of my friends faced very strange problem. He had the >> MicroBlaze CPU in his design running with 50MHz clock speed. He also >> had external SDRAM module and his application was executing out of >> external SDRAM memory. During first few benchmark tests he realized >> that it takes about 24 clock cycles to access memory :( This means >> that cool embedded 50MHz MicroBlaze CPU runs slower than poor external >> 8MHz AVR. After my advice he enabled the cache within MicroBlaze, but >> application execution speed did not increased significantly. >> >> As he described later, this was one of hand-on samples from EDK. May be >> the sample is not optimized for performance and very simplified, but >> net performance of 2MHz processor is not even close to advertised by >> Xilinx :( >> >> Could any one give any comment on that? >> >> Regards, >> Vladimir S. Mirgorodsky >>Article: 82563
I'm quite new to FPGA/Verilog and I'm not sure if this is the correct news group to use for this kind of posting - apologies if I've posted to the wrong place. Anyway, I'm having problems using memory within an FSM. I'm currently using Xilinx ISE for a VirtexII. I'm trying to use the RAMB16_S18 memory primitive (SelectRAM). I've written a short test which writes the sequence 0, 1, 2, 3 .... 14, 15 to the RAM in one state, then in another it reads it back. However, I read back the memory as 15, 0, 1, 2 .... 13, 14. I'm assuming that the 15 in the first read-back element is from the previous cycle and hence the whole lot is offset due to a clocking issue. I've truncated the code and copied below:

// Buffer clock for the ram
wire Raw_Data_Profile_CLK;
BUFG Raw_Data_Profile_CLK_Buffer(.I(Main_Clock), .O(Raw_Data_Profile_CLK));

// Setup the RAM
reg  [9:0]  Data_Address;
reg  [15:0] Data_In;
reg  [2:0]  Data_In_Parity;
wire [15:0] Data_Out;
wire [2:0]  Data_Out_Parity;
reg         WE;

RAMB16_S18 Data_RAM (
    .DI(   Data_In),
    .DIP(  Data_In_Parity),
    .ADDR( Data_Address),
    .EN(   1'b1),
    .WE(   WE),
    .SSR(  1'b0),
    .CLK(  Main_Clock),
    .DO(   Data_Out),
    .DOP(  Data_Out_Parity)
);

// Update the Next State for the finite state machine
reg [1:0] Current_State, Next_State;
always @(posedge Main_Clock) begin
    Current_State <= Next_State;
end

// Implement the Finite State Machine
reg [4:0] Test_Counter;
always @(posedge Main_Clock) begin
    case (Current_State)
        Test_State_1: begin
            WE <= 1'b1;                       // Enable writes to the memory
            Data_Address <= Test_Counter;     // Select the address (ie 0, 1, 2, 3 etc)
            Data_In <= Test_Counter;          // Fill the mem location with 0, 1, 2, 3 etc
            Test_Counter <= Test_Counter + 1; // Increment the counter
            if (Test_Counter >= 15) begin
                Next_State <= Test_State_2;   // Goto next state
                Test_Counter <= 0;
            end
            else
                Next_State <= Test_State_1;   // Stay in this state
        end
        Test_State_2: begin
            WE <= 1'b0;                       // Enable reads from the memory
            Data_Address <= Test_Counter;     // Select the address (ie 0, 1, 2 etc)
            Output <= Data_Out;               // Retrieve the data from the memory
            Test_Counter <= Test_Counter + 1; // Increment the counter
            if (Test_Counter >= 15) begin
                Next_State <= Test_State_1;   // Goto next state
                Test_Counter <= 0;
            end
            else
                Next_State <= Test_State_2;   // Stay in this state
        end
    endcase
Article: 82564
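[The off-by-one read-back described above is the expected behaviour of block RAM: DO shows the data for the address that was presented on the previous clock edge, not the current one. One hedged way to line things up (a sketch with made-up names, not a drop-in fix for the post's code) is to delay the "read in flight" flag one cycle and capture DO only when it is actually valid:]

```verilog
// Sketch: compensate for the one-cycle synchronous read latency of
// block RAM by pipelining the read-enable alongside the RAM's own
// address register, then capturing DO a cycle after the read is issued.
module bram_read_align (
    input  wire        clk,
    input  wire        rd_en,     // read address presented this cycle
    input  wire [15:0] ram_do,    // DO port of the block RAM
    output reg  [15:0] data,
    output reg         data_valid
);
    reg rd_pending;
    always @(posedge clk) begin
        rd_pending <= rd_en;       // DO becomes valid after this edge
        data_valid <= rd_pending;
        if (rd_pending)
            data <= ram_do;        // word for *last* cycle's address
    end
endmodule
```

[In the state machine above, the equivalent fix is either to present the read address one state (cycle) earlier, or to associate Data_Out with a one-cycle-delayed copy of Test_Counter.]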
Hi all, can you please let me know the different tools used in industry for ASIC synthesis? Regards, kishoreArticle: 82565
Dear All,

I am trying to establish if I can fit the following functionality into a single FPGA.

I have FPGA resource utilisation statistics (from the ISE tool MAP process) for four functional blocks. I also have the statistics on the number of available resources on the XC2VP30 FPGA.

My question is: can I use the individual MAP reports to accurately estimate if my four separate functional blocks will fit in the XC2VP30 FPGA?

              XC2VP30   Block A   Block B   Block C   Block D    Total
Slices         13,696     4,248     2,771       848     5,370   13,237
Flip-Flops     27,392     5,056     3,406        95     4,888   13,445
4-Input LUTs   27,392     5,885     3,036     1,581     8,281   18,782

Since each of the four blocks was 'compiled' (MAP reports generated in the ISE tool) individually, no slice contains un-related logic.

The other FPGA resources (DCMs, GCLKs, PPCs etc ...) are all under-utilised.

My understanding is that since the 'Total' number of slices used is less than the number of slices available on the XC2VP30, no slice will have to contain un-related logic, so my four blocks (Block A - D) will fit inside my XC2VP30 FPGA.

Is this correct, or have I made some critical assumptions regarding combining functionality within the FPGA and regarding timing aspects?

Regards

SimonArticle: 82566
Hi Simon, if device total is 13,696 and A+B+C+D total is 13,237 then I am 99% positive that combining A+B+C+D in a single design ABCD will cause problems. Unless of course large parts of A, B, C or D are optimized away when combined. Hm, I take the 99% back, let's say I am 80% sure you will have _some_ sort of problems (possibly not related to # of slice resources used). The ABCD slice usage can be lower (that's why I reduced the 99% sure to 80% sure) than the A+B+C+D, as the slice utilization ratio may be better, but here it depends how well the design fits and how good the tools really are. The 'unrelated logic' is not what you think it is, I think. Unrelated means that the tools generated additional logic that was not in the original design, in order to achieve performance or routing or any other reason. So it's not directly bound to the slice utilization. I am sometimes wrong (usually not). Anyway, cases where I am wrong or my guess is totally wrong interest me, so please post some results of what happened with ABCD in a single design! antti http://gforge.openchip.org whuuuuuuups! I checked your domain name ;) well my advice (based on your numbers) is that you should take a larger FPGA (if A, B, C, D cannot be optimized to use at least 10% less resources) just to be prepared for an in-field design change that may cause the design to not fit any more. "stockton" <simon.stockton@baesystems.com> schrieb im Newsbeitrag news:dbcd481c.0504140211.42f75283@posting.google.com... > Dear All, > > I am trying to establish if I can fit the following functionality into > a single FPGA. > > I have FPGA resource utilisation statistics (from the ISE tool MAP > process) for four functional blocks. I also have the statistics on the > number of available resources on the XC2VP30 FPGA. > > My question is can I use the individual MAP reports to accuratly > estimate if my four seperate functional blocks will fit in the XC2VP30 > FPGA? 
> > XC2VP30 Block A Block B Block C Block D Total > > Slices 13,696 4,248 2,771 848 5,370 13,237 > Flip-Flops 27,392 5,056 3,406 95 4,888 13,445 > 4-Input LUTs 27,392 5,885 3,036 1,581 8,281 18,782 > > Since each of the four blocks were 'compiled' (MAP reports generated > in the ISE tool) individually no slice contains un-related logic. > > The other FPGA resources (DCMs, GCLKs, PPCs etc ... ) are all under > utilised. > > My understanding is that since the 'Total' number of slices used is > less than the number of slices available on the 'XC2VP30' no slice > will have to contain un-related logic so my four blocks (Block A - D) > will fit inside my XC2VP30 FPGA. > > Is this correct or have I made some critical assumptions regarding > combining functionality within the FPGA and regarding timing aspects? > > Regards > > SimonArticle: 82567
> simulator such as ModelSim SE, ModelSim PE, Synopsys VCS, or Cadence
> NC-Sim.
>

For ModelSim PE you need to buy the SWIFT models separately; with SE you don't need to.
Article: 82568
Hi,

The SDRAM controller doesn't need 24 clock cycles for a single access. It's more like 12 clock cycles. But it seems that both the instruction and data interfaces on MicroBlaze are connected to the same memory controller and that no internal memory is used. So for a load instruction to execute, it will require two 12-clock-cycle accesses. A store is done a few cycles faster, and an instruction that doesn't access memory should take 12 clock cycles.

Using LMB will reduce instruction fetches to 1 clock cycle and data accesses to 2 clock cycles. That is the same latency as for cache hits.

It seems unusual that the usage of caches doesn't improve the performance. To my knowledge it is always a big improvement compared to running from external memories, especially SDRAM or DDR. Fast SRAM will have much less latency.

In order to get cacheline burst access, the MCH_OPB_SDRAM controller should be used. It will do burst-line reads and writes both for instruction and data cache misses.

Göran Bilski

David wrote:
> On Wed, 13 Apr 2005 13:00:07 -0700, Shalin Sheth wrote:
>
> Is this some sort of FAQ reply for people who want more speed from a
> MicroBlaze? I've never used a MicroBlaze or Xilinx (I use Nios II on
> Altera chips), but it looks like you almost completely missed the OP's
> point - he is not (yet!) interested in the quality of code generated by
> the compiler, but is suffering from 24-cycle memory reads on the SDRAM.
> This is most likely a problem with the SDRAM controller or its setup.
> Perhaps you are getting a full bank + row + column select for every
> access, although even then 24 cycles is way too long. I don't know what
> sort of tools Xilinx has, and how they compare to Altera's SOPC Builder,
> but when I had trouble with my SDRAM (it took 2 cycles per access instead
> of 1, during bursts), I tested with a simple setup of a Nios II running
> from internal FPGA memory, a DMA component (to easily generate burst
> sequences), and the SDRAM controller.
> Using the debugger, I manually set the DMA to burst read or write and
> used SignalTap (ChipScope on Xilinx?) to view what was happening. That
> way you are simplifying things as much as possible to concentrate on
> the specific problem.
>
>> Vladimir,
>>
>> Interesting data point. How much did his performance increase after
>> enabling caches?
>>
>> First, check to make sure that you have compiler optimization enabled.
>> This does make a huge difference in optimizing your software code (2-3x
>> in some instances). I would suggest using the latest EDK 7.1 GNU
>> compiler here.
>>
>> Second, in EDK 7.1 a new MCH_OPB_SDRAM memory controller was released
>> that connects to the Xilinx CacheLink interface of MicroBlaze v4.0.
>> This also greatly improves performance when using caches.
>>
>> Finally, you may want to use tools like xil_profile to see where the
>> processor is spending a lot of its time. You may be able to improve the
>> performance by enabling hardware features such as multiplier, divider or
>> barrel shifter.
>>
>> Cheers,
>> Shalin-
>>
>> v_mirgorodsky@yahoo.com wrote:
>>> Hi, ALL!
>>>
>>> Recently one of my friends faced a very strange problem. He had a
>>> MicroBlaze CPU in his design running with a 50MHz clock. He also
>>> had an external SDRAM module and his application was executing out of
>>> external SDRAM memory. During the first few benchmark tests he realized
>>> that it takes about 24 clock cycles to access memory :( This means
>>> that the cool embedded 50MHz MicroBlaze CPU runs slower than a poor
>>> external 8MHz AVR. On my advice he enabled the cache within MicroBlaze,
>>> but application execution speed did not increase significantly.
>>>
>>> As he described later, this was one of the hands-on samples from EDK.
>>> Maybe the sample is not optimized for performance and very simplified,
>>> but the net performance of a 2MHz processor is not even close to that
>>> advertised by Xilinx :(
>>>
>>> Could any one give any comment on that?
>>>
>>> Regards,
>>> Vladimir S.
>>> Mirgorodsky

Article: 82569
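Göran's cycle counts can be turned into a rough back-of-the-envelope model. This is only a sketch: the 12-cycle SDRAM figure and the 1/2-cycle LMB figures are the numbers quoted in the thread, while the 30% load/store mix is an assumed, purely illustrative value.

```python
# Crude cycles-per-instruction estimate from the latencies quoted in the
# thread. LOAD_STORE_FRACTION is a hypothetical instruction mix, not a
# measured figure.
SDRAM_ACCESS = 12            # cycles per uncached SDRAM access (from the post)
LMB_FETCH = 1                # LMB instruction-fetch latency (from the post)
LMB_DATA = 2                 # LMB data-access latency, same as a cache hit
LOAD_STORE_FRACTION = 0.30   # assumed fraction of loads/stores

def avg_cycles_per_instr(fetch_cycles, data_cycles):
    # Every instruction pays one fetch; loads/stores also pay a data access.
    return fetch_cycles + LOAD_STORE_FRACTION * data_cycles

sdram_cpi = avg_cycles_per_instr(SDRAM_ACCESS, SDRAM_ACCESS)
lmb_cpi = avg_cycles_per_instr(LMB_FETCH, LMB_DATA)
print(f"uncached SDRAM: ~{sdram_cpi:.1f} cycles/instruction")
print(f"LMB / cache hits: ~{lmb_cpi:.1f} cycles/instruction")
print(f"estimated speedup: ~{sdram_cpi / lmb_cpi:.1f}x")
```

Under these assumptions the uncached-SDRAM case costs nearly ten times as many cycles per instruction, which is consistent with Vladimir's observation that a 50MHz MicroBlaze running uncached from SDRAM can feel slower than a small 8MHz microcontroller.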
lecroy7200@chek.com wrote:
>>>> I'm sure others have this problem ...
>>>>
>>>> Is there a tool that'll let one view and hopefully print a schematic
>>>> done in the old Xilinx F2.1i schematic tool? The new stuff doesn't
>>>> want to know about the old stuff, and worse is that you can't even
>>>> install 2.1i on an XP machine. (Yeah, that'll teach me to upgrade.)
>>>> I don't want to do anything with this schematic other than view it.
>>>> I'm doing a new board sorta based on an old design, and the new design
>>>> will of course be in VHDL rather than as a schematic.
>>>
>>> Aldec's tool Active-HDL has the capability of importing Foundation
>>> schematics and entire projects. The import utility not only allows
>>> printing, but also importing these files into their format, maintaining
>>> them and even converting them into an HDL design that can be targeted
>>> for any family/device. They show this capability on their website:
>>> http://downloads.aldec.com/Previews/Presentations/IP_Core.html
>>
>> Oops sorry, wrong link:
>> http://downloads.aldec.com/Previews/Presentations/Active-XE_Edition.html
>
> I have this same problem. The new Foundation can import back to
> version 4. Because of a lawsuit between Aldec and Xilinx, they cannot
> ship older versions of the Foundation tools. The newest Aldec tools
> can't seem to import the 2.1 project. However, I was able to import
> version 2.1 into version 3.1 and then read the whole project with
> the new Aldec tools. What a pain.

They have a utility in the Tools menu that helps convert the old pre-2.5 schematics to a format that can be imported. As I remember, it also allows editing the schematics in .sch format before import.
Article: 82570
Reinier wrote:
> Hi,
>
> I'm looking for a freeware or low-cost program to document and
> illustrate the signal processing flow in my FPGA design. I'd like to
> use building blocks like adders, multipliers, memory, busses etc. What
> do you guys use to make some nice looking pictures? I don't want to
> spend days learning Corel Draw or something huge like that.
>
> Thanks,
> Reinier

Never used it, but I have heard of Dia: http://www.gnome.org/projects/dia/ It is claimed to be a Visio replacement. It's multi-platform and free. Give it a try.

EG
Article: 82571
Thanks for your replies. I was never considering one regulator per RocketIO, but having never built anything that uses RocketIO before, I'm keen to know what is considered best practice. I was just asking for opinions on the need (or otherwise) for a RocketIO regulator and a second "everything else on 2.5V" regulator. Looking at the recent replies, there seems to be some confusion. However, as the UG seems to imply separate 2.5V regulators, maybe that's the way I should play it.

Thanks,
Roger

"jason.stubbs" <jason.stubbs@gmail.com> wrote in message news:1113425021.561172.85090@z14g2000cwz.googlegroups.com...

Extract from "The RocketIO Transceiver User Guide UG024 (v2.5) December 9, 2004":

"PCB Design Requirements (Page 109)
To operate properly, the RocketIO transceiver requires a certain level of noise isolation from surrounding noise sources. For this reason, it is required that both dedicated voltage regulators and passive high-frequency filtering be used to power the RocketIO circuitry."

If you don't use the RIOs you still have to supply power, but you can use the VCCAUX supply in this case.

Hope this helps clarify the situation

Jason
Article: 82572
Antti Lukats wrote:
> "Marc Randolph" <mrand@my-deja.com> schrieb im Newsbeitrag
> news:1113444520.108760.133130@l41g2000cwc.googlegroups.com...
>
>> Antti Lukats wrote:
>>
>>> "Stephane" <stephane@nospam.fr> schrieb im Newsbeitrag
>>> news:d3jggu$kpo$1@ellebore.extra.cea.fr...
>>>
>>>> Antti Lukats wrote:
>>>>
>>>>> "Stephane" <stephane@nospam.fr> schrieb im Newsbeitrag
>>>>> news:d3j43r$e32$1@ellebore.extra.cea.fr...
>>>>
>>>> I don't agree with you: here are the 32 configuration data bits:
>>>>
>>>> PAD209 X27Y127 IOB_X1Y127 F14 1 IO_L1P_D31_LC_1
>>>
>>> those are Local Clock, the SelectMAP is 8 bits wide!!!!
>>
>> Actually the OP is correct - that IS supposed to be a 32-bit SelectMAP
>> interface... the ug075.pdf pinout document discusses it briefly. I
>> don't blame everyone for being confused about it though - Xilinx makes
>> just enough mention of it that you wonder if it might work, but when I
>> asked my trusty FAE about it a few months ago, he said it is not
>> supported at this time.
>
> wopla! I did see the parameter for bus width on the ICAP V4, but in ALL
> docs the SelectMAP is defined as 8 bits - that is, in ALL docs except
> the pinout docs!

Thank you guys for your feedback! I take it that the 32-bit SelectMAP is reserved for future use, for when 7.1i is stable and Xilinx engineers more available... OK, but how can the internal configuration logic detect what kind of bitstream is incoming? As soon as it sees the sync words? In that case, one cannot place just any garbage on D[31..8], as it might be badly interpreted!
Actually, I was puzzled by this recent Xilinx answer:

7.1i ECS - Bus width of pins I and O is incorrect in symbol ICAP_VIRTEX4

Family: Software
Product Line: FPGA Implementation
Part: ECS
Version:
Record Number: 20920
Last Modified: 03/23/05 08:27:54
Status: Active

Problem Description:
Keywords: input, output, icap, 32, 8
Urgency: Standard

General Description: In the Xilinx Schematic Editor, the ICAP_VIRTEX4 symbol has an I and O pin with a bus width of 8. The width should be 32.

Solution 1: This problem has been fixed in the latest 7.1i Service Pack available at: http://support.xilinx.com/xlnx/xil_sw_updates_home.jsp
The first service pack containing the fix is 7.1i Service Pack 1.
Article: 82573
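On the sync-word question raised above: the configuration logic ignores everything presented on the SelectMAP port until it has seen the 32-bit synchronisation word (0xAA995566 on Virtex-family devices), and only then starts interpreting configuration packets. A toy software sketch of that search follows; it is illustrative only - this is not how the silicon implements it, and it says nothing about how the width of the port would be detected.

```python
# Virtex-family bitstream synchronisation word, most significant byte first.
SYNC_WORD = bytes([0xAA, 0x99, 0x55, 0x66])

def find_sync(stream: bytes) -> int:
    """Return the byte offset just past the sync word, or -1 if absent."""
    idx = stream.find(SYNC_WORD)
    return -1 if idx < 0 else idx + len(SYNC_WORD)

# Padding before the sync word is skipped, as on the real configuration port.
bitstream = bytes([0xFF] * 4) + SYNC_WORD + bytes([0x30, 0x00, 0x80, 0x01])
print(find_sync(bitstream))  # prints 8: packets start after the sync word
```

This is why arbitrary data before synchronisation is harmless, but once the sync word has been seen, every subsequent word is interpreted as part of a packet, which is exactly Stéphane's concern about garbage on D[31..8].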
As they state in their paper http://cag.csail.mit.edu/warfp2005/submissions/29-suh.pdf "In our initial study, we deploy a monitoring capsule in Dcaches to monitor the memory behavior between a CPU and L1 Dcache."

It is not possible to monitor signals between the CPU and L1 cache (I or D). Was the monitoring of CPU/L1 inferred from the cache misses seen coming from L1? Even so, a lot of memory behavior is missed when only observing cache misses.

Regards,
Tony

Nju Njoroge wrote:
> Anthony Mahar wrote:
>
>> Nju Njoroge wrote:
>>
>> Interesting question for the "Monitoring Capsule Design" paper... they
>> state they monitor behavior "between the CPU and L1 Dcache." Did they
>> explain how they were able to do this, since the PPC405 and L1 are part
>> of the same hard core?
>
> You are right--the CPU and the L1 cache are in the same hard core, so
> we don't have access to the interface between the CPU core and the
> cache. As I described in my previous post, they placed their monitor at
> the interface of the L1 cache port that is usually connected to the
> PLB. Thus, instead of connecting their CPU to the PLB bus, they
> connected the PPC core to their monitor, which is then connected to the
> PLB.
>
> NN
Article: 82574
Roger,

The way I understood it, and therefore implemented it, was to use a single linear regulator (LT1963) to power all of the RIO circuitry on the FPGA. If the LR is capable of supplying more than one FPGA's RIO circuitry, then that is acceptable. As long as all of the RIO supply pins are individually filtered with ferrite beads (and caps where they are not embedded), this should work. Under no circumstances should a switching regulator be used to power the RIO. Also, do not use the same LR that powers the RIO to power the internal logic or IO of the FPGA. As you said in your earlier post: a linear regulator for RIO, and a separate regulator for everything else.

Regards
Jason