
Messages from 142550

Article: 142550
Subject: Re: Soft Processor IP core report
From: Herbert Kleebauer <klee@unibwm.de>
Date: Sun, 16 Aug 2009 22:58:24 +0200
Nico Coesel wrote:
> Herbert Kleebauer <klee@unibwm.de> wrote:

> >system and the system tools (like dir, copy, format, tasklist,
> >kill, ...) fits in 64 kbyte ROM. But it all was written in assembly
> >and not in an HLL.
> 
> That would take ages. 

It took two students a few months to design and implement
the complete hardware (graphics, serial, parallel and keyboard
cards) at gate level with schematic entry (no VHDL) and to write
the complete multitasking OS.

> Last week I was investigating whether I could
> take some load from a CPU into an FPGA. Of course I didn't want to
> rewrite some complicated algorithm with assembly language.

That depends on the instruction set. It's really a joy to program
a 68k in assembly, but I would go crazy if I had to do this with
the ARM instruction set. And the XPROZ has a genuinely
assembly-programmer-friendly instruction set.

 
> Besides, a modern C compiler produces faster or more compact code
> (depending on the optimisation settings) than an assembly programmer.

That may be true for complex CPU architectures and average programmers
but surely not for extremely simple FPGA-CPU designs.

Article: 142551
Subject: Re: Soft Processor IP core report
From: nico@puntnl.niks (Nico Coesel)
Date: Sun, 16 Aug 2009 21:49:06 GMT
Herbert Kleebauer <klee@unibwm.de> wrote:

>Nico Coesel wrote:
>> Herbert Kleebauer <klee@unibwm.de> wrote:
>
>> >system and the system tools (like dir, copy, format, tasklist,
>> >kill, ...) fits in 64 kbyte ROM. But it all was written in assembly
>> >and not in an HLL.
>> 
>> That would take ages. 
>
>It took two students a few months to design and implement
>the complete hardware (graphics, serial, parallel and keyboard
>cards) at gate level with schematic entry (no VHDL) and to write
>the complete multitasking OS.
> 
>> Besides, a modern C compiler produces faster or more compact code
>> (depending on the optimisation settings) than an assembly programmer.
>
>That may be true for complex CPU architectures and average programmers
>but surely not for extremely simple FPGA-CPU designs.

That also depends on the compiler. An 8051 is a relatively simple
architecture, so the compiler must work very hard to get compact code.
There are several commercial 8051 compilers that do an excellent job
and produce code that is hard to improve on by hand.

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
                     "If it doesn't fit, use a bigger hammer!"
--------------------------------------------------------------

Article: 142552
Subject: ANNC: Parallel flash programming using boundary-scan
From: zgora <skswrus@gmail.com>
Date: Sun, 16 Aug 2009 15:11:12 -0700 (PDT)
Hi,

For those interested in programming parallel NOR flash memories
via JTAG/boundary-scan, please have a look at our TopJTAG Flash
Programmer software.

http://www.topjtag.com/flash-programmer/

The benefit of using boundary-scan to program flash memories is that
there is no dependency on which device is connected to the flash
memory. It can be any JTAG (IEEE 1149.1) compliant chip, e.g. most
CPLDs, FPGAs, microcontrollers, CPUs, etc. There is no dependency on
the logic inside the chip.

In some cases, for example, when flash is connected to a CPLD, using
boundary-scan is the easiest (or the only) option to program the
flash.

The disadvantage of the boundary-scan method is that it's quite slow in
comparison to target-assisted programming. However, again, it's
sometimes the easiest or the only option.

Cheers,
Sergey Katsyuba
http://www.topjtag.com

Article: 142553
Subject: Re: Soft Processor IP core report
From: "Antti.Lukats@googlemail.com" <antti.lukats@googlemail.com>
Date: Sun, 16 Aug 2009 22:57:27 -0700 (PDT)
On Aug 16, 11:58 pm, Herbert Kleebauer <k...@unibwm.de> wrote:
> "Antti.Luk...@googlemail.com" wrote:
> > well, if you only have 1 Block RAM (1 KByte or even just 512 bytes),
> > how much XProz3 code would fit? If there is a lot of memory, then
> > there are usually also enough resources to include a somewhat bigger
> > processor than the XProz(3)
>
> You always have to compare the total cost. RAM, in contrast to
> logic and routing resources, is a very compact structure. How many
> flip-flops and gates can you add to an FPGA-CPU design before it
> consumes more chip area than an additional RAM block? Define a concrete
> application, then we can compare the logic count for the CPU
> and the necessary RAM size.
>
> > And, yes it makes sense to use HLL, as my AVR HLL compiler is able to
> > produce
> > code more dense than AVR assembler (it really is!).
> > ..
>
> Funny, and I thought you are a serious person. Can you show me
> the code?
>
> But now we can prove that any program for the AVR can be
> reduced to size zero:
>
> 1. write the program in assembly
> 2. rewrite it in C and it will be at least one byte smaller
> 3. disassemble the binary to get an assembly program of the
>    same size as the C binary
> 4. goto 2.

Dear Herbert,

I am a very serious person.

I understand that you do not have as deep a knowledge of this as I do,
so I will explain why my AVR compiler was/is able to produce better
code than an assembler.

An assembler normally uses 1 or 2 passes.
My compiler uses many passes for branch optimization; the number of
passes used is not fixed.

So if you write an AVR program in assembler, the assembler that
converts the code is not able to figure out the shortest branch types
for all jumps/branches - this just isn't possible within 2 passes.
My compiler uses what it thinks the branch types should be, compiles
the binary, then checks if there are places where jumps can be
optimized, does the optimizations, checks the code for branches that
are out of range, fixes and optimizes, iterating up to 20 passes or
until no more optimizations are possible and no fixes are needed.

Of course, taking the output of my compiler, converting it to
assembler, and assembling THAT code would yield the same amount of
binary code, yes.

But a program just written in assembly, and just assembled, may be
larger than one produced with my compiler (if the original code has
branches that can be optimized).

See, there is an explanation for the "smaller" code. Very technical.
Not funny.
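
The iterative relaxation described above can be sketched in a few
lines. This is a hypothetical model, not the actual compiler: it
assumes one-word instructions, a short branch reaching 64 words, and a
two-word long form for anything further.

```python
# Hypothetical model of iterative branch relaxation: start with every
# branch short, then repeatedly widen any branch whose target is out of
# range, until a fixed point is reached.

SHORT_RANGE = 64   # assumed reach of a short branch, in words
SHORT_SIZE = 1     # words occupied by a short branch
LONG_SIZE = 2      # words occupied by a long branch

def relax(program):
    """program is a list of ('op',) plain instructions and
    ('br', target_index) branches.  Returns {branch_index: size_in_words}
    after relaxation."""
    sizes = {i: SHORT_SIZE for i, ins in enumerate(program) if ins[0] == 'br'}
    while True:
        # Compute each instruction's address under the current size guesses.
        addr, pc = [], 0
        for i, ins in enumerate(program):
            addr.append(pc)
            pc += sizes.get(i, 1)
        changed = False
        for i, ins in enumerate(program):
            if ins[0] == 'br' and sizes[i] == SHORT_SIZE:
                if abs(addr[ins[1]] - addr[i]) > SHORT_RANGE:
                    sizes[i] = LONG_SIZE   # widen and iterate again
                    changed = True
        if not changed:
            return sizes
```

Widening one branch can push another branch's target out of range,
which is exactly why the pass count cannot be fixed in advance.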

**
You would be crazy to code in ARM assembly?
Why? I once optimized a SPI flash bootloader for ARM; the resulting
code occupied 92 bytes.
OK, I was cheating a little: of course I used THUMB mode; in ARM mode
the code would have been larger.

Also, those who are not crazy DO write assembler for ARM,
and use an HLL for processors with 512 instructions of code space.

Both uses make sense.

Antti

Article: 142554
Subject: Re: Soft Processor IP core report
From: Frank Buss <fb@frank-buss.de>
Date: Mon, 17 Aug 2009 08:25:27 +0200
Antti.Lukats@googlemail.com wrote:

> So if you write an AVR program in assembler, the assembler that
> converts the code is not able to figure out the shortest branch types
> for all jumps/branches - this just isn't possible within 2 passes.
> My compiler uses what it thinks the branch types should be, compiles
> the binary, then checks if there are places where jumps can be
> optimized, does the optimizations, checks the code for branches that
> are out of range, fixes and optimizes, iterating up to 20 passes or
> until no more optimizations are possible and no fixes are needed.

How many percent do you save with this concept? And I guess good
assemblers do this, too. BTW, here is an interesting article on this topic:

http://coding.derkeiler.com/Archive/Assembler/alt.lang.asm/2006-11/msg00216.html

It is interesting to see that neither of the two common algorithms,
"start with all short branches and lengthen the ones that need to be
longer" and "start with all long branches and shorten all branches that
can be short", is perfect, even with 20 passes. The article says that it
is an NP-complete problem, so for an optimal solution you need 2^n
passes, where n is the number of branches. Does anyone have a link to
the mathematical proof?

But who cares? Except for pathological cases, I think the approach
starting with short branches would lead to nearly perfect code, maybe 0.1%
longer than perfect :-) But of course, you'll need many passes, not just 2.

-- 
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

Article: 142555
Subject: Re: BCD in FPGA
From: backhus <goouse@twinmail.de>
Date: Sun, 16 Aug 2009 23:28:11 -0700 (PDT)
Hi Andrew,
First of all, the second approach doesn't handle BCD at all, just
simple binary.
Your first module is a single BCD adder that can be used for wider
numbers by cascading the adders.
The EDN paper describes an approach to add wider BCD numbers in a
single module, using two adders.
One simply adds up the numbers in binary style; the other one adds 6
to the result, as you did in your module with the line: assign {cout,
s} = unadj<10? unadj : unadj+6;
Then follows a special network of multiplexers to choose the right
result from either the first or the second adder.
The advantage is that the adders can work straight over all bits, and
this lets them use the carry chains. Your first module can't do that
because of the BCD correction.
Your module Test doesn't have any correction circuit, and misses the
second adder (for 8 bits: +66 decimal).

A little hint (incomplete):

 module Test2(
   input  [7:0] a,
   input  [7:0] b,
   output [7:0] s
 );
   wire [7:0] yb;
   wire [7:0] yc;
   assign yb = a + b;
   assign yc = yb + 8'b01100110;
//
//  add multiplexer structure here to generate output s from yb and yc
//
 endmodule

Have a nice synthesis
  Eilert
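
The two-adder scheme can be cross-checked against a small software
model. A sketch, assuming packed-BCD operands and a digit-serial
correction (the EDN circuit applies the correction in parallel and
picks results with the mux network):

```python
# Software model of packed-BCD addition: add each digit in binary, and
# add 6 to any digit whose binary sum reached 10 or more, so the digit
# skips the invalid codes 10..15 and the carry moves to the next digit.

def bcd_add(a, b, digits=2):
    """a, b: packed-BCD integers (e.g. 0x42 encodes decimal 42).
    Returns (carry_out, packed-BCD sum) over the given number of digits."""
    result, carry = 0, 0
    for d in range(digits):
        s = ((a >> 4 * d) & 0xF) + ((b >> 4 * d) & 0xF) + carry
        if s >= 10:      # digit overflowed the decimal range
            s += 6       # BCD correction
        carry = s >> 4
        result |= (s & 0xF) << (4 * d)
    return carry, result
```

For example, bcd_add(0x42, 0x39) gives (0, 0x81), i.e. 42 + 39 = 81.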

Article: 142556
Subject: Re: Xilinx 3E design programs fine with 500E but fails with 250E
From: Steve <srkh28@gmail.com>
Date: Sun, 16 Aug 2009 23:32:06 -0700 (PDT)
Hi All,

We have solved this problem to a functional degree. Thanks to all for
your help. Across 56 production boards, we were able to accurately
figure out our device shortfalls and produce a workaround. Some facts
we now know about the Xilinx 250E and 500E FPGAs:

1. The problem was not associated with purchasing the device from a
non-authorised vendor, or with device stepping.

2. The Xilinx Spartan 500E works as advertised.

3. The Xilinx Spartan 250E does not work as advertised (the supplied
SPI flash programming core with ISE 10 and later is seriously flawed).

4. Using the supplied SPI core with 250Es results in corrupt code
being written to flash, causing excess (950 mA+) current to be consumed
by VCCint (1V2) and the device failing to execute downloads. We don't
know what is consuming the excess current, but speculate the cause to
be high-speed boot retries of the corrupt code. As we have no
information regarding this core, we can only speculate on what our CRO
and analyser tell us.

5. The behaviour is 100% reliably repeatable.

6. All 250E boards perform correctly and pass function and performance
tests when used with tools other than Xilinx iMPACT and their SPI
programming core.

For those stuck with the 250E and a serial SPI flash, the only way to
program the attached SPI flash is to use PicoBlaze, or to program the
SPI device independently.

steve

Article: 142557
Subject: Re: Soft Processor IP core report
From: David Brown <david@westcontrol.removethisbit.com>
Date: Mon, 17 Aug 2009 08:42:53 +0200
Herbert Kleebauer wrote:
> "Antti.Lukats@googlemail.com" wrote:

>> And, yes it makes sense to use HLL, as my AVR HLL compiler is able to
>> produce
>> code more dense than AVR assembler (it really is!).
>> ..
> 
> Funny, and I thought you are a serious person. Can you show me
> the code?
> 
> But now we can prove that any program for the AVR can be
> reduced to size zero:
> 
> 1. write the program in assembly
> 2. rewrite it in C and it will be at least one byte smaller
> 3. disassemble the binary to get an assembly program of the
>    same size as the C binary
> 4. goto 2.

I have no experience of Antti's compiler (I don't even know which HLL it 
uses), but I have lots of experience with assembly and C, on different 
architectures and with different compilers, and lots of experience with 
the "C vs. assembly" debates (they pop up regularly on comp.arch.embedded).

Clearly no compiler can ever produce better code than /could/ be written 
in assembler - your little reductio ad absurdum proves that.  But it is 
undoubtedly the case that for some compiler+target combinations, the 
compiler can generate better code than /would/ be written in assembler. 
The key difference here is that no one writes the fastest or most 
compact possible assembly code - code maintenance and legibility, and 
speed of writing and debugging the code ensure that.

To give a simple example, imagine coding the expression "y = x * k" 
where k is a compile-time constant which will vary from build to build, 
and the target has no hardware multiply instruction.  When you 
re-compile with different values of k, the compiler will generate a 
fairly optimal series of shifts and adds depending on the value of k. 
If you are writing in assembly, you might check for a few special cases 
for k, but otherwise you'll use a general multiplication routine because 
anything else would be hideous to write and maintain.
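
The shift-and-add series mentioned above can be modelled directly. A
sketch of the simplest decomposition, one add per set bit of k (real
compilers also use subtractions and factorizations, so this is not what
any particular compiler emits):

```python
def shift_add_sequence(k):
    """The shift amounts a compiler might emit for x * k on a target
    with no multiply instruction: one 'acc += x << i' per set bit of k."""
    return [i for i in range(k.bit_length()) if (k >> i) & 1]

def multiply(x, k):
    # Executing the generated sequence reproduces x * k.
    acc = 0
    for shift in shift_add_sequence(k):
        acc += x << shift
    return acc
```

Recompiling with k = 10, for instance, would yield just two shift-adds
(shifts 1 and 3) instead of a call to a general multiplication routine.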

I've seen plenty of cases where assembly programs take advantage of the 
target architecture in ways that don't fit well in the C model.  In such 
cases, the assembly code can be several times smaller or faster than the 
compiler-generated code.  I've also seen plenty of target/compiler 
combinations that give poor code.  But I've also seen plenty of cases 
when a C compiler generates smarter code than I had thought of, and 
plenty of cases when assembly programs have been re-written in C with 
the result of being smaller and faster (or more typically, more features 
for the same size and speed) because it's easier to write better 
algorithms and structures and let the compiler figure out the details.

Article: 142558
Subject: Re: Soft Processor IP core report
From: "Antti.Lukats@googlemail.com" <antti.lukats@googlemail.com>
Date: Mon, 17 Aug 2009 00:12:58 -0700 (PDT)
On Aug 17, 9:42 am, David Brown <da...@westcontrol.removethisbit.com>
wrote:
> Herbert Kleebauer wrote:
> > "Antti.Luk...@googlemail.com" wrote:
> >> And, yes it makes sense to use HLL, as my AVR HLL compiler is able to
> >> produce
> >> code more dense than AVR assembler (it really is!).
> >> ..
>
> > Funny, and I thought you are a serious person. Can you show me
> > the code?
>
> > But now we can prove that any program for the AVR can be
> > reduced to size zero:
>
> > 1. write the program in assembly
> > 2. rewrite it in C and it will be at least one byte smaller
> > 3. disassemble the binary to get an assembly program of the
> >    same size as the C binary
> > 4. goto 2.
>
> I have no experience of Antti's compiler (I don't even know which HLL it
> uses), but I have lots of experience with assembly and C, on different
> architectures and with different compilers, and lots of experience with
> the "C vs. assembly" debates (they pop up regularly on comp.arch.embedded).
>
> Clearly no compiler can ever produce better code than /could/ be written
> in assembler - your little reductio ad absurdum proves that.  But it is
> undoubtedly the case that for some compiler+target combinations, the
> compiler can generate better code than /would/ be written in assembler.
> The key difference here is that no one writes the fastest or most
> compact possible assembly code - code maintenance and legibility, and
> speed of writing and debugging the code ensure that.
>
> To give a simple example, imagine coding the expression "y = x * k"
> where k is a compile-time constant which will vary from build to build,
> and the target has no hardware multiply instruction.  When you
> re-compile with different values of k, the compiler will generate a
> fairly optimal series of shifts and adds depending on the value of k.
> If you are writing in assembly, you might check for a few special cases
> for k, but otherwise you'll use a general multiplication routine because
> anything else would be hideous to write and maintain.
>
> I've seen plenty of cases where assembly programs take advantage of the
> target architecture in ways that don't fit well in the C model.  In such
> cases, the assembly code can be several times smaller or faster than the
> compiler-generated code.  I've also seen plenty of target/compiler
> combinations that give poor code.  But I've also seen plenty of cases
> when a C compiler generates smarter code than I had thought of, and
> plenty of cases when assembly programs have been re-written in C with
> the result of being smaller and faster (or more typically, more features
> for the same size and speed) because it's easier to write better
> algorithms and structures and let the compiler figure out the details.

David

I wrote my compiler in 1997.
The AVR assembler in 1997 - and most likely the Atmel AVR assembler in
2009 too - does NOT do branch optimization.
My compiler initially targeted the AT90S1200 (512 instructions of code
space). It DOES know ALL the low-level details of the AVR architecture
and USES all the features offered by the AVR, and I tried very hard to
make the compiler as good as possible. Of course the branch
optimization cannot always be solved in an optimal way, but sometimes
even a few saved instructions count. In 2009 the value of that is
maybe nil, but in 1997, when the code size of flash MCUs was still
small (or expensive), I saw value in that optimization as well.

And of course, it is always possible to write asm so that the compiler
can't do better, no doubt. But assembler written "just by writing" has
a small chance of being beaten by a smart compiler (because of the
time the compiler developer spent).

Writing that compiler:
* took 2+ years of my life
* earned me some 20 KUSD
* almost cost me my sanity :(

The source code of the compiler's last version (and the IDE) was
unfortunately lost in a fire (PC and hard disk damaged).
I have resurrected an older revision of the command-line version's
source code and use it internally for many projects.

Antti


Article: 142559
Subject: Re: BCD in FPGA
From: Andrew Holme <ajholme@hotmail.com>
Date: Mon, 17 Aug 2009 02:41:47 -0700 (PDT)
On 17 Aug, 07:28, backhus <goo...@twinmail.de> wrote:
> Hi Andrew,
> First of all, the second approach doesn't handle BCD at all, just
> simple binary.
> Your first module is a single BCD adder that can be used for wider
> numbers by cascading the adders.
> The EDN paper describes an approach to add wider BCD numbers in a
> single module, using two adders.
> One simply adds up the numbers in binary style; the other one adds 6
> to the result, as you did in your module with the line: assign {cout,
> s} = unadj<10? unadj : unadj+6;
> Then follows a special network of multiplexers to choose the right
> result from either the first or the second adder.
> The advantage is that the adders can work straight over all bits, and
> this lets them use the carry chains. Your first module can't do that
> because of the BCD correction.
> Your module Test doesn't have any correction circuit, and misses the
> second adder (for 8 bits: +66 decimal).
>
> A little hint (incomplete):
>
>  module Test2(
>    input  [7:0] a,
>    input  [7:0] b,
>    output [7:0] s
>  );
>    wire [7:0] yb;
>    wire [7:0] yc;
>    assign yb = a + b;
>    assign yc = yb + 8'b01100110;
> //
> //  add multiplexer structure here to generate output s from yb and yc
> //
>  endmodule
>
> Have a nice synthesis
>   Eilert

The "second approach" was not supposed to be a BCD adder!  It's just
an attempt to persuade the Xilinx tools to use the YB port of a
Spartan 3 SLICE.

The binary adders in the EDN schematic have intermediate carry outputs
C4, C8, C12 and D4, D8, D12, etc.  Their method relies on getting
access to the carry chain at these points.


Article: 142560
Subject: Re: Soft Processor IP core report
From: Herbert Kleebauer <klee@unibwm.de>
Date: Mon, 17 Aug 2009 13:19:30 +0200
"Antti.Lukats@googlemail.com" wrote:
> On Aug 16, 11:58 pm, Herbert Kleebauer <k...@unibwm.de> wrote:

> > > And, yes it makes sense to use HLL, as my AVR HLL compiler is able to produce
> > > code more dense than AVR assembler (it really is!).
> > > ..
> >
> > Funny, and I thought you are a serious person. Can you show me
> > the code?

> I am a very serious person.
> 
> I understand that you have not that deep knowledge as I, so I explain
> why my AVR
> compiler was/is able to produce better code then assembler.
> 
> an assembler uses normally 1 or 2 passes
> my compiler uses many passes for branch optimization, the number of
> passes used is not fixed.

The AVR doesn't have branch instructions which could be optimized
in a multi pass compilation. There are 16 conditional branches
with a 7 bit offset, an unconditional branch with a 12 bit offset,
a direct jump with a 22 bit address and two indirect jumps. 

The only way to do some optimization is by moving code blocks,
which, at least for small flash sizes, is better done by hand
than by a compiler.

1111 00## #### #000  bcs.b      label               ; C=1
1111 00## #### #000  blo.b      label               ; C=1
1111 00## #### #001  beq.b      label               ; Z=1
1111 00## #### #010  bmi.b      label               ; N=1
1111 00## #### #011  bvs.b      label               ; V=1
1111 00## #### #100  blt.b      label               ; S=(N eor V) = 1
1111 00## #### #101  bhcs.b     label               ; H=1
1111 00## #### #110  bts.b      label               ; T=1
1111 00## #### #111  bis.b      label               ; I=1
1111 01## #### #000  bcc.b      label               ; C=0
1111 01## #### #000  bhs.b      label               ; C=0
1111 01## #### #001  bne.b      label               ; Z=0
1111 01## #### #010  bpl.b      label               ; N=0
1111 01## #### #011  bvc.b      label               ; V=0
1111 01## #### #100  bge.b      label               ; S=(N eor V) = 0
1111 01## #### #101  bhcc.b     label               ; H=0
1111 01## #### #110  btc.b      label               ; T=0
1111 01## #### #111  bic.b      label               ; I=0

1100 #### #### ####  br.w       label

1001 010# #### 110#  jmp.l      label
#### #### #### ####

1001 0100 0000 1001  jmp.w      (r31|r30)
1001 0100 0001 1001  jmp.l      (*|r31|r30)

 
> **
> you would be crazy to code in ARM assembly?

I'm just reading the ARM manual. I thought nothing could be crazier
than the AVR32 instruction set, but the ARM instruction set surely is.


> also those who are not crazy DO write assembler for ARM

I didn't say that only crazy people write assembly code for the ARM.
I said that if you have to write ARM assembly code for a long time,
you will go crazy.

Article: 142561
Subject: Re: Soft Processor IP core report
From: David Brown <david@westcontrol.removethisbit.com>
Date: Mon, 17 Aug 2009 14:15:54 +0200
Herbert Kleebauer wrote:
> "Antti.Lukats@googlemail.com" wrote:
>> On Aug 16, 11:58 pm, Herbert Kleebauer <k...@unibwm.de> wrote:
> 
>>>> And, yes it makes sense to use HLL, as my AVR HLL compiler is able to produce
>>>> code more dense than AVR assembler (it really is!).
>>>> ..
>>> Funny, and I thought you are a serious person. Can you show me
>>> the code?
> 
>> I am a very serious person.
>>
>> I understand that you have not that deep knowledge as I, so I explain
>> why my AVR
>> compiler was/is able to produce better code then assembler.
>>
>> an assembler uses normally 1 or 2 passes
>> my compiler uses many passes for branch optimization, the number of
>> passes used is not fixed.
> 
> The AVR doesn't have branch instructions which could be optimized
> in a multi pass compilation. There are 16 conditional branches
> with a 7 bit offset, an unconditional branch with a 12 bit offset,
> a direct jump with a 22 bit address and two indirect jumps. 
> 
> The only way to do some optimization is by moving code blocks,
> which, at least for small flash sizes, is better done by hand
> than by a compiler.
> 

It's a minor point, leading to very marginal differences in code 
quality, but it /is/ possible to do some optimisation for very large 
functions, or if you are using inter-procedural optimisations.  In such 
cases, it's not uncommon to have conditional branches (including tail 
call optimisations) that have targets outside the 7 bit offset range - 
optimal choices of branch and jump instructions can make a difference 
here.  You wouldn't do it with multiple "peephole" style passes, however.

It is also possible to make choices of branches based on whether you 
expect the branch to be taken or not.  A human assembly programmer will 
not normally be very good at that for most code, since they would 
(should!) emphasise the logical structure of the code rather than 
cycle-counting (critical code is a different matter).  A decent compiler 
can do a reasonable job, possibly aided by things like gcc's 
"__builtin_expect" function.

But these make very little difference to real code, unless you need to 
squeeze out a few percent more speed.

> 
>> **
>> you would be crazy to code in ARM assembly?
> 
> I'm just reading the ARM manual. I thought nothing could be crazier
> than the AVR32 instruction set, but the ARM instruction set surely is.
> 

RISC instruction sets are often difficult to follow, especially for 
bigger devices.  I've been using a PPC recently - it really would not be 
easy to write optimal code for an architecture that includes 24 
different variants of the "add" instruction, never mind trying to 
comprehend the bit field instructions.  It's /so/ much easier to let the 
compiler worry about these details - if you try and hand-code the 
assembly, the chances of overlooking a faster code sequence for a given 
task are very high.

> 
>> also those who are not crazy DO write assembler for ARM
> 
> I didn't say that only crazy people write assembly code for the ARM.
> I said, if you have to write ARM assembly code for a longer time,
> you will become crazy.

:-)

Article: 142562
Subject: Re: Virtex 4 package code
From: "MM" <mbmsv@yahoo.com>
Date: Mon, 17 Aug 2009 10:50:24 -0400
Jon,

These are probably V4FX parts, right? If that's the case, ES4 was the last 
step before fully qualified parts were released. As far as I remember there 
were no major issues with ES4s, and the bitstream was compatible with 
fully qualified chips, including the MGT-related portions. However, strictly 
speaking, I believe you should make a separate build for them with CONFIG 
STEPPING = "SCD1" in the ucf file. Earlier ES releases had serious issues 
with MGTs and are certainly not recommended for any production boards if 
MGTs are required.


/Mikhail




"maxascent" <maxascent@yahoo.co.uk> wrote in message 
news:zoKdncTItZ7wpBXXnZ2dnUVZ_oGdnZ2d@giganews.com...
>I have obtained some Virtex 4 devices that have a code on them of ES4.
> Looking around the web it seems that these are engineering samples. My
> question is are these devices good to use or can they be used in a
> production board?
>
> Jon 



Article: 142563
Subject: Re: Virtex 4 package code
From: johnbean_uk@hotmail.com
Date: Mon, 17 Aug 2009 08:18:57 -0700 (PDT)
You are right, they are FX parts. So if I didn't want to use them, would
it be OK to sell them, or is this not possible with ES parts?

Thanks

Jon

Article: 142564
Subject: Re: Virtex 4 package code
From: "MM" <mbmsv@yahoo.com>
Date: Mon, 17 Aug 2009 11:59:15 -0400
<johnbean_uk@hotmail.com> wrote
> You are right, they are FX parts. So if I didn't want to use them, would
> it be OK to sell them, or is this not possible with ES parts?

In my view it's OK to sell them if they came from a trusted source. I think 
you'll find a buyer if the price is fair.


/Mikhail




Article: 142565
Subject: Re: Using carry chain of counters for term count detect
From: JimLewis <Jim@SynthWorks.com>
Date: Mon, 17 Aug 2009 09:15:48 -0700 (PDT)
Hi Rick,
Integers and such are great for sim run time; however, if you are not
getting the hardware you want, here is an array-based algorithm that
uses only one carry cell to implement the zero detect.  It has a few
extras that you may want to remove.  BaseReg is the loadable base
register.  CntReg keeps the current count value.  IntReg is a
registered version of the zero detect.

Best,
Jim
SynthWorks VHDL training


TimerProc : process (Clk, nReset)
  variable Dec : unsigned(CntReg'Length downto 0) ;
begin
  if (nReset = '0') then
    BaseReg <= (others => '0') ;
    CntReg  <= (others => '0') ;
    IntReg  <= '0' ;
  elsif rising_edge(Clk) then
    if (TimerSel = '1' and Read = '0') then
      BaseReg <= unsigned(DataIn) ;
    end if ;
    Dec  := ('0' & CntReg) - 1 ;
    if (Dec(Dec'Left) = '1') then
      CntReg  <= BaseReg ;
    else
      CntReg  <= Dec(CntReg'Range);
    end if ;
    IntReg    <= Dec(Dec'Left) ;
  end if ;
end process ;
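
The borrow-bit trick in the process above can be checked with a quick
software model (a sketch assuming an 8-bit CntReg; the VHDL is generic
over the width):

```python
WIDTH = 8  # assumed counter width for this sketch

def timer_step(cnt, base):
    """One clock of the down-counter.  Dec := ('0' & CntReg) - 1 is a
    WIDTH+1-bit subtract; its top bit (Dec'Left) is the borrow out of
    the carry chain, and is 1 exactly when cnt was zero.  That single
    bit both reloads the counter and drives IntReg."""
    dec = (cnt - 1) & ((1 << (WIDTH + 1)) - 1)   # WIDTH+1-bit subtraction
    borrow = (dec >> WIDTH) & 1                  # 1 iff cnt == 0
    next_cnt = base if borrow else dec & ((1 << WIDTH) - 1)
    return next_cnt, borrow
```

So the zero detect is implemented by a single borrow bit at the end of
the carry chain, not by a WIDTH-input comparator.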



Article: 142566
Subject: Operating same logic at two frequencies
From: fpgabuilder <parekh.sh@gmail.com>
Date: Mon, 17 Aug 2009 10:52:06 -0700 (PDT)
I am writing some logic that is supposed to work at 80 MHz in one mode
and at 40 MHz in another.  I am wondering whether I need to run timing
analysis at both frequencies.  The idea is to use clock muxes to select
one or the other clock during operation.
TIA for any insights.

Best regards,
Sanjay

Article: 142567
Subject: Re: Using carry chain of counters for term count detect
From: Andy <jonesandy@comcast.net>
Date: Mon, 17 Aug 2009 11:12:56 -0700 (PDT)
That's about the cleanest example using vectors I've seen.

I'm not sure it would escape the same final optimizations from
Synplify (et al?), since those optimizations were related more to the
entire carry chain than to just the end of it.  That said, outputting
the carry bit in the IntReg register would likely give the tool a
strong nudge towards preserving the carry bit intact (if not the
entire chain).  I've not checked any results from integer-coded
implementations that also register (count - 1 < 0) as a boolean
output.

Be careful if CntReg'Range is not "n downto 0".
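For reference, the integer-coded variant mentioned above, with the
wrap test (count - 1 < 0) registered, might look like the sketch
below. This is only an illustration: the signal declarations and the
MAX_COUNT constant are assumed, and whether any given tool keeps the
comparison on the carry chain is unverified.

```vhdl
-- Sketch of the integer-coded variant: register (CntReg - 1 < 0)
-- as the zero-detect output.  Assumes declarations elsewhere like:
--   signal CntReg, BaseReg : integer range 0 to MAX_COUNT ;
--   signal IntReg          : std_logic ;
-- Whether synthesis maps the comparison onto the carry chain is
-- tool-dependent.
TimerProc : process (Clk, nReset)
  variable Dec : integer range -1 to MAX_COUNT - 1 ;
begin
  if (nReset = '0') then
    CntReg <= 0 ;
    IntReg <= '0' ;
  elsif rising_edge(Clk) then
    Dec := CntReg - 1 ;
    if (Dec < 0) then        -- count wrapped past zero
      CntReg <= BaseReg ;
      IntReg <= '1' ;
    else
      CntReg <= Dec ;
      IntReg <= '0' ;
    end if ;
  end if ;
end process ;
```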

Andy

Article: 142568
Subject: Re: Operating same logic at two frequencies
From: Andy <jonesandy@comcast.net>
Date: Mon, 17 Aug 2009 11:15:35 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Aug 17, 12:52 pm, fpgabuilder <parekh...@gmail.com> wrote:
> I am writing some logic that is supposed to work at 80MHz in one mode
> and 40MHz in another.  I am wondering if I need to run timing analysis
> at both the frequencies?  The idea is to use clock muxes to select one
> or the other clock during operation.
>
> TIA for any insights.
>
> Best regards,
> Sanjay

In an FPGA, if it works at 80, it will work at 40, so long as the same
clock distribution network is used (usually the case).

Run timing analysis at 80.

Be careful how you switch clock speeds...

Andy

Article: 142569
Subject: Re: Operating same logic at two frequencies
From: "Symon" <symon_brewer@hotmail.com>
Date: Mon, 17 Aug 2009 19:46:48 +0100
Links: << >>  << T >>  << A >>
fpgabuilder wrote:
> I am writing some logic that is supposed to work at 80MHz in one mode
> and 40MHz in another.  I am wondering if I need to run timing analysis
> at both the frequencies?  The idea is to use clock muxes to select one
> or the other clock during operation.
>
> TIA for any insights.
>
> Best regards,
> Sanjay

Run it at 80MHz all the time. Use a clock enable to make it go at 40MHz. 
That makes the timing analysis easy.
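A minimal sketch of that clock-enable approach, with all entity and
signal names illustrative rather than taken from the thread:

```vhdl
-- Run everything from the single 80 MHz clock and gate register
-- updates with an enable that is high every other cycle in 40 MHz
-- mode.  Names and widths here are illustrative only.
library ieee;
use ieee.std_logic_1164.all;

entity rate_ctrl is
  port (
    Clk80    : in  std_logic;   -- single 80 MHz clock
    HalfRate : in  std_logic;   -- '1' selects 40 MHz operation
    D        : in  std_logic;
    Q        : out std_logic
  );
end entity;

architecture rtl of rate_ctrl is
  signal toggle : std_logic := '0';
  signal ce     : std_logic;
begin
  -- Full rate: enable always high.  Half rate: enable toggles, so
  -- registers update on every second 80 MHz edge.
  ce <= '1' when HalfRate = '0' else toggle;

  process (Clk80)
  begin
    if rising_edge(Clk80) then
      toggle <= not toggle;
      if ce = '1' then
        Q <= D;                 -- example payload register
      end if;
    end if;
  end process;
end architecture;
```

Timing analysis then runs against the 80 MHz clock only.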
Syms. 



Article: 142570
Subject: Re: Operating same logic at two frequencies
From: fpgabuilder <parekh.sh@gmail.com>
Date: Mon, 17 Aug 2009 11:52:56 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Aug 17, 11:15 am, Andy <jonesa...@comcast.net> wrote:
> On Aug 17, 12:52 pm, fpgabuilder <parekh...@gmail.com> wrote:
>
> > I am writing some logic that is supposed to work at 80MHz in one mode
> > and 40MHz in another.  I am wondering if I need to run timing analysis
> > at both the frequencies?  The idea is to use clock muxes to select one
> > or the other clock during operation.
>
> > TIA for any insights.
>
> > Best regards,
> > Sanjay
>
> In an FPGA, if it works at 80, it will work at 40, so long as the same
> clock distribution network is used (usually the case).
>
> Run timing analysis at 80.
>
> Be careful how you switch clock speeds...
>
> Andy

Thanks for the info, Andy.  Why do you say "In an FPGA, if it works at
80, it will work at 40"?  I remember reading that in FPGAs a couple of
generations old, the clock network delays are almost always less than
the data delays, so hold times are typically not an issue.  But here
the clock is halved and the FPGA is much faster.

-sanjay


Article: 142571
Subject: VHDL code for finding standard deviation for a chunk of numbers
From: Pratap <pratap.iisc@gmail.com>
Date: Mon, 17 Aug 2009 13:16:29 -0700 (PDT)
Links: << >>  << T >>  << A >>
Hi
I want to synthesize a block which can find the standard deviation of
a given sequence of numbers.  Preferably I need the least area.  For
simplicity of the hardware I can manage with a power-of-2 number of
samples.
Can anybody help me work this out or with the coding?
Thanks,
-Pratap

Article: 142572
Subject: Re: VHDL code for finding standard deviation for a chunk of numbers
From: Rob Gaddi <rgaddi@technologyhighland.com>
Date: Mon, 17 Aug 2009 13:24:02 -0700
Links: << >>  << T >>  << A >>
On Mon, 17 Aug 2009 13:16:29 -0700 (PDT)
Pratap <pratap.iisc@gmail.com> wrote:

> Hi
> I want to synthesize a block which can find the standard deviation of
> a given sequence of numbers.  Preferably I need the least area.  For
> simplicity of the hardware I can manage with a power-of-2 number of
> samples.
> Can anybody help me work this out or with the coding?
> Thanks,
> -Pratap

Can you live with variance?  Neither task is trivial, but saving
yourself the square root at the end will save you some headaches.

-- 
Rob Gaddi, Highland Technology
Email address is currently out of order

Article: 142573
Subject: Embedded Memory Controller
From: "Roger" <rogerwilson@hotmail.com>
Date: Mon, 17 Aug 2009 21:31:03 +0100
Links: << >>  << T >>  << A >>
Comparing the Virtex 6 and Spartan 6 devices, one difference is that
the Virtex 6 lacks the Embedded Memory Controller hard cores. These
cores look very useful, so why aren't they also included in the V6?
Does anyone have a view?

Rog. 


Article: 142574
Subject: Re: VHDL code for finding standard deviation for a chunk of numbers
From: Pratap <pratap.iisc@gmail.com>
Date: Mon, 17 Aug 2009 13:32:49 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Aug 18, 1:24 am, Rob Gaddi <rga...@technologyhighland.com> wrote:
> On Mon, 17 Aug 2009 13:16:29 -0700 (PDT)
>
> Pratap <pratap.i...@gmail.com> wrote:
> > Hi
> > I want to synthesize a block which can find the standard deviation of
> > a given sequence of numbers.  Preferably I need the least area.  For
> > simplicity of the hardware I can manage with a power-of-2 number of
> > samples.
> > Can anybody help me work this out or with the coding?
> > Thanks,
> > -Pratap
>
> Can you live with variance?  Neither task is trivial, but saving
> yourself the square root at the end will save you some headaches.
>
> --
> Rob Gaddi, Highland Technology
> Email address is currently out of order

Yes...variance will also do...
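For what it's worth, here is a sketch of a block-based variance
calculation exploiting the power-of-2 sample count, so both divisions
by N reduce to right shifts.  All names, widths, and the interface are
illustrative assumptions, and truncating the mean before squaring
introduces a small downward bias.

```vhdl
-- Accumulate sum(x) and sum(x*x) over N = 2**LOG2_N samples, then
-- compute var = sum(x*x)/N - (sum(x)/N)**2, where each /N is just a
-- right shift (no divider needed).  Illustrative sketch only.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity variance is
  generic (
    LOG2_N : natural := 6;   -- N = 64 samples per block
    W      : natural := 8    -- input sample width
  );
  port (
    Clk, Rst, Valid : in  std_logic;
    X               : in  unsigned(W-1 downto 0);
    Var             : out unsigned(2*W-1 downto 0)
  );
end entity;

architecture rtl of variance is
begin
  process (Clk)
    -- Accumulators are wide enough that they cannot overflow in a block.
    variable s    : unsigned(W+LOG2_N-1 downto 0)   := (others => '0');
    variable sq   : unsigned(2*W+LOG2_N-1 downto 0) := (others => '0');
    variable n    : unsigned(LOG2_N-1 downto 0)     := (others => '0');
    variable mean : unsigned(W-1 downto 0);
  begin
    if rising_edge(Clk) then
      if Rst = '1' then
        s  := (others => '0');
        sq := (others => '0');
        n  := (others => '0');
      elsif Valid = '1' then
        s  := s  + X;
        sq := sq + (X * X);
        if n = 2**LOG2_N - 1 then   -- last sample of the block
          -- Division by N is a right shift; mean is truncated.
          mean := resize(shift_right(s, LOG2_N), W);
          Var  <= resize(shift_right(sq, LOG2_N), 2*W) - (mean * mean);
          s  := (others => '0');
          sq := (others => '0');
        end if;
        n := n + 1;                 -- wraps naturally at N
      end if;
    end if;
  end process;
end architecture;
```

For least area, the multiplier for X * X and for mean * mean could be
shared, at the cost of an extra cycle per result.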


