
Messages from 155075

Article: 155075
Subject: Re: RS232 VHDL-core
From: sergiy.lukin@gmail.com
Date: Fri, 5 Apr 2013 07:29:34 -0700 (PDT)
On Wednesday, 9 March 2005 at 16:33:14 UTC+2, user jandc wrote:
> > I wonder if there is any simple way to send the data from a block-ram to
> > the RS232-interface, without the need to write all the RS232 VHDL-code
> > myself!
>
> There you go. And can we now stop requesting RS232 stuff? ;)
>
> Jan

Hi Jan!

I know this might be a little too late to ask, but would you be so kind
as to post here the libraries mentioned by backhus? I mean these:

> library work;
> use work.shift_registers.all;

> library nicethings;
> use nicethings.ASCII.all, nicethings.overloaded_std_logic_arith.all;

Best regards,
Sergiy.

Article: 155076
Subject: Re: MISC - Stack Based vs. Register Based
From: Mark Wills <markrobertwills@yahoo.co.uk>
Date: Fri, 5 Apr 2013 07:33:10 -0700 (PDT)
On Apr 5, 1:31 pm, Arlet Ottens <usene...@c-scape.nl> wrote:
> On 04/05/2013 09:51 AM, Mark Wills wrote:
>
> >> I'm pretty sure that conclusion is not correct.  If you have an
> >> instruction that does two or three memory accesses in one instruction
> >> and you replace it with three instructions that do one memory access
> >> each, you end up with two extra memory accesses.  How is this faster?
>
> >> That is one of the reasons why I want to increase code density, in my
> >> machine it automatically improves execution time as well as reducing the
> >> amount of storage needed.
> > I think you're on the right track. With FPGAs it's really quite simple
> > to execute all instructions in a single cycle. It's no big deal at all
> > - with MPY and DIV being exceptions. In the 'Forth CPU world' even
> > literals can be loaded in a single cycle. It then comes down to
> > careful selection of your instruction set. With a small enough
> > instruction set one can pack more than one instruction in a word - and
> > there's your code density. If you can pack more than one instruction
> > in a word, you can execute them in a single clock cycle. With added
> > complexity, you may even be able to execute them in parallel rather
> > than as a process.
>
> Multiple instructions per word sounds like a bad idea. It requires
> instructions that are so small that they can't do very much, so you need
> more of them. And if you need 2 or more small instructions to do
> whatever 1 big instruction does, it's better to use 1 big instruction
> since it makes instruction decoding more efficient and simpler.

If you're referring to general purpose CPUs I'm inclined to agree.
When commenting earlier, I had in mind a Forth CPU, which executes
Forth words as native CPU instructions. That is, the instruction set
is Forth.

Since there is no need for things like addressing modes in (shall we
say) classical Forth processors, you don't actually need things
like bit-fields in which to encode registers and/or addressing modes*.
All you're left with is the instructions themselves. And you don't
need that many bits for that.

A Forth chip that I am collaborating on right now has two 6-bit
instruction slots per 16-bit word, and a 4-bit 'special' field for
other stuff. We haven't allocated all 64 instructions yet.

* Even though it's not strictly necessary in a classical registerless
Forth CPU, bit-fields can be useful. We're using a couple of bits to
tell the ALU if a word pushes results or pops arguments, for example.
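A layout like the one described (two 6-bit instruction slots plus a 4-bit 'special' field in a 16-bit word) can be sketched roughly as below; the field order and names are assumptions for illustration, not the actual chip's encoding:

```python
# Hypothetical packing of a 16-bit instruction word:
# [15:10] slot0 (6 bits) | [9:4] slot1 (6 bits) | [3:0] special (4 bits)
# Field order is an assumption, not the real chip's layout.

def pack_word(slot0: int, slot1: int, special: int) -> int:
    """Pack two 6-bit opcodes and a 4-bit field into one 16-bit word."""
    assert 0 <= slot0 < 64 and 0 <= slot1 < 64 and 0 <= special < 16
    return (slot0 << 10) | (slot1 << 4) | special

def unpack_word(word: int):
    """Recover the two opcode slots and the special field."""
    return (word >> 10) & 0x3F, (word >> 4) & 0x3F, word & 0xF

word = pack_word(0x21, 0x05, 0b1010)
assert unpack_word(word) == (0x21, 0x05, 0b1010)
```

Two 6-bit slots leave exactly the 64-entry opcode space mentioned above, which is why not all 64 instructions need to be allocated up front.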

Article: 155077
Subject: Re: MISC - Stack Based vs. Register Based
From: Arlet Ottens <usenet+5@c-scape.nl>
Date: Fri, 05 Apr 2013 17:01:27 +0200
On 04/05/2013 04:33 PM, Mark Wills wrote:

>>>> That is one of the reasons why I want to increase code density, in my
>>>> machine it automatically improves execution time as well as reducing the
>>>> amount of storage needed.
>>> I think you're on the right track. With FPGAs it's really quite simple
>>> to execute all instructions in a single cycle. It's no big deal at all
>>> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
>>> literals can be loaded in a single cycle. It then comes down to
>>> careful selection of your instruction set. With a small enough
>>> instruction set one can pack more than one instruction in a word - and
>>> there's your code density. If you can pack more than one instruction
>>> in a word, you can execute them in a single clock cycle. With added
>>> complexity, you may even be able to execute them in parallel rather
>>> than as a process.
>>
>> Multiple instructions per word sounds like a bad idea. It requires
>> instructions that are so small that they can't do very much, so you need
>> more of them. And if you need 2 or more small instructions to do
>> whatever 1 big instruction does, it's better to use 1 big instruction
>> since it makes instruction decoding more efficient and simpler.
>
> If you're referring to general purpose CPUs I'm inclined to agree.
> When commenting earlier, I had in mind a Forth CPU, which executes
> Forth words as native CPU instructions. That is, the instruction set
> is Forth.
>
> Since there is no need for things like addressing modes in (shall we
> say) classical Forth processors, you don't actually need things
> like bit-fields in which to encode registers and/or addressing modes*.
> All you're left with is the instructions themselves. And you don't
> need that many bits for that.
>
> A Forth chip that I am collaborating on right now has two 6-bit
> instruction slots per 16-bit word, and a 4-bit 'special' field for
> other stuff. We haven't allocated all 64 instructions yet.
>
> * Even though it's not strictly necessary in a classical registerless
> Forth CPU, bit-fields can be useful. We're using a couple of bits to
> tell the ALU if a word pushes results or pops arguments, for example.

Well, I was thinking about general purpose applications. A Forth CPU may 
map well to a Forth program, but you also have to take into account how 
well the problem you want to solve maps to a Forth program.

A minimal stack based CPU can be efficient if the values you need can be 
kept on top of the stack. But if you need 5 or 6 intermediate values, 
you'll need to store them in memory, resulting in expensive access to 
them. Even getting the address of a memory location where a value is 
stored can be expensive.

Compare that to a register based machine with 8 registers. You need 3 
more opcode bits, but you get immediate access to a pool of 8 
intermediate values. And, with some clever encoding (like rickman 
suggested) some operations can be restricted to a subset of the 
registers, relaxing the number of encoding bits required.

It would be interesting to see a comparison using non-trivial 
applications, and see how much code is required for one of those minimal 
stack CPUs compared to a simple register based CPU.

Article: 155078
Subject: Re: MISC - Stack Based vs. Register Based
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 5 Apr 2013 17:33:30 +0000 (UTC)
In comp.arch.fpga Arlet Ottens <usenet+5@c-scape.nl> wrote:

(snip on MISC, RISC, and CISC)

> Multiple instructions per word sounds like a bad idea. It requires 
> instructions that are so small that they can't do very much, so you need 
> more of them. And if you need 2 or more small instructions to do 
> whatever 1 big instruction does, it's better to use 1 big instruction 
> since it makes instruction decoding more efficient and simpler.

Well, it depends on your definition of instruction.

The CDC 60 bit computers have multiple instructions per 60 bit word,
but then the word is big enough.

Now, consider that on many processors an instruction can push or pop
only one register to/from the stack.

The 6809 push and pop instructions have a bit mask that allows up
to eight registers to be pushed or popped in one instruction.
(It takes longer for more, but not as long as for separate
instructions.)

For x87, there are instructions that do some math function and
then do, or don't, remove values from the stack. Some might count
those as two or three instructions.
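The 6809-style mask idea can be modeled as below. The bit assignments follow the 6809 PSHS encoding as I understand it (bit 0 = CC through bit 7 = PC); treat the exact order as an assumption, and the sketch as an illustration of mask decoding rather than a cycle-accurate model:

```python
# Decode a 6809-style push/pop mask: each set bit selects one register,
# so a single instruction can transfer up to eight registers.
# Bit order is assumed to match the 6809 PSHS encoding (bit 0 = CC ... bit 7 = PC).
REGS = ["CC", "A", "B", "DP", "X", "Y", "U", "PC"]

def regs_in_mask(mask: int):
    """Return the registers one push/pop instruction would transfer."""
    return [name for bit, name in enumerate(REGS) if mask & (1 << bit)]

# One instruction, one mask byte, several registers moved:
assert regs_in_mask(0b00000110) == ["A", "B"]
assert len(regs_in_mask(0xFF)) == 8
```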

-- glen

Article: 155079
Subject: Re: MISC - Stack Based vs. Register Based
From: rickman <gnuarm@gmail.com>
Date: Fri, 05 Apr 2013 18:00:34 -0400
On 4/4/2013 9:17 PM, Albert van der Horst wrote:
> In article<kjkpnp$qdp$1@dont-email.me>, rickman<gnuarm@gmail.com>  wrote:
>>
>> Albert, do you have a reference about this?
>
> Not in a wikipedia sense where you're not allowed to mention original
> research, and are only quoting what is in the books. It is more experience
> that school knowledge.
>
> If you want to see what can be done with a good macro processor like m4
> study the one source of the 16/32/64 bit ciforth x86 for linux/Windows/Apple.
> See my site below.
>
> The existence of an XLAT instruction (to name an example) OTOH does virtually
> nothing to make the life of an assembler programmer better.
>
> Groetjes Albert

Ok, I'm a bit busy with a number of things to go into this area now, but 
I appreciate the info.

I have used macro assemblers in the past and got quite good at them.  I 
even developed a micro programmed board which used a macro assembler and 
added new opcodes for my design (it was a company boilerplate design 
adapted to a new host) to facilitate some self test functions.  I sorta 
got yelled at because this meant you needed my assembler adaptations to 
assemble the code for the board.  I wasn't ordered to take it out, so I
didn't; it remained.  I left within a year and the company eventually
folded.  You can make a connection if you wish... lol

-- 

Rick

Article: 155080
Subject: Re: MISC - Stack Based vs. Register Based
From: rickman <gnuarm@gmail.com>
Date: Fri, 05 Apr 2013 18:08:47 -0400
On 4/5/2013 3:51 AM, Mark Wills wrote:
> On Apr 5, 1:07 am, rickman<gnu...@gmail.com>  wrote:
>> On 4/4/2013 7:16 AM, Albert van der Horst wrote:
>>
>>> In article<kjin8q$so...@speranza.aioe.org>,
>>> glen herrmannsfeldt<g...@ugcs.caltech.edu>    wrote:
>>>> In comp.arch.fpga Rod Pemberton<do_not_h...@notemailnotq.cpm>    wrote:
>>>>> "rickman"<gnu...@gmail.com>    wrote in message
>>>>> news:kjf48e$5qu$1@dont-email.me...
>>>>>> Weren't you the person who brought CISC into this discussion?
>>
>>>>> Yes.
>>
>>>>>> Why are you asking this question about CISC?
>>
>>>>> You mentioned code density.  AISI, code density is purely a CISC
>>>>> concept.  They go together and are effectively inseparable.
>>
>>>> They do go together, but I am not so sure that they are inseparable.
>>
>>>> CISC began when much coding was done in pure assembler, and anything
>>>> that made that easier was useful. (One should figure out the relative
>>>> costs, but at least it was in the right direction.)
>>
>>> But, of course, this is a fallacy. The same goal is accomplished by
>>> macros, and better. Code density is the only valid reason.
>>
>> I'm pretty sure that conclusion is not correct.  If you have an
>> instruction that does two or three memory accesses in one instruction
>> and you replace it with three instructions that do one memory access
>> each, you end up with two extra memory accesses.  How is this faster?
>>
>> That is one of the reasons why I want to increase code density, in my
>> machine it automatically improves execution time as well as reducing the
>> amount of storage needed.
>>
>> --
>>
>> Rick
>
> I think you're on the right track. With FPGAs it's really quite simple
> to execute all instructions in a single cycle. It's no big deal at all
> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
> literals can be loaded in a single cycle. It then comes down to
> careful selection of your instruction set. With a small enough
> instruction set one can pack more than one instruction in a word - and
> there's your code density. If you can pack more than one instruction
> in a word, you can execute them in a single clock cycle. With added
> complexity, you may even be able to execute them in parallel rather
> than as a process.

I have looked at the multiple instruction in parallel thing and have not 
made any conclusions yet.  To do that you need a bigger instruction word 
and smaller instruction opcodes.  The opcodes essentially have to become 
specific for the execution units.  My design has three, the data stack, 
the return stack and the instruction fetch.  It is a lot of work to 
consider this because there are so many tradeoffs to analyze.

One issue that has always bugged me is that allocating some four or five 
bits for the instruction fetch instruction seems very wasteful when some 
70-90% of the time the instruction is IP <= IP+1.  Trying to Huffman 
encode this is a bit tricky as what do you do with the unused bit?  I 
gave up looking at this until after I master the Rubik's Cube. lol

It does clearly have potential, it's just a bear to unlock without 
adding a lot of data pathways to the design.
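The skew described above (70-90% of fetches being plain IP+1) is exactly the case where a variable-length code pays off on average. A back-of-the-envelope sketch, with an assumed probability and an assumed escape scheme, neither of which is from the actual design:

```python
# Expected bits per fetch field under two hypothetical encodings:
# (a) a flat 4-bit fetch field;
# (b) a prefix code: 'IP+1' gets a 1-bit code, every other fetch op
#     gets a 1-bit escape followed by a full 4-bit code.
p_next = 0.8  # assumed fraction of plain IP+1 fetches (source says 70-90%)

flat_bits = 4
prefix_bits = p_next * 1 + (1 - p_next) * (1 + 4)

print(f"flat: {flat_bits} bits/fetch, prefix code: {prefix_bits:.2f} bits/fetch")
# The prefix code averages well under 2 bits with these assumptions, but
# the savings arrive in variable-sized pieces -- which is precisely the
# packing headache the post describes.
```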

-- 

Rick

Article: 155081
Subject: Re: MISC - Stack Based vs. Register Based
From: rickman <gnuarm@gmail.com>
Date: Fri, 05 Apr 2013 18:25:01 -0400
On 4/5/2013 11:01 AM, Arlet Ottens wrote:
> On 04/05/2013 04:33 PM, Mark Wills wrote:
>
>>>>> That is one of the reasons why I want to increase code density, in my
>>>>> machine it automatically improves execution time as well as
>>>>> reducing the
>>>>> amount of storage needed.
>>>> I think you're on the right track. With FPGAs it's really quite simple
>>>> to execute all instructions in a single cycle. It's no big deal at all
>>>> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
>>>> literals can be loaded in a single cycle. It then comes down to
>>>> careful selection of your instruction set. With a small enough
>>>> instruction set one can pack more than one instruction in a word - and
>>>> there's your code density. If you can pack more than one instruction
>>>> in a word, you can execute them in a single clock cycle. With added
>>>> complexity, you may even be able to execute them in parallel rather
>>>> than as a process.
>>>
>>> Multiple instructions per word sounds like a bad idea. It requires
>>> instructions that are so small that they can't do very much, so you need
>>> more of them. And if you need 2 or more small instructions to do
>>> whatever 1 big instruction does, it's better to use 1 big instruction
>>> since it makes instruction decoding more efficient and simpler.
>>
>> If you're referring to general purpose CPUs I'm inclined to agree.
>> When commenting earlier, I had in mind a Forth CPU, which executes
>> Forth words as native CPU instructions. That is, the instruction set
>> is Forth.
>>
>> Since there is no need for things like addressing modes in (shall we
>> say) classical Forth processors, you don't actually need things
>> like bit-fields in which to encode registers and/or addressing modes*.
>> All you're left with is the instructions themselves. And you don't
>> need that many bits for that.
>>
>> A Forth chip that I am collaborating on right now has two 6-bit
>> instruction slots per 16-bit word, and a 4-bit 'special' field for
>> other stuff. We haven't allocated all 64 instructions yet.
>>
>> * Even though it's not strictly necessary in a classical registerless
>> Forth CPU, bit-fields can be useful. We're using a couple of bits to
>> tell the ALU if a word pushes results or pops arguments, for example.
>
> Well, I was thinking about general purpose applications. A Forth CPU may
> map well to a Forth program, but you also have to take into account how
> well the problem you want to solve maps to a Forth program.

Just to make a point, my original post wasn't really about Forth 
processors specifically.  It was about MISC processors which may or may 
not be programmed in Forth.


> A minimal stack based CPU can be efficient if the values you need can be
> kept on top of the stack. But if you need 5 or 6 intermediate values,
> you'll need to store them in memory, resulting in expensive access to
> them. Even getting the address of a memory location where a value is
> stored can be expensive.

There aren't many algorithms that can't be dealt with reasonably using 
the data and return stacks.  The module I was coding that made me want 
to try a register approach has four input variables, two double 
precision.  These are in memory and have to be read in because this is 
really an interrupt routine and there is no stack input.  The main 
process would have access to these parameters to update them.

The stack routine uses up to 9 levels on the data stack currently.  But 
that is because to optimize the execution time I was waiting until the 
end when I could save them off more efficiently all at once.  But - in 
the process of analyzing the register processor I realized that I had an 
unused opcode that would allow me to save the parameters in the same way 
I am doing it in the register based code, reducing the stack usage to 
five items max.

I understand that many stack based machines only have an 8 level data 
stack, period.  The GA144 is what, 10 words, 8 circular and two registers?


> Compare that to a register based machine with 8 registers. You need 3
> more opcode bits, but you get immediate access to a pool of 8
> intermediate values. And, with some clever encoding (like rickman
> suggested) some operations can be restricted to a subset of the
> registers, relaxing the number of encoding bits required.

That's the whole enchilada with a register MISC, figuring out how to 
encode the registers in the opcodes.  I think I've got a pretty good 
trade off currently, but I have not looked at running any Forth code on 
it yet.  This may be much less efficient than a stack machine... duh!


> It would be interesting to see a comparison using non-trivial
> applications, and see how much code is required for one of those minimal
> stack CPUs compared to a simple register based CPU.

I have been invited to give a presentation to the SVFIG on my design.  I 
need to work up some details and will let you know when that will be. 
They are talking about doing a Google+ hangout which I suppose is a 
video since it would be replayed for the meeting.  I'm not sure I can be 
ready in time for the next meeting, but I'll see.  I don't think my 
design is all that novel in the grand scheme of MISC.  So I likely will 
do this as a comparison between the two designs.  I think I also need to 
bone up on some of the other designs out there like the J1 and the B16.

-- 

Rick

Article: 155082
Subject: Re: MISC - Stack Based vs. Register Based
From: rickman <gnuarm@gmail.com>
Date: Sat, 06 Apr 2013 17:32:03 -0400
On 4/5/2013 3:51 AM, Mark Wills wrote:
>
> I think you're on the right track. With FPGAs it's really quite simple
> to execute all instructions in a single cycle. It's no big deal at all
> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
> literals can be loaded in a single cycle. It then comes down to
> careful selection of your instruction set. With a small enough
> instruction set one can pack more than one instruction in a word - and
> there's your code density. If you can pack more than one instruction
> in a word, you can execute them in a single clock cycle. With added
> complexity, you may even be able to execute them in parallel rather
> than as a process.

This morning I found one exception to the one cycle rule.  Block RAM. 
My original design was for the Altera ACEX part which is quite old now, 
but had async read on the block rams for memory and stack.  So the main 
memory and stacks could be accessed for reading and writing in the same 
clock cycle, read/modify/write.  You can't do that with today's block 
RAMs, they are totally synchronous.

I was looking at putting the registers in a block RAM so I could get two 
read ports and two write ports.  But this doesn't work the way an async 
read RAM will.  So I may consider using a multiphase clock.  The 
operations that really mess me up are the register indirect reads of 
memory, like stack accesses... or any memory accesses really, that is 
the only way to address memory, via register.  So the address has to be 
read from register RAM, then the main memory RAM is read, then the 
result is written back to the register RAM.  Wow!  That is three clock 
edges I'll need.

If I decide to go with using block RAM for registers it will give me N 
sets of regs so I have a big motivation.  It also has potential for 
reducing the total amount of logic since a lot of the multiplexing ends 
up inside the RAM.

The multiphase clocking won't be as complex as using multiple machine 
cycles for more complex instructions.  But it is more complex than the 
good old simple clock I have worked with.  It will also require some 
tighter timing and more complex timing constraints which are always hard 
to get correct.
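The synchronous-read behavior that breaks single-cycle read/modify/write can be pictured as a one-entry pipeline: data for an address presented on this clock edge only appears after the next edge. A minimal behavioral sketch (not any vendor's primitive):

```python
# Model of a synchronous-read block RAM: the read address is registered,
# so data for an address presented in cycle N is only visible in cycle N+1.
class SyncBram:
    def __init__(self, size: int):
        self.mem = [0] * size
        self.rdata = 0      # registered read-data output
        self._raddr = 0     # registered read address

    def clock(self, raddr: int, waddr=None, wdata=0):
        """One rising edge: output data for *last* cycle's address,
        capture the new read address, perform any write."""
        self.rdata = self.mem[self._raddr]
        self._raddr = raddr
        if waddr is not None:
            self.mem[waddr] = wdata

ram = SyncBram(16)
ram.clock(raddr=3, waddr=3, wdata=42)  # cycle 0: write 42, request addr 3
ram.clock(raddr=0)                     # cycle 1: rdata now reflects addr 3
assert ram.rdata == 42                 # the read needed a second edge
```

An async-read RAM would have returned `mem[3]` combinatorially in cycle 0, which is what the read/modify/write in one cycle relied on.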

-- 

Rick

Article: 155083
Subject: Re: MISC - Stack Based vs. Register Based
From: rickman <gnuarm@gmail.com>
Date: Sat, 06 Apr 2013 17:37:03 -0400
On 4/5/2013 8:31 AM, Arlet Ottens wrote:
> On 04/05/2013 09:51 AM, Mark Wills wrote:
>
>>> I'm pretty sure that conclusion is not correct. If you have an
>>> instruction that does two or three memory accesses in one instruction
>>> and you replace it with three instructions that do one memory access
>>> each, you end up with two extra memory accesses. How is this faster?
>>>
>>> That is one of the reasons why I want to increase code density, in my
>>> machine it automatically improves execution time as well as reducing the
>>> amount of storage needed.
>
>> I think you're on the right track. With FPGAs it's really quite simple
>> to execute all instructions in a single cycle. It's no big deal at all
>> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
>> literals can be loaded in a single cycle. It then comes down to
>> careful selection of your instruction set. With a small enough
>> instruction set one can pack more than one instruction in a word - and
>> there's your code density. If you can pack more than one instruction
>> in a word, you can execute them in a single clock cycle. With added
>> complexity, you may even be able to execute them in parallel rather
>> than as a process.
>
> Multiple instructions per word sounds like a bad idea. It requires
> instructions that are so small that they can't do very much, so you need
> more of them. And if you need 2 or more small instructions to do
> whatever 1 big instruction does, it's better to use 1 big instruction
> since it makes instruction decoding more efficient and simpler.

I think multiple instructions per word is a good idea if you have a wide 
disparity in speed between instruction memory and the CPU clock rate. 
If not, why introduce the added complexity?  Well, unless you plan to 
execute them simultaneously...   I took a look at a 16 bit VLIW idea 
once and didn't care for the result.  There are just too many control 
points so that a proper VLIW design would need more than just 16 bits I 
think.  At least in the stack design.

An 18 bit instruction word might be worth looking at in the register 
CPU.  But getting more parallelism in the register design will require 
more datapaths and there goes the "minimal" part of the MISC.

So many options, so little time...

-- 

Rick

Article: 155084
Subject: Re: MISC - Stack Based vs. Register Based
From: rickman <gnuarm@gmail.com>
Date: Sat, 06 Apr 2013 17:44:56 -0400
On 4/5/2013 10:33 AM, Mark Wills wrote:
> On Apr 5, 1:31 pm, Arlet Ottens<usene...@c-scape.nl>  wrote:
>> On 04/05/2013 09:51 AM, Mark Wills wrote:
>>
>>>> I'm pretty sure that conclusion is not correct.  If you have an
>>>> instruction that does two or three memory accesses in one instruction
>>>> and you replace it with three instructions that do one memory access
>>>> each, you end up with two extra memory accesses.  How is this faster?
>>
>>>> That is one of the reasons why I want to increase code density, in my
>>>> machine it automatically improves execution time as well as reducing the
>>>> amount of storage needed.
>>> I think you're on the right track. With FPGAs it's really quite simple
>>> to execute all instructions in a single cycle. It's no big deal at all
>>> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
>>> literals can be loaded in a single cycle. It then comes down to
>>> careful selection of your instruction set. With a small enough
>>> instruction set one can pack more than one instruction in a word - and
>>> there's your code density. If you can pack more than one instruction
>>> in a word, you can execute them in a single clock cycle. With added
>>> complexity, you may even be able to execute them in parallel rather
>>> than as a process.
>>
>> Multiple instructions per word sounds like a bad idea. It requires
>> instructions that are so small that they can't do very much, so you need
>> more of them. And if you need 2 or more small instructions to do
>> whatever 1 big instruction does, it's better to use 1 big instruction
>> since it makes instruction decoding more efficient and simpler.
>
> If you're referring to general purpose CPUs I'm inclined to agree.
> When commenting earlier, I had in mind a Forth CPU, which executes
> Forth words as native CPU instructions. That is, the instruction set
> is Forth.
>
> Since there is no need for things like addressing modes in (shall we
> say) classical Forth processors, you don't actually need things
> like bit-fields in which to encode registers and/or addressing modes*.
> All you're left with is the instructions themselves. And you don't
> need that many bits for that.
>
> A Forth chip that I am collaborating on right now has two 6-bit
> instruction slots per 16-bit word, and a 4-bit 'special' field for
> other stuff. We haven't allocated all 64 instructions yet.
>
> * Even though it's not strictly necessary in a classical registerless
> Forth CPU, bit-fields can be useful. We're using a couple of bits to
> tell the ALU if a word pushes results or pops arguments, for example.

That can be useful.  The automatic pop of operands is sometimes 
expensive by requiring a DUP beforehand.  In my design the fetch/store 
words have versions to drop the address from the return stack or 
increment and hold on to it.  In looking at the register design I 
realized it would be useful to just make the plus and the drop both 
options so now there are three versions, fetch, fetch++ and fetchK (keep).

-- 

Rick

Article: 155085
Subject: Re: MISC - Stack Based vs. Register Based
From: Brian Davis <brimdavis@aol.com>
Date: Sun, 7 Apr 2013 14:59:42 -0700 (PDT)
rickman wrote:
>
> So the main memory and stacks could be accessed for reading and writing
> in the same clock cycle, read/modify/write.  You can't do that with today's
> block RAMs, they are totally synchronous.
>
I had the same problem when I first moved my XC4000 based RISC
over to the newer parts with registered Block RAM.

I ended up using opposite edge clocking, with a dual port BRAM,
to get what appears to be single cycle access on the data and
instruction ports.

As this approach uses the same clock, the constraints are painless;
but you now have half a clock for address -> BRAM setup, and half
for the BRAM data <-> core data setup. The latter can cause
some timing issues if the core is configured with a byte lane mux
so as to support 8/16/32 bit {sign extending} loads.

-Brian

Article: 155086
Subject: Re: MISC - Stack Based vs. Register Based
From: rickman <gnuarm@gmail.com>
Date: Mon, 08 Apr 2013 01:16:52 -0400
On 4/7/2013 5:59 PM, Brian Davis wrote:
> rickman wrote:
>>
>> So the main memory and stacks could be accessed for reading and writing
>> in the same clock cycle, read/modify/write.  You can't do that with today's
>> block RAMs, they are totally synchronous.
>>
> I had the same problem when I first moved my XC4000 based RISC
> over to the newer parts with registered Block RAM.
>
> I ended up using opposite edge clocking, with a dual port BRAM,
> to get what appears to be single cycle access on the data and
> instruction ports.
>
> As this approach uses the same clock, the constraints are painless;
> but you now have half a clock for address ->  BRAM setup, and half
> for the BRAM data <-> core data setup. The latter can cause
> some timing issues if the core is configured with a byte lane mux
> so as to support 8/16/32 bit {sign extending} loads.

Yes, that was one way to solve the problem.  The other I considered was 
to separate the read and write on the two ports.  Then the read would be 
triggered from the address that was at the input to the address 
register... from the previous cycle.  So the read would *always* be done 
and the data presented whether you used it or not.  I'm not sure how 
much power this would waste, but the timing impact would be small.

I looked at making the register block RAM part of the main memory 
address space.  This would require a minimum of three clock cycles in a 
machine cycle, read address or data from register, use address to read 
or write data from/to memory and then write data to register.  If it 
helps timing, the memory write can be done at the same time as the 
register write.  I'm not crazy about this approach, but I'm considering 
how useful it would be to have direct address capability of the multiple 
register banks.

Some of the comments about register vs. stacks and what I have seen of 
the J1 has made me think about a hybrid approach using stacks in memory, 
but with offset access, so items further down in the stack can be 
operands, not just TOS and NOS.  This has potential for saving stack 
operations.  The J1 has a two-bit field controlling the stack pointer; I 
assume that is +1 to -2, i.e. 1 push to 2 pop.  The author claims this 
provides some ability to combine Forth functions into one instruction, 
but doesn't provide details.  I guess the compiler code would have to be 
examined to find out what combinations would be useful.
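If the guess above is right (and the encoding described in the J1 paper suggests it is), the field is a plain two-bit two's-complement delta. A minimal Python sketch of the decode, with the mapping assumed from that paper:

```python
def stack_delta(field):
    """Decode a 2-bit two's-complement stack-pointer delta, as assumed
    for the J1's data/return stack pointer fields."""
    return field - 4 if field & 0b10 else field

# The four encodings cover one push through two pops:
# 0b00 -> 0, 0b01 -> +1 (push), 0b10 -> -2 (pop two), 0b11 -> -1 (pop one)
```

That range (+1 to -2) matches the reading above; the "pop two" case is what lets a binary ALU operation consume both operands in one instruction.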

The compiler end is not my strong suit, but I suppose I could figure out 
how to take advantage of features like this.

-- 

Rick

Article: 155087
Subject: Re: MISC - Stack Based vs. Register Based
From: albert@spenarnc.xs4all.nl (Albert van der Horst)
Date: 08 Apr 2013 09:52:52 GMT
Links: << >>  << T >>  << A >>
In article <kjtjne$nfu$1@dont-email.me>, rickman  <gnuarm@gmail.com> wrote:
>On 4/7/2013 5:59 PM, Brian Davis wrote:
>> rickman wrote:
>>>
>>> So the main memory and stacks could be accessed for reading and writing
>>> in the same clock cycle, read/modify/write.  You can't do that with today's
>>> block RAMs, they are totally synchronous.
>>>
>> I had the same problem when I first moved my XC4000 based RISC
>> over to the newer parts with registered Block RAM.
>>
>> I ended up using opposite edge clocking, with a dual port BRAM,
>> to get what appears to be single cycle access on the data and
>> instruction ports.
>>
>> As this approach uses the same clock, the constraints are painless;
>> but you now have half a clock for address ->  BRAM setup, and half
>> for the BRAM data<->  core data setup. The latter can cause
>> some timing issues if the core is configured with a byte lane mux
>> so as to support 8/16/32 bit {sign extending} loads.
>
>Yes, that was one way to solve the problem.  The other I considered was
>to separate the read and write on the two ports.  Then the read would be
>triggered from the address that was at the input to the address
>register... from the previous cycle.  So the read would *always* be done
>and the data presented whether you used it or not.  I'm not sure how
>much power this would waste, but the timing impact would be small.
>
>I looked at making the register block RAM part of the main memory
>address space.  This would require a minimum of three clock cycles in a
>machine cycle, read address or data from register, use address to read
>or write data from/to memory and then write data to register.  If it
>helps timing, the memory write can be done at the same time as the
>register write.  I'm not crazy about this approach, but I'm considering
>how useful it would be to have direct address capability of the multiple
>register banks.
>
>Some of the comments about register vs. stacks and what I have seen of
>the J1 has made me think about a hybrid approach using stacks in memory,
>but with offset access, so items further down in the stack can be
>operands, not just TOS and NOS.  This has potential for saving stack
>operations.  The J1 has a two bit field controlling the stack pointer, I
>assume that is +1 to -2 or 1 push to 2 pop.  The author claims this
>provides some ability to combine Forth functions into one instruction,
>but doesn't provide details.  I guess the compiler code would have to be
>examined to find out what combinations would be useful.

This is the approach we took with the FIETS chip, around 1980, emulated
on an Osborne CP/M computer but never built. The emulation could run a
Forth, and it benefited from reaching 8 deep into both the return and the
data stacks. It would still be interesting to build it on a modern FPGA.

>
>The compiler end is not my strong suit, but I suppose I could figure out
>how to take advantage of features like this.
>
>--
>
>Rick

Groetjes Albert
-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst


Article: 155088
Subject: Re: FPGA for large HDMI switch
From: jonesandy@comcast.net
Date: Mon, 8 Apr 2013 06:13:41 -0700 (PDT)
Links: << >>  << T >>  << A >>

Matt, can you elaborate on why the OP cannot do this in an FPGA, if a suitable FPGA is available & cost-effective? 

I completely understand that it may be highly unlikely that it can be done in a cost-effective FPGA, but you excluded that as a reason in your reply.

Andy

Article: 155089
Subject: Re: RS232 VHDL-core
From: jonesandy@comcast.net
Date: Mon, 8 Apr 2013 06:23:39 -0700 (PDT)
Links: << >>  << T >>  << A >>

You do realize that this thread is over 8 years old, right?

And frankly, anything that needs a package named "overloaded_std_logic_arith" would raise my suspicions about its worth. 

Andy

Article: 155090
Subject: Re: FPGA for large HDMI switch
From: thomas.entner99@gmail.com
Date: Mon, 8 Apr 2013 08:58:34 -0700 (PDT)
Links: << >>  << T >>  << A >>
You might consider using 16 external receivers and 16 external transmitters
and using the FPGA to mux the databuses. There are some Rx/Tx that support
DDR on the databuses, so this will get you 16 pins per Rx/Tx
(12b+HD+VD+DE+Clk) x 32 = 512 pins total. There are at least low-cost
Cyclone IV parts that have that many I/Os (CE30/CE40).

But I have not checked whether these DDR-style Rx/Tx are also available
for HDMI 1.4, or how this solution compares to the crosspoint switches.

Regards,

Thomas

Article: 155091
Subject: Re: MISC - Stack Based vs. Register Based
From: "Rod Pemberton" <do_not_have@notemailnotq.cpm>
Date: Mon, 8 Apr 2013 23:55:50 -0400
Links: << >>  << T >>  << A >>
"Albert van der Horst" <albert@spenarnc.xs4all.nl> wrote in
message news:515e262b$0$26895$e4fe514c@dreader37.news.xs4all.nl...

> The existance of an XLAT instruction (to name an example)
> OTOH does virtually nothing to make the life of an
> assembler programmer better.
>

Why do you say that?

It seems good for 256 byte (or less) lookup tables, 8-bit
character translation, simple decompression algorithms, etc.  You
can even use it for multiple tables at once, e.g., using XCHG to
swap BX.  It's definitely difficult for a compiler implementer to
determine when to use such a CISC instruction.
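As a reminder of what XLAT does: it replaces AL with the byte at [BX + AL], i.e. a one-instruction 256-entry table lookup. A rough Python model of those semantics (the translation table below is made up for illustration):

```python
def xlat(bx_table, al):
    """Model of the x86 XLAT instruction: AL <- byte at [BX + AL].
    `bx_table` stands in for the 256-byte table that BX points at."""
    return bx_table[al & 0xFF]

# Example use: an 8-bit uppercase-translation table
table = bytes(c - 32 if ord('a') <= c <= ord('z') else c
              for c in range(256))
assert xlat(table, ord('q')) == ord('Q')
```

The multiple-tables trick mentioned above corresponds to passing a different `bx_table` -- in assembly, an XCHG of BX between lookups.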


Rod Pemberton



Article: 155092
Subject: Re: FPGA for large HDMI switch
From: David Brown <david@westcontrol.removethisbit.com>
Date: Tue, 09 Apr 2013 11:02:08 +0200
Links: << >>  << T >>  << A >>
On 08/04/13 17:58, thomas.entner99@gmail.com wrote:
> You might consider to use 16 external receivers and 16 external
> transmitters and use the FPGA to mux the databuses. There are some
> Rx/Tx that support DDR on the databuses, so this will get you 16pins
> per Rx/TX (12b+HD+VD+DE+Clk) x 32 = 512 Pins Total. There are at
> least low cost Cyclone IV that have so many IOs (CE30/CE40).
> 
> But I have not checked if this DDR-style Rx/Tx are also available for
> HDMI1.4 and how this solution compares to this crosspoint switches.
> 
> Regards,
> 
> Thomas
> 

Unfortunately, the numbers are bigger than that.  HDMI receivers and
transmitters that I have seen have SDR on the databus, but for HDMI1.4
that would be 36 lines at 340 Mbps.  So for 16 channels in and 16
channels out, that would be 36*16*2 = 1152 pins, all running at 340
Mbps.  That's a lot of pins - and even if we got an FPGA big enough,
designing such a board and getting matched lengths on all the lines
needed would be a serious effort.
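The two pin budgets in this subthread are easy to sanity-check; a quick sketch using the bus widths as stated by each poster:

```python
links = 16 + 16                  # 16 input channels + 16 output channels

# DDR-bus estimate: 12 data + HD + VD + DE + Clk = 16 pins per link
ddr_pins = (12 + 4) * links      # 512

# SDR HDMI 1.4 estimate: 36 data lines per link, each at 340 Mbps
sdr_pins = 36 * links            # 1152
```

So the DDR-capable Rx/Tx parts, if they exist for HDMI 1.4, would cut the FPGA pin count by more than half.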

The crosspoint switches mentioned by another poster are one likely
choice.  The other realistic architecture is to use large numbers of
4-to-1 HDMI multiplexers.

Article: 155093
Subject: Re: IP for SDIO serial port
From: Ulf Samuelsson <ulf@notvalid.emagii.com>
Date: Thu, 11 Apr 2013 20:06:45 +0200
Links: << >>  << T >>  << A >>
On 2013-03-03 18:23, hamilton wrote:
> I have been looking for an SDIO serial port.
>
> A single chip would work as well.
>
> SDIO <-> async serial port Txd, Rxd, CTS, RTS
>
> Also Linux and Win drivers.
>
> Are these still around ?

You could do this with the right Atmel SAM3 Cortex-M3/M4 controller.
Best Regards
Ulf Samuelsson

Article: 155094
Subject: Re: IP for SDIO serial port
From: hamilton <hamilton@nothere.com>
Date: Thu, 11 Apr 2013 14:25:51 -0600
Links: << >>  << T >>  << A >>
On 4/11/2013 12:06 PM, Ulf Samuelsson wrote:
> On 2013-03-03 18:23, hamilton wrote:
>> I have been looking for an SDIO serial port.
>>
>> A single chip would work as well.
>>
>> SDIO <-> async serial port Txd, Rxd, CTS, RTS
>>
>> Also Linux and Win drivers.
>>
>> Are these still around ?
>
> You could do this with the right Atmel SAM3 Cortex-M3/M4 controller
> Best Regards
> Ulf Samuelsson

1> The serial port that this SDIO port needs to connect to is on another 
processor. (redesign is not an option)

2> Which SAM device has SDIO device side interface ? ( not Host side)



Article: 155095
Subject: Programming the old Spartan S3E Sample Board
From: dave.g4ugm@gmail.com
Date: Fri, 12 Apr 2013 10:37:45 -0700 (PDT)
Links: << >>  << T >>  << A >>
Folks,

 I bought a couple of these on Ebay as they were cheap but I am having fun
trying to program them. The program is normally stored in a StrataFlash
memory on the board but the Digilent Adept software doesn't talk to that
type of flash.

  I can see from an old post that Antti Lukats created some software to
program the flash using the JTAG interface, but the link on xilant.com
seems dead. Does anyone know if this software is available elsewhere for
download, or is it possible someone can send me a copy?

Dave 

Article: 155096
Subject: Re: Programming the old Spartan S3E Sample Board
From: chrisabele <ccabele@yahoo.com>
Date: Fri, 12 Apr 2013 21:28:50 -0400
Links: << >>  << T >>  << A >>
On 4/12/2013 1:37 PM, dave.g4ugm@gmail.com wrote:
> Folks,
>
>   I bought a couple of these on Ebay as they were cheap but I am having fun trying to program them. The program is normally stored in a Strata Flash memory on the board but the Digilent Adapt software doesn't talk to that type of flash.
>
>    I can see from an old post that Antti Lukats created some software to program the flash using the JTAG interface, but the link on xilant.com seems dead. Does any one know if this software is available elsewhere for download, or is it possible some one can send me a copy?
>
> Dave
>

I did some work with those boards several years ago. I have an 
archive of information that includes the Digilent "S3ESP Configurator 
Setup.msi", which I believe I used to program them. As I recall it would 
only work with the Digilent parallel port programming cable (with a 
real, built-in LPT port, not a USB version). The archive (which includes 
PDF user guide and sample projects) is about 13MB, the Setup file is 
less than 1MB. Would you like me to send one of those to you?

Chris

Article: 155097
Subject: XILINX Artix-7
From: "Bodo" <bodo_rauhut@web.de>
Date: Sat, 13 Apr 2013 08:27:56 +0200
Links: << >>  << T >>  << A >>
Hello,
does anyone have experience with the new 7 series from XILINX, specifically
with the Artix-7?
Has anyone had experience with the Artix-7 demo board yet?
Regards,
Bodo 


Article: 155098
Subject: Re: XILINX Artix-7
From: muzaffer.kal@gmail.com
Date: Sat, 13 Apr 2013 10:52:18 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Friday, April 12, 2013 11:27:56 PM UTC-7, Bodo wrote:
> Hello,
> does anyone have experience with the new 7 series from XILINX, specifically
> with the Artix-7?
> Has anyone had experience with the Artix-7 demo board yet?

I have a Kintex-7 and a Zynq board which I am working with now. No Artix-7 yet, but the fabric in them is not that different from the Kintex-7 fabric. The Kintex-7 board is very similar to other dev boards, and the Zynq is a delight to work with.

Article: 155099
Subject: Re: Ray Andraka's Book?
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Sun, 14 Apr 2013 18:14:23 +0100
Links: << >>  << T >>  << A >>
jonesandy@comcast.net wrote:
> If I had a nickel for every time I've heard throughout my career about this or that technology no longer being relevant...
>
> Technology is like fashion: whatever is old will be new again someday, with a new spin and a new relevance. Don't throw it away; just keep it in the back of your closet, and you will be able to use it again. And for those that missed it the first time around, the second-hand stores are always full of these still-useful articles from bygone times.
>
> I remember my college digital design coursework included implementing boolean logic functions with multiplexers and decoders. Then PALs came along and changed that to sum-of-products. Then FPGAs came along and changed it back (pre-HDL). Then HDL came along and changed it again.
>
> The Cordic algorithms were not new when FPGAs came along. They were dusted off from the ancient spells of the priests of the order of multiplierless microprocessors and "pieces of eight". And those priests were probably taught their craft by the wizards of relays and vacuum tubes.

Yes indeed; CORDIC was old when I used it in 1976 on 6800s.

The earliest papers I have date from
   1962: J. E. Meggitt, "Pseudo division and pseudo multiplication processes", IBM Journal, April 1962
   1959: Jack E. Volder, "The CORDIC trigonometric computing technique", IRE Trans. Electron. Comput. EC-8:330-334

Neither references anything from the time when "computer" was a job title.
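For anyone who missed CORDIC the first time around: it rotates a vector through a fixed sequence of angles arctan(2^-i), so it needs only shifts, adds, and a small angle table -- no multipliers. A floating-point Python sketch of the rotation mode (a hardware version would of course use fixed-point):

```python
import math

def cordic_sincos(theta, iterations=24):
    """Compute (sin(theta), cos(theta)) for |theta| < ~1.74 rad using
    only additions and multiplications by 2**-i, which in hardware are
    just arithmetic shifts."""
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Gain of the rotation sequence; a precomputed constant in hardware
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta
    for i, a in enumerate(angles):
        d = 1.0 if z >= 0.0 else -1.0   # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * a
    return y, x                          # (sin, cos)
```

Each iteration adds roughly one bit of precision, which is why the technique suited serial, multiplierless machines so well.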


