[maemo-developers] Java acceleration/Jazelle

From: Simon Pickering S.G.Pickering at bath.ac.uk
Date: Wed Aug 15 14:43:43 EEST 2007
Thank you for the links; these are things I've not seen before. 

> So let me dump the stuff I turned up so far:
> URL: <http://www.scratchpost.org/patches/jazelle-disassembly.png>
> Here you can see the size and alignment of the java instructions.
> (the entire document is 
> <http://www.arm.com/pdfs/DUI0066D_ADS1_2_AXD_armsd.pdf>)

Looking at the Memory Processor view in Jazelle state (Fig 5-39 on page 5-33 of
the PDF), the left-hand Address column for the bytecodes indicates that
bytecodes are byte-length (or variable length, depending on their arguments),
not 32-bit as we had been thinking. This does assume that the Address column is
showing addresses in bytes rather than in some other unit, but I think this is a
fair assumption.

The same thing is seen in the disassembler shown in Fig 5-52 on page 5-41, and
Section 6.5 on page 6-9 specifically states that Jazelle assembly instructions
are 8-bit. So we can conclude that they are byte-aligned rather than
word-aligned. I wonder why the word-aligned code appeared to work, then?
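To make the byte-alignment point concrete, here is a small C sketch that walks a bytecode sequence and records where each instruction starts. The opcode values and lengths are from the JVM specification (bipush = 0x10 with one operand byte, sipush = 0x11 with two, iadd = 0x60, ireturn = 0xac). The function names are made up for illustration, and only a handful of opcodes are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total length (opcode + inline operands) of a few JVM opcodes, taken from
 * the JVM specification. Anything not listed returns 0, meaning "unknown". */
static size_t bytecode_length(uint8_t op)
{
    switch (op) {
    case 0x00: /* nop */
    case 0x03: /* iconst_0 */
    case 0x04: /* iconst_1 */
    case 0x60: /* iadd */
    case 0xac: /* ireturn */
        return 1;
    case 0x10: /* bipush <byte> */
        return 2;
    case 0x11: /* sipush <byte1> <byte2> */
        return 3;
    default:
        return 0;
    }
}

/* Record the byte offset of every instruction in code[0..len).
 * Returns the instruction count, or 0 on an unknown opcode, a truncated
 * instruction, or if offsets[] is too small. */
static size_t instruction_offsets(const uint8_t *code, size_t len,
                                  size_t *offsets, size_t max_offsets)
{
    size_t pc = 0, count = 0;

    assert(code != NULL && offsets != NULL);
    while (pc < len) {
        size_t n = bytecode_length(code[pc]);
        if (n == 0 || pc + n > len || count == max_offsets)
            return 0;
        offsets[count++] = pc;
        pc += n;
    }
    return count;
}
```

For `bipush 10; sipush 300; iadd; ireturn`, encoded as 0x10 0x0a 0x11 0x01 0x2c 0x60 0xac, the instructions start at offsets 0, 2, 5 and 6, so only the first one lands on a 4-byte boundary.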



> In Java state, the processor assigns several ARM registers to 
> functions specific to the Java machine (for example, R6 = 
> stack pointer, R0-R3 = top elements of stack, R4 = local 
> variable 0). This hardware reuse contributes to the small 
> size of the additional logic (12,000 gates) required to 
> implement the Java machine, and keeps all of the states 
> required by the Jazelle extension in ARM registers. In 
> addition, it ensures compatibility with existing operating 
> systems, interrupt handlers and exception code.
> Keeping the top four elements of the stack in ARM registers [...]. 
> The extension we've added divides Java byte codes into three 
> classes: directly executed, emulated and undefined. The 
> majority of the Java byte codes (138 on the ARM926EJ-S 
> microprocessor core) are executed directly in hardware; the 
> remainder are emulated by short sequences of highly optimized 
> ARM instructions. 
> --------------

So we now have the following register mappings:

Top elements of stack - probably R0, R1, R2, R3
Local variable 0 - might be R4
Pointer to exception table - ??
Pointer to Java stack - possibly R6 (the quote gives R6 = stack pointer)
Pointer to Java variables area - ??
Pointer to the constant pool - ??
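To keep track of this while experimenting, the list above could be written down as a C lookup table. This is only a sketch: the role names and `REG_UNKNOWN` are invented, the R6 entry comes from the quoted article rather than from any test, and which of R0-R3 holds the very top of the stack isn't known.

```c
#include <assert.h>

/* Roles the Java machine needs ARM registers for (names are made up). */
enum jazelle_role {
    ROLE_STACK_CACHE_0,   /* R0-R3 hold the top four stack entries; */
    ROLE_STACK_CACHE_1,   /* which of them is the very top is not known */
    ROLE_STACK_CACHE_2,
    ROLE_STACK_CACHE_3,
    ROLE_LOCAL_0,         /* local variable 0 */
    ROLE_JAVA_STACK_PTR,  /* pointer to the Java stack */
    ROLE_EXCEPTION_TABLE, /* pointer to the exception table */
    ROLE_LOCALS_AREA,     /* pointer to the variables area */
    ROLE_CONSTANT_POOL,   /* pointer to the constant pool */
    ROLE_COUNT
};

#define REG_UNKNOWN (-1)

/* Current best guesses: R0-R3, R4 and R6 come from the quoted article;
 * everything else is still unknown. */
static const int jazelle_reg_map[ROLE_COUNT] = {
    [ROLE_STACK_CACHE_0]   = 0,
    [ROLE_STACK_CACHE_1]   = 1,
    [ROLE_STACK_CACHE_2]   = 2,
    [ROLE_STACK_CACHE_3]   = 3,
    [ROLE_LOCAL_0]         = 4,
    [ROLE_JAVA_STACK_PTR]  = 6,
    [ROLE_EXCEPTION_TABLE] = REG_UNKNOWN,
    [ROLE_LOCALS_AREA]     = REG_UNKNOWN,
    [ROLE_CONSTANT_POOL]   = REG_UNKNOWN,
};

/* ARM register number for a role, or REG_UNKNOWN if we don't know yet. */
static int jazelle_reg_for(enum jazelle_role role)
{
    assert((unsigned)role < ROLE_COUNT);
    return jazelle_reg_map[role];
}
```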

I wonder whether my original R12 and R14 mappings mean anything (see the last
section of this email), or whether they were just arbitrary names used in the patent.

I suppose we could try testing some of these other register mappings by pushing
things onto the stack and setting the value of local variable 0, and then looking
at the registers once the code returns from the BXJ call. This assumes that
these values are not altered when the exception occurs. I've looked at this in
passing and it doesn't seem to show anything I expected (see my previous long
email to see for yourselves) in the registers after the exception handler has
been run. This may be an effect of the ARM exception handler overwriting them.
Obviously we ought to be setting the pointer to the Jazelle exception table (if
we knew which register to put it in and what form it takes!), but do we also
need to set up things like the Java stack pointer, the pointer to the variables
area and the constant pool pointer? Even if we don't need to initialise the data
at these addresses, do we need to allocate some memory and then provide pointers?

There's another interesting bit in this article:

"The key to making this approach work lies in a single new ARM instruction, "BXJ
Rm," for entering Java state. This instruction first performs a test on one of
the condition codes. If the condition is met, it then stores the current program
counter (PC), puts the processor into Java state, branches to the specified
target address and begins executing Java byte codes."

"Performs a test on one of the condition codes"... which one, I wonder? Or is
this where a Java flag is checked? (I'll have to take another look at the chip
manual PDF.) Does anyone have any thoughts? 

My understanding is that the condition codes are N(egative), Z(ero), C(arry)
and (o)V(erflow), and that the J bit, which is also in the CPSR (and isn't a
condition code AFAIK), is set by the BXJ instruction rather than needing to be
set before the BXJ instruction. In fact, setting this bit is explicitly advised
against wherever it's mentioned. So do we need to do a CMP before the BXJ to
get it to do anything?
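For reference, in the standard ARM encoding every instruction (BXJ included) carries a 4-bit condition field, and a plain `BXJ Rm` with no suffix is the AL (always) form. The sketch below evaluates each condition against the N, Z, C and V flags of a CPSR value; the enum and function names are illustrative, not from any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPSR condition flag bits. */
#define CPSR_N (1u << 31)
#define CPSR_Z (1u << 30)
#define CPSR_C (1u << 29)
#define CPSR_V (1u << 28)

/* Condition field values 0-14, in ARM encoding order. */
enum arm_cond {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS, COND_VC,
    COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE, COND_AL
};

/* Would an instruction with condition `cond` execute, given this CPSR? */
static bool cond_passed(uint32_t cpsr, enum arm_cond cond)
{
    bool n = cpsr & CPSR_N, z = cpsr & CPSR_Z;
    bool c = cpsr & CPSR_C, v = cpsr & CPSR_V;

    switch (cond) {
    case COND_EQ: return z;
    case COND_NE: return !z;
    case COND_CS: return c;
    case COND_CC: return !c;
    case COND_MI: return n;
    case COND_PL: return !n;
    case COND_VS: return v;
    case COND_VC: return !v;
    case COND_HI: return c && !z;
    case COND_LS: return !c || z;
    case COND_GE: return n == v;
    case COND_LT: return n != v;
    case COND_GT: return !z && n == v;
    case COND_LE: return z || n != v;
    case COND_AL: return true;
    }
    assert(!"invalid condition code");
    return false;
}
```

Plugging in the CPSR value 0x20000010 from the gdb session below (only C set), `NE` passes because Z is clear, so the `bxjne` in the test should have taken its branch, which matches the PC jumping to r0.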

I created some test code for this:

I don't know whether the BXJ instruction requires the condition code suffix, but
it certainly compiles without complaint.

The output is:

1: x/i $pc  0x841c <main+108>:  bxjne   r0
(gdb) info registers
r0             0xbef68640       -1091140032
r1             0x8428   33832
r2             0x8428   33832
r3             0x8428   33832
r4             0x8428   33832
r5             0x8428   33832
r6             0x8428   33832
r7             0x8428   33832
r8             0x8428   33832
r9             0x8428   33832
r10            0x8428   33832
r11            0x8428   33832
r12            0x8428   33832
sp             0x8428   33832
lr             0x8428   33832
pc             0x841c   33820
fps            0x1001000        16781312
cpsr           0x20000010       536870928
(gdb) si

Program received signal SIGILL, Illegal instruction.
0xbef68640 in ?? ()
1: x/i $pc  0xbef68640: undefined instruction 0xffffff10
(gdb) info registers
r0             0xbef68640       -1091140032
r1             0x8428   33832
r2             0x8428   33832
r3             0x8428   33832
r4             0x8428   33832
r5             0x8428   33832
r6             0x8428   33832
r7             0x8428   33832
r8             0x8428   33832
r9             0x8428   33832
r10            0x8428   33832
r11            0x8428   33832
r12            0x8428   33832
sp             0x8428   33832
lr             0x8428   33832
pc             0xbef68640       -1091140032
fps            0x1001000        16781312
cpsr           0x20000010       536870928

Note that the BXJ instruction appears to have made the PC jump to the location
of the Java bytecodes, but the processor has then tried to interpret them as ARM
instructions. Is this what's been happening all along? Do we need to set some
bit to enable Jazelle?
I'm assuming that this isn't a case of the Jazelle hardware falling back to ARM
mode after trying to run in Jazelle mode (both because this is the first
bytecode instruction and because it should be handleable). 

It would be odd logic if we hadn't switched to Jazelle mode (because of some
condition flag or other) and had therefore performed a standard BX (jump)
instead, but this may be the case. Any ideas?
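One way to tell which state the core ended up in is to decode the J and T bits of the CPSR (bits 24 and 5) along with the mode field (bits 4:0). Below is a minimal C sketch with illustrative names. The ARM Architecture Reference Manual's description of BXJ appears to allow it to behave like a plain BX when Jazelle isn't enabled, which is the case being considered here.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_J         (1u << 24) /* Jazelle state bit */
#define CPSR_T         (1u << 5)  /* Thumb state bit */
#define CPSR_MODE_MASK 0x1fu      /* processor mode, bits [4:0] */
#define MODE_USR       0x10u      /* user mode */

enum exec_state { STATE_ARM, STATE_THUMB, STATE_JAZELLE, STATE_RESERVED };

/* Instruction-set state encoded by the J and T bits of the CPSR. */
static enum exec_state cpsr_state(uint32_t cpsr)
{
    int j = (cpsr & CPSR_J) != 0;
    int t = (cpsr & CPSR_T) != 0;

    if (j && t)
        return STATE_RESERVED; /* reserved combination on ARMv5TEJ/ARMv6 */
    if (j)
        return STATE_JAZELLE;
    return t ? STATE_THUMB : STATE_ARM;
}

/* Processor mode field (e.g. MODE_USR). */
static uint32_t cpsr_mode(uint32_t cpsr)
{
    return cpsr & CPSR_MODE_MASK;
}
```

For the 0x20000010 seen both before and after the `si`, this gives ARM state in user mode (0x10) with J clear, i.e. the processor never entered Jazelle state.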

Might be worth looking at the presence (and accessibility) of Jazelle enable
bits again.

> <http://www.elecdesign.com/Articles/Index.cfm?ArticleID=4841&pg=2>:

(Best try this link, which shows the whole document on one page:

"Consequently, calling the Java mode is exactly like calling a subroutine. The
return (from subroutine) is fairly straightforward. There are a number of unused
Java byte codes. All of the unused byte codes are handled as exceptions. One of
the unused byte codes is used as the means to return to the calling program.
Whenever this byte code is encountered, the hardware takes an exception because
it's an undefined byte code. The exception handler recognizes that byte code as
a "return me to the calling program" instruction, and it will do that."

This confirms Scott's idea. From the wording it looks like it's up to the
handler software to actually perform the return operation, rather than the
Jazelle hardware doing it itself (we still don't know what form the handler
takes, nor where to pass its location, etc.).
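The return-via-unused-bytecode scheme from the article can be sketched on the software side as a small dispatch in the handler. Everything here is hypothetical. The article doesn't say which opcode plays the "return to caller" role; 0xcb is an arbitrary placeholder from the range 0xcb-0xfd that the JVM specification leaves unassigned. How the hardware hands the trapping opcode to the handler is also still unknown, so it is simply passed in as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL: the unused opcode treated as "return to the ARM caller".
 * The article doesn't say which one; 0xcb is an arbitrary placeholder. */
#define OP_RETURN_TO_ARM 0xcbu

enum handler_action { ACTION_RETURN_TO_CALLER, ACTION_REPORT_ERROR };

/* Opcodes 0xcb-0xfd are left unassigned by the JVM specification. */
static bool is_unused_opcode(uint8_t op)
{
    return op >= 0xcb && op <= 0xfd;
}

/* Software half of the scheme: the hardware traps on an unused bytecode and
 * the handler decides what it means. Only called for trapped opcodes. */
static enum handler_action undefined_bytecode_handler(uint8_t op)
{
    assert(is_unused_opcode(op));
    if (op == OP_RETURN_TO_ARM)
        return ACTION_RETURN_TO_CALLER;  /* caller should resume the ARM code */
    return ACTION_REPORT_ERROR;          /* any other unused opcode is a fault */
}
```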

And a confirmation that BXJ is a conditional instruction:

"BXJ is a conditional instruction. If a condition is false, nothing will happen.
If a condition is true (which could be a zero condition, carry condition, or
whatever), the branch will be taken. Before the branch is taken, the current
program counter (PC) is stored and the J bit is set. Engineers can save three
program steps when the program enters the Java state because the BXJ instruction
performs three operations. First, it checks the condition. If the condition is
true, it will store it in the PC and load a new PC. Then, it sets the Java state
and takes a branch."

Interesting wording here: "If the condition is true, it will store it in the PC
and load a new PC." Does this mean the condition needs to return the address of
the Java bytecodes? Any ideas as to how to do this?

I found another little titbit here:
http://www.itee.uq.edu.au/~esg/about/public/arm-intro.ppt (page 12). The notes
at the bottom of the page say: "In Jazelle state, the processor
doesn't perform 8-bit fetches from memory.  Instead it does aligned 32-bit
fetches (4-byte prefetching) which is more efficient.  Note we don't mention the
PC in Jazelle state because the 'Jazelle PC' is actually stored in r14 - this is
technical detail that is not relevant as it is completely hidden by the Jazelle
support code."

So we now know (?) that R14 holds the bytecode PC.
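The slide's description, a byte-addressed bytecode PC (apparently kept in r14) combined with aligned 32-bit fetches, can be modelled in a few lines of C. This is a hypothetical software model, not a description of the real fetch unit. The flat `mem` array starting at address 0, the function name and the little-endian byte order are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fetch the bytecode at byte address jpc using one aligned 32-bit read,
 * modelling the slide's "4-byte prefetching". mem[] stands in for memory
 * starting at address 0, and a little-endian core is assumed. */
static uint8_t fetch_bytecode(const uint8_t *mem, size_t mem_len, uint32_t jpc)
{
    size_t base = jpc & ~3u;   /* start of the word holding the bytecode */
    uint32_t word;

    assert(mem != NULL && base + 4 <= mem_len);
    word = (uint32_t)mem[base]
         | (uint32_t)mem[base + 1] << 8
         | (uint32_t)mem[base + 2] << 16
         | (uint32_t)mem[base + 3] << 24;
    return (uint8_t)(word >> (8 * (jpc & 3u)));  /* select the byte lane */
}
```

For example, with mem = {0x10, 0x0a, 0x11, 0x01, 0x2c, 0x60, 0xac, 0x00}, a bytecode PC of 5 reads the word at address 4 and returns 0x60 from byte lane 1.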

Sorry for the length and slightly random arrangement, but I wanted to write it
all down while it was fresh in my mind.


