100 Languages Speedrun: Episode 77: JVM Assembly with Jasmin
Java Virtual Machine (JVM) works in a weird way. First there's some source code in Java or Kotlin or whatnot. It gets compiled into JVM bytecode as an intermediate form, and JVM Assembly is a human-readable text form of that bytecode. Then the JVM interprets that bytecode, or compiles it into actual executable machine code as the program runs.
In a way, that intermediate level is not necessary. Android, for example, uses a different flow, and not even the same one between versions.
Anyway, let's see what the classic JVM Assembly for a regular JVM looks like. The JDK doesn't include an assembler for human-readable JVM assembly, so for that I'll be using Jasmin. There are a few other JVM assemblers with slight differences, but none of the differences matter for our simple use case.
Hello, World!
It's going to be helpful if you know some basics about the JVM - either Java or one of the other JVM languages. But if not, I'll still try to explain everything step by step.
Let's start with Hello, World! Here's Hello.j:
.class public Hello
.super java/lang/Object
.method public static main([Ljava/lang/String;)V
.limit locals 1
.limit stack 2
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Hello, World!"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
return
.end method
We can compile it with jasmin into Hello.class:
$ jasmin Hello.j
Generated: Hello.class
And then run it by passing the class name to java:
$ java Hello
Hello, World!
In case it's confusing, the java command is not about Java the language. It's about the Java Virtual Machine only. The javac command deals with Java the language.
So what's going on? Let's first look at the structure of the Hello.j file:
- code needs to be in a class, and the class name generally should match the file name. For Hello.j we start by defining .class public Hello
- every class has a superclass, and the default one is Object, internally known as java/lang/Object
- we then define a method main. This is what will run when you run the class from the command line
- .method public static methodname ... .end method is a method definition. Inside the method are the instructions that will run when the method runs. If there were multiple methods, we'd have multiple .method ... .end method blocks
- declaring a method as .method public static means it's a class method, and is not bound to any specific instance
Name Mangling
So far that makes sense. The first question you might have is what the hell are those names:
main([Ljava/lang/String;)V
Ljava/io/PrintStream;
java/io/PrintStream/println(Ljava/lang/String;)V
That's the "name mangling". In Java and a few other languages, you can have multiple functions or methods with the same name, as long as their types are different.
I'm not sure what the easiest way to figure out the "mangled" name is.
For example, main([Ljava/lang/String;)V means void main(java.lang.String[]):
- the name main goes first
- then ( starts the argument list
- [ means "array of"
- Ljava/lang/String; means an object of type java.lang.String. The semicolon is there to show where the name finishes
- ) closes the argument list
- V after it means it returns void, that is, nothing
Name mangling depends on the language we use. Other JVM languages use Java-compatible mangling for the kinds of things Java supports, but have to come up with their own name mangling schemes for any extra features they need. Inside the JVM these mangled names are all just flat strings, but Jasmin still cares about them to set everything up properly, so we need to follow the Java name mangling rules here.
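The JVM specification calls these type strings "descriptors". If you'd rather not work them out by hand, here is a small Java sketch that builds them from Class objects using the rules above: [ for arrays, L...; for objects, and one letter per primitive. The names Descriptors, fieldDescriptor and methodDescriptor are made up for this example. As a cross-check, it compares its result with the JDK's java.lang.invoke.MethodType.toMethodDescriptorString().

```java
import java.lang.invoke.MethodType;

// Hypothetical helper: builds JVM type descriptors ("mangled names") by hand.
final class Descriptors {
    // Descriptor for a single field, argument, or return type.
    static String fieldDescriptor(Class<?> type) {
        if (type.isArray()) {
            // Arrays: "[" followed by the element type's descriptor
            return "[" + fieldDescriptor(type.getComponentType());
        }
        if (type == void.class) return "V";
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == short.class) return "S";
        if (type == float.class) return "F";
        if (type == double.class) return "D";
        // Objects: "L", the class name with "/" instead of ".", then ";"
        return "L" + type.getName().replace('.', '/') + ";";
    }

    // Descriptor for a method: "(" + argument types + ")" + return type.
    static String methodDescriptor(Class<?> returnType, Class<?>... argTypes) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> arg : argTypes) {
            sb.append(fieldDescriptor(arg));
        }
        return sb.append(')').append(fieldDescriptor(returnType)).toString();
    }

    public static void main(String[] args) {
        String mine = methodDescriptor(void.class, String[].class);
        String jdk = MethodType.methodType(void.class, String[].class).toMethodDescriptorString();
        System.out.println(mine);             // ([Ljava/lang/String;)V
        System.out.println(mine.equals(jdk)); // true
    }
}
```

For example, methodDescriptor(void.class, int.class) gives (I)V, the descriptor used for println(int) later in this episode.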
Inside the Hello, World! method
OK, let's look inside the method.
First we start with some .limit definitions.
.limit locals 1
.limit stack 2
These specify how many local variables the method has, and how much stack space it needs at most. main needs one local variable slot even though we never use it, because its args argument occupies slot 0. JVM is a stack machine, so most instructions push things onto the stack, or pop them off the stack to perform various operations.
I'm reasonably sure Jasmin could do this calculation automatically for us, because the JVM certainly does: it will refuse to load a class if you specify numbers that are too low.
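To see where .limit stack 2 comes from, here is a toy Java sketch that tracks the stack depth of the Hello method's straight-line code. The stack effect of each instruction is hard-coded by hand for just these four instructions, and the invokevirtual entry is specific to println(String)V. It is an illustration only, not how the JVM verifier is implemented. It uses Map.of, so it needs Java 9 or newer.

```java
import java.util.List;
import java.util.Map;

// Toy model: how many stack slots each instruction in Hello.main pushes (+) or pops (-).
final class StackDepth {
    static final Map<String, Integer> EFFECT = Map.of(
        "getstatic", +1,     // pushes the System.out reference
        "ldc", +1,           // pushes the string constant
        "invokevirtual", -2, // println(String)V pops receiver + 1 argument, pushes nothing
        "return", 0);

    // Walks straight-line code (no jumps) and returns the deepest the stack ever gets.
    static int maxDepth(List<String> opcodes) {
        int depth = 0;
        int max = 0;
        for (String op : opcodes) {
            depth += EFFECT.get(op);
            if (depth < 0) {
                throw new IllegalStateException("stack underflow at " + op);
            }
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxDepth(List.of("getstatic", "ldc", "invokevirtual", "return"))); // 2
    }
}
```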
Now we need to push some values onto the stack:
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Hello, World!"
getstatic gets a static field of a certain type, in this case java.lang.System.out of type java.io.PrintStream, and pushes a reference to it on top of the stack.
Then ldc "Hello, World!" loads a constant value and pushes a reference to it on top of the stack.
Then we call a method:
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
invokevirtual calls a virtual method, that is, one that can potentially be overridden by a subclass. We need to specify the full mangled name of the method we're calling. The JVM knows how many arguments to take from the top of the stack based on the method's type signature, plus one more slot for the object the method is called on, here System.out.
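Here is a rough Java sketch of that "knows how many arguments" step. It counts how many stack slots an invokevirtual call pops for a given descriptor: one for the receiver object, plus one per argument. The exceptions are long (J) and double (D), which take two slots each. The class and method names are hypothetical, and the sketch assumes a well-formed descriptor.

```java
// Hypothetical helper: how many stack slots invokevirtual pops for a given method descriptor.
final class ArgSlots {
    static int slotsPopped(String descriptor) {
        int slots = 1; // the receiver object, e.g. System.out
        int i = descriptor.indexOf('(') + 1;
        while (descriptor.charAt(i) != ')') {
            char c = descriptor.charAt(i);
            if (c == 'J' || c == 'D') {
                slots += 2; // long and double occupy two slots
                i++;
            } else if (c == 'L') {
                slots += 1; // object reference: skip to the terminating ';'
                i = descriptor.indexOf(';', i) + 1;
            } else if (c == '[') {
                slots += 1; // any array is a single reference
                while (descriptor.charAt(i) == '[') {
                    i++; // skip all array dimensions
                }
                // then skip the element type (an object name or a single primitive letter)
                i = (descriptor.charAt(i) == 'L') ? descriptor.indexOf(';', i) + 1 : i + 1;
            } else {
                slots += 1; // I, Z, B, C, S, F
                i++;
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(slotsPopped("(Ljava/lang/String;)V")); // 2: System.out + the string
    }
}
```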
Finally:
return
returns from the method.
As a minor technical point, all class, method, and type names, constant strings, and so on are stored in a "constant pool", not in the bytecode. The bytecode actually refers to "constant #7 from the constant pool" or such. But it would be really tedious to write this way, so Jasmin does at least that much for us.
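Here is a toy Java sketch of the constant pool idea. Each distinct string gets an index the first time it is seen, and every later use refers back to that same index. Real class file constant pools are also 1-indexed, but they store typed entries (Utf8, String, Class, Methodref and so on). Treat this hypothetical ConstantPool class as a simplified model only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified constant pool: deduplicates strings and hands out indices.
final class ConstantPool {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> indexOf = new HashMap<>();

    // Returns the index for this constant, adding it only the first time it is seen.
    int add(String constant) {
        return indexOf.computeIfAbsent(constant, c -> {
            entries.add(c);
            return entries.size(); // 1-based, like the real class file constant pool
        });
    }

    // Looks a constant up by its 1-based index.
    String get(int index) {
        return entries.get(index - 1);
    }
}
```

So an instruction like ldc only has to carry a small number, and the same "java/lang/System" string is stored once no matter how many instructions use it.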
Loop
Let's do something slightly more complicated, a method that loops some number of times and prints numbers as it does so.
Here's a loop printing values 1 to 10:
.class public Loop
.super java/lang/Object
.method public static main([Ljava/lang/String;)V
.limit locals 1
.limit stack 2
iconst_1 ; push value 1 on stack
istore 0 ; save that to local variable #0
loop:
iload 0 ; push local #0 to stack
bipush 10 ; push byte value 10 on stack
if_icmpgt end_loop ; if local #0 > 10, goto end_loop
getstatic java/lang/System/out Ljava/io/PrintStream; ; push System.out on stack
iload 0 ; load local #0
invokevirtual java/io/PrintStream/println(I)V ; print it
iinc 0 1 ; increase local variable #0 by 1
goto loop
end_loop:
return
.end method
Let's give it a go:
$ jasmin Loop.j
Generated: Loop.class
$ java Loop
1
2
3
4
5
6
7
8
9
10
There are a few new opcodes:
- iconst_1 pushes 1 onto the stack. A few numbers are so common they got their own opcodes
- bipush 10 pushes 10 onto the stack. Somewhat bigger numbers that fit in a signed byte are pushed by bipush. There's also sipush for 16-bit values, and anything bigger is loaded from the constant pool
- istore 0 and iload 0 store and load local variables
- iinc 0 1 increments local variable #0 by 1. The increment can be negative as well, to decrement
- goto label jumps to the label
- if_icmpgt label pops two integers and jumps to the label if the one pushed first is greater than the one pushed second. Here that means: if local #0 > 10 (icmpgt stands for Integer CoMPare Greater Than)
- notice that the method we're calling changed from java/io/PrintStream/println(Ljava/lang/String;)V (print a string) to java/io/PrintStream/println(I)V (print an int). These are totally separate and unrelated methods as far as the JVM is concerned. In a JVM language like Java we'd just say println(...) and the compiler would figure out which of them we meant, but that needs to be disambiguated before the JVM gets to it
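To make the stack machine behavior concrete, here is a tiny Java interpreter for just the opcodes used in Loop.j. It's a hypothetical teaching sketch with a few simplifications. Instructions are plain string arrays, and labels are replaced by instruction indices by hand. The made-up mnemonic println_int stands for invokevirtual java/io/PrintStream/println(I)V, and it collects values into a list instead of printing. It uses the arrow-style switch, so it needs Java 14 or newer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical mini-interpreter for the handful of opcodes used in Loop.j.
final class MiniJvm {
    static final Object SYSTEM_OUT = new Object(); // stand-in for the System.out reference

    static List<Integer> run(String[][] code, int maxLocals) {
        Deque<Object> stack = new ArrayDeque<>();
        int[] locals = new int[maxLocals];
        List<Integer> printed = new ArrayList<>();
        int pc = 0; // index of the next instruction to execute
        while (true) {
            String[] ins = code[pc];
            pc++;
            switch (ins[0]) {
                case "iconst_1" -> stack.push(1);
                case "bipush" -> stack.push(Integer.parseInt(ins[1]));
                case "istore" -> locals[Integer.parseInt(ins[1])] = (Integer) stack.pop();
                case "iload" -> stack.push(locals[Integer.parseInt(ins[1])]);
                case "iinc" -> locals[Integer.parseInt(ins[1])] += Integer.parseInt(ins[2]);
                case "goto" -> pc = Integer.parseInt(ins[1]);
                case "if_icmpgt" -> {
                    int second = (Integer) stack.pop(); // top of stack, pushed last
                    int first = (Integer) stack.pop();  // below it, pushed first
                    if (first > second) pc = Integer.parseInt(ins[1]);
                }
                case "getstatic" -> stack.push(SYSTEM_OUT);
                case "println_int" -> { // models invokevirtual java/io/PrintStream/println(I)V
                    int value = (Integer) stack.pop();
                    if (stack.pop() != SYSTEM_OUT) throw new IllegalStateException("bad receiver");
                    printed.add(value);
                }
                case "return" -> { return printed; }
                default -> throw new IllegalArgumentException("unknown opcode " + ins[0]);
            }
        }
    }

    // Loop.j with labels replaced by instruction indices (loop = 2, end_loop = 10).
    static final String[][] LOOP = {
        {"iconst_1"}, {"istore", "0"},
        {"iload", "0"}, {"bipush", "10"}, {"if_icmpgt", "10"},
        {"getstatic"}, {"iload", "0"}, {"println_int"},
        {"iinc", "0", "1"}, {"goto", "2"},
        {"return"},
    };

    public static void main(String[] args) {
        System.out.println(run(LOOP, 1)); // prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```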
FizzBuzz
We now have everything we need to create FizzBuzz.
.class public FizzBuzz
.super java/lang/Object
.method public static main([Ljava/lang/String;)V
.limit locals 1
.limit stack 2
iconst_0 ; push value 0 on stack
istore 0 ; save that to local variable #0
loop:
iinc 0 1 ; increase local variable #0 by 1
iload 0 ; push local #0 to stack
bipush 100 ; push byte value 100 on stack
if_icmpgt end_loop ; if local #0 > 100, goto end_loop
iload 0
bipush 15
irem
ifeq fizzbuzz
iload 0
iconst_5
irem
ifeq buzz
iload 0
iconst_3
irem
ifeq fizz
print_number:
getstatic java/lang/System/out Ljava/io/PrintStream; ; push System.out on stack
iload 0 ; load local #0
invokevirtual java/io/PrintStream/println(I)V ; print it
goto loop
fizz:
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Fizz"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
goto loop
buzz:
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Buzz"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
goto loop
fizzbuzz:
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "FizzBuzz"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
goto loop
end_loop:
return
.end method
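For comparison, here is roughly the same control flow in Java. The checks come in the same order as the assembly: divisibility by 15 first, since a number divisible by 15 is also divisible by 3 and 5. This is a restructured sketch for reading side by side, not a line-by-line translation. The fizzBuzz helper returns a String, so every line goes through println(String), whereas the assembly prints plain numbers with println(I)V. The class name FizzBuzzJava is hypothetical, chosen so it won't clash with the FizzBuzz.class that Jasmin generates.

```java
final class FizzBuzzJava {
    // Same branch order as the ifeq jumps in FizzBuzz.j: 15, then 5, then 3.
    static String fizzBuzz(int i) {
        if (i % 15 == 0) return "FizzBuzz";
        if (i % 5 == 0) return "Buzz";
        if (i % 3 == 0) return "Fizz";
        return Integer.toString(i);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 100; i++) {
            System.out.println(fizzBuzz(i));
        }
    }
}
```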
The only new operations we'll need are:
- iconst_X opcodes go up to 5, so we can use the optimized iconst_3 and iconst_5, but for bigger numbers we need bipush 15
- irem is the % (remainder) operation
- ifeq and the other ifXX opcodes compare a single value with 0, while the if_icmpXX opcodes compare two integer values
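The choice between iconst_X, bipush, sipush and ldc comes down to the value's range. Here is a small Java sketch of that decision, based on the operand sizes of those instructions. It also covers iconst_m1 for -1, which this episode doesn't use. The method name pushInstruction is made up for illustration. javac makes this kind of choice when it compiles int constants, but this is not its actual code.

```java
// Hypothetical helper: picks the shortest standard instruction that pushes an int constant.
final class PushConstant {
    static String pushInstruction(int value) {
        if (value >= -1 && value <= 5) {
            // iconst_m1, iconst_0 ... iconst_5: the value is baked into the opcode itself
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value; // signed 8-bit operand
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value; // signed 16-bit operand
        }
        return "ldc " + value; // anything larger goes through the constant pool
    }
}
```

For the FizzBuzz values, this gives iconst_3, iconst_5, bipush 15 and bipush 100, matching what the assembly above uses.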
Fibonacci
Let's do the next usual thing and define Fibonacci, calculated recursively, with an equivalent of a public static int fib(int n) function.
.class public Fib
.super java/lang/Object
.method public static fib(I)I
.limit stack 3
iload_0
iconst_2
if_icmple small_fib
big_fib:
iload_0
iconst_1
isub
invokestatic Fib/fib(I)I ; push fib(i-1) to stack
iload_0
iconst_2
isub
invokestatic Fib/fib(I)I ; push fib(i-2) to stack
iadd
ireturn ; return fib(i-1) + fib(i-2)
small_fib:
iconst_1
ireturn ; return 1
.end method
.method public static main([Ljava/lang/String;)V
.limit locals 1
.limit stack 2
iconst_1 ; push value 1 on stack
istore 0 ; save that to local variable #0
loop:
iload 0 ; push local #0 to stack
bipush 30 ; push byte value 30 on stack
if_icmpgt end_loop ; if local #0 > 30, goto end_loop
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "fib("
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V ; print "fib("
getstatic java/lang/System/out Ljava/io/PrintStream;
iload 0 ; load local #0
invokevirtual java/io/PrintStream/print(I)V ; print i
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc ")="
invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V ; print ")="
getstatic java/lang/System/out Ljava/io/PrintStream; ; push System.out on stack
iload 0 ; load local #0
invokestatic Fib/fib(I)I
invokevirtual java/io/PrintStream/println(I)V ; print fib(i)
iinc 0 1 ; increase local variable #0 by 1
goto loop
end_loop:
return
.end method
$ jasmin Fib.j
Generated: Fib.class
$ java Fib
fib(1)=1
fib(2)=1
fib(3)=2
fib(4)=3
fib(5)=5
fib(6)=8
fib(7)=13
fib(8)=21
fib(9)=34
fib(10)=55
fib(11)=89
fib(12)=144
fib(13)=233
fib(14)=377
fib(15)=610
fib(16)=987
fib(17)=1597
fib(18)=2584
fib(19)=4181
fib(20)=6765
fib(21)=10946
fib(22)=17711
fib(23)=28657
fib(24)=46368
fib(25)=75025
fib(26)=121393
fib(27)=196418
fib(28)=317811
fib(29)=514229
fib(30)=832040
Let's go through it step by step:
- the main function has a loop, with various print(String), print(int), and println(int) calls in it
- invokestatic Fib/fib(I)I invokes the static function int fib(int) in class Fib, the one we're currently in
- inside fib we make recursive calls with invokestatic Fib/fib(I)I
- iload_0 pushes the method's first argument onto the stack (arguments just become local variables, so they share the same numbering). iload_0 is a shorter one-byte form of iload 0
- iadd and isub do integer addition and subtraction
- ireturn returns an integer
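For reading side by side, here is a Java sketch with the same logic as Fib.j. javac may encode or order some instructions differently, so don't expect javap output for it to match the Jasmin version exactly. The class name FibJava is hypothetical, chosen so it won't overwrite the Fib.class that Jasmin generates.

```java
final class FibJava {
    // Same logic as Fib/fib(I)I: n <= 2 returns 1, otherwise recurse twice.
    static int fib(int n) {
        if (n <= 2) return 1;
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 30; i++) {
            // three print calls and one println, as in the assembly main
            System.out.print("fib(");
            System.out.print(i);
            System.out.print(")=");
            System.out.println(fib(i));
        }
    }
}
```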
Java Disassembler javap
A popular related tool is the Java Disassembler javap, which ships with the JDK and can turn a .class file back into JVM assembly. Unfortunately javap and jasmin don't really agree on the details much:
$ javap -c Fib.class
Compiled from "Fib.j"
public class Fib {
public static int fib(int);
Code:
0: iload_0
1: iconst_2
2: if_icmple 19
5: iload_0
6: iconst_1
7: isub
8: invokestatic #21 // Method fib:(I)I
11: iload_0
12: iconst_2
13: isub
14: invokestatic #21 // Method fib:(I)I
17: iadd
18: ireturn
19: iconst_1
20: ireturn
public static void main(java.lang.String[]);
Code:
0: iconst_1
1: istore 0
3: iload 0
5: bipush 30
7: if_icmpgt 54
10: getstatic #15 // Field java/lang/System.out:Ljava/io/PrintStream;
13: ldc #32 // String fib(
15: invokevirtual #24 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
18: getstatic #15 // Field java/lang/System.out:Ljava/io/PrintStream;
21: iload 0
23: invokevirtual #29 // Method java/io/PrintStream.print:(I)V
26: getstatic #15 // Field java/lang/System.out:Ljava/io/PrintStream;
29: ldc #14 // String )=
31: invokevirtual #24 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
34: getstatic #15 // Field java/lang/System.out:Ljava/io/PrintStream;
37: iload 0
39: invokestatic #21 // Method fib:(I)I
42: invokevirtual #9 // Method java/io/PrintStream.println:(I)V
45: iinc_w 0, 1
51: goto 3
54: return
}
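As you can see, javap uses demangled names in its method headers, has explicit references to the constant pool by number, and shows some of the opcodes differently (like iinc_w 0, 1 vs iinc 0 1). Judging by the offsets (45 to 51, six bytes), Jasmin emitted the longer "wide" form of iinc there.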
Person Class
It would make no sense to end this episode without defining a small class. For this, let's just define Person with two string fields (name, surname) and one toString method. We'll also have a static main to test it.
I put comments inside the code. For non-static methods, this is passed as an extra first argument, so for a method with two arguments, the local variables from the JVM's point of view might look like this:
- local 0 - this
- local 1 - first argument
- local 2 - second argument
- local 3 - first local variable
- local 4 - second local variable
.class public Person
.super java/lang/Object
.field public name Ljava/lang/String;
.field public surname Ljava/lang/String;
.method public <init>(Ljava/lang/String;Ljava/lang/String;)V
.limit locals 4
.limit stack 4
; local 0 - this
; local 1 - argument name
; local 2 - argument surname
; call super(): run java/lang/Object/<init> on this
aload_0
invokespecial java/lang/Object/<init>()V
; this.name = argument_name
aload_0
aload_1
putfield Person/name Ljava/lang/String;
; this.surname = argument_surname
aload_0
aload_2
putfield Person/surname Ljava/lang/String;
return
.end method
.method public toString()Ljava/lang/String;
.limit locals 4
.limit stack 4
; local 0 - this
; push this.name
aload_0
getfield Person/name Ljava/lang/String;
; push " "
ldc " "
; call String.concat, getting: this.name + " "
invokevirtual java/lang/String/concat(Ljava/lang/String;)Ljava/lang/String;
; push this.surname
aload_0
getfield Person/surname Ljava/lang/String;
; call String.concat, getting: this.name + " " + this.surname
invokevirtual java/lang/String/concat(Ljava/lang/String;)Ljava/lang/String;
areturn
.end method
.method public static main([Ljava/lang/String;)V
.limit locals 4
.limit stack 4
; local Person a = new Person("Alice", "Smith")
new Person
dup
ldc "Alice"
ldc "Smith"
invokespecial Person/<init>(Ljava/lang/String;Ljava/lang/String;)V
astore_1
getstatic java/lang/System/out Ljava/io/PrintStream;
; push a.toString()
aload_1
invokevirtual Person/toString()Ljava/lang/String;
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
return
.end method
$ jasmin Person.j
Generated: Person.class
$ java Person
Alice Smith
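Here is a rough Java equivalent of Person.j for comparison. It is named PersonJava, a hypothetical name chosen so its class file won't overwrite the Person.class that Jasmin generated. toString uses String.concat twice, mirroring the two invokevirtual concat calls in the assembly, rather than the + operator, which javac may compile differently.

```java
final class PersonJava {
    public String name;
    public String surname;

    public PersonJava(String name, String surname) {
        super(); // the explicit invokespecial java/lang/Object/<init>()V in the assembly
        this.name = name;       // aload_0, aload_1, putfield
        this.surname = surname; // aload_0, aload_2, putfield
    }

    @Override
    public String toString() {
        // this.name + " " + this.surname, built with two concat calls
        return name.concat(" ").concat(surname);
    }

    public static void main(String[] args) {
        PersonJava a = new PersonJava("Alice", "Smith");
        System.out.println(a.toString()); // Alice Smith
    }
}
```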
Should you use JVM Assembly?
It's meant for human use even less than regular assembly, so definitely not.
There's an additional problem. Regular assembly and LLVM assembly have a fully supported standard format, but Jasmin is a third-party program, and different JVM assemblers and disassemblers disagree on many things. There are also some newer assemblers and disassemblers, like Krakatau, that you could try instead. Krakatau has a different syntax from both Jasmin and javap.
It could be helpful to have a general idea of how it works if you're developing a new language for the JVM, but that's about it.
Another way to familiarize yourself with JVM assembly is the Godbolt Compiler Explorer site. It just compiles your language (Java, Kotlin etc.) and runs javap on the output, so you can do the same locally.
Code
All code examples for the series will be in this repository.
Code for the JVM Assembly with Jasmin episode is available here.