100 Languages Speedrun: Episode 77: JVM Assembly with Jasmin

The Java Virtual Machine (JVM) works in a weird way. First there's some source code in Java or Kotlin or whatnot. It gets compiled into JVM bytecode, which I'll call JVM Assembly here, as an intermediate form. Then at runtime that gets interpreted or compiled into actual executable machine code.

In a way, that JVM Assembly level is not strictly necessary: Android, for example, uses a different flow, and not even the same one between versions.

Anyway, let's see what the classic JVM Assembly for a regular JVM looks like. The JDK doesn't include an assembler for human-readable assembly, so for that I'll be using Jasmin. There are a few other JVM assembly programs with slight differences, but none of the differences matter for our simple use case.

Hello, World!

It's going to be helpful if you know some basics about JVM - either Java or one of the other JVM languages. But if not, I'll still try to explain everything step by step.

Let's start with Hello, World! Here's Hello.j:

.class public Hello
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
    .limit locals 1
    .limit stack 2

    getstatic java/lang/System/out Ljava/io/PrintStream;
    ldc "Hello, World!"
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V

    return

.end method

We can compile it with jasmin into Hello.class:

$ jasmin Hello.j
Generated: Hello.class

And then run by passing the class name to java:

$ java Hello
Hello, World!

In case it's confusing, the java command is not about Java the language, it's about the Java Virtual Machine only. The javac command deals with Java the language.

So what's going on? Let's first look at the structure of the Hello.j file:

  • code needs to be in a class, and class generally should match the file name - for Hello.j we start by defining .class public Hello
  • every class has a superclass, and the default one is Object, internally known as java/lang/Object
  • we then define a method main - this is what will run when you run the class from the command line. .method public static methodname ... .end method is a method definition, and inside it are the instructions that will run when the method runs
  • if there were multiple methods, we'd have multiple .method ... .end method blocks
  • declaring a method .method public static means it's a class method, and is not bound to any specific instance
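For comparison, here is a rough Java equivalent of Hello.j. It's only a sketch: javac would also generate a default constructor, which Hello.j doesn't define, and the class is left without the public modifier here just to keep the snippet independent of its file name.

```java
class Hello {
    // Same signature as .method public static main([Ljava/lang/String;)V
    public static void main(String[] args) {
        // getstatic System.out, ldc "Hello, World!", invokevirtual println(String)
        System.out.println("Hello, World!");
        // the implicit end of a void method is the "return" instruction
    }
}
```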

Name Mangling

So far that makes sense. The first question you might have is what the hell are those names:

  • main([Ljava/lang/String;)V
  • Ljava/io/PrintStream;
  • java/io/PrintStream/println(Ljava/lang/String;)V

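That's the "name mangling" (the JVM specification calls these type strings "descriptors"). In Java and a few other languages, you can have multiple functions or methods with the same name, as long as their argument types are different, so the types have to be part of the name to tell them apart.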

I'm not sure what's the easiest way to figure out the "mangled" name.

For example main([Ljava/lang/String;)V means void main(java.lang.String[]);:

  • name main goes first
  • then ( starts argument list
  • [ means array of
  • Ljava/lang/String; means object of type java.lang.String - the semicolon is there to show where the name finishes
  • ) closes arguments list
  • V after it means it returns void, that is nothing

The descriptor syntax itself is defined by the JVM, so it's the same for every language. Other JVM languages use Java-compatible mangling for the kinds of things Java supports, but have to come up with their own name mangling schemes for any extra features they need. Inside class files these mangled names are all just strings, but both Jasmin and the JVM parse them to set everything up properly, so we need to follow these rules here.
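One way to get the mangled name without working it out by hand is to ask the JDK. The sketch below uses java.lang.invoke.MethodType, which can print the descriptor for a given return type and argument types. The DescriptorDemo class and its descriptor helper are made-up names for this example; the method name still has to be put in front of the descriptor manually.

```java
import java.lang.invoke.MethodType;

final class DescriptorDemo {
    // Builds the JVM descriptor for a method with the given return and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void main(String[])
        System.out.println("main" + descriptor(void.class, String[].class));
        // void println(String)
        System.out.println("println" + descriptor(void.class, String.class));
        // a method taking an int and returning an int
        System.out.println("f" + descriptor(int.class, int.class));
    }
}
```

From the command line, javap -s prints the same descriptors for every method of an already compiled class.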

Inside the Hello, World! method

OK, let's look inside the method.

First we start with some .limit definitions.

    .limit locals 1
    .limit stack 2

These specify how many local variable slots the method has, and how much stack space it needs at most. The JVM is a stack machine, so most of the instructions push things onto the stack, or pop them off the stack to perform various operations.

I'm reasonably sure Jasmin could do this calculation automatically for us - because JVM sure does and will refuse to load a class if you specify numbers that are too low.
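To see where .limit stack 2 comes from, here is a small sketch that tracks stack depth through the Hello, World! method. It only handles straight-line code, with each instruction reduced to its net push/pop effect; the real JVM verifier does a proper analysis that also follows branches. StackDepth and maxDepth are names invented for this example.

```java
final class StackDepth {
    // Returns the highest stack depth reached when applying net stack effects in order.
    static int maxDepth(int... effects) {
        int depth = 0;
        int max = 0;
        for (int effect : effects) {
            depth += effect;
            if (depth < 0) {
                throw new IllegalStateException("stack underflow");
            }
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        // getstatic System.out          +1 (pushes the PrintStream reference)
        // ldc "Hello, World!"           +1 (pushes the String reference)
        // invokevirtual println(String) -2 (pops receiver and argument, returns void)
        // return                         0
        System.out.println(maxDepth(+1, +1, -2, 0)); // prints 2
    }
}
```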

Now we need to push some values on stack:

    getstatic java/lang/System/out Ljava/io/PrintStream;
    ldc "Hello, World!"

getstatic gets a static field of a certain type - in this case java.lang.System.out of type java.io.PrintStream - and pushes a reference to it on top of the stack.

Then ldc "Hello, World!" gets a constant value and pushes reference to it on top of the stack.

Then we call a method:

    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V

invokevirtual calls a virtual method - that is one that can potentially be overridden by a subclass. We need to specify the full mangled name of the method we're calling. The JVM knows how many arguments from the top of the stack it will use based on the type signature.

Finally:

return returns from the method.

For a minor technical point, all class, method, and type names, constant strings, and so on, are stored in a "constant pool" not in the bytecode. The bytecode actually refers to "constant #7 from the constant pool" or such. But it would be really tedious to write this way, so Jasmin does at least that much for us.

Loop

Let's do something slightly more complicated, a method that loops some number of times and prints numbers as it does so.

Here's a loop printing values 1 to 10:

.class public Loop
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
    .limit locals 1
    .limit stack 2

    iconst_1 ; push value 1 on stack
    istore 0 ; save that to local variable #0

loop:

    iload 0   ; push local #0 to stack
    bipush 10 ; push byte value 10 on stack
    if_icmpgt end_loop ; if local #0 > 10, goto end_loop

    getstatic java/lang/System/out Ljava/io/PrintStream; ; push System.out on stack
    iload 0 ; load local #0
    invokevirtual java/io/PrintStream/println(I)V ; print it

    iinc 0 1 ; increase local variable #0 by 1

    goto loop

end_loop:

    return

.end method

Let's give it a go:

$ jasmin Loop.j
Generated: Loop.class
$ java Loop
1
2
3
4
5
6
7
8
9
10
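In Java terms the loop looks roughly like the sketch below, written with while (true) and break so that each line maps onto the labels and jumps in Loop.j. One difference to keep in mind: javac would normally keep args in local 0 and put i in local 1, while Loop.j simply reuses local 0 for the counter.

```java
class Loop {
    public static void main(String[] args) {
        int i = 1;                  // iconst_1, istore 0
        while (true) {              // loop:
            if (i > 10) {           // iload 0, bipush 10, if_icmpgt end_loop
                break;
            }
            System.out.println(i);  // getstatic, iload 0, invokevirtual println(I)V
            i++;                    // iinc 0 1
        }                           // goto loop
    }                               // end_loop: return
}
```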

There are a few new opcodes:

  • iconst_1 - pushes 1 on stack - a few numbers are so common they got their own opcodes
  • bipush 10 - pushes 10 on stack - slightly bigger numbers that fit in 8 bits are pushed by bipush, there's also sipush for 16-bit numbers - anything bigger is loaded from the constant pool
  • istore 0 and iload 0 - store and load local variables
  • iinc 0 1 - increment local variable #0 by 1 - can be negative as well to decrement
  • goto label - jump to label
  • if_icmpgt label - pops the top two values and jumps to label if the first one pushed is greater than the second (icmpgt for Integer CoMPare Greater Than)
  • notice that the method we're calling changed from java/io/PrintStream/println(Ljava/lang/String;)V (print a string) to java/io/PrintStream/println(I)V (print an int) - these are totally separate and unrelated methods as far as JVM is concerned; in every JVM language we'd just say println(...) and it would figure it out for us which of them we meant; but that needs to be disambiguated before JVM gets to it
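The rules for pushing integer constants can be summed up in a few lines. This is a sketch of how an assembler or compiler might choose the shortest instruction, not code from Jasmin itself; PushOpcode and forConstant are made-up names.

```java
final class PushOpcode {
    // Picks the shortest instruction that pushes the given int constant.
    static String forConstant(int value) {
        if (value == -1) {
            return "iconst_m1";
        }
        if (value >= 0 && value <= 5) {
            return "iconst_" + value;   // dedicated one-byte opcodes
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;   // signed 8-bit operand
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;   // signed 16-bit operand
        }
        return "ldc " + value;          // loaded from the constant pool
    }

    public static void main(String[] args) {
        for (int value : new int[] {1, 10, 100, 1000, 100000}) {
            System.out.println(forConstant(value));
        }
    }
}
```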

FizzBuzz

We now have everything we needed to create FizzBuzz.

.class public FizzBuzz
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
    .limit locals 1
    .limit stack 2

    iconst_0 ; push value 0 on stack
    istore 0 ; save that to local variable #0

loop:

    iinc 0 1 ; increase local variable #0 by 1

    iload 0   ; push local #0 to stack
    bipush 100 ; push byte value 100 on stack
    if_icmpgt end_loop ; if local #0 > 100, goto end_loop

    iload 0
    bipush 15
    irem
    ifeq fizzbuzz

    iload 0
    iconst_5
    irem
    ifeq buzz

    iload 0
    iconst_3
    irem
    ifeq fizz

print_number:

    getstatic java/lang/System/out Ljava/io/PrintStream; ; push System.out on stack
    iload 0 ; load local #0
    invokevirtual java/io/PrintStream/println(I)V ; print it
    goto loop

fizz:

    getstatic java/lang/System/out Ljava/io/PrintStream;
    ldc "Fizz"
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
    goto loop

buzz:

    getstatic java/lang/System/out Ljava/io/PrintStream;
    ldc "Buzz"
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
    goto loop

fizzbuzz:

    getstatic java/lang/System/out Ljava/io/PrintStream;
    ldc "FizzBuzz"
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
    goto loop

end_loop:

    return

.end method

The only new operations we'll need are:

  • iconst_X opcodes go up to 5 so we can use optimized iconst_3 and iconst_5 - but for bigger numbers we need bipush 15
  • irem is % operation
  • ifeq and other ifXX opcodes compare with 0, if_icmpXX opcodes compare two integer values
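The control flow is easier to follow next to a Java sketch with the same shape. Each continue plays the role of goto loop, which is also why the assembly increments the counter at the top of the loop: every branch jumps back to the same label. This is a hand-written approximation, not what javac would emit for an idiomatic for loop.

```java
class FizzBuzz {
    public static void main(String[] args) {
        int i = 0;                              // iconst_0, istore 0
        while (true) {                          // loop:
            i++;                                // iinc 0 1
            if (i > 100) {                      // bipush 100, if_icmpgt end_loop
                break;
            }
            if (i % 15 == 0) {                  // bipush 15, irem, ifeq fizzbuzz
                System.out.println("FizzBuzz");
                continue;                       // goto loop
            }
            if (i % 5 == 0) {                   // iconst_5, irem, ifeq buzz
                System.out.println("Buzz");
                continue;
            }
            if (i % 3 == 0) {                   // iconst_3, irem, ifeq fizz
                System.out.println("Fizz");
                continue;
            }
            System.out.println(i);              // print_number:
        }
    }
}
```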

Fibonacci

Let's do the next usual thing, and define Fibonacci, calculated recursively with an equivalent of a public static int fib(int n) function.

.class public Fib
.super java/lang/Object

.method public static fib(I)I
    .limit stack 3
    iload_0
    iconst_2
    if_icmple small_fib

big_fib:
    iload_0
    iconst_1
    isub
    invokestatic Fib/fib(I)I  ; push fib(i-1) to stack

    iload_0
    iconst_2
    isub
    invokestatic Fib/fib(I)I  ; push fib(i-2) to stack

    iadd
    ireturn ; return fib(i-1) + fib(i-2)

small_fib:
    iconst_1
    ireturn ; return 1
.end method

.method public static main([Ljava/lang/String;)V
    .limit locals 1
    .limit stack 2

    iconst_1 ; push value 1 on stack
    istore 0 ; save that to local variable #0

loop:

    iload 0   ; push local #0 to stack
    bipush 30 ; push byte value 30 on stack
    if_icmpgt end_loop ; if local #0 > 30, goto end_loop

    getstatic java/lang/System/out Ljava/io/PrintStream;
    ldc "fib("
    invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V ; print "fib("

    getstatic java/lang/System/out Ljava/io/PrintStream;
    iload 0 ; load local #0
    invokevirtual java/io/PrintStream/print(I)V ; print i

    getstatic java/lang/System/out Ljava/io/PrintStream;
    ldc ")="
    invokevirtual java/io/PrintStream/print(Ljava/lang/String;)V ; print ")="

    getstatic java/lang/System/out Ljava/io/PrintStream; ; push System.out on stack
    iload 0 ; load local #0
    invokestatic Fib/fib(I)I
    invokevirtual java/io/PrintStream/println(I)V ; print fib(i)

    iinc 0 1 ; increase local variable #0 by 1

    goto loop

end_loop:

    return

.end method

$ jasmin Fib.j
Generated: Fib.class
$ java Fib
fib(1)=1
fib(2)=1
fib(3)=2
fib(4)=3
fib(5)=5
fib(6)=8
fib(7)=13
fib(8)=21
fib(9)=34
fib(10)=55
fib(11)=89
fib(12)=144
fib(13)=233
fib(14)=377
fib(15)=610
fib(16)=987
fib(17)=1597
fib(18)=2584
fib(19)=4181
fib(20)=6765
fib(21)=10946
fib(22)=17711
fib(23)=28657
fib(24)=46368
fib(25)=75025
fib(26)=121393
fib(27)=196418
fib(28)=317811
fib(29)=514229
fib(30)=832040

Let's go through it step by step:

  • the main function has a loop, with various print(String), print(int), and println(int) calls in it
  • invokestatic Fib/fib(I)I invokes static function int fib(int) in class Fib - the one we're currently in
  • inside fib we do recursive calls to invokestatic Fib/fib(I)I
  • iload_0 pushes method's first argument to stack (arguments just become local variables, so they share same numbers)
  • iadd and isub do integer addition and subtraction
  • ireturn returns an integer
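Here is the same program as a Java sketch, with the matching instructions noted in comments. The main loop is written as an ordinary for loop, so javac's output for it would differ in details from the hand-written assembly.

```java
class Fib {
    public static int fib(int n) {
        if (n <= 2) {                       // iload_0, iconst_2, if_icmple small_fib
            return 1;                       // iconst_1, ireturn
        }
        return fib(n - 1) + fib(n - 2);     // two invokestatic calls, iadd, ireturn
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 30; i++) {
            System.out.print("fib(");       // print(Ljava/lang/String;)V
            System.out.print(i);            // print(I)V
            System.out.print(")=");         // print(Ljava/lang/String;)V
            System.out.println(fib(i));     // invokestatic fib, then println(I)V
        }
    }
}
```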

Java Disassembler javap

A popular related tool is the Java Disassembler javap, which ships with the JDK and can turn a .class file back into JVM assembly. Unfortunately javap and jasmin don't really agree on the details much:

$ javap -c Fib.class
Compiled from "Fib.j"
public class Fib {
  public static int fib(int);
    Code:
       0: iload_0
       1: iconst_2
       2: if_icmple     19
       5: iload_0
       6: iconst_1
       7: isub
       8: invokestatic  #21                 // Method fib:(I)I
      11: iload_0
      12: iconst_2
      13: isub
      14: invokestatic  #21                 // Method fib:(I)I
      17: iadd
      18: ireturn
      19: iconst_1
      20: ireturn

  public static void main(java.lang.String[]);
    Code:
       0: iconst_1
       1: istore        0
       3: iload         0
       5: bipush        30
       7: if_icmpgt     54
      10: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
      13: ldc           #32                 // String fib(
      15: invokevirtual #24                 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
      18: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
      21: iload         0
      23: invokevirtual #29                 // Method java/io/PrintStream.print:(I)V
      26: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
      29: ldc           #14                 // String )=
      31: invokevirtual #24                 // Method java/io/PrintStream.print:(Ljava/lang/String;)V
      34: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
      37: iload         0
      39: invokestatic  #21                 // Method fib:(I)I
      42: invokevirtual #9                  // Method java/io/PrintStream.println:(I)V
      45: iinc_w        0, 1
      51: goto          3
      54: return
}

As you can see, javap uses demangled names, it has explicit references to constant pool by number, and some of the opcodes are different (like iinc_w 0, 1 vs iinc 0 1).

Person Class

It would make no sense to end this episode without defining a small class. For this let's just define Person with two string fields (name, surname), and one toString method. We'll also have static main to test it.

I put comments inside the code. For non-static methods this is passed as extra first argument, so local variables from JVM point of view might look like this:

  • local 0 - this
  • local 1 - first argument
  • local 2 - second argument
  • local 3 - first local variable
  • local 4 - second local variable

.class public Person
.super java/lang/Object

.field public name Ljava/lang/String;
.field public surname Ljava/lang/String;

.method public <init>(Ljava/lang/String;Ljava/lang/String;)V
  .limit locals 4
  .limit stack 4
  ; local 0 - this
  ; local 1 - argument name
  ; local 2 - argument surname

  ; call the superclass constructor: super()
  aload_0
  invokespecial java/lang/Object/<init>()V

  ; this.name = argument_name
  aload_0
  aload_1
  putfield Person/name Ljava/lang/String;

  ; this.surname = argument_surname
  aload_0
  aload_2
  putfield Person/surname Ljava/lang/String;

  return
.end method

.method public toString()Ljava/lang/String;
  .limit locals 4
  .limit stack 4
  ; local 0 - this

  ; push this.name
  aload_0
  getfield Person/name Ljava/lang/String;

  ; push " "
  ldc " "

  ; call String.concat, getting: this.name + " "
  invokevirtual java/lang/String/concat(Ljava/lang/String;)Ljava/lang/String;

  ; push this.surname
  aload_0
  getfield Person/surname Ljava/lang/String;

  ; call String.concat, getting: this.name + " " + this.surname
  invokevirtual java/lang/String/concat(Ljava/lang/String;)Ljava/lang/String;

  areturn
.end method

.method public static main([Ljava/lang/String;)V
  .limit locals 4
  .limit stack 4

  ; local Person a = new Person("Alice", "Smith")
  new Person
  dup
  ldc "Alice"
  ldc "Smith"
  invokespecial Person/<init>(Ljava/lang/String;Ljava/lang/String;)V
  astore_1

  getstatic java/lang/System/out Ljava/io/PrintStream;

  ; push a.toString()
  aload_1
  invokevirtual Person/toString()Ljava/lang/String;

  invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V

  return

.end method

$ jasmin Person.j
Generated: Person.class
$ java Person
Alice Smith
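As a cross-check, this is roughly the Java class that Person.j corresponds to. It's a sketch: toString calls String.concat explicitly to match the assembly, whereas name + " " + surname in Java source would be compiled differently by javac.

```java
class Person {
    public String name;       // .field public name Ljava/lang/String;
    public String surname;    // .field public surname Ljava/lang/String;

    public Person(String name, String surname) {
        // javac inserts the super() call (invokespecial java/lang/Object/<init>()V) here
        this.name = name;         // aload_0, aload_1, putfield Person/name
        this.surname = surname;   // aload_0, aload_2, putfield Person/surname
    }

    @Override
    public String toString() {
        // getfield name, ldc " ", concat, getfield surname, concat, areturn
        return name.concat(" ").concat(surname);
    }

    public static void main(String[] args) {
        Person a = new Person("Alice", "Smith");   // new, dup, ldc, ldc, invokespecial <init>
        System.out.println(a.toString());          // invokevirtual Person/toString
    }
}
```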

Should you use JVM Assembly?

It's meant for human use even less than regular assembly, so definitely not.

There's an additional problem: unlike regular assembly or LLVM assembly, where there's some fully supported standard format, Jasmin is a third-party program, and different JVM assemblers and disassemblers disagree on many things. There are also some newer assemblers and disassemblers like Krakatau you could try instead. Krakatau has a different syntax than Jasmin or javap.

It could be helpful to have a general idea how it works if you're developing a new language for the JVM, but that's about it.

Another way to familiarize yourself with JVM assembly is the Godbolt Compiler Explorer site, but that just compiles your language (Java, Kotlin, etc.) and runs javap on the output, so you can do it locally too.

Code

All code examples for the series will be in this repository.

Code for the JVM Assembly with Jasmin episode is available here.