For this project, you will be completing the implementation of the Hawkbeans Java Virtual Machine. Namely, you will be completing this project in four parts by implementing components of the JVM that deal with:
The goals of this project are to:
While working on Hawkbeans, you will no doubt notice many similarities in the code structure to the emulator you previously completed. You should leverage the wisdom you gained from that project when making design decisions for this one. You will also identify some major differences; some coding tasks here are more open-ended than others, so you will need to think creatively.
Keep in mind that this project is still in its experimental stages. If we encounter bugs or hiccups (likely!), we will work through them together.
Hawkbeans is a small, fairly limited JVM written by your instructor. At this point, simple Java programs will run, but anything involving significant Java runtime support or double precision floating point math will likely not work. This is partly by design; this JVM will eventually be ported to the Hawknest 6502v emulator, which you previously completed. Below is a list of features not currently supported by this JVM:
You will first need to download the Hawkbeans skeleton code. As in the first project, you'll really want to be working on a Linux box or VM. First clone the skeleton code:
This will fetch the code from a public git repo.
$> git clone https://github.com/khale/hawkbeans-skeleton.git
make. This builds the actual JVM. You will also want to build the Java standard libraries which Hawkbeans will use for runtime support. These standard libraries have been borrowed from the TinyVM project:
This will create two new directories,
$> make jlibs
java, both of which contain several class files. You shouldn't worry about how these are used for now.
The skeleton code you're given will compile and run, but will not successfully run a Java program yet. Your job is to get it to that point. In order to do that, you'll need to implement some major missing functionality.
To see how to use Hawkbeans, you can run it with the
option, or without any arguments. In general, you should invoke it like
This is exactly the same way you run Java programs using the
$> bin/hawkbeans-clang-debug SomeClassFile
javacommand. There are a few options that are worth mentioning in more detail. The
-Hoption will allow you to set the JVM's heap size manually. The default heap size is 1MB, which is pretty small. You may need to change this when you're testing your garbage collector later on. The
-toption allows you to see stats gathered by the garbage collector, i.e. how many objects were collected, how much heap space was recovered, how long GC took etc. By default, the GC will collect garbage every 20ms. You can change this manually by providing a number in ms using the
-ioption performs like it did with Hawknest, see the Debugging section for more details.
Java class files are laid out in a well-structured format which makes them relatively easy to parse. A major component of any JVM is a linker/loader component which loads these class files into memory, parses them, and creates internal representations of the data they encode. Every reference to a constant value, a class, or a method in a Java program is encoded in the class file symbolically by the Java compiler. Essentially this means that each such reference consists of a pointer to a descriptor which gives enough information on how to resolve it. As an example, a method reference will simply be encoded as a number which is to be interpreted as an offset into the constant pool. A look up to that constant pool entry will discover a pointer to a descriptor that has more such constant pool offsets. One will be an offset to a String descriptor which contains the bytes for the UTF-8 encoded method name, and the other will be an offset to another String descriptor which describes the method's parameters and return value using a character-based encoding. One thing to note is that this is an analog to how dynamic linking is done for traditional compiled programs.
two approaches for resolving these symbolic references. We can either resolve them
eagerly when we load and parse the class, or we can resolve them lazily,
i.e. not until they're used. The former method will require more work up
front, and has the drawback that classes that aren't actually used may be
loaded into memory. The latter is our technique of choice. That is, we will
resolve these references the first time they're actually used. This means
that the various instructions that rely on symbolic references (e.g.
getfield, etc.) will not work correctly
until symbolic resolution is complete.
Your first task is to complete method and class resolution in Hawkbeans.
This will involve filling in the implementations for two functions in
latter relies on the former, so start with
hb_resolve_class. You'll want to use the Oracle JVM docs
resolution and method
resolution as references when writing your implementation.
In general, when a constant pool entry is resolved, the implementer of
the JVM is given flexibility on what this means in practice. For our
purposes, the pointer to the
constant info structure (whether it be for a class, a method, a field,
etc.) is replaced with an internal pointer to a structure representing
the appropriate entry. To indicate that the entry is resolved, we rely
on some insider information about virtual addresses and set the
highest bit of the pointer's address to 1 (sometimes this is called a tagged
pointer. For example, for classes, this means that
CONST_Class_info_t* is replaced with
java_class_t*. For methods, it means
CONST_Methodref_info_t* is replaced with
method_info_t* (both structs are defined in
This function is used by other parts of the JVM to resolve symbolic
references to classes, for example to access their fields or methods. It
relies on an existing class loader (there is currently only one, in
jvm/arch/x64-linux/bootstrap_loader.c). The goal here is
to transform a constant pool entry (a pointer to a descriptor) into
a pointer to our internal representation of classes (see the definition
Note that this function takes two arguments, an index into the constant pool and a pointer to the source class. This pointer should correspond to the currently used class, i.e. the one that was specified at the command line. Your code should do the following:
CONSTANT_Class_info_tstruct, which is defined in
MARK_RESOLVED()macros). If it's already resolved, you should mask off the high-order bit and return the class pointer directly.
hb_get_const_str()function. You can lookup in the classmap using
hb_init_class(), all defined in
Method resolution is slightly more subtle, as what we are doing here is
implementing the machinery to support polymorphism! We cannot simply
resolve a method by just its name. We also have to consider on which
Object it was invoked and its method signature, i.e. the number
and types of its arguments. This is why a descriptor for a method
CONSTANT_Methodref_info_t) contains two constant pool
offsets, one for the name of the method, and another for its signature.
The general flow of this function is that we should first look and see
which class the method in the method descriptor (determined by its
class_idx) belongs to. This class may or may not need
to be resolved, hence the order of implementation. Once it is resolved,
we iterate through each method descriptor of that class and try to
match it to the descriptor given to us (namely, do both the method name
and method parameter descriptor match?). If so, we're done, and we
set the constant pool entry to resolved and return the
method_info_t*. Otherwise, we cannot just fail. We now must
search up the class hierarchy to see if the method is implemented by
this class's superclasses. This should give you a hint about
the algorithmic technique you choose.
Note that this method adds an extra argument, the
target_cls argument. This should help with
the above hint. The initial call to
hb_resolve_method() should provide
this argument. If it is not, it means we've failed the initial search
on the method's attached class, and we're now trying to trace back
through its superclasses.
Note, you can follow the instructions here, but you don't need to worry about implementing access permissions or exceptions. You really only need to implement the steps outlined in point (2).
A significant component of any JVM (or any other VM architecture which implements a virtual ISA for that matter) is an interpreter, in this case a Java bytecode interpreter. The purpose of this component of the JVM is to decode instructions based on their opcode, and dispatch handler functions for each opcode. This should sound very familiar to you after writing your 6502 emulator. One major difference here is that your instruction handlers will need to interact much more closely with other parts of the JVM (namely, class loading, resolution, memory management, exception throwing, etc.). Another is that because the JVM is a stack machine, you do not have to deal with registers, making interpretation somewhat easier.
A significant portion of the bytecode interpreter is provided for you.
You can find this implementation in
src/bc_interp.c. Note that
interpretation is only done within the context of a method! That is,
there is no way to execute bytecode in this JVM without invoking a Java
method. This is a very important distinction from other types of virtual machines.
Your task here is to add some important functionality to the interpreter. Namely, you will complete the implementations of the following functions:
You should use the Java ISA reference to complete the implementation of these instructions. I will not give you much more information here, but some existing support functions in the Hawkbeans codebase that you will want to leverage include:
pop_val()functions. These are used to manipulate the current frame's operand stack.
hb_resolve_class()you will need for
gc_obj_allocyou will also need for
bc_interp.cyou can access the current stack frame using
cur_threadis a globally scoped pointer that refers to the currently executing thread (remember, there is only one at this point).
Note that in some cases you might need to look at the C structure
representing a Java object. This is the
include/class.h. You will need this at least
One pitfall to avoid here is to be very careful with the types used by
the elements in a
var_t (defined in
include/class.h). For example, you may have code like:
This is actually a bug. The result will actually end up being interpreted as
var_t x; x.int_val = 1; x.int_val = -x.int_val;
0xffffffffinstead of just
int_valelement of a
var_tis actually defined as an unsigned type (for various reasons dealing with class loading). Be careful of this when writing your integer arithmetic routines!
Some of the required actions for these instructions have similarities that might point towards using macros to avoid repeating code (see DRY).
No JVM is complete without exceptions. If you've ever had to write complex error handling code in C using gotos and labels, you will appreciate just how powerful the abstraction of software exceptions is. In this part, you will add exception support to Hawkbeans. There is a skeleton in place for you; you must just fill in the exception interface and the missing usage of it in the bytecode interpreter.
The interface to exceptions used by the rest of the JVM is defined
include/exceptions.h. The two main functions are:
hb_throw_exception(): Used to throw an exception given a reference to an existing Exception object (which ultimately derives from the
Throwableclass). This is the function used by the JVM when a programmer explicitly throws an exception (see
testcode/TestNullPointerThrow.java). This ultimately turns into the execution of the
athrowbytecode instruction (which you will also implement).
hb_throw_and_create_excp(): Used to throw an exception without an object reference. This is used when the JVM throws exceptions indirectly caused by the execution of some instruction. For example, if an access to an array element with an index past the length of an array is attempted, the JVM would invoke this method with the
EXCP_NULL_PTRargument (defined in
Your task here is to implement
src/exceptions.c, as well as the handler for the
athrow instruction in
I'm not going to give you too much instruction here, but the general
principle is that you need to scan the exception table for method
corresponding to the current frame and look for a matching exception
entry. If the entry is found, we invoke the handler for that entry. If
not, we pop a frame and continue searching. If we reach the base frame
and still have not found a matching exception table entry, it means that
no method explicitly provided a handler for the exception that was
thrown, so we must kill the current thread and exit. You can expect that
if the current code (which threw the exception) has a
clause somewhere, you should be finding its exception table within the
scope of that method.
Note, you'll want to look at and leverage these functions which already exist:
You'll notice that I've given you less help in terms of skeleton code here. This is intentional. Reading the documentation is your best bet here. You'll probably want to break down your implementation into several support functions that handle each subtask of exception throwing rather than trying to throw it all into one function.
With the current state of Hawkbeans, if you were to write a Java program
that created many objects or arrays, you would see an
OutOfMemoryException thrown pretty quickly. This is mostly
because the default heap size is so small. But also because in Java,
there's no way to explicitly
free() objects that we've
allocated. This is because the JVM spec requires that there be some
automatic memory management component present in the JVM, namely a garbage
The interesting thing is that the JVM spec does not give any prescriptions on the nature of the GC, only that it must be present. Therefore, we have significant flexibility when implementing our GC. For this part, I will describe a simple GC scheme which you will implement.
You will implement a precise garbage collector using the simple but
effective Mark-and-Sweep algorithm which we discussed in class. Recall
that a precise collector knows what a reference looks like
vs. some other primitive data type. If we don't have this information, GC
becomes a bit harder because we have to figure out which things are
references and which are not. For example, everything we're interested in
scanning for GC in Hawkbeans will be a
var_t. Our goal is to
figure out whether a given
var_t is a reference or just some
primitive type, e.g. an
int. It is not enough just to look at
the bits to tell the difference. One strategy is that we could add some
information to a
var_t which would tag the type. This takes up
more memory, however, and complicates the rest of the JVM. We will use a
Our GC will use a reference table to track live object references in the JVM. This is essentially a hashtable which uses the address of a reference as its key. When objects are allocated, and new references to them are created, those references will be inserted into this hashtable. The GC is invoked periodically by the bytecode interpreter after a predetermined amount of time has passed. When the GC is invoked, a set of root objects are scanned. The GC must look at every variable in each root object and check if it exists in the reference table. If it does, it will mark the object the reference refers to as live and will then scan that object for more references. This process continues until no more references are found. This is essentially the mark phase of the GC algorithm.
After all root objects have been traversed, the sweep phase will iterate through the reference table, and see which references were still around during the mark phase. Those which were not are now considered garbage and must be collected. The details are discussed below.
There is significantly more skeleton code for you to work with for this
part. You should start by taking a look at
src/gc.c. This is where the GC is initialized. The
important bits here that you should look at are the calls to
init_ref_tbl(). Each call
add_root() is registering a root object which the GC
shall scan during the mark phase. Each root is associated with
a callback function for scanning that particular root. Your job will
be to implement these scanning functions. The framework for them
is there, you just must fill them in.
Here are the important roots:
Note that the framework for invoking the scanning functions is already
in place (see
You must first finish the following functions:
clear_ref_tbl()function. That way reference that are not live will be in that state. When you scan your roots, you should set any references you find to the state
mark_ref()function. You can determine that a variable is indeed a reference using the
One programming idiom you will see here that you might not recognize is
the pattern involving opaque pointers and callback functions. The
scan_X functions are considered callbacks because we pass
them as function pointers to the
function at initialization time. When the GC is invoked, it will call
each of these functions in turn. This pattern makes it easy to add
other GC roots later on if we want. See how each of the functions
above each takes a
void * priv_data as an argument. This
is there so that when the callback function is invoked, it can re-derive
whatever state it needs from that pointer. This state is passed in
during callback registration (
add_root()). You can see
that each callback will be given a pointer to the GC's state, i.e.
You now must complete the sweep phase of the collector. This
means you must fill in the
This function must iterate through the reference table (see
an example of how to do this in
and collect all entries whose status is set to
GC_REF_ABSENT. To collect an object, you must first
free the reference table entry for it, then you must free the object
using the memory allocator's
Make sure to update the collectors stats (namely
after the object has been freed, you should
the reference itself.
This is a big enough codebase to work with that there's really no way to
make progress without testing. You should be testing your code
frequently, and even add your own tests. For example, for each instruction that you add in Part 4, you
should write a Java test program to make sure it works. This may be
difficult for Part 3 since there are a lot of moving parts. Your best bet
there is to add a lot of debugging prints (using
to get an idea of what the JVM is doing. To compile a new Java program,
you'll have to make sure you have the JDK 1.8 environment installed. Once
you do, you can compile a Java program by running:
$> javac -g MyProgram.java
-gflag here tells the Java compiler to add debugging information like line numbers and local variable names to the class binary. Don't forget that the class name in your program must match the filename.
You'll also probably want to get familiar with classfile dumps. Probably the most informative is:
This dumps out pretty much everything in the class file, including the constant pool items, class methods (including constructors and class initializers), and static and instance fields.
$> javap -v ClassFile
To enable debugging prints within the Hawkbeans JVM, you can pass environment
variables to the compiler just like you did with Hawknest, only you can do so
at a finer granularity here. If you'd like to see general debugging prints,
MODE=DEBUG when invoking
make. You can then set
the remaining flags depending on which component of the JVM you'd like to
debug. For example, if I'm working in my bytecode interpreter and I want
to enable debugging there, I'd set
DEBUG_INTERP to 1, then
in my code in
src/bc_interp.c I'd use the
BC_DEBUG() print macro to add debugging printouts specific
to that component. These
prints will only appear when debugging is enabled. You can take a look
see which subsystems have component-specific debugging prints.
I've added a line-oriented debugger much like you used in Hawknest
for Hawkbeans. This will allow you to step through a Java program,
set breakpoints, dump state etc. It isn't complete by any means, but
it should help you to find problems. Just like with Hawknest,
start Hawkbeans with the
-i option to drop immediately
into the debugger shell. Type
help at that shell to
see what commands you can invoke. Note that these are inspired by
both gdb and jdb, the Java debugger that ships with OpenJDK.
If you're having trouble, do not hesitate to get help. Your first bet is to come to office hours or post to the class Piazza. If you need to set up an appointment for help, I will do my best to accommodate you.
There are several extra credit opportunities that you may want to explore if you have time/are interested. These tasks can add up to 10 points on your project grade. If you'd like to pursue any of these please check with me and we can discuss. Potential extra credit opportunities include, but are not limited to:
You will hand in your code to me, as in the previous project, using our handin script. Your code will be graded based primarily on how well it executes a series of Java tests of my choosing. A small portion of your grade will be determined by the cleanliness, style, and documentation of your code. I will definitely penalize you if your code is poorly structured and not broken down cleanly using abstraction and modularity.
Make sure to commit often. Your commits should encapsulate some logical
component that you added to the codebase. For example, once I complete my
iadd, I would create a commit for that.
Make sure your commit messages are sane and describe succinctly the nature
of the additions.
Once you've finished, you can can hand in your code using our scripts. Only one person from each group should submit a handin. Make sure you're either directly connected to the IIT network or you are using a VPN before running this:
$> make handin