4 releases (2 breaking)

new 0.3.0 May 3, 2021
0.2.0 Mar 15, 2021
0.1.1 Feb 20, 2021
0.1.0 Feb 12, 2021

MIT license

Jaded - Java Deserialization for Rust

Java has a much-maligned (for good reason) serialization system built into its standard library. The output is a binary stream that maps the full object hierarchy and the relations between the objects in it.

The stream also includes definitions of classes and their hierarchies (super classes etc). The full specification is defined here.
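As a small illustration of the binary format, every stream written by ObjectOutputStream starts with a four-byte header: the magic number 0xACED followed by the stream version 5, both big-endian. The sketch below checks that header using only the standard library. The function name `check_header` is hypothetical and is not part of jaded's API.

```rust
/// Magic number that opens every Java serialization stream.
const STREAM_MAGIC: u16 = 0xACED;
/// Stream protocol version written by ObjectOutputStream.
const STREAM_VERSION: u16 = 0x0005;

/// Returns Ok(()) if `bytes` starts with a valid stream header.
/// (Hypothetical helper for illustration, not a jaded function.)
fn check_header(bytes: &[u8]) -> Result<(), String> {
    if bytes.len() < 4 {
        return Err(format!("expected 4 header bytes, found {}", bytes.len()));
    }
    // Multi-byte values in the stream are big-endian.
    let magic = u16::from_be_bytes([bytes[0], bytes[1]]);
    let version = u16::from_be_bytes([bytes[2], bytes[3]]);
    if magic != STREAM_MAGIC {
        return Err(format!("bad magic number: {:#06x}", magic));
    }
    if version != STREAM_VERSION {
        return Err(format!("unsupported stream version: {}", version));
    }
    Ok(())
}
```

A file that fails this check is not a Java serialization stream at all, which is a cheap thing to rule out before attempting a full parse.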

In any new application there are probably better ways to serialize data with fewer security risks, but there are cases where a legacy application writes data out and we want to read it back in. If we want to read it from a separate application, it would be good not to be bound to Java.

Example

In Java

import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
public class Demo implements Serializable {
    private static final long serialVersionUID = 1L;
    private String message;
    private int i;
    public Demo(String message, int count) {
        this.message = message;
        this.i = count;
    }
    public static void main(String[] args) throws Exception {
        Demo d = new Demo("helloWorld", 42);
        try (FileOutputStream fos = new FileOutputStream("demo.obj", false);
                ObjectOutputStream oos = new ObjectOutputStream(fos);) {
            oos.writeObject(d);
        }
    }
}

From Rust

use std::fs::File;
use jaded::{Parser, Result};

fn main() -> Result<()> {
    let sample = File::open("demo.obj").expect("File missing");
    let mut parser = Parser::new(sample)?;
    println!("Read Object: {:#?}", parser.read()?);
    Ok(())
}

Output from Rust

Read Object: Object(
    Object(
        ObjectData {
            class: "Demo",
            fields: {
                "i": Primitive(
                    Int(
                        42,
                    ),
                ),
                "message": JavaString(
                    "helloWorld",
                ),
            },
            annotations: [],
        },
    ),
)

Conversion to Rust types

For most use cases, the raw object representation is not very ergonomic to work with. For ease of use, types can implement FromJava and can then be read directly from the stream.

In the majority of cases this implementation can be automatically derived by enabling the derive feature.

use jaded::FromJava; // the derive macro requires the `derive` feature

#[derive(Debug, FromJava)]
struct Demo {
    message: String,
    i: i32,
}

Demo objects can then be read directly by the parser:

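use std::fs::File;
use jaded::{Parser, Result};

fn main() -> Result<()> {
    let sample = File::open("demo.obj").expect("File missing");
    let mut parser = Parser::new(sample)?;
    // read_as converts the next object in the stream to the requested type
    let demo: Demo = parser.read_as()?;
    println!("Read Object: {:#?}", demo);
    Ok(())
}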

Output from Rust

Read Object: Demo {
    message: "helloWorld",
    i: 42,
}

Objects with custom writeObject methods

Classes, including many in the standard library, often customise the way they are written using a writeObject method that complements the built-in serialization of fields. This data is written as an embedded stream of bytes and/or objects. These cannot be associated with fields without the original Java source, so they are included in the annotations field of the ObjectData struct (empty in the example above).

As this stream often contains important data from the class, a mechanism is provided to read useful data from it using an interface similar to the ObjectInputStream that would be used in the Java class itself.

An example of custom serialization in Java is ArrayList. The source for its writeObject method can be seen here, but the gist is that it writes the number of elements it contains, then writes each element in turn.
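To make the "embedded stream of bytes" concrete: in the serialization protocol, primitive data written by a writeObject method is wrapped in block-data records, each consisting of the marker byte 0x77 (TC_BLOCKDATA), a one-byte length and then the payload. The sketch below decodes such a record and the big-endian int inside it using only the standard library. The helpers `read_block` and `read_i32` are hypothetical names for this illustration and are not jaded functions. The sample bytes are what the leading block would be expected to look like for a three-element list, assuming the list writes its element count as described above.

```rust
/// Marker byte for a short block of raw data (at most 255 bytes).
const TC_BLOCKDATA: u8 = 0x77;

/// Reads one short block-data record from the front of `bytes`,
/// returning the payload and the remaining input.
/// (Hypothetical helper for illustration, not a jaded function.)
fn read_block(bytes: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&marker, rest) = bytes.split_first()?;
    if marker != TC_BLOCKDATA {
        return None;
    }
    // The length of a short block is a single unsigned byte.
    let (&len, rest) = rest.split_first()?;
    let len = len as usize;
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

/// Interprets the first four bytes of a payload as a big-endian i32.
fn read_i32(payload: &[u8]) -> Option<i32> {
    if payload.len() < 4 {
        return None;
    }
    Some(i32::from_be_bytes([payload[0], payload[1], payload[2], payload[3]]))
}
```

Given the bytes `77 04 00 00 00 03 74`, `read_block` yields the payload `00 00 00 03` and leaves `74` (the marker that starts a string object) as the remaining input, and `read_i32` on the payload yields 3.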

Because the embedded custom stream could contain anything, we have to manually implement the methods that read from it, but these can then be used by the derived implementation of FromJava:

In Java

import java.util.List;
import java.util.ArrayList;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
public class Demo {
    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        keys.add("one");
        keys.add("two");
        keys.add("three");
        try (FileOutputStream fos = new FileOutputStream("demo.obj", false);
                ObjectOutputStream oos = new ObjectOutputStream(fos);) {
            oos.writeObject(keys);
        }
    }
}

In Rust

use std::fs::File;
use jaded::{Parser, Result, FromJava, AnnotationIter, ConversionResult};

#[derive(Debug, FromJava)]
struct ArrayList<T: FromJava> {
    // Size is written as a 'normal' field
    size: i32,
    // values are written to the custom stream so need attributes
    #[jaded(extract(read_values))]
    values: Vec<T>,
}

// extraction method must be callable as
//     function(&mut AnnotationIter) -> ConversionResult<T>
// Where T is the type of the field being assigned to.
fn read_values<T>(annotations: &mut AnnotationIter) -> ConversionResult<Vec<T>>
where
    T: FromJava
{
    // The element count is read first, followed by each element in turn
    (0..annotations.read_i32()?)
        .map(|_| annotations.read_object_as())
        .collect()
}


fn main() -> Result<()> {
    let sample = File::open("demo.obj").expect("File missing");
    let mut parser = Parser::new(sample)?;
    let array: ArrayList<String> = parser.read_as()?;
    println!("{:#?}", array);
    Ok(())
}

This gives the array list as expected

ArrayList {
    size: 3,
    values: [
        "one",
        "two",
        "three",
    ],
}

FromJava is implemented for Option<T> and Box<T> so that recursive structs can be deserialized and null fields in the serialized class can be handled. Note that if a field is null, the conversion will fail unless that field is declared as Option<T>. The example above would have failed if there had been a null string in the serialized list. Changing values to be Vec<Option<T>> would allow it to still be read.
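The null-handling rule above can be illustrated with a simplified model that uses only the standard library. The `JavaValue` enum and `FromValue` trait below are hypothetical stand-ins written for this sketch. They are not jaded's real types, and the real FromJava trait may differ in shape.

```rust
/// Simplified stand-in for a deserialized Java value (not jaded's real type).
#[derive(Debug, Clone, PartialEq)]
enum JavaValue {
    Null,
    Str(String),
}

/// Simplified stand-in for a conversion trait in the spirit of FromJava.
trait FromValue: Sized {
    fn from_value(value: &JavaValue) -> Result<Self, String>;
}

impl FromValue for String {
    fn from_value(value: &JavaValue) -> Result<Self, String> {
        match value {
            JavaValue::Str(s) => Ok(s.clone()),
            // A plain String has no way to represent null, so this is an error.
            JavaValue::Null => Err("unexpected null".to_string()),
        }
    }
}

impl<T: FromValue> FromValue for Option<T> {
    fn from_value(value: &JavaValue) -> Result<Self, String> {
        match value {
            // Null maps to None; anything else defers to the inner type.
            JavaValue::Null => Ok(None),
            other => T::from_value(other).map(Some),
        }
    }
}

/// Converts every value, failing on the first one that cannot be converted.
fn convert_all<T: FromValue>(values: &[JavaValue]) -> Result<Vec<T>, String> {
    values.iter().map(T::from_value).collect()
}
```

In this model, converting a list that contains a null to `Vec<String>` returns an error, while converting the same list to `Vec<Option<String>>` succeeds with a `None` in the null position.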

Renaming fields

By Java convention, field names use camelCase, whereas Rust field names use snake_case. By default, the derive macro looks for a Java field with the same name as the Rust field, so to avoid Rust structs needing camelCase names, fields can be given an attribute naming the Java field to read from.

#[derive(FromJava)]
struct Demo {
    #[jaded(field = "fooBar")]
    foo_bar: String,
}

If all fields are to be renamed, the renaming feature can be enabled and the struct can be given a 'rename' attribute. This will convert all field names to camelCase before reading them from Java. Individual fields can still be overridden if required. This feature adds an additional dependency on convert_case so is not enabled by default.

#[derive(FromJava)]
#[jaded(rename)]
struct Demo {
    foo_bar: String,
}
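To show what the rename step amounts to, here is a hand-rolled approximation of a snake_case to camelCase conversion using only the standard library. It is a sketch for illustration: the crate is described above as relying on convert_case, whose exact rules may differ, and the function name `snake_to_camel` is hypothetical.

```rust
/// Converts a snake_case identifier to camelCase by dropping each
/// underscore and upper-casing the character that follows it.
/// (Hypothetical helper for illustration, not part of jaded.)
fn snake_to_camel(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut upper_next = false;
    for c in name.chars() {
        if c == '_' {
            upper_next = true;
        } else if upper_next {
            // to_uppercase yields an iterator because some characters
            // expand to more than one character when upper-cased.
            out.extend(c.to_uppercase());
            upper_next = false;
        } else {
            out.push(c);
        }
    }
    out
}
```

With this approximation `foo_bar` becomes `fooBar`, but `serial_version_uid` becomes `serialVersionUid` rather than `serialVersionUID`. Java fields containing acronyms are one case where a per-field override remains useful.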

Features

derive

Allow FromJava to be derived automatically

renaming

Include the jaded_derive renaming feature to allow automatic conversion between Rust snake_case field names and Java camelCase field names.

Limitations

Java Polymorphism

In Java, a field can be declared as an interface and the concrete implementation can be anything. This means that in Rust we can't reliably convert read objects to structs unless we know that a stream is going to use a specific implementation. In future it is hoped that enums may allow us to specify multiple types that could be present, but even then it would only cover a limited number of types, e.g.

enum List<T> {
    ArrayList(Vec<T>),
    LinkedList(Vec<T>),
    Vector(Vec<T>),
}

would cover most of the common cases but there is nothing stopping some client code creating a CustomList class that had a completely different serialized representation and using that in the class that is being read in Rust.
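One way such an enum might eventually be selected is by dispatching on the class name recorded in the stream. The sketch below is purely hypothetical: it is not an existing jaded feature, `list_from_class` is an invented name, and it assumes the elements have already been read into a Vec.

```rust
/// Rust-side view of the Java List implementations we choose to support.
#[derive(Debug, PartialEq)]
enum List<T> {
    ArrayList(Vec<T>),
    LinkedList(Vec<T>),
    Vector(Vec<T>),
}

/// Picks a variant from the fully qualified Java class name.
/// (Hypothetical helper for illustration, not part of jaded.)
fn list_from_class<T>(class_name: &str, items: Vec<T>) -> Result<List<T>, String> {
    match class_name {
        "java.util.ArrayList" => Ok(List::ArrayList(items)),
        "java.util.LinkedList" => Ok(List::LinkedList(items)),
        "java.util.Vector" => Ok(List::Vector(items)),
        // Any class outside the known set cannot be mapped.
        other => Err(format!("unsupported List implementation: {}", other)),
    }
}
```

The fallback arm is the limitation in miniature: a class such as `com.example.CustomList` ends up there regardless of how many variants are added.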

Ambiguous serialization

Unfortunately, there are also limits to what we can do without the original code that created the serialized byte stream. The protocol linked above lists four types of object. One of these, classes that implement java.lang.Externalizable and use PROTOCOL_VERSION_1 (which has not been the default since JDK 1.2), cannot be read by anything other than the class that wrote them, as their data is nothing more than a stream of bytes.

Of the remaining three types we can only reliably deserialize two.

  • 'Normal' classes that implement java.lang.Serializable without having a writeObject method

    These can be read as shown above

  • Classes that implement Externalizable and use the newer PROTOCOL_VERSION_2

    These can be read, although their data is held entirely in the annotations field of the ObjectData struct and the get_field method always returns None.

  • Serializable classes that implement writeObject

    These objects are more difficult. The spec above suggests that their fields are written as for 'normal' classes, with optional annotations written afterwards. In practice this is not the case: the fields are only written if the class calls defaultWriteObject as the first call in its writeObject method. The spec lists this as a requirement, so we can assume that it holds for classes in the standard library, but it is something to be aware of if user classes are being deserialized.

A consequence of this is that once we have found a class that we can't read, it is difficult to get back on track as it requires picking out the marker signifying the start of the next object from the sea of custom data.
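The four kinds of object discussed above can be told apart from the flags byte stored in each class description in the stream. The sketch below uses the flag values from the serialization protocol (SC_WRITE_METHOD, SC_SERIALIZABLE, SC_EXTERNALIZABLE and SC_BLOCK_DATA). The `ObjectKind` enum and `classify` function are hypothetical names for this illustration and are not part of jaded's API.

```rust
// Class description flags from the Java serialization protocol.
const SC_WRITE_METHOD: u8 = 0x01;
const SC_SERIALIZABLE: u8 = 0x02;
const SC_EXTERNALIZABLE: u8 = 0x04;
const SC_BLOCK_DATA: u8 = 0x08;

/// The four kinds of object data, plus a catch-all for unexpected flags.
#[derive(Debug, PartialEq)]
enum ObjectKind {
    /// Serializable without writeObject: fields only.
    PlainSerializable,
    /// Serializable with writeObject: fields (if written), then annotations.
    CustomSerializable,
    /// Externalizable using PROTOCOL_VERSION_1: opaque bytes, unreadable.
    ExternalizableV1,
    /// Externalizable using PROTOCOL_VERSION_2: annotations only.
    ExternalizableV2,
    Unknown,
}

/// Classifies a class description by its flags byte.
/// (Hypothetical helper for illustration, not a jaded function.)
fn classify(flags: u8) -> ObjectKind {
    let has = |bit: u8| (flags & bit) != 0;
    if has(SC_EXTERNALIZABLE) {
        if has(SC_BLOCK_DATA) {
            ObjectKind::ExternalizableV2
        } else {
            ObjectKind::ExternalizableV1
        }
    } else if has(SC_SERIALIZABLE) {
        if has(SC_WRITE_METHOD) {
            ObjectKind::CustomSerializable
        } else {
            ObjectKind::PlainSerializable
        }
    } else {
        ObjectKind::Unknown
    }
}
```

Note that the flags only say which layout to expect. They do not reveal whether a class with a writeObject method actually called defaultWriteObject first, which is why that case stays ambiguous.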

Future plans

  • Extend the derive mechanism so that multiple types can be deserialized to an enum. This would let Rust types mirror the common interfaces in Java and specify a field as (e.g.) a Collection, which would have variants for List, Set, Queue etc. This would go part way to solving the polymorphism problem outlined above.
  • Add implementations of FromJava for common Java and Rust types so that for instance ArrayList and HashMap can be read to the equivalent Vec and HashMap types in Rust.
  • Possible tie-in with Serde. I've not yet looked into how the Serde data model works but this seems like it would be a useful way of accessing Java data.

State of development

Very much a work in progress at the moment. I am writing this for another application I am working on so I imagine there will be many changes in the functionality and API at least in the short term as the requirements become apparent. As things settle down I hope things will become more stable.

Contributions

As this project is still very much in a pre-alpha state, I imagine things will be quite unstable for a while. That said, if you notice anything obviously broken or have a feature that you think would be useful that I've missed entirely, do open issues. I'd avoid opening PRs until the change has been discussed in an issue, as the current repo state may lag behind development.
