Wednesday, August 11, 2010

Serializing compiled Jython code objects (works on Google App Engine for Java)

While working on the ShowSort project and updating it for some of the more recent API changes in Google Web Toolkit (GWT) and Google App Engine for Java (GAE/J), I ran into a dilemma. I wanted the sorting algorithms to be processed through a series of Task Queue API tasks, and I wanted to split up the tasks so that each sorting algorithm got as much time as possible to run. That meant separating the actual execution of a sorting algorithm from the step that compiles it into bytecode.

Since I was first using the JSR-223 implementation of Java scripting support (just for fun), my plan was to compile the script through the Compilable interface in one task, save the compiled script to memcache, and then put a new task on the queue to execute it. Much to my surprise, this didn't work: the compiled script created by Jython isn't Serializable, which memcache requires of everything it stores.

So my next attempt was to use the Jython classes directly, specifically the PythonInterpreter class: compile the script in the first task, save it to memcache, and execute it in the next task. Once again it failed. Although PyCode claims to be Serializable, the object could not actually be serialized, because some of its components do not implement Serializable. The compiled object is really of type PyTableCode, which inherits from PyBaseCode, and that class has a few fields that are not Serializable (CompilerFlags, for example).
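This kind of failure is general Java behaviour rather than something peculiar to Jython: declaring `implements Serializable` only helps if every non-transient field reachable from the object is serializable as well. Here is a minimal sketch using only the JDK. The classes `Flags` and `FakeCode` are hypothetical stand-ins invented for illustration, not Jython types.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializableInNameOnly {

    // Hypothetical helper type that does NOT implement Serializable.
    static class Flags {
        boolean someOption = true;
    }

    // Hypothetical code object: it claims to be Serializable,
    // but one of its fields is not.
    static class FakeCode implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "sort$py";
        Flags flags = new Flags();
    }

    // Returns null if the value serializes, otherwise the name of the
    // class that Java serialization refused to write.
    static String trySerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("FakeCode failed on: " + trySerialize(new FakeCode()));
    }
}
```

The exception names the offending field's class, which is how a nested non-serializable member can be tracked down.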

Stumped again, I began googling for help (like anyone would) and hit a lot of dead ends. Some people tried XML serialization, but XStream, the main Java library for that, doesn't work out of the box on GAE/J. Others suggested creating secondary versions of the objects, inheriting from the original non-serializable classes, and building some elaborate conversion from a PyTableCode object into something home-brewed that was serializable... but that was a bit crazy.

My final idea, which actually worked, was to look at the Jython source code. After wandering around in the source for a while, I stumbled upon org.python.modules._py_compile, the class Jython itself uses for compile operations. From that class I derived a process that does not create the PyTableCode object at compile time; instead it gives me a byte[] array, which is very serializable. I can store the byte[] (the bytecode the Jython compiler produced for my script) in my own object and serialize that object.

When I want to execute the code, I call another Jython method, org.python.core.BytecodeLoader.makeCode(), which constructs the PyTableCode object to be executed from the byte[] I saved. The bulk of the time is still spent in the compile step; as far as I can see, all makeCode() does is wrap the bytecode in something Jython's PythonInterpreter class can use. The final results of my attempt are below:

package org.darkhelm.showsort.server;

import java.io.ByteArrayOutputStream;
import java.io.Serializable;
import java.io.StringReader;
import java.util.Arrays;

import java.util.logging.Level;
import java.util.logging.Logger;

import org.python.compiler.Module;

import org.python.core.BytecodeLoader;
import org.python.core.CompileMode;
import org.python.core.CompilerFlags;
import org.python.core.ParserFacade;
import org.python.core.PyCode;

import org.python.util.PythonInterpreter;

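/**
 * Holds a Python script together with its compiled Jython bytecode.
 * The bytecode is kept as a plain byte[] so that instances can be
 * serialized (for example, into memcache) between compile() and exec().
 *
 * @author cliffhill
 */
public class Code implements Serializable {

    private static final long serialVersionUID = 1L;

    private static final Logger log = Logger.getLogger("ShowSort.Code");
    // Name of the Python variable that the array to be sorted is bound to.
    private static final String VAR_NAME = "array";
    // Location of the Python standard library, added to sys.path by the script.
    private static final String LIB_PYTHON_PATH = "WEB-INF/lib-python/Lib.zip";
    // Source file name passed to the parser, compiler and makeCode().
    private static final String FILENAME = "<script>";
    // Name given to the compiled module; must match between compile() and exec().
    private static final String NAME = "sort$py";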

    private String code;
    private byte[] compiledCode;

    public Code() {
        this(null);
    }

    public Code(String code) {
        setCode(code);
    }

    public String getCode() {
        return code;
    }

    public void setCode(String code) {
        this.code = code;
        this.compiledCode = null;
    }

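    /**
     * Compiles the script to Jython bytecode and keeps the result as a byte[].
     * No PyCode object is created here, so the instance stays serializable.
     */
    public void compile() {
        log.finest("Code to be compiled:\n" + code);
        // Wrap the user's script: put the bundled Python library on sys.path,
        // then call the script's sort() function on the bound variable.
        String realCode = "import sys\n\nsys.path.insert(0, '" + LIB_PYTHON_PATH + "')\n\n" + code + "\nsort(" + VAR_NAME + ")";
        log.finest("Real Code to be compiled:\n" + realCode);

        StringReader codeReader = new StringReader(realCode);
        ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
        try {
            // Parse the source into an AST.
            org.python.antlr.base.mod node;
            try {
                node = ParserFacade.parse(codeReader, CompileMode.exec, FILENAME, new CompilerFlags());
            } finally {
                codeReader.close();
            }
            // Write the bytecode for the AST into byteStream
            // (line numbers on, result printing off, no extra compiler flags).
            Module.compile(node, byteStream, NAME, FILENAME, true, false, null, org.python.core.imp.NO_MTIME);
            compiledCode = byteStream.toByteArray();

            log.finest("Compiled code:\n" + Arrays.toString(compiledCode));
        } catch(Throwable t) {
            // A failed compile leaves compiledCode unchanged (null after setCode()).
            log.log(Level.SEVERE, "Failed to compile script", t);
        }
    }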

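    /**
     * Rebuilds an executable PyCode from the stored bytes and runs it
     * with the given array bound to VAR_NAME.
     */
    public void exec(Array array) {
        if (compiledCode == null) {
            // Nothing to run: compile() was never called, or it failed.
            throw new IllegalStateException("compile() must succeed before exec()");
        }

        PythonInterpreter pi = new PythonInterpreter();

        log.finest("Interpreter to use: " + pi);

        log.finest("setting local variable '" + VAR_NAME + "' to: " + array);

        pi.set(VAR_NAME, array);

        // Turn the stored bytes back into something the interpreter can execute.
        PyCode execCode = BytecodeLoader.makeCode(NAME, compiledCode, FILENAME);

        log.finest("Executable compiled code: " + execCode);

        pi.exec(execCode);
    }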
}
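
Before deploying, it can be worth checking locally that an object shaped like the class above really survives Java serialization. The sketch below uses only the JDK. `StoredScript` is a hypothetical stand-in for the Code class, and the bytes are placeholders rather than real Jython output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

class SerializationRoundTrip {

    // Hypothetical stand-in for the Code class: source text plus compiled bytes.
    static class StoredScript implements Serializable {
        private static final long serialVersionUID = 1L;
        final String source;
        final byte[] compiled;

        StoredScript(String source, byte[] compiled) {
            this.source = source;
            this.compiled = compiled;
        }
    }

    // Serialize a value with plain Java serialization.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read back a value written by toBytes().
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the copy has the same source and the same compiled bytes.
    static boolean survivesRoundTrip(StoredScript original) {
        StoredScript copy = (StoredScript) fromBytes(toBytes(original));
        return copy.source.equals(original.source)
                && Arrays.equals(copy.compiled, original.compiled);
    }

    public static void main(String[] args) {
        // Placeholder bytes; real bytecode would come from the compile step.
        byte[] fakeBytecode = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        StoredScript script = new StoredScript("def sort(a): a.sort()", fakeBytecode);
        System.out.println("round trip ok: " + survivesRoundTrip(script));
    }
}
```

Because the only state is a String and a byte[], nothing in the object graph can trip up serialization, which is the whole point of keeping the PyCode object out of the stored class.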

I hope this helps someone else who has dealt with the limitations of the non-serializable Jython compiled script objects.

1 comment:

  1. I found this post very useful, since I want to load compiled Jython code from a database. To do that, I adjusted this code slightly to add conversion functions from PyCode to byte[] and back. Only the meaning of FILENAME and NAME is not 100% clear to me.
    Thanks,
    Enrico
