Add support for binary chunks #12

Open
mjanicek opened this issue Oct 24, 2016 · 0 comments

Comments

@mjanicek
Owner

Rembulan is currently only able to load text chunks. It might be a good idea to consider adding support for binary chunks.

Java bytecode (as generated by Rembulan), even packed in a JAR file, is probably too low-level for this purpose. Classes in Java bytecode are named, whereas function prototypes in the binary chunk should be (mostly) anonymous. Additionally, there is no way of controlling the origin of the JAR file (and hence its contents): most users would not want to load arbitrary bytecode if they expect a pre-compiled Lua function.

Therefore, these binary chunks should probably be based on the compiler's already-existing IR (intermediate representation).

Advantages

  • Loading binary chunks would eliminate the time spent in parsing, performing static analysis, and optimising the code.
  • string.dump could be implemented.
  • Function prototypes could be transmitted over the wire.
  • This feature would help with serialisation (see #4, "Add support for execution state serialisation/deserialisation"). Upvalues would need to be serialised separately: at least in PUC-Lua, they are not included in a binary chunk, an expectation that Rembulan should probably not break.

Disadvantages

  • Strictly speaking, this is not necessary to implement the Lua programming language. (Lua bytecode is an implementation detail of PUC-Lua.)
  • The implementation would add complexity to the compiler, and with it a new source of bugs.
  • The IR should not be considered stable at this point: there are still outstanding optimisations that may require changing it. Freezing the IR to back a binary format would slow down progress on that front.
  • Rembulan's IR is very different from the PUC-Lua bytecode, and its binary format would be too. (In other words, it would still not be possible to load a chunk compiled by PUC-Lua in Rembulan.)
  • Security and safety implications: a malformed binary chunk may crash the compiler. To do this right, this feature would probably require a verifier. (On the other hand, the IR does have some of the necessary checks in place already, and is immutable. The compiler could be made resilient to malformed IR input.)
  • Binary chunk loading is typically not permitted in sandboxed environments, so the usefulness in the niche Rembulan is explicitly targeting is limited.
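
To make the verifier point above more concrete, here is a minimal sketch of the kind of structural check a loader could run before handing untrusted IR to the compiler. The IR shape used here (blocks with register indices and successor indices) and all names such as IrVerifierSketch and Block are assumptions made for illustration; Rembulan's actual IR is richer and would need more checks than this.

```java
import java.util.List;

// Hypothetical, heavily simplified IR; not Rembulan's real IR classes.
final class IrVerifierSketch {

    // A basic block that touches some registers and may jump to successor blocks.
    static final class Block {
        final int[] registers;   // register indices read or written by the block
        final int[] successors;  // indices of blocks this block may jump to
        Block(int[] registers, int[] successors) {
            this.registers = registers;
            this.successors = successors;
        }
    }

    // Returns null if the body is structurally well-formed,
    // or a description of the first problem found.
    static String verify(List<Block> blocks, int registerCount) {
        if (blocks.isEmpty()) {
            return "function has no entry block";
        }
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            for (int r : b.registers) {
                if (r < 0 || r >= registerCount) {
                    return "block " + i + ": register " + r + " out of range";
                }
            }
            for (int s : b.successors) {
                if (s < 0 || s >= blocks.size()) {
                    return "block " + i + ": jump to missing block " + s;
                }
            }
        }
        return null;
    }
}
```

A loader would reject the chunk whenever verify returns a non-null message, so that malformed input surfaces as a load error rather than as a failure deep inside the compiler.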

How this could be done

The following steps would be needed to implement this feature in a usable form:

  • Define a binary format for the IR, and implement its writer and reader.
  • (Optionally, implement a verifier for security.)
  • Add an additional entry point into the compiler pipeline that accepts loaded IR.
  • Attach the binary representation to compiled functions in a way that is accessible at runtime. (The best way to go about this is probably using Java annotations.)
  • Implement string.dump.
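
The steps above could fit together roughly as in the following sketch: a writer and reader for a framed chunk, a runtime-visible annotation that carries the chunk on the compiled class, and a lookup that string.dump could use. Everything here is hypothetical, including the names ChunkSketch, Prototype and BinaryChunk, the signature and version values, and the choice to treat the IR as an opaque byte array; it is not Rembulan's API or a proposed final format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Base64;

// Hypothetical names throughout; none of this is Rembulan's actual API.
final class ChunkSketch {

    // Signature and version invented for this sketch only.
    static final int MAGIC = 0x1B52454D;   // ESC 'R' 'E' 'M'
    static final int VERSION = 1;

    // Stand-in for a serialised function prototype: a name hint plus opaque IR bytes.
    static final class Prototype {
        final String nameHint;
        final byte[] ir;
        Prototype(String nameHint, byte[] ir) {
            this.nameHint = nameHint;
            this.ir = ir;
        }
    }

    // Carries the Base64-encoded chunk on the generated class, readable at runtime.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface BinaryChunk {
        String value();
    }

    static byte[] write(Prototype p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(MAGIC);
            out.writeInt(VERSION);
            out.writeUTF(p.nameHint);
            out.writeInt(p.ir.length);
            out.write(p.ir);
        }
        return bytes.toByteArray();
    }

    static Prototype read(byte[] chunk) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(chunk))) {
            if (in.readInt() != MAGIC) throw new IOException("not a binary chunk");
            if (in.readInt() != VERSION) throw new IOException("unsupported chunk version");
            String nameHint = in.readUTF();
            int length = in.readInt();
            // Reject lengths that cannot be satisfied before allocating anything.
            if (length < 0 || length > in.available()) throw new IOException("bad IR length");
            byte[] ir = new byte[length];
            in.readFully(ir);
            return new Prototype(nameHint, ir);
        }
    }

    // What string.dump could do: look the chunk up on the function's class.
    static byte[] dump(Class<?> functionClass) {
        BinaryChunk a = functionClass.getAnnotation(BinaryChunk.class);
        if (a == null) throw new IllegalArgumentException("unable to dump given function");
        return Base64.getDecoder().decode(a.value());
    }
}
```

In this sketch the compiler would emit @BinaryChunk on each generated function class, and the loader would call read (followed by any verifier) before passing the recovered IR to the new compiler entry point. Encoding the chunk as a Base64 string is only one option; a byte[] annotation element or a class-file resource would serve the same purpose.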