Saturday, March 27, 2010

Updated backtrace patch

I’ve updated my backtrace patch to work with OCaml 3.11.x as well as 3.10.x. The patch provides

  • access to backtraces from within a program (this is already provided in stock 3.11.x)

  • backtraces for dynamically-loaded bytecode

  • backtraces in the (bytecode) toplevel

In addition there are a few improvements since the last version:

  • debugging events are allocated outside the heap, so memory use should be better with forking (on Linux at least, the data is shared on copy-on-write pages but the first GC causes the pages be copied)

  • fixed a bug that could cause spurious “unknown location” lines in the backtrace

  • a script to apply the patch (instead of the previous multi-step manual process)

See ocaml-backtrace-patch on Github or download the tarball.

Tuesday, March 23, 2010

Inside OCaml objects

In the ocamljs project I wanted to implement the OCaml object system in a way that is interoperable with Javascript objects. Mainly I wanted to be able to call Javascript methods with the OCaml method call syntax, but it is also useful to write objects in OCaml which are callable in the usual way from Javascript.

I spent some time a few months ago figuring out how OCaml objects are put together in order to implement this (it is in the unreleased ocamljs trunk—new release coming soon I hope). I got a bug report against it the other day, and it turns out I don’t remember much of what I figured out. So I am going to figure it out again, and write it down, here in this very blog post!

Objects are implemented mostly in the CamlinternalOO library module, with a few compiler primitives for method invocation. The compiler generates CamlinternalOO calls to construct classes and objects. Our main tool for figuring out what is going on is to write a test program, dump out its lambda code with -dlambda, and read the CamlinternalOO source to see what it means. I will explain functions from camlinternalOO.ml but not embed them in the post, so you may want it available for reference.

I have hand-translated (apologies for any errors) the lambda code back to pseudo-OCaml to make it more readable. The compiler-generated code works directly with the OCaml heap representation, and generally doesn’t fit into the OCaml type system. Where the heap representation can be translated back to an OCaml value I do that; otherwise I write blocks with array notation, and atoms with integers. Finally I have used OO as an abbreviation for CamlinternalOO.

Immediate objects

Here is a first test program, defining an immediate object:

let p = 
object 
  val mutable x = 0 
  method get_x = x 
  method move d = x <- x + d 
end 

And this is what it compiles to:

let shared = [|"move";"get_x"|] 
let p = 
  let clas = OO.create_table shared in 
  let obj_init = 
    let ids = OO.new_methods_variables clas shared [|"x"|] in 
    let move = ids.(0) in 
    let get_x = ids.(1) in 
    let x = ids.(2) in 
    OO.set_methods clas [| 
      get_x; OO.GetVar; x; 
      move; (fun self d -> self.(x) <- self.(x) + d); 
    |]; 
    (fun env -> 
       let self = OO.create_object_opt 0 clas in 
       self.(x) <- 0; 
       self) in 
  OO.init_class clas; 
  obj_init 0 

An object has a class, created with create_table and filled in with new_methods_variables, set_methods, and init_class; the object itself is created by calling create_object_opt with the class as argument, then initializing the instance variable.

A table (the value representing a class) has the following fields (and some others we won’t cover):

type table = { 
  mutable size: int; 
  mutable methods: closure array; 
  mutable methods_by_name: meths; 
  mutable methods_by_label: labs; 
  mutable vars: vars; 
} 

Each instance variable has a slot (its index in the block which represents the object); vars maps variable names to slots. The size field records the total number of slots (including internal slots, see below).

Each public method has a label, computed by hashing the method name. The methods field (used for method dispatch) holds each method of the class, with the label of the method at the following index (the type is misleading). Each method then has a slot (the index in methods of the method function); methods_by_name maps method names to slots, and the confusingly-named methods_by_label marks slots to whether it is occupied by a public method.

The create_table call assigns slots to methods, fills in the method labels in methods, and sets up methods_by_name and methods_by_label. The new_methods_variables call returns the slot of each public method and each instance variable in a block (which is unpacked into local variables).

The set_methods call sets up the method functions in methods. Its argument is a block containing alternating method slots and method descriptions (the description can take more than one item in the block). For some methods (e.g. move) the description is just an OCaml function (here you can see that self is passed as the first argument). For some the description is given by a value of the variant OO.impl along with some other arguments. For get_x it is GetVar followed by the slot for x. The actual function that gets the instance variable is generated by set_methods. As far as I understand it, the point of this is to reduce object code size by factoring out the common code from frequently occurring methods.

Finally create_object_opt allocates a block of clas.size, then fills in the first slot with the methods array of the class and the second with the object’s unique ID. (We will see below what the _opt part is about.)

Method calls

A public method call:

p#get_x 

compiles to:

send p 291546447 

where send is a built-in lambda term. The number is the method label. To understand how the method is applied we have to go a little deeper. In bytegen.ml there is a case for Lsend which generates the Kgetpubmet bytecode instruction to find the method function; the function is then applied like any other function. Next we look to the GETPUBMET case in interp.c to see how public methods are looked up in the methods block (stored in the first field of the object).

A couple details about methods we didn’t cover before: The first field contains the number of public methods. The second contains a bitmask used for method caching—briefly, it is enough bits to store offsets into methods. The rest of the block is method functions and labels as above, padded out so that the range of an offset masked by the bitmask does not overflow the block.

Returning to GETPUBMET, we first check to see if the method cache for this call site is valid. The method cache is an extra word at each call site which stores an offset into methods (but may be garbage—masking it takes care of this). If the method label at this offset matches the label we’re looking for, the associated method function is returned. Otherwise, we binary search methods to find the method label (methods are sorted in label order in transclass.ml), then store the offset in the cache and return the associated method function.

Classes

A class definition:

class point = 
object 
  val mutable x = 0 
  method get_x = x 
  method move d = x <- x + d 
end 
let p = new point 

compiles to:

let shared = [|"move";"get_x"|] 
let point = 
  let point_init clas = 
    let ids = OO.new_methods_variables clas shared [|"x"|] in 
    let move = ids.(0) in 
    let get_x = ids.(1) in 
    let x = ids.(2) in 
    OO.set_methods clas [| 
      get_x; OO.GetVar; x; 
      move; (fun self d -> self.(x) <- self.(x) + d); 
    |]; 
    (fun env self -> 
       let self = OO.create_object_opt self clas in 
       self.(x) <- 0; 
       self) in 
  OO.make_class shared point_init 
let p = (point.(0) 0) 

This is similar to the immediate object code, except that the class constructor takes the class table as an argument rather than constructing it itself, and the object constructor takes self as an argument. We will see that class and object constructors are each chained up the inheritance hierarchy, and the tables / objects are passed up the chain. The make_class call calls create_table and init_class in the same way we saw in the immediate object case, and returns a tuple, of which the first component is the object constructor. So the new invocation calls the constructor.

Inheritance

A subclass definition:

class point = ... (* as before *) 
class point_sub = 
object 
  inherit point 
  val mutable y = 0 
  method get_y = y 
end 

compiles to:

let point = ... (* as before *) 
let point_sub = 
  let point_sub_init clas = 
    let ids = 
      OO.new_methods_variables clas [|"get_y"|] [|"y"|] in 
    let get_y = ids.(0) in 
    let y = ids.(1) in 
    let inh = 
      OO.inherits 
        clas [|"x"|] [||] [|"get_x";"move"|] point true in 
    let obj_init = inh.(0) in 
    OO.set_methods clas [| get_y; GetVar; y |]; 
    (fun env self -> 
      let self' = OO.create_object_opt self clas in 
      obj_init self'; 
      self'.(y) <- 0; 
      OO.run_initializers_opt self self' clas) in 
  OO.make_class [|"move";"get_x";"get_y"|] point_sub_init 

The subclass is connected to its superclass through inherits, which calls the superclass constructor on the subclass (filling in methods with the superclass methods) and returns the superclass object constructor (and some other stuff). In the subclass object constructor, the superclass object constructor is called. (This is why the object is created optionally—the class on which new is invoked actually allocates the object; further superclass constructors just initialize instance variables.) In addition, we run any initializers, since some superclass may have them.

Self- and super-calls

A class with a self-call:

class point = 
object (s) 
  val mutable x = 0 
  method get_x = x 
  method get_x5 = s#get_x + 5 
end 

becomes:

let point = 
  let point_init clas = 
    ... (* as before *) 
    OO.set_methods clas [| 
      get_x; OO.GetVar; x; 
      get_x5; (fun self -> sendself self get_x + 5); 
    |] 
    ... 

Here sendself is a form of Lsend for self-calls, where we know the method slot at compile time. Instead of generating the Kgetpubmet bytecode, it generates Kgetmethod, which just does an array reference to find the method.

A class with a super-call:

class point = ... (* as before *) 
class point_sub = 
object 
  inherit point as super 
  method move1 n = super#move n 
end 

becomes:

let point = ... (* as before *) 
let point_sub = 
  let point_sub_init clas = 
    ... 
    let inh = 
      OO.inherits 
        clas [|"x"|] [||] [|"get_x";"move"|] point true in 
    let move = inh.(3) in 
    ... 
    OO.set_methods clas [| 
      move1; (fun self n -> move self n) 
    |]; 
    ... 

In this case, we are able to look up the actual function for the super-call in the class constructor (returned from inherits), so the invocation is just a function application rather than a slot dereference.

I don’t totally understand why we don’t know the function for self calls. I think it is because the superclass constructor runs before the subclass constructor, so the slot is assigned (this happens before the class constructors are called) but the function hasn’t been filled in yet. Still it seems like the knot could somehow be tied at class construction time to avoid a runtime slot dereference.

ocamljs implementation

The main design goal is that we be able to call methods on ordinary Javascript objects with the OCaml method call syntax, simply by declaring a class type giving the signature of the object. So if you want to work with the browser DOM you can say:

class type document = 
object 
  method getElementById : string -> #element 
  ... (* and so on *) 
end 

for some appropriate element type (see src/dom/dom.mli in ocamljs for a full definition), and say:

document#getElementById "content" 

to make the call.

These are always public method calls, so they use the Lsend lambda form. We don’t want to do method label dispatch, since Javascript already has dispatch by name, so first off we need to carry the name rather than the label in Lsend.

We have seen how self is passed as the first argument when methods are invoked. We can’t do that for an arbitrary Javascript function, but the function might use this, so we need to be sure that this points to the object.

There is no way to know at compile time whether a particular method invocation is on a regular Javascript object or an OCaml object. Maybe we could mark OCaml objects somehow and do a check at runtime, but I decided to stick with a single calling convention. So whatever OCaml objects compile to, they have to support the convention for regular Javascript objects—foo#bar compiles to foo.bar, with this set to foo.

As we have seen, self-calls are compiled to a slot lookup rather than a name lookup, so we also need to support indexing into methods.

So here’s the design: an OCaml object is represented by a Javascript object, with numbered slots containing the instance variables. There is a constructor for each class, with prototype set up so each method is accessible by name, and the whole methods block is accessible in a special field, so we can call by slot. (Since we don’t need method labels, methods just holds functions.)

The calling convention passes self in this, so we bind a local self variable to this on entry to each method. It doesn’t work to say this everywhere instead of self, because this in Javascript is a bit fragile. In particular, if you define and apply a local function (ocamljs does this frequently), this is null rather than the lexically-visible this.

For sendself we look up the function by slot in the special methods field. Finally, for super-calls, we know the function at class construction time. In this case the function is applied directly, but we need to take care to treat it as a method application rather than an ordinary function call, since the calling convention is different.

The bug

The OCaml compiler turns super-calls into function applications very early in compilation (during typechecking in typecore.ml). There is no difference in calling convention for regular OCaml, so it doesn’t matter that later phases don’t know that these function applications are super-calls. But in our case we have to carry this information forward to the point where we generate Javascript (in jsgen.ml). It is a little tricky without changing the “typedtree” intermediate language.

I had put in a hack to mark these applications with a special extra argument, and it worked fine for my test program, where the method had no arguments. I didn’t think through or test the case where the method has arguments though. I was able to fix it (I think!) with a different hack: super calls are compiled to self calls (that is, to Texp_send with Tmeth_val) but the identifier in Tmeth_val is marked with an unused bit to indicate that it binds a function rather than a slot, so we don’t need to dereference it.


Appendix: other features

It is interesting to see how the various features of the object system are implemented, but maybe not that interesting, so here they are as an appendix.

Constructor parameters

A class definition with a constructor parameter:

class point x_init = 
object 
  val mutable x = x_init 
  ... (* as before *) 
end 

compiles to:

let point = 
  let point_init clas = 
    ... (* as before *) 
    (fun env self x_init -> 
      OO.create_object_opt self clas; 
      self.(x) <- x_init; 
      self) in 
  ... (* as before *) 

So the constructor parameter in the surface syntax just turns into a constructor parameter internally. (There is a slightly funny interaction between constructor parameters and let-bound expressions after class but before object: if there is no constructor parameter the let is evaluated at class construction, but if there is a parameter it is evaluated at object construction, whether or not it depends on the parameter.)

Virtual methods and instance variables

A class definition with a virtual method:

class virtual abs_point = 
object 
  method virtual move : int -> unit 
end 

compiles to:

let abs_point = [| 
  0; 
  (fun clas -> 
    let move = OO.get_method_label clas "move" in 
    (fun env self -> 
      OO.create_object_opt self clas); 
  0; 0 
|] 

Since a virtual class can’t be instantiated, there’s no need to create the class table with make_class; we just return the tuple that represents the class, containing the class and object constructor. (I don’t understand the call to get_method_label, since its value is unused; possibly it is called for its side effect, which is to register the method in the class table if it does not already exist.)

A subclass implementing the virtual method inherits from the virtual class in the usual way.

A class declaration with a virtual instance variable:

class virtual abs_point2 = 
object 
  val mutable virtual x : int 
  method move d = x <- x + d 
end 

becomes:

let abs_point = [| 
  0; 
  (fun clas 
    let ids = 
      OO.new_methods_variables [|"move"|] [|"x"|] in 
    ... (* as before *)); 
  0; 0 
|] 

Again, a subclass providing the instance variable inherits from the virtual class in the usual way. By the time new_methods_variables is called in the superclass, the subclass has registered a slot for the variable.

Private methods

A class definition with a private method:

class point = 
object 
  val mutable x = 0 
  method get_x = x 
  method private move d = x <- x + d 
end 

compiles to:

let point = 
  let point_init clas = 
    let ids = 
      OO.new_methods_variables clas [|"move";"get_x"|] [|"x"|] in 
    ... (* as before *) 
  OO.make_class [|"get_x"|] point_init 

Everything is the same except that the private method is not listed in the public methods of the class. Since a private method is callable only from code in which the class of the object is statically known, there is no need for dispatch or a method label. The private method functions are stored in methods after the public methods and method labels.

If we expose a private method in a subclass:

class point = ... (* as before *) 
class point_sub = 
object 
  inherit point 
  method virtual move : _ 
end 

we get:

let point = ... (* as before *) 
let point_sub = 
  let point_sub_init clas = ... (* as before *) 
  OO.make_class [|"move";"get_x"|] point_sub_init 

Putting "move" in the call to make_class registers it as a public method, so later, when set_method is called for move in the superclass constructor, it puts the method and its label in methods for dispatch.

Tuesday, March 2, 2010

Reading Camlp4, part 5: filters

Hey, long time no see!

It is high time to get back to Camlp4, so I would like to pick up the thread by covering Camlp4 filters. We have previously considered the parsing and pretty-printing facilities of Camlp4 separately. But of course the most common way to use Camlp4 is as a front-end to ocamlc, where it processes files by parsing them into an AST and pretty-printing them back to text (well, not quite—we will see below how the AST is passed to ocamlc). In between we can insert filters to transform the AST.

A simple filter

So let’s dive into an example: a filter for type definitions that generates t_to_string and t_of_string functions for a type t, a little like Haskell’s deriving Show, Read. To keep it simple we handle only variant types, and only those where all the arms have no data. Here goes:

module Make (AstFilters : Camlp4.Sig.AstFilters) = 
struct 
  open AstFilters 

In order to hook into Camlp4’s plugin mechanism we define the filter as a functor. By opening AstFilters we get an Ast module in scope. Unfortunately this is not the same Ast we got previously from Camlp4.PreCast (although it has the same signature) so all our code that uses Ast (including all OCaml syntax quotations) needs to go inside the functor body.

  let rec filter si = 
    match wrap_str_item si with 
      | <:str_item< type $lid:tid$ = $Ast.TySum (_, ors)$ >> -> 
          begin 
            try 
              let cons = 
                List.map 
                  (function 
                     | <:ctyp< $uid: c$ >> -> c 
                     | _ -> raise Exit) 
                  (Ast.list_of_ctyp ors []) in 
              to_of_string si tid cons 
            with Exit -> si 
          end 
       | _ -> si 

The filter function filters Ast.str_items. (It is not actually recursive but we say let rec so we can define helper functions afterward). If a str_item has the right form we transform it by calling to_of_string, otherwise we return it unchanged. We match a sum type definition, then extract the constructor names (provided that they have no data) into a string list. (Recall that a TySum contains arms separated by TyOr; the call to list_of_ctyp converts that to a list of arms.)

  and wrap_str_item si = 
    let _loc = Ast.loc_of_str_item si in 
    <:str_item< $si$ >> 

For some reason, <:str_item< $si$ >> wraps an extra StSem / StNil around si, so in order to use the quotation syntax on the left-hand side of a pattern match we need to do the same wrapping.

  and to_of_string si tid cons = 
    let _loc = Ast.loc_of_str_item si in 
    <:str_item< 
      $si$;; 
      $to_string _loc tid cons$;; 
      $of_string _loc tid cons$;; 
    >> 

This str_item replaces the original one in the output, so we include the original one in additional to new ones containing the t_to_string and t_of_string functions.

  and to_string _loc tid cons = 
    <:str_item< 
      let $lid: tid ^ "_to_string"$ = function 
        $list: 
          List.map 
            (fun c -> <:match_case< $uid: c$ -> $`str: c$ >>) 
            cons$ 
    >> 

To convert a variant to a string, we match over its constructors and return the corresponding string.

  and of_string _loc tid cons = 
    <:str_item< 
      let $lid: tid ^ "_of_string"$ = function 
        $list: 
          List.map 
            (fun c -> <:match_case< 
       $tup: <:patt< $`str: c$ >>$ -> $uid: c$ 
     >>) 
            cons$ 
        | _ -> invalid_arg "bad string" 
    >> 

To convert a string to a variant, we match over the corresponding string for each constructor and return the constructor; we also need a catchall for strings that match no constructor. (What is this tup and patt business? A contrived bug which we will fix below.)

  ;; 
  AstFilters.register_str_item_filter begin fun si -> 
    let _loc = Ast.loc_of_str_item si in 
    <:str_item< 
      $list: List.map filter (Ast.list_of_str_item si [])$ 
    >> 
  end 

Now we register our filter function with Camlp4. The input str_item may contain many str_itemss separated by StSem, so we call list_of_str_item to get a list of individuals.

end 
module Id = 
struct 
  let name = "to_of_string" 
  let version = "0.1" 
end 
;; 
let module M = Camlp4.Register.AstFilter(Id)(Make) in () 

Finally we register the plugin with Camlp4. The functor application is just for its side effect, so the plugin is registered when its .cmo is loaded. We can compile the plugin with

ocamlfind ocamlc -package camlp4.quotations.o -syntax camlp4o \ 
  -c to_of_string.ml

and run it on a file (containing type t = Foo | Bar | Baz or something) with

camlp4o to_of_string.cmo test.ml
Ocamlc's AST

Looks pretty good, right? But something goes wrong when we try to use our plugin as a frontend for ocamlc:

ocamlc -pp 'camlp4o ./to_of_string.cmo' test.ml

We get a preprocessor error, “singleton tuple pattern”. It turns out that Camlp4 passes the processed AST to ocamlc not by pretty-printing it to text, but by converting it to the AST type that ocamlc uses and marshalling it. This saves the time of reparsing it, and also passes along correct file locations (compare to cpp’s #line directives). However, as we have seen, the Camlp4 AST is pretty loose. When converting to an ocamlc AST, Camlp4 does some validity checks on the tree. What can be confusing is that an AST that fails these checks may look fine when pretty-printed.

Here the culprit is the line

       $tup: <:patt< $`str: c$ >>$ -> $uid: c$ 

which produces an invalid pattern consisting of a one-item tuple. When pretty-printed, though, the tup just turns into an extra set of parentheses, which ocamlc doesn’t mind. What we wanted was

       $`str: c$ -> $uid: c$ 

This is a contrived example, but this kind of error is easy to make, and can be hard to debug, because looking at the pretty-printed output doesn’t tell you what’s wrong. One tactic is to run your code in the toplevel, which will print the constructors of the AST as usual. Another is to use a filter that comes with Camlp4 to “lift” the AST—that is, to generate the AST representing the original AST! Maybe it is easier to try it than to explain it:

camlp4o to_of_string.cmo -filter Camlp4AstLifter test.ml

Now compare the result to the tree you get back from Camlp4’s parser for the code you meant to write, and you can probably spot your mistake.

(If you tried to redirect the camlp4o command to a file or pipe it through less you got some line noise—this is the marshalled ocamlc AST. By default Camlp4 checks whether its output is a TTY; if so it calls the pretty-printer, if not the ocamlc AST marshaller. To override this use the -printer o option, or -printer r for revised syntax.)

Other builtin filters

This Camlp4AstLifter is pretty useful. What else comes with Camlp4? There are several other filters in camlp4/Camlp4Filters which you can call with -filter:

  • Camlp4FoldGenerator generates visitor classes from datatypes. Try putting class x = Camlp4MapGenerator.generated after a type definition. The idea is that you can override methods of the visitor so you can do some transformation on a tree without having to write the boilerplate to walk the parts you don’t care about. In fact, this filter is used as part of the Camlp4 bootstrap to generate vistors for the AST; you can see the map and fold classes in camlp4/Camlp4/Sig.ml.

  • Camlp4MetaGenerator generates lifting functions from a type definition—these functions are what Camlp4AstLifter uses to lift the AST, and it’s also how quotations are implemented. I’m planning to cover how to implement quotations / antiquotations (for a different language) in a future post, and Camlp4MetaGenerator will be crucial.

  • Camlp4LocationStripper replaces all the locations in an AST with Loc.ghost. I don’t know what this is for, but it might be useful if you wanted to compare two ASTs and be insensitive to their locations.

  • Camlp4Profiler inserts profiling code, in the form of function call counts. I haven’t tried it, and I’m not sure when you would want it in preference to gprof.

  • Camlp4TrashRemover just filters out a module called Camlp4Trash. Such a module may be found in camlp4/Camlp4/Struct/Camlp4Ast.mlast; I think the idea is that the module is there in order to generate some stuff, but the module itself is not needed.

  • Camlp4MapGenerator has been subsumed by Camlp4FoldGenerator.

  • Camlp4ExceptionTracer seems to be a special-purpose tool to help debug Camlp4.

OK, maybe not too much useful stuff here, but it is interesting to work out how Camlp4 is bootstrapped.

I think next time I will get into Camlp4’s extensible parsers, on the way toward syntax extensions.

Colophon

I wrote my previous posts in raw HTML, with highlighted code generated from a hightlighted Emacs buffer by htmlize.el. Iterating on this setup was unutterably painful. This post was written using jekyll with a simple template to approximate the Blogspot formatting, mostly so I can check that lines of code aren’t too long. Jekyll is very nice: you can write text with Markdown, and highlight code with Pygments.