At its core, rscc can be viewed as a dialect of ML. To get an idea of the meaning of that statement, consider the following simple language of booleans as it might be implemented in rscc:
    expr: "true"
    expr: "false"
    expr: "if" expr "then" expr "else" expr
C code may manipulate this language using pattern-matching destructors and restructors:
    if(ast ~ expr{true}) {
        ...
    } else if(ast ~ expr{if \a then \b else \c}) {
        ...
    }
    ast = `expr{if true then true else false}
The same concepts can be implemented in ML using the following constructs:
    datatype expr = True | False | If of expr*expr*expr

    match ast with
    | True -> ...
    | If(a,b,c) -> ...

    let ast = If(True,True,False)
These are exactly the same syntactic elements, except that rscc has unified the language syntax and the meta-language syntax. Very cool. We should take advantage of this property whenever possible. Some possibilities:
Optimizations useful in ML compilers (for pattern matching and datatype representations) can be directly useful in the rscc compiler.
ML does type inference; we can do shape inference [1]. For example, let's add statements and numbers to the simple language of booleans:
    stmt: lval "=" expr ";"
    expr: num
    num: /[0-9]+/
Later we can pattern match:
    if(a ~ stmt) {
        ...
        b = `expr{if \a then true else false};
    }
This is clearly wrong because \a must be an expr, but we know it's a stmt and stmts cannot be substituted for exprs. The shape of a can be inferred to be stmt from the if guard, and then "shape-checked" at its use in b's restructor. Other examples can be found in the discussion of subshaping.
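The shape check described above can be sketched in OCaml. This is only an illustration, not rscc's implementation: the `shape` type, `subshape`, and `check_slot` are hypothetical names, and treating num as a subshape of expr is an assumption read off the rule `expr: num`.

```ocaml
(* Hypothetical shapes for the toy grammar; not rscc's real representation. *)
type shape = Expr | Stmt | Num

(* subshape s t: may a tree of shape s fill a slot of shape t?
   Assumption: num is a subshape of expr because of the rule  expr: num. *)
let subshape (s : shape) (t : shape) : bool =
  s = t || (s = Num && t = Expr)

(* env records shapes inferred from guards such as  a ~ stmt.
   check_slot checks one use of \v in a slot that expects `expected`. *)
let check_slot (env : (string * shape) list) (v : string) (expected : shape)
  : (unit, string) result =
  match List.assoc_opt v env with
  | None -> Error (v ^ ": shape unknown")
  | Some s when subshape s expected -> Ok ()
  | Some _ -> Error (v ^ ": shape mismatch")

(* The guard  a ~ stmt  gives a the shape stmt; the restructor
   `expr{if \a then true else false}  then uses \a in an expr slot. *)
let verdict = check_slot [ ("a", Stmt) ] "a" Expr
```

Here `verdict` is an `Error`, which mirrors the rejection of the example: a stmt cannot fill an expr slot.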
One difference between type-checking ML and shape-checking rscc is that rscc allows some ambiguity in AST shape (whereas ML requires uniqueness of types). The usual example is casts vs. function calls:
ast ~ expr{(\a)(\b)}
Here, \a can be either a typeclass or an expr, two completely unrelated shapes (there is no subshaping relationship between them). To resolve this, the compiler can require that \a be checked prior to any use, or it can insert run-time checks. From a type-theory point of view, \a has a union type (or "union shape": specifically, the union of typeclass and expr), so it is only safe to use \a in contexts where both a typeclass and an expr are safe. For example, it would be safe to construct ast = `expr{(\a)(\c)}, where \c is another expr.
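One way to picture the union-shape rule is as a set-inclusion test: a use is safe only if every shape the slot might have is accepted by the context. The OCaml below is a sketch under that reading; `safe_use` and the list-of-alternatives representation are assumptions for illustration, not part of rscc.

```ocaml
(* Hypothetical shapes; a union shape is modeled as a list of alternatives. *)
type shape = Typeclass | Expr | Stmt

(* A use is safe only if every alternative the slot may have
   is among the shapes the context accepts. *)
let safe_use (inferred : shape list) (accepted : shape list) : bool =
  List.for_all (fun s -> List.mem s accepted) inferred

(* \a as inferred from  ast ~ expr{(\a)(\b)} *)
let a_shape = [ Typeclass; Expr ]

(* Reusing \a in `expr{(\a)(\c)}: that slot accepts either shape. *)
let ok_in_cast_or_call = safe_use a_shape [ Typeclass; Expr ]  (* true *)

(* Using \a where only an expr is allowed, e.g. as an if condition. *)
let ok_as_condition = safe_use a_shape [ Expr ]  (* false *)
```

Under this model, a prior check that narrows \a to a single alternative (or an inserted run-time check) is what would make the second use acceptable.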
It has been noted that pattern-matching ast ~ expr{\a + 1} creates an ambiguous parse for the slot \a (should it be an expr slot or a num slot?). Rscc uses the shallowest parse, and this is exactly what ML does. In the example below, a will match an expr type (not a num type):
    datatype expr = Add of expr*expr | Num of num
    datatype num = int

    match ast with
    | Add(a, Num 5) -> ...
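For readers who want to run the ML side, here is a minimal OCaml rendering of the same idea (OCaml spells the declarations with `type`). The function name `describe` and the returned strings are made up for the demonstration; the point is only that `a` is bound at type expr, so it accepts either constructor.

```ocaml
type num = int
type expr = Add of expr * expr | Num of num

(* In the pattern  Add (a, Num 5),  a is bound at the shallowest level:
   it has type expr, so it may turn out to be a Num or a nested Add. *)
let describe (ast : expr) : string =
  match ast with
  | Add (a, Num 5) ->
    (match a with
     | Num n -> "a is an expr (here a Num " ^ string_of_int n ^ ")"
     | Add _ -> "a is an expr (here an Add)")
  | _ -> "no match"
```

For instance, `describe (Add (Num 1, Num 5))` and `describe (Add (Add (Num 1, Num 2), Num 5))` both take the first branch, because `a` is never forced to be a num.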
I've proposed two extensions which steal ideas from ML. Their goal is to make extension writing easier.