Subshaping of ASTs

As mentioned elsewhere, each rscc grammar rule can be considered a type declaration. This information can be used to "shape-check" [1] AST constructors. For example:

if(ast ~ expr{\a + 5})
{
  // Obviously shape-checks
  ast = `expr{\a - 5};

  // Shape-checks but does not necessarily type-check
  // (it's really hard, if not impossible, to know at compile time if this will type-check)
  ast = `expr{\a(1,2,3)};
}

A more interesting example:

if(a ~ number)
{
  // This should also shape-check
  ast = `expr{\a - 5};
}

As pointed out by Alex Warth, the best way to think of this example is to consider number a "subshape" of expr. Then, the usual type-checking rules for subtypes apply and this is easy to shape-check.

Are Leadsto and Subshape the same thing?

Given the following definitions of number and expr, the subshaping relationship is clear:

number: /[0-9]+/
expr: number
expr: expr "+" expr

Consider instead the following definition:

fullnumber: sign? number expon?
number: /[0-9]+/
sign: "+"
sign: "-"
expon: "e" /[0-9]+/
expr: fullnumber
expr: expr "+" expr

if(a ~ number)
{
  // This should still shape-check
  ast = `expr{\a - 5};
}

In the above, expr =>* number (expr derives number). So if we consider number a subshape of expr, that implies whenever a =>* b, b <: a.

AST Compression via Subshaping

Consider the parse tree for an integer using the current rscc c grammar:

(expr
  (number
    (numd
      '123)))

This is a lot of data to represent an integer. In fact, the expr and number nodes are redundant since we know that numd <: number <: expr. Therefore, we can store integers as follows and implicitly expand to a number or expr when necessary using the subshaping relationship:

(numd '123)

As a further advantage, AST compression simplifies the following constructor:

if(a ~ numd)
  ast = `expr{\a + 5};

// Without compression, (expr (number _)) nodes need to be created to wrap around \a:
ast = (expr (expr
              (number
                \a))
            '+
            (expr
              (number
                (numd 5))))

// With compression, the representation is vastly simplified:
ast = (expr \a '+ (numd 5))

Footnotes

[1]I use "shape" to refer to the type of an AST node (such as expr or stmt), and use "type" when refering to the actual type system of the language being modeled (int, etc.).