Subshaping of ASTs

As mentioned elsewhere, each rscc grammar rule can be considered a type declaration. This information can be used to "shape-check" [1] AST constructors. For example:

if(ast ~ expr{\a + 5})
{
  // Obviously shape-checks
  ast = `expr{\a - 5};

  // Shape-checks but does not necessarily type-check
  // (it's really hard, if not impossible, to know at compile time if this will type-check)
  ast = `expr{\a(1,2,3)};
}

A more interesting example:

if(a ~ number)
{
  // This should also shape-check
  ast = `expr{\a - 5};
}

As pointed out by Alex Warth, the best way to think of this example is to consider number a "subshape" of expr. Then, the usual type-checking rules for subtypes apply and this is easy to shape-check.

Are Leadsto and Subshape the same thing?

Given the following definitions of number and expr, the subshaping relationship is clear:

number: /[0-9]+/
expr: number
expr: expr "+" expr

Consider instead the following definition:

fullnumber: sign? number expon?
number: /[0-9]+/
sign: "+"
sign: "-"
expon: "e" /[0-9]+/
expr: fullnumber
expr: expr "+" expr

if(a ~ number)
{
  // This should still shape-check
  ast = `expr{\a - 5};
}

In the above, expr =>* number (expr derives number). So if we consider number a subshape of expr, that implies whenever a =>* b, b <: a.

AST Compression via Subshaping

Consider the parse tree for an integer using the current rscc c grammar:

(expr
  (number
    (numd
      '123)))

This is a lot of data to represent an integer. In fact, the expr and number nodes are redundant since we know that numd <: number <: expr. Therefore, we can store integers as follows and implicitly expand to a number or expr when necessary using the subshaping relationship:

(numd '123)

As a further advantage, AST compression simplifies the following constructor:

if(a ~ numd)
  ast = `expr{\a + 5};

// Without compression, (expr (number _)) nodes need to be created to wrap around \a:
ast = (expr (expr
              (number
                \a))
            '+
            (expr
              (number
                (numd 5))))

// With compression, the representation is vastly simplified:
ast = (expr \a '+ (numd 5))

Footnotes

[1]	I use "shape" to refer to the type of an AST node (such as `expr` or `stmt`), and use "type" when refering to the actual type system of the language being modeled (`int`, etc.).