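As mentioned elsewhere, each rscc grammar rule can be considered a type declaration for the AST nodes it produces. This information can be used to "shape-check" [1] AST constructors, that is, to verify at compile time that the AST a constructor builds conforms to the grammar. For example: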
if(ast ~ expr{\a + 5})
{
    // Obviously shape-checks
    ast = `expr{\a - 5};
    // Shape-checks, but does not necessarily type-check: whether \a can be
    // called with these arguments is very hard, if not impossible, to know at compile time
    ast = `expr{\a(1,2,3)};
}
A more interesting example:
if(a ~ number)
{
    // This should also shape-check, even though \a is a number rather than an expr
    ast = `expr{\a - 5};
}
As pointed out by Alex Warth, the best way to think of this example is to treat number as a "subshape" of expr, written number <: expr. The usual subtyping rules then apply: a value of a subshape may appear wherever its supershape is expected, so this constructor is easy to shape-check.
Given the following definitions of number and expr, the subshaping relationship is clear, because the rule expr: number lets every number stand where an expr is expected:
number: /[0-9]+/
expr: number
expr: expr "+" expr
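For a grammar this small, the subshape relation can be computed mechanically. The following Python sketch is not rscc's implementation; it assumes a hypothetical encoding in which each nonterminal maps to a list of alternatives, and it assumes b <: a exactly when a reaches b through a chain of single-symbol (unit) rules such as expr: number. The names subshapes and hole_fits are made up for illustration.

```python
# Hypothetical encoding of the grammar above: keys are nonterminals, each
# alternative is a list of symbols, and any string that is not a key is a token.
GRAMMAR = {
    "number": [["/[0-9]+/"]],
    "expr": [["number"], ["expr", "+", "expr"]],
}

def subshapes(grammar):
    """Return the set of pairs (b, a) such that b <: a.

    Assumption: b <: a exactly when a reaches b through unit rules
    like `expr: number`; every shape is a subshape of itself.
    """
    rel = {(n, n) for n in grammar}
    for a, alts in grammar.items():
        for alt in alts:
            if len(alt) == 1 and alt[0] in grammar:
                rel.add((alt[0], a))
    # Transitive closure: b <: c and c <: a imply b <: a.
    changed = True
    while changed:
        changed = False
        for (b, c) in list(rel):
            for (c2, a) in list(rel):
                if c == c2 and (b, a) not in rel:
                    rel.add((b, a))
                    changed = True
    return rel

def hole_fits(bound_shape, expected_shape, rel):
    """A hole bound to bound_shape may be spliced where expected_shape is required."""
    return (bound_shape, expected_shape) in rel

rel = subshapes(GRAMMAR)
# `expr{\a - 5}` splices \a into an expr position; a was matched as a number.
print(hole_fits("number", "expr", rel))  # True
print(hole_fits("expr", "number", rel))  # False
```

This only captures direct unit chains; it does not yet account for optional symbols around the unit child.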
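Consider instead the following definitions, in which expr reaches number only through fullnumber, together with the same constructor as before: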
fullnumber: sign? number expon?
number: /[0-9]+/
sign: "+"
sign: "-"
expon: "e" /[0-9]+/
expr: fullnumber
expr: expr "+" expr
if(a ~ number)
{
    // This should still shape-check
    ast = `expr{\a - 5};
}
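In the above, expr =>* number (expr derives number): expr => fullnumber, and fullnumber => number when the optional sign and expon are omitted. If number is still to be a subshape of expr, the general rule must be: whenever a =>* b, b <: a.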
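Computing a =>* b for a single symbol b has to account for optional symbols, since fullnumber reaches number only when sign? and expon? are left out. A minimal Python sketch, assuming a hypothetical encoding in which a trailing ? marks an optional symbol (so literal tokens must not end in ?), finds the nullable nonterminals, records a =>* b whenever a has a rule in which everything around b can vanish, and then takes the transitive closure. The function names are illustrative, not part of rscc.

```python
# Assumed encoding of the grammar above: a trailing "?" marks an optional
# symbol, keys are nonterminals, and every other string is a literal token.
GRAMMAR = {
    "fullnumber": [["sign?", "number", "expon?"]],
    "number": [["/[0-9]+/"]],
    "sign": [["+"], ["-"]],
    "expon": [["e", "/[0-9]+/"]],
    "expr": [["fullnumber"], ["expr", "+", "expr"]],
}

def base(sym):
    """Strip the optional marker: 'sign?' -> 'sign'."""
    return sym[:-1] if sym.endswith("?") else sym

def nullable_set(grammar):
    """Nonterminals that can derive the empty string (fixpoint iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            if a not in nullable and any(
                    all(s.endswith("?") or s in nullable for s in alt) for alt in alts):
                nullable.add(a)
                changed = True
    return nullable

def derives(grammar):
    """Pairs (a, b) such that a =>* b, where b is a single nonterminal."""
    nullable = nullable_set(grammar)

    def can_vanish(sym):
        return sym.endswith("?") or sym in nullable

    rel = {(a, a) for a in grammar}  # zero-step derivations
    for a, alts in grammar.items():
        for alt in alts:
            for i, sym in enumerate(alt):
                rest = alt[:i] + alt[i + 1:]
                # a => ... b ..., and everything around b can derive the empty string
                if base(sym) in grammar and all(can_vanish(s) for s in rest):
                    rel.add((a, base(sym)))
    changed = True
    while changed:  # transitive closure
        changed = False
        for (a, b) in list(rel):
            for (b2, c) in list(rel):
                if b == b2 and (a, c) not in rel:
                    rel.add((a, c))
                    changed = True
    return rel

def is_subshape(b, a, rel):
    """b <: a whenever a =>* b."""
    return (a, b) in rel

rel = derives(GRAMMAR)
print(is_subshape("number", "expr", rel))  # True: expr => fullnumber => number
print(is_subshape("sign", "expr", rel))    # False
```

sign is not a subshape of expr because the number next to it in fullnumber cannot vanish.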
Consider the parse tree for an integer using the current rscc C grammar:
(expr
  (number
    (numd
      '123)))
That is three nodes to represent one integer. The expr and number nodes are redundant, because we know that numd <: number <: expr. We can therefore store integers as follows and, using the subshaping relationship, implicitly expand them to a number or an expr when necessary:
(numd '123)
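The compression step can be sketched as a bottom-up pass that drops every node whose only child is another node. This Python sketch is an illustration only: it assumes trees are nested tuples (label, child, ...) with tokens as plain strings, and that tokens such as '+ are kept in the tree, so a node with a single node child can only have come from a unit rule such as expr: number. compress is a hypothetical name.

```python
def compress(node):
    """Drop every node whose only child is another node (a unit-rule wrapper).

    Assumes trees are nested tuples (label, child, ...) and tokens are plain
    strings that stay in the tree, so a node with a single node child can only
    come from a unit rule such as `expr: number`.
    """
    if isinstance(node, str):  # a token such as '123 or '+
        return node
    label, *children = node
    children = [compress(c) for c in children]
    if len(children) == 1 and not isinstance(children[0], str):
        return children[0]  # the child's shape is a subshape of label
    return (label, *children)

tree = ("expr", ("number", ("numd", "123")))
print(compress(tree))  # ('numd', '123')
```

Leaf nodes such as (numd '123) survive because their only child is a token, and a binary expr node survives because it has three children.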
As a further advantage, AST compression simplifies the tree built by the following constructor:
if(a ~ numd)
    ast = `expr{\a + 5};
// Without compression, (expr (number _)) nodes need to be created to wrap both \a and (numd 5):
ast = (expr (expr
              (number
                \a))
            '+
            (expr
              (number
                (numd 5))))
// With compression, the representation is vastly simplified:
ast = (expr \a '+ (numd 5))
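Going the other way, a compressed tree can be expanded on demand: for each child, look up the shape its rule expects and re-insert the chain of unit-rule wrappers. The Python sketch below assumes the same tuple encoding, hand-written UNIT_RULES and RULES tables standing in for the relevant fragment of the C grammar (read off the trees shown above), and uses ('numd', '1') as a stand-in for whatever \a is bound to; unit_path, expand, and decompress are hypothetical names.

```python
from collections import deque

# Assumed fragment of the grammar, read off the trees in the text.
UNIT_RULES = {"expr": ["number"], "number": ["numd"]}  # parent -> unit children
RULES = {"expr": [["expr", "+", "expr"]]}              # non-unit alternatives

def unit_path(expected, actual):
    """Shortest unit-rule chain expected => ... => actual (BFS), or None."""
    queue = deque([[expected]])
    seen = {expected}
    while queue:
        path = queue.popleft()
        if path[-1] == actual:
            return path
        for child in UNIT_RULES.get(path[-1], []):
            if child not in seen:
                seen.add(child)
                queue.append(path + [child])
    return None

def expand(node, expected):
    """Wrap node in the unit-rule nodes needed to give it the expected label."""
    path = unit_path(expected, node[0])
    if path is None:
        raise TypeError(f"{node[0]} is not a subshape of {expected}")
    for label in reversed(path[:-1]):  # innermost wrapper first
        node = (label, node)
    return node

def decompress(node):
    """Rebuild the uncompressed tree, expanding each child to its rule's shape."""
    if isinstance(node, str):  # tokens are left alone
        return node
    label, *children = node
    for alt in RULES.get(label, []):
        tokens_match = all(c == sym for sym, c in zip(alt, children) if isinstance(c, str))
        if len(alt) == len(children) and tokens_match:
            return (label, *(c if isinstance(c, str) else expand(decompress(c), sym)
                             for sym, c in zip(alt, children)))
    return (label, *(decompress(c) for c in children))

a = ("numd", "1")  # stand-in for the value bound to \a
print(decompress(("expr", a, "+", ("numd", "5"))))
# ('expr', ('expr', ('number', ('numd', '1'))), '+', ('expr', ('number', ('numd', '5'))))
```

expand raises an error when the stored node is not a subshape of the expected one, which is exactly the case a shape-checker should reject at compile time.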