As mentioned elsewhere, each rscc grammar rule can be considered a type declaration. This information can be used to "shape-check" [1] AST constructors. For example:
    if(ast ~ expr{\a + 5}) {
        // Obviously shape-checks
        ast = `expr{\a - 5};

        // Shape-checks but does not necessarily type-check
        // (it's really hard, if not impossible, to know at compile time if this will type-check)
        ast = `expr{\a(1,2,3)};
    }
A more interesting example:
    if(a ~ number) {
        // This should also shape-check
        ast = `expr{\a - 5};
    }
As pointed out by Alex Warth, the best way to think of this example is to consider number a "subshape" of expr. Then, the usual type-checking rules for subtypes apply and this is easy to shape-check.
Given the following definitions of number and expr, the subshaping relationship is clear:
    number: /[0-9]+/
    expr: number
    expr: expr "+" expr
Consider instead the following definition:
    fullnumber: sign? number expon?
    number: /[0-9]+/
    sign: "+"
    sign: "-"
    expon: "e" /[0-9]+/
    expr: fullnumber
    expr: expr "+" expr

    if(a ~ number) {
        // This should still shape-check
        ast = `expr{\a - 5};
    }
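In this grammar, expr =>* number (expr derives number) only indirectly: expr derives fullnumber, which in turn derives number when the optional sign and expon are omitted. If number is still to be treated as a subshape of expr, the relation has to follow derivation in general: whenever a =>* b, b <: a.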
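The rule "a =>* b implies b <: a" can be sketched as a reachability search over the grammar. The following Python sketch is not rscc code; the grammar encoding, `direct_subshapes`, and `is_subshape` are hypothetical names chosen for illustration. It assumes that a symbol can only vanish when it is marked optional with `?`, and it only follows alternatives that can reduce to a single nonterminal.

```python
from collections import deque

# Hypothetical encoding of the grammar above: each nonterminal maps to its
# alternatives, and each alternative is a list of (symbol, is_optional) pairs.
# Terminals keep their quotes or slashes so they never collide with rule names.
GRAMMAR = {
    "fullnumber": [[("sign", True), ("number", False), ("expon", True)]],
    "number": [[("/[0-9]+/", False)]],
    "sign": [[('"+"', False)], [('"-"', False)]],
    "expon": [[('"e"', False), ("/[0-9]+/", False)]],
    "expr": [
        [("fullnumber", False)],
        [("expr", False), ('"+"', False), ("expr", False)],
    ],
}


def direct_subshapes(grammar, a):
    """Nonterminals b such that one alternative of a can reduce to just b."""
    found = set()
    for alternative in grammar.get(a, []):
        for i, (symbol, _) in enumerate(alternative):
            if symbol not in grammar:
                continue  # terminals are not shapes
            others = alternative[:i] + alternative[i + 1:]
            # Assumption: only symbols marked optional can be dropped.
            if all(optional for _, optional in others):
                found.add(symbol)
    return found


def is_subshape(grammar, b, a):
    """True if b <: a, i.e. a =>* b through single-symbol derivations."""
    seen = {a}
    queue = deque([a])
    while queue:
        current = queue.popleft()
        if current == b:
            return True
        for nxt in direct_subshapes(grammar, current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Under these assumptions, `is_subshape(GRAMMAR, "number", "expr")` holds because expr reaches fullnumber and fullnumber reaches number, while `is_subshape(GRAMMAR, "sign", "expr")` does not, since sign never stands alone for a fullnumber.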
Consider the parse tree for an integer using the current rscc C grammar:
(expr (number (numd '123)))
This is a lot of structure for a single integer. The expr and number nodes are redundant, since we know that numd <: number <: expr. We can therefore compress the AST: store the integer as follows, and implicitly expand it to a number or expr when necessary using the subshaping relationship:
(numd '123)
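To make the compress and expand steps concrete, here is a minimal Python sketch. It is not how rscc stores its trees: nodes are modelled as tuples `(label, child, ...)` with string leaves, and `PARENT`, `compress`, and `expand` are hypothetical names. The only grammar knowledge assumed is the chain numd <: number <: expr stated above.

```python
# Immediate supershape of each shape, taken from numd <: number <: expr.
PARENT = {"numd": "number", "number": "expr"}


def is_subshape(b, a):
    """True if b <: a along the PARENT chain (reflexive)."""
    while b != a and b in PARENT:
        b = PARENT[b]
    return b == a


def compress(node):
    """Drop wrapper nodes whose only child is a subshape of the wrapper."""
    if isinstance(node, str):
        return node  # leaf token
    label, *children = node
    children = [compress(child) for child in children]
    if len(children) == 1 and not isinstance(children[0], str):
        child = children[0]
        if is_subshape(child[0], label):
            return child  # the wrapper carries no extra information
    return (label, *children)


def expand(node, target):
    """Re-wrap a compressed node until its label is the target shape."""
    label = node[0]
    if not is_subshape(label, target):
        raise ValueError(f"{label} is not a subshape of {target}")
    while label != target:
        label = PARENT[label]
        node = (label, node)
    return node
```

With this model, compressing `("expr", ("number", ("numd", "123")))` leaves `("numd", "123")`, and `expand(("numd", "123"), "expr")` rebuilds the full chain, so no information is lost as long as the subshape chain is unambiguous.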
As a further advantage, AST compression simplifies the following constructor:
    if(a ~ numd)
        ast = `expr{\a + 5};

    // Without compression, (expr (number _)) nodes need to be created to wrap around \a:
    ast = (expr (expr (number \a)) '+ (expr (number (numd 5))))

    // With compression, the representation is vastly simplified:
    ast = (expr \a '+ (numd 5))
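The difference between the two constructors can be illustrated with a small Python sketch that builds both forms and counts their nodes. The tuple representation and the names `wrap_to`, `sum_uncompressed`, `sum_compressed`, and `count_nodes` are assumptions made for this example, not part of rscc.

```python
# Immediate supershape of each shape, taken from numd <: number <: expr.
PARENT = {"numd": "number", "number": "expr"}


def wrap_to(node, target):
    """Wrap node in explicit unit nodes until its label is target."""
    label = node[0]
    while label != target:
        if label not in PARENT:
            raise ValueError(f"{node[0]} is not a subshape of {target}")
        label = PARENT[label]
        node = (label, node)
    return node


def sum_uncompressed(a, n):
    """Build expr{a + n} with every wrapper node spelled out."""
    return ("expr", wrap_to(a, "expr"), "+", wrap_to(("numd", str(n)), "expr"))


def sum_compressed(a, n):
    """Build expr{a + n}, relying on subshaping instead of wrappers."""
    wrap_to(a, "expr")  # used only as a shape check; the result is discarded
    return ("expr", a, "+", ("numd", str(n)))


def count_nodes(node):
    """Count interior nodes; string leaves are not counted."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_nodes(child) for child in node[1:])
```

For `a = ("numd", "3")`, the uncompressed constructor produces seven nodes and the compressed one three, which mirrors the two forms written out above.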