Authoring Languages and Extensions
Writing Unsound languages that are easy to extend, and writing extensions that work across many languages rather than just one, is not straightforward. The techniques below make it easier.
Overview
An extension is a module that exports builder functions for one or more compiler phases:
export default {
  name: "myext",
  description: "My extension",
  requires: ["core"],
  $parse: ($) => {
    /* extend parsing */
  },
  $compile: ($) => {
    /* extend compilation */
  },
  $interpret: ($) => {
    /* extend interpretation */
  },
  // Less commonly extended:
  $emit: ($) => {
    /* extend emission */
  },
  $analyze: ($) => {
    /* extend analysis */
  },
};
Each builder receives $, an object containing the operations of the current phase. Your builder mutates $ to add or override operations. Extensions are applied in order, so later extensions can override earlier ones.
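The ordering rule can be illustrated with a small sketch. The Extension and Ops shapes and the buildInterpret helper below are hypothetical stand-ins written for this example, not the actual loader; the only behavior taken from the text is that builders run in order against a shared $.

```typescript
// Hypothetical shapes, for illustration only; the real phase object is richer.
type Ops = Record<string, () => string>;
type Builder = ($: Ops) => void;

interface Extension {
  name: string;
  $interpret?: Builder;
}

// Run each extension's builder, in order, against one shared $ object.
function buildInterpret(extensions: Extension[]): Ops {
  const $: Ops = {};
  for (const ext of extensions) {
    ext.$interpret?.($);
  }
  return $;
}

const core: Extension = {
  name: "core",
  $interpret: ($) => {
    $.greet = () => "hello from core";
  },
};

const myext: Extension = {
  name: "myext",
  $interpret: ($) => {
    // Applied after core, so this assignment replaces core's greet.
    $.greet = () => "hello from myext";
  },
};

const ops = buildInterpret([core, myext]);
console.log(ops.greet()); // hello from myext
```

Reversing the list would leave core's greet in place, which is why the order of the -x flags matters when two extensions define the same operation.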
Parsing
The parser is built from small, composable pieces. When extending parsing, follow these principles:
Break parsing into small pieces
Define separate parser functions for each syntactic element, and expose them on $:
$parse: ($) => {
  // Bad: one big parser
  $.myExpr = () => { /* parses everything */ };
  // Good: composable pieces
  $.myKeyword = () => $.keyword("mykw");
  $.myParams = () => $.between($.token("("), $.sepBy($.ident(), $.token(",")), $.token(")"));
  $.myBody = () => $.expr();
  $.myExpr = () => $.seq($.myKeyword(), $.myParams(), $.myBody(), (kw, params, body) => ({
    type: "MyExpr",
    params,
    body
  }));
};
This allows subsequent extensions to override just $.myParams or $.myBody without reimplementing the whole expression.
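Overriding a single piece works because $.myExpr looks up $.myParams each time it is called rather than capturing it once. The sketch below shows this with toy parsers; the Parser type here is an assumption made for the example (a plain function from text to a value), not the real combinator type.

```typescript
// Toy stand-in (assumption): a "parser" is just a function from input text
// to a parsed value.
type Parser<T> = (input: string) => T;

interface ParseOps {
  myParams: () => Parser<string[]>;
  myExpr: () => Parser<{ type: "MyExpr"; params: string[] }>;
}

const $ = {} as ParseOps;

// First extension: comma-separated parameters.
$.myParams = () => (input) => input.split(",").map((s) => s.trim());

// myExpr reads $.myParams when it is called, not when it is defined.
$.myExpr = () => (input) => ({ type: "MyExpr", params: $.myParams()(input) });

// Later extension: switch to whitespace-separated parameters, leaving
// myExpr untouched.
$.myParams = () => (input) => input.split(/\s+/).filter((s) => s.length > 0);

const parsed = $.myExpr()("a b c");
console.log(parsed.params.join(",")); // a,b,c
```

Had myExpr stored the result of $.myParams() in a local variable at definition time, the later override would have had no effect.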
Expose constants on $
If your extension uses constants (keywords, operator precedences, etc.), expose them on $ so other extensions can modify them:
$parse: ($) => {
  // Expose configuration
  $.myKeywords = ["mykw", "myother"];
  $.myPrecedence = 5;
  // Use the configuration
  $.myExpr = () => {
    const kw = $.myKeywords[0];
    // ...
  };
};
Another extension can then do $.myKeywords.push("extended") or $.myPrecedence = 10.
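For such changes to take effect, the constants have to be read from $ when the parser runs, as the example does inside $.myExpr. A minimal sketch, where describeMyExpr is a hypothetical stand-in for a parser builder that consumes the configuration:

```typescript
interface ParseOps {
  myKeywords: string[];
  myPrecedence: number;
  describeMyExpr: () => string; // hypothetical stand-in for a parser builder
}

const $ = {} as ParseOps;

// Original extension: expose configuration, then read it lazily.
$.myKeywords = ["mykw", "myother"];
$.myPrecedence = 5;
$.describeMyExpr = () => `${$.myKeywords.join("|")} @ ${$.myPrecedence}`;

// A later extension adjusts the configuration without touching the parser.
$.myKeywords.push("extended");
$.myPrecedence = 10;

console.log($.describeMyExpr()); // mykw|myother|extended @ 10
```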
Extend existing parsers carefully
When overriding an existing parser like $.expr, save the original and call it as a fallback:
$parse: ($) => {
  const baseExpr = $.expr;
  $.expr = () =>
    $.alt(
      $.lazy(() => $.myExpr()), // Try new syntax first
      baseExpr() // Fall back to base
    );
};
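The save-and-fall-back pattern can be tried out with toy combinators. The alt and lazy functions below are simplified assumptions written for this sketch (a parser returns a value, or null on failure); the real $.alt and $.lazy may behave differently in detail.

```typescript
// Toy combinators (assumptions): a parser returns a value, or null on failure.
type Parser<T> = (input: string) => T | null;

// Try each alternative in order and keep the first success.
function alt<T>(...parsers: Parser<T>[]): Parser<T> {
  return (input) => {
    for (const p of parsers) {
      const result = p(input);
      if (result !== null) return result;
    }
    return null;
  };
}

// Defer building a parser until input arrives.
function lazy<T>(build: () => Parser<T>): Parser<T> {
  return (input) => build()(input);
}

interface ParseOps {
  expr: () => Parser<string>;
  myExpr: () => Parser<string>;
}

const $ = {} as ParseOps;

// Base language: only digit strings are expressions.
$.expr = () => (input) => (/^\d+$/.test(input) ? `num:${input}` : null);

// Extension: add a keyword form, falling back to the saved base parser.
$.myExpr = () => (input) => (input.startsWith("mykw") ? `my:${input}` : null);
const baseExpr = $.expr;
$.expr = () => alt(lazy(() => $.myExpr()), baseExpr());

console.log($.expr()("mykw x")); // my:mykw x
console.log($.expr()("42")); // num:42
```

Because baseExpr is captured before $.expr is reassigned, the fallback refers to the earlier parser and not to the new one, which avoids the override calling itself.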
Compilation
The compiler transforms AST nodes into IR. Follow these principles:
Compile to existing IR when possible
Prefer compiling to IR constructs that $emit and $interpret already handle. This means your extension doesn't need to provide $emit or $interpret builders:
$compile: ($) => {
  $.compileMyExpr = (expr) => {
    // Compile to existing $.let, $.lambda, $.call, etc.
    return ir.$(
      "let",
      ir.var("$env"),
      ir.lit(expr.name),
      $.compileExpr(expr.value),
      ir.arrow(["$env"], $.compileExpr(expr.body))
    );
  };
};
If you compile to a new IR operation like ir.$("myOp", ...), you'll need to provide $interpret (and possibly $emit) to handle it.
Handle extended expression types
When overriding $.compileExpr, your implementation may receive expression types from other extensions. Handle this by checking for your types and delegating others:
$compile: ($) => {
  const baseCompileExpr = $.compileExpr;
  $.compileExpr = (expr) => {
    if (expr.type === "MyExpr") {
      return $.compileMyExpr(expr);
    }
    // Delegate unknown types to base
    return baseCompileExpr.call($, expr);
  };
};
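When several extensions follow this pattern, their overrides form a chain in which each link handles its own node type and passes everything else along. A sketch with hypothetical shapes (AstNode, string-valued "IR", and the addHandler helper are all assumptions made for the example):

```typescript
// Hypothetical node shape; compiled output is a plain string for brevity.
type AstNode = { type: string; value?: number };

interface CompileOps {
  compileExpr: (expr: AstNode) => string;
}

// Start with a base that rejects everything.
const $: CompileOps = {
  compileExpr: (expr) => {
    throw new Error(`unknown expression type: ${expr.type}`);
  },
};

// Wrap $.compileExpr so it handles one node type and delegates the rest.
function addHandler(type: string, compile: (expr: AstNode) => string): void {
  const base = $.compileExpr;
  $.compileExpr = (expr) =>
    expr.type === type ? compile(expr) : base.call($, expr);
}

addHandler("Num", (e) => `lit(${e.value})`); // stands in for one extension
addHandler("MyExpr", () => "myOp()"); // stands in for another

console.log($.compileExpr({ type: "Num", value: 1 })); // lit(1)
console.log($.compileExpr({ type: "MyExpr" })); // myOp()
```

A node type that no link recognizes falls through to the original base, so the error surfaces there instead of being swallowed by an extension.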
The Expr type limitation
The Expr type is a TypeScript union of the known expression types. When you add a new expression type, the type checker doesn't know about it, so you'll often need casts:
$compile: ($) => {
  const baseCompileExpr = $.compileExpr;
  $.compileExpr = (expr) => {
    // Cast to access your extended type
    const e = expr as MyExpr | Expr;
    if (e.type === "MyExpr") {
      return $.compileMyExpr(e);
    }
    return baseCompileExpr.call($, expr);
  };
};
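This is unfortunate but necessary given TypeScript's type system: a union type cannot be extended after it has been declared. The recursive nature of Expr (expressions contain other Expr nodes) makes it difficult to extend properly, because a widened type would have to reach every nested position as well.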
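One way to keep the casts contained is to name the widened union once and cast at the boundary. The Expr union below is a two-member stand-in invented for this sketch; the real union has other members, and compiled output is shown as a string only for brevity.

```typescript
// Stand-in for the core union (assumption: the real Expr has more members).
type Expr =
  | { type: "Num"; value: number }
  | { type: "Add"; left: Expr; right: Expr };

// The extension's node type, unknown to the core union.
interface MyExpr {
  type: "MyExpr";
  params: string[];
  body: Expr;
}

// Naming the widened union once keeps the casts easy to find.
type ExtendedExpr = Expr | MyExpr;

function compileExpr(expr: Expr): string {
  const e = expr as ExtendedExpr; // widen so "MyExpr" becomes a legal case
  switch (e.type) {
    case "MyExpr":
      return `my(${e.params.join(",")}; ${compileExpr(e.body)})`;
    case "Num":
      return `lit(${e.value})`;
    case "Add":
      return `add(${compileExpr(e.left)}, ${compileExpr(e.right)})`;
  }
}

const node: MyExpr = {
  type: "MyExpr",
  params: ["x"],
  body: { type: "Num", value: 1 },
};

// Going the other way needs a double cast, because MyExpr is not an Expr.
console.log(compileExpr(node as unknown as Expr)); // my(x; lit(1))
```

Note that the widening must be a cast on the initializer. Writing const e: ExtendedExpr = expr would let the checker narrow e straight back to Expr, and the "MyExpr" comparison would then be rejected.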
Extract helper functions
If your compilation logic is complex, extract helper functions and expose them on $:
$compile: ($) => {
  $.compileMyParams = (params) => { /* ... */ };
  $.compileMyBody = (body) => { /* ... */ };
  $.compileMyExpr = (expr) => {
    const params = $.compileMyParams(expr.params);
    const body = $.compileMyBody(expr.body);
    // ...
  };
};
Interpretation
The interpreter defines runtime semantics. When adding new operations:
Provide operations for new IR
If your compiler emits new IR operations, provide interpreter implementations:
$interpret: ($) => {
  $.myOp = ($env, arg1, arg2) => {
    // Implement runtime behavior
    return /* result */;
  };
};
Match the signature expected by emit
The emitter generates code like $.myOp($env, arg1, arg2). Your interpreter operation must match this signature.
Consider the environment
Many operations receive $env as the first argument. Use $env.lookup(name) to read variables, $env.extend(bindings) to create child scopes, and $env.mutate(name, value) to update existing bindings.
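To make the three environment methods concrete, here is a minimal sketch of a scope chain that supports them. Only the method names lookup, extend, and mutate come from the text; the Env class, its internals, and its error messages are assumptions, and the real $env may differ.

```typescript
// A minimal environment (assumption: the real $env may differ in detail).
class Env {
  private vars = new Map<string, unknown>();
  private parent: Env | null;

  constructor(parent: Env | null = null) {
    this.parent = parent;
  }

  // Read a variable, searching outward through parent scopes.
  lookup(name: string): unknown {
    if (this.vars.has(name)) return this.vars.get(name);
    if (this.parent) return this.parent.lookup(name);
    throw new Error(`unbound variable: ${name}`);
  }

  // Create a child scope holding the given bindings.
  extend(bindings: Record<string, unknown>): Env {
    const child = new Env(this);
    for (const [name, value] of Object.entries(bindings)) {
      child.vars.set(name, value);
    }
    return child;
  }

  // Update an existing binding in the nearest scope that defines it.
  mutate(name: string, value: unknown): void {
    if (this.vars.has(name)) {
      this.vars.set(name, value);
      return;
    }
    if (this.parent) {
      this.parent.mutate(name, value);
      return;
    }
    throw new Error(`cannot mutate unbound variable: ${name}`);
  }
}

const globals = new Env().extend({ x: 1 });
const inner = globals.extend({ y: 2 });
inner.mutate("x", 10); // updates x where it was defined, in globals

console.log(globals.lookup("x"), inner.lookup("y")); // 10 2
```

In this sketch extend never modifies the scope it is called on, so an operation that introduces bindings returns or passes along the child scope, while mutate is the only way to change a binding that outer code can observe.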
Testing
Write tests for your extension using the test file format:
# usc -x core -x myext
--- my feature works
myexpr(1, 2)
===
3
--- my feature handles errors
myexpr(null)
=== error
expected number
Test both successful cases and error cases. Test interaction with other extensions when relevant.