Support infix operator rules #1836

msujew · 2025-03-11T14:41:33Z

Adds support for a special infix operator syntax that aims to improve Langium on multiple fronts:

Improves readability/maintainability of Langium grammars
Simplify specifying precedence for infix binary expressions
Improves performance by parsing the binary expression in a flat list - the parser will automatically transform the resulting object into a tree structure.

Note that the same AST node is constructed as if you were to write the parser rules manually. Also note that this PR assumes that all binary operators are left-associative (i.e. A + B + C is parsed as (A + B) + C). We can revisit this decision in case we find a case for right-associative grammar rules. Note that this only requires changes in the tree restructuring routine.

aabounegm · 2025-03-11T17:51:01Z

Amazing effort on a long-requested feature! I just have a question: how would this work if I want to create a different AST node type for each operator (like here)? Would something like this work?

Expr: Equal;

infix Equal on Multiplication: '==';
infix Multiplication on Addition: '*';
infix Addition on Primary: '+';

Primary infers Expr: '(' Expression ')' | NumberLiteral;

msujew · 2025-03-11T18:04:07Z

@aabounegm Good question: Splitting the infix operator rule into multiple rules would result in decreased performance - we get a boost by reducing the amount of parser rules (and therefore subrule calls) we need to create.

Instead, I would propose something like this perhaps?

Expr: BinaryExpr;

infix BinaryExpr on Primary: {infer Equal} '==' > {infer Multiplication} '*' > {infer Addition} '+';

Primary infers Expr: '(' Expression ')' | NumberLiteral;

I think that should work pretty well. However, I would attempt to implement such a feature in a separate step/PR. This PR is complex enough as it is.

aabounegm · 2025-03-11T18:07:56Z

In my grammar (linked above), I already have multiple rules since the language is for educational purposes. Do you mean the performance will be even worse or the same as the way it is currently defined in my grammar?

msujew · 2025-03-11T18:10:41Z

Do you mean the performance will be even worse or the same as the way it is currently defined in my grammar?

It will be the same (maybe very slightly worse, due to the additional transformation step done to convert the flat list into an expression tree), but you won't gain anything from it. This PR allows to eliminate multiple levels of precedence in order to massively boost performance when parsing expressions.

ydaveluy · 2025-03-11T19:28:05Z

How does it works when types are not infered (using pre-defined interfaces)?

interface Expression {
}
interface BinaryOperation extends Expression {
	leftOperand: Expression
	rightOperand: Expression
	feature: string
}
interface UnaryOperation extends Expression {
	operand: Expression
	feature: string
}
// how to map leftOperand, rightOperand and feature ?
infix BinaryExpression on UnaryOperation: '%' > '^' > '*' | '/' > '+' | '-';

UnaryOperation returns Expression:
    {UnaryOperation} feature=("!" | "-" | "+" | "~")  operand=Literal
;

msujew · 2025-03-11T21:20:12Z

@ydaveluy Currently not supported, but I already plan to address this in a separate PR.

szarnekow · 2025-03-11T23:06:46Z

in order to massively boost performance when parsing expressions.

can you provide some numbers?

ydaveluy · 2025-03-12T06:32:05Z

@msujew The infix rule is very concise but contrary to all other rules it is not possible to infer the names of the attributes for the left operand, feature and right operand. If a future PR provides a syntax using pre-defined types it will mean maybe syntax very different.

Maybe a common syntax handling both cases would be more approriate, something like this for inferred types:

BinaryExpression: 
    left=BinaryExpression feature=( '%' > '^' > '*' | '/' > '+' | '-') right=BinaryExpression | PrimaryExpression;

and this for pre-defined types:

interface Expression {
}
interface BinaryExpression extends Expression {
	leftOperand: Expression
	rightOperand: Expression
	feature: string
}
BinaryExpression returns Expression: 
  {BinaryExpression}  left=BinaryExpression feature=( '%' > '^' > '*' | '/' > '+' | '-') right=BinaryExpression | PrimaryExpression;

I don't know if it is technically possible.

msujew · 2025-03-12T08:50:47Z

can you provide some numbers?

@szarnekow Sure, I just ran the branch through ~4100 lines of the arithmetics example code (parser only). On main, this currently takes 430ms, whereas this branch only requires 260ms. So we're talking about 40% improved performance wrt to expression parsing. I used this code and placed in the packages/arithmetics/test directory:

import { describe, expect, test } from 'vitest';
import { createArithmeticsServices } from '../src/language-server/arithmetics-module.js';
import { EmptyFileSystem } from 'langium';

describe('Test the arithmetics parsing', () => {

    const services = createArithmeticsServices(EmptyFileSystem);
    const parser = services.arithmetics.parser.LangiumParser;

    test('Simple expression', () => {
        let text = '1 + 2 ^ 3 * 4 - 5 / 6;';
        for (let i = 0; i < 12; i++) {
            text = text + text;
        }
        console.time('parse');
        const parseResult = parser.parse('module test ' + text);
        console.timeEnd('parse');
        expect(parseResult.parserErrors).toHaveLength(0);
    });
});

I don't know if it is technically possible.

@ydaveluy I'm not sure either. Your proposal still includes building a tree-like structure for infix operators - however, that's exactly what we want to prevent. Tranforming that into the expected flat structure we need for parsing is maybe possible, although you have so many degrees of freedom that it likely just becomes a bottomless hole. Instead I thought that something like this could work:

interface Expression {
}
interface BinaryExpression extends Expression {
	leftOperand: Expression
	rightOperand: Expression
	feature: string
}
infix BinaryExpression on Primary returns BinaryExpression {
  left: leftOperand,
  right: rightOperand,
  operator: feature
}: '%' > '^' > '*' | '/' > '+' | '-';

Or alternatively for inferred types:

infix BinaryExpression on Primary infers BinaryExpression {
  left: leftOperand,
  right: rightOperand,
  operator: feature
}: '%' > '^' > '*' | '/' > '+' | '-';

aabounegm · 2025-03-12T12:56:25Z

@ydaveluy's suggestion had me wondering, isn't the new infix Binary on Primary: '+'; syntax basically syntax sugar for Binary: Primary | Binary '+' Binary; I understand that left-recursion in general is not yet supported, but this infix rule seems to be a specific case of left-recursion. If my understanding is correct, would it also not be technically possible to just detect this specific pattern (Rule: Base | lhs=Rule op='+' rhs=Rule;) and perform the same transformation that this PR does to the infix rule, rather than introduce new syntax?

Lotes · 2025-03-16T21:59:42Z

examples/arithmetics/src/language-server/arithmetics.langium

-
-Modulo infers Expression:
-    PrimaryExpression ({infer BinaryExpression.left=current} operator='%' right=PrimaryExpression)*;
+infix BinaryExpression on PrimaryExpression: '%' > '^' > '*' | '/' > '+' | '-';


Normally the precedence of > is greater than the one of |. It reads a bit unnatural.
Adding parentheses could be an option to enumerate all alternatives.
Or maybe a equal sign instead of the |.
WDYT?

And why not using the comma operator instead of '>' ?

'%' , '^' , '*' | '/' , '+' | '-';

‚>‘ implies an order between the operators, that makes totally sense to me. Having a comma, you do not know what comes first: low or high precedence. But you can assume some convention, true.

You can mitigate this by writing each group in its own line:

infix BinaryExpression on PrimaryExpression: '%' > '^' > '*' | '/' > '+' | '-';

This is how I'd format such expressions for readability.

Lotes · 2025-03-16T22:02:36Z

packages/langium/src/parser/langium-parser.ts

@@ -63,7 +71,7 @@ export interface BaseParser {
    /**
     * Adds a new parser rule to the parser
     */
-    rule(rule: ParserRule, impl: RuleImpl): RuleResult;
+    rule(rule: ParserRule | InfixRule, impl: RuleImpl): RuleResult;


ParserRule | InfixRule occur together quite often. Maybe add a common type and use that one instead.

That would also help for isParserRule(rule) || isInfixRule(rule), which I've seen at least twice.

spoenemann

Great feature! I have a few conceptual proposals.

spoenemann · 2025-03-26T14:39:20Z

packages/langium/src/grammar/langium-grammar.langium

+
+InfixRuleOperators: precedences+=InfixRuleOperatorList ('>' precedences+=InfixRuleOperatorList)*;
+
+InfixRuleOperatorList: operators+=Keyword ('|' operators+=Keyword)*;


I assume the associativity is left-to-right?

Shall we add an option to change that?

InfixRuleOperatorList: (associativity=Associativity 'assoc')? operators+=Keyword ('|' operators+=Keyword)*; Associativity returns string: 'left' | 'right';

If omitted, the default should be left. With this, you can realize assignments as operators:

infix BinaryExpression on PrimaryExpression: '*' | '/' > '+' | '-' > right assoc '=' | '*=' | '/=' | '+=' | '-=';

Maybe also allow no assoc to forbid more than two operands on the same level?

spoenemann · 2025-03-26T14:47:35Z

packages/langium/src/grammar/langium-grammar.langium

@@ -74,6 +74,12 @@ ParserRule:
    (definesHiddenTokens?='hidden' '(' (hiddenTokens+=[AbstractRule:ID] (',' hiddenTokens+=[AbstractRule:ID])*)? ')')? ':'
    definition=Alternatives ';';

+InfixRule: 'infix' RuleNameAndParams 'on' call=RuleCall ':' operators=InfixRuleOperators ';';


We should allow infers and returns as usual. If a declared type is given, it must have the properties left, right, and operator with respective types.

Ah 👉 #1836 (comment) – 👍 for a separate PR, but let's not forget that.

spoenemann · 2025-03-26T14:51:03Z

packages/langium/src/utils/ast-utils.ts

@@ -16,7 +16,7 @@ import { inRange } from './cst-utils.js';
 * Link the `$container` and other related properties of every AST node that is directly contained
 * in the given `node`.
 */
-export function linkContentToContainer(node: AstNode): void {
+export function linkContentToContainer(node: AstNode, deep = false): void {


I'd try to avoid boolean parameters and use an options object instead.

spoenemann · 2025-03-26T14:59:02Z

To this discussion: #1836 (comment) (too bad GitHub doesn't have threads 😞)

I actually see value in standardizing the structure of BinaryOperator nodes. That can be very useful for other features that we might add later. So I vote for not giving the option to change the property names.

If you already have custom code relying on different property names, I'd use TypeScript rename refactoring to change it.

ydaveluy · 2025-04-06T06:36:12Z

If I understand correctly, this PR involves a post-parsing refactoring for binary expressions.

I wonder if this case could be handled with a more generic approach:

Write the BinaryExpression rule without precedence:
BinaryExpression: lhs=BinaryExpression op=('*'|'/'|'-'|'+'|'|'|'&') rhs=BinaryExpression | PrimaryExpression;

Provide an API to register refactoring actions for the parser (similar to what is done for validation):

        actions: RefactoringAction<ast.TestAstType> = {
            BinaryExpression: refactorBinaryExpression
        };
    parser.register(actions);
    refactorBinaryExpression(expr: ast.BinaryExpression): void {
       if(ast.isBinaryExpression(expr.lhs) && isBinaryExpression(expr.rhs)) {
           // do refactoring
       }
    }

Provide a generic action to restructure the AST for binary expressions with operator precedence.

This would be easily customizable by the user and could allow other types of refactoring.

msujew · 2025-04-06T09:51:08Z

If I understand correctly, this PR involves a post-parsing refactoring for binary expressions.

Right, but that's not the only advantage. I mainly created this change to speed up expression parsing performance (between 40-50% in my tests). This is done by converting the infix parser rule into a flat list of expressions and then constructing a parse tree. Your proposed change still constructs a parse tree, which is then getting rewritten. This would unfortunately eliminate the performance benefits of this PR.

BinaryExpression: lhs=BinaryExpression op=('*'|'/'|'-'|'+'|'|'|'&') rhs=BinaryExpression | PrimaryExpression;

Note that this syntax is left recursive, and we don't support left recursion (due to using an LL parser).

ydaveluy · 2025-04-06T10:56:14Z

Sorry the syntax was wrong...

Expression: PrimaryExpression ({BinaryExpression.lhs=current} operator=('*'|'/'|'-'|'+'|'|'|'&') rhs=PrimaryExpression)*;

msujew · 2025-04-06T15:02:10Z

Hm, I quite like the idea. I'll iterate on this during the course of the week :)

Edit: Thinking further about this, it becomes rather difficult to use this syntax to use the tree rewriting, mainly because it becomes impossible to decide which expressions are part of the "current" list of expressions, and which are nested (via usage of parenthesis, or similar).

Support infix operator rules

6598e54

msujew added the parser Parser related issue label Mar 11, 2025

msujew requested a review from spoenemann March 11, 2025 14:41

Fix completion test

9141ddb

Lotes reviewed Mar 16, 2025

View reviewed changes

Fix binary expression parsing

ca70591

msujew force-pushed the msujew/binary-operators branch from ea74ce1 to ca70591 Compare March 17, 2025 13:48

spoenemann reviewed Mar 26, 2025

View reviewed changes

spoenemann added this to the v4.0.0 milestone Apr 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support infix operator rules #1836

Support infix operator rules #1836

msujew commented Mar 11, 2025

aabounegm commented Mar 11, 2025

msujew commented Mar 11, 2025

aabounegm commented Mar 11, 2025

msujew commented Mar 11, 2025

ydaveluy commented Mar 11, 2025

msujew commented Mar 11, 2025

szarnekow commented Mar 11, 2025

ydaveluy commented Mar 12, 2025

msujew commented Mar 12, 2025 •

edited

Loading

aabounegm commented Mar 12, 2025

Lotes Mar 16, 2025

ydaveluy Mar 17, 2025

Lotes Mar 17, 2025

spoenemann Mar 26, 2025

Lotes Mar 16, 2025

spoenemann Mar 26, 2025

spoenemann left a comment

spoenemann Mar 26, 2025

spoenemann Mar 26, 2025

spoenemann Mar 26, 2025

spoenemann Mar 26, 2025

spoenemann commented Mar 26, 2025

ydaveluy commented Apr 6, 2025

msujew commented Apr 6, 2025

ydaveluy commented Apr 6, 2025

msujew commented Apr 6, 2025 •

edited

Loading


		InfixRuleOperators: precedences+=InfixRuleOperatorList ('>' precedences+=InfixRuleOperatorList)*;

		InfixRuleOperatorList: operators+=Keyword ('\|' operators+=Keyword)*;

Support infix operator rules #1836

Are you sure you want to change the base?

Support infix operator rules #1836

Conversation

msujew commented Mar 11, 2025

aabounegm commented Mar 11, 2025

msujew commented Mar 11, 2025

aabounegm commented Mar 11, 2025

msujew commented Mar 11, 2025

ydaveluy commented Mar 11, 2025

msujew commented Mar 11, 2025

szarnekow commented Mar 11, 2025

ydaveluy commented Mar 12, 2025

msujew commented Mar 12, 2025 • edited Loading

aabounegm commented Mar 12, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

spoenemann left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

spoenemann commented Mar 26, 2025

ydaveluy commented Apr 6, 2025

msujew commented Apr 6, 2025

ydaveluy commented Apr 6, 2025

msujew commented Apr 6, 2025 • edited Loading

msujew commented Mar 12, 2025 •

edited

Loading

msujew commented Apr 6, 2025 •

edited

Loading