-
Notifications
You must be signed in to change notification settings - Fork 77
Support infix operator rules #1836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Amazing effort on a long-requested feature! I just have a question: how would this work if I want to create a different AST node type for each operator (like here)? Would something like this work?
|
@aabounegm Good question: Splitting the infix operator rule into multiple rules would result in decreased performance - we get a boost by reducing the amount of parser rules (and therefore subrule calls) we need to create. Instead, I would propose something like this perhaps?
I think that should work pretty well. However, I would attempt to implement such a feature in a separate step/PR. This PR is complex enough as it is. |
In my grammar (linked above), I already have multiple rules since the language is for educational purposes. Do you mean the performance will be even worse or the same as the way it is currently defined in my grammar? |
It will be the same (maybe very slightly worse, due to the additional transformation step done to convert the flat list into an expression tree), but you won't gain anything from it. This PR allows to eliminate multiple levels of precedence in order to massively boost performance when parsing expressions. |
How does it works when types are not infered (using pre-defined interfaces)?
|
@ydaveluy Currently not supported, but I already plan to address this in a separate PR. |
can you provide some numbers? |
@msujew The infix rule is very concise but contrary to all other rules it is not possible to infer the names of the attributes for the left operand, feature and right operand. If a future PR provides a syntax using pre-defined types it will mean maybe syntax very different. Maybe a common syntax handling both cases would be more approriate, something like this for inferred types:
and this for pre-defined types:
I don't know if it is technically possible. |
@szarnekow Sure, I just ran the branch through ~4100 lines of the arithmetics example code (parser only). On import { describe, expect, test } from 'vitest';
import { createArithmeticsServices } from '../src/language-server/arithmetics-module.js';
import { EmptyFileSystem } from 'langium';
describe('Test the arithmetics parsing', () => {
const services = createArithmeticsServices(EmptyFileSystem);
const parser = services.arithmetics.parser.LangiumParser;
test('Simple expression', () => {
let text = '1 + 2 ^ 3 * 4 - 5 / 6;';
for (let i = 0; i < 12; i++) {
text = text + text;
}
console.time('parse');
const parseResult = parser.parse('module test ' + text);
console.timeEnd('parse');
expect(parseResult.parserErrors).toHaveLength(0);
});
});
@ydaveluy I'm not sure either. Your proposal still includes building a tree-like structure for infix operators - however, that's exactly what we want to prevent. Tranforming that into the expected flat structure we need for parsing is maybe possible, although you have so many degrees of freedom that it likely just becomes a bottomless hole. Instead I thought that something like this could work:
Or alternatively for inferred types:
|
@ydaveluy's suggestion had me wondering, isn't the new |
|
||
Modulo infers Expression: | ||
PrimaryExpression ({infer BinaryExpression.left=current} operator='%' right=PrimaryExpression)*; | ||
infix BinaryExpression on PrimaryExpression: '%' > '^' > '*' | '/' > '+' | '-'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally the precedence of > is greater than the one of |. It reads a bit unnatural.
Adding parentheses could be an option to enumerate all alternatives.
Or maybe a equal sign instead of the |.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And why not using the comma operator instead of '>' ?
'%' , '^' , '*' | '/' , '+' | '-';
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
‚>‘ implies an order between the operators, that makes totally sense to me. Having a comma, you do not know what comes first: low or high precedence. But you can assume some convention, true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can mitigate this by writing each group in its own line:
infix BinaryExpression on PrimaryExpression:
'%'
> '^'
> '*' | '/'
> '+' | '-';
This is how I'd format such expressions for readability.
@@ -63,7 +71,7 @@ export interface BaseParser { | |||
/** | |||
* Adds a new parser rule to the parser | |||
*/ | |||
rule(rule: ParserRule, impl: RuleImpl): RuleResult; | |||
rule(rule: ParserRule | InfixRule, impl: RuleImpl): RuleResult; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ParserRule | InfixRule occur together quite often. Maybe add a common type and use that one instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would also help for isParserRule(rule) || isInfixRule(rule)
, which I've seen at least twice.
ea74ce1
to
ca70591
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great feature! I have a few conceptual proposals.
|
||
InfixRuleOperators: precedences+=InfixRuleOperatorList ('>' precedences+=InfixRuleOperatorList)*; | ||
|
||
InfixRuleOperatorList: operators+=Keyword ('|' operators+=Keyword)*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the associativity is left-to-right?
Shall we add an option to change that?
InfixRuleOperatorList:
(associativity=Associativity 'assoc')?
operators+=Keyword ('|' operators+=Keyword)*;
Associativity returns string: 'left' | 'right';
If omitted, the default should be left
. With this, you can realize assignments as operators:
infix BinaryExpression on PrimaryExpression:
'*' | '/'
> '+' | '-'
> right assoc '=' | '*=' | '/=' | '+=' | '-=';
Maybe also allow no assoc
to forbid more than two operands on the same level?
@@ -74,6 +74,12 @@ ParserRule: | |||
(definesHiddenTokens?='hidden' '(' (hiddenTokens+=[AbstractRule:ID] (',' hiddenTokens+=[AbstractRule:ID])*)? ')')? ':' | |||
definition=Alternatives ';'; | |||
|
|||
InfixRule: 'infix' RuleNameAndParams 'on' call=RuleCall ':' operators=InfixRuleOperators ';'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should allow infers
and returns
as usual. If a declared type is given, it must have the properties left
, right
, and operator
with respective types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah 👉 #1836 (comment) – 👍 for a separate PR, but let's not forget that.
@@ -16,7 +16,7 @@ import { inRange } from './cst-utils.js'; | |||
* Link the `$container` and other related properties of every AST node that is directly contained | |||
* in the given `node`. | |||
*/ | |||
export function linkContentToContainer(node: AstNode): void { | |||
export function linkContentToContainer(node: AstNode, deep = false): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd try to avoid boolean parameters and use an options
object instead.
To this discussion: #1836 (comment) (too bad GitHub doesn't have threads 😞) I actually see value in standardizing the structure of BinaryOperator nodes. That can be very useful for other features that we might add later. So I vote for not giving the option to change the property names. If you already have custom code relying on different property names, I'd use TypeScript rename refactoring to change it. |
If I understand correctly, this PR involves a post-parsing refactoring for binary expressions. I wonder if this case could be handled with a more generic approach: Write the BinaryExpression rule without precedence: Provide an API to register refactoring actions for the parser (similar to what is done for validation):
Provide a generic action to restructure the AST for binary expressions with operator precedence. This would be easily customizable by the user and could allow other types of refactoring. |
Right, but that's not the only advantage. I mainly created this change to speed up expression parsing performance (between 40-50% in my tests). This is done by converting the infix parser rule into a flat list of expressions and then constructing a parse tree. Your proposed change still constructs a parse tree, which is then getting rewritten. This would unfortunately eliminate the performance benefits of this PR.
Note that this syntax is left recursive, and we don't support left recursion (due to using an LL parser). |
Sorry the syntax was wrong...
|
Hm, I quite like the idea. I'll iterate on this during the course of the week :) Edit: Thinking further about this, it becomes rather difficult to use this syntax to use the tree rewriting, mainly because it becomes impossible to decide which expressions are part of the "current" list of expressions, and which are nested (via usage of parenthesis, or similar). |
Closes #1065
Adds support for a special infix operator syntax that aims to improve Langium on multiple fronts:
Note that the same AST node is constructed as if you were to write the parser rules manually. Also note that this PR assumes that all binary operators are left-associative (i.e.
A + B + C
is parsed as(A + B) + C
). We can revisit this decision in case we find a case for right-associative grammar rules. Note that this only requires changes in the tree restructuring routine.