Adds better support for Typst #149

Open: 2ta00ha3 wants to merge 13 commits into base branch default

Conversation

2ta00ha3 commented Mar 27, 2025

This PR adds basic support for Typst. It's pretty basic for now and still has many areas to improve. Since I'm still discovering LPeg, I'd love to hear any thoughts or suggestions you have!
Typst Reference

Preview from vis editor: [image]

2ta00ha3 marked this pull request as draft on March 27, 2025 11:40

orbitalquark (Owner) commented

Thanks for starting on this! When you think it's ready, request a review and I'll have a look. Also, please give a link to Typst documentation.

2ta00ha3 (Author) commented Mar 27, 2025

Thanks, will do.

I know this is off-topic, but I have a question about one of my rules:

local method = lex:tag("UNKNOWN", '#' * lexer.word * '.') * (lex:tag(lexer.FUNCTION_METHOD, lexer.word) * P('('))
lex:add_rule('func_method', method)

This is supposed to match the following expression:
#dict.len()
and it does, but I don't want to tag the #dict. as anything for now (not even UNKNOWN). I only want to use it to match the whole expression and tag just len, similar to the non-input-consuming (lookahead) operators in LPeg. When I try to use those, however, the expression no longer matches.

2ta00ha3 (Author) commented Mar 28, 2025

This is what I have achieved so far:

[image: 2025-03-28_17-27-55_focused_window]

Some things are still not working correctly because I don't know enough LPeg yet to get them to work (for example embedded parsing: notice that the if in the #for block, in the right pane, is not tagged as a keyword).

Typst's syntax crate (includes lexer, parser...)

orbitalquark (Owner) commented

In reply to 2ta00ha3's question above about matching #dict.len() while tagging only len:

You always have to tag matched text with something in order to move on and tag stuff that follows. You can use the lexer.DEFAULT tag for the stuff you don't care about.
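
A minimal sketch of that suggestion, assuming the file is loaded by Scintillua (which supplies the `lexer` and `lpeg` globals). The rule name `func_method` comes from the question above. Splitting the pattern into `receiver` and `name`, and swapping the consumed `P('(')` for a lookahead, are illustrative choices rather than anything taken from the PR:

```lua
-- Loaded by Scintillua, which provides the `lexer` and `lpeg` globals.
local lexer = lexer
local P = lpeg.P

local lex = lexer.new(...)

-- '#dict.' is consumed under the neutral DEFAULT tag, so matching can move on.
local receiver = lex:tag(lexer.DEFAULT, '#' * lexer.word * '.')
-- Only the method name itself gets the method tag.
local name = lex:tag(lexer.FUNCTION_METHOD, lexer.word)
-- #P('(') is an LPeg lookahead: it requires '(' without consuming it,
-- leaving the parenthesis for a later rule (assumed here) to tag.
lex:add_rule('func_method', receiver * name * #P('('))

return lex
```

With this shape, the lookahead only guards the match. Everything the rule actually consumes carries a tag, which is the constraint described above.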

2ta00ha3 marked this pull request as ready for review on March 31, 2025 23:05

2ta00ha3 (Author) commented Apr 9, 2025

@orbitalquark, I think the work is ready for review. I added comments in the code about things I wasn't sure how to do, so I'd love your feedback on those, along with any other suggestions you have.

orbitalquark (Owner) left a comment

Thanks for your work on this! It looks really promising. As I've noted in a comment below, you've partially solved the "lexer embedded in itself" problem, and I am eager to iterate on it. In the meantime, I've left some comments and suggestions.

This is not a thorough review, so I may have more to say in a subsequent look-over, but it's a start. Thanks again!

EDIT: I realize I haven't addressed any of your inline code comments. Sorry about that. I will do so in a subsequent review. Hopefully there's enough for you to work on until then.

@@ -0,0 +1,125 @@
local lexer = require('lexer')
orbitalquark (Owner):

Modern lexers use local lexer = lexer now.

@@ -0,0 +1,125 @@
local lexer = require('lexer')
local token = lexer.token
orbitalquark (Owner):

Modern lexers don't need this anymore.
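
For reference, a sketch of what the two notes above amount to in a lexer file's preamble, assuming a current Scintillua that injects `lexer` and `lpeg` as globals. The `comment` rule is a hypothetical placeholder used only to show tagging through the lexer object:

```lua
-- Use the `lexer` global supplied by Scintillua instead of require('lexer').
local lexer = lexer
local P, S = lpeg.P, lpeg.S

local lex = lexer.new(...)

-- No local `token` alias: tags are created through the lexer object.
-- Hypothetical rule: treat '//' to end of line as a comment.
lex:add_rule('comment', lex:tag(lexer.COMMENT, lexer.to_eol('//')))

return lex
```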

Comment on lines +80 to +81
local embed_start = lex:tag('emb_tag', start)
local embed_end = lexer:tag('emb_tag', S('}'))
orbitalquark (Owner):

I would suggest using lexer.EMBEDDED tag names so they are styled correctly.
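
A sketch of the suggested change, assuming `lex` is the lexer object returned by `lexer.new(...)`. The `start` pattern below is a hypothetical stand-in, since the PR's real definition is not shown in this excerpt:

```lua
-- Assumes Scintillua provides `lexer` and `lpeg`, and `lex` already exists.
local P = lpeg.P

-- Hypothetical opening delimiter; the PR defines its own `start` elsewhere.
local start = P('#') * P('{')

-- Both delimiters use the predefined EMBEDDED tag so themes can style them.
local embed_start = lex:tag(lexer.EMBEDDED, start)
local embed_end = lex:tag(lexer.EMBEDDED, P('}'))
```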

}
end

local emb_lex = lexer.new('scripting')
orbitalquark (Owner):

I would use a more descriptive name like 'typst_scripting'.

Comment on lines +83 to +100
local function add_rules(lexer_obj, pre)
  local rules = build_rules(pre)
  lexer_obj:add_rule('header', rules.header)
  lexer_obj:add_rule('field', rules.field)
  lexer_obj:add_rule('function', rules.mod_func + rules.func)
  lexer_obj:add_rule('method', rules.method)
  lexer_obj:add_rule('label', rules.label + rules.label_two)
  lexer_obj:add_rule('code', lex:tag(lexer.CODE, rules.code))
  lexer_obj:add_rule('string', lex:tag(lexer.STRING, rules.string))
  lexer_obj:add_rule('link', lex:tag(lexer.LINK, rules.link))
  lexer_obj:add_rule('math', lex:tag('environment.math', rules.math))
  lexer_obj:add_rule('keyword', rules.keyword)
  lexer_obj:add_rule('identifier', rules.iden)
  --lexer_obj:add_rule('number', lex:tag(lexer.NUMBER, rules.numeric_value))
  lexer_obj:add_rule('list', rules.list)
  lexer_obj:add_rule('comment', rules.comment)
  lexer_obj:add_rule('operator', rules.operator)
end
orbitalquark (Owner):

I feel like we can bring the contents of build_rules() into this function. When trying to read this function, I find myself constantly scrolling up to see how each pattern is defined. Making changes would be a bit difficult.

Comment on lines +10 to +11
lex:add_rule('bold', bold)
lex:add_rule('italic', italic)
orbitalquark (Owner):

Is there a reason to keep these out of build_rules()?

2ta00ha3 (Author):

I've tried to keep build_rules generic, so it only holds the rules shared by scripting mode and text mode. Bold/italic must only be applied in text mode. For example:

#let countwords(s) = {
  if s == "" {
    return 0 * 0 * 5
  }
}

In the above example, if we kept bold/italic in build_rules, the lexer would tag the 0 in 0 * 0 * 5 as bold, which is not correct. In scripting mode, plain text must be in brackets, so italic/bold should only be applied inside them. For example:

#let somefn(word) = {
   let text = [This is a *plain* text];
}

Comment on lines +105 to +109
add_rules(emb_lex, '')

lex:embed(emb_lex, embed_start, embed_end)

add_rules(lex, '#')
orbitalquark (Owner):

Okay, I'm completely blown away. You've partially solved the "language embedded in itself" problem. I would really like to use something like

local self = lexer.load('typst', 'typst_scripting')
lex:embed(self, start_rule, end_rule)

You cannot do this now because of an endless loop, but I would totally work on a fix to try and make this viable.



lex:set_word_list(lexer.KEYWORD, {
'if', 'else', 'for', 'while', 'let', 'set', 'import', 'include', 'return',
orbitalquark (Owner), Apr 9, 2025:

It's a bit of a nitpick for now, but in my brief testing,

#while n < 10 {
  n = (n * 2) - 1
  (n,)
}

the #while was not highlighted as a keyword.

I just thought I'd make a note.

EDIT: I saw you already noted something similar in an earlier comment, so this is just another instance to report.

orbitalquark (Owner) commented Apr 9, 2025

Sorry to commit directly to your PR. I was trying to figure out how to show a proof-of-concept for how we can embed this lexer within itself: 1a3f6f9. I thought I'd be on my own branch or something, but I guess not. I'm not very well versed with how git and GitHub work together.

Sorry about all the formatting changes. I forgot to turn off my autoformatter before I started making changes. Hopefully you get the idea.

2ta00ha3 (Author) commented Apr 9, 2025

Thanks so much for the notes. I'll try to apply the required changes. As for the embedding mode, I'll look into it more later and try to find a fix for the unbalanced brackets/curly braces and the recursive tagging in embedding mode.

P.S.: The docs are very helpful and straightforward, probably some of the best I've come across in a long time, since the BSDs' man pages. Usually I'd spam some LLM with the man page or the docs and ask it for what I want (I'm looking at you, GNU man pages), but the Scintillua API docs had exactly what I needed.

2 participants