Adds better support for Typst #149
Thanks for starting on this! When you think it's ready, request a review and I'll have a look. Also, please give a link to Typst documentation.
Thanks, will surely do. I know this is off-topic, but I have a question about a rule I have:
local method = lex:tag("UNKNOWN", '#' * lexer.word * '.') * (lex:tag(lexer.FUNCTION_METHOD, lexer.word) * P('('))
lex:add_rule('func_method', method)
This is supposed to match the following expression:
This is what I have achieved so far. Some things are still not working correctly, as I don't know enough LPeg to get them to work (like embedded parsing: notice the
You always have to tag matched text with something in order to move on and tag stuff that follows. You can use the
this restricts the styling (bold/italic) to only be in text mode, also adds 'link' rule
@orbitalquark, I think the work is ready for a review. I added comments in the code about things I wasn't sure how to do, so I'd love your feedback on those and any other suggestions you have.
Thanks for your work on this! It looks really promising. As I've noted in a comment below, you've partially solved the "lexer embedded in itself" problem, and I am eager to iterate on it. In the meantime, I've left some comments and suggestions.
This is not a thorough review, so I may have more to say in a subsequent look-over, but it's a start. Thanks again!
EDIT: I realize I haven't addressed any of your inline code comments. Sorry about that. I will do so in a subsequent review. Hopefully there's enough for you to work on until then.
@@ -0,0 +1,125 @@
local lexer = require('lexer')
Modern lexers use local lexer = lexer now.
@@ -0,0 +1,125 @@
local lexer = require('lexer')
local token = lexer.token
Modern lexers don't need this anymore.
local embed_start = lex:tag('emb_tag', start)
local embed_end = lexer:tag('emb_tag', S('}'))
I would suggest using lexer.EMBEDDED tag names so they are styled correctly.
}
end

local emb_lex = lexer.new('scripting')
I would use a more descriptive name like 'typst_scripting'.
local function add_rules(lexer_obj, pre)
  local rules = build_rules(pre)
  lexer_obj:add_rule('header', rules.header)
  lexer_obj:add_rule('field', rules.field)
  lexer_obj:add_rule('function', rules.mod_func + rules.func)
  lexer_obj:add_rule('method', rules.method)
  lexer_obj:add_rule('label', rules.label + rules.label_two)
  lexer_obj:add_rule('code', lex:tag(lexer.CODE, rules.code))
  lexer_obj:add_rule('string', lex:tag(lexer.STRING, rules.string))
  lexer_obj:add_rule('link', lex:tag(lexer.LINK, rules.link))
  lexer_obj:add_rule('math', lex:tag('environment.math', rules.math))
  lexer_obj:add_rule('keyword', rules.keyword)
  lexer_obj:add_rule('identifier', rules.iden)
  --lexer_obj:add_rule('number', lex:tag(lexer.NUMBER, rules.numeric_value))
  lexer_obj:add_rule('list', rules.list)
  lexer_obj:add_rule('comment', rules.comment)
  lexer_obj:add_rule('operator', rules.operator)
end
I feel like we can bring the contents of build_rules() into this function. When trying to read this function, I find myself constantly scrolling up to see how each pattern is defined. Making changes would be a bit difficult.
lex:add_rule('bold', bold)
lex:add_rule('italic', italic)
Is there a reason to keep these out of build_rules()?
I've tried to keep build_rules generic, so it only holds the rules shared by both scripting and text mode. Bold/italic, on the other hand, must only be applied in text mode. As an example:
#let countwords(s) = {
if s == "" {
return 0 * 0 * 5
}
}
In the above example, if we kept bold/italic in build_rules, it would tag the 0 in 0 * 0 * 5 as bold, which is not correct. In scripting mode, plain text must be in brackets, so italic/bold must only be applied in that case. As an example:
#let somefn(word) = {
let text = [This is a *plain* text];
}
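To make the point above concrete, here is a rough sketch in plain Lua. It does not use the PR's actual LPeg rule (which is not shown here); find_bold is a hypothetical stand-in that uses a Lua string pattern to capture text between two asterisks, as a naive bold rule might.

```lua
-- Hypothetical stand-in for a naive bold rule: text between two asterisks.
-- This uses plain Lua string patterns, not the PR's actual LPeg pattern.
local function find_bold(s)
  return s:match("%*([^*]+)%*")
end

-- Scripting mode: the asterisks are multiplication operators, yet the
-- naive rule still captures the text between the first two of them.
print(find_bold("return 0 * 0 * 5"))

-- Text mode (content inside brackets): the only case where the match
-- really is bold markup.
print(find_bold("let text = [This is a *plain* text];"))
```

The first call captures the middle operand and its surrounding spaces, which is the mis-tagging described above; the second captures plain, which is the intended behaviour. That is why the rule is only registered on the text-mode lexer.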
add_rules(emb_lex, '')

lex:embed(emb_lex, embed_start, embed_end)

add_rules(lex, '#')
Okay, I'm completely blown away. You've partially solved the "language embedded in itself" problem. I would really like to use something like
local self = lexer.load('typst', 'typst_scripting')
lex:embed(self, start_rule, end_rule)
You cannot do this now because of an endless loop, but I would totally work on a fix to try and make this viable.
lex:set_word_list(lexer.KEYWORD, {
  'if', 'else', 'for', 'while', 'let', 'set', 'import', 'include', 'return',
It's a bit of a nitpick for now, but in my brief testing,
#while n < 10 {
n = (n * 2) - 1
(n,)
}
the #while was not highlighting as a keyword.
I just thought I'd make a note.
EDIT: I saw you already noted something similar in an earlier comment, so this is just another instance to report.
Sorry to commit directly to your PR. I was trying to figure out how to show a proof-of-concept for how we can embed this lexer within itself: 1a3f6f9. I thought I'd be on my own branch or something, but I guess not. I'm not very well versed with how git and GitHub work together. Sorry about all the formatting changes. I forgot to turn off my autoformatter before I started making changes. Hopefully you get the idea.
Thanks so much for the notes, I'll try to apply the required changes. As for the embedding mode, I'll look more into it later and try to find a fix for the unbalanced brackets/curly braces and the recursive tagging in embedding mode.
P.S. The docs are very helpful and straightforward, probably some of the best docs I've come across in a long time since the BSDs' manpages. Usually I'd spam some LLM with the man page or the docs and ask it what I want (I'm looking at you, GNU manpages), but the Scintillua API docs had exactly what I needed.
This PR adds basic support for Typst. It's pretty basic for now and still has many areas to improve, since I'm still discovering LPeg. I'd love to hear any thoughts or suggestions you have!
Typst Reference
Preview from vis editor