Commit 2ba61cf

Merge #16 remove scanner, rewrite grammar.js
2 parents d1900d9 + 0f85e4d commit 2ba61cf

24 files changed: 5566 additions, 1715 deletions

README.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
1+
tree-sitter-vimdoc
2+
==================
3+
4+
This grammar intentionally supports a subset of the vimdoc "spec"; predictable
5+
results are the primary goal, so that _output_ formats (e.g. HTML) are
6+
well-formed; the _input_ (vimdoc) is secondary. The first step should always be
7+
to try to fix the input (within reason) rather than insist on a grammar that
8+
handles vimdoc's endless quirks.
9+
10+
Notes
11+
-----
12+
13+
- vimdoc format "spec":
14+
- [:help help-writing](https://neovim.io/doc/user/helphelp.html#help-writing)
15+
- https://github.com/nanotee/vimdoc-notes
16+
- whitespace is intentionally captured in `(word)`, because it is often necessary to be
17+
able to correctly lay out vim help files (especially old/legacy).
18+
- `(codeblock)` is contained by `(line)` because `>` can start a code block at the end of a line.
19+
- `(column_heading)` is contained by `(line)` because `<` (to close
20+
a `(codeblock)`) can appear at the start of `(column_heading)`.
21+
- `h1` ("Heading 1"): `======` followed by text and optional `*tags*`.
22+
- `h2` ("Heading 2"): `------` followed by text and optional `*tags*`.
23+
- `h3` ("Heading 3"): only UPPERCASE WORDS, followed by optional `*tags*`.
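The three heading rules in the notes above can be sketched as a small classifier. This is only an illustration under stated assumptions: the helper name `classify_heading`, the "three or more delimiter characters" threshold, and the regexes are hypothetical and are not taken from `grammar.js`.

```python
import re
from typing import Optional

# Assumed delimiter shapes: a run of "=" or "-" on the line before the heading.
H1_RULE = re.compile(r"^={3,}\s*$")
H2_RULE = re.compile(r"^-{3,}\s*$")
# Assumed tag shape: *tag* with no spaces or asterisks inside.
TAG = re.compile(r"\*[^*\s]+\*")


def classify_heading(prev_line: str, line: str) -> Optional[str]:
    """Return 'h1', 'h2', 'h3', or None for `line`, given the line before it."""
    text = TAG.sub("", line).strip()  # drop the optional *tags*
    if not text:
        return None  # a heading needs some text besides tags
    if H1_RULE.match(prev_line):
        return "h1"
    if H2_RULE.match(prev_line):
        return "h2"
    # h3: only UPPERCASE WORDS remain once the tags are removed.
    if re.fullmatch(r"[A-Z][A-Z0-9 \t-]*", text):
        return "h3"
    return None
```

For example, `classify_heading("======", "Intro *intro*")` returns `"h1"`, while an ordinary mixed-case line with no delimiter before it returns `None`.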
24+
25+
Known issues
26+
------------
27+
28+
- `line_li` ("list item") is _experimental_. It doesn't support nesting yet and
29+
it may not work well; you can treat it as a normal `line` for layout purposes.
30+
- `codeblock` ">" must not be preceded only by tabs; a space char is required (" >").
31+
See `:help lcs-tab` for example. Currently the grammar doesn't enforce this.
32+
- `codeblock` terminated by an "implicit stop" (i.e. no terminating `<`)
33+
consumes the first char of the terminating line, and continues the parent
34+
`block`, preventing top-level forms like `h1`, `h2` from being recognized
35+
until a blank line is encountered.
36+
- `line` in a `codeblock` does not contain `word` atoms; it's just the full
37+
raw text line including whitespace. This is somewhat dictated by its
38+
"preformatted" nature; parsing the contents would require loading a "child"
39+
language (injection). See [#2](https://github.com/vigoux/tree-sitter-vimdoc/issues/2).
40+
- `url` doesn't handle _surrounding_ parens. E.g. `(https://example.com/#yay)` yields `word`
41+
- `url` doesn't handle _nested_ parens. E.g. `(https://example.com/(foo)#yay)`
42+
- Ideally `block_end` should consume the last block of the document _only_ if that
43+
block is missing a trailing blank line or EOL ("\n").
44+
- TODO: consider simply _not supporting_ docs without EOL?
45+
- Ideally `line_noeol` should consume the last line of the document _only_ if
46+
that line is missing EOL ("\n").
47+
- TODO: consider simply _not supporting_ docs without EOL?
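The codeblock start and stop behavior described in the issues above can be illustrated with a line-based sketch. It assumes that a block opens on a line that is exactly `>` or ends in ` >` (space required), closes on a line starting with `<`, and closes implicitly on any line that starts with a non-whitespace character. The function name `extract_codeblocks` is hypothetical, and this is not how `grammar.js` implements it.

```python
def extract_codeblocks(lines):
    """Collect the raw lines of each code block (sketch, not the real grammar)."""
    blocks = []
    current = None  # lines of the open code block, or None when outside one
    for line in lines:
        if current is not None:
            if line.startswith("<"):
                # Explicit stop: "<" closes the block; the rest is normal text.
                blocks.append(current)
                current = None
                line = line[1:]
            elif line and not line[0].isspace():
                # Implicit stop: an unindented line ends the block.
                blocks.append(current)
                current = None
            else:
                # Indented or blank line: kept verbatim, whitespace included.
                current.append(line)
                continue
        # Outside a block: " >" at end of line (or a bare ">") opens one.
        # A tab before ">" does not count, matching the space requirement.
        if line == ">" or line.endswith(" >"):
            current = []
    if current is not None:
        blocks.append(current)  # block still open at end of input
    return blocks
```

Unlike the grammar's current behavior, this sketch does not consume the first character of the line that implicitly terminates a block; that line is handed back to normal processing in full.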
48+
49+
TODO
50+
----
51+
52+
- `line_noeol` is a special-case to support documents that don't end in EOL.
53+
Grammar could be a bit simpler if we just require EOL at end of document.
54+
- `line_modeline` (only at EOF)

corpus/arguments.txt

Lines changed: 38 additions & 16 deletions
@@ -1,31 +1,53 @@
11
================================================================================
2-
Simple argument
2+
simple argument
33
================================================================================
44
This in an argument: {arg}
55
--------------------------------------------------------------------------------
66

77
(help_file
8-
(line
9-
(word)
10-
(word)
11-
(word)
12-
(word)
13-
(argument
14-
(word))))
8+
(block
9+
(line
10+
(word)
11+
(word)
12+
(word)
13+
(word)
14+
(argument
15+
(word)))))
1516

1617
================================================================================
17-
Multiple arguments on the same line
18+
multiple arguments on the same line
1819
================================================================================
19-
2020
{foo} {bar} {baz}
2121

2222
--------------------------------------------------------------------------------
2323

2424
(help_file
25-
(line
26-
(argument
27-
(word))
28-
(argument
29-
(word))
30-
(argument
25+
(block
26+
(line
27+
(argument
28+
(word))
29+
(argument
30+
(word))
31+
(argument
32+
(word)))))
33+
34+
================================================================================
35+
NOT an argument
36+
================================================================================
37+
{foo "{bar}" `{baz}` |{baz| } {}
38+
39+
--------------------------------------------------------------------------------
40+
41+
(help_file
42+
(block
43+
(line
44+
(argument
45+
(word)
46+
(ERROR))
47+
(word)
48+
(codespan
49+
(word))
50+
(taglink
51+
(word))
52+
(word)
3153
(word))))
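The corpus file above follows the usual tree-sitter test layout: a `===` delimited test name, the input text, a `---` divider, then the expected tree. A minimal sketch of splitting such a file into cases is shown here. The name `parse_corpus` and the regexes are assumptions for illustration, and the sketch would misfire on inputs that themselves contain `===` or `---` lines.

```python
import re

# A header is: a line of "=", the test name, another line of "=".
HEADER = re.compile(r"^={3,}\n(.+)\n={3,}\n", re.MULTILINE)
# The divider between input text and expected tree is a line of "-".
DIVIDER = re.compile(r"^-{3,}\n", re.MULTILINE)


def parse_corpus(text):
    """Return a list of (name, source, expected_tree) tuples."""
    cases = []
    headers = list(HEADER.finditer(text))
    for i, header in enumerate(headers):
        # The body runs up to the next header, or to the end of the file.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.end():end]
        parts = DIVIDER.split(body, maxsplit=1)
        if len(parts) != 2:
            continue  # no divider found: skip the malformed case
        source, expected = parts
        cases.append((header.group(1).strip(), source, expected.strip()))
    return cases
```

Each tuple keeps the input text verbatim, including trailing newlines, since whitespace is significant for this grammar.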

corpus/backtick.txt

Lines changed: 0 additions & 45 deletions
This file was deleted.

corpus/code_block.txt

Lines changed: 0 additions & 96 deletions
This file was deleted.
