@ in URLs is interpreted as a citation #10570

Open
InAnYan opened this issue Jan 27, 2025 · 3 comments
InAnYan commented Jan 27, 2025

Explain the problem.
Example: https://pandoc.org/try/?params=%7B%22text%22%3A%22https%3A%2F%2Fmedium.com%2F%40mirzasamaddanat%2Fbuild-your-own-large-language-model-llm-from-scratch-using-pytorch-0614f2ab3051%22%2C%22to%22%3A%22html5%22%2C%22from%22%3A%22markdown%22%2C%22standalone%22%3Afalse%2C%22embed-resources%22%3Afalse%2C%22table-of-contents%22%3Afalse%2C%22number-sections%22%3Afalse%2C%22citeproc%22%3Afalse%2C%22html-math-method%22%3A%22plain%22%2C%22wrap%22%3A%22auto%22%2C%22highlight-style%22%3Anull%2C%22files%22%3A%7B%7D%2C%22template%22%3Anull%7D.

The URL I provided contains an @ symbol, which pandoc interprets as a citation. I guess this behavior is general, as it also shows up in other conversions (for example, Markdown to org-mode).

BTW, if I surround the URL with <>, there is no issue: https://pandoc.org/try/?params=%7B%22text%22%3A%22%3Chttps%3A%2F%2Fmedium.com%2F%40mirzasamaddanat%2Fbuild-your-own-large-language-model-llm-from-scratch-using-pytorch-0614f2ab3051%3E%22%2C%22to%22%3A%22html5%22%2C%22from%22%3A%22markdown%22%2C%22standalone%22%3Afalse%2C%22embed-resources%22%3Afalse%2C%22table-of-contents%22%3Afalse%2C%22number-sections%22%3Afalse%2C%22citeproc%22%3Afalse%2C%22html-math-method%22%3A%22plain%22%2C%22wrap%22%3A%22auto%22%2C%22highlight-style%22%3Anull%2C%22files%22%3A%7B%7D%2C%22template%22%3Anull%7D. However, this is not practical for me, as I have a bunch of URLs without those brackets.
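To illustrate why the bare URL trips up parsing while the bracketed one does not, here is a minimal Python sketch. The citation-key rule is a simplified assumption loosely modelled on the pandoc manual's description of citation keys, not pandoc's real parser, and `citation_keys` is a hypothetical helper. The idea: an angle-bracket autolink is consumed as one unit before anything else looks at it, whereas a bare URL is just ordinary text, so the `@` inside it starts a key.

```python
import re

# Simplified citation-key rule (an assumption, not pandoc's actual grammar):
# "@", then a letter, digit or "_", then alphanumerics/"_" or internal
# punctuation that is immediately followed by another alphanumeric/"_".
CITE_KEY = re.compile(r"@([A-Za-z0-9_](?:[A-Za-z0-9_]|[:.#$%&+?<>~/-](?=[A-Za-z0-9_]))*)")

# Angle-bracket autolinks such as <https://...> (scheme, ":", non-space chars).
AUTOLINK = re.compile(r"<[A-Za-z][A-Za-z0-9+.-]*:[^\s<>]*>")


def citation_keys(text: str) -> list[str]:
    """Return the keys this toy scanner would treat as citations."""
    # Autolinks are removed first, so an "@" inside them is never scanned.
    without_autolinks = AUTOLINK.sub(" ", text)
    # A bare URL is left in place, so its "@" can still start a key.
    return CITE_KEY.findall(without_autolinks)
```

With this rule, the bare Medium URL yields one key beginning with `mirzasamaddanat`, while the same URL wrapped in `<...>` yields an empty list.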

Pandoc version?
3.6.2 (online), and 3.1.11.1 locally on Arch Linux.

InAnYan added the bug label Jan 27, 2025
InAnYan commented Jan 27, 2025

Sorry, I guess I overstated that a bit. Here are some observations (converting from Markdown; bad means the output contains a citation, good means it does not):

  • html: bad.
  • html4: bad.
  • html5: bad.
  • asciidoc: good.
  • docbook: good.
  • dokuwiki: good.
  • latex: good.
  • man: good.
  • native: bad (I guess this is the internal representation, right?).
  • org: bad.
  • typst: bad.

I probably didn't need to test them all, since native already shows that there is indeed a citation in the parsed document.

jgm commented Jan 29, 2025

pandoc -f markdown+autolink_bare_uris will parse the whole thing as a URI and make it a hyperlink.

We could include the link parsing logic by default (and just not add the link when autolink_bare_uris is disabled). That would provide better results in this case, ensuring that the whole URI is parsed as a unit. There would be some performance cost which we could measure.
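The proposal above (always parse bare URIs as a unit, but only produce a link when autolink_bare_uris is enabled) can be sketched as a toy inline tokenizer in Python. Everything here is hypothetical: the regexes, the `tokenize` function and the "Str"/"Link"/"Cite" token names are illustrative stand-ins, not pandoc's `bareURL` or its AST.

```python
import re

# Hypothetical, heavily simplified patterns (not pandoc's real grammar).
BARE_URI = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s<>]+")
CITE_KEY = re.compile(r"@([A-Za-z0-9_](?:[A-Za-z0-9_]|[:.#$%&+?<>~/-](?=[A-Za-z0-9_]))*)")


def tokenize(text: str, autolink_bare_uris: bool) -> list[tuple[str, str]]:
    """Split text into ("Str" | "Link" | "Cite", value) tokens.

    Bare URIs are always recognised first and consumed whole, so an "@"
    inside one can never start a citation. The extension flag only decides
    whether the URI becomes a link or stays plain text.
    """
    tokens: list[tuple[str, str]] = []
    plain: list[str] = []  # characters of the current run of ordinary text

    def flush() -> None:
        # Emit any pending ordinary text as a single Str token.
        if plain:
            tokens.append(("Str", "".join(plain)))
            plain.clear()

    i = 0
    while i < len(text):
        uri = BARE_URI.match(text, i)
        if uri:  # URI parsing runs regardless of the extension
            flush()
            kind = "Link" if autolink_bare_uris else "Str"
            tokens.append((kind, uri.group(0)))
            i = uri.end()
            continue
        cite = CITE_KEY.match(text, i)
        if cite:  # only reached when no URI starts at this position
            flush()
            tokens.append(("Cite", cite.group(1)))
            i = cite.end()
            continue
        plain.append(text[i])
        i += 1
    flush()
    return tokens
```

Because every position tries the URI pattern before the citation pattern, this scan does extra work on ordinary text; in the sketch, that is where a performance cost like the one mentioned above would show up.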

jgm commented Jan 29, 2025

The function bareURL in T.P.Readers.Markdown.hs (Text.Pandoc.Readers.Markdown) is what would need modifying.
