
Support block-specific style overrides #13611

Open: wants to merge 33 commits into base: master

hmedina commented Jun 2, 2025

Purpose

Allow local overrides of the Pygments style, on a per-code-block basis.

This work extends the directives code-block, sourcecode, literalinclude and code by introducing two new options, style-light and style-dark. As the app parses the document and discovers these options, the builders collect the information. This allows Pygments to produce CSS or STY files (for HTML and LaTeX, respectively) for the relevant classes. Generation of these files is moved from the initializing phase of the builder to the finalizing phase.

Special attention was paid to light and dark modes. For most builders (e.g. LaTeX), this distinction is ignored, and only the light style has any effect; this mirrors how the configuration value pygments_dark_style currently has no effect on such builders. For the HTML family of builders, if the theme supports a dark style, then the pygments_dark.css file gains the appropriate selectors, independently of what happens to the pygments.css file for light styles. Since, in general, a user need not track the overrides for the light style in the same fashion as for the dark style, the selectors are generated by hashing the docutils node. In a CPython implementation, this is effectively a pointer; as the docutils hierarchy is a tree, a pointer to the code-block appears sufficient for these purposes, as collisions should not happen. That said, I'm open to suggestions. The single-style builders, like LaTeX, use simpler tracking for their command prefixes (which play a role in the STY files similar to that of CSS selectors).
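As a rough illustration of the hash-based selector scheme described above (a sketch only, not the PR's code): `FakeNode` and `selector_for` are hypothetical names, with `FakeNode` standing in for a docutils node that keeps Python's default identity-based hash.

```python
class FakeNode:
    """Hypothetical stand-in for a docutils literal_block node."""


def selector_for(node: object) -> str:
    # With the default object hash in CPython, the value is derived from
    # the object's address, so two live nodes get different selectors.
    return '.c{}'.format(hash(node))


first, second = FakeNode(), FakeNode()
first_selector = selector_for(first)
second_selector = selector_for(second)
```

The selector is only meaningful within a single build process, which is presumably enough when the HTML and the CSS are written in the same run.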

The files have been linted with Ruff.

The new options are tested in the test suite. I'm unsure how to incorporate more advanced testing for other parts, so I welcome feedback (should I add tests to the builders?).

References

hmedina added 21 commits May 10, 2025 04:24
* Code highlight directives accept two new options, style-light and style-dark; these override the default Pygments style with a block-specific one, but only for the block in question
* Classes amended were CodeBlock, LiteralInclude, and Code; this means directives code-block, sourcecode, literalinclude, and code
* A theme may support light and/or dark styles; as already done, light and dark styles must be tracked separately. For this overhaul, a data structure is created inside the HtmlBuilder; this tracks the relevant styles, their associated PygmentsBridge highlighter, and the code blocks they're associated with.
* Association with code blocks is handled by using the docutils node's hash; this is used as a CSS selector.
* As the Html5Translator visits elements of the docutils tree, it discovers their light and dark styles, if any. It populates the HtmlBuilder's data structure.
* To finalize, the CSS sheets are created. Since the Pygments style is now a "generated" file rather than a static one (it depends on which style overrides the document contains), the function call is moved down: the HtmlBuilder's create_pygments_style_file is moved from copy_static_files() into a task under the finish() method.
* CSS selectors are grouped to supply different code blocks with the same override style, if the user so chooses (Pygments handles this internally if presented the selectors). This minimizes the length of the CSS files. They're also annotated with a comment for their override style.
* There is a CSS media feature that is set by the user agent: https://developer.mozilla.org/en-US/docs/Web/CSS/@media/prefers-color-scheme ; we add those specifiers to the HTML builder's light and dark Pygments style sheet writers
* Use media queries and scopes to include light and dark styles in a single CSS file
* `app.registry.css_files` apparently only tracked "extra" files, but not the default `pygments.css` file, so the `test_theming.test_dark_style` block removed that check
* Added a `stylename` attribute to `PygmentsBridge` objects, so they can track the "pretty" style name used throughout Pygments, rather than relying on the classname of the associated style, which is not necessarily what is declared in the Pygments style entry-point installation
* `python_docs_theme` uses separate CSS files for light and dark modes, with a simple javascript function to toggle what is shown. It doesn't decorate either CSS file with a `@media` query. This seems advantageous, allowing the user to select their preferred view, and aligns with the philosophy that "the theme is not part of the document, but just a view of it" (paraphrased from MDN)
* bugfix: specialized dark iterator used wrong value for CSS sheet building
* As for the HTML builder, the building of the style sheet is moved from the `prepare_writing` method to the `finish` one; overriding styles are discovered on-the-fly, so the `.sty` file can only be finalized after the document's nodes have been visited
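The selector grouping and the `prefers-color-scheme` wrapping mentioned in the commit notes above could be sketched roughly as follows. This is a minimal illustration under assumptions: `group_selectors`, `wrap_for_scheme`, and the `.c101`-style selectors are hypothetical, and the real token rules would come from Pygments rather than from the placeholder comment used here.

```python
from collections import defaultdict


def group_selectors(block_styles: dict[str, str]) -> dict[str, list[str]]:
    """Map each override style name to the selectors of the blocks using it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for selector, style_name in block_styles.items():
        grouped[style_name].append(selector)
    return dict(grouped)


def wrap_for_scheme(css: str, scheme: str) -> str:
    """Wrap CSS rules in a prefers-color-scheme media query."""
    indented = '\n'.join('  ' + line for line in css.splitlines())
    return '@media (prefers-color-scheme: {}) {{\n{}\n}}'.format(scheme, indented)


# Hypothetical blocks: two use 'nord', one uses 'xcode'.
blocks = {'.c101': 'nord', '.c102': 'xcode', '.c103': 'nord'}
grouped = group_selectors(blocks)

rules = []
for style_name, selectors in grouped.items():
    # One annotating comment per style, one rule shared by all its blocks.
    rules.append('/* override style: {} */'.format(style_name))
    rules.append('{} {{ /* token rules from Pygments go here */ }}'.format(
        ', '.join(selectors)))
dark_css = wrap_for_scheme('\n'.join(rules), 'dark')
```

Grouping by style keeps the sheet short when many blocks share the same override, which is the motivation given in the notes.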
Author

hmedina commented Jun 3, 2025

The issue with the mypy test concerns the usage in sphinx\highlighting.py l.236, where I pass a list of strings to the Pygments function.

This is a valid use of said function (html.py#L516), so I am unsure how to tell mypy about this; it appears to be confused and checks against the base class in formatter.py, which uses arg='', instead of the HTML formatter, which uses arg=None.

Author

hmedina commented Jun 11, 2025

All the tests are passing, the documentation is updated, and there are no visible issues left, so I don't have any more work planned on this. If something is missing, or there are concerns, I'm happy to address them; any feedback would be welcome. As I mention in the Discussion linked in the first post, I have a project that requires this functionality.

jfbu

This comment was marked as resolved.

jfbu

This comment was marked as resolved.

hmedina added 2 commits June 14, 2025 04:30
* Whether a character is in the correct category for the definition of a LaTeX macro can, in the general sense, only be determined at compile time. As a fall-back, "valid" macro names generally use the a-zA-Z range; so we use this to replace any character not in those ranges with an uppercase Z using regular expressions.
* Since the user-facing names come from the SetupTools installation entry-points, they need not match the `name` attribute in the associated classes (sigh). Added to this, something needs to be printed to the LaTeX file. So, on the Python side and any user-facing usage retain the user-given name (this also helps avoid collisions, as Z-replacement loses information); any LaTeX-printed file will use the "sanitized" version in the macro names where likely-wrong-category characters got replaced with a Z. A header / comment in the .sty file specifies the user-given name for that portion of the file
* The Sphinx re-definitions of "problematic" LaTeX characters (e.g. \PYGam{\&} ) get overhauled; the string now contains an `override` key, for use with Python's `str.format(key=value)` syntax. When given an empty string, the string prior to this commit is returned; otherwise, appropriate overrides are generated for the various style-specific special characters. However, the usage in `sphinx/texinputs/sphinxlatexliterals.sty`, lines 561-650, seems to have the `PYG` prefix baked-in...
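The Z-replacement described in these commit notes amounts to a one-line regular expression. The sketch below is illustrative only; `sanitize_macro_name` is a hypothetical name, not necessarily the one used in the PR.

```python
import re


def sanitize_macro_name(style_name: str) -> str:
    """Replace every character outside a-zA-Z with an uppercase Z."""
    return re.sub(r'[^a-zA-Z]', 'Z', style_name)


sanitized = sanitize_macro_name('solarized-light')

# The replacement loses information: distinct user-facing names can map
# to the same macro name, which is why the user-given name is retained
# on the Python side.
collides = (sanitize_macro_name('solarized-light')
            == sanitize_macro_name('solarized_light'))
```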
@jfbu

This comment was marked as resolved.

@jfbu

This comment was marked as resolved.

Contributor

jfbu commented Jun 15, 2025

@hmedina I have implemented my remarks to achieve full LaTeX support at https://github.com/jfbu/sphinx/tree/multistyle.

I think at some places your docstrings are a bit long and should be hard-wrapped to shorter line lengths. I am not too happy about using md5 twice rather than once, but it works. I checked the wrapping of long code lines, including with verbatimforcewraps=true inside the 'sphinxsetup' value of latex_elements, and all works perfectly. As indicated in a TODO comment, perhaps at a later stage some slight change will be made. I may also consider passing the sanitized stylename as an option to \fvset, the way hllines is handled. Maybe I will refactor a bit the way the sanitized stylename is passed over to sphinxVerbatim. But it does work at this stage. By the way, I observed some breakage in PDF builds for the parent commit (current HEAD of this branch) with longer code-blocks than I had been using so far, so we have to be careful to test this with long code blocks, which I did locally.

Anyway, do you mind if I push that commit to this branch?

jfbu added the type:enhancement label Jun 15, 2025
Author

hmedina commented Jun 16, 2025

@jfbu hey there! I do not mind at all if you push your work to this branch; I'm very happy the full feature set is kept.

Once you merge into this PR, I'll take a look at the docstrings; I kept them at the Ruff setting (95 characters?), but if a shorter length is preferable I'll shorten them, just let me know to what length.

As for the MD5 hashing, it might be appropriate to replace the two calls and create a utility function just for providing "sanitized" LaTeX macro names; there's some functions in sphinx/util/texescape.py, a file that's already imported on both the sphinx/writers/latex.py and sphinx/highlighting.py; if that sounds like a good idea, I can do that tweak.

For the PDF breaking, what kind of break are you seeing? I just tried with the ~270 lines of highlighting.py and I got no errors, nor visible issues with the PDF

Edit: a couple make clean later, I'm seeing an issue with long code blocks; they're missing in the .tex file

Contributor

jfbu commented Jun 17, 2025

> @jfbu hey there! I do not mind at all if you push your work to this branch; I'm very happy the full feature set is kept.

This is done. I will need to investigate why the RTD build test fails. (This may be a bit later, as I have other tasks; the same commit on my clone passed all tests.)

> Once you merge into this PR, I'll take a look at the docstrings; I kept them at the Ruff setting (95 characters?), but if a shorter length is preferable I'll shorten them, just let me know to what length.

I personally have a preference for 80 or even 78 characters for line lengths in docstrings, but maybe we don't enforce it here. Maybe wait for a more global review of this PR. Some things may need attention, like referring to builders from the LaTeX writer.

> As for the MD5 hashing, it might be appropriate to replace the two calls and create a utility function just for providing "sanitized" LaTeX macro names; there's some functions in sphinx/util/texescape.py, a file that's already imported on both the sphinx/writers/latex.py and sphinx/highlighting.py; if that sounds like a good idea, I can do that tweak.

Yes another approach which I had considered avoids hashing altogether. It simply enumerates all the encountered specialized block styles, and output them as a string acceptable to LaTeX in command names i.e. only a-zA-Z with the numeric index 0, 1, ... converted to a suitable ascii-letter representation. The problem is that we have to make sure the prefix chosen will have no clash with existing style names.
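The enumeration idea in the previous paragraph needs a way to turn an index into letters only. One possible encoding is sketched below; `index_to_letters` is a hypothetical helper, and the choice of a spreadsheet-column-style scheme is an assumption, not something stated in the thread.

```python
import string


def index_to_letters(index: int) -> str:
    """Encode 0, 1, 2, ... as 'a', 'b', ..., 'z', 'aa', 'ab', ..."""
    if index < 0:
        raise ValueError('index must be non-negative')
    letters = string.ascii_lowercase
    name = ''
    index += 1  # switch to 1-based for the bijective base-26 encoding
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        name = letters[remainder] + name
    return name


encoded = [index_to_letters(i) for i in (0, 1, 25, 26, 27)]
```

A fixed prefix would still have to be prepended, and, as noted above, that prefix must not clash with any existing style name.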

In my Python virtual environment I have about 70 Pygments styles and I checked the md5 truncated to 6 hexadecimals were unique. Using md5 solves our problem and keeps the produced sanitized tex names to a controlled length.
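A minimal sketch of the truncated-md5 naming discussed here, together with the kind of uniqueness check described. `tex_safe_digest` and `all_unique` are hypothetical names; the digit-to-letter mapping follows the G..P convention visible elsewhere in this thread, with all ten digits covered.

```python
from hashlib import md5

# Map hexadecimal digits 0-9 to letters G-P so the result contains
# letters only (hex letters a-f are left untouched).
_DIGITS_TO_LETTERS = str.maketrans('0123456789', 'GHIJKLMNOP')


def tex_safe_digest(style_name: str, length: int = 6) -> str:
    """Return a letters-only name derived from the md5 of the style name."""
    digest = md5(style_name.encode()).hexdigest()[:length]
    return digest.translate(_DIGITS_TO_LETTERS)


def all_unique(style_names: list[str]) -> bool:
    """Check that no two style names share the same truncated digest."""
    digests = [tex_safe_digest(name) for name in style_names]
    return len(set(digests)) == len(digests)


nord_name = tex_safe_digest('nord')
```

Calling `all_unique` on the list of installed style names would reproduce the check mentioned above for a given environment.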

> For the PDF breaking, what kind of break are you seeing? I just tried with the ~270 lines of highlighting.py and I got no errors, nor visible issues with the PDF

I see

! Missing \endcsname inserted.
<to be read again> 
                   \futurelet 
l.113 ...a}\PYGcolorful{l+s+s2}{\PYGcolorfulZdq{}}

with HEAD at 7fab81e and my test index.rst which is a bit long, and anyhow all is fine with the new HEAD of this PR at 4b09baa.

> Edit: a couple make clean later, I'm seeing an issue with long code blocks; they're missing in the .tex file

Can you test with the new HEAD at 4b09baa?

Contributor

jfbu commented Jun 17, 2025

Testing apparently failed because CI could not fetch HEAD after my push. No idea why and can't investigate for now.

@jfbu jfbu dismissed a stale review June 17, 2025 07:27

probably a bot

Contributor

jfbu commented Jun 17, 2025

@hmedina I have merged master, which was without conflicts (done by the Ort strategy; I did not look too closely at whether the merge was sane). I was eager to see if CI testing would succeed and it seems to turn out good (at the time of writing only CI/Windows has not yet completed). Some glitch happened earlier, for unknown reasons, at the commit I contributed prior.

Contributor

jfbu commented Jun 17, 2025

> Yes another approach which I had considered avoids hashing altogether. It simply enumerates all the encountered specialized block styles, and output them as a string acceptable to LaTeX in command names i.e. only a-zA-Z with the numeric index 0, 1, ... converted to a suitable ascii-letter representation. The problem is that we have to make sure the prefix chosen will have no clash with existing style names.

Now that we let Pygments use only the 'PYG' prefix in the actual LaTeX, we can also use @ for the command prefix, which will be specific to the stylesheets. The stylesheets all end up in a .sty file which Sphinx loads by \RequirePackage, so we can use @ as a letter in LaTeX commands. Pygments somewhat strangely inserts \makeatletter/\makeatother (hence in _LATEX_ADD_STYLES we have to as well), because it is very old code with very old legacy choices, in this case to allow using \input in place of \usepackage/\RequirePackage. That makes sense if TeX formats other than LaTeX are targeted, except that \makeatletter is a LaTeX-only command, so its insertion is self-defeating.

Anyhow what matters is that we can use @ now that such command prefix is not used in the produced .tex file but only in the accessory stylesheets.

So a possibility avoiding the md5 method would be to enumerate all specialized styles and assign sanitized names of the type 'foo@' for the first one, 'foo@@' for the second, 'foo@@@' for the third, etc. (where foo is anything for which we are sure a \PYGfoo@ macro exists nowhere already; aaa should be fine). No Pygments style I know of uses the @ character in its name anyhow, so we are certain there will be no collision. Besides, when LaTeX encounters \def\sphinxpygmentsstylename{foo@...@} it will assign catcode other to these @'s, but that does not matter because \sphinxpygmentsstylename is expanded inside \csname...\endcsname (see _LATEX_ADD_STYLES in highlighting.py), and there @ can be either "letter" or "other"; both work.

The difference though with this is that we have to pass the information to visit_literal_block of what is the index of that block stylename, so that the number of @ (at least 1) to use is known.
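The @-suffix scheme, and the bookkeeping it requires, could look roughly like this. It is a sketch under assumptions: `StyleRegistry` and `at_suffix_name` are hypothetical, and the `aaa` prefix is the one floated above as probably safe.

```python
def at_suffix_name(index: int, prefix: str = 'aaa') -> str:
    """Return 'aaa@' for index 0, 'aaa@@' for index 1, and so on."""
    if index < 0:
        raise ValueError('index must be non-negative')
    return prefix + '@' * (index + 1)


class StyleRegistry:
    """Hypothetical registry assigning one index per override style."""

    def __init__(self) -> None:
        self._indices: dict[str, int] = {}

    def tex_name(self, style_name: str) -> str:
        # The first time a style is seen it receives the next free index;
        # later lookups return the same name. This lookup is the piece of
        # information the writer needs when it visits a literal block.
        index = self._indices.setdefault(style_name, len(self._indices))
        return at_suffix_name(index)


registry = StyleRegistry()
names = [registry.tex_name(s) for s in ('nord', 'xcode', 'nord')]
```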

Contributor

@jfbu jfbu left a comment


Regarding the fact that this works fully for LaTeX/PDF output, LGTM! Internal aspects may need reviewing though.

Contributor

jfbu commented Jun 17, 2025

Sorry there, my 4b09baa had a botched revert of changes in _LATEX_ADD_STYLES, the removal of {override} forgot to account for the fact that {{ and }} all have to revert to { and } now that this string is not destined to be used with .format. So this left some extra curly braces and in particular the lines

\def\PYGZob{{\{{}}
\def\PYGZcb{{\}}}}

were very wrong, with the result that \PYGZob was defined with a strange meaning to TeX, {\{{}} \def \PYGZcb {{\}}}... oops... (somewhat miraculously the lines did not cause a PDF crash, as \PYGZcb already had a definition; only some extra space character ended up in the PDF...).
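The brace issue described here follows from how `str.format` treats doubled braces. The sketch below uses a hypothetical one-line template in the spirit of the removed `{override}` mechanism; it is not the actual `_LATEX_ADD_STYLES` string.

```python
# Hypothetical template meant for str.format(): literal braces are doubled.
TEMPLATE = r'\def\PYG{override}Zob{{\{{}}'

# With an empty override, formatting collapses {{ and }} to { and }.
formatted = TEMPLATE.format(override='')

# Once .format() is no longer used, the string must carry single braces.
LITERAL = r'\def\PYGZob{\{}'

# A botched revert drops {override} but keeps the doubled braces, which
# then reach the .sty file unchanged.
BOTCHED = r'\def\PYGZob{{\{{}}'
```

Here `formatted` equals `LITERAL`, while `BOTCHED` is the line quoted above, with its extra braces.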

I have corrected this oversight at the third commit I pushed here (27b5239). I know those commits might disappear as we will need at one point to rebase on master for the merge to be feasible.

Author

hmedina commented Jun 17, 2025

Hey there! Seems to be working mostly, however I noticed the background colors, for LaTeX builds, only appear for the main/default style; none of the local overrides have it (try with a default style of python_docs_theme and a block of :style-light: nord for an immediately visible issue). Looking at the .sty file, I only see the \colorbox being set for the very first PYG@tok@cs, none of the subsequent ones have that (they only get a \textcolor). It seems this may be an issue in Pygments itself (see pygments/pygments#1054 ), as the command-line pygmentize also doesn't return background color values...

The HTML builder does not suffer from this, apparently because everything got a class selector with either .highlight or the .c[hash] element.

Left: HTML build. Right: LaTeX build.


As for using @ in style names, SetupTools will not refuse such a name (when quoted), so styles with that character in them definitely can be installed and used. This said, I'm not sure there's any character that would be a safe delimiter, so if there's no general solution, I agree this usage seems good enough as it covers all the stock Pygments themes.

I'm going to try a couple work-around for the background color issue, and try to do some code cleanup. Anything else while I'm at it?

Contributor

jfbu commented Jun 18, 2025

The issue about background colors is a Pygments issue. Their output makes zero provision for setting a background color. The explanation is probably that until very recently the target LaTeX package fancyvrb had no interface for that, or had something with faulty output (fancyvrb is the one providing the Verbatim environment one sees in Pygments output, but we use sphinxVerbatim, which is a sophisticated layer on top of it adding features). I think there is now better support (I have neither the time nor the interest to check the details now). But we at Sphinx would not want to use that; we have all that is needed to set a background color correctly. The only issue here is to somehow get the information about which background color the style expects for the block. Once we have it we can enforce it very easily. Here is a temporary user-level workaround, with an arbitrarily chosen background color:

.. raw:: latex

   \sphinxsetup{VerbatimColor=black!80}

.. code-block::
   :style-light: nord

   def get_stylesheet(self, selectors: list[int] | str | None = None) -> str:
       """Return a string with the specification for the tokens yielded by the language lexer, appropriate for the output formatter, using the style defined at initialization. In an HTML context, `selectors` is a list of CSS class selectors. In a LaTeX context, it modifies the command prefix used for macro definitions; see also LaTeXBuilder.add_block_style()
       """
       formatter = self.get_formatter()
       if isinstance(formatter, HtmlFormatter):
           if selectors:
               return formatter.get_style_defs(['.c{}'.format(s) for s in selectors])  # type: ignore [no-untyped-call]
           else:
               return formatter.get_style_defs('.highlight')  # type: ignore [no-untyped-call]
       else:
           if selectors:
               if isinstance(selectors, str):
                   _tex_name = md5(selectors.encode()).hexdigest()[:6]
                   for d, l in [('0', 'G'), ('1', 'H'), ('2', 'I'), ('3', 'J'),
                                ('4', 'K'), ('5', 'L'), ('6', 'M'), ('7', 'N'),
                                ('8', 'O')]:
                       _tex_name = _tex_name.replace(d, l)

renders to PDF (using :style-light: nord) as:
[Image: screenshot of the rendered PDF output, captured 2025-06-18]

(note that above code-block for example the docstring is modified from original to test specific things; besides it was missing ('9', 'P') and ruff forced me into another syntax later on).

The background color you see in non especially styled blocks is not the one from the pygments_style, it is a default of Sphinx set via the VerbatimColor key of the 'sphinxsetup' key of the latex_elements conf value. See https://www.sphinx-doc.org/en/master/latex.html#the-sphinxsetup-configuration-setting.

VerbatimColor

The background color for code-blocks.

Default: {RGB}{242,242,242} (same as {gray}{0.95}).

Changed in version 6.0.0: Formerly, it was {rgb}{1,1,1} (white).

About

.. raw:: latex

   \sphinxsetup{VerbatimColor=black!80}

one has to reuse it after the code-block to reset to another color. Of course, if we implement this at the Sphinx level, we would prefer to locate the change inside the LaTeX environment. This however is not trivial due to the actual structure of visit_literal_block(); for example, for setting the local style in this PR I opted temporarily for passing the information from outside the sphinxVerbatim and canceling it afterwards. At some point in the future this may need refactoring to do it in a nicer way.

> I'm going to try a couple work-around for the background color issue, and try to do some code cleanup. Anything else while I'm at it?

Sorry for late reply but I recommend you don't devote energy to this as part of this PR, you can do it in a separate PR if you have the motivation. I don't see anything else regarding LaTeX/PDF until other maintainers check the structure of how information is exchanged between builder and writer.

Contributor

jfbu commented Jun 18, 2025

Copy-pasted from an earlier, now hidden, review:

As for the handling of dark mode, it is possible; I have actually done it for a project using Sphinx. This requires modifying the Verbatim background color, which can be done by \sphinxsetup anywhere inside the document, with something such as (with the choices I made for the colors)

\sphinxsetup{%
    pre_background-TeXcolor={RGB}{40,42,54},% #282a36 aka VerbatimColor
    pre_TeXcolor={RGB}{248,248,242},% #f8f8f2
    VerbatimHighlightColor=VerbatimColor!75,
}

I also needed to add pygments_style = pygments_dark_style in the conf.py.

The default text color must also be set appropriately, as the above choices do, so that un-highlighted tokens show on the dark background.

The issue for us is only to get from the Pygments style the info about what it wants to set as background color (and ideally default text color) for the framed block.

The \textcolor LaTeX command is of no use to us in this context.
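For a project-wide setup, the same color keys could presumably be placed in conf.py through the 'sphinxsetup' key of latex_elements instead of an in-document \sphinxsetup. The fragment below is a sketch under assumptions: 'dracula' is only a guess at a dark Pygments style matching the colors quoted above, and only the two RGB keys are carried over.

```python
# conf.py (fragment)

# Assumption: stands in for whichever dark style the project uses.
pygments_style = 'dracula'

latex_elements = {
    'sphinxsetup': ', '.join([
        'pre_background-TeXcolor={RGB}{40,42,54}',  # #282a36
        'pre_TeXcolor={RGB}{248,248,242}',          # #f8f8f2
    ]),
}
```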

Author

hmedina commented Jun 19, 2025

> The issue for us is only to get from the Pygments style the info about what it wants to set as background color (and ideally default text color) for the framed block.

The Pygments.Style class has a background_color attribute: https://github.com/pygments/pygments/blob/94dda77d69a6d6c47c33f06ce2425e7f306154a2/pygments/style.py#L170-L174

As for the default text color, assuming there's one, that would inherit from the Token key in the styles attribute (in Pygments, all tokens inherit from this class). An annotation in that string like bg:#fff would signify a background color for those characters.

Not sure how to translate the values into viable latex, but for example, modifying the final part of LaTeXBuilder.write_stylesheet()

# Fragment from the end of LaTeXBuilder.write_stylesheet(); `f` is the open
# .sty file and `re` is assumed to be imported at module level.
if self.specialized_highlighters:
    from pygments.token import Token

    def to_rgb(hex_color):
        # Expand shorthand such as 'fff', then scale each channel to 0..1
        if len(hex_color) == 3:
            hex_color = ''.join(c * 2 for c in hex_color)
        return '{:.2f},{:.2f},{:.2f}'.format(
            *[int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)])

    specialized_styles = []
    for style_name, pyg_bridge in self.specialized_highlighters.items():
        specialized_style = '\n% Stylesheet for style {}'.format(style_name)
        specialized_style += pyg_bridge.get_stylesheet(style_name)
        style = pyg_bridge.get_style(style_name)
        print('For style {}:'.format(style_name))
        if style.background_color is not None:
            bc = style.background_color.lstrip('#')
            print('\tgeneral background color was: {} -> {}'.format(bc, to_rgb(bc)))
            base_style = style.styles[Token]  # could look like 'italic #000 bg:#ffffff'
            # search (not match): the color need not be the first word
            match = re.search(r'(?<!bg:)#([0-9a-fA-F]{3,6})(?:\s+bg:#([0-9a-fA-F]{3,6}))?', base_style)
            if match:
                print('\tdefault text color was: {} -> {}'.format(match.group(1), to_rgb(match.group(1))))
                if match.group(2):
                    print('\tdefault text background color was: {} -> {}'.format(match.group(2), to_rgb(match.group(2))))
        specialized_styles.append(specialized_style)
    f.write('\n'.join(specialized_styles))

My local testing gets this in output:

processing sphinxmultistylesample.tex: done
writing... done
For style lovelace:
        general background color was: ffffff -> 1.00,1.00,1.00
For style xcode:
        general background color was: ffffff -> 1.00,1.00,1.00
For style solarized-light:
        general background color was: fdf6e3 -> 0.99,0.96,0.89
        default text color was: 657b83 -> 0.40,0.48,0.51
For style nord:
        general background color was: 2E3440 -> 0.18,0.20,0.25
        default text color was: d8dee9 -> 0.85,0.87,0.91
Writing evaluated template result to /home/hmedina/sphinx_multistyle_sample_project/_build/latex/sphinxmessages.sty
build succeeded.

What would be the right way of translating this into background colors? I tried something with the pyg@toc@tc line's use of \colorbox but didn't get much...

NB: I guess both text-color and background-color could be optional, so the if match.group should be more rigorous and check both separately.
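Checking the two colors separately, as suggested in the note above, can be done without a regular expression by walking the words of the style string. This is a standalone sketch using only the standard library; `hex_to_unit_rgb` and `parse_style_colors` are hypothetical helpers, not part of the PR.

```python
from typing import Optional


def hex_to_unit_rgb(hex_color: str) -> str:
    """Convert 'fdf6e3', '#fdf6e3' or '#fff' to 'r,g,b' with values in 0..1."""
    value = hex_color.lstrip('#')
    if len(value) == 3:
        # Expand shorthand such as 'fff' to 'ffffff'.
        value = ''.join(ch * 2 for ch in value)
    channels = [int(value[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    return '{:.2f},{:.2f},{:.2f}'.format(*channels)


def parse_style_colors(style_def: str) -> tuple[Optional[str], Optional[str]]:
    """Return (text color, background color) as hex without '#', or None."""
    text = background = None
    for word in style_def.split():
        if word.startswith('bg:#'):
            background = word[4:]
        elif word.startswith('#'):
            text = word[1:]
    return text, background


text_color, back_color = parse_style_colors('italic #000 bg:#ffffff')
solarized_background = hex_to_unit_rgb('fdf6e3')
```

Either element of the returned pair may be None, so the caller can handle a missing text color and a missing background independently.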

Contributor

jfbu commented Jun 19, 2025 via email

hmedina and others added 4 commits June 19, 2025 18:23
* `LaTeXBuilder.get_bridge_for_style()` was only used to get a recently-created `PygmentsBridge` object, but added a None-check. Refactoring the method that added the created object to return it (i.e. `LaTeXBuilder.add_block_style()`) avoids the extra method and the requirement for a None check. If the method fails, it would fail at creation of the PygmentsBridge object, which is better handled by its own reporting
* Rename `LaTeXBuilder.add_block_style()` to `update_override_styles()`
* `StandaloneHTMLBuilder.get_bridge_for_style()` was only used to get a recently-created `PygmentsBridge` object, but added a None-check. Refactoring the methods that added the created object to return it (i.e. `add_block_dark_style()` & `add_block_light_style()`) avoids the extra method and the requirement for a None check. If the methods fail, they would fail at creation of the PygmentsBridge object(s), which is better handled by its own reporting
* Rename `add_block_dark_style()` to `update_override_styles_dark()`
* Rename `add_block_light_style()` to `update_override_styles_light()`
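The refactoring described in these commit notes, where the method that registers a bridge also returns it, can be sketched as below. `Bridge` and `Builder` are hypothetical stand-ins for `PygmentsBridge` and the builders; only the method and attribute names `update_override_styles`, `specialized_highlighters`, and `stylename` are taken from this thread.

```python
class Bridge:
    """Hypothetical stand-in for PygmentsBridge."""

    def __init__(self, style_name: str) -> None:
        self.stylename = style_name


class Builder:
    """Hypothetical stand-in for a builder tracking override styles."""

    def __init__(self) -> None:
        self.specialized_highlighters: dict[str, Bridge] = {}

    def update_override_styles(self, style_name: str) -> Bridge:
        # Create the bridge on first use and hand it back to the caller,
        # so no separate getter (and no None check) is needed.
        bridge = self.specialized_highlighters.get(style_name)
        if bridge is None:
            bridge = Bridge(style_name)
            self.specialized_highlighters[style_name] = bridge
        return bridge


builder = Builder()
nord_bridge = builder.update_override_styles('nord')
```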
Some styles set the background_color to #ffffff, in such cases we
ignore that because the default Sphinx PDF light gray is nicer.

This also handles a "default" text color, but some testing should be
done.
Contributor

jfbu commented Jun 20, 2025

Following on from your comment, I have implemented LaTeX/PDF support for background_color and for a default color for non-highlighted tokens. I will push to your branch once tests pass; feel free of course to revert or improve. And test, which I don't have much time for these days.
