Repository size severely affects processing speed for ManagedReference templates due to RawModel sharing #23

Open
banchan86 opened this issue Jan 15, 2025 · 4 comments

banchan86 commented Jan 15, 2025

Note: This technically isn't an existing issue yet, but will become an issue as we move forward with the ManagedRef templating system, especially for larger repositories like Bonsai.Core docs. @glopesdev encountered it while experimenting with the templates using the Bonsai.Core docs when the docfx local preview would not finish building, and I have also encountered it myself.

Background

As part of our revamp of the API/References pages, we have been experimenting with the docfx templating system. Most of the time, we use the templates to process the docfx metadata output files, which are in the ManagedReference format. For that reason, I have taken to calling it the ManagedReference templating system, to distinguish it from the usual HTML templating system.

This is a quick rundown of the steps in this process based on this chart:

  1. Metadata - a ManagedReference file is generated for every eligible object (for instance, API classes). These are stored as .yml files in the /api folder and serve as the Content Source for the Build step.
  2. Build - the ManagedReference document processor generates a RawModel for each object. This serves as the input data model for the next few steps.
  3. Pre-processing - when the isShared parameter in ManagedReference.overwrite.js is set to true in a custom template, every RawModel is shared with all the other RawModels and stored in the __global._shared property of each RawModel.
  4. Pre-processing Transform - the template preprocessor transforms each RawModel into a ViewModel by performing some computation on the original RawModel and the RawModels in the __global._shared property.
  5. Renderer step - the template renderer transforms each ViewModel into the final html output.
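
To make steps 2 to 5 concrete, here is a minimal sketch in plain JavaScript. It is not docfx code: `buildRawModels`, `shareRawModels` and `render` are hypothetical stand-ins for what docfx does internally, and the model fields and the `~/api/...` key shape are assumptions for illustration. Only the `getOptions` and `transform` hook names and the `__global._shared` property come from the template system described above.

```javascript
// Hook a custom template would define to opt in to sharing (step 3).
function getOptions(model) {
  return { isShared: true };
}

// Step 2 (stand-in): one RawModel per ManagedReference file.
function buildRawModels(uids) {
  return uids.map((uid) => ({ uid: uid, type: "Class", __global: { _shared: {} } }));
}

// Step 3 (stand-in): every shared RawModel becomes visible to every other one.
function shareRawModels(rawModels) {
  const shared = {};
  for (const m of rawModels) {
    if (getOptions(m).isShared) {
      shared["~/api/" + m.uid + ".yml"] = { uid: m.uid, type: m.type };
    }
  }
  for (const m of rawModels) {
    m.__global._shared = shared;
  }
  return rawModels;
}

// Step 4: the preprocessor turns a RawModel into a ViewModel,
// here by looking up the other objects through __global._shared.
function transform(model) {
  const shared = model.__global._shared;
  const related = Object.keys(shared)
    .map((key) => shared[key].uid)
    .filter((uid) => uid !== model.uid);
  return { uid: model.uid, related: related };
}

// Step 5 (stand-in): the renderer turns a ViewModel into HTML.
function render(viewModel) {
  return "<h1>" + viewModel.uid + "</h1><p>See also: " + viewModel.related.join(", ") + "</p>";
}

// In a real template file these would be exported for docfx to call.
if (typeof exports !== "undefined") {
  exports.getOptions = getOptions;
  exports.transform = transform;
}

const rawModels = shareRawModels(buildRawModels(["A", "B", "C"]));
const pages = rawModels.map((m) => render(transform(m)));
console.log(pages[0]); // <h1>A</h1><p>See also: B, C</p>
```

The point of the sketch is that step 4 for any one page can read every other object's data, which is what makes the customization possible and also what makes each RawModel large.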

Problem

I believe the problem arises in step 3, when the RawModels are shared with each other. Each RawModel grows linearly with the number of objects, so the total data across all RawModels grows roughly quadratically, and processing time appears to grow at least as fast. Step 3 is already affected, as it takes a long time to generate these RawModels, and I believe Step 4 is affected as well.
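
As a rough illustration of how sharing inflates the data, the sketch below assumes that every shared RawModel is copied in full into every RawModel (including its own entry) and that all objects have the same size. `exportedSize` and `bytesPerModel` are made-up names for this arithmetic, not anything docfx provides.

```javascript
// Illustrative arithmetic only; bytesPerModel is an arbitrary unit size.
function exportedSize(objectCount, bytesPerModel, isShared) {
  // Own data, plus (when shared) one copy of every model under __global._shared.
  const perFile = isShared ? bytesPerModel * (1 + objectCount) : bytesPerModel;
  return { perFile: perFile, total: perFile * objectCount };
}

for (const n of [10, 100, 1000]) {
  const shared = exportedSize(n, 1, true);
  const unshared = exportedSize(n, 1, false);
  console.log(n, "objects:", "shared total =", shared.total, "unshared total =", unshared.total);
}
```

Under these assumptions the per-file size scales with the number of objects and the total scales with its square, so a tenfold increase in objects means roughly a hundredfold increase in data to generate and process.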

Normally the RawModels are not exported, but you can use this command to inspect them. Without any additional arguments, the RawModels get exported to _site/api/ as raw.json files.

dotnet docfx --exportRawModel
  • When isShared parameter in ManagedReference.overwrite.js is set to true, the RawModels are huge.

I tested it on the Bonsai.Core docs: the above command does run, but it takes 30 minutes to complete (even with just the default templates). Each resulting RawModel.raw.json file is ~70 MB and has >1.8 million lines.

  • When isShared parameter in ManagedReference.overwrite.js is set to false, the RawModels are much more manageable.

With the Bonsai.Core docs, each resulting RawModel.raw.json file is <100 KB and only has ~1000 lines.
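
To compare the two settings without opening the files by hand, a small Node.js script along these lines could total up the exported RawModels. This is a sketch that assumes the exported files end in `.raw.json` and live under `_site/api`, as in the runs above; `rawModelSizes` and `summarize` are hypothetical helper names.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recursively collect the size in bytes of every *.raw.json file under dir.
function rawModelSizes(dir) {
  const sizes = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      sizes.push(...rawModelSizes(full));
    } else if (entry.name.endsWith(".raw.json")) {
      sizes.push({ file: full, bytes: fs.statSync(full).size });
    }
  }
  return sizes;
}

// Reduce the list to a file count, a total and the largest single file.
function summarize(sizes) {
  const totalBytes = sizes.reduce((sum, s) => sum + s.bytes, 0);
  const largestBytes = sizes.reduce((max, s) => Math.max(max, s.bytes), 0);
  return { count: sizes.length, totalBytes: totalBytes, largestBytes: largestBytes };
}

// Usage: node raw-model-sizes.js _site/api
const targetDir = process.argv[2] || "_site/api";
if (fs.existsSync(targetDir)) {
  console.log(summarize(rawModelSizes(targetDir)));
}
```

Running it once with isShared set to true and once with it set to false gives a before/after figure for the whole site rather than for a single file.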

Solution

Unfortunately, in order to pull the information we want to customize the API pages with, it seems like we do need to share the RawModels with each other. I do not have a good solution for it yet, but if we want to continue with this approach it seems like we should:

  • reduce the repository size by spinning off packages from Bonsai.Core, generate separate API reference pages for them in their respective package repository, and then integrate them later.
    • Note that the current approach of importing them as submodules does not help with this issue: the submodules are still included in the docfx build process, so the RawModel file size is not reduced.

Additional context

#24 exacerbates this issue.

@banchan86 banchan86 changed the title from "Repository size severely affects processing speed for ManagedReference templates due to RawModel filesize" to "Repository size severely affects processing speed for ManagedReference templates due to RawModel size" Jan 15, 2025
@banchan86

Issue raised on the official docfx repository here - dotnet/docfx#10506.

@banchan86

Would also like to add that the OE team has a nice explanation of the docfx templating system on the README.md at bonsai onix1 docs which helped me understand the process.

@banchan86

@glopesdev mentioned that he is trying to tackle this from another direction and avoid sharing altogether, see dotnet/docfx#5674 (comment).

@banchan86 banchan86 changed the title from "Repository size severely affects processing speed for ManagedReference templates due to RawModel size" to "Repository size severely affects processing speed for ManagedReference templates due to RawModel sharing" Jan 16, 2025

banchan86 commented Jan 16, 2025

Another idea: we might also be able to trim down the input data with the exports.preTransform function, which we can modify in ManagedReference.extension.js. It probably will not affect the sharing stage or the size of the RawModels in step 3, but it might improve performance in step 4.
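
A minimal sketch of what that trimming could look like, assuming the `preTransform` hook receives the model and returns the model to use. The `DROP_KEYS` list and the `dropFields` helper are hypothetical: which fields are safe to drop depends on what the templates actually read, and I have not verified that against the default templates.

```javascript
// Hypothetical deny-list; the field names are assumptions for illustration.
const DROP_KEYS = ["source", "documentation"];

// Return a shallow copy of the model without the listed top-level fields.
function dropFields(model, dropKeys) {
  const trimmed = Object.assign({}, model);
  for (const key of dropKeys) {
    delete trimmed[key];
  }
  return trimmed;
}

// Candidate body for the hook in ManagedReference.extension.js.
function preTransform(model) {
  return dropFields(model, DROP_KEYS);
}

// In the template file this is what docfx would pick up.
if (typeof exports !== "undefined") {
  exports.preTransform = preTransform;
}

const example = { uid: "A", name: "A", source: { path: "A.cs" }, documentation: {} };
console.log(Object.keys(preTransform(example))); // [ 'uid', 'name' ]
```

A deny-list is used here rather than an allow-list so that any fields the renderer depends on but that are not listed stay in place.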
