3D Parallel + Model Spec API #245
Merged
Conversation
Actually, there's no urgent reason to merge this soon. I'll at least get DDP/FSDP, PP, and TP working together; since things are going to be flaky for a bit, it's best to have all features available for testing and work my way through the bugs.
Model Specification
TODO
Parallel Backends
Finetrainers supports parallel training on multiple GPUs and nodes. This is done using the PyTorch DTensor backend.
As an experiment in comparing the performance of different training backends, I've implemented multi-backend support. These backends may or may not fully rely on PyTorch's distributed DTensor solution. Currently, only 🤗 Accelerate is supported, for backwards-compatibility reasons (we initially started with Accelerate support only). In the near future, there are plans to integrate natively with:
Native support for context parallelism and pipeline parallelism, based on PyTorch DTensor and on custom solutions inspired by ParaAttention and xDiT, is also planned. Note that the multi-backend work is purely experimental, to satisfy my curiosity about the performance of different frameworks. Users should only expect stable support with Accelerate and PyTorch DTensor.
Support matrix currently verified to work in this PR:
Check the docs for more information.
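To give a concrete picture of what the DTensor-based 3D parallelism builds on, here is a minimal, illustrative sketch (not finetrainers' actual code) that constructs a 3D device mesh with PyTorch's `init_device_mesh`, assuming 8 GPUs split 2-way along each of the data-, tensor-, and pipeline-parallel dimensions:

```python
# Illustrative only: a 3D device mesh of the kind a DTensor backend builds on.
# Assumes 8 GPUs (2 x 2 x 2) and is meant to be launched with
# `torchrun --nproc_per_node=8 mesh_example.py`.
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

dist.init_process_group("nccl")

mesh_3d = init_device_mesh(
    "cuda",
    mesh_shape=(2, 2, 2),
    mesh_dim_names=("dp", "tp", "pp"),
)

# Sub-meshes along each named dimension can then be handed to the
# data-parallel (DDP/FSDP), tensor-parallel, and pipeline-parallel components.
dp_mesh = mesh_3d["dp"]
tp_mesh = mesh_3d["tp"]
pp_mesh = mesh_3d["pp"]

dist.destroy_process_group()
```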
Training improvements
Precomputation
A new mechanism for preprocessing and batched training has been implemented so that medium-to-large scale datasets can be handled efficiently without using too much disk space. The `DistributedDataProcessor` is fed the dataloader iterators, processes a fixed batch of `--precomputation_items`, and saves them to `--precomputation_dir`. These items can then be used for batched training based on frame-height-width bucket collation. By default, `512` items are precomputed, but this should be adjusted by users based on available disk space and the scale of training.
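As a rough illustration of the precompute-then-train flow described above (the helper names and file layout here are hypothetical, not finetrainers' actual implementation):

```python
# Hypothetical sketch of precomputation: encode a fixed number of items once
# and save them to disk so training can read them back without re-running the
# encoders. `vae_encode` and `text_encode` stand in for the real encoders.
from pathlib import Path
import torch

def precompute(dataloader, vae_encode, text_encode, precomputation_dir, precomputation_items=512):
    out_dir = Path(precomputation_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for index, batch in enumerate(dataloader):
        if index >= precomputation_items:
            break
        with torch.no_grad():
            latents = vae_encode(batch["video"])        # hypothetical encoder call
            conditions = text_encode(batch["caption"])  # hypothetical encoder call
        torch.save(
            {"latents": latents, "conditions": conditions},
            out_dir / f"item_{index:06d}.pt",
        )
```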
Processors
This PR introduces a `ProcessorMixin`. It's an attempt to provide a standard interface for graph-based dataset manipulation, from source data to the input data for the condition/latent models. A processor should ideally do one simple thing only, so that multiple processors can be composed together. Processors should be invoked from the `ModelSpecification::prepare_latents` and `ModelSpecification::prepare_conditions` methods. Users are not required to use them (this is opt-in) and can use any custom logic for preprocessing. TODO: show an example (a rough sketch follows below).
Environment
Finetrainers has only been widely tested with the following environment (output obtained by running `diffusers-cli env`):

Other changes
`pretrained_model_name_or_path`