MAISI scaling factor #1994

nordinbelkacemi · 2025-06-05T10:25:05Z

I'm having trouble understanding why a scaling factor is needed when training the diffusion model (and ControlNet). The autoencoder is supposed to produce embeddings that are approximately normally distributed (mean≈0, std≈1), and the scaling factor appears to be calculated from the first batch only, which makes its purpose unclear to me.

```python
# generation/maisi/scripts/diff_model_train.py

def calculate_scale_factor(train_loader: DataLoader, device: torch.device, logger: logging.Logger) -> torch.Tensor:
    """
    Calculate the scaling factor for the dataset.

    Args:
        train_loader (DataLoader): Data loader for training.
        device (torch.device): Device to use for calculation.
        logger (logging.Logger): Logger for logging information.

    Returns:
        torch.Tensor: Calculated scaling factor.
    """
    check_data = first(train_loader)
    z = check_data["image"].to(device)
    scale_factor = 1 / torch.std(z)
    logger.info(f"Scaling factor set to {scale_factor}.")

    if dist.is_initialized():
        dist.barrier()
        dist.all_reduce(scale_factor, op=torch.distributed.ReduceOp.AVG)
    logger.info(f"scale_factor -> {scale_factor}.")
    return scale_factor
```

All batches are multiplied by this single number during training. Appendix B of the paper says the VAE's latents are ensured to have a standard deviation between 0.9 and 1.1, so maybe the scaling factor is meant to correct for that remaining deviation from 1. But in that case, shouldn't the scaling factor be computed over the entire dataset rather than just the first batch?
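To make the question concrete, here is a minimal sketch of what a dataset-wide estimate could look like next to the first-batch one. It is not taken from the MAISI scripts: `streaming_scale_factor` is a hypothetical helper, and it works on plain Python lists of floats so it runs without dependencies. In the training script the same per-batch summaries (count, mean, sum of squared deviations) would presumably be computed on the latent tensors instead. It uses the sample standard deviation (N - 1), which is also the default convention of `torch.std`.

```python
import math
from typing import Iterable, Sequence


def streaming_scale_factor(batches: Iterable[Sequence[float]]) -> float:
    """Return 1 / std pooled over every value in every batch (sample std, N - 1).

    Hypothetical helper for illustration; not part of the MAISI code.
    """
    count, mean, m2 = 0, 0.0, 0.0  # running count, mean, sum of squared deviations
    for batch in batches:
        n_b = len(batch)
        if n_b == 0:
            continue
        mean_b = math.fsum(batch) / n_b
        m2_b = math.fsum((x - mean_b) ** 2 for x in batch)
        # Merge the batch summary into the running summary
        # (pairwise update of count, mean, and squared deviations).
        delta = mean_b - mean
        total = count + n_b
        mean += delta * n_b / total
        m2 += m2_b + delta * delta * count * n_b / total
        count = total
    if count < 2 or m2 == 0.0:
        raise ValueError("need at least two non-identical values")
    return 1.0 / math.sqrt(m2 / (count - 1))


# Toy "latents": the first batch happens to be narrower than the second.
batches = [[-0.5, 0.5, -0.5, 0.5], [-2.0, 2.0, -2.0, 2.0]]
first_batch_only = streaming_scale_factor(batches[:1])
whole_dataset = streaming_scale_factor(batches)
print(f"first batch only: {first_batch_only:.4f}")
print(f"whole dataset:    {whole_dataset:.4f}")
```

In this toy example the two estimates differ by more than a factor of two, because the first batch is not representative. Whether that matters in practice depends on how much the latent standard deviation varies between batches, which I have not measured.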
