
FineTuneHyperparamters

The hyperparameters used to tweak the Fine-tune model.

public struct FineTuneHyperparamters: Codable 

Inheritance

Codable

Properties

batchSize

The batch size to use for training.

public let batchSize: Int?

The batch size is the number of training examples used in a single forward and backward pass.

By default, the batch size is dynamically configured to be roughly 0.2% of the number of examples in the training set, capped at 256. In general, we've found that larger batch sizes tend to work better for larger datasets.
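As a hedged illustration of that default (not an OpenAIKit API), the heuristic amounts to taking about 0.2% of the training-set size, flooring at 1 and capping at 256:

// Illustrative sketch only; the service computes this server-side.
func approximateDefaultBatchSize(trainingExampleCount: Int) -> Int {
    let approx = Int((Double(trainingExampleCount) * 0.002).rounded())
    return min(max(approx, 1), 256)
}

// e.g. 10,000 training examples -> a default batch size of about 20.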

learningRateMultiplier

The learning rate multiplier to use for training.

public let learningRateMultiplier: Double?

The fine-tuning learning rate is the original learning rate used for pretraining multiplied by this value.

By default, the learning rate multiplier is 0.05, 0.1, or 0.2 depending on the final batch_size (larger learning rates tend to perform better with larger batch sizes). We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best results.
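To make the relationship concrete, a minimal sketch (the numbers below are assumed example values, not library defaults):

// Illustrative only: the effective fine-tuning learning rate is the
// pretraining learning rate scaled by the multiplier.
let pretrainingLearningRate = 1e-4        // assumed example value
let learningRateMultiplier = 0.1          // one of the documented defaults
let fineTuneLearningRate = pretrainingLearningRate * learningRateMultiplier
// fineTuneLearningRate == 1e-5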

nEpochs

The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

public let nEpochs: Int

promptLossWeight

The weight to use for loss on the prompt tokens.

public let promptLossWeight: Double

This controls how much the model tries to learn to generate the prompt (as compared to the completion, which always has a weight of 1.0), and can add a stabilizing effect to training when completions are short.

If prompts are extremely long (relative to completions), it may make sense to reduce this weight so as to avoid over-prioritizing learning the prompt.
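Because the struct is Codable, a typical use is decoding it from the hyperparams object returned by the fine-tunes endpoint. A minimal sketch, assuming the JSON uses snake_case keys (batch_size, learning_rate_multiplier, n_epochs, prompt_loss_weight) and that a .convertFromSnakeCase key-decoding strategy matches the struct's property names; the library may instead define its own CodingKeys:

import Foundation

let json = """
{
  "batch_size": 4,
  "learning_rate_multiplier": 0.1,
  "n_epochs": 4,
  "prompt_loss_weight": 0.01
}
""".data(using: .utf8)!

let decoder = JSONDecoder()
decoder.keyDecodingStrategy = .convertFromSnakeCase   // assumption, see above
let hyperparameters = try decoder.decode(FineTuneHyperparamters.self, from: json)
print(hyperparameters.nEpochs)   // 4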
