I made a Multi-Layer Perceptron (Neural Network) in NumPy
Backpropagation is an algorithm for computing the gradients of the loss function with respect to the weights of the neural network. These gradients are then used to update the weights via gradient descent.
For a single linear layer $y = XW + b$, we calculate the gradients of the loss as follows:

1. Gradient of the Loss with respect to Output $y$:

The gradient of the loss with respect to the output of the current layer, $\frac{\partial L}{\partial y}$, is passed down from the next layer in the backward pass.

2. Gradient with respect to Weights $W$:

Using the chain rule, the gradient of the loss with respect to the weights is computed as

$$\frac{\partial L}{\partial W} = X^T \frac{\partial L}{\partial y}$$

where $X$ is the input to the layer.

3. The gradient with respect to the bias is calculated by summing over the batch dimension:

$$\frac{\partial L}{\partial b} = \sum_i \left(\frac{\partial L}{\partial y}\right)_i$$

4. Finally, the gradient of the loss with respect to the input to this layer, which is passed back to the previous layer, is:

$$\frac{\partial L}{\partial X} = \frac{\partial L}{\partial y}\, W^T$$
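As a minimal NumPy sketch (illustrative, not necessarily the exact code in this repository), the backward pass of a single linear layer can be written as:

```python
import numpy as np

def linear_backward(dL_dy, X, W):
    """Backward pass of y = X @ W + b for a batch of inputs.

    dL_dy : gradient of the loss w.r.t. the layer output, shape (batch, out)
    X     : input that was fed to the layer,              shape (batch, in)
    W     : weight matrix,                                shape (in, out)
    """
    dL_dW = X.T @ dL_dy        # gradient w.r.t. the weights, shape (in, out)
    dL_db = dL_dy.sum(axis=0)  # gradient w.r.t. the bias, summed over the batch
    dL_dX = dL_dy @ W.T        # gradient passed back to the previous layer
    return dL_dW, dL_db, dL_dX
```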
Activation Function's Gradient

If the activation function $\sigma$ is applied after the linear operation, i.e. $a = \sigma(z)$ with $z = XW + b$, its gradient enters the backward pass as an element-wise product:

$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \odot \sigma'(z)$$

where $\sigma'(z)$ is the derivative of the activation function, e.g. $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ for the sigmoid.
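A short sketch of the two supported activations and their derivatives (illustrative function names, assumed to mirror the 'relu' and 'sigmoid' options described below):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # sigma'(z) = sigma(z) * (1 - sigma(z))

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(z.dtype)  # 1 where z > 0, else 0

# Backward through the activation is an element-wise product:
# dL_dz = dL_da * sigmoid_grad(z)   (or relu_grad(z))
```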
Full Backpropagation Process
For each layer during backpropagation, we perform the following steps:
- Calculate the gradient of the loss with respect to the output (from the next layer).
- Compute the gradients with respect to weights, biases, and inputs using the chain rule.
- Update the parameters using gradient descent:

$$W \leftarrow W - \eta\,\frac{\partial L}{\partial W}, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}$$

where $\eta$ is the learning rate.
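In code, the update step is a plain gradient-descent assignment (a minimal sketch with hypothetical names):

```python
def sgd_update(W, b, dL_dW, dL_db, lr=0.01):
    """One vanilla gradient-descent step for a single layer's parameters."""
    W -= lr * dL_dW  # update the weight matrix in place
    b -= lr * dL_db  # update the bias vector in place
    return W, b
```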
The Linear Layer in a Multi-Layer Perceptron (MLP) computes the following operation:

$$y = \sigma(XW + b)$$

where:

- $X$ is the input to the layer, $W$ is the weight matrix, and $b$ is the bias vector
- $\sigma$ is the (optional) non-linear activation function
```python
layer = Linear(input_size, output_size, activation, use_bias,
               dropout, use_act, eval, regularisation, beta)
```
- Input size (int): Size of the input dimension ($X$)
- Output size (int): Size of the output dimension ($Y$)
- Activation ['relu', 'sigmoid']: Type of non-linearity [default: sigmoid]
- Dropout (float): Fraction of nodes turned off using dropout [default: 0]
- Use Activation (bool): Whether to apply the activation ($\sigma$) in the layer [default: True]
- Use Bias (bool): Whether to use the bias $b$ [default: True]
- Eval (bool): When True the layer stores gradients for training; False for inference [default: True]
- Regularisation ['l1', 'l2']: Type of regularisation [default: none]
- beta (float): Regularisation constant $\beta$ [default: 0]
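For example, a single layer could be constructed like this (argument values are illustrative, and the keyword names follow the constructor signature above):

```python
# Illustrative: a 64 -> 32 layer with ReLU, 10% dropout and L2 regularisation
layer = Linear(64, 32, activation='relu', use_bias=True,
               dropout=0.1, use_act=True, eval=True,
               regularisation='l2', beta=0.001)
```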
```python
model = MLP(input_size, output_size, hid_sizes,
            activation, is_softmax, reg, beta, layer_dropout,
            use_bias, eval)
```
- Input Size (int): Number of nodes in the input layer, i.e. the size of the input
- Output Size (int): Number of nodes in the output layer, i.e. the size of the output
- Hidden sizes (list[int]): Size of each hidden layer between the input and output layers, in order [default: [], i.e. no hidden layers]
- Activation: Type of non-linear activation used in the Neural Network [default: sigmoid] (same for all layers)
- Is_softmax (bool): Whether to apply softmax at the output of the Neural Network [default: False]
- Layer Dropout (float): Dropout ratio for the layers in the Neural Network [default: 0] (same for all layers)
- Use Bias (bool): Whether to use the bias $b$ [default: True] (same for all layers)
- Eval (bool): When True the model is in training mode; False for inference [default: True]
- Regularisation ['l1', 'l2']: Type of regularisation [default: none]
- beta (float): Regularisation constant $\beta$ [default: 0]
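For instance, a small classifier with two hidden layers could be built like this (values are illustrative; keyword names follow the constructor signature above):

```python
# Illustrative: 784 -> 128 -> 64 -> 10 network with ReLU and a softmax output
model = MLP(784, 10, hid_sizes=[128, 64],
            activation='relu', is_softmax=True,
            reg='l2', beta=0.001, layer_dropout=0.0,
            use_bias=True, eval=True)
```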
```python
trainer = MLPTrainer(model, learning_rate, batch_size, lossfn)
trainer.train(x_train, y_train, x_val, y_val, epochs)
```
- Model (object): Instance of the MLP class
- Learning Rate (float): Step size in stochastic gradient descent
- Loss Function ['sq-error', 'cross-entropy']: Loss function for regression or classification
- Batch Size (int): Batch size for training
To get batches separately for features and labels from the training and validation data:

```python
batches = trainer.data_loader(data)
```
To run a forward pass through the model:

```python
y = model.forward(x)
```
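Putting it all together, a minimal end-to-end run on random data might look like this (all argument values, keyword names, and shapes are illustrative, based on the signatures above):

```python
import numpy as np

# Synthetic regression data: 256 samples with 8 features, 1 target each
x_train, y_train = np.random.randn(256, 8), np.random.randn(256, 1)
x_val,   y_val   = np.random.randn(64, 8),  np.random.randn(64, 1)

# A small network with one hidden layer of 16 units
model = MLP(8, 1, hid_sizes=[16], activation='relu')

# Train with squared-error loss and mini-batches of 32
trainer = MLPTrainer(model, learning_rate=0.01, batch_size=32, lossfn='sq-error')
trainer.train(x_train, y_train, x_val, y_val, epochs=50)

# Inference: forward pass on the validation features
y_pred = model.forward(x_val)
```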