neural-network-in-numpy

I made a Multi-Layer Perceptron (neural network) in NumPy.


Backpropagation in Neural Networks

Backpropagation is the algorithm used to compute the gradients of the loss function with respect to the weights of the neural network. These gradients are then used to update the weights via gradient descent.

For a single linear layer, we calculate the gradient of the loss $L$ with respect to the parameters $W$ and $b$, and with respect to the input $x$. Let the output of the linear layer be $y = Wx + b$.

1. Gradient of the Loss with respect to the Output $y$: The gradient of the loss with respect to the output of the current layer is passed down from the next layer in the backward pass:

$$ \frac{\partial L}{\partial y} $$

2. Gradient with respect to the Weights $W$: Using the chain rule, the gradient of the loss with respect to the weights is computed as:

$$ \frac{\partial L}{\partial W} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial W} = \delta \cdot x^T $$

where $\delta=\frac{\partial L}{\partial y}$ is the gradient propagated from the next layer.

3. Gradient with respect to the Bias $b$: The gradient of the loss with respect to the bias is:

$$ \frac{\partial L}{\partial b} = \frac{\partial L}{\partial y} = \delta $$

4. Gradient with respect to the Input $x$: Finally, the gradient of the loss with respect to the input of this layer, which is passed back to the previous layer, is:

$$ \frac{\partial L}{\partial x} = W^T \cdot \delta $$
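
As a minimal NumPy sketch (not the repository's exact code), these four gradients for a single sample can be computed as:

import numpy as np

def linear_backward(delta, x, W):
    # delta : dL/dy from the next layer, shape (out_dim, 1)
    # x     : input to this layer,       shape (in_dim, 1)
    # W     : weight matrix,             shape (out_dim, in_dim)
    dW = delta @ x.T      # dL/dW = delta · x^T
    db = delta            # dL/db = delta
    dx = W.T @ delta      # dL/dx = W^T · delta, passed to the previous layer
    return dW, db, dx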

Activation Function’s Gradient

If an activation function $\sigma$ is applied after the linear transformation, i.e. $z = \sigma(y)$, we also need to propagate the gradient through it:

$$ \frac{\partial L}{\partial y} = \frac{\partial L}{\partial z} \cdot \sigma'(y) $$

where $\sigma'(y)$ is the derivative of the activation function, applied element-wise.
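
For example, with a sigmoid activation the derivative is $\sigma'(y) = \sigma(y)(1 - \sigma(y))$; a minimal sketch of propagating a gradient through it (again, not the repository's exact code):

import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_backward(dL_dz, y):
    # dL_dz : gradient of the loss w.r.t. z = sigmoid(y)
    # y     : pre-activation values (the linear layer output)
    s = sigmoid(y)
    return dL_dz * s * (1.0 - s)   # dL/dy = dL/dz * sigma'(y)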

Full Backpropagation Process

For each layer during backpropagation, we perform the following steps:

  1. Calculate the gradient of the loss with respect to the output (from the next layer).
  2. Compute the gradients with respect to weights, biases, and inputs using the chain rule.
  3. Update the parameters using gradient descent:

$$ W \leftarrow W - \eta \cdot \frac{\partial L}{\partial W} $$

$$ b \leftarrow b - \eta \cdot \frac{\partial L}{\partial b} $$

where $\eta$ is the learning rate.
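
A minimal sketch of this update step in NumPy (assuming the gradients dW and db computed above):

def sgd_update(W, b, dW, db, eta=0.01):
    # vanilla gradient-descent step for one layer's parameters
    W = W - eta * dW
    b = b - eta * db
    return W, b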


Implementation

1. Linear Layer

The Linear Layer in a Multi-Layer Perceptron (MLP) computes the following operation:

$$ y \ = \ \sigma (Wx + b)$$

where:

• $W$ is the weight matrix,
• $x$ is the input,
• $b$ is the bias term, and
• $\sigma$ is the activation function applied element-wise.

layer = Linear(input_size, output_size, activation, use_bias,
               dropout, use_act, eval, regularisation, beta)
  1. Input size (int) : Size of the input dimension ($X$)

  2. Output size (int) : Size of the output dimension ($Y$)

  3. Activation ['relu', 'sigmoid'] : Type of non-linearity [default : 'sigmoid']

  4. Dropout (float) : Fraction of nodes turned off by dropout [default : 0]

  5. Use Activation (bool) : Whether to apply the activation $\sigma$ in the layer [default : True]

  6. Use Bias (bool) : Whether to use the bias $b$ [default : True]

  7. Eval (bool) : When True, the layer stores gradients for training; set to False for inference [default : True]

  8. Regularisation ['l1', 'l2'] : Type of regularisation [default : none]

  9. beta (float) : Regularisation constant $\beta$ [default : 0]
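
A hypothetical usage example (argument names follow the signature above; the values are illustrative, not taken from the repository):

# 64 -> 32 layer with ReLU, 10% dropout and L2 regularisation
layer = Linear(64, 32, activation='relu', use_bias=True,
               dropout=0.1, use_act=True, eval=True,
               regularisation='l2', beta=0.01)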


2. Multi-Layer Perceptron

model = MLP(input_size, output_size, hid_sizes,
            activation, is_softmax, reg, beta, layer_dropout,
            use_bias, eval)
  1. Input Size (int) : Number of nodes in the input layer, i.e. the size of the input

  2. Output Size (int) : Number of nodes in the output layer, i.e. the size of the output

  3. Hidden sizes (list[int]) : Size of each hidden layer between the input and output layers, in order [default : [ ], i.e. no hidden layers]

  4. Activation : Type of non-linear activation used in the network (same for all layers) [default : sigmoid]

  5. Is_softmax (bool) : Whether to apply softmax at the output of the network [default : False]

  6. Layer Dropout (float) : Dropout ratio for the layers of the network (same for all layers) [default : 0]

  7. Use Bias (bool) : Whether to use the bias $b$ (same for all layers) [default : True]

  8. Eval (bool) : When True, the model is in training mode; set to False for inference [default : True]

  9. Regularisation ['l1', 'l2'] : Type of regularisation [default : none]

  10. beta (float) : Regularisation constant $\beta$ [default : 0]
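
A hypothetical usage example (values are illustrative, not taken from the repository):

# 784 -> 128 -> 64 -> 10 network with sigmoid activations and softmax output
model = MLP(784, 10, hid_sizes=[128, 64], activation='sigmoid',
            is_softmax=True, reg='l2', beta=0.001,
            layer_dropout=0.2, use_bias=True, eval=True)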


3. MLP Trainer

Training

trainer = MLPTrainer(model, learning_rate, batch_size, lossfn)
trainer.train(x_train, y_train, x_val, y_val, epochs)
  1. Model (object) : Instance of the MLP class

  2. Learning Rate (float) : Step size for stochastic gradient descent

  3. Loss Function ['sq-error', 'cross-entropy'] : Loss function for regression or classification

  4. Batch Size (int) : Batch size used during training

To get batches of features and labels separately for the training and validation data:

batches = trainer.data_loader(data)

Inference

y = model.forward(x)
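
Putting the pieces together, a hypothetical end-to-end run might look like this (keyword names follow the signatures above; the data shapes and hyperparameters are illustrative assumptions, not taken from the repository):

import numpy as np

# toy data: 1000 training / 200 validation samples, 20 features, 3 one-hot classes
x_train = np.random.randn(1000, 20)
y_train = np.eye(3)[np.random.randint(0, 3, 1000)]
x_val = np.random.randn(200, 20)
y_val = np.eye(3)[np.random.randint(0, 3, 200)]

model = MLP(20, 3, hid_sizes=[32], activation='relu', is_softmax=True)
trainer = MLPTrainer(model, learning_rate=0.05, batch_size=64, lossfn='cross-entropy')
trainer.train(x_train, y_train, x_val, y_val, epochs=20)

y_pred = model.forward(x_val)    # inference on the validation set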
