A perceptron can be used for binary classification when the two classes are linearly separable. A 2D visualization of a perceptron is depicted below:
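As a rough sketch of the idea (assuming NumPy and labels in {-1, +1}; the function names and hyperparameters here are illustrative, not from a specific library), the classic perceptron learning rule might look like this:

```python
import numpy as np

# Minimal perceptron sketch (illustrative assumptions: labels in {-1, +1},
# a fixed number of epochs, and a constant learning rate).
def perceptron_train(X, y, epochs=10, lr=1.0):
    w = np.zeros(X.shape[1])  # weight vector
    b = 0.0                   # bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # A point is misclassified when the signed score disagrees
            # with its label.
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi  # classic perceptron update
                b += lr * yi
    return w, b

def perceptron_predict(X, w, b):
    return np.sign(X @ w + b)
```

If the two classes are linearly separable, this update rule is guaranteed to converge to a separating hyperplane; otherwise it never stops oscillating, which is exactly the limitation the next section addresses.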
A single perceptron computes a linear function and can therefore only perform binary classification. By connecting multiple perceptrons in several layers and applying non-linear activation functions between them, we enlarge the hypothesis class so the network can learn more complex non-linear mappings. Such a network is called a feedforward neural network (or multilayer perceptron, MLP). The learning procedure of an MLP used to regress a function is depicted below:
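To make the forward and backward passes concrete, here is a minimal sketch of a one-hidden-layer MLP trained with plain gradient descent to regress a function (the target sin(x), the tanh activation, the layer sizes, and the learning rate are all illustrative assumptions):

```python
import numpy as np

# One-hidden-layer feedforward network regressing y = sin(x) with
# mean-squared-error loss and vanilla gradient descent.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(X)  # target function to regress

W1 = rng.normal(0, 0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.05

for step in range(2000):
    # Forward pass: linear -> tanh (non-linearity) -> linear.
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2
    # Gradient of the MSE loss with respect to the output.
    grad_y = 2 * (y_hat - y) / len(X)
    # Backward pass: chain rule through each layer.
    gW2 = h.T @ grad_y
    gb2 = grad_y.sum(axis=0)
    grad_h = grad_y @ W2.T * (1 - h**2)  # derivative of tanh
    gW1 = X.T @ grad_h
    gb1 = grad_h.sum(axis=0)
    # Vanilla gradient descent update.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

Without the tanh in between, the two linear layers would collapse into a single linear map, so the non-linear activation is what actually buys the richer hypothesis class.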
There are many techniques for improving vanilla gradient descent, and many variants of it, such as Adam and AdamW. One popular technique is momentum. A visual comparison between vanilla gradient descent and gradient descent with momentum is depicted below:
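The difference is easy to see on a small example. The sketch below compares the two updates on an ill-conditioned quadratic bowl (the matrix A, the learning rate, and the momentum coefficient beta are illustrative assumptions):

```python
import numpy as np

# Compare vanilla GD with momentum on f(w) = 0.5 * w @ A @ w, whose
# gradient is A @ w. The bowl is much steeper in one direction.
A = np.diag([1.0, 25.0])
grad = lambda w: A @ w

w_gd = np.array([2.0, 2.0])   # vanilla gradient descent iterate
w_mom = np.array([2.0, 2.0])  # momentum iterate
v = np.zeros(2)               # velocity accumulator
lr, beta = 0.03, 0.9

for step in range(100):
    # Vanilla GD: step directly along the negative gradient.
    w_gd -= lr * grad(w_gd)
    # Momentum: accumulate a decaying sum of past gradients, which damps
    # oscillation along the steep axis and accelerates progress along
    # the shallow one.
    v = beta * v + grad(w_mom)
    w_mom -= lr * v

print("vanilla GD:", w_gd)
print("momentum:  ", w_mom)
```

Running this, the momentum iterate reaches the minimum at the origin noticeably faster along the shallow direction, which is the behavior the visual comparison illustrates.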