Glossary
- Activation: The activation parameter in Dense() specifies the activation function applied to a layer's weighted inputs, which determines the output that the next layer receives as input (see the Keras sketch after this glossary).
- Association: The model discovers rules that describe large portions of the data, such as "people who buy X also tend to buy Y."
- Batch Size: The number of samples per gradient update; it specifies how many rows the model trains on at a time.
- Bias: The difference between the expected output and the average prediction of the model.
- Classification: A type of supervised learning. Given data labeled with classes, the model trains to classify new data. The outputs are discrete values.
- Class Weight: An optional dictionary mapping class indices (integers) to weights (floats), used for weighting the loss function during training. It can be useful for telling the model to “pay more attention” to certain classes by giving them a higher weight.
- Clustering: The model discovers inherent groupings in the data on its own (see the KMeans sketch after this glossary).
- Decision Trees: Flowchart-like tree structure that branches to illustrate outcomes for different decisions.
- Density: The number of layers in a model.
- Epochs: The number of complete passes through the training dataset that the neural network makes. The goal of training for multiple epochs is to minimize the loss of the model.
- Generalization: Refers to an algorithm’s ability to be effective across a range of inputs.
- Gradient Descent: A popular optimization algorithm that can be used for every type of neural network. While training a model, it repeatedly tweaks the parameters to drive the given loss function toward a local minimum. It can be thought of as climbing down to the bottom of a valley, where the valley floor represents the minimized loss (see the sketch after this glossary).
- Hidden Layers: The layers between the input and output layers, where weighted inputs are passed through activation functions.
- Hyperparameters: The settings of a model (such as the learning rate, batch size, and number of epochs) that can be tuned to provide optimal results.
- Input Layer: The first layer of a neural network where the inputs are given.
- K Nearest Neighbors: Calculates the Euclidean distance between a new data point and the pre-classified data, then assigns the new point the class of the majority of its k nearest neighbors (see the sketch after this glossary).
- Learning Rate: One of the parameters that can be passed to the optimizer; it specifies how quickly the model updates its weights when given new data.
- Logistic Regression: Uses the logistic (sigmoid) function to calculate probabilities in order to make predictions (see the sketch after this glossary).
- Loss Function: The measure of error that the model works to reduce while iterating over the data (i.e., across epochs). Examples: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE); see the worked example after this glossary.
- Margin: The perpendicular distance between a support vector classifier’s hyperplane and the nearest points from each of the two classes; the classifier chooses the hyperplane that makes this distance as large as possible.
- Naive Bayes: A classifier based on Bayes’ theorem that assumes each feature is independent of the others. It uses probability to predict a class.
- Neural Network: Artificial Neural Networks (ANNs) are a type of machine learning model loosely inspired by the biological structure of the brain. They are trained on data and then used to analyze new data.
- Neurons: The units that make up a neural network. Each neuron takes inputs and applies an activation function to produce an output for the next layer.
- Optimizer: The method used to minimize the loss function during the training process of a model. Examples: Adam, Adagrad, RMSProp.
- Output Layer: The last layer of the neural network, where the result is output through neurons.
- Overfitting: The model is trained too well on the data and “memorizes” it. Therefore, it captures the noise in the dataset, making it less accurate on new data. Low bias, high variance.
- Parameters: The data values given to the model to train on.
- Principal Component Analysis (PCA): A method that reduces the dimensionality of a dataset consisting of many attributes while retaining the variation in the dataset. It is used to limit a dataset to only a few features for training (see the sketch after this glossary).
- Regression: A type of supervised learning. The model is given data with the expected outputs and trains on it to predict outputs for new data. The outputs are continuous values.
- Reinforcement Learning: Aims at using observations from training to maximize the reward while minimizing the risk. The model learns from its experiences to explore a full range of possible states that it uses to make optimal decisions.
- Stochastic Gradient Descent: Gradient Descent with a batch size of 1.
- Supervised Learning: The model is trained with labeled data. This means that in addition to receiving features/attributes from the data, the model also receives the expected output. Then, when given new data, the model can output its prediction.
- Support Vectors: The data points from each class that lie nearest to the separating hyperplane.
- Support Vector Classifier: Finds a hyperplane that separates the classes with the maximum possible perpendicular distance between the nearest points from each class. This distance is called the margin, and the nearest points are called support vectors (see the sketch after this glossary).
- Underfitting: The model is too simple, makes too many assumptions about the data, and cannot find the underlying trend. High bias, low variance.
- Unsupervised Learning: The model is trained with unlabeled data. This means it is not told the class of the data or the expected values; it has to determine on its own what to output.
- Vanishing Gradient Problem: Gradient-based methods adjust a parameter’s value by “learning” how a change in that value affects the neural network’s output. If a change in the parameters produces only a vanishingly small change in the output (the gradient), the network cannot learn the parameters effectively.
- Variance: The amount by which the model’s predictions for a data point vary.
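
A minimal Keras sketch tying several of the entries above together: the activation parameter of Dense(), the optimizer with its learning rate, the loss function, and the epochs, batch_size, and class_weight arguments of fit(). The layer sizes, hyperparameter values, and the toy X_train/y_train data are assumptions made for illustration.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam

# Toy stand-in data: 100 samples, 4 attributes, 3 classes (made up for illustration).
X_train = np.random.rand(100, 4)
y_train = np.random.randint(0, 3, size=100)

model = Sequential([
    Input(shape=(4,)),               # input layer: one value per attribute
    Dense(16, activation="relu"),    # hidden layer; ReLU processes the weighted inputs
    Dense(3, activation="softmax"),  # output layer: one neuron per class
])

model.compile(
    optimizer=Adam(learning_rate=0.001),     # learning rate sets how quickly weights update
    loss="sparse_categorical_crossentropy",  # the loss function being minimized
    metrics=["accuracy"],
)

model.fit(
    X_train, y_train,
    epochs=10,                              # 10 full passes over the training data
    batch_size=32,                          # 32 samples per gradient update
    class_weight={0: 1.0, 1: 1.0, 2: 2.0},  # "pay more attention" to class 2
)
```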
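
A bare-bones gradient descent loop on a made-up one-parameter loss, f(w) = (w - 3)^2, whose gradient is 2(w - 3) and whose minimum sits at w = 3:

```python
# Minimize f(w) = (w - 3)**2 by repeatedly stepping against the gradient.
w = 0.0              # initial parameter value
learning_rate = 0.1  # size of each downhill step
for step in range(100):
    gradient = 2 * (w - 3)         # slope of the loss at the current w
    w -= learning_rate * gradient  # climb down toward the bottom of the valley
print(round(w, 4))  # approaches 3.0, where the loss is minimal
```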
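
A hypothetical clustering sketch; KMeans from scikit-learn is an assumed choice, since the glossary does not name a specific algorithm. No labels are given, and the model discovers the two groupings on its own:

```python
from sklearn.cluster import KMeans

# Two visually obvious groups of 2-D points, with no class labels provided.
X = [[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # e.g. [0 0 0 1 1 1]: the groupings it discovered
```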
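
A hypothetical K Nearest Neighbors sketch using scikit-learn; the toy points and the choice of k = 3 are assumptions:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]  # pre-classified points
y = [0, 0, 0, 1, 1, 1]                                # their class labels
knn = KNeighborsClassifier(n_neighbors=3)  # k = 3; the metric is Euclidean by default
knn.fit(X, y)
print(knn.predict([[0.5, 0.5], [5.5, 5.5]]))  # -> [0 1]: the majority class of each point's neighbors
```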
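
The logistic (sigmoid) function from the Logistic Regression entry, which squashes any real-valued score into a probability between 0 and 1 (the example inputs are made up):

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5: right on the decision boundary
print(sigmoid(4.0))   # ~0.982: a confident positive prediction
print(sigmoid(-4.0))  # ~0.018: a confident negative prediction
```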
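
A worked example of the three loss functions named above, computed with NumPy on made-up predictions:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])  # expected outputs
y_pred = np.array([2.5, 5.0, 3.0])  # model predictions

mse = np.mean((y_true - y_pred) ** 2)   # Mean Squared Error:      ~0.4167
rmse = np.sqrt(mse)                     # Root Mean Squared Error: ~0.6455
mae = np.mean(np.abs(y_true - y_pred))  # Mean Absolute Error:      0.5
print(mse, rmse, mae)
```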
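
A hypothetical PCA sketch using scikit-learn, reducing four attributes to two principal components (the random data and the component count are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 4)        # 100 samples, each with 4 attributes
pca = PCA(n_components=2)         # keep the 2 directions of highest variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (100, 2): only two features remain for training
print(pca.explained_variance_ratio_)  # fraction of the variation each component retains
```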
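
A hypothetical support vector classifier sketch using scikit-learn's SVC with a linear kernel; after fitting, the model exposes the support vectors, the nearest points from each class that define the margin:

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [4, 4], [5, 5]]  # toy points from two classes
y = [0, 0, 1, 1]
clf = SVC(kernel="linear")  # finds the maximum-margin separating hyperplane
clf.fit(X, y)
print(clf.support_vectors_)  # -> [[1. 1.] [4. 4.]]: the support vectors
```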