If features have very different ranges of values (e.g. the size of a house ranging from 1000 ft² to 5000 ft² and the number of bedrooms from 1 to 4), gradient descent may take too long to converge to the global minimum. Plotting the contours of the cost function over those features gives a visual explanation: they would be very narrow, elongated ellipses.
To normalize a given feature, a good rule of thumb is to subtract the mean from each value and then divide by the range (max - min). The range can also be replaced by the standard deviation.
xi = (xi - μi) / (xmax - xmin)
# or
xi = (xi - μi) / Si
# where:
# xi = given feature value
# µi = mean value of feature i
# xmax = max value of feature i
# xmin = min value of feature i
# Si = standard deviation of feature i
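A minimal Python/NumPy sketch of this scaling (the function name and sample values are illustrative, not from the course):

import numpy as np

def normalize_features(X):
    # Mean-normalize each column of X: (x - mean) / std.
    # Returns mu and sigma as well, so the same transformation
    # can be applied to new examples later.
    mu = X.mean(axis=0)       # per-feature mean
    sigma = X.std(axis=0)     # per-feature standard deviation (or use max - min instead)
    return (X - mu) / sigma, mu, sigma

# Example: house size (ft²) and number of bedrooms
X = np.array([[1000, 1], [2000, 2], [3500, 3], [5000, 4]], dtype=float)
X_norm, mu, sigma = normalize_features(X)
print(X_norm)  # both columns now have mean 0 and comparable ranges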
The chosen learning rate (α) should be neither too large nor too small. If it is too large, the cost function may never converge to a minimum, or it may even diverge and increase after each iteration. If it is too small, the cost function may take too long to converge.
A good rule of thumb for choosing α is to try values like these:
..., 0.001, 0.01, 0.1, 1, ...
# or better yet
..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
# roughly increasing them threefold
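A hedged sketch of how these values could be compared in Python; gradient_descent below is a hand-rolled helper (not a library function), and the data is made up:

import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    # Plain batch gradient descent for linear regression.
    # X is assumed to already include the bias column of ones.
    m, n = X.shape
    theta = np.zeros(n)
    J_history = []
    for _ in range(num_iters):
        error = X @ theta - y                  # h(x) - y for every example
        J_history.append((error @ error) / (2 * m))
        theta -= (alpha / m) * (X.T @ error)   # simultaneous update of all thetas
    return theta, J_history

# Scaled size feature plus bias column; prices are made-up numbers
X = np.c_[np.ones(4), np.array([1000, 2000, 3500, 5000]) / 5000.0]
y = np.array([200.0, 320.0, 480.0, 650.0])
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1]:
    _, J_history = gradient_descent(X, y, alpha, num_iters=100)
    print(f"alpha={alpha}: final cost {J_history[-1]:.2f}")  # J should shrink, not grow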
To better fit linear regression to your data, you can create new features from your current set of features. For example, if the features are the length and width of a plot of land, a new feature called area could be created by multiplying them: area = length * width
Sometimes a linear model might not be the best fit for the data; a quadratic, cubic, or square-root model might fit better. To accomplish this, new features can be created to represent these terms. For instance, if we are using the size of a house as one of our features, the hypothesis hΘ(x) could be represented as follows:
hΘ(x) = Θ0 + Θ1(size) + Θ2(size)² + Θ3(size)³
# or, depending on our model, in a different way:
hΘ(x) = Θ0 + Θ1(size) + Θ2(√size)
Simply applying these operations to create "new features" makes it possible to keep using linear regression. Keep in mind that feature scaling becomes even more important here, since size² and size³ take on very different ranges of values.
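A minimal sketch of building these polynomial features in NumPy (sizes are illustrative):

import numpy as np

size = np.array([1000.0, 2000.0, 3500.0, 5000.0])   # house sizes in ft²

# Cubic model design matrix: [1, size, size², size³]
X_cubic = np.column_stack([np.ones_like(size), size, size**2, size**3])

# Square-root model design matrix: [1, size, √size]
X_sqrt = np.column_stack([np.ones_like(size), size, np.sqrt(size)])

# Feature scaling matters even more here: size, size² and size³ live on
# the order of 10³, 10⁶ and 10⁹ respectively, so normalize before gradient descent.
cols = X_cubic[:, 1:]
X_cubic_scaled = np.c_[np.ones(len(size)), (cols - cols.mean(axis=0)) / cols.std(axis=0)]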
The normal equation is a way to solve for Θ analytically. It can be mathematically proven that the values of Θ that minimize J(Θ) can be computed as follows:
Θ = inverse(X' * X) * X' * y
# where:
# X = design matrix where each row holds one training example's feature values (with x0 = 1)
# X' = transpose of X
# inverse(X) = X^-1 or the inverse of matrix X
# y = vector where each row corresponds to the value to be predicted
When using the normal equation, there is no need for feature scaling, so the features can be in any range of values.
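A minimal NumPy sketch of the normal equation; X is assumed to already contain the bias column of ones, and the numbers are made up:

import numpy as np

def normal_equation(X, y):
    # theta = (X' * X)^-1 * X' * y
    # np.linalg.pinv (pseudo-inverse) is used instead of a plain inverse,
    # so this still works even if X' * X happens to be singular.
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Unscaled features are fine here -- no feature scaling required
X = np.c_[np.ones(4), [1000.0, 2000.0, 3500.0, 5000.0], [1.0, 2.0, 3.0, 4.0]]
y = np.array([200.0, 320.0, 480.0, 650.0])   # made-up prices
theta = normal_equation(X, y)
print(theta)   # intercept followed by one coefficient per feature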
For m training examples and n features:
- Gradient Descent
  - Needs to choose α
  - Needs many iterations
  - Works well even when n is large
- Normal Equation
  - No need to choose α
  - No need for many iterations
  - Needs to compute inverse(X' * X), an n×n matrix; slow if n is very large (the complexity of inverting a matrix is roughly O(n³))
  - By current computer standards, if n > 10000, gradient descent starts to become the faster choice
  - Does not work for more sophisticated algorithms (e.g. classification algorithms)
X' * X is non-invertible (singular/degenerate) if:
- Redundant features (linearly dependent), e.g.:
  - x1 = size in feet²
  - x2 = size in m²
  - so x1 = (3.28)² * x2
- Too many features (e.g. m <= n)
  - Fix: delete some features or use regularization
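In NumPy, one way to sidestep a singular X' * X in practice (a sketch based on the redundant-feature example above, with made-up numbers) is the pseudo-inverse, which still returns a usable least-squares Θ:

import numpy as np

x2 = np.array([100.0, 185.0, 325.0, 465.0])   # size in m²
x1 = (3.28 ** 2) * x2                         # same size in ft² -- linearly dependent on x2
X = np.c_[np.ones(4), x1, x2]
y = np.array([150.0, 260.0, 450.0, 640.0])    # made-up prices

# X' * X is singular here, so a plain inverse would fail or be numerically useless;
# np.linalg.pinv still produces a valid least-squares solution for theta.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)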