Linear Regression Gradient Descent
Feature Scaling
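- Always normalize or standardize the features before running gradient descent, so they share a comparable scale and a single learning rate works for every dimension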
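
A minimal sketch of standardization (zero mean, unit variance per feature), assuming NumPy. The helper name `standardize`, the `eps` guard, and the sample values are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Rescale each column of X to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against constant columns, whose standard deviation is 0.
    std = np.where(std < eps, 1.0, std)
    return (X - mean) / std, mean, std

# Two features on very different scales (made-up values).
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])
X_scaled, mean, std = standardize(X)
```

Keep `mean` and `std` from the training data and reuse them to scale any new inputs, otherwise the learned weights no longer match the inputs.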
 
How to pick the learning rate?
- Line search methods
 - Conjugate gradient
  - used for quadratic objectives (such as the least-squares loss of linear regression)
 - Newton search direction
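
For the least-squares loss J(w) = 1/(2n) ||Xw - y||^2 the objective is quadratic, so the best step along the negative gradient has a closed form: alpha = (g·g) / (g·Hg) with H = XᵀX / n. The sketch below assumes NumPy; the function names, the synthetic data, and the tolerance are illustrative assumptions. It shows gradient descent with this exact line search, plus a single Newton step for comparison.

```python
import numpy as np

def gd_exact_line_search(X, y, n_iters=200, tol=1e-16):
    """Gradient descent on least squares, step size chosen by exact line search."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n
        gg = grad @ grad
        if gg < tol:              # gradient is numerically zero
            break
        Xg = X @ grad
        curvature = Xg @ Xg / n   # g^T H g with H = X^T X / n
        if curvature == 0.0:
            break
        alpha = gg / curvature    # minimizes J(w - alpha * grad) over alpha
        w = w - alpha * grad
    return w

def newton_step(X, y, w):
    """One Newton update: w - H^{-1} grad (assumes X^T X is invertible)."""
    n = X.shape[0]
    grad = X.T @ (X @ w - y) / n
    H = X.T @ X / n
    return w - np.linalg.solve(H, grad)

# Synthetic, noise-free data (assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
w_true = np.array([2.0, -3.0])
y = X @ w_true

w_ls = gd_exact_line_search(X, y)
w_newton = newton_step(X, y, np.zeros(2))
```

On a quadratic objective a single Newton step lands on the minimizer, at the cost of solving a d×d linear system. The line-search version only needs matrix-vector products but takes several iterations.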
 
Large datasets
In large datasets, computing the gradient gets expensive, since every step uses the whole dataset X.
- Online Learning
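 - Stochastic Gradient Descent (SGD): update on one random example at a time. In practice it is usually run on small random batches (minibatch gradient descent), and the two names are often used interchangeably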
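
A rough sketch of minibatch gradient descent for linear regression, assuming NumPy. The function name `minibatch_sgd`, the default hyperparameters, and the synthetic data are assumptions chosen for the demo. Each update uses only `batch_size` rows of X instead of the whole dataset.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.05, batch_size=8, n_epochs=200, seed=0):
    """Minimize the mean squared error with minibatch gradient steps."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        order = rng.permutation(n)          # reshuffle the rows every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient estimated from the current batch only.
            grad = Xb.T @ (Xb @ w - yb) / len(idx)
            w = w - lr * grad
    return w

# Synthetic, noise-free data (assumption for the demo).
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))
w_true = np.array([1.5, -0.5])
y = X @ w_true

w_sgd = minibatch_sgd(X, y)
```

Setting `batch_size=1` gives plain SGD (one example per update), and `batch_size=len(X)` recovers full-batch gradient descent.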