• Syllabus
  • Lecture 1 Notes + Video

  • Week 1 (Overview + K-nearest Neighbors):
    • Tasks:
      • Reading
      • Videos
    • Topics:
      • Feature vectors, Labels
      • 0/1 loss, squared loss, absolute loss
      • Train / Test split
      • Hypothesis classes
      • Nearest Neighbor Classifier (a minimal sketch follows this list)
      • Sketch of the Cover and Hart proof (that the 1-NN error converges to at most 2× the Bayes error as the number of training samples goes to infinity)
      • Curse of Dimensionality
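A minimal nearest-neighbor sketch for concreteness (NumPy only; the function name, toy data, and default k are illustrative, not taken from the course materials):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=1):
    """Predict a label for each row of X_test by majority vote
    among its k nearest training points (Euclidean distance)."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        preds.append(np.bincount(y_train[nearest]).argmax())  # majority label
    return np.array(preds)

# Toy usage: two well-separated clusters in 2-D.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.05, 0.1], [1.0, 0.9]])))  # -> [0 1]
```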
  • Week 2 (Perceptron + Estimating Probabilities from data):
    • Tasks:
      • Reading
      • Videos
    • Topics:
      • Linear Classifiers
      • Absorbing the bias into a (d+1)-dimensional weight vector
      • Perceptron convergence proof (an update-rule sketch follows this list)
      • MLE
      • MAP
      • Bayesian vs. Frequentist statistics
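A minimal perceptron sketch, absorbing the bias via the (d+1)-dimensional trick from the list above (NumPy only; names and toy data are illustrative):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Train a perceptron; y must be in {-1, +1}.
    Returns a (d+1)-dimensional weight vector with the bias absorbed."""
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # append a constant-1 feature
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:  # misclassified (or on the boundary)
                w += yi * xi        # perceptron update
                mistakes += 1
        if mistakes == 0:           # a full pass with no mistakes: done
            break
    return w

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(np.sign(np.hstack([X, np.ones((4, 1))]) @ w))  # -> [ 1.  1. -1. -1.]
```

The convergence proof covered this week bounds the number of such updates for linearly separable data in terms of the margin.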
  • Week 3 (Naive Bayes + Logistic Regression):
    • Tasks:
      • Reading
      • Videos
    • Topics:
      • Naive Bayes Assumption (a toy estimator sketch follows this list)
      • Why is estimating probabilities difficult in high dimensions?
      • Logistic Regression formulation
      • Relationship of Logistic Regression (LR) to Naive Bayes
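A toy Naive Bayes sketch for binary features (NumPy only; names and data are illustrative). The +1 Laplace smoothing used here can be read as a MAP estimate, tying back to Week 2:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """X: binary feature matrix (n, d); y: labels in {0, 1}.
    Returns class priors P(y=c) and smoothed estimates P(x_j = 1 | y = c)."""
    priors, cond = [], []
    for c in (0, 1):
        Xc = X[y == c]
        priors.append(len(Xc) / len(X))
        cond.append((Xc.sum(axis=0) + 1) / (len(Xc) + 2))  # Laplace smoothing
    return np.array(priors), np.array(cond)

def nb_predict(X, priors, cond):
    # Naive Bayes assumption: features are independent given the class,
    # so the log joint factorizes into a sum over per-feature terms.
    log_post = np.log(priors) + X @ np.log(cond).T + (1 - X) @ np.log(1 - cond).T
    return log_post.argmax(axis=1)

X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0]])
y = np.array([1, 1, 0, 0])
priors, cond = fit_naive_bayes(X, y)
print(nb_predict(X, priors, cond))  # -> [1 1 0 0]
```

Without the conditional-independence assumption one would have to estimate a probability for each of the 2^d feature combinations, which is exactly why estimation is hard in high dimensions.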
  • Week 4 (Gradient Descent + Linear Regression):
    • Tasks:
      • Reading
      • Videos
    • Topics:
      • Gradient Descent (GD), illustrated in the sketch after this list
      • Taylor expansion
      • Proof that GD decreases the loss at every step if the step size is small enough
      • Tricks for setting the step size
      • Newton’s Method
      • Assumption of Linear Regression with Gaussian Noise
      • Ordinary Least Squares (OLS) = MLE
      • Ridge Regression = MAP
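A sketch of gradient descent on the squared loss, checked against the OLS and ridge closed forms (NumPy only; the data, step size, and regularization constant are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)  # linear model with Gaussian noise

# Gradient descent on L(w) = ||Xw - y||^2 / n.
w = np.zeros(3)
step = 0.1  # small enough that each step decreases L (cf. the GD proof)
for _ in range(500):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    w -= step * grad

# OLS = MLE under Gaussian noise; ridge = MAP with a Gaussian prior on w.
lam = 1.0
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(w, w_ols, w_ridge)  # GD matches OLS; ridge shrinks the weights toward 0
```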
  • Week 5 (Linear SVM + Empirical Risk Minimization):
    • Tasks:
      • Reading
      • Videos
    • Topics:
      • What is the margin of a hyperplane classifier?
      • How to derive a max margin classifier
      • Why the SVM optimization problem is convex
      • The final quadratic program (QP) of the SVM
      • Slack variables
      • The unconstrained SVM formulation (a subgradient sketch follows this list)
      • Setup of loss function and regularizer
      • Classification loss functions: hinge loss, log loss, zero-one loss, exponential loss
      • Regression loss functions: absolute loss, squared loss, Huber loss, log-cosh loss
      • Properties of the various loss functions
      • Which loss functions are more susceptible to noise and which are less
      • Special cases: OLS, Ridge regression, Lasso, Logistic Regression
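A sketch that evaluates the classification surrogate losses at a few margin values and takes one subgradient step on the unconstrained SVM objective (NumPy only; all names and constants are illustrative):

```python
import numpy as np

# Each loss is a function of the margin z = y * (w^T x).
def zero_one(z): return (z <= 0).astype(float)
def hinge(z):    return np.maximum(0.0, 1.0 - z)
def log_loss(z): return np.log(1.0 + np.exp(-z))
def exp_loss(z): return np.exp(-z)

z = np.array([-2.0, 0.0, 0.5, 2.0])
for name, f in [("0/1", zero_one), ("hinge", hinge), ("log", log_loss), ("exp", exp_loss)]:
    print(name, f(z))  # exponential loss explodes on badly misclassified (noisy) points

def svm_subgradient(w, X, y, C):
    """Subgradient of the unconstrained SVM objective
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * w^T x_i)."""
    margins = y * (X @ w)
    active = margins < 1  # only points inside the margin contribute
    return w - C * (y[active, None] * X[active]).sum(axis=0)

X = np.array([[1.0, 2.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
w = np.zeros(2)
w -= 0.1 * svm_subgradient(w, X, y, C=1.0)  # one subgradient-descent step
print(w)  # -> [0.2 0.3]
```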
  • Week 6 (ML Debugging, Overfitting/Underfitting + Bias/Variance Tradeoff):
    • Tasks:
      • Reading
      • Videos
  • Week 7 (Kernels for Reducing Bias + Gaussian Processes / Bayesian Global Optimization):
    • Tasks:
      • Reading
      • Videos
  • Week 8 (K-nearest Neighbors Data Structures + Decision/Regression Trees):
    • Tasks:
      • Reading
      • Videos
  • Week 9 (Bagging + Boosting):
    • Tasks:
      • Reading
      • Videos
  • Week 10 (Artificial Neural Networks / Deep Learning):
    • Tasks:
      • Reading
      • Videos