[TIL] 20-Feb-2020
R control structures, lexical scope, dates, Anomaly Detection, Multivariate Gaussian.
Basics of R just like any other programming language , has while , for, repeat loops for controlling flow. Its like python in determining scope in that it also uses lexical scope, where the function closure includes the environment at time of function definition as compared to dynamic scope which considers environment at call / execution time.
Dates count days/seconds from 1970-01-01 and time is in POSIXlt and POSIXct where POSIXlt has named attributes and POSIXct is just a large integer.
Anomaly Detection is an algorithm to help find anomalies in our dataset. It works by first assuming that all of our features are following a Gaussian / Normal distribution. Then we take a threshold probability \(\epsilon\) and we classify examples that have a lower probability of occurrence than our threshold as Anomalous. The actual probability of a certain sample with a certain combination of features is simply calculated as the product of the gaussian distributions of all of our features.
Anomaly detection vs Supervised Learning:
- They both seem to be doing the same thing. We have labelled binary classes and we want to train a binary classifier, However Anomaly detection can be a better choice in situations where we have very few anomalous examples. Like we have a dataset with 100,000 normal examples and 20 anomalies. Then a supervised learning algorithm will be very poor at identifying the anomalies.
- If we have a situation where anomalies dont follow any pattern and can be wildly different like in manufacturing a bad engine can be represented as generating too much heat or too much noise or being inefficient or consuming too much petrol etc. There can be wide variations in our anomaly criteria which will be hard for a supervised learning algorithm to learn with such little examples.
- In short: If we have a large and balanced number of positive and negative examples then supervised learning is better. If the negative examples follow a similar pattern then supervised learning is better. On the other hand if we have a very few negative examples or our anomalies dont really follow a well defined pattern then Anomaly detection would be preferred.
Anomaly detection can be improved by coming up with new features after carefully studying what could be causing certain anomalies.
Multivariate Gaussian Distribution. Is an improvement to anomaly detection which also allows us to take into account correlations in the features. Instead of separately modelling our gaussians and taking the product, now we take a single multivariate gaussian distribution.
Comparing the Multivariate Gaussian to the original model we see that the original model is just a special case of the multivariate gaussian where all the off diagonal elements of the covariance matrix \(\sigma\) are 0. We have a few cases where the original model is better:
- The original model is computationally much cheaper, while the multivariate gaussian needs to compute the inverse of the covariance matrix.
- The original model is unable to capture correlations and we need to spend time on feature engineering and thinking of ways to come up with novel features that may capture anomalies, the multivariate gaussian does this automatically.
- There are conditions where the multivariate gaussian cant work due to non-invertible covariance matrix which come up in 2 scenarios: m < n where we have more features than examples in our training set.Also Duplicate / redundant features where our features may be copied or be multiples of each other or linearly dependent.