Areas to look at
- One-hot encoding for categorical variables
- Random forest for tackling overfitting
- scikit-learn
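A minimal sketch tying the three items above together, assuming scikit-learn and NumPy are installed. The color/label data is made up for illustration, and the names `colors`, `labels`, `encoder`, and `forest` are placeholders, not from the conversation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Toy data (invented for illustration): one categorical column, one binary label.
colors = np.array([["red"], ["green"], ["blue"], ["green"], ["red"], ["blue"]])
labels = np.array([1, 0, 0, 0, 1, 0])

# One-hot encoding: each category becomes its own 0/1 column,
# so the model does not read an ordering into the category names.
encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(colors).toarray()

# Random forest: averages many decision trees, each trained on a bootstrap
# sample, which tends to overfit less than a single deep tree.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, labels)

# New inputs must go through the same fitted encoder before prediction.
prediction = forest.predict(encoder.transform([["red"]]).toarray())
```

The encoder is fitted once and reused at prediction time, so the columns line up between training and new data.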
Resources/concepts to reference
- YouTube video reference: 3Blue1Brown - Study linear algebra - L2 regularization - for Bayesian classification
- Downsampling
to ensure that one category within the dataset does not crowd out the other categories
- Example: if 85% of the examples are negative, we need to reduce that share, for instance by dropping some of the negative examples
- Upsampling
to ensure that a category within the dataset is not under-represented
- Example: if only 5% of the examples are positive, we need to raise their share above 5% by coming up with more variants of the positive examples
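A rough sketch of both ideas using only the Python standard library. The helper names `downsample` and `upsample` and the 85/15 toy split are assumptions for illustration. The upsampling here simply repeats existing minority examples at random, which is a simpler stand-in for generating genuinely new variants as described above. Both helpers assume the majority list is the larger one.

```python
import random


def downsample(majority, minority, rng):
    """Randomly keep only as many majority examples as there are minority ones."""
    return rng.sample(majority, len(minority)), list(minority)


def upsample(majority, minority, rng):
    """Randomly repeat minority examples (with replacement) to match the majority."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority), list(minority) + extra


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy imbalanced dataset (invented): 85 negative and 15 positive examples.
negatives = [("neg", i) for i in range(85)]
positives = [("pos", i) for i in range(15)]

down_neg, down_pos = downsample(negatives, positives, rng)  # 15 vs 15
up_neg, up_pos = upsample(negatives, positives, rng)        # 85 vs 85
```

Downsampling throws data away, while upsampling by repetition adds no new information, so which one fits depends on how much data there is to spare.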
Recent trends
- Technical trend: One-shot learning
how to determine features and their corresponding coefficients from a limited dataset; the assumption is that small data is big data in disguise
- Counterargument against this trend: humans are born with a rudimentary mental model, which means they are born knowing what these features are
- Political and social trend: DAT ASS - Data and Artificial Intelligence as a Service
Top conferences for machine learning
- International Conference on Machine Learning - ICML
- Neural Information Processing Systems - NIPS (renamed NeurIPS in 2018)
source: From my conversations with Sujit