Areas to look at
- One-hot encoding for categorical variables
- Random forest for tackling overfitting
- scikit-learn
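As a rough illustration of the one-hot encoding item above, here is a minimal sketch in plain Python. The function name `one_hot_encode` and the sample `colors` column are hypothetical, chosen only for illustration; in practice a library such as scikit-learn would normally do this step.

```python
def one_hot_encode(values):
    """Turn each categorical value into a 0/1 vector containing a single 1."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        encoded.append(row)
    return categories, encoded


# Hypothetical categorical column, not taken from any real dataset.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)
for row in encoded:
    print(row)
```

Each category gets its own column, so a model does not read a false ordering into the values, as it would with integer codes.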
Resources/concepts to reference
- YouTube video reference: 3Blue1Brown
- Study linear algebra
- L2 regularization – for Bayesian classification
- Downsampling
- ensures that one category in the dataset does not crowd out the others
- Example: if 85% of the examples are negative, we need to reduce that share
- Upsampling
- ensures that a category in the dataset is not under-represented
- Example: if only 5% of the examples are positive, we need to add more positive examples, for instance by creating variants of the existing ones, to raise their share above 5%
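The downsampling and upsampling ideas above can be sketched with the standard library alone. This is an illustrative sketch, not a recommended pipeline: the helper names, the `target_fraction` parameter, and the toy datasets are all assumptions. The upsampler simply duplicates existing minority rows, which is a simpler stand-in for generating new variants.

```python
import random


def downsample(rows, label_of, majority_label, target_fraction, seed=0):
    """Randomly drop majority rows until they make up target_fraction of the data."""
    rng = random.Random(seed)
    majority = [r for r in rows if label_of(r) == majority_label]
    others = [r for r in rows if label_of(r) != majority_label]
    # Solve keep / (keep + len(others)) = target_fraction for keep.
    keep = round(target_fraction * len(others) / (1 - target_fraction))
    keep = min(keep, len(majority))
    return rng.sample(majority, keep) + others


def upsample(rows, label_of, minority_label, target_fraction, seed=0):
    """Duplicate random minority rows until they make up target_fraction of the data."""
    rng = random.Random(seed)
    minority = [r for r in rows if label_of(r) == minority_label]
    if not minority:
        raise ValueError("no rows with the minority label to duplicate")
    others = [r for r in rows if label_of(r) != minority_label]
    # Solve want / (want + len(others)) = target_fraction for want.
    want = round(target_fraction * len(others) / (1 - target_fraction))
    extra = max(want - len(minority), 0)
    return rows + rng.choices(minority, k=extra)


def label_of(row):
    """Rows are hypothetical (feature, label) pairs; the label is the second item."""
    return row[1]


# Toy data: 85% negative (label 0), then a set with only 5% positive (label 1).
skewed = [(i, 0) for i in range(85)] + [(i, 1) for i in range(15)]
balanced = downsample(skewed, label_of, majority_label=0, target_fraction=0.5)

rare = [(i, 0) for i in range(95)] + [(i, 1) for i in range(5)]
boosted = upsample(rare, label_of, minority_label=1, target_fraction=0.2)

print(len(balanced), len(boosted))
```

Resampling is usually applied to the training split only, so that evaluation still reflects the real class balance.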
Recent trends
- Technical trend: One-shot learning
- how to determine features and their corresponding coefficients from a limited dataset – the assumption is that small data is big data in disguise
- Counterargument to this trend – humans are born with a rudimentary mental model, which means they are born already knowing what these features are
- Political and social trend: DAT ASS – Data and Artificial Intelligence as a Service
Top conferences for machine learning
- International Conference on Machine Learning – ICML
- Conference on Neural Information Processing Systems – NIPS (abbreviated NeurIPS since 2018)
Source: my conversations with Sujit