I had been putting off this last post until the last of my Pull Requests got merged. Now that it has been merged, I no longer have any reason to procrastinate. Here is the work I did over the summer, with a short description of each item:
(In case you are wondering about the “another” in the title: https://manojbits.wordpress.com/2013/09/27/the-end-of-a-journey/ )
1. Improved memory management in the coordinate descent code.
Pull Request: https://github.com/scikit-learn/scikit-learn/pull/3102
Switched the parallel backend from multiprocessing to threading by releasing the GIL, and replaced the function calls with pure CBLAS calls. This reduced memory usage by roughly 3x–4x without compromising much on speed.
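The actual change lives in scikit-learn's Cython code (releasing the GIL and calling CBLAS routines directly), so what follows is only a minimal NumPy sketch of the memory argument, not that implementation. The names `lasso_cd` and `lasso_path_threaded` are hypothetical, and the objective is assumed to be the Lasso scaling `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1`. Worker threads all read the same `X` and `y`, whereas a process-based backend would typically have to pickle a copy of the data into each worker. Because this pure-Python loop holds the GIL most of the time, it illustrates the memory sharing, not the speedup.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def lasso_cd(X, y, alpha, n_iter=100):
    """Hypothetical helper: cyclic coordinate descent for
    (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    residual = np.array(y, dtype=float)   # private copy: R = y - Xw with w = 0
    col_norms = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(n_features):
            if col_norms[j] == 0.0:
                continue
            w_old = w[j]
            # dot product (a ddot in BLAS terms) against the partial residual
            rho = X[:, j] @ residual + col_norms[j] * w_old
            # soft-thresholding update for coordinate j
            w[j] = np.sign(rho) * max(abs(rho) - n_samples * alpha, 0.0) / col_norms[j]
            # axpy-style update of the shared-size residual vector
            residual -= (w[j] - w_old) * X[:, j]
    return w


def lasso_path_threaded(X, y, alphas, n_jobs=2):
    # Every thread reads the same X and y objects; nothing is pickled or copied
    # per worker. Each call only allocates its own w and residual.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(lambda alpha: lasso_cd(X, y, alpha), alphas))
```

Per alpha, the extra memory is one coefficient vector and one residual vector, independent of how many workers are running.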
2. Randomised coordinate descent
Updating a randomly chosen feature at each step (sampling with replacement), instead of sweeping over all features in order, can make coordinate descent converge faster.
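A minimal NumPy sketch of the two update orders, assuming the Lasso objective `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1`. The `selection` argument mirrors the option of that name on scikit-learn's `Lasso` and `ElasticNet`, but `lasso_cd` here is a hypothetical illustration, not that code. With `selection="random"` each update draws a coordinate uniformly at random with replacement; with `"cyclic"` it sweeps the features in order. Both target the same minimiser, so once converged they should agree.

```python
import numpy as np


def lasso_cd(X, y, alpha, selection="cyclic", n_iter=200, seed=0):
    """Hypothetical helper: coordinate descent for
    (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n_samples, n_features = X.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)
    residual = np.array(y, dtype=float)   # R = y - Xw with w = 0
    col_norms = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for k in range(n_features):
            # cyclic: 0, 1, ..., n_features - 1; random: uniform draw with replacement
            j = rng.integers(n_features) if selection == "random" else k
            if col_norms[j] == 0.0:
                continue
            w_old = w[j]
            rho = X[:, j] @ residual + col_norms[j] * w_old
            w[j] = np.sign(rho) * max(abs(rho) - n_samples * alpha, 0.0) / col_norms[j]
            residual -= (w[j] - w_old) * X[:, j]
    return w
```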
3. Logistic Regression CV
Pull Request: https://github.com/scikit-learn/scikit-learn/pull/2862
Fitting a cross-validation path across a grid of Cs, with new solvers based on newton_cg and lbfgs. For high-dimensional data, warm starting each fit from the previous solution on the path makes these solvers converge faster.
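To make the warm-start idea concrete, here is a minimal NumPy sketch, not the `LogisticRegressionCV` code. The helpers `fit_logreg_newton`, `logreg_path` and `logreg_cv` are hypothetical, there is no intercept, the objective is assumed to be `0.5 * ||w||^2 + C * sum_i log(1 + exp(-s_i * x_i . w))`, and each Newton system is solved directly with `np.linalg.solve` rather than with conjugate gradient as a newton_cg solver would. The path walks the Cs from strongest to weakest regularisation and starts each fit from the previous solution; cross-validation fits one such path per fold and keeps the C with the best mean accuracy.

```python
import numpy as np


def fit_logreg_newton(X, y, C, w0=None, tol=1e-8, max_iter=50):
    """Hypothetical helper: L2-regularised logistic regression (labels 0/1, no intercept)
    via plain Newton steps. Returns (w, number_of_newton_steps)."""
    n_features = X.shape[1]
    w = np.zeros(n_features) if w0 is None else np.array(w0, dtype=float)
    s = np.where(y == 1, 1.0, -1.0)
    for step in range(max_iter):
        z = s * (X @ w)
        p = 1.0 / (1.0 + np.exp(z))             # sigmoid(-z_i)
        grad = w - C * (X.T @ (s * p))
        if np.linalg.norm(grad) < tol:
            return w, step
        d = p * (1.0 - p)                        # sigmoid(z_i) * sigmoid(-z_i)
        hess = np.eye(n_features) + C * (X.T * d) @ X
        w = w - np.linalg.solve(hess, grad)      # exact Newton step (newton_cg would use CG)
    return w, max_iter


def logreg_path(X, y, Cs, warm_start=True):
    """Fit one model per C (assumed sorted ascending), optionally warm-starting each fit."""
    coefs, n_steps, w = [], [], None
    for C in Cs:
        w, steps = fit_logreg_newton(X, y, C, w0=w if warm_start else None)
        coefs.append(w)
        n_steps.append(steps)
    return np.array(coefs), n_steps


def logreg_cv(X, y, Cs, n_folds=3, seed=0):
    """Score a warm-started path on each fold, then refit on all data with the best C."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = np.zeros((n_folds, len(Cs)))
    for k, test_idx in enumerate(folds):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        coefs, _ = logreg_path(X[train], y[train], Cs)
        preds = (X[test_idx] @ coefs.T > 0).astype(int)   # shape (n_test, n_Cs)
        scores[k] = (preds == y[test_idx][:, None]).mean(axis=0)
    best_C = Cs[int(np.argmax(scores.mean(axis=0)))]
    w, _ = fit_logreg_newton(X, y, best_C)
    return best_C, w, scores
```

Comparing the `n_steps` returned by `logreg_path(..., warm_start=True)` and `warm_start=False` on your own data is a quick way to see how much the warm start saves; the coefficients themselves should match.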
4. Multinomial Logistic Regression
Pull Request: https://github.com/scikit-learn/scikit-learn/pull/3490
Minimising the multinomial cross-entropy loss over all classes jointly, instead of fitting one-vs-all (OvA) binary classifiers. This results in better probability estimates for the predicted classes.
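A minimal NumPy sketch of the multinomial objective, assuming plain gradient descent (not the lbfgs or newton_cg solvers) and hypothetical helper names. Instead of one binary problem per class, a single weight matrix `W` of shape `(n_features, n_classes)` is fit by minimising the softmax cross-entropy, so each sample's probabilities come from one joint model and sum to one by construction.

```python
import numpy as np


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def multinomial_loss_grad(W, X, Y, alpha):
    """Hypothetical helper: mean softmax cross-entropy plus 0.5 * alpha * ||W||^2.
    X: (n_samples, n_features), Y: one-hot (n_samples, n_classes), W: (n_features, n_classes)."""
    n_samples = X.shape[0]
    P = softmax(X @ W)
    loss = -np.sum(Y * np.log(P + 1e-12)) / n_samples + 0.5 * alpha * np.sum(W ** 2)
    grad = X.T @ (P - Y) / n_samples + alpha * W
    return loss, grad


def fit_multinomial(X, y, n_classes, alpha=1e-3, lr=0.1, n_iter=2000):
    """Hypothetical helper: plain gradient descent on the joint objective (one W for all classes)."""
    Y = np.eye(n_classes)[y]
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(n_iter):
        _, grad = multinomial_loss_grad(W, X, Y, alpha)
        W -= lr * grad
    return W
```

With OvA, each class score comes from a separately trained binary model and the probabilities are typically normalised afterwards; here the coupling between classes is part of the objective itself.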
5. Strong Rules for coordinate descent
Status: Work in Progress
Pull Request: https://github.com/scikit-learn/scikit-learn/pull/3579
Rules that help skip over inactive features (features whose coefficients are zero) during coordinate descent. I am still working on this, and it should be open for review in a few days.
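The sequential strong rule (Tibshirani et al., 2012) can be sketched as follows, assuming the Lasso objective `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1`; the function names are hypothetical and this is not the code in the pull request. Moving along the path from `alpha_prev` to `alpha_new`, feature j is discarded if `|x_j . r| / n_samples < 2 * alpha_new - alpha_prev`, where `r` is the residual at `alpha_prev`. Strong rules are heuristics and can occasionally discard a feature that should be active, so after fitting on the kept features the KKT conditions are checked and any violators are added back.

```python
import numpy as np


def sequential_strong_rule(X, residual, alpha_new, alpha_prev):
    """Hypothetical helper: boolean mask of features to KEEP at alpha_new, given
    residual = y - X @ w(alpha_prev) from the previous point on the path."""
    n_samples = X.shape[0]
    correlations = np.abs(X.T @ residual) / n_samples
    return correlations >= 2 * alpha_new - alpha_prev


def kkt_violations(X, residual, alpha, discarded, tol=1e-10):
    """Hypothetical helper: indices of discarded features that violate
    |x_j . r| / n_samples <= alpha at the new solution (these must be added back)."""
    n_samples = X.shape[0]
    correlations = np.abs(X.T @ residual) / n_samples
    return np.flatnonzero(discarded & (correlations > alpha + tol))


# Toy example: alpha_max = max_j |x_j . y| / n_samples = 1.0, where w = 0 and residual = y.
y = np.array([1.0, 1.0, -1.0, -1.0])
X = np.column_stack([y, [1.0, -1.0, 1.0, -1.0], [1.0, 1.0, 1.0, -1.0]])
keep = sequential_strong_rule(X, y, alpha_new=0.8, alpha_prev=1.0)   # keeps only feature 0
# Fitting on feature 0 alone at alpha = 0.8 gives w_0 = 0.2, so the residual is 0.8 * y.
violations = kkt_violations(X, 0.8 * y, alpha=0.8, discarded=~keep)  # no violations
```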
Apart from these, I have worked on a good number of minor bug fixes and enhancements, including exposing the n_iter parameter across all estimators, fixing incomplete downloads of the newsgroup datasets, and making the max_iter parameter in liblinear configurable instead of hard-coded.
I would like to thank my mentor Alex, who is the best mentor one could possibly have (I’m not just saying this in the hope that he will pass me :P), Jaidev, Olivier, Vlad, Arnaud, Andreas, Joel, Lars, and the entire scikit-learn community for helping me complete an important project to a satisfying extent. (It is amazing how people manage to contribute so much in spite of having other full-time jobs.) I will be contributing to scikit-learn full-time until at least December as part of my internship.
EDIT: And of course Gael (how did I forget?), the awesome project manager who is always full of enthusiasm and encouragement.
As they say, one journey ends so that another can begin. The show must go on.