Hi. It has been another slow week, and the heat and laziness at home (holiday season) aren’t quite helping. However, I did manage to do a bit of work, and this post will serve as a reminder for me to complete all the work that I have accumulated by the end of next week.
1. Got the memory profiler working.
https://github.com/fabianp/memory_profiler is a wonderful tool built by Fabian that gives a line-by-line report of the memory being used. You can install it like any other Python package, by running
sudo python setup.py install
a] You can use it by simply importing it at the top of the file.
from memory_profiler import profile
and adding @profile just above the definition of the function that you want to profile.
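Putting the two steps together, a minimal sketch could look like the one below. The function build_squares is a made-up example, not something from scikit-learn, and the no-op fallback decorator is only my assumption so that the snippet still runs when memory_profiler is not installed.

```python
try:
    from memory_profiler import profile
except ImportError:
    # Assumption: fall back to a do-nothing decorator when the package
    # is missing, so the script still runs (without a memory report).
    def profile(func):
        return func


@profile
def build_squares(n):
    """Hypothetical example: allocate a list so there is memory to report."""
    squares = [i * i for i in range(n)]
    return sum(squares)


# With memory_profiler installed, calling the function prints a
# line-by-line memory report; the return value is unchanged.
total = build_squares(100000)
```

The decorator only wraps the call, so the function can be used exactly as before.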
As mentioned in the last post, this was giving ambiguous results, since it wasn’t taking the child processes into account. There is a workaround for this: using the mprof -c command directly, which gives plots showing how much memory is being used. (Thanks to Olivier for helping me out with this.)
From the plots, we can see that threading has a considerable advantage with respect to memory. This is because in threading the memory of the input data is shared by all the worker threads, while in multiprocessing each process needs its own memory.
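To illustrate the sharing argument, here is a small sketch using only the standard library. The list called data stands in for the input array. The pickle round trip is my assumption for approximating what a worker process receives when its arguments are sent over; the real memory cost depends on the platform and on how the data is passed.

```python
import pickle
import threading

data = list(range(5))  # stands in for the (much larger) input data
seen_ids = []


def worker():
    # Every thread sees the very same object; nothing is copied.
    seen_ids.append(id(data))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

threads_share = all(i == id(data) for i in seen_ids)

# Assumption: serialising and deserialising mimics handing the data to
# another process, which ends up with its own separate copy.
process_copy = pickle.loads(pickle.dumps(data))
process_copy_is_separate = process_copy is not data and process_copy == data
```

Four threads, one object in memory; each simulated process gets an equal but distinct copy.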
2. While I was trying to release the GIL for the Gram case, I found that the optimisation algorithm cd_fast.enet_coordinate_descent_gram wasn’t being used at all! I confirmed with Alexandre that it was indeed a refactoring bug, and so I’ve sent a Pull Request to fix this: https://github.com/scikit-learn/scikit-learn/pull/3220
The Gram matrix, which is actually np.dot(X.T, X), is computed (and enet_coordinate_descent_gram is used) in either of these two cases.
1. When precompute is set to True.
2. When precompute is set to “auto” (the default) and n_samples is greater than n_features.
However, contrary to intuition, it actually slows things down (maybe because of the time taken to compute the Gram matrix), and hence I’ve changed the default to False in the PR.
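For reference, here is a rough sketch of what the Gram matrix is and why precomputing it can be attractive in principle. It is written in plain Python so that it runs without NumPy; gram_matrix and matvec are hypothetical helpers for illustration, not scikit-learn functions. It checks that X.T applied to the residual (y - Xw) equals X.T y minus the Gram matrix applied to w, so once the Gram matrix and X.T y are known, that quantity no longer needs X itself.

```python
def gram_matrix(X):
    """Return X^T X for a list-of-rows X (same values as np.dot(X.T, X))."""
    columns = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def matvec(M, v):
    """Multiply a list-of-rows matrix M by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 samples, 2 features
y = [1.0, 2.0, 3.0]
w = [0.5, -0.25]                          # arbitrary coefficients

G = gram_matrix(X)                        # 2 x 2, i.e. n_features x n_features
Xt = [list(col) for col in zip(*X)]
Xy = matvec(Xt, y)

# Direct route: go through the residual, which touches all of X.
residual = [yi - pi for yi, pi in zip(y, matvec(X, w))]
direct = matvec(Xt, residual)

# Gram route: only G and Xy are needed once they are precomputed.
via_gram = [xy - gw for xy, gw in zip(Xy, matvec(G, w))]
```

Both routes give the same vector; the catch is that building G costs time up front, which would fit the slowdown I saw.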
3. Started with the LogisticRegression CV PR
I started reading the source code of the Logistic Regression CV Pull Request, and I could understand some of it. I pushed a commit to parallelize a computation involving a regularization parameter, but I found out that it actually slows things down (:-X).
By the way, thanks to the people at Rackspace Cloud for providing an account to help me with my benchmark sessions and a perfect opportunity to learn vim(!).
Things to be done by next week.
1. Change the LogisticRegressionCV PR from a WIP to MRG.
2. Get the GIL-releasing PR merged, with the help of the other mentors.
P.S.: For those who follow cricket, whenever I’m really slow in code, I take inspiration from M.S. Dhoni (the Indian captain), who starts off really slow, but then finishes off in style every time (well, almost).