Hi, The first part of the last few days of have been the most productive and the last part well not so much.
1. Fixing precompute for ElasticNetCV
The function argument precompute=”auto” was being unused, in ElasticNetCV as mentioned in my previous post. Setting precompute equals auto uses a gram variant of the input matrix which according to the documentation is np.dot(X.T, X) . This theoritically helps the descent algorithm to converge faster. (However at the time of writing, I do not know exactly how). Practically however, (and after testing using the line profiler) it seems to be a bit slower since the computation of the gram matrix takes quite a bit of time. So with the advice of ogrisel, I split it across three Pull Requests. All three are essentially easy to fix.
1. https://github.com/scikit-learn/scikit-learn/pull/3247 – This ensures that the Gram variant is used if precompute is said to True or (auto and if n_samples > n_features)
2. https://github.com/scikit-learn/scikit-learn/pull/3248 – Remove precompute from Multi Task models since it is being unused,
3. https://github.com/scikit-learn/scikit-learn/pull/3249 – This is a WIP, and changes default precompute from auto to False
2. Threading backend for Linear Models.
I have successfully changed the backend from muli-processing to threading after releasing the GIL for all four variants. After a final round of review it can be merged,
a] Simple coordinate descent
b] Sparse coordinate descent
c] Gram variant
d] MultiTask variant
There is a huge memory gain and speed is almost the same (if not slightly higher by this Pull Request). https://github.com/scikit-learn/scikit-learn/pull/3102
3. Logistic Regression CV
Reading someone else’s magically vectorised NumPy code isn’t an easy task and I somehow crawled my way through it (which explains the more productive first part).
I fixed a bug in the code to compute the Hessian when the intercept is true. I’ve also fixed sparse matrix support and added multiple tests to it and confirmted that the newton-cg and lbfgs solvers give the exact same result, The liblinear has a slight change due to the penalisation of the intercept.
However benchmarking gives ambiguous results. On standard datasets such as the newsgroup and digits data, almost always the lib-linear solver is the fastest. However in datasets using make_classification, lbfgs seems to be the faster solver.
Right now, my job is just to wait for comments from Alex and Olivier and make the necessary changes. I shall come up with a more detailed description on Log Reg CV next week.