Apache Spark: Another Summer

It’s been a long time since I blogged about anything. (My last post was around 10 months ago.) This is not because I had nothing to blog about, but because I could not get myself to write something useful.

Taking that as an opportunity for another shameless self-plug: I will be working on MLlib, a machine learning library built on top of Apache Spark, this summer as part of Google Summer of Code. It is written in Scala, with APIs in Python and Java. In preparation for the project I have worked on a number of issues in both the Python API and the Scala backend (https://github.com/apache/spark/pulls?q=is%3Apr+author%3AMechCoder+is%3Aclosed). These include sparse matrix support for GaussianMixtures, where the input can be an RDD of SparseVectors, and training DecisionTrees with cross-validation.

I’m still a newbie to machine learning and to programming in general, and I see this project as an opportunity to build on my programming skills and machine learning knowledge. This is the link to my project abstract (https://www.google-melange.com/gsoc/project/details/google/gsoc2015/manojkumar/5721450489053184). The project is dynamic; that is, newer issues can be taken up depending on the backlog.

P.S. Just a few more days and some stupid exams left, and then I’m finally a mechanical engineer 🙂

