I use both. I often prototype functions & algorithms in Matlab because, as stated, it is easier to express an algorithm in something which is close to a pure mathematical language.
R does have excellent libraries. I'm still learning it, but I'm starting to leave Matlab in the dust because once you know R, it's also fairly easy to prototype functions there.
However, I find that if you want algorithms to run efficiently in a production environment, it is best to move to a compiled language like C++. I have experience wrapping C++ into both Matlab and R (and Excel, for that matter), and I've had a better experience with R. Disclaimer: being a grad student, I haven't used a recent version of Matlab for my DLLs; I've been working almost exclusively in Matlab 7.1 (which is about four years old). Perhaps the newer versions work better, but I can think of two situations off the top of my head where a C++ DLL behind Matlab caused Windows XP to blue-screen because I stepped outside an array's bounds -- a very hard problem to debug when your computer reboots every time you make that mistake...
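As an illustration of the R side of that workflow, here is a minimal sketch using Rcpp (my choice for the example; wrappers from that era may well have used the older .C/.Call interface, and the function here is purely a toy):

#include <Rcpp.h>

// A toy exported function: scales a numeric vector by a constant.
// Rcpp's vector class manages its own memory, which helps avoid the
// raw-pointer overruns that caused the blue screens described above.
// [[Rcpp::export]]
Rcpp::NumericVector scale_vector(Rcpp::NumericVector x, double a) {
    Rcpp::NumericVector out(x.size());
    for (R_xlen_t i = 0; i < x.size(); ++i) {
        out[i] = a * x[i];
    }
    return out;
}

From R, Rcpp::sourceCpp("scale_vector.cpp") compiles and loads this, after which scale_vector(rnorm(10), 2.5) is callable like any R function.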
Lastly, the R community appears to be growing much faster, and with much more momentum, than the Matlab community ever had. Further, since R is free, you also don't have to deal with the Godforsaken FlexLM license manager.
Note: Almost all of my development is in MCMC algorithms right now. I do about 90% of the production work in C++, with the visualization in R using ggplot2.
Update for Parallel Comments:
A fair amount of my development time right now is spent parallelizing MCMC routines (it's my PhD thesis). I have used Matlab's parallel toolbox and Star-P's solution (which I guess is now owned by Microsoft?? -- jeez, another one gobbled up...). I found the parallel toolbox to be a configuration nightmare: when I used it, it required root access to every single client node. I think they've fixed that little "bug" now, but it's still a mess. I found Star-P's solution elegant, but often difficult to profile. I have not used Jacket, but I've heard good things. I also have not used the more recent versions of the parallel toolbox, which also support GPU computation.
I have virtually no experience with the R parallel packages.
It's been my experience that parallelizing code must occur at the C++ level, where you have finer-grained control over task decomposition and memory/resource allocation. If you attempt to parallelize a program at a high level, you often get only a minimal speedup unless your code is trivially decomposable (also called embarrassingly parallel). That said, you can get a reasonable speedup from a single line at the C++ level using OpenMP:
#pragma omp parallel for
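To make that concrete, here is a minimal sketch of the kind of loop that single pragma handles (a toy example, not from my actual MCMC code):

#include <vector>

// Each iteration writes only to its own element, so the iterations are
// independent and OpenMP can safely split them across threads.
void square_all(std::vector<double>& x) {
    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(x.size()); ++i) {
        x[i] = x[i] * x[i];
    }
}

Compile with the appropriate flag (e.g., g++ -fopenmp) and the loop is divided among the available cores automatically; without the flag, the pragma is ignored and the code runs serially.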
More complicated schemes have a learning curve, but I really like where GPGPU things are going. As of JSM this year, the few people I talked to about GPU development in R described it as only "toes in the deep end," so to speak. But as stated, I have minimal experience -- something I expect to change in the near future.
The update rule
$$w_i^{(t+1)} = w_i^{(t)} - \frac{\eta}{\sqrt{\sum_{\tau=1}^t g_{\tau,i}^2}}g_{t,i},$$
is the composite mirror descent variant of ADAGRAD. Take a look at equation 4 (or 23) from the paper. When there's no regularization term, the update can be derived (take the derivative and set it to 0) as:
$$ w_{t+1} = w_t - \eta \,\text{diag}(G)^{-1/2} g_t.$$
Notice that $\text{diag}(G)^{-1/2}_{ii} = \frac{1}{\sqrt{\sum_{\tau=1}^t g_{\tau,i}^2}}$ and that multiplying a diagonal matrix by a vector amounts to element-wise multiplication.
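To spell out that derivation step: with no regularizer, equation 4 of the paper reduces to a quadratic minimization (my paraphrase of the paper's notation, writing $G$ for the accumulated squared-gradient matrix):

$$w_{t+1} = \arg\min_{w} \left\{ \eta\, g_t^\top w + \tfrac{1}{2}\,(w - w_t)^\top \text{diag}(G)^{1/2}\,(w - w_t) \right\}.$$

Setting the derivative with respect to $w$ to zero gives $\eta\, g_t + \text{diag}(G)^{1/2}(w - w_t) = 0$, which rearranges to the update above.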
$\eta$ is a constant step size that determines how large each update is. I would try several choices and compare them.
Changing an algorithm from SGD to ADAGRAD just requires plugging your gradient values into this update. That is, you need to keep a running sum of the squared gradient elements for the term in the denominator.
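As a concrete sketch of that bookkeeping (an illustrative C++ toy, not code from the paper; the eps guard is an implementation detail I've added, not part of the formula):

#include <cmath>
#include <vector>

// One ADAGRAD update: w are the weights, g the current gradient, and
// sum_sq the running per-coordinate sum of squared gradient elements.
void adagrad_step(std::vector<double>& w,
                  const std::vector<double>& g,
                  std::vector<double>& sum_sq,
                  double eta,           // constant step size
                  double eps = 1e-8) {  // guards against dividing by zero
    for (std::size_t i = 0; i < w.size(); ++i) {
        sum_sq[i] += g[i] * g[i];
        w[i] -= eta * g[i] / (std::sqrt(sum_sq[i]) + eps);
    }
}

The only change from a plain SGD loop is carrying sum_sq between iterations; the gradient computation itself is untouched.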
Best Answer
While Matlab certainly remains a primary tool in much of academic science and engineering, I do not see it used extensively in data science. The primary reason, as I see it, is the extensive use of data frames and the reference-by-name ecosystem in R (and Python/pandas). Matlab is designed to work with matrices, and while you can get Matlab to work with tables and group by categorical variables (e.g., varfun), it's often terribly cumbersome and less intuitive. R and Python employ a syntax that is conducive to thinking-and-coding-as-you-go, almost like writing a data story. Matlab becomes quite verbose in this context and often requires multi-line solutions for problems R/Python can attack with a fraction of the text (though perhaps at double or triple the runtime). To Matlab's credit, that isn't its primary use case. If you want to do serious optimization and simulation, you'll age waiting for R to finish while Matlab barely breaks a sweat. But if you want to explore and model your data in a thoughtful, principled way, R and Python are often better suited, in my opinion. To each their own, but Matlab just wasn't designed for the types of tasks data scientists face.