Solved – Using Python for building machine learning application

machine learning, python

I'm currently using R to find the best approach to solving a machine learning problem. Once I've got the approach sorted, I will need to build this into an application which can be used by end users. My background is as a .NET developer. I see there are a few questions related to this, but my question is more about what I should use to build an end user product which incorporates machine learning.

From what I've seen so far, R is very powerful but does not integrate very well with other programming languages (and even less well with .NET).

So I'm trying to figure out the best approach for building the app. I see that Python is widely used by the ML community. Is it a good choice for building an app that will be delivered to users, or is it better suited as a scripting tool for prototyping? One benefit I can see is the range of machine learning libraries available for Python, whereas .NET has comparatively few. Performance also concerns me, given that Python is interpreted.

Is Python my best choice, or would it be better to build the algorithms I need from scratch in C++, C# etc.?

Best Answer

In my view, Python is a good choice for building the machine learning part (you don't say anything about the rest of your application, so I can't comment on that).

NumPy is powerful and mature, and has lots of numerical packages built on top of it.

For example, SciKits is a suite of such packages. It incorporates scikit-learn, which is

a Python module integrating classic machine learning algorithms in the tightly-knit scientific Python world (numpy, scipy, matplotlib). It aims to provide simple and efficient solutions to learning problems, accessible to everybody and reusable in various contexts: machine-learning as a versatile tool for science and engineering
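To give a feel for what using scikit-learn looks like, here is a minimal sketch. The dataset (the bundled iris data), the choice of logistic regression, and the function name `train_and_score` are my own assumptions for illustration; none of them come from the question, and your real problem will need its own data and model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_and_score(test_size=0.25, seed=0):
    """Fit a classifier on the iris data and return (model, held-out accuracy).

    Illustrative only: the dataset and model are placeholders.
    """
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)               # learn from the training split
    accuracy = model.score(X_test, y_test)    # mean accuracy on unseen rows
    return model, accuracy


if __name__ == "__main__":
    model, accuracy = train_and_score()
    print("held-out accuracy:", accuracy)
    # Predict the class of one new sample (four iris measurements).
    print("prediction:", model.predict([[5.1, 3.5, 1.4, 0.2]]))
```

Most scikit-learn estimators follow the same `fit` / `predict` / `score` pattern, so swapping in a different algorithm is usually a one-line change.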

With regard to performance, NumPy's array operations run in compiled code, and its linear algebra routines are essentially wrappers around BLAS. Thus, NumPy code that can be expressed in terms of vector/matrix operations tends to be about as fast as comparable C/Fortran code.
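As a rough illustration of what "expressed in terms of vector/matrix operations" means, the sketch below computes the same quantity (the squared Euclidean norm of each row) once with explicit Python loops and once with NumPy array operations. The function names are made up for this example, and I'm not quoting timings, since they depend on your machine and array sizes.

```python
import numpy as np


def row_sq_norms_loop(X):
    """Squared norm of each row using explicit Python loops (slow path)."""
    out = []
    for row in X:
        total = 0.0
        for value in row:
            total += value * value
        out.append(total)
    return np.array(out)


def row_sq_norms_vectorized(X):
    """Same result, but the looping happens inside NumPy's compiled code."""
    return (X * X).sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 50))
    # Both versions agree up to floating-point rounding.
    print(np.allclose(row_sq_norms_loop(X), row_sq_norms_vectorized(X)))
```

You can compare the two with `timeit` on your own data; the gap generally widens as the arrays grow.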

On the flip side, code expressed as Python loops can be slow. Additionally, it is hard to speed up CPU-bound Python code with multiple threads, because CPython's global interpreter lock (GIL) lets only one thread execute Python bytecode at a time. However, there are ways around both of these shortcomings: multiprocessing instead of threading, numexpr, Cython and so on.
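Here is a minimal sketch of the multiprocessing route, using only the standard library. The workload (counting primes by trial division) and all function names are hypothetical stand-ins for whatever CPU-bound, pure-Python work you actually have; the point is only that the work is split into chunks and each chunk runs in its own process.

```python
from multiprocessing import Pool


def is_prime(n):
    """Trial division; deliberately CPU-bound pure Python."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds):
    """Count primes in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(1 for n in range(start, stop) if is_prime(n))


def make_chunks(limit, n_chunks):
    """Split [0, limit) into contiguous (start, stop) ranges; limit must be > 0."""
    step = -(-limit // n_chunks)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count_primes(limit, workers=4):
    """Count primes below limit, one chunk per worker process."""
    with Pool(processes=workers) as pool:
        return sum(pool.map(count_primes, make_chunks(limit, workers)))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    print(parallel_count_primes(50_000))
```

Because each worker is a separate process with its own interpreter, they are not serialized against each other the way threads are. The cost is that arguments and results have to be pickled and sent between processes, so this pays off mainly when each chunk does a meaningful amount of work.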