Machine Learning relies on algorithms. Unless you’re a data scientist or ML expert, these algorithms are very complicated to understand and work with.
A machine learning framework, then, simplifies machine learning algorithms. An ML framework is any tool, interface, or library that lets you develop ML models easily, without understanding the underlying algorithms.
There are a variety of machine learning frameworks, geared at different purposes. Most ML frameworks (those we discuss here and those we don't) can be used from Python, even when their core is written in another language such as C++, Java, or Scala. Python is the predominant machine learning programming language.
- TensorFlow
- Caffe
- H2O
- Apache Spark
- Microsoft CNTK
- Accord.NET
- Apache Mahout
- MXNet
- ONNX
- Dask
- MLflow
- Chainer
Library | Developer(s) | Initial Release | Written In | Type |
---|---|---|---|---|
TensorFlow | Google Brain Team | 2015 | Python, C++, CUDA | Machine Learning |
Caffe | Berkeley Vision and Learning Center | 2017 | C++ | Deep Learning |
H2O | SriSatish Ambati, Cliff Click | 2017 | Java, Python, R | Statistics |
Apache Spark | Matei Zaharia | 2014 | Scala | Data analytics, Machine Learning Algorithms |
Microsoft CNTK | Microsoft Research | 2016 | C++ | Machine Learning, Deep Learning |
Accord.NET | César Roberto de Souza | 2010 | C# | Data Analytics, Machine Learning |
Apache Mahout | Apache Software Foundation | 2009 | Java, Scala | Machine Learning |
MXNet | Apache Software Foundation | 2015 | C++, Python, R, Java, Julia, JavaScript, Scala, Go, Perl | Machine Learning, Deep Learning |
ONNX | Facebook, Microsoft | 2017 | C++, Python | Artificial Intelligence Ecosystem |
Dask | Matthew Rocklin | 2015 | Python | Data Analytics |
MLflow | Databricks | 2018 | Python | Machine Learning |
Chainer | Seiya Tokui | 2015 | Python | Deep Learning |
TensorFlow is one of the best frameworks available for working with machine learning in Python. Offered by Google, TensorFlow makes ML model building easy for beginners and professionals alike.
Using TensorFlow, you can create and train ML models on ordinary computers and then run them elsewhere: TensorFlow Lite brings the same models to mobile devices, and TensorFlow Serving deploys them on high-performance servers.
Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework originally developed at the University of California, Berkeley.
Caffe supports many different types of deep learning architectures geared towards image classification and image segmentation. It supports CNN, RCNN, LSTM, and fully connected neural network designs. Caffe also supports GPU- and CPU-based acceleration through computational kernel libraries such as NVIDIA cuDNN and Intel MKL.
H2O implements algorithms from the fields of statistics, data mining, and machine learning (generalized linear models, K-Means, Random Forest, Gradient Boosting, and Deep Learning). The software can run on top of the Hadoop Distributed File System (HDFS), which is intended to give it a performance gain over other analysis tools on large datasets.
H2O can be operated graphically through a web browser, or used via interfaces for R, Python, Apache Hadoop, and Spark. It can also be executed in Maven.
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.
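To make "implicit data parallelism" more concrete, here is a minimal sketch of the underlying idea in plain Python. It is not Spark code and does not use the Spark API: the `partition` and `count_words` helpers and the sample `lines` are hypothetical names made up for this illustration, and a local thread pool stands in for a cluster. Fault tolerance is not modeled at all.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(items, n):
    """Split a list into n roughly equal partitions (hypothetical helper)."""
    return [items[i::n] for i in range(n)]


def count_words(lines):
    """Count words within a single partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


lines = [
    "spark runs on clusters",
    "clusters run spark jobs",
    "spark scales",
]

# Each partition is processed independently; the caller only describes
# the per-partition function and never manages the workers directly.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(count_words, partition(lines, 3)))

# Merge the partial results into one final count.
totals = reduce(lambda a, b: a + b, partials, Counter())
print(totals.most_common(2))
```

The point of the sketch is the shape of the program: you write a function over one partition and a way to combine results, and the engine decides where each partition runs.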
Microsoft Cognitive Toolkit, previously known as CNTK and sometimes styled as The Microsoft Cognitive Toolkit, is a deprecated deep learning framework developed by Microsoft Research. Microsoft Cognitive Toolkit describes neural networks as a series of computational steps via a directed graph.
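The idea of describing a network as computational steps in a directed graph can be sketched in a few lines of plain Python. This is an illustration of the concept only, not the Cognitive Toolkit API: the `Node` class and `evaluate` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One step in the graph: an input (op is None) or an operation."""
    name: str
    op: Optional[Callable[..., float]] = None
    inputs: List["Node"] = field(default_factory=list)


def evaluate(node: Node, feed: Dict[str, float], cache=None) -> float:
    """Evaluate a node by first evaluating the nodes it depends on."""
    if cache is None:
        cache = {}
    if node.name in cache:
        return cache[node.name]
    if node.op is None:
        value = feed[node.name]  # input node: read the supplied value
    else:
        value = node.op(*(evaluate(i, feed, cache) for i in node.inputs))
    cache[node.name] = value
    return value


# Build the graph for y = relu(w * x + b); nothing is computed yet.
x, w, b = Node("x"), Node("w"), Node("b")
wx = Node("wx", lambda a, c: a * c, [w, x])
z = Node("z", lambda a, c: a + c, [wx, b])
y = Node("y", lambda v: max(0.0, v), [z])

# Run the same graph with concrete inputs.
result = evaluate(y, {"x": 2.0, "w": 3.0, "b": -1.0})
print(result)
```

Because the graph is built once and evaluated separately, the same structure can be fed different inputs, which is the property a graph-based framework exploits for optimization and deployment.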
The Accord.NET framework comprises a set of libraries that are available as source code as well as via executable installers and NuGet packages. The main areas covered include numerical linear algebra, numerical optimization, statistics, machine learning, artificial neural networks, signal and image processing, and support libraries (such as graph plotting and visualization).
Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra. In the past, many of the implementations used the Apache Hadoop platform; today the project is primarily focused on Apache Spark. Mahout also provides Java/Scala libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; a number of algorithms have been implemented.
Apache MXNet is an open-source deep learning software framework used to train and deploy deep neural networks. It is scalable, allowing for fast model training, and supports a flexible programming model and multiple programming languages (including C++, Python, Java, Julia, Matlab, JavaScript, Go, R, Scala, Perl, and Wolfram Language).
The Open Neural Network Exchange (ONNX) is an open-source artificial intelligence ecosystem of technology companies and research organizations that establish open standards for representing machine learning algorithms and software tools to promote innovation and collaboration in the AI sector.
Dask is a library composed of two parts. First, it includes a task scheduling component for building dependency graphs and scheduling tasks. Second, it includes distributed data structures with APIs similar to Pandas DataFrames or NumPy arrays. Dask has a variety of use cases; it can run on a single node and scale to thousand-node clusters.
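The first part, a dependency graph plus a scheduler, can be sketched with the Python standard library alone. The code below is a toy illustration, not Dask itself: the `graph` layout (a key mapped either to a literal value or to a tuple of a function followed by the keys it depends on) and the `is_task`, `dependencies`, and `run` helpers are assumptions made for this example, and the "scheduler" simply runs tasks one by one in dependency order.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from operator import add, mul

# Hypothetical task graph: key -> literal, or (function, *dependency_keys).
graph = {
    "a": 1,
    "b": 2,
    "sum": (add, "a", "b"),
    "product": (mul, "sum", "b"),
}


def is_task(entry):
    """A task is a tuple whose first element is callable."""
    return isinstance(entry, tuple) and len(entry) > 0 and callable(entry[0])


def dependencies(entry):
    """Keys a graph entry depends on (none for literal values)."""
    return list(entry[1:]) if is_task(entry) else []


def run(graph):
    """Execute every entry after all of its dependencies have finished."""
    order = TopologicalSorter(
        {key: dependencies(entry) for key, entry in graph.items()}
    ).static_order()
    results = {}
    for key in order:
        entry = graph[key]
        if is_task(entry):
            func, *args = entry
            results[key] = func(*(results[arg] for arg in args))
        else:
            results[key] = entry
    return results


results = run(graph)
print(results["product"])
```

A real scheduler would also run independent tasks in parallel and across machines; the sketch only shows how a dependency graph determines a valid execution order.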
MLflow is a framework that supports the machine learning lifecycle. This means that it has components to monitor your model during training and running, to store models, to load a model in production code, and to create a pipeline. The framework introduces three distinct features, each with its own capabilities.
Chainer is an open-source deep learning framework written purely in Python on top of the NumPy and CuPy Python libraries. The development is led by the Japanese venture company Preferred Networks in partnership with IBM, Intel, Microsoft, and Nvidia.
Chainer is notable for its early adoption of the "define-by-run" scheme, as well as its performance on large-scale systems. The first version was released in June 2015, and it has gained considerable popularity in Japan since then. Furthermore, in 2017, it was listed by KDnuggets among the top 10 open-source machine learning Python projects.
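In a "define-by-run" scheme, the computation graph is recorded while ordinary code executes, so Python loops and branches shape the graph directly. The following is a minimal sketch of that idea in plain Python, not Chainer's API: the `Var` class and `model` function are hypothetical, and only addition and multiplication on scalars are supported.

```python
class Var:
    """A scalar that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(
            self.value * other.value,
            ((self, other.value), (other, self.value)),
        )

    def backward(self):
        """Propagate gradients back through the recorded graph."""
        order, seen = [], set()

        def visit(var):
            if id(var) in seen:
                return
            seen.add(id(var))
            for parent, _ in var.parents:
                visit(parent)
            order.append(var)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs (chain rule).
        for var in reversed(order):
            for parent, local_grad in var.parents:
                parent.grad += local_grad * var.grad


def model(x, steps):
    """The loop count, known only at run time, decides the graph's depth."""
    y = x
    for _ in range(steps):
        y = y * x
    return y


x = Var(3.0)
y = model(x, steps=2)  # computes x ** 3
y.backward()           # gradient of x ** 3 is 3 * x ** 2
print(y.value, x.grad)
```

Calling `model` with a different `steps` value records a different graph on the next run, which is the flexibility that define-by-run offers over building a fixed graph up front.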