14 open source tools to make the most of machine learning

Spam filtering, face recognition, recommendation engines — when you have a large data set on which you’d like to perform predictive analysis or pattern recognition, machine learning is the way to go. The proliferation of free open source software has made machine learning easier to implement both on single machines and at scale, and in most popular programming languages. These open source tools include libraries for the likes of Python, R, C++, Java, Scala, Clojure, JavaScript, and Go.

Apache Mahout

Apache Mahout provides a way to build environments for hosting machine learning applications that can be scaled quickly and efficiently to meet demand. Mahout was originally devised to work with Hadoop for running distributed applications. It now works mainly with another well-known Apache project, Spark, and has been extended to work with other distributed back ends such as Flink and H2O.

Mahout uses a domain-specific language in Scala. Version 0.14 is a major internal refactor of the project, built on Apache Spark 2.4.3 by default.


Compose

Compose, by Innovation Labs, targets a common problem in machine learning: labeling raw data. Labeling can be slow and tedious, but without it a model can't deliver useful results. Compose lets you write a set of labeling functions in Python, so labeling can be done as programmatically as possible. Various transformations and thresholds can be applied to your data to make the labeling process easier, such as placing data in bins based on discrete values or quantiles.
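To make the idea of programmatic labeling concrete, here is a minimal sketch in plain Python. It does not use the Compose API; the data, the `total_spent` labeling function, and the helpers `make_labels`, `threshold`, and `bin_by_quantiles` are all hypothetical names chosen for illustration. It shows a labeling function applied per customer, then a threshold and a quantile-based binning applied to the resulting labels.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical raw data: (customer_id, purchase_amount) rows.
transactions = [
    ("a", 20.0), ("a", 35.0),
    ("b", 5.0),
    ("c", 120.0), ("c", 80.0),
    ("d", 60.0),
]


def total_spent(amounts):
    """Labeling function: reduce one customer's raw rows to a single value."""
    return sum(amounts)


def make_labels(rows, labeling_function):
    """Group rows by customer and apply the labeling function to each group."""
    groups = defaultdict(list)
    for customer_id, amount in rows:
        groups[customer_id].append(amount)
    return {cid: labeling_function(amounts) for cid, amounts in groups.items()}


def threshold(labels, value):
    """Turn numeric labels into booleans: True when the label exceeds value."""
    return {cid: label > value for cid, label in labels.items()}


def bin_by_quantiles(labels, n_bins):
    """Assign each label a bin index (0 to n_bins - 1) using quantile cut points."""
    cuts = quantiles(labels.values(), n=n_bins)
    return {cid: sum(label > cut for cut in cuts) for cid, label in labels.items()}


labels = make_labels(transactions, total_spent)
big_spenders = threshold(labels, 50.0)   # binary labels
spend_bins = bin_by_quantiles(labels, 2)  # below or above the median
```

The point of the design is that the labeling logic lives in one small function, so changing the definition of a label means editing that function and rerunning, not relabeling rows by hand.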

Core ML Tools

Apple’s Core ML framework lets you integrate machine learning models into apps, but uses its own distinct model format. The good news is you don’t have to pretrain models in the Core ML format to use them; you can convert models from just about every commonly used machine learning framework into Core ML with Core ML Tools.

Core ML Tools runs as a Python package, so it integrates with the wealth of Python machine learning libraries and tools. Models from TensorFlow, PyTorch, Keras, Caffe, ONNX, Scikit-learn, LibSVM, and XGBoost can all be converted. Neural network models can also be optimized for size by using post-training quantization, for example by storing weights at a lower bit depth that still preserves accuracy.
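For readers unfamiliar with post-training quantization, the following is a rough sketch of the underlying idea in plain Python. It is not the Core ML Tools API, and the function names `quantize` and `dequantize` and the sample weights are assumptions made for illustration. It uses simple linear quantization: each floating-point weight is mapped to one of 2^bits evenly spaced integer levels, and the original value is approximately recovered from the integer code.

```python
def quantize(weights, bits=8):
    """Map floats onto 2**bits evenly spaced integer levels (linear quantization)."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    # Guard against a zero range when all weights are identical.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical trained weights.
weights = [-0.82, -0.11, 0.0, 0.27, 0.93]

codes, scale, lo = quantize(weights, bits=8)
restored = dequantize(codes, scale, lo)

# Worst-case rounding error is about half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in one byte instead of four, at the cost of an error no larger than roughly half a quantization step. Real toolkits apply more refined schemes, but the size-versus-accuracy trade-off is the same.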

Copyright © 2020 IDG Communications, Inc.
