10 MLops platforms to manage the machine learning lifecycle

For most professional software developers, using application lifecycle management (ALM) is a given. Data scientists, many of whom do not have a software development background, often have not used lifecycle management for their machine learning models. That problem is much easier to fix now than it was a few years ago, thanks to the advent of “MLops” environments and frameworks that support machine learning lifecycle management.

What is machine learning lifecycle management?

The easy answer to this question would be that machine learning lifecycle management is the same as ALM, but that would also be wrong: The lifecycle of a machine learning model differs from the software development lifecycle (SDLC) in several ways.

To begin with, software developers more or less know what they are trying to build before they write the code. There may be a fixed overall specification (waterfall model) or not (agile development), but at any given moment a software developer is trying to build, test, and debug a feature that can be described. Software developers can also write tests that make sure that the feature behaves as designed.

By contrast, a data scientist builds models by doing experiments in which an optimization algorithm tries to find the best set of weights to explain a dataset. There are many kinds of models, and currently the only reliable way to determine which is best for a given dataset is to train the candidates and compare them. There are also several possible criteria for model “goodness,” and no real equivalent to the pass/fail tests of software development.
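To make the "try the candidates and compare" idea concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than anything from a particular MLops platform: the toy dataset is made up, the two candidate models (`fit_mean` and `fit_line`) are deliberately trivial, and the two "goodness" criteria (mean absolute error and root mean squared error) are just common examples.

```python
import math

# Toy dataset, made up for illustration: y is roughly 2*x + 1 plus noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]

# Simple holdout split: even positions for training, odd for evaluation.
train_x, train_y = xs[::2], ys[::2]
test_x, test_y = xs[1::2], ys[1::2]


def fit_mean(x_vals, y_vals):
    """Baseline candidate: always predict the mean training target."""
    mean_y = sum(y_vals) / len(y_vals)
    return lambda x: mean_y


def fit_line(x_vals, y_vals):
    """Second candidate: one-feature ordinary least squares (closed form)."""
    n = len(x_vals)
    mx, my = sum(x_vals) / n, sum(y_vals) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(x_vals, y_vals))
    var = sum((x - mx) ** 2 for x in x_vals)
    slope = cov / var
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mae(model, x_vals, y_vals):
    """Mean absolute error: one possible goodness criterion."""
    return sum(abs(model(x) - y) for x, y in zip(x_vals, y_vals)) / len(x_vals)


def rmse(model, x_vals, y_vals):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(
        sum((model(x) - y) ** 2 for x, y in zip(x_vals, y_vals)) / len(x_vals)
    )


# Each experiment: fit one candidate, then score it on every criterion.
candidates = {"mean_baseline": fit_mean, "linear": fit_line}
results = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    results[name] = {
        "mae": mae(model, test_x, test_y),
        "rmse": rmse(model, test_x, test_y),
    }

best_by_rmse = min(results, key=lambda name: results[name]["rmse"])
print(results)
print("best by RMSE:", best_by_rmse)
```

With real models the criteria can disagree, which is part of why a single pass/fail test does not translate well to model building.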

Unfortunately, some of the best models (deep neural networks, for example) take a long time to train, which is why accelerators such as GPUs, TPUs, and FPGAs have become important to data science. In addition, a great deal of effort often goes into cleaning the data and engineering the best set of features from the original observations, in order to make the models work as well as possible.

Keeping track of hundreds of experiments and dozens of feature sets isn’t easy, even when you are using a fixed dataset. In real life, it’s even worse: Data often drifts over time, so the model needs to be retrained or re-tuned periodically.
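As a rough illustration of what tracking and drift checking involve, the sketch below uses only the Python standard library. The helper names (`fingerprint`, `log_run`, `mean_shift`, `has_drifted`), the record fields, the sample values, and the drift threshold of 2.0 standard deviations are all hypothetical choices made for this example. Real MLops tools record far more and use more robust drift statistics.

```python
import hashlib
import json
import statistics


def fingerprint(rows):
    """Short hash of the data, so each run records which dataset it saw."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def log_run(log, model_name, params, feature_set, rows, metrics):
    """Append one experiment record to an in-memory log (hypothetical schema)."""
    record = {
        "run_id": len(log) + 1,
        "model": model_name,
        "params": params,
        "feature_set": feature_set,
        "data_fingerprint": fingerprint(rows),
        "metrics": metrics,
    }
    log.append(record)
    return record


def mean_shift(reference, current):
    """Distance between the two means, in reference standard deviations."""
    ref_std = statistics.stdev(reference)
    return abs(statistics.mean(current) - statistics.mean(reference)) / ref_std


def has_drifted(reference, current, threshold=2.0):
    """Crude drift flag; the 2.0 threshold is an arbitrary assumption."""
    return mean_shift(reference, current) > threshold


# Made-up values for a single feature at training time and afterwards.
training = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
recent_ok = [10.1, 9.9, 10.4, 10.0]
recent_shifted = [13.0, 12.5, 13.4, 12.9]

# Two experiments on the same data; the metric values are placeholders.
run_log = []
log_run(run_log, "linear", {"regularization": 0.1}, "features_v1",
        training, {"rmse": 0.42})
log_run(run_log, "linear", {"regularization": 1.0}, "features_v2",
        training, {"rmse": 0.39})

print(json.dumps(run_log, indent=2))
print("drift (similar data):", has_drifted(training, recent_ok))
print("drift (shifted data):", has_drifted(training, recent_shifted))
```

Storing a data fingerprint with every run makes it possible to tell later whether two results are comparable, and a drift flag like this one is the kind of signal that can trigger retraining.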

Copyright © 2020 IDG Communications, Inc.
