Kaggle: Where data scientists learn and compete

Data science is typically more of an art than a science, despite the name. You start with dirty data and an old statistical predictive model and try to do better with machine learning. Nobody checks your work or tries to improve it: if your new model fits better than the old one, you adopt it and move on to the next problem. When the data starts drifting and the model stops working, you retrain the model on the new data.
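The "adopt it if it fits better" decision can be sketched as a simple champion-versus-challenger check. This is a minimal illustration, not any particular team's or tool's method: it assumes both models are plain Python functions (the names `old_model` and `new_model` and the holdout numbers are hypothetical) and uses mean squared error on the same held-out data as the measure of fit.

```python
# Minimal champion-vs-challenger sketch (hypothetical models and data).
# Keep whichever model has the lower error on the same held-out data.

def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def choose_model(old_model, new_model, x_holdout, y_holdout):
    """Return the model with lower holdout MSE; a tie keeps the old model."""
    old_err = mean_squared_error(y_holdout, [old_model(x) for x in x_holdout])
    new_err = mean_squared_error(y_holdout, [new_model(x) for x in x_holdout])
    return new_model if new_err < old_err else old_model


def old_model(x):
    # Hypothetical existing model: a simple linear rule.
    return 2.0 * x


def new_model(x):
    # Hypothetical challenger: the same rule refit with an intercept.
    return 2.0 * x + 1.0


x_holdout = [1.0, 2.0, 3.0]
y_holdout = [3.1, 4.9, 7.0]

best = choose_model(old_model, new_model, x_holdout, y_holdout)
print("adopt new model" if best is new_model else "keep old model")
```

Rerunning the same comparison on freshly collected data is one simple way to notice when drift has made the current model worse and a retrained one should replace it.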

Doing data science on Kaggle is quite different. Kaggle is an online machine learning environment and community. It offers standard datasets that hundreds or thousands of individuals or teams try to model, and each competition has a leaderboard. Many contests offer cash prizes and status points, and participants can keep refining their models until the contest closes to improve their scores and climb the ladder. Tiny percentages often separate the winners from the runners-up.

Kaggle is something professional data scientists can play with in their spare time, and something aspiring data scientists can use to learn how to build good machine learning models.

What is Kaggle?

More broadly, Kaggle is an online community for data scientists that offers machine learning competitions, datasets, notebooks, access to training accelerators, and education. Anthony Goldbloom (CEO) and Ben Hamner (CTO) founded Kaggle in 2010, and Google acquired the company in 2017.

Kaggle competitions have improved the state of the machine learning art in several areas. One is mapping dark matter; another is HIV/AIDS research. Looking at the winners of Kaggle competitions, you’ll see lots of XGBoost models, some Random Forest models, and a few deep neural networks.

Kaggle competitions

There are five categories of Kaggle competition: Getting Started, Playground, Featured, Research, and Recruitment.

Copyright © 2020 IDG Communications, Inc.
