The value of time series data and TSDBs

Time series data, also called time-stamped data, is a sequence of observations recorded over time and indexed by time. Time series data is all around us. Because all events exist in time, we are in constant contact with an immense variety of time series data.

Time series data is used for tracking everything from weather, birth rates, disease rates, heart rates, and market indexes to server, application, and network performance. Analysis of time series data plays an important role in disciplines as varied as meteorology, geology, finance, social sciences, physical sciences, epidemiology, and manufacturing. Monitoring, forecasting, and anomaly detection are some of its main use cases.

Why is time series data important?

The value of time series data resides in the insights that can be extracted from tracking and analyzing it. Understanding how specific data points change over time forms the foundation for many statistical and business analyses. If you can track how a stock's price has changed over time, you can make a more educated guess about how it might perform over a comparable period in the future. Analyzing time series data can lead to better decision making, new revenue models, and faster business innovation. Published time series case studies show how various industries are putting time series data to work for their own use cases.

Time series data examples

Time series data isn’t just about measurements that happen in chronological order; it is about measurements that become more valuable when you add time as an axis. To determine whether your dataset is a time series, check whether one of its axes is time. For example, time series data can be used to track how the temperature of an indoor space, the CPU utilization of some software, or the price of a stock changes over time.

Time series data falls into two categories: regular time series data (metrics) and irregular time series data (events). Here are some examples:

  • Regular time series data (metrics): Regular time series data is collected at fixed time intervals. Daily stock prices, quarterly profits, annual sales, weather data, river flow rates, atmospheric pressure, heart rate, and pollution data are all examples of metrics.
  • Irregular time series data (events): Data points recorded at irregular time intervals are called events. Examples include logs and traces, ATM withdrawals, account deposits, seismic activity, logins or account registrations, content consumption, and manufacturing or production process data such as processing time, inspection time, move time, and queue time.
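
Whether a series is a metric or an event stream comes down to whether its samples arrive at a fixed interval. The following Python sketch uses a hypothetical helper, `is_regular` (not part of any TSDB API), that classifies a sorted list of integer timestamps by checking whether the gaps between them are constant within an assumed `tolerance`.

```python
from typing import Sequence


def is_regular(timestamps: Sequence[int], tolerance: int = 0) -> bool:
    """Return True if consecutive timestamps are evenly spaced (a metric),
    False if the spacing varies (an event stream).

    Assumes timestamps are sorted integers, e.g. seconds since the epoch.
    `tolerance` allows for small jitter in the sampling interval.
    """
    if len(timestamps) < 3:
        # Fewer than two intervals: nothing to compare, so treat as regular.
        return True
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    first = intervals[0]
    return all(abs(i - first) <= tolerance for i in intervals)


# Hypothetical heart-rate samples taken every 60 s: a regular series (metric).
heart_rate_times = [0, 60, 120, 180, 240]
# Hypothetical ATM withdrawals, arriving whenever customers show up (events).
atm_times = [12, 95, 96, 410, 1002]

print(is_regular(heart_rate_times))  # True
print(is_regular(atm_times))         # False
```

In practice, sensors drift slightly, so a small non-zero tolerance is usually more realistic than demanding exactly equal gaps.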

Time series data sometimes exhibits high granularity, with points recorded as often as every microsecond or even every nanosecond.
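
Nanosecond granularity has a practical consequence for how timestamps are stored. The Python sketch below (the specific timestamp is a made-up example) shows why they are commonly kept as 64-bit integer nanosecond counts rather than floating-point seconds: at present-day epoch values, a 64-bit float cannot represent every nanosecond.

```python
import time

# Integer nanoseconds since the Unix epoch: exact, and a signed 64-bit
# integer can hold such values until the year 2262.
now_ns = time.time_ns()
print(now_ns)

# A hypothetical nanosecond-precision timestamp (September 2020).
ts_ns = 1_600_000_000_123_456_789

# At this magnitude, adjacent float64 values are 256 ns apart, so converting
# to float rounds away the low-order digits.
as_float = float(ts_ns)
print(int(as_float) == ts_ns)  # False: precision was lost
```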

Features and functions of time series databases

Time series data requires a database that is optimized for measuring change over time and that is capable of handling high-volume workloads. Time series databases (TSDBs) were designed specifically to support the ingestion, storage, and analysis of time series data.

In recent years, time series databases have become the fastest-growing database segment, alongside the rapid growth of IoT, big data, and artificial intelligence technologies, all of which require processing and analyzing vast volumes of time series data at high ingestion rates. Examples of time series databases include InfluxDB, Prometheus, and Graphite.
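
To make "time-stamped data" concrete, here is how a single point is represented when written to InfluxDB, which accepts points in a text format called line protocol: a measurement name, optional comma-separated tags, one or more fields, and a timestamp (nanoseconds by default). The Python helper `to_line_protocol` is a hypothetical, simplified sketch: it does no escaping, so it assumes names and tag values contain no spaces, commas, or equals signs, and it assumes all field values are floats.

```python
from typing import Mapping


def to_line_protocol(measurement: str, tags: Mapping[str, str],
                     fields: Mapping[str, float], timestamp_ns: int) -> str:
    """Format one point as InfluxDB line protocol (simplified sketch).

    Assumption: no special characters needing escaping, float fields only.
    """
    # Tags are emitted sorted by key, which InfluxDB recommends for writes.
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement}{tag_str} {field_str} {timestamp_ns}"


line = to_line_protocol("weather", {"city": "Paris"}, {"temperature": 21.5},
                        1_600_000_000_000_000_000)
print(line)  # weather,city=Paris temperature=21.5 1600000000000000000
```

A production client would also handle escaping, integer (`i`-suffixed), boolean, and string field types; an official InfluxDB client library is the safer choice for real writes.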

Important features of a time series database include the following:

  • Data lifecycle management: The process of managing the flow of data through its lifecycle from collection and ingestion to aggregation, processing, and expiration.
  • Summarization: The practice of presenting a meaningful summary of your data through flexible queries, transformations, visualizations, and dashboards.
  • Large range scans of many records: Scanning millions of time series records is a frequent requirement in many time series use cases. Such scans call for specialized software like time series databases, which use purpose-built compression, indexing, and spatial generalization algorithms that enable users to quickly write, query, and visualize millions of points.
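
As one illustration of purpose-built compression, many time series engines encode timestamps as deltas rather than absolute values; the "delta-of-delta" variant was popularized by Facebook's Gorilla paper. The Python sketch below is a generic, simplified version of that idea (not the specific scheme of any database named here): for regularly spaced data, almost every stored delta-of-delta is 0, which a bit-level encoder can then pack into very few bits. The function names are hypothetical.

```python
from typing import List


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Encode sorted timestamps as [first, first_delta, dd1, dd2, ...]."""
    if len(timestamps) < 2:
        return list(timestamps)
    first_delta = timestamps[1] - timestamps[0]
    encoded = [timestamps[0], first_delta]
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)  # 0 whenever spacing is unchanged
        prev_delta = delta
    return encoded


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode."""
    if len(encoded) < 2:
        return list(encoded)
    timestamps = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dd in encoded[2:]:
        delta += dd
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# Samples every 60 s, with one reading arriving a second late.
ts = [1000, 1060, 1120, 1180, 1241]
encoded = delta_of_delta_encode(ts)
print(encoded)  # [1000, 60, 0, 0, 1]
assert delta_of_delta_decode(encoded) == ts
```

This sketch only produces the small integers; the actual space savings come from a variable-length bit encoding applied afterward, which is omitted here.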

These features are designed to support processing of large volumes of time series data at scale. Common tasks of a time series database include the following:

  • Write high volumes of data. Whether you’re collecting and writing data at nanosecond precision for high-frequency trading or collecting data from hundreds of thousands of sensors, time series databases are optimized for high ingest rates that general-purpose databases often struggle to sustain.
  • Request a summary of data over a large time period. Collecting summaries of your data over large time periods helps you gain valuable insights into the behavior of the data overall. For example, you might want to look at the mean monthly temperature of various cities for many years before deciding which city you want to move to.
  • Automatically downsample or expire old time series data that is no longer useful, keeping high-precision data for only a short period of time. For example, monitoring the pressure of a pipe in a chemical plant every minute could be critical for upholding safety standards during operation. However, that data doesn’t need to be retained at high precision forever. A time series database should allow the user to downsample that per-minute data to a daily average.
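
The downsampling and expiration tasks above can be sketched in a few lines of Python. The helpers `downsample` and `expire` are hypothetical names, and the pressure readings are synthetic; the sketch averages per-minute points into fixed-width daily buckets and drops raw points older than an assumed retention window. A real TSDB would run this continuously and persist the downsampled series separately.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (Unix timestamp in seconds, value)

SECONDS_PER_DAY = 86_400


def downsample(points: List[Point], bucket_seconds: int) -> List[Point]:
    """Average values into fixed-width time buckets, each stamped with
    the start time of its bucket."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def expire(points: List[Point], now: int, retention_seconds: int) -> List[Point]:
    """Drop raw points older than the retention window."""
    cutoff = now - retention_seconds
    return [(ts, v) for ts, v in points if ts >= cutoff]


# Synthetic per-minute pipe-pressure readings covering two days.
raw = [(minute * 60, 100.0 + (minute % 10)) for minute in range(2 * 24 * 60)]

daily = downsample(raw, SECONDS_PER_DAY)  # one averaged point per day
recent = expire(raw, now=2 * SECONDS_PER_DAY, retention_seconds=3600)
print(len(raw), len(daily), len(recent))  # 2880 2 60
```

The same bucketing idea underlies summaries such as the monthly mean temperature example, although calendar months would need calendar-aware bucketing rather than a fixed width in seconds.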

The design of time series databases

Time series databases should also follow the design principles below to optimize for time series data:

  • Scale is critical: A time series database must be able to handle the high write and query rates required by common time series use cases such as IoT, application monitoring, and fintech.
  • No one point is too important: Those who collect time series data are more interested in the overall behavior of a system than in any individual point among the countless points collected daily. As a result, updates and deletes are rare. Restricting update and delete functionality lets the database prioritize high ingest volumes and query rates, which in turn helps users gain valuable insights about their systems.
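
The "no one point is too important" principle can be illustrated with a toy in-memory series in Python. `AppendOnlySeries` is a hypothetical class, not any database's API: it supports only appends, time-range scans via binary search, and bulk deletion of everything older than a cutoff (retention), with no way to update or delete an individual point. For simplicity it also assumes points arrive in time order, which real TSDBs generally do not require.

```python
import bisect
from typing import List, Tuple


class AppendOnlySeries:
    """Toy append-only series: points can be appended and range-scanned,
    but never updated or deleted individually."""

    def __init__(self) -> None:
        self._times: List[int] = []
        self._values: List[float] = []

    def append(self, ts: int, value: float) -> None:
        # Simplifying assumption: in-order arrival, so each write is a cheap
        # append with no in-place rewrite of existing data.
        if self._times and ts < self._times[-1]:
            raise ValueError("out-of-order write")
        self._times.append(ts)
        self._values.append(value)

    def range(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Return all points with start <= ts < end via binary search."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))

    def drop_before(self, cutoff: int) -> None:
        """Bulk-delete everything older than cutoff: the only form of
        deletion this sketch supports."""
        i = bisect.bisect_left(self._times, cutoff)
        del self._times[:i]
        del self._values[:i]


series = AppendOnlySeries()
for ts in range(0, 100, 10):
    series.append(ts, float(ts))
print(series.range(20, 50))  # [(20, 20.0), (30, 30.0), (40, 40.0)]
series.drop_before(50)       # retention: discard points before t=50
```

Because data is only ever appended or dropped in bulk from the oldest end, the sorted order is maintained for free, and range scans stay a pair of binary searches plus a slice.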

Purpose-built time series databases generally outperform general-purpose relational databases at handling time series data. Time series databases can easily handle large sets of time-stamped data, they can be used for real-time monitoring, and they make it easy to manage your data lifecycle. This ease of use, especially if the TSDB has no dependencies, has a built-in GUI, and integrates well with other technologies, means faster time to launch for application builders putting time series data to work for their projects.

Anais Dotis-Georgiou is a developer advocate for InfluxData with a passion for making data beautiful with the use of data analytics, AI, and machine learning. She takes the data that she collects and applies a mix of research, exploration, and engineering to translate the data into something of function, value, and beauty. When she is not behind a screen, you can find her outside drawing, stretching, boarding, or chasing after a soccer ball.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Copyright © 2021 IDG Communications, Inc.
