Beyond NoSQL: The case for distributed SQL

In the beginning, there were files. Later there were navigational databases based on structured files. Then there were IMS and CODASYL, and around 40 years ago we had some of the first relational databases. Throughout much of the 1980s and 1990s “database” strictly meant “relational database.” SQL ruled

Then with the growing popularity of object-oriented programming languages, some thought the solution to the “impedance mismatch” of object-oriented languages and relational databases was to map objects in the database. Thus we ended up with “object-oriented databases.” The funny thing about object databases was that in many cases they were basically a normal database with an object mapper built-in. These waned in popularity and the next real mass-market attempt was “NoSQL” in the 2010s.

The attack on SQL

NoSQL attacked both relational databases and SQL in the same vein. The main problem this time was that the Internet had destroyed the underlying premise of the 40-year-old relational database management system (RDBMS) architecture. These databases were designed to conserve precious disk space and scale vertically. There were now way too many users and way too much for one fat server to handle. NoSQL databases said that if you had a database with no joins, no standard query language (because implementing SQL takes time), and no data integrity then you could scale horizontally and handle that volume. This solved the issue of vertical scale but introduced new problems.

Developed in parallel with these online transaction processing systems (OLTP) was another type of mainly relational database called an online analytical processing system (OLAP). These databases supported the relational structure but executed queries with the understanding that they would return massive amounts of data. Businesses in the 1980s and 1990s were still largely driven by batch processing. In addition, OLAP systems developed the ability for developers and analysts to imagine and store data as n-dimensional cubes. If you imagine a two-dimensional array and lookups based on two indices so that you are basically as efficient as constant time but then take that and add another dimension or another so that you can do what are essentially lookups of three or more factors (say supply, demand, and the number of competitors)—you could more efficiently analyze and forecast things. Constructing these, however, is laborious and a very batch-oriented effort.

Around the same time as scale-out NoSQL, graph databases emerged. Many things are not “relational” per se, or not based on set theory and relational algebra, but instead on parent-child or friend-of-a-friend relationships. A classic example is product line to product brand to model to components in the model. If you want to know “what motherboard is in my laptop,” you find out that manufacturers have complicated sourcing and the brand or model number may not be enough. If you want to know what-all motherboards are used in a product line, in classic (non-CTE or Common Table Expression) SQL you have to walk tables and issue queries in multiple steps. Initially, most graph databases didn’t shard at all. In truth, many types of graph analysis can be done without actually storing the data as a graph.

NoSQL promises kept and promises broken

NoSQL databases did scale much, much better than Oracle Database, DB2, or SQL Server, which are all based on a 40-year-old design. However, each type of NoSQL database had new restrictions:

Copyright © 2020 IDG Communications, Inc.

Source link