Flexible Consistency

During last week’s PASS Summit day 2 keynote, Rimma Nehme discussed some of the architecture behind Cosmos DB, the globally distributed, multi-model database service rolled out earlier this year. Dr. Nehme’s presentations are almost always buzz-worthy, with a great mix of technical depth and big-picture vision. Take 90 minutes to watch the recording; you won’t be disappointed.

Among the many notable points in this presentation was a journey through the concept of database consistency. Database consistency, also referred to as transactional consistency, is the attribute of a database system that describes how tightly the different parts of a single transaction are coupled together.

The simplest example of this is the set of data operations behind a banking transfer. In the database, money doesn’t “move” from one account to another; rather, the amount of money in one account is decreased by adding a debit line item, and the same amount is added to the other account by inserting a credit line item. Although these are two separate operations, they should be treated as one, with the whole operation succeeding or failing as a single transaction. In a transactionally consistent system, the two parts of the transaction would not be visible unless both of them were complete (so you wouldn’t be able to query the accounts at the moment of transfer and see that money had been removed from one but not yet added to the other).
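To make that concrete, here is a minimal T-SQL sketch of such a transfer. The table, column names, and amounts are hypothetical; the point is simply that both updates become visible together or not at all.

```sql
-- Hypothetical accounts table; both updates succeed or fail as one unit.
BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE dbo.Account SET Balance = Balance - 50.00 WHERE AccountID = 1;  -- debit the source account
    UPDATE dbo.Account SET Balance = Balance + 50.00 WHERE AccountID = 2;  -- credit the destination account

    COMMIT TRANSACTION;  -- only now are both changes visible to other sessions
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;  -- if either update fails, neither change persists
    THROW;
END CATCH;
```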

During Dr. Nehme’s keynote, she pointed out that most of us look at database consistency as a binary choice: either you have full transactional consistency or you don’t. However, data workloads aren’t always that simple. In Cosmos DB, there are actually five consistency models to choose from: strong, bounded staleness, session, consistent prefix, and eventual.

For those of us who have spent most of our careers working with ACID-compliant operations, having this kind of flexibility in database consistency requires a shift in thinking. There are many business cases where such transactional consistency is not critical, and the rise of NoSQL-type systems has demonstrated that, for some workloads, transactional consistency is not worth the latency it requires.

Some of us old-school data folks (this guy included) have in the past poked fun at other consistency models, using the term “eventually consistent” as a punch line rather than a real consideration. However, it’s no longer safe to assume that every database system must have absolute transactional consistency. As workloads become larger and more widely distributed, the use of less-restrictive consistency settings will be essential for both performance and availability.

This certainly doesn’t mean that transactionally consistent systems are going away. It just means that the classic RDBMS way of maintaining consistency is no longer the only way to do things.

Designing in Absolutes

There are absolutes that are true of data modeling and architecture, but they are fewer in number than most people think. The words “always” and “never” are handed out liberally as technical advice, and while that advice is usually well-meaning, it can lead to a design myopia that limits one’s ability to adapt to atypical application needs.

“You should always have a restorable backup for your production databases.” It would be hard to find anyone to argue a counterpoint to that statement. Similarly, a declaration that all source code should be stored in some form of source control is a generally accepted truism for any data project (or any other initiative based on code, for that matter). Most such absolutes are broad and generalized, and are applicable regardless of architecture, operating system, deployment platform (cloud or on-prem), or geographic location.
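On that first point, “restorable” is the operative word: a backup that has never been verified is little more than a hope. Here is a minimal T-SQL sketch; the database name and file path are hypothetical, and only an actual test restore truly proves the backup is usable.

```sql
-- Back up with checksums, then verify that the backup file is readable and intact.
BACKUP DATABASE SalesDb
    TO DISK = N'D:\Backups\SalesDb.bak'
    WITH CHECKSUM, INIT;

RESTORE VERIFYONLY
    FROM DISK = N'D:\Backups\SalesDb.bak'
    WITH CHECKSUM;
-- RESTORE VERIFYONLY checks the backup media; a periodic test restore is still the real proof.
```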

It is much rarer to find absolutes that apply to specific design principles. However, that doesn’t keep some folks from incorrectly asserting them. As I wrote in a post last year entitled Technical Dogma, we are creatures of habit and tend to favor tools or solutions we already know. That tendency, coupled with repetition, leads to a sort of muscle memory in which we become loyal, sometimes to a fault, to our preferred ways of building things.

When we assume that a particular way of doing things is the only way to do it, we make assertions such as the following:

  • Every dimensional design should be built as a star schema. There are no valid reasons to build a snowflake schema.
  • You should never use the T-SQL MERGE statement to load data.
  • Anything with more than a terabyte of data belongs on premises, not in the cloud.
  • I’ll never use ETL again. Big data tools can do everything ETL can, and more.
  • Database triggers should never be used.

These aren’t hypothetical examples; I’ve heard every one of them recently. To be fair, those who declare such preferences to be truisms rarely do so with nefarious intent, but such statements can have negative consequences. Building a solution on the assumption that a particular design pattern must always be used is risky, as it can lead to an inflexible solution that does not account for the nuances of that particular application.

When I write about best practices, I am very cautious about speaking in absolutes. Even in my ETL Best Practices series, which represents my experience building hundreds of ETL processes over the past decade, I make liberal use of the terms “usually”, “typically”, and “with few exceptions”. I do so not out of a fear of commitment, but to be as accurate as possible. As with any other collection of best practices, there will be exceptions and edge cases that seem to violate the principles of a typical design yet are entirely appropriate for some less-common pattern. Providing the business with the data it needs, not adherence to a particular set of design patterns, is the ultimate measure of success for any data project.

There are some absolute, always-or-never truths in solution design, but they are few in number and tend to be broad rather than specific. Focus less on what should always (or never) be true, and more on the needs and nuances of the project at hand.

~~

This post was originally published in my Data Geek Newsletter.