We need this built NOW!

One of the biggest worldwide stories of the last few weeks has been the spread of the deadly coronavirus, which has infected tens of thousands and killed hundreds. In the wake of the panic around this illness, governments around the world have taken drastic measures to restrict its spread. In the Chinese city of Wuhan, the epicenter of this outbreak, officials have scrambled to quarantine and treat those exposed to the coronavirus. The most significant accomplishment of this effort is that they were able to build an entire hospital in the span of just ten days.

Think about that for a second: a brand new, multi-story, 1,000-bed hospital was designed and built in just a week and a half. It took longer than that for the body shop to repair my wife’s car after a minor fender bender. That the officials in Wuhan were able to construct such a facility in so little time, without much time to plan ahead, speaks to both the creativity and resourcefulness of the architects, engineers, and laborers involved in this project.

Ordinarily, building such a facility would require years: clearing the land, laying the foundation, building out the structure, installing utilities, and finishing out the interior each require many months of planning and labor, and this work largely happens consecutively, not concurrently.

So this raises the question: If we can build such a facility in days rather than years, why don’t we always do it that way?

The answer, of course, is that a hospital designed to be built in 10 days is constructed with speed as the only consideration. Treating as many patients as possible, as quickly as possible, is the only goal. As a result, other attributes – quality, durability, maintainability, and comfort – are all ignored to satisfy the only metric that really matters in such a project: time. The interior photos show a facility that looks more like the inside of a U-Haul truck than a hospital. Outside, the exposed ductwork and skeleton-like walls reveal a structure that is unlikely to withstand the rigors of use.

As a data guy, I see this same pattern when building data architectures. Everyone involved in a data project wants to have a perfectly working data pipeline, with validated metrics and user-tested access tools, delivered at or under budget, and ready for use tomorrow. The challenge comes when deadlines (whether legitimate or invented on the fly) become the only priority, and architects and developers are asked to sacrifice all else to meet a target date. Sure, you can add a lot of hands to the project, like they did by engaging 7,000 people to build the Wuhan hospital. Throwing more people at the problem might get you a solution more quickly, but the same shortcuts that sacrifice quality, durability, and maintainability will still have to be made.

When setting schedules with my clients, I sometimes have to work through this same thought exercise. Yes, we could technically build a data warehouse in a week, but it would be lacking in what one would normally expect of such a structure: many important features would be left out, it would likely be difficult to maintain, and there would be no room for customization of any kind. And, like the temporary Wuhan hospital, it would likely be gone or abandoned in 18 months.

Building something with speed as the only metric is occasionally necessary, but only under the most extreme of circumstances. Creating a data architecture that delivers accuracy, performance, functionality, and durability requires time – time to design, time to develop, time to test, and time to make revisions. Don’t sacrifice quality for the sake of speed.

Building Processes That Fail

“I build processes that never fail.”

As I was chatting with a peer who was pitching me on the robustness of the systems they developed, I was struck by the boldness of those words I had just heard. As we chatted about data in general and data pipelines in particular, this person claimed that they prided themselves on building processes that simply did not fail, for any reason. “Tell me more…”, said the curious technologist in me, as I wondered whether there was some elusive design magic I had been missing out on all these years.

As the conversation continued, I quickly surmised that this bold claim was a recipe for disaster: one part wishful thinking, one part foolish overconfidence, with a side of short-sightedness. I’ve been a data professional for some 17 years now, and every data process I have ever seen has one thing in common: they have all failed at some point. Just like every application, every batch file, every operating system that has ever been written.

Any time I build a new data architecture, or modify an existing one, one of my principal goals is to create as robust an architecture as possible: minimize downtime, prevent errors, avoid logical flaws in the processing of data. But my experience has taught me that one should never expect that any such process will never fail. There are simply too many things that can go wrong, many of which are out of the control of the person or team building the process: internet connections go down, data types change unexpectedly, service account passwords expire, software updates break previously-working functionality. It’s going to happen at some point.

Failing gracefully

Rather than predicting a failure-proof outcome, architects and developers can build a far more resilient system by first asking, “What are the possible ways in which this could fail?” and then building contingencies to minimize the impact of a failure. With data architectures, this means anticipating delays or failure in the underlying hardware and software, coding for changes to the data structures, and identifying potential points of user error. Some such failures can be corrected as part of the data process; in other cases, there should be a soft landing to limit the damage.
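To make the idea concrete, here is a minimal sketch of what a soft landing can look like in a single load step. All of the object names (dbo.StageOrders, dbo.Orders, dbo.LoadLog) are hypothetical placeholders, not from any particular system; the point is the structure: validate an assumption up front, keep the work atomic, and record the outcome either way.

-- A minimal sketch of one load step that expects failure and lands softly.
-- Object names (dbo.StageOrders, dbo.Orders, dbo.LoadLog) are placeholders.
CREATE OR ALTER PROCEDURE dbo.LoadOrders
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Guard clause: stop early if the staged data violates an assumption,
        -- rather than letting bad rows flow downstream.
        IF EXISTS (SELECT 1 FROM dbo.StageOrders WHERE CustomerID IS NULL)
            THROW 50001, 'Staged orders found without a customer; load aborted.', 1;

        INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate, Amount)
        SELECT OrderID, CustomerID, OrderDate, Amount
        FROM dbo.StageOrders;

        COMMIT TRANSACTION;

        INSERT INTO dbo.LoadLog (RunTime, Outcome, Detail)
        VALUES (SYSDATETIME(), 'Success', NULL);
    END TRY
    BEGIN CATCH
        -- Soft landing: undo the partial load and record what went wrong,
        -- so the failure is visible instead of silent.
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;

        INSERT INTO dbo.LoadLog (RunTime, Outcome, Detail)
        VALUES (SYSDATETIME(), 'Failure', ERROR_MESSAGE());
    END CATCH;
END;

The specifics will vary from shop to shop, but the pattern is the same: assume the step will eventually fail, and decide ahead of time what that failure should look like.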

Data processes, and applications in general, should be built to fail. More specifically, they should be built to be as resilient as possible, but with enough smarts to address the inevitable failure or anomaly.

[This post first appeared in the Data Geek Newsletter.]

Welcome, Joshua Ferguson!

Here we grow! Thanks to the numerous clients we have partnered with in the past year, Tyleris Data Solutions is expanding to add another skilled data architect to our team.

We are proud to welcome Joshua Ferguson as the newest member of the Tyleris team. Joshua is a highly skilled technologist and a pragmatic problem solver, with a keen ability to bridge the gap between business needs and technical specifications. He studied informatics as an undergraduate, and later earned his Master’s Degree in Computer Science from Arizona State University.

Joshua has worked in various industries throughout his technical career, most recently having worked as a business intelligence architect at a healthcare company. He currently resides in Japan where his wife teaches English to second-language learners.

Joshua has already gotten plugged in to some exciting work with Tyleris clients, and you will likely see more from him both in our professional engagements and through our blog and social media. We are delighted to have him on board!

Join us in Consultant Corner at SQL Saturday Dallas

Do you have questions about business intelligence, analytics, Power BI, or data architecture? If so, we would love to chat with you at the SQL Saturday Dallas event this spring.

On June 1st of this year, we will be hosting a Consultant Corner at SQL Saturday Dallas. Consultant Corner is a casual space where you can have one-on-one conversations with data experts. If you have specific “how do I …?” questions, or if you are just looking for general advice about the business intelligence and analytics landscape, we would love to chat.

We are co-hosting this event with our friends over at 28twelve Consulting. Like us, they are focused on building outstanding solutions in the Microsoft stack, and they are great at helping folks navigate the multifaceted world of business intelligence.

Registration for SQL Saturday Dallas is free, with an optional on-site lunch for $12. We will be set up in the Consultant Corner in the vendor area all day. We look forward to seeing you there!

What To Look For When Hiring A Data Professional

Finding just the right data professional to hire is one of the most challenging tasks an organization can undertake. While hiring a team member for any role requires a great deal of work and care, the role of the data professional is particularly challenging to fill. From day 1, the data professional will have access to and responsibility over the company’s most valuable asset. These roles usually require a mix of hard skills and soft skills, and often require engagement with people at every level, from peers to executive leadership.

Finding the right person

Here at Tyleris Data Solutions, we are getting ready to grow our team this year. In preparation, I have been thinking a lot about the attributes that we should look for in our new team member. While there will always be a longer and more specific list of needs for each role, these are the attributes I look for in every data professional.

Integrity. This one is first on the list for a reason, and is the one attribute where compromise is not acceptable. Data professionals have vast access to an organization’s data, and if that information were to be lost or stolen, it could literally end the company. The thing about integrity is that it is almost impossible to fully assess in an interview. Learning about a person’s level of integrity takes time and effort, which is why hiring a data professional should be a slow process.

Intellectual curiosity. Among the technical professionals I’ve worked with, I’ve found that those with a strong intellectual curiosity tend to be the most effective. Team members with this attribute often go out of their way to learn about other areas of the business or technical architecture that aren’t necessarily required for the job, leading to a better big-picture view of how the organization uses data.

A positive and empathetic attitude. Increasingly, data professionals have highly visible roles, requiring them to engage with peers, superiors, customers, and clients. Their attitude is the backdrop for each of those interactions, so it is essential that the data professional come to the table in the right frame of mind. Having empathy for one’s constituents improves the quality of the work one delivers.

Technical aptitude. The data field is rapidly evolving, and it requires data professionals to be willing and able to learn new things quickly. Hiring staff members with technical aptitude helps build a team that is adaptable and can pick up new technologies quickly.

Initiative. There are folks who wait to be told exactly what to do, and others who go figure out what needs to be done and then do it. Not every team member has to have this go-getter attitude, but each team needs at least a few people with this characteristic.

Experience. I put this one at the bottom of the list for a reason. It’s not that experience isn’t important – it is! – but of all the items on this list, experience is the one thing that the organization can give to the team member after they are hired. A person with minimal experience but who possesses all of the other attributes on this list is going to be a very compelling candidate.

Hiring is hard. Hiring technical professionals is especially challenging, and is critical to get right. While technical skills are important, finding the person with integrity, attitude, and aptitude will help to build a solid team.

This post was originally published in my Data Geek Newsletter.

Our Relationship with Facebook

At Tyleris Data Solutions, we are data people, and by extension, our primary role is that of data stewards. In each and every one of our relationships, our overriding concern is the security and privacy of data. In the partnerships that we build with other companies, we look for a similar level of care and concern around protecting data.

Since our inception, we have used Facebook, both as a social media platform and as an advertising outlet. During the past year, we have become aware of a number of serious security and privacy issues around Facebook’s protection and use of data. We strive to do business only with organizations whose partnerships reflect well on us, and vice versa. Based on the data breaches and the privacy decisions within Facebook, we feel that we can no longer engage with them in any capacity.

Starting today, we will no longer be updating or monitoring our Facebook page, nor will we be responding to any messages sent through Facebook Messenger on that page. In addition, we will discontinue indefinitely all advertising on Facebook.

We remain available to our clients and followers through our website, our newsletter, or by telephone at 214/509-6570. We are also on Twitter, and have recently established a presence on MeWe, a promising new social media platform that is very focused on data privacy.

As always, thanks for your business and for your attention. Feel free to contact us with any questions.

Webinar: Getting Started with Change Tracking in SQL Server

Start your summer off right by brushing up on a highly effective change detection technique! We will be hosting a webinar, Getting Started with Change Tracking in SQL Server, on Friday, June 8th at 11:00am CDT.

In this webinar, I’ll walk you through the essentials of change tracking in SQL Server: what it is, why it’s important, and how it fits into your data movement strategy. I’ll also run through demos to give you realistic examples of how to use change tracking.
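As a small preview, here is a minimal sketch of the moving parts, using hypothetical database and table names (SalesDb and dbo.Customer); note that the tracked table must have a primary key.

-- Enable change tracking at the database level, then per table.
ALTER DATABASE SalesDb
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Customer
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

-- On each incremental load, ask only for rows that changed since the
-- version processed by the previous run.
DECLARE @last_sync_version BIGINT = 0;  -- in practice, persisted between runs

SELECT  ct.CustomerID,
        ct.SYS_CHANGE_OPERATION,    -- I, U, or D
        c.CustomerName
FROM    CHANGETABLE(CHANGES dbo.Customer, @last_sync_version) AS ct
LEFT JOIN dbo.Customer AS c
        ON c.CustomerID = ct.CustomerID;

-- Capture the current version to use as the starting point next time.
SELECT CHANGE_TRACKING_CURRENT_VERSION() AS CurrentVersion;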

Registration is free and is open now. I hope to see you there!

Famous Last Words: “That Should Never Happen”

When it comes to managing data, there is often a difference between what should happen and what can happen. That space between should and can is often challenging, forcing data professionals to balance risk with business value. A few common examples:

“There should not be any sales transactions without a valid customer, but because the OLTP system doesn’t use foreign keys, it could theoretically happen.”

“Location IDs should be less than 20 characters, but those aren’t curated so they could exceed 20 characters.”

“This list of product IDs from System A should match those in System B, but because they are entered manually it is possible to have a few typos.”

“This data warehouse should only contain data loaded as part of our verified ETL process, but since our entire data team has read-write permission on the data warehouse database, it’s possible that manual data imports or transformations can be done.”

“This field in the source system should always contain a date, but the data type is set to ASCII text so it might be possible to find other data in there.”

“The front-end application should validate user input, but since it’s a vendor application we don’t know for certain that it does.”

Managing data and the processes that move data from one system to another requires careful attention to the data safeguards and the things that can happen during input, storage, and movement. As a consultant, I spend a lot of time in design sessions with clients, discussing where data comes from, how (if at all) it is curated in those source systems, and what protections should be built into the process to ensure data integrity. In that role, I’ve had this conversation, almost verbatim, on dozens of occasions:

Me: “Is it possible that <data entity> might not actually be <some data attribute>?”

Client: “No, that should never happen.”

Me: “I understand that it shouldn’t. But could it?”

Client (after a long silence): “Well…. maybe.”

Building robust systems requires planning not just for what should happen, but for what could happen. Source systems may lack the referential integrity needed to prevent situations that are impossible in business terms but technically possible inside the data store. Fields that appear to store one type of data might be structured as a more generic type, such as text. Data that should come from curated lists can sometimes contain unvalidated user input. None of these things should happen, but they do. When designing a data model or ETL process, be sure you’re asking what protections are in place to make sure that the things that shouldn’t happen, don’t.
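A few simple queries can turn those “shouldn’t happen” assumptions into checks that run before every load. The sketch below uses hypothetical table and column names, but each query mirrors one of the scenarios above: orphaned transactions, text fields that should hold dates, and uncurated identifiers that outgrow their expected length.

-- Orphaned sales transactions: rows with no matching customer, which the
-- source system's missing foreign keys cannot prevent.
SELECT s.TransactionID
FROM   dbo.SalesTransaction AS s
LEFT JOIN dbo.Customer AS c
       ON c.CustomerID = s.CustomerID
WHERE  c.CustomerID IS NULL;

-- "Date" fields stored as text: TRY_CONVERT returns NULL for values that
-- are not actually dates, so non-date data stands out.
SELECT SourceRowID, RawOrderDate
FROM   dbo.StageOrders
WHERE  RawOrderDate IS NOT NULL
       AND TRY_CONVERT(date, RawOrderDate) IS NULL;

-- Uncurated identifiers that exceed the expected length.
SELECT LocationID
FROM   dbo.StageLocations
WHERE  LEN(LocationID) >= 20;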

Flexible Consistency

During last week’s PASS Summit day 2 keynote, Rimma Nehme discussed some of the architecture behind Cosmos DB, the globally distributed flexible database system rolled out earlier this year. Dr. Nehme’s presentations are almost always buzz-worthy, with a great mix of technical depth and big-picture vision. Take 90 minutes to watch the recording; you won’t be disappointed.

Among the many notable points in this presentation was a journey through the concept of database consistency. Database consistency, also referred to as transactional consistency, is the attribute of database systems describing the coupling of different parts of a single transaction.

The simplest example of this is the set of data tasks that go into a banking transfer. In the database, money doesn’t “move” from one account to another; rather, the amount of money in one account is decreased by inserting a debit line item, and the same amount is added to the other account by inserting a credit line item. Although these are two separate operations, they should be treated as one, with the whole operation succeeding or failing as a single transaction. In a transactionally consistent system, the two parts of the transaction would not be visible unless both of them were complete (so you wouldn’t be able to query the accounts at the moment of transfer to see that money had been removed from one but not yet added to the other).
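In T-SQL terms, that pairing is exactly what a transaction gives you. The sketch below uses a hypothetical dbo.Account table to show the idea: both updates commit together, or neither does.

-- Illustrative only; table and column names are placeholders.
DECLARE @FromAccount INT = 1001,
        @ToAccount   INT = 2002,
        @Amount      MONEY = 100.00;

SET XACT_ABORT ON;  -- any run-time error rolls back the whole transaction

BEGIN TRANSACTION;

    UPDATE dbo.Account
    SET    Balance = Balance - @Amount   -- debit the source account
    WHERE  AccountID = @FromAccount;

    UPDATE dbo.Account
    SET    Balance = Balance + @Amount   -- credit the destination account
    WHERE  AccountID = @ToAccount;

COMMIT TRANSACTION;

-- Until the COMMIT, other sessions reading at the default isolation level
-- will not see one half of the transfer without the other.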

During Dr. Nehme’s keynote, she pointed out that most of us look at database consistency as a binary choice: either you have full transactional consistency or you don’t. However, data workloads aren’t always that simple. In Cosmos DB, there are actually five different consistency models to choose from.

For those of us who have spent most of our careers working with ACID-compliant operations, having this kind of flexibility in database consistency requires a shift in thinking. There are many business cases where such transactional consistency is not critical, and the rise of NoSQL-type systems has demonstrated that, for some workloads, transactional consistency is not worth the latency it requires.

Some of us old-school data folks (this guy included) have in the past poked fun at other consistency models, using the term “eventually consistent” as a punch line rather than a real consideration. However, it’s no longer safe to assume that every database system must have absolute transactional consistency. As workloads become larger and more widely distributed, the use of less-restrictive consistency settings will be essential for both performance and availability.

This certainly doesn’t mean that transactionally consistent systems are going away. It just means that the classic RDBMS way of maintaining consistency is no longer the only way to do things.