
Data can be a company’s most valuable asset – it can even be more valuable than the company itself. However, if the data is inaccurate or constantly delayed due to delivery problems, a company cannot properly use it to make informed decisions.
Having a solid understanding of a company’s data assets is not easy. Environments are changing and becoming increasingly complex. Tracing the lineage of a record, analyzing its dependencies, and keeping documentation up to date are resource-intensive tasks.
This is where data operations (dataops) comes into play. Dataops, not to be confused with its cousin devops, began as a set of best practices for data analytics and, over time, matured into an independent practice. Its promise: dataops accelerates the data lifecycle, from building data-centric applications to delivering accurate, business-critical information to end users and customers.
Dataops came about because most companies' data estates were full of inefficiencies. Different IT silos didn't communicate effectively (if they communicated at all). Tools built for one team, to use the data for a specific task, often left other teams without visibility. Integrating data sources was haphazard, manual, and often problematic. The sad result: the quality and value of the information delivered to end users fell short of expectations, or was simply inaccurate.
Although dataops offers a solution, C-suite members may worry that the promises are high and the value low. Disrupting processes that already exist can seem risky. Do the benefits outweigh the inconvenience of defining, implementing, and launching new processes? In my own organizational debates on this topic, I often cite the rule of ten: it costs ten times as much to complete an order when the data is flawed as when it is good. By that measure, dataops is critical and worth the effort.
You may already be using dataops without knowing it
Overall, dataops improves communication among data stakeholders and frees businesses from their burgeoning data silos. Dataops is nothing new: many agile organizations already practice dataops constructs, even if they don't use the term or aren't aware of it.
Dataops can be transformative, but like any great framework, some ground rules are required to be successful. Here are the top three must-haves for effective dataops.
1. Commit to observability in the dataops process
Observability is fundamental to the entire dataops process. It gives organizations a bird's-eye view of their continuous integration and continuous delivery (CI/CD) pipelines. Without observability, your organization cannot safely automate or practice continuous delivery.
In a mature development environment, observability systems provide that holistic view, and that view needs to be accessible across departments and integrated into those CI/CD workflows. When you commit to observability, shift it to the left of your data pipeline, monitoring and optimizing your systems before data enters production. Start this work when designing your database, and observe your non-production systems along with the various consumers of that data. That way you can see how well applications interact with your data before the database goes live.
Monitoring tools can help you stay better informed and run deeper diagnostics. In turn, your troubleshooting recommendations improve and help you fix errors before they become problems. Monitoring gives data professionals context. But remember to abide by the “Hippocratic Oath” of monitoring: first, do no harm.
If your monitoring creates so much overhead that performance degrades, you've crossed a line. Keep the overhead low, especially when adding observability. When data monitoring is treated as the foundation of observability, data professionals can be confident that operations are performing as expected.
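To make this concrete, here is a minimal sketch of the kind of low-overhead instrumentation described above: a Python decorator that logs a stage name, duration, and row count for each pipeline step. The stage name, metrics, and logging setup are illustrative assumptions, not a reference to any particular monitoring product.

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline.observability")

def observed(stage_name):
    """Wrap a pipeline stage with lightweight timing and row-count logging."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            # Row count is only reported when the stage returns something sized.
            rows = len(result) if hasattr(result, "__len__") else "n/a"
            log.info("stage=%s duration_s=%.3f rows=%s", stage_name, elapsed, rows)
            return result
        return wrapper
    return decorator

@observed("load_orders")
def load_orders():
    # Placeholder extract step; a real stage would read from a source system.
    return [{"order_id": 1, "amount": 42.0}]

if __name__ == "__main__":
    load_orders()
```

Because the wrapper records only a timestamp and a length, its overhead stays negligible, which keeps it on the right side of the "first, do no harm" rule.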
2. Map your dataset
You need to know your schemas and your data. This is fundamental to the dataops process.
First, document all of your data to understand changes and their impact. When database schemas change, you need to assess their impact on applications and other databases. This impact analysis is only possible if you know where your data is coming from and where it is going.
Beyond database schema and code changes, you need to control privacy and compliance with a complete view of data lineage. Tag the location and type of data, especially personally identifiable information (PII) — know where all your data is and where it’s going. Where is sensitive information stored? What other apps and reports does this data flow through? Who can access it through each of these systems?
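As an illustration of that kind of mapping, the sketch below models a tiny column-level catalog in Python: each entry records where a column lives, whether it holds PII, and which downstream systems consume it. The tables, columns, and downstream systems are made up for the example; a real catalog would be generated from schema scans and live in a dedicated metadata store.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnRecord:
    """One catalog entry: where a column lives, whether it is PII,
    and which downstream apps and reports consume it."""
    table: str
    column: str
    data_type: str
    is_pii: bool = False
    downstream: list[str] = field(default_factory=list)

# Hypothetical entries for illustration only.
catalog = [
    ColumnRecord("customers", "email", "varchar", is_pii=True,
                 downstream=["billing_app", "churn_report"]),
    ColumnRecord("orders", "order_total", "decimal",
                 downstream=["revenue_dashboard"]),
]

# Answer the questions above: where is sensitive data, and what does it flow into?
for col in catalog:
    if col.is_pii:
        print(f"PII: {col.table}.{col.column} -> {', '.join(col.downstream)}")
```

Even a simple inventory like this makes impact analysis possible: when a schema changes, the downstream lists tell you which applications and reports to check.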
3. Automate data validation
The widespread adoption of devops has led to a shared culture of unit testing for code and applications. What's often overlooked is testing the data itself: its quality and how it works (or doesn't work) with code and applications. Effective data testing requires automation, and it requires constant testing against your most recent data. New data is not tried and true; it is volatile.
To ensure you have the most stable system available, test with the most volatile data you have. Break things early. Otherwise, you push inefficient routines and processes into production and get a nasty surprise when the costs come due.
The product you use to test this data, whether it's a third-party tool or scripts you write yourself, must be robust and part of your automated test and build process. As data moves through the CI/CD pipeline, run quality, access, and performance tests. In short, you want to understand what you have before you use it.
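One way to automate those checks, shown here as a minimal sketch rather than a specific testing product, is a validation function that runs against each new batch before it moves further down the CI/CD pipeline. The column names and rules are assumptions for illustration.

```python
import pandas as pd

def validate_orders(df: pd.DataFrame) -> list[str]:
    """Run basic quality checks on a batch of order data.
    Returns a list of failure messages; an empty list means the batch passes."""
    failures = []
    if df["order_id"].isnull().any():
        failures.append("order_id contains nulls")
    if df["order_id"].duplicated().any():
        failures.append("order_id contains duplicates")
    if (df["amount"] < 0).any():
        failures.append("amount contains negative values")
    return failures

if __name__ == "__main__":
    # A deliberately messy batch, in the spirit of testing with volatile data.
    batch = pd.DataFrame({"order_id": [1, 2, 2], "amount": [10.0, -5.0, 7.5]})
    problems = validate_orders(batch)
    if problems:
        raise SystemExit("Data validation failed: " + "; ".join(problems))
    print("Batch passed validation")
```

Wired into the automated build, a non-empty failure list stops the pipeline, which is exactly the "break things early" behavior described above.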
Dataops is critical to becoming a data company. It is the ground floor of data transformation. These three must-haves will help you know what you already have and what you need to take it to the next level.
Douglas McDowell is general manager of database at SolarWinds.
Welcome to the VentureBeat community!
DataDecisionMakers is the place where experts, including technical staff, working with data can share data-related insights and innovations.
If you want to read about innovative ideas and up-to-date information, best practices and the future of data and data technology, visit us at DataDecisionMakers.
You might even consider contributing an article of your own!
Read more from DataDecisionMakers