Several years ago, Tom Davenport defined the notions of data defense and data offense. In case you aren't familiar with these terms, here is a chart that summarizes where data activities fit into data defense and data offense.
The critical questions, for me, are: Where are CIOs and CDOs in their journeys to build out their data chops? And how advanced is their data defense? I posed these in a recent #CIOChat.
Former CIO Mike Kail kicked off the discussion by saying that “IT should be the stewards of data veracity.” Supporting this idea, analyst Dion Hinchcliffe suggested that IT take a leadership role in every data defense activity, with the possible exception of source tracking, which he believes should be shared with anyone collecting data. "Cleanliness is a shared task," Hinchcliffe said. "But in my opinion, the rest should be job number one for IT."
Hinchcliffe stated that IT's top data struggles are due to 1) application silos; 2) poor integration; 3) lack of master data; 4) poor quality sources/ingestion processes; 5) limited control over underlying databases; and 6) cut and paste processes. Given these struggles, he suggested that "data defense should be about a lot more than just the integrity of data."
Hinchcliffe’s list includes the following data topics:
Former CIO Tim McBreen agreed with Hinchcliffe, saying he still sees a lot of issues occurring with master data reconciliation. "IT organizations tend to fix things too far down the data path in analytics versus transactional or operational systems,” he explained.
Kail shared the same concern, suggesting that “having secure, automated data ingest pipelines that allow pollutants turns data streams and lakes into murky swamps.”
Given this, it is not surprising that former CIO Theresa Rowe said she thinks of patient intake in emergency rooms as a role model for data intake. "Existing health records must be accurately matched," Rowe explained. "Data updates must be fast and correct. Only when this is the case is data ready for population and individual analysis and research."
In terms of responsibility for data management, CIO Stephen diFilipo argued that “data defense is the responsibility of the entire organization. If you touch data, then there is an obligation to ensure data integrity. It all starts and ends with well-structured, enterprise-level data governance.” However, former CIO Peter Weis claimed, “The hard reality is that even though data needs to be a participation sport across the enterprise, the CIO above everyone else is accountable. And where the CIO attempts to defer or deflect responsibility, it will come across as small and defensive.”
CIO Dennis Klemenz said that his biggest data issue is often usage. "Exporting to manipulate data is critical for ad hoc analysis, but integrity can become an issue as more people analyze data. That's why data lineage is so important. Lineage should answer: Where did this data come from? And keeping the lineage with data tied to trusted sources helps."
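To make the lineage idea concrete, here is a minimal sketch in Python of a dataset that carries its lineage with it through each ad hoc transformation. The names `Dataset`, `from_source`, and `derive` are hypothetical and invented for illustration; they do not come from the discussion or from any particular lineage tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Dataset:
    """Rows plus an ordered record of where they came from (hypothetical type)."""
    rows: tuple
    lineage: tuple  # first entry names the trusted source; later entries are steps


def from_source(name: str, rows: list) -> Dataset:
    # Start the lineage at a named, trusted source system.
    return Dataset(tuple(rows), (f"source:{name}",))


def derive(ds: Dataset, step: str, fn: Callable[[list], list]) -> Dataset:
    # Apply a transformation and append a description of it to the lineage,
    # leaving the original dataset untouched.
    return Dataset(tuple(fn(list(ds.rows))), ds.lineage + (step,))
```

With this shape, any exported or filtered copy can still answer "where did this data come from?" by reading its `lineage` field, for example after `derive(orders, "filter:region=EU", ...)`.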
Klemenz isn't alone in the struggle. CIO David Seidl said, “I see the stumbling blocks around data definitions. Divisions need data that fits their needs. Organizations need data defined. When two sources don't align and aren't called out, you end up with data integrity issues. And when you can't trust your data, you shouldn't use it.” Meanwhile, diFilipo finds visualization and reporting to be a constant struggle regarding data usefulness at his organization. "Those without process thinking tend to not understand the nuances and complexities of data structures and relationships needed to properly surface data as information," he said.
diFilipo also said that data definitions are particularly challenging in higher education. "The Integrated Postsecondary Education Data System (IPEDS), State, Federal, and universities all have differing definitions and values for what appear to be the same data points,” he explained. diFilipo used defining the field for 'student' as an example. "I start by determining which is the system of record," he said, "and where in the student journey does each system function within that system of record.”
Seidl agrees this is important. “Defining 'student' helped people at our organization understand what they thought was settled fact, [which] wasn't the reality others were living in," he said. "With this, we can talk systems of record, having recognized there's a gap and a need to solve it.”
According to Hinchcliffe, “Lines of business are pretty good at being the owner of their data." But, he continued, "They are unaware of or often unwilling to be data stewards beyond their own function." For this reason, he said, the line of business doesn't appreciate the highly strategic nature of digital data.
"As a former enterprise architect and integration lead, I've seen horrors that come from delivering data integrity, even within key systems," Hinchcliffe explained. "This includes the dozen customer databases, unstructured data crammed into structured fields, etc. Approximately 60 percent of the work on AI projects remains focused upon data wrangling."
McBreen personally has had good luck in building data governance programs, including having business representatives step up to be data stewards. "They need coaching from IT custodians but are able to work pretty well," he said. "This allows for the finding of problems when business is doing their own audits, etc. I have seen it work well in the 20 percent of clients [for whom] I have either built or audited data governance programs. The ones that worked best seemed to have a great rotation of stewards brought into metadata along with data hubs for managing master data.”
However, Klemenz insists that “Some are better than others. Data governance is about getting everyone on the same page regarding data and data usage. Some businesses do this well with governance committees and metadata. Others just use descriptive field names. There is no one way to govern data.”
Hinchcliffe said that “there is vast, growing data sprawl as IT proliferates,” and effective data governance is usually an afterthought. The benefits are seen as indirect, there is no direct ROI, no executive sponsor, and projects are already in progress without guidance. Seidl simply stated, “Data governance is generally not well managed.”
Data governance is especially tough to maintain given the sheer volume of new data and data sources being added at the enterprise level. "Today, most organizations lose the daily battle with master data and data governance as they accumulate an average of two-to-three new IT systems a week," Hinchcliffe said. "Data integrity is a bit better because it's inherent in the testing of most systems. This means automation is the key to broadly effective data governance. Only the unblinking gaze of digital data detectives can continuously track and identify issues and opportunities, and ensure a safety net.”
McBreen also believes automation is key to success. “We built spoke/hub pipes from all applications (shadow or not) that were important to the enterprise," he explained. "That allowed us to automate rules for cleansing and merging data. Where this isn’t the case, it should produce an error database for stewards to resolve.”
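As a rough illustration of the pattern McBreen describes, the Python sketch below applies automated rules to records arriving at a hub, merges the ones that pass, and routes the rest to an error list for stewards. Everything here (`Hub`, the two rules, the field names `customer_id` and `email`) is an assumption made for the example, not a description of his actual system.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A rule returns an error message when a record fails, otherwise None.
Rule = Callable[[dict], Optional[str]]


def require_customer_id(record: dict) -> Optional[str]:
    return None if record.get("customer_id") else "missing customer_id"


def require_email(record: dict) -> Optional[str]:
    email = record.get("email") or ""
    return None if "@" in email else "missing or malformed email"


@dataclass
class Hub:
    """Hypothetical hub: clean records are merged, failures go to stewards."""
    rules: list
    clean: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)

    def ingest(self, source: str, record: dict) -> None:
        problems = [msg for rule in self.rules if (msg := rule(record))]
        if problems:
            # Stand-in for the "error database" that stewards work through.
            self.errors.append(
                {"source": source, "record": record, "problems": problems}
            )
            return
        # Cleansing step: normalize the email before merging.
        normalized = {**record, "email": record["email"].strip().lower()}
        # Merge step: later values override earlier ones for the same customer.
        key = normalized["customer_id"]
        self.clean[key] = {**self.clean.get(key, {}), **normalized}
```

A "last write wins" merge is the simplest possible choice; a real master data hub would more likely apply per-attribute survivorship rules.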
Meanwhile, Seidl recommends figuring out what data is truly critical for your organization. "Define, govern, and manage it. Get it right, make it valuable. Take a step forward, do it again. Build habits around it, build culture around it. Show more value. Make it accessible. Then iterate. This leads to a whole different conversation about systems of record, and which way the stream should flow. Oh, and how many systems of record you can have before something really weird happens."
Seidl advises prioritizing your most important data and its key attributes, and concentrating on them first. Remember that it is easier to get buy-in with a win. He suggests conducting quarterly audits, similar to security audits. "It's just as important to know that your data is high quality," he explained.
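One simple form such an audit could take is a completeness check on the key attributes chosen above. The Python sketch below is only an illustration under that assumption; the function names and the 95 percent threshold are hypothetical, and a real audit would also cover accuracy, consistency, and timeliness.

```python
def completeness(records: list, key_attributes: list) -> dict:
    """Return the share of records with a non-empty value for each attribute."""
    total = len(records)
    report = {}
    for attr in key_attributes:
        filled = sum(1 for r in records if r.get(attr) not in (None, ""))
        report[attr] = filled / total if total else 0.0
    return report


def failing_attributes(report: dict, threshold: float = 0.95) -> list:
    """List attributes whose completeness falls below the chosen threshold."""
    return sorted(attr for attr, score in report.items() if score < threshold)
```

Running the same check every quarter and comparing the reports gives a basic trend line for whether the prioritized data is getting better or worse.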
Hinchcliffe agrees. "It is important to own the problem and prioritize it," he said. "Issues in resolving the overall data governance issues include risk, expense, and ballooned technical debt. Payoffs take 2-3 years, which in today's world is forever." He recommends starting by making the most critical fixes, including:
Once your data foundations are in place, the next step is to monetize your data.
Boomi Director of Enterprise Architecture Mark Clifton says, "Every company operating today is a data-driven company. You have access to a bunch of data on your supply chains, operations, strategic partners, customers, and your competitors that can be monetized. Getting data monetization right requires significant effort and is becoming critical for staying ahead of traditional competitors and new disruptors.”
He recommends that CIOs start by developing a blueprint that considers the varying data sources, offers a process for recognizing value, presents relevant business models, explores commercialization choices, and points to the various challenges to be addressed.
Getting one’s data house in order is not easy. But the work is valuable to your enterprise and business counterparts. The time to begin the process of making data useful and accessible is now. By taking on this mantle, CIOs can add value to all involved in the data value chain and achieve the long-elusive business relevance they need to get ahead.