The Path to Insight: Making Data Clean, Usable, and Accessible

May 8, 2020
In #CIOChat, we’ve talked a lot over the years about the importance of data to business outcomes. The topic of this blog is what it takes to make data clean, usable, and accessible, a.k.a. data governance. The CIOs referenced here have different backgrounds and experience, so their opinions on the subject vary. But they all agree that without solid data governance, the path to insight through data can be elusive and hazardous.

It’s also worth noting that for many Global 500 organizations, it’s not just the CIO who is involved with data. The number of Chief Data Officers (CDOs) is also on the rise. Forrester Principal Analyst Jennifer Belissent, Ph.D., notes that the analyst firm's 2019 data and analytics survey reveals 58 percent of respondents have appointed CDOs, and another 26 percent are planning to do so. A similar trend is also emerging in small to medium-sized businesses.

Belissent points out that since organizations now understand the value of data, they next have to figure out how to use it for business insights.

So, where are organizations in their data-to-insight journeys? And where are the gaps?

One thing is certain: whether CDO or CIO, these leaders want their data scientists spending less time making data useful by cleaning and munging it. That’s not a good use of scarce — and expensive — data science labor.

So let’s start there.

What Percentage of a Data Scientist’s Time Is Spent Prepping Data?

StarCIO President and former CIO Isaac Sacolick reminds us that “data always starts messy and so always requires some data janitorial work.” Sacolick has seen surveys that suggest data wrangling, cleaning, and discovery can take 30-40 percent of a data scientist's time. That’s quite a bit, but he believes those estimates may even be on the low side.

Saint Peter’s University CIO Milos Topic suggests that “a lot of the reason for time spent on this is due to a lack of organizational standards, data ownership, and data accountability.” Data stewardship and data governance are essential to lessening the load on data scientists. But, if not the data scientists, who does the heavy lifting?

Former CIO Tim McBreen thinks it would help to make “data quality a part of governance by creating dedicated teams aligned to business units working on supply chains for data.” With the right tools, analysts in these groups could evolve into “citizen” data scientists, liberating the professional data scientists from mundane data prep.

Lost in the Data Forest Without a Map

Sacolick believes the biggest hurdle organizations face in obtaining clean, usable, and accessible data is not knowing what data they have, where it resides, and what it’s used for. Antiquated systems, lack of training, and an insufficient understanding of the importance of data integrity exacerbate the problem.

The solution: a data catalog.

“Many CIOs can, unfortunately, tell you more about the box data lives on than what's inside the box and how it can be used to accomplish business outcomes,” Sacolick comments.

McBreen adds another factor he’s seen all too often. “Organizations try to fix data too late in the data usage chain,” he explains. “They fix where they notice it instead of finding the source of the error. This wastes time and leaves the root causes untouched.”

Read our blog post "You Have All the Data You Need. Now What?" to learn more about the path to becoming a data-driven organization.

How Big a Problem Is Discovering and Cataloging Data?

Dark data is data that’s stored — often forgotten — and not used by the enterprise to drive insights or decision making. It represents a threat as well as a missed opportunity. Storing the data could drain resources that affect the performance of a key business process, or the data may contain sensitive information that requires tight security. Those are threats. The opportunity is simply finding out the value the data may have.

Topic believes discovery is not the primary issue. He says, “The most important issues are clarity, definitions, and ownership. Who does what, how, when, and why.”

Verizon Media CIO Ben Haines adds, “Doing this is relatively easy when all you have to think about is ERP data. But there are many more silos of data that are valuable.”

The First Step in Establishing Consistent Data Governance: Executive Leadership

So, data governance is important. Check. It needs to be a consistent process that takes the onus of data preparation off the professional data scientist. Check. How does that happen?

Topic believes the first step is getting executive leadership and engagement. If the CIO tries to bully people into a data governance process, the effort will fail. Sacolick agrees and offers the proactive data governance model shown below.

The diagram illustrates the practices that make up proactive data governance and their relationship to how data is used for insight, action, and competitive advantage. Together, these form the feedback loop that shows the value of data governance to the business.

Making Data Useful Starts with Data Governance

As a group, CIOs are painfully aware that making data useful starts with data governance and the processes that support it. And when a company's culture, people, and processes are ready, CIOs can turn to Boomi for help. By using Boomi Data Catalog and Preparation, part of the Boomi Platform, to connect, cleanse, and integrate data, the entire organization can experience productivity gains.

For more information on Boomi Data Catalog and Preparation, watch our recorded webinar "Discover, Understand, and Integrate Your Data for Better Outcomes."

About the Author

Myles Suer is Boomi's global enterprise marketing manager.