How to Master the Quality of your Data: Challenges, Methods and Tools
Data is becoming ever more abundant in today’s digital environment, and companies’ use of data is expanding just as rapidly to match their growing digital presence. According to forecasts by Statista, the international statistics and market data portal, 74 zettabytes* of data will be created worldwide in 2021, up from 59 zettabytes in 2020.
Although collecting data seems within the reach of any company, that data still needs to be structured, high-quality, secure and easily accessible internally to drive revenue and growth. Let’s look at the criteria for data quality, the methods for evaluating and managing data, the stakeholders with whom to engage and the best tools to use.
*To give you an idea of the magnitude, one zettabyte = 1 000 000 000 000 000 000 000 bytes.
What is data quality?
Data quality is a measurement of the state of data based on several criteria: accuracy, completeness, integrity, timeliness, coherence and compliance. These criteria aim to facilitate organization-wide management and decision-making, in full compliance with current privacy regulations:
- Accuracy: Does my data reflect reality over time? Are the values returned reliable?
- Completeness: Is my data being collected the way I want? Do I have all the data I need to make informed decisions?
- Integrity: Is my data free of errors? Are the values readable and properly formatted?
- Timeliness: Is all my data available at the right time? Does it allow me to react in real time?
- Coherence: Is my data consistent across platforms? Is the information centralized in a reliable way? Do all my employees have access to the same data?
- Compliance: Is my data usage compliant with the GDPR? Does my digital analytics provider put me at risk of financial penalties under European law?
The importance of quality data
With the accelerating digitalization of companies and the emergence of new digital pure-players over the last decade, more and more digital actors are adopting data-driven marketing strategies. The implementation of data-driven or data-informed policies aims to respond to the need for ultra-personalization of messages. This allows businesses to stay one step ahead of competitors, better anticipate market fluctuations, enhance decision-making and continuously improve performance.
The value of collecting, analyzing and using data no longer needs to be demonstrated. Yet despite the democratization of web analytics across all sectors of activity, the stakeholders who handle data still tend to focus on quantity rather than quality, even as data minimization principles have been reshaping practices over the last few years.
When processing large volumes of data, the quality can be affected by a range of factors:
- Overestimated measurement caused by bots
- Traffic data blocked by ad blockers
- Faulty measurement due to errors in the tagging plan
- Overestimation of conversions due to poor source attribution
- Failure to exclude traffic from users who did not consent to cookies
- Some traffic not measured due to data sampling
And these mistakes can have far-reaching consequences for a company's business, such as:
- Damage to the brand image
- A loss of customer confidence
- Reduced revenues and business opportunities
- Loss of time and additional management costs
- A decrease in the ROI of marketing and sales actions carried out
- Pollution of all analytical projects
- A decrease in confidence among the company's own employees
- Financial penalties in case of non-compliance with privacy rules
This is why it’s essential to be meticulous during the data collection phase, which can be affected by each update or development on the website or mobile application.
How do you manage and improve data quality?
When adopting a data quality approach, organizations are faced with a double challenge: to integrate accurate data into their information systems and to eliminate or correct all the errors identified, such as incomplete data, outdated data, inaccurate data, non-secure data or data that does not comply with current regulations. And these errors, whether from technical or human factors, can occur at any stage of the data lifecycle, such as during:
- Collection, due to errors in the tagging plan
- Data-sharing, due to the existence of several versions and possibly different calculation methods
- Export, due to incompatibility of information systems
- Technical maintenance, due to side effects
Here is a 5-step method to continuously improve the quality of your data:
Step 1: Define your data quality scope
First, it is essential to define a clear framework for your data quality approach. Depending on your company's objectives and the information that is useful for driving its sales and marketing strategies, you will be able to map all the data needs of your teams and determine all the relevant contact points in your users' journeys. The goal is to rationalize and focus your efforts on the important data that will help you manage your digital activity.
Step 2: Audit your database
This phase consists of profiling your data. You will need to make sure that your databases are complete and free of anomalies. Do all your contacts have an email address and/or a telephone number? Are they correct? Are the first and last names correct? Nothing should stand in the way of a successful marketing automation process or email routing. If you carry out this audit thoroughly, you will be able to define an action plan and make recommendations on the rules for creating and maintaining your data.
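To make this concrete, here is a minimal profiling sketch in Python with pandas; the contacts.csv file, its column names and the simple email pattern are assumptions for the example, not a prescribed setup:

```python
import pandas as pd

# Load the contact database to audit (hypothetical file and column names)
contacts = pd.read_csv("contacts.csv")  # columns: first_name, last_name, email, phone

# Completeness: how many records are missing each key field?
for field in ["first_name", "last_name", "email", "phone"]:
    missing = contacts[field].isna().sum()
    print(f"{field}: {missing} missing values ({missing / len(contacts):.1%})")

# Validity: flag email addresses that do not match a basic pattern
invalid_email = ~contacts["email"].fillna("").str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
print(f"Invalid or missing emails: {invalid_email.sum()}")

# Export the suspect rows so the team can define correction rules
contacts[invalid_email].to_csv("contacts_to_review.csv", index=False)
```

A simple report like this is enough to size the problem and feed the action plan mentioned above.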
Step 3: Clean up your databases
When handling multiple data sources, data sets can become "contaminated" by various types of errors: bad syntax, spelling mistakes, empty fields, faulty tagging, duplicate information, etc. Data cleaning consists of deleting all duplicate, outdated, corrupted or incorrect information and poorly formatted data. This exercise gives you a sound basis to work from, avoids distorting the analysis results and optimizes the enrichment phase. Record this step so you can study the origin of the errors and monitor them better over time.
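As a sketch of what this can look like in practice, the following pandas snippet applies a few common cleaning rules; the file, column names and the rules themselves are assumptions for the example:

```python
import pandas as pd

contacts = pd.read_csv("contacts.csv")
before = len(contacts)

# Normalize formatting first, so that "JOHN@MAIL.COM " and "john@mail.com"
# are recognized as the same value
contacts["email"] = contacts["email"].str.strip().str.lower()
for field in ["first_name", "last_name"]:
    contacts[field] = contacts[field].str.strip().str.title()

# Remove exact duplicates, then extra rows sharing the same non-empty email
contacts = contacts.drop_duplicates()
dup_email = contacts["email"].notna() & contacts.duplicated(subset=["email"], keep="first")
contacts = contacts[~dup_email]

# Drop rows that offer no usable contact channel at all
contacts = contacts.dropna(subset=["email", "phone"], how="all")

# Keep a trace of what was removed, to study the origin of the errors
print(f"Removed {before - len(contacts)} of {before} rows")
contacts.to_csv("contacts_clean.csv", index=False)
```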
Step 4: Re-import the dataset, check and validate
Once the data cleaning process is complete, you must make sure your dataset really is clean and standardized. We advise you to provide a nomenclature import file to avoid errors or crashes during the next data changeover in your information system. Totally clean data is hard to achieve, and micro-errors can always slip through the cracks, so be prepared to perform a second cleanup of your dataset if necessary.
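For example, a handful of automated checks can act as a validation gate before the changeover. This sketch assumes the cleaned file from the previous step and that email is the mandatory key field in your system:

```python
import pandas as pd

# Re-load the cleaned dataset exactly as it will be re-imported
reimported = pd.read_csv("contacts_clean.csv")
emails = reimported["email"].fillna("")

# Each assertion blocks the changeover if the cleanup left errors behind
assert (emails != "").all(), "Some contacts still have no email address"
assert not emails.duplicated().any(), "Duplicate emails remain"
assert emails.str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$").all(), "Badly formatted emails remain"

print("Dataset validated, ready for re-import into the information system")
```

Run the same script again after any second cleanup pass, so validation stays consistent from one iteration to the next.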
Step 5: Maintain data quality efforts for the long term
It is important to remember that data quality is an approach that must be sustained to guarantee the reliability of your data over time. If a data item contains an error, study it, correct it, record it and then adopt the appropriate rules so that it does not happen again. Also make sure that all employees who create or handle data on a daily basis know and follow data hygiene rules, whether they are creating the tagging plan or processing the data. Ideally, you should set up a data governance body, even a small one, to permanently monitor the effectiveness of your processes.
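One lightweight way to make those rules stick is to record each of them as code and re-run the whole list on every new batch of data. The rule names, columns and checks below are purely illustrative:

```python
import pandas as pd

# Every rule adopted after an incident is recorded here, so it keeps
# being enforced on all future batches (illustrative rules only)
RULES = {
    "email_present": lambda df: df["email"].notna(),
    "phone_has_digits": lambda df: df["phone"].fillna("").str.contains(r"\d"),
    "country_code_known": lambda df: df["country"].isin(["FR", "DE", "US", "GB"]),
}

def run_quality_checks(df: pd.DataFrame) -> None:
    """Print the pass rate of every recorded rule for a batch of data."""
    for name, check in RULES.items():
        passed = check(df)
        print(f"{name}: {passed.mean():.1%} of rows pass ({(~passed).sum()} failures)")

run_quality_checks(pd.read_csv("contacts_clean.csv"))
```

A governance body can then track these pass rates over time instead of rediscovering the same errors at each audit.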
What are the major roles in data quality?
Within your organization, it is critical to make all data stakeholders aware of the "quality" factor. Everyone involved (internal employees, service providers or partners) must have the same objectives and be moving in the same direction to successfully implement the data quality strategy.
To get the most out of data, we recommend that you set up a data governance system led by a chief data officer (CDO). This individual, who guarantees the smooth operation of the processes, is responsible for deciding, mediating and planning all the organization's data projects as well as managing the various roles related to this program.
Depending on the size of your organization and the resources available, data governance can include the following roles:
- Data owner: They are the owner of a specific data set or collection for a particular business unit. This person must ensure that processes are followed to guarantee the collection, security and quality of the data. For example, a marketing director may be the owner of customer data, the HR director of internal company data and the CFO of financial data.
- Data steward: This data coordinator is responsible for organizing and managing the content, formatting and normalization of data. They can correct data or give it the status of "reference data." They usually work in partnership with data engineers/scientists/analysts.
- Data custodian: The data custodian is more of an IT role. They manage the technical environment to ensure the appropriate lifecycle of the data, from maintenance to data storage to access rules.
In addition to these roles, which are on the front line of data quality, there are supporting or cross-functional roles such as data or business analysts, multiple data consumers (product owner & manager; marketing manager; content & community managers; UX designer, etc.) or even the data protection officer (DPO), who ensures the rigorous application of data protection regulations (GDPR, CCPA, CNIL guidelines).
What are the data quality management tools?
To optimize and streamline the time spent by company employees in managing data quality, you can equip yourself with specialized tools. Software can be used to thoroughly cleanse your data, manage matches and remove duplicates, validate new data, verify contact information or perform complete profiling. Be sure to research these solutions and compare them before you start, as they are not always efficient or adapted to your use cases or technical configuration.
We recommend that you develop a master data management (MDM) system for every business scenario. This is a set of tools and methods for storing, managing and distributing the company's reference data. It is a single repository that centralizes all data to simplify and secure the sharing of data among the various business units.
Finally, don't forget to choose a high-performance analytics solution that offers you reliable, non-sampled data. At Piano, we believe that this data control process should be extremely rapid and easy to supervise, verify and correct. For this, we offer several tools dedicated to data quality:
- Tag Crawler allows you to check thoroughly and accurately that your tags are present on all the pages of your website, especially before and after each functional update or debugging session.
- Tag Inspector allows you to run a targeted verification of Piano tags on specific pages of the site, to test them and, if necessary, debug them.
- Data Manager allows you to correct, enrich or exclude some of your data from a simplified interface, using personalized data processing rules. You no longer need to touch your code to correct the collected data.