Channel: Better Data Quality by Dario Bezzina » data

You say potato and I say potahto – Part 1


One problem when trying to convince people to improve their data quality is that the term itself, data quality, is well known and widely used, but also abused, in our organizations today. For any other buzzword this familiarity would be a good thing, but not for data quality. The term has been around for so many years that it has in a sense lost its original meaning, and many take it for granted. This becomes a problem when people think they are handling their data quality correctly when in fact they are not.

When I meet customers for the first time I usually ask them whether they are currently working with data quality and whether it’s important to them. The answer is always “Yes we do, data quality is very important to us”. When I go on to ask how they define data quality in their organization, I get different answers depending on who is attending the meeting. The business user’s definition relates mainly to the contents of the data, while the IT user will talk about primary keys, referential integrity, null values and so on. Although they both speak about data quality, they mean different things. Even within each of the two communities you will see differences in definitions, sometimes because so many business rules complicate things.

To make things worse, and sorry for being a bit provocative here, most Business Intelligence consultants don’t have the appropriate knowledge or tools to help their customers with a true data quality initiative. Some do acknowledge the importance of data quality and, in their pursuit of a satisfied customer, try to the best of their abilities to deal with bad data using whatever tools they have available. Their definition of data quality usually covers any data that causes their code, reports or ETL (Extract, Transform and Load) jobs to fail, and they correct the data until things work (all my respect to consultants who don’t work like this). But for the most part the quality of the customer’s data is something they want to stay as far away from as possible. As you can see, we already have three potential definitions of data quality here!

Back to the definition: one perspective is what expectations you have. For example, IT is like UPS: it is responsible for delivering a package to the right recipient. The business users don’t care how IT manages its logistics as long as the contents of the package are intact, delivered on time and correspond to the order. IT, on the other hand, doesn’t care about the contents of the package. The business, however, expects IT to open the package and check the contents, and to make sure that they correspond to what the business ordered. You might think this is a simplified version of the truth, but you would be surprised how many companies I have met that have exactly this problem: both the business and IT speak about data quality but mean different things.

One of the first things I do at my workshops and seminars is to break down the term data quality and present its true meaning. This resets the minds of the audience and makes it much easier for them to relearn what they thought they knew about data quality.

  • I start by explaining the difference between data and information, and between data quality and information quality. Most people know the difference, but surprisingly many need to hear it again.
  • I proceed to explain exactly which typical data quality problems the business and IT each experience, and why we often see them blaming each other. A data quality problem for IT may not be a problem for the business.
  • Finally I explain the six different categories, or dimensions, of data quality. These make it much easier to agree on a common definition throughout the organization.
    1. Completeness
    2. Conformity
    3. Consistency
    4. Duplication
    5. Integrity
    6. Accuracy
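
As a rough illustration of how the six dimensions might be turned into concrete checks that both the business and IT can read, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the sample records, the field names, the rule chosen for each dimension and the postal code reference table. Your own organization would agree on its own rules per dimension.

```python
import re
from collections import defaultdict

# Hypothetical sample data; field names and values are illustrative only.
customers = [
    {"id": 1, "name": "Anna Berg", "email": "anna@example.com",
     "country": "SE", "city": "Stockholm", "postal_code": "11120"},
    {"id": 2, "name": "anna berg ", "email": "ANNA@example.com",
     "country": "Sweden", "city": "Stockholm", "postal_code": "11120"},
    {"id": 3, "name": "Carl Ek", "email": "",
     "country": "SE", "city": "Malmo", "postal_code": "41101"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "quantity": 2, "unit_price": 50, "total": 100},
    {"order_id": 11, "customer_id": 9, "quantity": 1, "unit_price": 30, "total": 45},
]
# Assumed trusted reference used for the accuracy check.
POSTAL_REFERENCE = {"11120": "Stockholm", "41101": "Gothenburg"}


def completeness(rows, required=("name", "email")):
    """Customers with a missing or blank required field."""
    return [r["id"] for r in rows
            if any(not str(r.get(f) or "").strip() for f in required)]


def conformity(rows):
    """Customers whose country is not a two-letter uppercase code."""
    return [r["id"] for r in rows if not re.fullmatch(r"[A-Z]{2}", r["country"])]


def consistency(rows):
    """Orders whose total disagrees with quantity * unit_price."""
    return [r["order_id"] for r in rows
            if r["total"] != r["quantity"] * r["unit_price"]]


def duplication(rows):
    """Groups of customer ids sharing the same normalized name and email."""
    groups = defaultdict(list)
    for r in rows:
        key = (r["name"].strip().lower(), r["email"].strip().lower())
        groups[key].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def integrity(order_rows, customer_rows):
    """Orders that reference a customer id that does not exist."""
    known = {c["id"] for c in customer_rows}
    return [o["order_id"] for o in order_rows if o["customer_id"] not in known]


def accuracy(rows, reference=POSTAL_REFERENCE):
    """Customers whose city disagrees with the trusted postal code reference."""
    return [r["id"] for r in rows if reference.get(r["postal_code"]) != r["city"]]


report = {
    "completeness": completeness(customers),
    "conformity": conformity(customers),
    "consistency": consistency(orders),
    "duplication": duplication(customers),
    "integrity": integrity(orders, customers),
    "accuracy": accuracy(customers),
}

for dimension, failures in report.items():
    print(f"{dimension}: {failures}")
```

Note that the same record can pass one dimension and fail another: in this made-up data, customer 3 has a perfectly conforming country code but is both incomplete and inaccurate. That is one reason a shared vocabulary helps when the business and IT discuss which problems matter.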

These simple steps are extremely helpful when bringing the business and IT together to discuss data quality. In fact, any data quality initiative should start with this obvious but important task: finding a common definition of data quality.

Oh, by the way, the actual lyrics to “Let’s Call the Whole Thing Off”, as sung by Louis Armstrong, are “You like potato and I like potahto”…

