Do you remember your last Easter egg hunt? Some eggs were easy to find. For others, you had to search longer, and maybe you needed a few hints. But it was worth searching until every egg was found. Otherwise, you ended up with fewer sweets or, if they were real eggs, they would rot in the summer heat.
It is the same with errors hidden like Easter eggs in your data. If you do not find them, you may miss out on potential profits or cause a problem worse than just a bad odor.
Some data errors are easy to detect and can be avoided right at data entry if you define the field formats correctly. Fields for numeric values should only allow numbers. E-mail addresses should always contain an “@” and at least one period in the domain name, and they must meet a few other technical rules that can be checked. To make these field checks possible, provide a separate field for each kind of information. Having the company name and legal form in separate fields, for example, allows you to use a list of allowed values for each field. Likewise, you can only check the elements of an address if they are stored separately. This may need a little more logic because the correct format of a ZIP code, for example, depends on the country.
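The field checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete validator: the function names, the deliberately simple e-mail pattern, and the three postal code formats are assumptions chosen for the example, and a real system would need rules for many more countries.

```python
import re

# Deliberately simple pattern: text before the "@", and a domain
# containing at least one period. Real e-mail rules are stricter.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s.]+(\.[^@\s.]+)+")

# Illustrative postal code formats (assumption: only three countries).
POSTAL_PATTERNS = {
    "DE": re.compile(r"\d{5}"),
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "NL": re.compile(r"\d{4} ?[A-Z]{2}"),
}


def is_valid_email(value: str) -> bool:
    """Return True if the value looks like an e-mail address."""
    return EMAIL_PATTERN.fullmatch(value) is not None


def is_valid_postal_code(country: str, code: str) -> bool:
    """Check a postal code against the format of its country."""
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        return False  # unknown country: the code cannot be verified
    return pattern.fullmatch(code) is not None


print(is_valid_email("jane@example.com"))      # True
print(is_valid_email("jane@example"))          # False
print(is_valid_postal_code("DE", "10115"))     # True
print(is_valid_postal_code("US", "1011"))      # False
```

Note that the postal code check only works because the country is stored in its own field, which is exactly the point of keeping address elements separate.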
Some errors can’t be avoided entirely at entry, but with the right tools you can check for them and correct them afterwards. For instance, you can check rarely used values for particular attributes. If you find “ultramarine” only once, maybe you want to change it to “navy” or just “blue” so that your wording is consistent with your other attribute values. If you use a list of values and the most used entry is “others,” you should think about adding further values to the list. This makes the data easier to find and your reporting more meaningful.
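A check for rarely used values can be as simple as counting how often each value occurs. The sketch below assumes the attribute values are available as a plain list of strings; the function names and the sample data are made up for illustration.

```python
from collections import Counter


def count_values(values):
    """Count attribute values, ignoring case and surrounding spaces."""
    return Counter(v.strip().lower() for v in values)


def rare_values(values, max_count=1):
    """Return the values that occur at most max_count times."""
    counts = count_values(values)
    return sorted(v for v, n in counts.items() if n <= max_count)


def most_common_value(values):
    """Return the most frequent value, or None for an empty list."""
    counts = count_values(values)
    return counts.most_common(1)[0][0] if counts else None


colors = ["blue", "navy", "blue", "ultramarine", "navy", "Blue"]
print(rare_values(colors))  # ['ultramarine']

categories = ["others", "tools", "others", "others", "paint"]
if most_common_value(categories) == "others":
    print("Consider adding further values to the list.")
```

The rare values are candidates for review, not automatic corrections: someone still has to decide whether “ultramarine” should become “navy” or “blue.”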
Sometimes you need help from others. When checking address data, you can use tools like Loqate, or you can use Google Maps right at data entry. For product data, you can use data pools like GS1. European VAT numbers can be checked via an interface to the European Commission's website (VIES).
The difference between an egg hunt and your data lies in duplicates. Finding similar eggs is no problem: the more eggs, the better. But for your data, you seek uniqueness and a single version of the truth. Otherwise, you distort your sales reporting and big data analytics. Having a customer twice in the system ruins the single customer view and undermines your credit limit checks. Having suppliers stored twice might cost you rebates. Even a customer and a supplier can be just two roles of the same party, which should be consolidated to avoid trouble. Having duplicate materials or products means wasted storage space, unnecessary out-of-stock situations, or confusion for your customers.
To solve the problem, you can use deduplication tools, which work well with standardized data. It can also help to classify your materials according to catalogues like ECLASS or UNSPSC. You can consolidate data from different data sources into a golden record. Once you have set up rules for survivorship, your Master Data Management system can do this automatically.
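To illustrate how standardization, duplicate matching, and survivorship fit together, here is a minimal Python sketch. It is not how any particular Master Data Management system works. The field names, the short list of legal forms, and the survivorship rule (the newest non-empty value wins for each field) are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumption: a tiny list of legal forms to ignore when matching names.
LEGAL_FORMS = {"gmbh", "ag", "inc", "ltd", "llc"}


def match_key(record):
    """Standardize name and country so that duplicates get the same key."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    words = [w for w in name.split() if w not in LEGAL_FORMS]
    return (" ".join(words), record["country"].upper())


def golden_record(duplicates):
    """Survivorship rule (assumption): newest non-empty value wins."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    ordered = sorted(duplicates, key=lambda r: r["updated"], reverse=True)
    fields = {field for r in duplicates for field in r}
    return {
        field: next((r[field] for r in ordered if r.get(field)), None)
        for field in fields
    }


def consolidate(records):
    """Group records by match key and build one golden record per group."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [golden_record(group) for group in groups.values()]


customers = [
    {"name": "Acme GmbH", "country": "de", "phone": "", "updated": "2023-01-10"},
    {"name": "ACME", "country": "DE", "phone": "+49 30 1234", "updated": "2022-05-01"},
    {"name": "Bolt Ltd.", "country": "GB", "phone": "", "updated": "2021-03-03"},
]
for customer in consolidate(customers):
    print(customer["name"], customer["phone"])
```

In this example, the two Acme entries collapse into one record that keeps the newer name and fills the missing phone number from the older entry. Exact key matching like this only catches duplicates that standardize to the same key; real deduplication tools usually add fuzzy matching on top.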
You see, there is a lot to do, but having clean data lets you enjoy the Easter holiday and go on the egg hunt with a light heart.