Setting up a Front-end Data Quality Firewall


In a project with an international vendor some years ago, I introduced the concept of splitting the Data Quality Firewall (DQF) into a Frontend and a Backend Data Quality Firewall. These terms are spreading, and I get questions on how to set up the Frontend DQF; the latest came just this week via Twitter. My focus here is not the technical side, but the usability and the reward for operatives and companies.

Why is the Frontend DQF important?

I participated in the Information Quality Conference in London, where it was stated that 76% of poor data is created in the data entry phase. Being proactive in the data entry phase, instead of reactive (sometimes, if ever) later, will take you a long way toward good and clean data.

Elements of the Frontend DQF.

First, identify in which systems data is created. It may be a variety of systems: CRM, ERP, Logistics, Booking, Customer Care, just to mention a few.

Error-tolerant, intelligent search in Data Entry systems.

Operatives have been taught by Google and other search engines to go directly to the search box to find information. But when you search in customer entry systems, you very often do not find the customer. To address this, you need error tolerance and intelligence in your search functionality, as well as a suggestion feature. This will help you find the entry despite typos, spelling variants, mishearings and sloppiness, and it will be the biggest contributor to cleaner data. A spinoff is higher employee and customer satisfaction due to more efficient work.
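To illustrate, here is a minimal sketch of error-tolerant matching, using Python's standard difflib as a stand-in for a real search engine. The customer names and the 0.6 cutoff are made-up examples:

```python
import difflib

# Example customer records (made up). A misspelled query should still
# surface the right customer instead of forcing a new, duplicate entry.
customers = ["Kristin Johansen", "Christian Johnson", "Kirsten Hansen"]

def tolerant_search(query, names, cutoff=0.6):
    """Return the closest existing names despite typos and spelling variants."""
    return difflib.get_close_matches(query, names, n=3, cutoff=cutoff)
```

A real CRM search engine adds phonetic matching and suggestions on top of this, but the principle is the same: close-enough hits beat "no results found".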

If you want to learn more about error tolerance and intelligent search, read these posts:
Making the case of error tolerance in Customer Data Quality
Is Google paving the way for actual CRM/ERP Search?
Checklist for search in CRM/ERP systems.

Data Correction Registration Module

If you did not find the customer and the operatives have to enter the data, you have to make sure the data entered is accurate. You can install a module or workflows that check and correct the information.

Most CRM systems will only find the customer if you do exact searches

Check against address vendors
If you have a subscription with an address vendor, you can send the query to them, and they will supply you with the most recently updated data. You can set this up so it is easy for the operative, and the data will be correctly formatted for your systems.

This is quite easy for one country. If you are an international company, laws and regulations differ from country to country. In addition, the price can add up if you want local address vendors in several countries. It is important that your registration module can communicate with the local vendors, then format and make the entry correctly into your database(s).

Correct the Data Formats

You might choose not to subscribe to online verification by an address vendor. There are still many checks you can do in the data entry phase. You can check:

– is the domain of the e-mail valid?
– is the format of the telephone number correct?
– is the mobile number really a mobile number?
– is the salutation correct?
– is the format of the address correct?
– is the gender correct?

Example of registration module
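As a sketch of what such a registration module could check, here is a minimal Python example. The rules are illustrative assumptions (8-digit national numbers, mobile prefixes 4 and 9, as in Norway), not a specific product's logic; real rules vary per country and should come from reference data:

```python
import re

# Assumed mobile number prefixes for this illustration.
MOBILE_PREFIXES = ("4", "9")

def check_email(email):
    """Rough shape check: a local part, '@', and a dotted domain."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}", email) is not None

def check_phone(number, digits=8):
    """Strip spaces and dashes, then check the expected digit count."""
    cleaned = re.sub(r"[ \-]", "", number)
    return cleaned.isdigit() and len(cleaned) == digits

def check_mobile(number):
    """A mobile field should hold a number from a mobile range."""
    cleaned = re.sub(r"[ \-]", "", number)
    return check_phone(cleaned) and cleaned.startswith(MOBILE_PREFIXES)
```

Each failed check can be shown to the operative immediately, so the correction happens while the customer is still on the line.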

Check for scams and fraud

You can check against:

– internal black lists
– sanction lists
– “Non Real Life Subjects”
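A minimal sketch of such screening, assuming simple exact matching after normalization; a real screening service would also match fuzzily against the official, regularly updated lists:

```python
# Screen a new entry against internal black lists and sanction lists.
# The normalization and exact matching are simplifying assumptions.

def normalize(name):
    """Lower-case and collapse whitespace so trivial variants still match."""
    return " ".join(name.lower().split())

def screen(name, black_list, sanction_list):
    """Return the red flags raised for a candidate entry."""
    flags = []
    candidate = normalize(name)
    if candidate in {normalize(n) for n in black_list}:
        flags.append("internal black list")
    if candidate in {normalize(n) for n in sanction_list}:
        flags.append("sanction list")
    return flags
```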

Duplicate check

Even though duplicates should have been found in the search, you should do an additional duplicate check when the entry is completed.
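Such a post-entry duplicate check could look like this sketch, using Python's standard difflib; the 0.85 similarity threshold is an illustrative assumption that would need tuning per data set:

```python
import difflib

# Compare the finished entry's name against existing records and return
# candidates similar enough to be duplicates.

def possible_duplicates(new_name, existing, threshold=0.85):
    """Return existing names similar enough to be duplicate candidates."""
    hits = []
    for name in existing:
        ratio = difflib.SequenceMatcher(None, new_name.lower(), name.lower()).ratio()
        if ratio >= threshold:
            hits.append(name)
    return hits
```

Any hit can then be routed to a steward or shown to the operative before the record reaches the database.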

If you incorporate these solutions, you should be able to ensure that the data you enter is clean and correct. It should be possible to get all of this from one vendor. Then you can use the Backend DQF to handle the cleansing of deteriorating existing data.


Could the 5 Billion CO2 Quota fraud have been avoided?


There has been a 5 billion euro VAT fraud across Europe in the trading of CO2 quotas. I repeat: 5 billion euros. The Danish CO2 register has been an important part of the trading carousel. The fraud is done by selling quotas across borders, claiming and not claiming VAT, and letting companies go bankrupt.

Why did the Danish CO2 register become so important in the fraud?
When companies registered, the registry did not check whether the information given, such as address, phone number and e-mail, was correct. One result is the example of Alim Karakas, who is registered at the non-existing address “Studsgade 1 in Copenhagen” in both the Danish and the French registries. Another example is Avni Hysenaj, who is registered at the non-existing address Engelbolden 26 in Hvidovre. The source of this information is the Danish paper Ekstrabladet, which broke the story.

In addition, companies with prior fraud convictions, and companies that the Danish state has decided to terminate for missing VAT filings and other irregularities, have been accepted as quota traders.

The spokesperson of the Danish registry told Ekstrabladet: “We do not have the possibility to check the addresses given to us.” This is quite amazing. She must believe that the only way to check the information is to check each entry manually.

How could this be detected?
The registry could set up a Data Quality Firewall that automatically checks every registrant against reference data available in Denmark. This is neither difficult nor expensive. You would get a red flag when the address did not exist, or when the company was not registered at the given address.
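A sketch of that automatic red-flag check; the reference data below are illustrative stand-ins for the real national registries:

```python
# Flag registrants whose address is unknown in reference data, or whose
# company is not registered at the given address.

def red_flags(company, address, known_addresses, company_addresses):
    """Return the red flags for one registration attempt."""
    flags = []
    if address not in known_addresses:
        flags.append("address does not exist")
    elif company_addresses.get(company) != address:
        flags.append("company not registered at this address")
    return flags
```

Run at registration time, such a check would have caught the non-existing addresses before any trading account was opened.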

If the law allowed it, you could also check for previous fraud convictions, or whether the government had put the company up for termination.

Check for Terrorism Funding?
In addition, I would have checked against the “Consolidated list of persons, groups and entities subject to EU financial sanctions”. In a scam this big, there might be terrorists looking for a way to fund their operations. Checking against the EU sanction list is, again, neither difficult nor expensive.

As the EU writes: “The correct application of financial sanctions is crucial in order to meet the objectives of the Common Foreign and Security Policy and especially to help prevent the financing of terrorism. The application of financial sanctions constitutes an obligation for both the public and private sector.”

In addition, I would set up the usual Fraud Detection to look for deviant spellings of names and addresses, and for “Non Real Life Subjects”.

I do not say the fraud would have been avoided entirely, but maybe it would not have amounted to a staggering 5 billion euros.

Increase your revenue by 66% with Data Quality

I stumbled on an interesting article covering research from SiriusDecisions about how best practices in Data Quality can boost revenue by 66%. Best practice in Data Quality has earlier been proven to be a key to success when implementing MDM solutions.

I have tried to show the cost of poor Data Quality, and it is good to show the benefit of optimal Data Quality.

There are several key areas where superior data management can have discrete benefits, according to the report. These follow the SiriusDecisions Demand Creation Waterfall methodology:

  • From inquiry to marketing-qualified lead: It’s most cost-effective to manage data at this early stage, rather than let flawed information seep through the organization. A data strategy that solves conflicts at the source can lead to a 25 percent increase in converting inquiries to marketing-qualified leads.
  • From marketing-qualified lead to sales-accepted lead: Bad source data is compounded by the use of multiple databases and formats, leading to distrust of marketing’s work by sales. Unifying the data, whether into one database or by using technology for virtual integration, can lead to a 12.5 percent uplift in conversion rates to the next stage.
  • From sales-accepted lead to sales-qualified lead: Scoring becomes important at this stage, as the sales team goes to work on the leads it can use — and returns others to the marketing team for further nurturing. Clean data can reduce by 5 percent the time spent conducting the kind of additional research that precedes initial contact with a prospect.
  • From sales-qualified lead to close: The benefits seen between sales qualification and close magnify those accumulated during the previous stages, as salespeople continually update the status and disposition of the potential customers. “Given that the average field-marketing function spends no more than 10 percent of its budget in support of this final conversion, accurate data is a must for applying the right tools and resources to the right audience at the right stage of the buying cycle,” Block writes. A single system of record to keep marketing and sales on the same page — cultivated by timely updates by all involved parties — is critical.

The impact of these abstract concepts — the true value of data management — becomes quite clear as soon as real numbers are applied: From a prospect database of 100,000 names, an organization utilizing best practices will have 90,000 usable records versus a typical company’s 75,000; at every stage thereafter, the strong company has a larger pool of prospects with a higher probability of closing. In the end, SiriusDecisions can show 66 percent more revenue for the company with high-quality data management.
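The arithmetic can be reproduced in a few lines. Only the pool sizes (90,000 vs 75,000 usable records out of 100,000) come from the report as quoted above; compounding the stage uplifts is my own illustrative sketch, not SiriusDecisions' model:

```python
# Worked example of the quoted numbers.
best = int(100_000 * 0.90)      # best-practice organization: 90,000 usable records
typical = int(100_000 * 0.75)   # typical organization: 75,000 usable records

# Identical downstream conversion already gives the clean-data company
# a 20% larger prospect pool...
pool_advantage = best / typical - 1

# ...and stacking uplifts of the quoted size (25% and 12.5%) on top
# lands in the same range as the reported 66% revenue advantage.
compounded = (1 + pool_advantage) * 1.25 * 1.125
```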

This shows me that the Data Quality Firewall and the new concepts I introduced in September 2008 are the best way to optimize the data. The earlier you detect and correct poor data, the higher your revenue will be.

Another article about ignorance of Data Quality

There is another article in IT-Pro which deals with the ignorance of data quality issues.

Some excerpts that make you think:

Nearly three in five (58 per cent) of UK executives surveyed said they could not confirm that a documented strategy exists to keep their contact data accurate and up-to-date.

Despite this, nearly all (96 per cent) recognise that inaccurate data has a direct financial impact on their operations, with 19 per cent admitting to it having a negative impact on revenue or funding.

Another interesting factor they mention is this:

Only eight per cent of organisations validate all the information they collect, while 34 per cent validate none of the information they collect and enter into their systems.

A good solution might be to install a proper Data Quality Firewall.

Introducing new Thoughts and Concepts of Data Quality Firewall

Data Quality Firewall (DQF)

I have lately worked on a project involving the Data Quality Firewall of an international corporation. In this process we have tried to determine the best set-up of the DQF and where it should be placed.

Since there may be several definitions of a Data Quality Firewall, I use the definition in Wikipedia:

“A Data Quality Firewall is the use of software to protect a computer system from the entry of erroneous, duplicated or poor quality data. Gartner estimates that poor quality data causes failure in up to 50% of Customer relationship management systems.

Older technology required the tight integration of data quality software, whereas this can now be accomplished by loosely coupling technology in a service-oriented architecture (SOA).”

The New Concept of the Data Quality Firewall:
The firewall will be set up as a workflow process that does all necessary checks and interpretation, allowing only correct and accurate data to enter the database. The workflow will be set up as an integrated process across different systems and databases, based on SOA.

The Data Quality Firewall can be set up at different places to serve different needs. We will also introduce new concepts in the Data Quality Firewall thinking:

A. The “Backend Data Quality Firewall”, the most commonly used today
B. The “Frontend Data Quality Firewall”, set up in the data entry phase
C. The “Double Data Quality Firewall”, which will ensure the best data quality.

The Data Quality Firewall could include processes like:

A detailed status may be created per record; the detailed statuses may then be analyzed and summarized into a status overview.

All corrections could be compiled into an interactive report to the supervisor.
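As a sketch, the per-record detailed status could be collapsed into an overview like this; the record IDs and status codes are illustrative assumptions:

```python
from collections import Counter

# Collapse per-record issue lists into counts per issue type, giving
# the supervisor a one-glance status overview.

def summarize(statuses):
    """statuses maps record ID -> list of detailed issues for that record."""
    overview = Counter()
    for record_id, issues in statuses.items():
        if not issues:
            overview["clean"] += 1
        for issue in issues:
            overview[issue] += 1
    return overview
```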

Backend Data Quality Firewall

This is the most commonly used Data Quality Firewall in the market today. The checking is not done in the data entry phase, but when the data is transferred from temporary databases to the Master Database.

Even though faulty data is entered in the data entry phase, the Backend Data Quality Firewall will be set up to prevent irregular data from entering the Master Database or other relevant databases. The workflows will be set up individually for each customer, to optimize the firewall according to the nature of the data from each individual system, operator, web channel and customer service.

The reason for setting up the Backend Data Quality Firewall first is to put the protection as close to the Master Data as possible.

Frontend Data Quality Firewall

As mentioned, the reason for setting up the Backend Data Quality Firewall first is to put the protection as close to the essential data as possible. The challenge is that you then place the firewall far from where the dirty data is created. Dirty and faulty data is often created in the data entry phase. The reasons for this can be many:

  • Operatives cannot find the right customer and re-enter the record
  • In a highly commission-based business, operatives fight for their customers. Customers can be entered with a twist in the name, accidentally or on purpose. Either way, the operatives will fight for the ownership and the commission.
  • Inaccurate or incomplete data can be entered in required fields, just to move on to the next customer

If the Firewall is put in the data entry phase, the amount of dirty data will be drastically reduced. The Firewall will consist of the workflows individually set up for each center/country, plus FACT-Finder Address Search. The results will be:

  • The error-tolerant search with FACT-Finder ensures that the operatives find the right customer instantly. This saves time in the search, with no need to spend time registering the customer anew. Operatives will get higher job satisfaction and be able to handle more calls.
  • If an operative tries to enter a customer with a little twist in the name, they will get a message saying “possible duplicate found, do you wish to continue?”. From that window, the operative may jump directly to the found duplicate and continue working with that identified record. If the operative overrides this message, it will be difficult to argue that it was by accident. This will lead to less infighting between operatives, higher job satisfaction and less double commissioning.
  • Workflows will be set up to check the incomplete and inaccurate data.
  • A monitoring service can also be set up to see whether the operatives use the tools available to them. If an operative overrides a duplicate with a “secure match”, the action can be logged or sent to a data quality steward to check the quality of the matching or of the operative’s work.
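Such a monitoring service could be sketched as follows; the logger name, the operative IDs and the 0.95 “secure match” threshold are assumptions for illustration:

```python
import logging

# Log override events so a data quality steward can review them later.
logger = logging.getLogger("dq.monitor")

def record_override(operative, match_score, secure_threshold=0.95):
    """Return True (and log a warning) when a secure match was overridden."""
    if match_score >= secure_threshold:
        logger.warning("Operative %s overrode a secure match (score %.2f)",
                       operative, match_score)
        return True
    return False
```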

FACT-Finder Address Search will be implemented as an integrated part of the CRM/ERP system, whether it is developed in-house or comes from outside vendors like SuperOffice, Microsoft CRM, Siebel or others.

With the Firewall set up in the data entry phase, the data sent from the CRM/ERP system will be considerably cleaner. Setting up individual workflows is easier and more secure when it is optimized in the centers of data entry.

A Double or Multiple Firewall(s)

In the old days you built more than one wall around the castle to protect yourself. Our idea is to set up a Data Quality Firewall wherever different needs have to be addressed in different ways.

One could set up the 1st Firewall in the frontend and optimize its workflows to deal with the challenges that come in the data entry phase. The 1st Firewall can interact with the user and is therefore a powerful solution for data quality (the human factor). The challenges addressed will be:

Correcting basic errors such as:

  • Finding the right customer
  • Incomplete data
  • Data in the wrong field. Examples: first name in the last name field, or mobile number in the fixed net field.
  • Right salutation and gender.
  • If a duplicate is entered in spite of the search function, the duplicate will be matched to the original record, with a notification to a supervisor that a duplicate was entered.

The 2nd Firewall will be set up in the backend, and its workflow will be set to deal with the challenges that come with transferring large amounts of data from the front end to the Master Database, or to optimized databases for CRM, ERP or other specialized systems. It will work without interaction from users.

Focus of the 2nd Firewall:

  • Settling the “Echo Problem”
  • Building the Customer Hierarchies
  • Worldbase Matching
  • Advanced PAF cleansing and De-duping

The Double Firewall is highly efficient and will provide the best ROI and results of these solutions.

If you have thoughts and ideas about these concepts, please feel free to contact me!