Software supermarkets or specialized vendors?


Some customers are worried about the complexity of the data quality challenge. Understandably so, since they hear about a lot of projects that have failed. There are several reasons for this, but an important factor is using the right tool. If you want to loosen a nut from a bolt, you can reach for several different tools.

Some use pliers, which is almost certain to fail.  Others use an adjustable wrench.  This might work, but if it does not grip tightly enough, there is a big chance the nut will be rounded and become very hard to get off. That creates a lot of extra work, and the bolt cannot be reused.  If you use the correct wrench, the nut comes off and can be reused.

Let me transfer this to the Data Quality world.  Way back, I was approached by a prospect whose MDM project had hit the wall.  They had spent over a year on it so far.  They had used all the products from the Microsoft enterprise supermarket and built some elaborate business rules together with external consultants.  In this way they had found about 23% of the duplicates.  Their problem was that they could see there were more, but they could not catch them.  They sent the data over, and a couple of days later we could send the results back, with an additional 27% of duplicates found in the cleansed data.  This shows the power of specialized tools.

This is the result from Google when I searched for Data Quality Tools

There are some great software supermarkets out there, and they often offer excellent products.  Customers often want as few vendors as possible, and one-stop shopping if possible.  You build special competence on that vendor's products, and it can be cost efficient.  One challenge with this strategy is that you might miss out on a specialized product that can be critical for your success. An example: Gartner estimates that 50% of CRM/ERP installations fail due to poor data quality and integration.

Data might be the most important asset in your company; don't you want the best available product to handle this precious asset?  If I had the king over for dinner, and I knew there was a 50% chance of failure if I used the meat from the supermarket, I would definitely go to the best butcher in town.  90% of the purchase would be from the supermarket and the last 10% from the specialized vendor.  One does not exclude the other.

These challenges are not limited to the Data Quality field; they are just as present in the e-Commerce world.

Setting up a Front-end Data Quality Firewall


In a project with an international vendor some years ago, I introduced the concept of splitting the Data Quality Firewall (DQF) into a Frontend and a Backend Data Quality Firewall. These terms are spreading, and I get questions on how you should set up the Frontend DQF; the latest query came just this week via Twitter. My focus here is not on the technical side, but on the usability and the reward for operatives and companies.

Why is the Frontend DQF important?

I participated in the Information Quality Conference in London, where it was stated that 76% of poor data is created in the data entry phase. Being proactive in the data entry phase, instead of reactive (sometime, if ever) later, will take you a long way towards good, clean data.

Elements of the Frontend DQF.

First, identify in which systems data is created. It may be a variety of systems like CRM, ERP, Logistics, Booking and Customer Care, just to mention a few.

Error tolerant intelligent search in Data Entry systems.

Operatives have been taught by Google and other search engines to go directly to the search box to find information. When you search in customer entry systems, very often you do not find the customer. To fix this you need error tolerance and intelligence in your search functionality, as well as a suggestion feature. This will help you find the entry despite typos, different spellings, hearing differences and sloppiness, and it will be the biggest contributor to cleaner data. A spinoff is higher employee and customer satisfaction due to more efficient work.
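To illustrate the idea, here is a minimal sketch of error-tolerant name search using Python's standard-library difflib. The customer list and the 0.7 similarity cutoff are my own illustrative choices, not taken from any particular CRM product:

```python
import difflib

# Hypothetical customer index; in a real system this would come from the CRM database.
customers = ["Julia Louis-Dreyfus", "Christopher Quist", "Anna Berg"]

def fuzzy_search(query, names, cutoff=0.7):
    # get_close_matches tolerates typos, transposed letters and missing characters,
    # returning the closest names at or above the similarity cutoff.
    return difflib.get_close_matches(query, names, n=5, cutoff=cutoff)

# An exact search for this misspelling would find nothing; the fuzzy search does.
print(fuzzy_search("Julia Luis Dreyfus", customers))
```

A real product would add phonetic matching and suggestions as you type, but even this tiny sketch finds the customer despite a dropped letter and a missing hyphen.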

If you want to learn more about error tolerance and intelligent search, read these posts:
Making the case of error tolerance in Customer Data Quality
Is Google paving the way for actual CRM/ERP Search?
Checklist for search in CRM/ERP systems.

Data Correction Registration Module

If you did not find the customer and the operatives have to enter the data, you have to make sure the data entered is accurate. You can install a module or workflows that check and correct the information.

Most CRM systems will only find the customer if you do exact searches

Check against address vendors
If you have a subscription with an address vendor, you can send the query to them, and they can supply you with the most recently updated data. You can set it up so it is easy for the operative, and the data will be correctly formatted for your systems.

This is quite easy for one country. If you are an international company, the laws and regulations differ from country to country. In addition, the price can add up if you want local address vendors in several countries. It is important that your registration module can communicate with the local vendors, then format and make the entry correctly into your database(s).

Correct the Data Formats

You might choose not to subscribe to online verification by an address vendor. There are still many checks you can do in the data entry phase. You can check:

– is the domain of the e-mail valid?
– is the format of the telephone number correct?
– is the mobile number really a mobile number?
– is the salutation correct?
– is the format of the address correct?
– is the gender correct?
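A couple of the checks above can be sketched in plain Python. The regular expressions below are deliberately simplified illustrations, not production-grade validators:

```python
import re

def format_errors(entry):
    """Return the format problems found in a registration entry.
    The rules below are deliberately simplified illustrations."""
    errors = []
    # e-mail: something@domain with at least one dot in the domain part
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", entry.get("email", "")):
        errors.append("e-mail format")
    # telephone: optional leading +, then 8-15 digits once separators are stripped
    digits = re.sub(r"[ \-().]", "", entry.get("phone", ""))
    if not re.fullmatch(r"\+?\d{8,15}", digits):
        errors.append("telephone format")
    return errors

print(format_errors({"email": "ola@example.no", "phone": "+47 22 33 44 55"}))  # []
```

Checks like "is this mobile number really a mobile number?" depend on country-specific numbering plans, which is exactly where a local reference-data vendor earns its keep.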

Example of registration module

Check for unwanted scam and fraud

You can check against:

– internal black lists
– sanction lists
– “Non Real Life Subjects”

Duplicate check

Even though duplicates should have been found in the search, you should do an additional duplicate check when the entry is done.
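A minimal sketch of such a post-entry duplicate check, again using the standard library's fuzzy comparison; the 0.85 threshold is an illustrative assumption, not a recommended setting:

```python
from difflib import SequenceMatcher

def probable_duplicate(new_name, existing, threshold=0.85):
    """Return the first existing name similar enough to be a likely duplicate."""
    # normalise case and whitespace before comparing
    norm = " ".join(new_name.lower().split())
    for name in existing:
        if SequenceMatcher(None, norm, " ".join(name.lower().split())).ratio() >= threshold:
            return name
    return None

print(probable_duplicate("Christopher Qvist", ["Anna Berg", "Christopher Quist"]))
```

A production check would also compare address and phone fields, since two different people can legitimately share a name.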

If you incorporate these solutions, you should be able to ensure that the data you enter is clean and correct. It should be possible to get it all from one vendor.  Then you can use the Backend DQF to handle the cleansing of deteriorating existing data.

Is Google paving the way for actual CRM/ERP Search?

[tweetmeme source=”jeric40” http://]

Since 1992, CRM has been a big help in structuring my sales. A consistent challenge I have had with various CRM systems was finding the customer in the system. I might have heard the customer's name wrong, spelled it wrong, made typos, or my search with *.* did not work.  Frustrating, since I knew the customer was in there.

Most CRM systems will only find the customer if you do exact and correct searches

It was even more frustrating when I worked in a big computer company, and my coworkers entered my customer with slightly different spelling, and stole my commission.

When you do not find the customer, you enter them again and the Data Quality mess starts:

– Wrong reports and predictions

– Frustrated workers

– Frustrated customers

– Higher costs, lower conversion rates, and your image takes a hit.

Three years ago I predicted that search would become an important part of CRM. With an error-tolerant search, you could find your customer instantly. I was wrong then, but finally the trend is here.  This year I have already had several inquiries about CRM search.

Google is transforming CRM Search

Google has transformed the way we have used the web over the last 10 years.  We now automatically go to the search box to find the information we want, not only on the world wide web but also:

– In online shops

– On intranets

– In onsite search on government sites

– On Facebook, which has moved its search to the center, acknowledging this is the best place for it.

“Our employees are so frustrated that they cannot find the information in the CRM system” is the reason I hear when they contact me for better CRM search.

I think we will see this in the future of CRM search:

– One central search box that searches across all fields

– Error tolerance

– A suggest feature to help you complete the search

– Connections to external vendors to automatically import a customer if it is not already in the CRM system.

You can see my CRM Search Checklist here, and a demo video I have made about the future features of the CRM Search.

Most poor data is created in the data entry phase. With a good search and registration system, this can be minimized.

Do you agree with my view on the CRM Search future?

Could the 5 Billion Euro CO2 Quota fraud have been avoided?


There has been a 5 billion Euro VAT fraud across Europe in the trading of CO2 quotas. I repeat: 5 billion Euros. The Danish CO2 register has been an important part of the trading carousel. The fraud is done by selling quotas across borders, alternately claiming and not claiming VAT, and letting companies go bankrupt.

Why did the Danish CO2 register become so important in the fraud?
When companies have registered, the registry has not checked whether the information is correct concerning address, phone number, e-mail etc. One result is the example of Alim Karakas, who is registered with the non-existing address "Studsgade 1 in Copenhagen" in both the Danish and the French registry. Another example is Avni Hysenaj, who is registered at the non-existing address Engelbolden 26 in Hvidovre. The source of this information is the Danish paper Ekstrabladet, which broke the story.

In addition, companies with prior fraud convictions, and companies the Danish state has ordered terminated for missing VAT filings and other irregular actions, have been accepted as quota traders.

The spokesperson of the Danish registry tells Ekstrabladet: "We do not have the possibility to check the addresses given to us." This is quite amazing.  She must believe that the only way to check the information is to manually verify each entry.

How could this be detected?
The registry could set up a Data Quality Firewall. It could automatically check every registrant against reference data available in Denmark. This is neither difficult nor expensive. You would get a red flag when the address did not exist, or when the company was not registered at the given address.

If the law would allow it, you could also check for previous fraud, or if the government had put it up for termination.

Check for Terrorism Funding?
In addition, I would have checked against the "Consolidated list of persons, groups and entities subject to EU financial sanctions". In a scam this big, there might be some terrorists looking for a way to fund their operations. Checking against the EU sanction list is, again, neither difficult nor expensive.

As the EU writes: "The correct application of financial sanctions is crucial in order to meet the objectives of the Common Foreign and Security Policy and especially to help prevent the financing of terrorism. The application of financial sanctions constitutes an obligation for both the public and private sector."

In addition I would set up the usual Fraud Detection to look for deviant spellings of names, addresses and for “Non Real Life Subjects”.

I am not saying the fraud would have been avoided entirely, but maybe it would not have amounted to a staggering 5 billion Euros.

Error tolerance on Hollywood Walk of Fame


Hmmm how is her name spelled?

Julia Louis-Dreyfus or Julia Luis Dreyfus ?

The correct answer is number one, but on her new star on the Holywood Wlak of Fame (or was it the Hollywood Walk of Fame?) they used the second one.   It is the first time the concept of Error Tolerance has been introduced on the Walk.

Typos and misspellings in the data entry phase are maybe the most important cause of poor data quality.  If I put Julia Louis-Dreyfus through a typo generator that takes care of the most common typos, like skipped letters, doubled letters, reversed letters, skipped spaces, missed keys or inserted keys, I get 319 different variations of her name.   Some random examples: Julia Lois-Dreyfus,  Julia Louis-Dryefus, Julia Louis-Dryfus, Julija Loujis-Dreyfus and so on.
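A typo generator of this kind is easy to sketch. The variant rules below (skipped letters, doubled letters, transposed neighbours, skipped spaces) follow the categories mentioned above, though my exact count will differ from the 319 in the post:

```python
def typo_variants(name):
    """Generate common typo variants of a name:
    skipped letters, doubled letters, transposed neighbours, skipped spaces."""
    variants = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])          # skip a letter
        variants.add(name[:i] + name[i] + name[i:])    # double a letter
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])  # transpose
    variants.add(name.replace(" ", ""))                # skip the spaces
    variants.discard(name)                             # keep only real variants
    return variants

v = typo_variants("Julia Louis-Dreyfus")
# Skipping the 'o' produces exactly the misspelling carved into the star.
print("Julia Luis-Dreyfus" in v)
```

A real generator would also model keyboard adjacency (missed and inserted keys) and phonetic substitutions, which is why the counts climb into the hundreds.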

Do your Customer Search and Data Quality systems cover this?

Read more here in English and here in Norwegian

Royal Wedding Data Quality Challenges


Now we have the proof: having blue blood in your veins is not a vaccine against the challenges of Data Quality.  H.M. Carl XVI Gustaf has now experienced this the hard way.  His daughter H.R.H. Crown Princess Victoria will marry Daniel Westling in Stockholm on June 19th, 2010.   The wedding will be grand, with many guests from near and far.

One invited guest is not welcome: the journalist Håkan Kjellberg.  His invitation has now been retracted, and he is no longer welcome at the wedding.  No, he has not written about Royal scandals or anything like that; he is simply the wrong Håkan Kjellberg.  The right Håkan is a colleague of Mr. Westling and works in his gym.

This is not so good when you think about the security the Royal Family is placed under. This common mistake could have been avoided with the proper Data Quality checks in place.

Here is an article about it in Swedish, and one in Norwegian.  You can use Google Translate to get the meaning!

Anyway, congratulations to the happy couple!

Detecting Scam and Fraud


It was a big story in the news yesterday how Norwegian Air Shuttle was scammed by their competitor Cimber Sterling.  Norwegian had put out super bargains in the Danish market to attract new customers. Cimber employees bought a huge number of tickets in the names of Anders And (the Danish version of Donald Duck), Alotta Fagina (a character from the Austin Powers movies) and Bjørn Kjos, the CEO of Norwegian.  The result was that Norwegian had a tremendous number of no-shows.

The question arises, could this be avoided?

The simple answer is yes.  With an automated Data Quality solution, this whole scam would have been stopped in the making. I don't know the full extent of the Cimber scam, but here are some of my assumptions based on what has come out in the media, and some red flags that should have been raised:

Red flag 1. “Non Real Life Subjects”
Data Quality solutions are set up to find the names people use to trick you. These can be:

– Cartoon characters like Donald Duck, Batman and Superman

– Film characters like Alotta Fagina and Clark Kent

– Random letters, like Aaaaa Bbbbb and Eeeeff Ghhhh

The DQ solution would have cleansed out these orders.
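As a sketch, such a check can combine a blacklist of known fictional names with a pattern test for random-letter entries. The list below is seeded only with the examples from this post and is obviously far from complete:

```python
import re

# Illustrative blacklist, seeded with the names mentioned in this post.
NON_REAL_LIFE = {"donald duck", "anders and", "alotta fagina",
                 "batman", "superman", "clark kent"}

def non_real_life_flags(name):
    """Return the red flags raised by a customer name, if any."""
    flags = []
    norm = " ".join(name.lower().split())
    if norm in NON_REAL_LIFE:
        flags.append("known fictional name")
    if re.search(r"(\w)\1\1", norm):      # runs of letters like 'Aaaaa' or 'Ghhhh'
        flags.append("repeated-letter pattern")
    return flags

print(non_real_life_flags("Anders And"))   # ['known fictional name']
print(non_real_life_flags("Aaaaa Bbbbb"))  # ['repeated-letter pattern']
```

Commercial solutions maintain much larger curated lists and add fuzzy matching, so "Donald Duk" does not slip through either.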

Red flag 2. Multiple use of same credit card.
It is said that a Cimber Sterling employee used the same credit card to book several hundred tickets over a short period of time. This should have raised concerns in the fraud detection team and been checked manually.

They should also cross-check information.  Is it natural that their CEO would use a Danish credit card to book his flights?
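A simple velocity check over booking timestamps can raise that red flag automatically. The threshold of 10 bookings per 24 hours below is a made-up illustration, not a real airline rule:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def flag_busy_cards(bookings, limit=10, window=timedelta(hours=24)):
    """Return card ids with more than `limit` bookings inside any `window`.
    bookings: iterable of (card_id, datetime) pairs."""
    by_card = defaultdict(list)
    for card, ts in bookings:
        by_card[card].append(ts)
    flagged = set()
    for card, times in by_card.items():
        times.sort()
        for i, start in enumerate(times):
            # count how many bookings fall within `window` of booking i
            inside = sum(1 for t in times[i:] if t - start <= window)
            if inside > limit:
                flagged.add(card)
                break
    return flagged

base = datetime(2010, 3, 1, 12, 0)
bookings = [("card-A", base + timedelta(minutes=m)) for m in range(12)]
bookings += [("card-B", base), ("card-B", base + timedelta(days=2))]
print(flag_busy_cards(bookings))  # {'card-A'}
```

Flagged cards would then go to the manual review queue rather than being rejected outright, since bulk bookings can also be legitimate (travel agencies, for instance).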

Red flag 3. Duplicate Check
DQ solutions run duplicate checks.  Unless the Cimber employees had a very vivid imagination for inventing names, I am sure they must have used some names more than once.   This should have been caught while ordering.

The Cimber Employees might have used a very common way of committing fraud.  Use variations and typos of the names.  Unless you have an error tolerant solution, it is difficult to catch these. In this post, I have explained how the name Christopher Quist can be written in almost 400 natural versions.

Red flag 4. Sanction List
I just mention this because I am sure Norwegian has a system for it due to legal requirements.  In DQ solutions we also check against the EU sanction list, to ensure that terrorists like Osama Bin Laden and other unwanted individuals cannot purchase from you.

It is normal to check all the above red flags when an order comes in.  Most red flag warnings will be handled by an automated process.  Dubious entries will go to an operator who handles them manually.

A Data Quality solution is a cheap way to insure your company against scams like the one encountered by Norwegian.  I think we will see more stories like this in the news, since most companies have not focused on Data Quality.  I have written more about this in an earlier post.

Most companies are not aware of the high cost of bad data quality; I hope this scam will help raise awareness.

Here is my post from yesterday about the scam

Here is an article in Norwegian about the scam.

The scam continues. Here is another article in Norwegian

Donald Duck tricked Norwegian Air Shuttle


The competition in the air is fierce, and Donald Duck entered the fight this weekend.

Norwegian Air Shuttle has entered the Danish market for international flights successfully.  Their next step is trying to become a big domestic player in Denmark.  They offered several thousand tickets at 1 DKK apiece, and they instantly sold out.

It seems it is their competitor Cimber Sterling that has bought most of the tickets. On one flight there were 118 “no shows”.  One no show seems to be “Anders And” which is the Danish name of Donald Duck.

Other no-shows were Alotta Fagina, from the Austin Powers movies, and the CEO of Norwegian, Bjørn Kjos.

In Data Quality sales, we use the name Donald Duck as an example to look for when people want to trick you, when we talk about fraud detection and data analysis. This scam is not new, but it can easily be detected and avoided with Data Quality solutions.

Read my follow up post. “Detecting Scam and Fraud”

Read more in Norwegian here:

With Google Translate:

Product Data Management is important for e-Commerce


Product Data Management is considered the most challenging field within Data Quality Management. There are more variables and attributes in Product Data Management than there are in Customer Data Management.

An example:

Who would think this is the same product?  They use different descriptions, measurements and standards.  The examples from e-Commerce are thankfully a little simpler and easier to solve.

If you want to save money in Data Quality Management, Product Data is the area with the most potential.  Just by standardizing, not even cleansing, you have a cost saving potential of 2.5%.

Product Data Management impacts e-Commerce.

Impact on Search Engine Optimization (SEO)
SEO experts I have talked to tell me that duplicate products and duplicate URLs are quite common in webshops.  Duplicate records and URLs are bad for several reasons:

– They have a negative impact on search rankings

– They water down the results in the index

– If others link to your products, the links will be split between the duplicates and the effect will not be as strong.

Cleansing your data and URLs will be an effective way to improve your SEO.

Recommendation Engine.
2010 could be the year of the recommendation engine. Amazon has used this for years, and there is a lot of buzz in the market for this feature now.  The increase in conversion rate and pages per visit is quite amazing.

You will get the best results if you only list each product once; otherwise the recommendations will be spread across several identical products and not be as strong.

Which is most likely to get the highest conversion rate: 3 identical products/URLs with a few recommendations each, or 1 product with several?

Usability and Image.
How will customers react if they find several entries for the same product?  Maybe they go to the next store, where they only find one entry, to be sure they purchase the right product.

It gives a better impression of your shop if your data is in order. It is a sign that the rest of your business is in order also.

These are just a few examples of the impact bad Product Data can have on your web shop. Some of the challenges described above can be solved without too much effort, whereas others will be more demanding.  Why don't you run a test of your data?

A couple of stories about Calendar troubles


Ohh – are there 53 weeks in 2009?

A little Data Quality glitch story from Norway again.  Skagerak Elektro, which provides the street lights in the municipality of Tønsberg, had not taken into account that 2009 has 53 weeks, not only 52.  So when the clock passed midnight on Sunday, the streetlights blacked out.  They had forgotten to fill week 53 into the operating system.

Not a big story, but maybe some Data Quality processes should be in place?
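The underlying ISO 8601 quirk is easy to demonstrate with Python's standard library: December 28 always falls in the last ISO week of its year, so it reveals whether a year has 52 or 53 weeks.

```python
import datetime

def iso_weeks_in_year(year):
    # December 28 is always in the last ISO week of its year (ISO 8601)
    return datetime.date(year, 12, 28).isocalendar()[1]

print(iso_weeks_in_year(2009))  # 53
print(iso_weeks_in_year(2010))  # 52
```

Any system that pre-fills a schedule for "weeks 1 to 52" should derive the week count like this instead of hard-coding it.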

Link to story in Norwegian:

Link to story with using Google Translate

Calling Information for 17 seconds costs 100 Euro

Daylight saving time ended on October 25, and the time was turned back from 03:00 to 02:00.  A Norwegian telecom provider could not cope with this.

One person who called directory information for the number of a taxi service talked for 17 seconds at 02:14:28, but the bill said 1 hour and 17 seconds and ran up to about 100 Euros.
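The safe way to bill call duration is to subtract timestamps kept in UTC (or epoch seconds) rather than local wall-clock time, which repeats an hour when the clocks go back. A minimal sketch, with made-up timestamps around the changeover:

```python
from datetime import datetime, timezone

# Store call start/end in UTC; local wall-clock time repeats 02:00-03:00
# when daylight saving ends, which is what can produce a 1-hour billing error.
start = datetime(2009, 10, 25, 0, 14, 28, tzinfo=timezone.utc)
end = datetime(2009, 10, 25, 0, 14, 45, tzinfo=timezone.utc)

duration = (end - start).total_seconds()
print(duration)  # 17.0 seconds, regardless of any local clock change
```

Subtracting the local times instead would have given either 17 seconds or 1 hour and 17 seconds depending on which side of the changeover the clock landed, which matches the bill in this story.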

The story in Norwegian here:

The story using Google Translate here.