Stop using Customer Relationship Management systems – and learn about possibilities to make dealing with customer information easier

March 9th, 2010 by Vincent van Hunnik


Have you ever tried to get contact details in and out of a CRM system, and ended up with a bigger mess? I have. The concept is easy: store all information about prospects and customers in one system, allowing you to have your communication efforts streamlined.

Reality, however, is harder: contact details entered on your website should be fed to the system automatically. Sending your periodic newsletter should be based on the details in your CRM system. Not to mention dealing with information on bounces. Integrating your CRM system(s) with mass mailing, campaign management and self service portals is helpful, but for some reason the major means of transporting lead and customer information still seems to be Excel… Leaving you with the necessity to mass import results, new contacts and changed information. Read the rest of this entry »

Adieu Marcel …..

March 2nd, 2010 by Jacques Baron


Everybody who has ever been on holiday in France has probably had a neighbour named Gaston, Jacques, Louis, Claire or Françoise. We are used to those first names; they evoke the “France profonde”, sleepy villages at the end of a road, films by Pagnol or Rohmer. Walks along the Seine in the shadow of “Notre Dame” in the spring. Coffee on a terrace on the Boulevard Saint-Germain, where an obsequious garçon named Marcel is looking at your girlfriend or wife in a way you do not really appreciate. This particular image of France is in danger. In a few years our entire frame of reference could have disappeared.

Nowadays French parents let their imagination run free when choosing first names for their children. Looking at recent entries in the civil registry, you will find rather unusual first names like Bulle, Héribert, Loeva, Hermès, Evolène, and Argan.
These first names have all kinds of origins. For example, they can be a combination of first names (Timéo, which is derived from Timothée and Théo), or they are different spellings of known first names (Lilou becomes Lee-Lou). We can also find names from Greek or Celtic mythology or even from literature, like Arwen, a character from the novel Lord of the Rings. Read the rest of this entry »

Fødselsnummer – Crossing centuries in Norway

February 15th, 2010 by Winfried van Holland

Norwegian Fødselsnummer examples

The Norwegian Fødselsnummer (birth number) is an 11-digit number that ends in 2 control digits. The 10th digit is a control digit calculated with a weighted modulo 11 variant over the first 9 digits. The 11th digit is a control digit calculated with another weighted modulo 11 variant over the first 9 digits combined with the 10th (control) digit.
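As an illustration of the two control digits, here is a minimal Python sketch. The weight rows are not given in this post; they are the commonly documented ones for the Fødselsnummer and should be treated as an assumption here, as should the function names, which are made up for this example.

```python
# Assumed weights (commonly documented, not stated in this post).
K1_WEIGHTS = (3, 7, 6, 1, 8, 9, 4, 5, 2)        # over digits 1-9
K2_WEIGHTS = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)     # over digits 1-10


def _mod11_digit(digits, weights):
    """Weighted modulo 11 control digit, or None if the result would be 10."""
    remainder = sum(w * int(d) for w, d in zip(weights, digits)) % 11
    check = (11 - remainder) % 11
    return None if check == 10 else check


def control_digits(first_nine):
    """Return (10th digit, 11th digit) for the first 9 digits, or None."""
    k1 = _mod11_digit(first_nine, K1_WEIGHTS)
    if k1 is None:
        return None
    k2 = _mod11_digit(first_nine + str(k1), K2_WEIGHTS)
    if k2 is None:
        return None
    return k1, k2


def is_valid_fodselsnummer(number):
    """Check only the two control digits, not the date or the century."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return False
    return control_digits(number[:9]) == (int(number[9]), int(number[10]))
```

With a made-up prefix such as 121009123, control_digits returns the pair that has to follow it, and is_valid_fodselsnummer accepts only the 11-digit string that ends in that pair.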

As in other countries, this number is based on the date of birth. The first 6 digits represent the birth date as “ddmmyy”. The problem with a 6-digit date is that you cannot identify the century – is a Fødselsnummer starting with 121009 someone born in 1909 or 2009? The Norwegian government has solved this by assigning the following 3 digits (the individual number) to ranges that each represent a certain era. If you were born between 1854 and 1899, your individual number must be between 500 and 749; if you were born between 1900 and 1999, it lies between 000 and 499; and for those born between 2000 and 2039, it lies between 500 and 999. There are some exceptions for those with an individual number between 900 and 999. Read the rest of this entry »
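The ranges listed above can be turned into a small lookup. The following Python sketch uses only the three ranges named in this excerpt; the exceptions for individual numbers 900 to 999 are not spelled out here, so the sketch leaves those cases unresolved (it returns None) instead of guessing. The function name is hypothetical.

```python
def birth_year(number):
    """Infer the 4-digit birth year from an 11-digit Fødselsnummer string.

    Uses only the ranges named in the text; returns None when they do not
    decide the century (for example the 900-999 exceptions).
    """
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return None
    yy = int(number[4:6])           # 2-digit year from "ddmmyy"
    individual = int(number[6:9])   # the 3-digit individual number

    if individual <= 499:
        return 1900 + yy            # 000-499: born 1900-1999
    if individual <= 749 and yy >= 54:
        return 1800 + yy            # 500-749: born 1854-1899
    if yy <= 39:
        return 2000 + yy            # 500-999: born 2000-2039
    return None                     # not covered by the ranges above
```

For the question in the text: a number starting with 121009 and an individual number of 123 resolves to 1909, while the same date with an individual number of 500 resolves to 2009.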

New Matching Engines go beyond apples and oranges

February 11th, 2010 by Winfried van Holland

Beyond apples and oranges

Professional matching engines are becoming more and more intelligent. Within Human Inference, we also see that our matching techniques are capable of using more and more intelligence, and needless to say we incorporate this intelligence in our engines in order to adapt to the way that humans do their matching.

Traditional data quality or matching engines were based on atomic string comparison functions like match-codes, phonetic comparison, Levenshtein string distance, n-gram comparisons or similar functions. These kinds of functions are relatively easy to implement and to use, although a significant amount of plumbing is needed to get reasonable results. Open source projects like the Lucene search engine, and variants, provide a solid and proven set of these functions. The drawback of these functions is that it is not always clear for what purpose one should use a particular function. An even larger issue is the fact that these low-level DQ functions cannot distinguish between apples and oranges – you end up comparing family names with street names. We still see that BI vendors, for example, claim to provide data quality functionality while they only provide these atomic comparisons. Read the rest of this entry »
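To make “atomic string comparison” concrete, here is a minimal Python sketch of one of the functions named above, the Levenshtein string distance. It is a plain textbook implementation written for this illustration, not the code of any particular engine.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete from a
                current[j - 1] + 1,                   # insert into a
                previous[j - 1] + (char_a != char_b)  # substitute
            ))
        previous = current
    return previous[-1]


# The function sees only characters: a family name "Baker" and a street
# name "Baker" are a perfect match, although they are apples and oranges.
family_name = "Baker"
street_name = "Baker"
distance = levenshtein(family_name, street_name)
```

The last lines show the limitation described above: the distance is 0 because the strings are equal, and nothing in the function knows that one value is a family name and the other a street name.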

Is 270368A172X a correct Finnish Henkilötunnus?

February 1st, 2010 by Winfried van Holland

The Finnish national personal identification number is the Henkilötunnus, aka Hetu or Ht, and has the following format – ddmmyyc999C. For details on how to calculate the control character, I refer to the overview blog on National Identification Numbers.

Validating the Hetu 270368A172X shows that it is indeed a correct number. The number 270368172 indeed yields 29 in the modulo 31 proof, which is represented by control character “X” in the checksum list. The number tells us that this is the 86th girl born on the 27th of March 2068.
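The check above can be reproduced with a short Python sketch. Two things in it are assumptions that this post does not spell out: the checksum list (the 31 characters commonly documented for the Hetu) and the century signs “+”, “-” and “A” for the 1800s, 1900s and 2000s. The function names are made up for this example.

```python
from datetime import date

# Assumed checksum list and century signs (not quoted in this post).
CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"
CENTURIES = {"+": 1800, "-": 1900, "A": 2000}


def parse_hetu(hetu):
    """Return (birth date, individual number) for a well-formed Hetu,
    or None if the format, the date or the control character is wrong."""
    if len(hetu) != 11 or hetu[6] not in CENTURIES:
        return None
    digits = hetu[:6] + hetu[7:10]            # ddmmyy + individual number
    if not (digits.isascii() and digits.isdigit()):
        return None
    if CHECK_CHARS[int(digits) % 31] != hetu[10]:
        return None                           # modulo 31 proof failed
    year = CENTURIES[hetu[6]] + int(hetu[4:6])
    try:
        born = date(year, int(hetu[2:4]), int(hetu[0:2]))
    except ValueError:
        return None                           # e.g. 31st of February
    return born, int(hetu[7:10])


def is_plausible(hetu, today):
    """Well formed AND not born in the future, seen from the given day."""
    parsed = parse_hetu(hetu)
    return parsed is not None and parsed[0] <= today
```

Here parse_hetu("270368A172X") returns the birth date 27 March 2068 and individual number 172, so the number passes every automatic check, while is_plausible with a 2010 date rejects it because the birth date lies in the future.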

The latter is exactly where the discussion on validity starts. Although the number itself is well formed and passes all the automatic checks, dealing with it in a data quality assessment will raise your digital eyebrow. In the data quality world we would say today that this Hetu is wrong, that it cannot be correct.

So always use a bit of human inference when dealing with Finnish national personal identification numbers.

Remarkable facts on Dutch National Personal Identification Number (Burgerservicenummer BSN)

January 19th, 2010 by Winfried van Holland


The national personal identification number in the Netherlands is called the Burgerservicenummer (abbreviated as BSN, introduced in November 2007). It is a 9-digit number that can be validated by a weighted 11-proof. Basically, every digit position has a weighting factor, and the sum of the digits multiplied by their weights must be exactly divisible by 11.

A nice effect of this weighted 11-proof is that any 2 individual numbers differ in at least 2 digits. You need to make at least 2 changes to get from one number to another – it might be that 2 digits are completely different (e.g., 112682765 and 112682777) or that you need to swap one digit and change another (e.g., 427096509 and 427096510).
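The weighted 11-proof can be sketched in a few lines of Python. The weights are not listed in this post; the sketch assumes the commonly documented BSN weights 9, 8, 7, 6, 5, 4, 3, 2 and -1 for the last digit. The function names are hypothetical.

```python
# Assumed BSN weights (commonly documented, not listed in this post).
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def is_valid_bsn(bsn):
    """Weighted 11-proof: the weighted digit sum must be divisible by 11."""
    if len(bsn) != 9 or not (bsn.isascii() and bsn.isdigit()):
        return False
    total = sum(weight * int(digit) for weight, digit in zip(BSN_WEIGHTS, bsn))
    return total % 11 == 0


def digit_differences(a, b):
    """Count the positions at which two equally long numbers differ."""
    return sum(x != y for x, y in zip(a, b))
```

Under these assumed weights, the example pairs above all pass the proof, and each pair differs in exactly 2 positions. Changing only a single digit of a valid number, for example 112682765 into 112682766, makes the proof fail.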

Mathematically, there can still be two consecutive numbers, like 427096169 and 427096170, which still need 2 changes to get from one to the other. Read the rest of this entry »

Why there is a maximum number of (fe)males in a country

January 19th, 2010 by Winfried van Holland

Within Europe there is no such system as European Social Security Number or European Identification Number. A lot of countries have their own system, and other countries are struggling to get a system into place.

The struggle of some countries has to do with historical reasons and with privacy aspects. Unique identification is not always used in favour of the community. And some of the identification systems in use contain privacy-sensitive information, among others date of birth, gender and/or place of birth, while older systems might even contain religious or other privacy-sensitive information.

A wide range of countries use the combination of date of birth, gender identification and the political region where you were born. In such a mechanism it is most common that part of the identification number is a 2-digit or 3-digit serial number to identify the unique male or female born on a specific date (or in a specific month). Some countries assign odd serial numbers to males and even ones to females. Bulgaria is the only one that wants “odd” females. Some countries like to divide by range (0-499 male, 500-999 female). And some countries, like Norway, make nice combinations to include the century or period of birth in the serial number. Read the rest of this entry »

Attempted bombing Christmas Day could have been prevented!

January 13th, 2010 by Eddy Reimerink


Lack of understanding of the complexity of international names caused a near-accident successfully prevented by the Dutchman Jasper Schuringa.

On Flight 253, on its way from Amsterdam to Detroit, a passenger tried to blow up the airplane. This passenger was not called John Smith, or Peter Johnson. No, his name was a little more complicated: Umar Farouk Abdulmutallab. Easy to misspell, and that is exactly what happened. A misspelling of the name of Umar Farouk Abdulmutallab resulted in the State Department believing he did not have a valid U.S. visa.

We love damage control, not prevention

Read the rest of this entry »

Matching persons with different official names

January 6th, 2010 by Winfried van Holland

When matching persons, or contact data in general, we are all aware that individuals can use abbreviations or nicknames as a kind of synonym for their name. Classic examples are the use of the name Bill for the actual name William, or my own father, who uses the name Mans while his official name is Hermanus. Most matching engines use a kind of synonym table to take care of this. That works because within a culture or region nicknames are quite often linked to the same names, and people do not tend to use completely different officially registered names.

It becomes more challenging if there is no longer a link between nickname and official name. That may happen, for example, if people move from one cultural region to another where a different writing system is used. Take for example my Chinese friend 高为民, whose Latin name would be Gao Weimin (family name first), but the moment he works in Europe or the US he uses the Latin variant William Gao. There is no common relation between the names William and Weimin, in either Latin or Chinese script, and they are not phonetic variants of each other. Read the rest of this entry »
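A synonym table of the kind mentioned above can be sketched as a simple lookup. This Python sketch is only an illustration: the table holds just the two nickname pairs from this post, and the names NICKNAMES, canonical_first_name and same_first_name are made up for the example.

```python
# Tiny illustrative synonym table: nickname -> official first name.
NICKNAMES = {
    "bill": "william",
    "mans": "hermanus",
}


def canonical_first_name(name):
    """Map a nickname to its official form; unknown names stay as they are."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def same_first_name(a, b):
    """Treat two first names as equal if their canonical forms are equal."""
    return canonical_first_name(a) == canonical_first_name(b)
```

The lookup links Bill to William and Mans to Hermanus, but it has nothing to say about Weimin and William: that pair is not a regional nickname convention, so a table like this will not contain it.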

Let’s be honest – Solve your data quality before jumping into Pattern-Based Strategy

December 21st, 2009 by Winfried van Holland


In the evolution of information technology, Gartner has introduced a new term as the ultimate goal to reach: Pattern-Based Strategy.

Just as you were reaching the final destination in your journey to transform bits and bytes into real information, you encounter a new optimum again. Pattern-Based Strategy, as described by Yvonne Genovese et al., can be identified as the latest of all the eras of IT value-add. Basically, the level of control identifies the era in which you currently operate – from tight control and pure automation in the ‘old’ days, via augmentation, e-commerce/Web 1.0 and Web 2.0, to the highest era, called Pattern-Based Strategy. Read the rest of this entry »