The Finnish national personal identification number, the Henkilötunnus (also known as Hetu or Ht), has the format ddmmyyc999C: the date of birth (ddmmyy), a century sign (c), an individual number (999) and a control character (C). For details on how to calculate the control character, I refer to the overview blog post on National Identification Numbers.
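Validating the Hetu 270368A172X shows that it is indeed a correct number: the number 270368172 (date of birth followed by the individual number) leaves a remainder of 29 in the modulo 31 check, which corresponds to control character “X” in the checksum list. The number also shows that this is the 86th girl born on 27 March 2068 (even individual numbers are assigned to females, and 172 is the 86th even number).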
The latter is exactly where the discussion on validity starts. Although the number itself is well formed and passes all the automatic checks, dealing with it in a data quality assessment will raise your digital eyebrow, because nobody alive today can have been born in 2068. In the data quality world we would therefore say today that this Hetu is wrong: it cannot be correct.
So always use a bit of human inference when dealing with Finnish national personal identification numbers.
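The difference between "passes the automatic checks" and "can be correct" can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes the rules in use when this was written (century signs +, - and A; individual numbers 002 to 899, odd for males and even for females; the control character indexed by the remainder modulo 31), and the function names and the reference date are hypothetical. The first function is the automatic check; the last one adds the human-inference step of asking whether the decoded birth date is possible at all.

```python
from datetime import date

# Remainder modulo 31 (0..30) -> control character; G, I, O, Q and Z are not used.
CONTROL_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"
# Century sign -> century (assumption: only the signs +, - and A).
CENTURY = {"+": 1800, "-": 1900, "A": 2000}


def hetu_checksum_ok(hetu: str) -> bool:
    """Automatic check: format plus modulo-31 control character."""
    if len(hetu) != 11 or hetu[6] not in CENTURY:
        return False
    digits = hetu[0:6] + hetu[7:10]  # ddmmyy followed by the individual number
    if not all(c in "0123456789" for c in digits):
        return False
    return CONTROL_CHARS[int(digits) % 31] == hetu[10]


def hetu_birth_date(hetu: str) -> date:
    """Decode ddmmyy and the century sign; raises ValueError for impossible dates."""
    year = CENTURY[hetu[6]] + int(hetu[4:6])
    return date(year, int(hetu[2:4]), int(hetu[0:2]))


def hetu_sex(hetu: str) -> str:
    """Odd individual numbers are assigned to males, even ones to females."""
    return "male" if int(hetu[7:10]) % 2 == 1 else "female"


def hetu_plausible(hetu: str, reference: date) -> bool:
    """Human-inference check: well formed AND born on or before the reference date."""
    if not hetu_checksum_ok(hetu) or not 2 <= int(hetu[7:10]) <= 899:
        return False
    try:
        return hetu_birth_date(hetu) <= reference
    except ValueError:  # e.g. 31 February
        return False


example = "270368A172X"
print(hetu_checksum_ok(example))                    # True: 270368172 % 31 == 29 -> "X"
print(hetu_birth_date(example), hetu_sex(example))  # 2068-03-27 female
print(hetu_plausible(example, date(2010, 2, 1)))    # False: born in the future
```

Keeping the plausibility check separate from the checksum makes the point of this post explicit: a number can be perfectly well formed and still not describe a person who can exist.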

Posted February 1st, 2010
by Winfried van Holland

The national personal identification number in the Netherlands is called the Burgerservicenummer (abbreviated BSN, introduced in November 2007). It is a 9-digit number that can be validated with a weighted 11-proof: each digit is multiplied by a weighting factor that depends on its position (9 for the first digit down to 2 for the eighth, and -1 for the last), and the sum of these products must be exactly divisible by 11.
A nice side effect of this weighted 11-proof is that two valid numbers always differ in at least 2 digits. You need at least 2 changes to get from one number to another: either 2 digits are completely different (e.g., 112682765 and 112682777), or you need to swap two digits and then change one of them (e.g., 427096509 and 427096510).
Mathematically, two consecutive numbers such as 427096169 and 427096170 can both be valid, but they still need 2 changes to get from one to the other.
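The 11-proof and its "at least two changes" property can be checked with a small Python sketch. It assumes the weights 9 down to 2 for the first eight digits and -1 for the last digit; the helper names are hypothetical, and `single_digit_neighbours` simply tries every possible single-digit change by brute force.

```python
# Assumed BSN weights: 9 down to 2 for the first eight digits, -1 for the last.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def bsn_weighted_sum(bsn: str) -> int:
    return sum(w * int(d) for w, d in zip(BSN_WEIGHTS, bsn))


def bsn_is_valid(bsn: str) -> bool:
    """A 9-digit string passes if its weighted sum is exactly divisible by 11."""
    if len(bsn) != 9 or not all(c in "0123456789" for c in bsn):
        return False
    return bsn_weighted_sum(bsn) % 11 == 0


def digit_changes(a: str, b: str) -> int:
    """Number of positions in which two equally long numbers differ."""
    return sum(x != y for x, y in zip(a, b))


def single_digit_neighbours(bsn: str) -> list[str]:
    """Every valid number reachable from `bsn` by changing exactly one digit."""
    found = []
    for i, current in enumerate(bsn):
        for d in "0123456789":
            if d != current:
                candidate = bsn[:i] + d + bsn[i + 1:]
                if bsn_is_valid(candidate):
                    found.append(candidate)
    return found


pairs = [("112682765", "112682777"), ("427096509", "427096510"), ("427096169", "427096170")]
for a, b in pairs:
    print(a, b, bsn_is_valid(a), bsn_is_valid(b), digit_changes(a, b))
# Each line ends in: True True 2
print(bsn_weighted_sum("112682765"))         # 143 (= 13 * 11)
print(single_digit_neighbours("112682765"))  # []
```

The empty list is not a coincidence: changing one digit changes the weighted sum by (weight times digit difference), and because 11 is prime and neither the weights nor a non-zero digit difference is a multiple of 11, that change can never be a multiple of 11.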

Posted January 19th, 2010
by Winfried van Holland
Within Europe there is no such thing as a European Social Security Number or European Identification Number. Many countries have their own system, while other countries are still struggling to get one into place.
The struggle of some countries has to do with historical reasons and with privacy aspects. Unique identification has not always been used in favour of the community. And some identification systems contain privacy-sensitive information, such as date of birth, gender and/or place of birth; older systems might even contain religious or other privacy-sensitive information.
A wide range of countries combine date of birth, gender and the political region where you were born. In such a scheme, part of the identification number is usually a 2-digit or 3-digit serial number that identifies the individual male or female born on a specific date (or in a specific month). Some countries assign odd serial numbers to males and even ones to females; Bulgaria is the only one that wants “odd” females. Some countries split by range (0-499 male, 500-999 female). And some countries, like Norway, make nice combinations to include the century or period of birth in the serial number.
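The serial-number conventions above can be sketched as two small Python helpers. These are hypothetical illustrations: they assume the relevant serial digit or digits have already been extracted from the number, and the defaults (odd = male, 0-499 = male) are just the examples from the text. Which digits count, and which convention applies, differs per country and has to be taken from each country's own specification.

```python
def sex_by_parity(serial: int, odd_means_male: bool = True) -> str:
    """Parity scheme: by default odd = male and even = female."""
    is_odd = serial % 2 == 1
    return "male" if is_odd == odd_means_male else "female"


def sex_by_range(serial: int, last_male: int = 499) -> str:
    """Range scheme: e.g. 0-499 male and 500-999 female."""
    return "male" if serial <= last_male else "female"


print(sex_by_parity(173))                      # male
print(sex_by_parity(7, odd_means_male=False))  # female ("odd" females, Bulgarian style)
print(sex_by_range(512))                       # female
```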

Posted January 19th, 2010
by Winfried van Holland
A lack of understanding of the complexity of international names nearly caused a disaster, which was successfully prevented by the Dutchman Jasper Schuringa.
On Flight 253, on its way from Amsterdam to Detroit, a passenger tried to blow up the airplane. This passenger was not called John Smith or Peter Johnson. No, his name was a little more complicated: Umar Farouk Abdulmutallab. Easy to misspell, and that is exactly what happened: a misspelling of the name Umar Farouk Abdulmutallab led the State Department to believe that he did not have a valid U.S. visa.
We love damage control, not prevention

Posted January 13th, 2010
by Eddy Reimerink
When matching persons, or contact data in general, we are all aware that individuals can use abbreviations or nicknames as a kind of synonym for their name. Classic examples are the name Bill for William, or my own father, who uses the name Mans while his official name is Hermanus. Most matching engines use some kind of synonym table to take care of this. That works because, within a culture or region, nicknames are quite often linked to the same names, and people do not tend to use names that are completely different from their officially registered ones.
It becomes more challenging if there is no longer a link between nickname and official name. That may happen, for example, when people move from one cultural region to another where a different writing system is used. Take my Chinese friend 高为民, whose Latin name would be Gao Weimin (family name first); the moment he works in Europe or the US, he uses the Latin variant William Gao. There is no relation between the names William and Weimin, in either Latin or Chinese script, and they are not phonetic variants of each other.
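A synonym table of the kind matching engines use can be sketched in a few lines of Python. The table and function names are hypothetical and the table is deliberately tiny; real tables are far larger and usually culture-specific. The last call shows the limitation described above: nothing in the table links William to Weimin, so the two names do not match.

```python
# Hypothetical synonym table: nickname -> official names it may stand for.
NICKNAMES = {
    "bill": {"william"},
    "mans": {"hermanus"},
}


def name_forms(first_name: str) -> set[str]:
    """The name itself plus any official names it is a known nickname for."""
    name = first_name.strip().lower()
    return {name} | NICKNAMES.get(name, set())


def first_names_match(a: str, b: str) -> bool:
    """Two first names match if they share at least one form."""
    return bool(name_forms(a) & name_forms(b))


print(first_names_match("Bill", "William"))    # True
print(first_names_match("Mans", "Hermanus"))   # True
print(first_names_match("William", "Weimin"))  # False
```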

Posted January 6th, 2010
by Winfried van Holland

In the evolution of information technology, Gartner has coined a new term for the ultimate goal to reach: Pattern-Based Strategy.
Just as you were reaching the final destination of your journey to transform bits and bytes into real information, you encounter a new optimum. Pattern-Based Strategy, as described by Yvonne Genovese et al., can be identified as the latest of the eras of IT value-add. Basically, the level of control identifies which era you currently operate in: from tight control and pure automation in the ‘old’ days, via augmentation, e-commerce/Web 1.0 and Web 2.0, to the highest era, called Pattern-Based Strategy.

Posted December 21st, 2009
by Winfried van Holland

Every year, when autumn comes, the assistants of the sales department get a little nervous. They know what will happen shortly: it’s almost Christmas, and the contacts who will receive a Christmas card have to be selected.
Every year it’s the same. First, a selection is made for every account manager, who then has to check manually whether it is correct. This year will be no different, which means that:
- relevant companies and contacts are missing
- new companies and contact persons will be added
- contact persons will be deleted
- contact persons will be transferred to their new company
- addresses appear not to be up to date

Posted December 3rd, 2009
by Ron Mulderij

With the rise of modern technologies, more and more of our payments are processed electronically. Banks try to reduce costs and make their customers carry out the payments themselves. Internet banking has become the standard: customers can no longer hand in written transfer orders at their bank, but have to book the transfers using internet banking facilities. People can easily make a typing error in the account number that still results in an existing account number. The risks lie fully on the customer’s side. Although banks are always willing to help get the money returned, it’s better to avoid these errors.
In my opinion, banks should be obliged to perform a name-number check for every payment, or at least for every larger amount.
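A name-number check could look roughly like the following Python sketch. Everything in it is an assumption for illustration: the account register, the name normalisation and the 0.8 similarity threshold are hypothetical, and the standard library's `difflib.SequenceMatcher` merely stands in for whatever name-matching logic a bank would actually use.

```python
import difflib
import unicodedata

# Hypothetical account register: account number -> registered account holder.
ACCOUNT_HOLDERS = {"123456789": "J. de Vries"}


def normalise_name(name: str) -> str:
    """Strip accents and punctuation, lower-case, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    kept = "".join(c for c in decomposed if c.isalnum() or c.isspace())
    return " ".join(kept.lower().split())


def name_number_check(account: str, payee_name: str, threshold: float = 0.8) -> bool:
    """True if the payee name entered by the customer resembles the registered holder."""
    holder = ACCOUNT_HOLDERS.get(account)
    if holder is None:
        return False  # unknown account: treated as a failed check
    similarity = difflib.SequenceMatcher(
        None, normalise_name(holder), normalise_name(payee_name)
    ).ratio()
    return similarity >= threshold


print(name_number_check("123456789", "J. de Vries"))   # True
print(name_number_check("123456789", "Jan de Vries"))  # True
print(name_number_check("123456789", "P. Jansen"))     # False
```

A failed check would more sensibly trigger a warning to the customer than a hard rejection; the threshold trades false alarms against missed typing errors.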

Posted November 10th, 2009
by Ron Mulderij

The internet is on the verge of one of the most fundamental changes in its history. The Internet Corporation for Assigned Names and Numbers (ICANN) is expected to agree on the use of internet addresses in non-Latin characters during this week’s ICANN convention in Seoul. If all goes according to plan, it will be possible to use Greek, Cyrillic, Arabic, Chinese, Korean and many other characters in the internet browser’s address bar. More than half of the 1.6 billion internet users in the world use a character set that is not Latin. Therefore, ICANN expects that the number of non-Latin domain names, and thus the number of new internet users, will increase rapidly.
This far-reaching change in the use of the internet is based on a system that can “translate” or “convert” different writing systems (some with different writing directions, e.g. Arabic and Hebrew). On a high level, I imagine it would look a little like this:
عربي (AR) | 中文 (ZH) | English (EN) | 日本語 (JA) | Deutsch (DE) | Français (FR) | Español (ES) | Русский (RU) | Português (PT) | 한국어 (KO) | Italiano (IT)
Naturally, this phenomenon raises questions about the matching of internet addresses. Is ووو.هُمَنِنفِرِرِنسِ.كُم the same as www.humaninference.com? It appears that generic multilingual data matching issues also apply in this particular case. How do we handle these comparisons? For a couple of thoughts, please read this.
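The conversion system behind non-Latin domain names is IDNA, which maps each non-ASCII label of a domain name to an ASCII form using Punycode, prefixed with "xn--". Python's standard library ships an "idna" codec implementing the older IDNA 2003 rules (the third-party idna package implements IDNA 2008), so the following is only a sketch of the principle; the helper names are hypothetical. It also hints at why the matching question stays open: the ASCII form is an encoding, not a transliteration, so a name written in Arabic script does not turn into "humaninference".

```python
def to_ascii_domain(domain: str) -> str:
    """Hypothetical helper: convert a domain name to its ASCII (IDNA 2003) form."""
    return domain.encode("idna").decode("ascii")


def from_ascii_domain(ascii_domain: str) -> str:
    """Hypothetical helper: convert an ASCII (xn--) domain name back to Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


print(to_ascii_domain("bücher.example"))          # xn--bcher-kva.example
print(to_ascii_domain("www.humaninference.com"))  # www.humaninference.com (ASCII passes through)

cyrillic = "пример.испытание"
ascii_form = to_ascii_domain(cyrillic)
print(ascii_form.startswith("xn--"))              # True
print(from_ascii_domain(ascii_form) == cyrillic)  # True
```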

Posted October 28th, 2009
by Holger Wandt

On 28 January 2010 the next Human Inference Data Quality Summit will be held in the Evoluon in Eindhoven (NL). The theme, “Value your data, value your future”, is inspired by the idea that investments in data quality have become part of standard business, and that vision, strategy and solutions are being synchronized with these investments. As data quality has reached a certain level of maturity, it is time to take an in-depth look at the (near) future of data quality.
The program is challenging, comprehensive and entertaining. Keynote speakers include Ted Friedman (vice president, Gartner Research), Mathias Klier (professor at the University of Innsbruck) and Sabine Palinckx (CEO, Human Inference). Additionally, the break-out sessions will address a wide variety of theme-related topics: maximising the business value of information, guiding a DQ project through migration, data quality maturity, marketing effectiveness and many more. In short, the Data Quality Summit is not to be missed!
Save the date and register by clicking this link!

Posted October 21st, 2009
by Holger Wandt