Posts Tagged ‘matching’

Matching persons with different official names

Wednesday, January 6th, 2010

When matching persons, or contact data in general, we are all aware that individuals may use abbreviations or nicknames as synonyms for their name. Classic examples are Bill for William, or my own father, who goes by Mans while his official name is Hermanus. Most matching engines use a synonym table to handle this. That works because, within a culture or region, a nickname is usually linked to the same official name, and people do not tend to go by a name completely different from the one officially registered.
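To make the synonym-table idea concrete, here is a minimal sketch in Python. The table contents and the function names (`NICKNAMES`, `canonical_name`, `same_given_name`) are hypothetical and only illustrate the principle; a real matching engine would use a much larger, region-specific table and would typically combine it with fuzzy comparison.

```python
# Hypothetical synonym table: nickname -> official given name.
# The two entries are the examples from the text; a real table would be far larger.
NICKNAMES = {
    "bill": "william",
    "mans": "hermanus",
}


def canonical_name(given_name: str) -> str:
    """Return the official name for a known nickname, else the cleaned input."""
    key = given_name.strip().lower()
    return NICKNAMES.get(key, key)


def same_given_name(a: str, b: str) -> bool:
    """Treat two given names as equal if they share a canonical form."""
    return canonical_name(a) == canonical_name(b)
```

With this table, `same_given_name("Bill", "William")` is true, while a pair such as Weimin and William is not matched, because nothing in the table links the two.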

It becomes more challenging when there is no longer a link between nickname and official name. That may happen, for example, when people move from one cultural region to another that uses a different writing system. Take my Chinese friend 高为民, whose name in Latin script would be Gao Weimin (family name first), but who goes by William Gao whenever he works in Europe or the US. There is no common relation between the names William and Weimin, in either Latin or Chinese script, and they are not phonetic variants of each other. (more…)

International domain names – there goes the ASCIIhood….

Wednesday, October 28th, 2009

The internet is on the verge of one of the most fundamental changes in its history. The Internet Corporation for Assigned Names and Numbers (ICANN) is expected to agree on the use of internet addresses in non-Latin characters during this week’s ICANN convention in Seoul. If all goes according to plan, it will be possible to use Greek, Cyrillic, Arabic, Chinese, Korean and many other characters in the internet browser’s address bar. More than half of the 1.6 billion internet users in the world use a character set that is not Latin. Therefore, ICANN expects that the number of non-Latin domain names, and thus the number of new internet users, will increase rapidly.

This far-reaching change in the use of the internet is based on a system that can “translate” or “convert” different writing systems (sometimes with different writing directions, such as Arabic and Hebrew). On a high level, it would look a little like this, I would imagine:

عربي (AR), 中文 (ZH), English (EN), 日本語 (JA), Deutsch (DE), Français (FR), Español (ES), Русский (RU), Português (PT), 한국어 (KO), Italiano (IT)

Naturally, this phenomenon raises questions concerning the matching of internet addresses. Is ووو.هُمَنِنفِرِرِنسِ.كُم the same as www.humaninference.com? It appears that generic multilingual data matching issues also apply in this particular case. How do we handle these comparisons? For a couple of thoughts, please read this…
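As a small illustration of the "convert" step, the sketch below uses Python's built-in `idna` codec (which implements the older IDNA 2003 rules) to turn a non-ASCII domain name into its ASCII-compatible form and compare two addresses on that form. The helper names `ascii_form` and `same_domain` are my own. Note that this only recognises addresses that are the same name written in the same script; it does not answer the harder question of whether a transliterated address should match its Latin counterpart.

```python
def ascii_form(domain: str) -> str:
    """Convert a domain name to its ASCII-compatible (Punycode) form."""
    # The built-in "idna" codec normalises and encodes each label separately.
    # ASCII-only labels pass through unchanged, so lowercase the result.
    return domain.encode("idna").decode("ascii").lower()


def same_domain(a: str, b: str) -> bool:
    """Compare two domain names by their ASCII-compatible form."""
    return ascii_form(a) == ascii_form(b)


print(ascii_form("bücher.example"))                      # xn--bcher-kva.example
print(same_domain("BÜCHER.example", "bücher.example"))   # True
print(same_domain("bücher.example", "buecher.example"))  # False
```

The last comparison shows the limitation: two spellings that a human would consider equivalent are simply different names at this level, so any matching across scripts or spellings has to happen in a separate step.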

Deduplication, first time wrong?

Tuesday, March 31st, 2009

One of my current projects takes an intelligent approach to removing duplicates that already exist in a system (SAP).

The client has already successfully used our software in their IT environment to effectively stop all new duplicates being entered into SAP. They now want to use the same technology to remove all existing duplicates. Their idea is so simple I am amazed that I have not heard of it being done elsewhere before.

Every evening the client's whole SAP database will be searched for duplicates in their Companies and Contacts (> 3 million records deduplicated in less than an hour!). The results are stored in a master result table that SAP has been given access to. Depending on the likelihood of the match, the duplicates fall into one of three categories: automatic merging, manual merging or no merge. If the score for the whole duplicate group is above the threshold for automatic merging, the automatic merging process is started. (more…)
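The three-way split can be sketched in a few lines of Python. The threshold values, the 0 to 1 score scale and the name `classify_group` are assumptions made purely for illustration; the actual scores and thresholds used in the project are not given here.

```python
# Hypothetical thresholds on a 0..1 match score (not the project's real values).
AUTO_MERGE_THRESHOLD = 0.95
MANUAL_MERGE_THRESHOLD = 0.75


def classify_group(group_score: float) -> str:
    """Assign a duplicate group to one of the three categories by its score."""
    if group_score > AUTO_MERGE_THRESHOLD:
        return "automatic merging"
    if group_score > MANUAL_MERGE_THRESHOLD:
        return "manual merging"
    return "no merge"


for score in (0.99, 0.80, 0.40):
    print(score, "->", classify_group(score))
```

Keeping the thresholds as named constants makes it easy to tune how many groups are merged automatically and how many are left for manual review.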

The added value of an integrated customer view

Monday, December 8th, 2008

The added value of an integrated customer view depends strongly on the quality of that integrated customer view. Every organization that is seriously planning to create a single customer view should ask itself the following question: “What determines the quality of my customer view and so the accompanying level of added value?”

Before answering this question, we need to take one step back. Why doesn't every organization have a single view of the customer? The cause lies in the fact that many organizations have their customer data spread across multiple systems, each facilitating separate business processes. Additionally, customer data is often highly polluted, fragmented and incomplete.

(more…)