<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Value Talk &#187; matching</title>
	<atom:link href="http://www.datavaluetalk.com/tag/matching/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.datavaluetalk.com</link>
	<description>Customer data is a valuable asset. Why not treat it that way?</description>
	<lastBuildDate>Wed, 01 Sep 2010 11:38:25 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Chinglish &#8211; the most delightful side-effect of internationalization</title>
		<link>http://www.datavaluetalk.com/2010/04/09/chinglish-the-most-delightful-side-effect-of-internationalization/</link>
		<comments>http://www.datavaluetalk.com/2010/04/09/chinglish-the-most-delightful-side-effect-of-internationalization/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 12:18:00 +0000</pubDate>
		<dc:creator>Holger Wandt</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Chinese characters]]></category>
		<category><![CDATA[fault-tolerance]]></category>
		<category><![CDATA[internationalisation]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[matching]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1439</guid>
		<description><![CDATA[
An increasing number of companies have to deal with data from the world’s fastest emerging economy: China. And the big question in this issue is of course: How can we compare these “strange” Chinese characters with our own writing set? 
Grammar and character set of our Western alphabet-languages (such as English, French, Dutch or German) differ [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-1440" title="little grass has life" src="http://www.datavaluetalk.com/wp-content/uploads/2010/04/little-grass-has-life-300x198.jpg" alt="little grass has life" width="287" height="192" /></p>
<p>An increasing number of companies have to deal with data from the world’s fastest emerging economy: China. And the big question in this issue is of course: How can we compare these “strange” Chinese characters with our own writing set? </p>
<p>The grammar and character set of our Western alphabetic languages (such as English, French, Dutch or German) differ tremendously from those of Mandarin Chinese (which is the language spoken by most in the People’s Republic of China and abroad). Mandarin is a tonal language with an ideographic character set. Almost all characters have a semantic and a phonetic component. The pitch of the pronunciation eventually determines the meaning.</p>
<p>Complicated? Definitely. But what about the other way around? Have you ever thought about the difficulties the Chinese have to face when trying to convert their language into meaningful English?</p>
<p>This phenomenon is sometimes hilariously illustrated by the many public signs in China that inform foreign visitors or help them find their way around.</p>
<p>This is truly a delightful side-effect of internationalization. <span id="more-1439"></span></p>
<p>The German sinologist Oliver Lutz Radtke christened these linguistic attempts “Chinglish” and collected many examples, which can be found virtually everywhere: on hotel room doors, on road signs along the highways, shampoo bottles and t-shirts. A small anthology:</p>
<ul>
<li>A warning sign for a steep slope: <strong><em>“Please, watch your slip”</em></strong></li>
<li>To avoid all misunderstandings, on the inside of a taxi door: <strong><em>“Don’t forget to carry your thing”</em></strong></li>
<li>A sign above a store entrance, to let our fantasy run free: <strong><em>“Welcome to presence”</em></strong></li>
</ul>
<p>Although this is all very funny, from a data quality point of view it definitely leaves a thing or two to consider. For example: What should we think of fault-tolerance with regard to typos when entering Chinese customer data into a database? What is the influence of typos in an ideographic writing set on searching, matching, enriching and correcting customer data?</p>
<p>There is still a lot of work to be done in international data quality. For more information, check out the Human Inference website on <a href="http://www.humaninference.com/Our%20products/HIquality%20Name%20Worldwide.aspx" target="_blank">HIquality Name Worldwide</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/04/09/chinglish-the-most-delightful-side-effect-of-internationalization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Matching persons with different official names</title>
		<link>http://www.datavaluetalk.com/2010/01/06/matching-persons-with-different-official-names/</link>
		<comments>http://www.datavaluetalk.com/2010/01/06/matching-persons-with-different-official-names/#comments</comments>
		<pubDate>Wed, 06 Jan 2010 15:32:59 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[cultural differences]]></category>
		<category><![CDATA[fault-tolerant matching]]></category>
		<category><![CDATA[matching]]></category>
		<category><![CDATA[names]]></category>
		<category><![CDATA[naming confusion]]></category>
		<category><![CDATA[nicknames]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1269</guid>
		<description><![CDATA[When matching persons or contact data in general, we are all aware that individuals can use abbreviations or nicknames as a kind of synonym for their name. Classic examples are the use of the name Bill for the actual name William, or my own father, who uses the name Mans while [...]]]></description>
			<content:encoded><![CDATA[<p class="mceTemp"><img class="alignnone" title="what is the what?" src="http://img1.fantasticfiction.co.uk/images/n37/n185744.jpg" alt="" width="107" height="137" />When matching persons or contact data in general, we are all aware that individuals can use abbreviations or nicknames as a kind of synonym for their name. Classic examples are the use of the name <em>Bill </em>for the actual name <em>William</em>, or my own father, who uses the name <em>Mans </em>while officially his name is <em>Hermanus</em>. Most matching engines use a kind of synonym table to take care of this. That works because within a culture or region nicknames are quite often linked to the same names, and people do not tend to use completely different officially registered names.</p>
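<p>To make the synonym table idea concrete, here is a minimal sketch. The table entries and the function names are illustrative assumptions, not the internals of any particular matching engine:</p>

```python
# Hypothetical synonym table: nickname -> official name (illustrative entries only).
NICKNAMES = {
    "bill": "william",
    "mans": "hermanus",
}

def canonical_first_name(name):
    """Map a nickname to its official form; unknown names pass through unchanged."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)

def same_first_name(a, b):
    """Two first names match if they share the same canonical form."""
    return canonical_first_name(a) == canonical_first_name(b)
```

<p>With this table, <em>Bill</em> and <em>William</em> compare as equal, while any pair of names that the table does not link is treated as different.</p>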
<p>It becomes more challenging if there is no longer a link between nickname and official name. That may happen, for example, if people move from one cultural region to another where other writing systems are also used. Take for example my Chinese friend 高为民, whose Latin name would be Gao Weimin (family name first), but the moment he works in Europe or the US he uses the Latin variant William Gao. There is no common relation between the names William and Weimin, in either Latin or Chinese script, and they are not phonetic variants of each other. <span id="more-1269"></span></p>
<p>Recently, I read a very impressive book by Dave Eggers, called &#8216;What is the What&#8217;. It gives you a good insight into one of the current problem areas of the world and how people try to survive there. Achak Deng is one of the so-called <a title="Valentino Achak Deng organization" href="http://www.valentinoachakdeng.org/" target="_blank">Lost Boys from Sudan</a>. During his life in Sudan, in refugee camps and finally in the US he officially uses different names. That has nothing to do with purposely trying to obscure his identity, but more with receiving an identity from your environment &#8211; at that time and place. He is born as Achak, baptized as Valentino, and later on uses the name Dominic or Dominic Arou and Marialdit. Of course there are people calling him nicknames such as &#8216;Sleeper&#8217; or &#8216;Gone Far&#8217;, but at certain periods in his life he officially uses completely different names. This makes automatic matching of persons, or even manual matching, challenging and keeps it interesting.</p>
<p>I would recommend the book to everyone who wants to learn about what is happening in our world, and especially to those interested in names (don&#8217;t forget to study all the names in the last section of the book).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/01/06/matching-persons-with-different-official-names/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>International domain names &#8211; there goes the ASCIIhood&#8230;.</title>
		<link>http://www.datavaluetalk.com/2009/10/28/international-domain-names-here-goes-the-asciihood/</link>
		<comments>http://www.datavaluetalk.com/2009/10/28/international-domain-names-here-goes-the-asciihood/#comments</comments>
		<pubDate>Wed, 28 Oct 2009 14:37:31 +0000</pubDate>
		<dc:creator>Holger Wandt</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[ICANN]]></category>
		<category><![CDATA[international domain names]]></category>
		<category><![CDATA[internet address]]></category>
		<category><![CDATA[matching]]></category>
		<category><![CDATA[Seoul]]></category>
		<category><![CDATA[transliteration]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1236</guid>
		<description><![CDATA[
The internet is on the verge of one of the most fundamental changes in its history. The Internet Corporation for Assigned Names and Numbers (ICANN) is expected to agree on the use of internet addresses in non-Latin characters during this week&#8217;s ICANN convention in Seoul. If all goes according to plan, it will be possible [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-thumbnail wp-image-1239" title="sel-logo-155x82" src="http://www.datavaluetalk.com/wp-content/uploads/2009/10/sel-logo-155x821-150x82.png" alt="sel-logo-155x82" width="150" height="82" /></p>
<p>The internet is on the verge of one of the most fundamental changes in its history. The Internet Corporation for Assigned Names and Numbers (ICANN) is expected to agree on the use of internet addresses in non-Latin characters during this week&#8217;s ICANN convention in Seoul. If all goes according to plan, it will be possible to use Greek, Cyrillic, Arabic, Chinese, Korean and many other characters in the internet browser&#8217;s address bar. More than half of the 1.6 billion internet users in the world use a character set which is not Latin. Therefore, ICANN expects that the number of non-Latin domain names, and thus the number of new internet users, will increase rapidly.</p>
<p>This far-reaching change in the use of the internet is based on a system that can &#8220;translate&#8221; or &#8220;convert&#8221; different writing systems (sometimes with different writing directions, such as Arabic and Hebrew). On a high level, it would look a little like this, I would imagine:</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td>
<p align="center">عربي</p>
</td>
<td>
<p align="center">中文</p>
</td>
<td>
<p align="center">English</p>
</td>
<td>
<p align="center">日本語</p>
</td>
<td>
<p align="center">Deutsch</p>
</td>
<td>
<p align="center">Français</p>
</td>
<td>
<p align="center">Español</p>
</td>
<td>
<p align="center">Русский</p>
</td>
<td>
<p align="center">Português</p>
</td>
<td>
<p align="center">한국어</p>
</td>
<td>
<p align="center">Italiano</p>
</td>
</tr>
<tr>
<td>
<p align="center">AR</p>
</td>
<td>
<p align="center">ZH</p>
</td>
<td>
<p align="center">EN</p>
</td>
<td>
<p align="center">JA</p>
</td>
<td>
<p align="center">DE</p>
</td>
<td>
<p align="center">FR</p>
</td>
<td>
<p align="center">ES</p>
</td>
<td>
<p align="center">RU</p>
</td>
<td>
<p align="center">PT</p>
</td>
<td>
<p align="center">KO</p>
</td>
<td>
<p align="center">IT</p>
</td>
</tr>
</tbody>
</table>
<p>Naturally, this phenomenon raises questions concerning the matching of internet addresses. Is <span style="color: #0000ff;"><span style="text-decoration: underline;"><strong>ووو.هُمَنِنفِرِرِنسِ.كُم </strong></span></span>the same as <a href="http://www.humaninference.com">www.humaninference.com</a>? It appears that generic multilingual data matching issues also apply in this particular case. How do we handle these comparisons? For a couple of thoughts, <a href="http://www.humaninference.com/en/Our%20products/HIquality%20Suite/~/media/C4F9CE0278294526A50A7E960544B1BF.ashx" target="_blank">please read this</a>.</p>
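<p>As a rough illustration of the comparison problem, the sketch below normalizes domain names to their ASCII-compatible (Punycode) form before comparing them. It assumes Python and its built-in <code>idna</code> codec, which follows the older IDNA 2003 rules; the function names are my own. Note that this is an encoding, not a transliteration:</p>

```python
def to_ascii_domain(domain):
    """Return the ASCII-compatible (Punycode) form of a domain name."""
    # Lowercase explicitly: the codec leaves pure-ASCII labels untouched.
    return domain.lower().encode("idna").decode("ascii")

def same_domain(a, b):
    """Compare two domain names by their ASCII-compatible encodings."""
    return to_ascii_domain(a) == to_ascii_domain(b)
```

<p>Two spellings of the same name in the same script compare as equal this way, but a phonetic rendering in another script encodes to a completely different string. Deciding whether such variants refer to the same address remains a matching problem rather than an encoding problem.</p>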
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2009/10/28/international-domain-names-here-goes-the-asciihood/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deduplication, first time wrong?</title>
		<link>http://www.datavaluetalk.com/2009/03/31/deduplication-first-time-wrong/</link>
		<comments>http://www.datavaluetalk.com/2009/03/31/deduplication-first-time-wrong/#comments</comments>
		<pubDate>Tue, 31 Mar 2009 13:25:28 +0000</pubDate>
		<dc:creator>Paul Tours</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[duplicate records]]></category>
		<category><![CDATA[duplicates]]></category>
		<category><![CDATA[match records]]></category>
		<category><![CDATA[matching]]></category>
		<category><![CDATA[merge records]]></category>
		<category><![CDATA[SAP]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=856</guid>
		<description><![CDATA[
One of my current projects has been to take an intelligent approach to the removal of duplicates already on an existing system (SAP).
The client has already successfully used our software in their IT environment to effectively stop all new duplicates being entered into SAP. They now want to use the same technology to remove all [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-863" title="twins" src="http://www.datavaluetalk.com/wp-content/uploads/2009/03/twins.gif" alt="twins" width="248" height="260" /></p>
<p>One of my current projects has been to take an intelligent approach to the removal of duplicates already on an existing system (SAP).</p>
<p>The client has already successfully used our software in their IT environment to effectively stop all new duplicates being entered into SAP. They now want to use the same technology to remove all existing duplicates. Their idea is so simple I am amazed that I have not heard of it being done elsewhere before.</p>
<p>Every evening the client&#8217;s whole SAP database will be searched for duplicates in their Companies and Contacts (&gt; 3 million records deduplicated in less than an hour!). The results are stored in a master result table that SAP has been given access to. Depending on the likelihood of the match, the duplicates fall into one of three categories: automatic merging, manual merging or no merge. If the score for the whole duplicate group is above the threshold for automatic merging, then the automatic merging process is started. <span id="more-856"></span></p>
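<p>The three categories can be sketched as a simple threshold check. The score scale and the threshold values below are assumptions for illustration only; the real values depend on the matching engine and on what the client is willing to merge unseen:</p>

```python
# Hypothetical thresholds on a 0-100 match score (illustrative values only).
AUTO_MERGE_THRESHOLD = 90
MANUAL_REVIEW_THRESHOLD = 70

def classify_group(score):
    """Assign a duplicate group to one of the three categories."""
    if score >= AUTO_MERGE_THRESHOLD:
        return "automatic merging"
    if score >= MANUAL_REVIEW_THRESHOLD:
        return "manual merging"
    return "no merge"
```

<p>Raising the upper threshold moves more groups into manual inspection, which trades employee time for a lower risk of wrongly merged records.</p>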
<p>This merge process has been created by an external SAP consultancy group that does a lot of clever stuff in giving each record a score depending on its financial relevance, e.g. open payments, current order status, payment reminders etc. (Hey, it&#8217;s SAP and in the world according to SAP only financial dealings have a value!) In the end the one record with the highest score is set to be the lead duplicate. All information from the other records in the duplicate group is placed onto the leading record to create a unique (&#8216;Golden&#8217;) record. All duplicate records with the exception of the lead duplicate are then removed from the system; in the case of SAP, these records are given a &#8216;set for deletion&#8217; flag and subsequently archived.</p>
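<p>A minimal sketch of such a merge, under stated assumptions: records are plain dictionaries, the <code>relevance</code> function stands in for the consultancy&#8217;s financial scoring (which is not described in detail here), and only empty fields of the lead record are filled from the others:</p>

```python
def build_golden_record(group, relevance):
    """Pick the highest-scoring record as lead and fill its empty fields from the rest."""
    ranked = sorted(group, key=relevance, reverse=True)
    golden = dict(ranked[0])  # copy, so the lead record itself is not modified
    for record in ranked[1:]:
        for field, value in record.items():
            # Only fill gaps; values already on the lead record win (an assumption).
            if golden.get(field) in (None, "") and value not in (None, ""):
                golden[field] = value
    return golden
```

<p>A real merge would also need rules for conflicting values, for example two different phone numbers, which this sketch simply resolves in favour of the lead record.</p>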
<p>The &#8216;Non merges&#8217;, i.e. where the match score is below the accepted threshold level, are discarded and all remaining records are sent to a separate SAP mask for manual inspection the following day. All that is required is to identify whether the records shown belong in a duplicate group or not. After this decision has been made, each duplicate group goes to the &#8216;merging&#8217; process, just the same as in the automatic merge process.</p>
<p>At the end of the day the whole process starts again. Wash, rinse, repeat! Simple! The first thing to happen is that over a short period of time all the certain duplicates disappear as they are merged automatically. This is highly visible: no more multiple identical records that pop up whenever a new record has been entered. The impact on the quality of the surrounding systems is just as direct. No sending out bills or marketing mails x times to the same person (having worked in Marketing before, I know the problem and it always leaves such a professional impression with the customer!). So it&#8217;s already something easy to sell to your managers and so far you have not had to lift a finger. Great!</p>
<p>The brilliance of the solution though lies elsewhere. The simple fact is that it really does not matter whether the rest of the results are worked through in 1 day, 1 month or a year &#8211; as they are always captured, every day anew. The net result is that the total level of duplicates is constantly decreasing. Where the merge process has taken place, the duplicates will disappear. Only a change on the record will force it to be rechecked in the next round of deduplication. This means that apart from the costs of enhancing the current system, the client has an effective DQ firewall that not only protects them from duplicate data being entered onto their IT systems, but will also over time cleanse the system from within, even if it means assigning an employee to sporadically make a decision on the manual matches. It is something that the company/department can concentrate on where they have time/resources available. (That should be easy after showing what success you have had with it already!)</p>
<p>How about if the process could be easily and readily monitored? Say by using Excel or a similar product. Bar graphs and pie charts always tell way more than actual figures! Then the impact of what is happening is all the more visible and easy to sell (a good budget retainer!)</p>
<p>Good luck in dealing with your duplicates.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2009/03/31/deduplication-first-time-wrong/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The added value of an integrated customer view</title>
		<link>http://www.datavaluetalk.com/2008/12/08/the-added-value-of-an-integrated-customer-view/</link>
		<comments>http://www.datavaluetalk.com/2008/12/08/the-added-value-of-an-integrated-customer-view/#comments</comments>
		<pubDate>Mon, 08 Dec 2008 14:44:56 +0000</pubDate>
		<dc:creator>Emile van de Klok</dc:creator>
				<category><![CDATA[MDM for customer data]]></category>
		<category><![CDATA[cdi]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[matching]]></category>
		<category><![CDATA[single customer view]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=227</guid>
		<description><![CDATA[
The added value of an integrated customer view depends strongly on the quality of that integrated customer view. Every organization that is seriously planning to create a single customer view should ask itself the following question: &#8220;What determines the quality of my customer view and so the accompanying level of added [...]]]></description>
			<content:encoded><![CDATA[<div class="mceTemp">
<dl class="wp-caption alignleft" style="width: 159px;">
<dt class="wp-caption-dt">
<div style="text-align: auto;"><a href="http://www.datavaluetalk.com/mdmdemo/"><img src="http://www.watweetikvanmijnklant.nl/wp-content/uploads/2008/12/mdmdemoss-249x300.jpg" alt="MDM Demo" width="149" height="180" /></a></div>
</dt>
</dl>
</div>
<p>The added value of an integrated customer view depends strongly on the quality of that integrated customer view. Every organization that is seriously planning to create a single customer view should ask itself the following question: &#8220;What determines the quality of my customer view and so the accompanying level of added value?&#8221;</p>
<p>Prior to answering this question we need to take one step back. Why doesn&#8217;t every organization have a single view of the customer? The cause lies in the fact that many organizations have their customer data spread across multiple systems, all facilitating separate business processes. Additionally, customer data is often highly polluted, fragmented and incomplete.</p>
<p><span id="more-227"></span></p>
<p>So it appears that the data itself plays a crucial role in the lack of an integrated customer view. Or more accurately, the better the data &#8211; the better the customer view. And the better the matching of customer records across separate systems, the better the integrated customer view.</p>
<p>So Data Quality and Matching (Identity Resolution) determine in large part the quality of the integrated customer view and the added value that it delivers. <a title="MDM Demo" href="http://www.datavaluetalk.com/mdmdemo/" target="_blank">Take a look at this demo</a> showing a step-by-step approach to building a single customer view and get a better idea of the role of Data Quality and Matching within this process.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2008/12/08/the-added-value-of-an-integrated-customer-view/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
