<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Value Talk</title>
	<atom:link href="http://www.datavaluetalk.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.datavaluetalk.com</link>
	<description>Customer data is a valuable asset. Why not treat it that way?</description>
	<lastBuildDate>Wed, 01 Sep 2010 11:38:25 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Right witness or wrong criminal?</title>
		<link>http://www.datavaluetalk.com/2010/09/01/right-witness-or-wrong-criminal/</link>
		<comments>http://www.datavaluetalk.com/2010/09/01/right-witness-or-wrong-criminal/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 08:13:57 +0000</pubDate>
		<dc:creator>Holger Wandt</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[database errors]]></category>
		<category><![CDATA[information gathering]]></category>
		<category><![CDATA[law enforcement]]></category>
		<category><![CDATA[police]]></category>
		<category><![CDATA[unique view]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1463</guid>
		<description><![CDATA[
As I was sitting on a terrace in Barcelona during my recent holiday, I found a copy of the Independent, the well-known British newspaper. Having all the time in the world, I started reading and I came across this article about the North Yorkshire Police storing data of more than 180,000 people, including their date of birth [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-1467" title="handboeien" src="http://www.datavaluetalk.com/wp-content/uploads/2010/09/handboeien1-300x225.jpg" alt="handboeien" width="300" height="225" /></p>
<p>As I was sitting on a terrace in Barcelona during my recent holiday, I found a copy of the Independent, the well-known British newspaper. Having all the time in the world, I started reading and I came across this article about the North Yorkshire Police storing data of more than 180,000 people, including their date of birth and ethnicity. The vast majority of these people had given this information voluntarily and had not committed any crime.</p>
<p>When privacy campaigners questioned the need for compiling such a database, a police spokesperson answered: &#8220;The system is used by many police forces in the UK and internationally to record all information relevant to policing, everything from details of arrested individuals, suspects, victims, witnesses and sources of information as well as addresses, phone numbers and vehicles. The information logged and cross-referenced in the system is absolutely vital to allow us to provide the effective policing service that the people of North Yorkshire and the City of York demand.&#8221;</p>
<p>I think that this is a very dangerous comment. What about the possibility of mixing up data of witnesses and criminals? How do the police forces create a unique view of their &#8220;customers&#8221;? What will be the consequences of so-called &#8220;database errors&#8221;?</p>
<p>Of course I understand that police forces all over the world need information to do their work properly and to prevent crime and other undesirable behaviour. But reading a comment like the one above, I wonder whether law enforcement agencies are really aware of the essential role of data quality in modern police work.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/09/01/right-witness-or-wrong-criminal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>6-year-old girl on no-fly list of suspected terrorists</title>
		<link>http://www.datavaluetalk.com/2010/07/02/6-year-girl-on-no-fly-list-of-suspected-terrorists/</link>
		<comments>http://www.datavaluetalk.com/2010/07/02/6-year-girl-on-no-fly-list-of-suspected-terrorists/#comments</comments>
		<pubDate>Fri, 02 Jul 2010 07:51:00 +0000</pubDate>
		<dc:creator>Holger Wandt</dc:creator>
				<category><![CDATA[Data Governance]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[anti-terror]]></category>
		<category><![CDATA[Homeland Security]]></category>
		<category><![CDATA[name check]]></category>
		<category><![CDATA[name matching]]></category>
		<category><![CDATA[no-fly lists]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1447</guid>
		<description><![CDATA[
When the Thomas family from Ohio embarked on a recent trip from Cleveland to Minneapolis, they were in for a huge but unpleasant surprise. It turned out that 6-year-old Alyssa Thomas&#8217; name was on the Homeland Security no-fly list, a list that is used to prevent individuals with known or suspected ties to terrorism from [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1448" title="meisje_734709d" src="http://www.datavaluetalk.com/wp-content/uploads/2010/07/meisje_734709d.jpg" alt="meisje_734709d" width="200" height="150" /></p>
<p>When the Thomas family from Ohio embarked on a recent trip from Cleveland to Minneapolis, they were in for a huge but unpleasant surprise. It turned out that 6-year-old Alyssa Thomas&#8217; name was on the Homeland Security no-fly list, a list that is used to prevent individuals with known or suspected ties to terrorism from flying. The girl&#8217;s father, Santhosh Thomas, states that the worst thing his daughter has ever done is probably being mean to her sister, and that this should hardly be a matter for the Department of Homeland Security.</p>
<p>The Thomases were eventually allowed to fly that day, but they were told to contact Homeland Security to clear up the matter. Alyssa has now received a letter from the government, notifying the six-year-old <strong>that nothing will be changed and that they will neither confirm nor deny any information they have about her or someone else with the same name.<span id="more-1447"></span></strong></p>
<p>According to the Transportation Security Administration, the body responsible for the Secure Flight List, it is likely that Alyssa never had problems flying before. The TSA used to check only international passengers&#8217; names against the no-fly list, but since earlier this month it has been checking domestic passengers as well. To us data quality professionals, this sounds very much like a case of incorrect identity matching&#8230; Could it be that Homeland Security is in need of some intelligent tools?</p>
<p>For more information related to this subject, please read the blog posts <a title="Permanent Link to Your name is too “common”…." rel="bookmark" href="http://www.datavaluetalk.com/2009/09/07/your-name-is-too-common/">Your name is too “common”….</a> and <a title="Permanent Link to Attempted bombing Christmas Day could have been prevented!" rel="bookmark" href="http://www.datavaluetalk.com/2010/01/13/attempted-bombing-christmas-day-could-have-been-prevented/">Attempted bombing Christmas Day could have been prevented!</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/07/02/6-year-girl-on-no-fly-list-of-suspected-terrorists/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Chinglish &#8211; the most delightful side-effect of internationalization</title>
		<link>http://www.datavaluetalk.com/2010/04/09/chinglish-the-most-delightful-side-effect-of-internationalization/</link>
		<comments>http://www.datavaluetalk.com/2010/04/09/chinglish-the-most-delightful-side-effect-of-internationalization/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 12:18:00 +0000</pubDate>
		<dc:creator>Holger Wandt</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Chinese characters]]></category>
		<category><![CDATA[fault-tolerance]]></category>
		<category><![CDATA[internationalisation]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[matching]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1439</guid>
		<description><![CDATA[
An increasing number of companies have to deal with data from the world’s fastest emerging economy: China. And the big question here is, of course: how can we compare these “strange” Chinese characters with our own writing system?
The grammar and character set of our Western alphabetic languages (such as English, French, Dutch or German) differ [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-1440" title="little grass has life" src="http://www.datavaluetalk.com/wp-content/uploads/2010/04/little-grass-has-life-300x198.jpg" alt="little grass has life" width="287" height="192" /></p>
<p>An increasing number of companies have to deal with data from the world’s fastest emerging economy: China. And the big question here is, of course: how can we compare these “strange” Chinese characters with our own writing system?</p>
<p>The grammar and character set of our Western alphabetic languages (such as English, French, Dutch or German) differ tremendously from those of Mandarin Chinese, which is the language spoken by most people in the People’s Republic of China and abroad. Mandarin is a tonal language with an ideographic character set. Almost all characters have a semantic and a phonetic component. The pitch of the pronunciation ultimately determines the meaning.</p>
<p>Complicated? Definitely. But what about the other way around? Have you ever thought about the difficulties the Chinese have to face when trying to convert their language into meaningful English?</p>
<p>This phenomenon is sometimes hilariously illustrated by the many public signs in China that are meant to inform foreign visitors or to help them find their way around.</p>
<p>This is truly a delightful side-effect of internationalization…<span id="more-1439"></span></p>
<p>The German sinologist Oliver Lutz Radtke christened these linguistic attempts “Chinglish” and collected many examples, which can be found virtually everywhere: on hotel room doors, on road signs along the highways, on shampoo bottles and on T-shirts. A small anthology:</p>
<ul>
<li>A warning sign for a steep slope: <strong><em>“Please, watch your slip”</em></strong></li>
<li>To avoid all misunderstandings, on the inside of a taxi door: <strong><em>“Don’t forget to carry your thing”</em></strong></li>
<li>A sign above a store entrance, to let our fantasy run free: <strong><em>“Welcome to presence”</em></strong></li>
</ul>
<p>Although this is all very funny, from a data quality point of view it definitely leaves a thing or two to consider. For example: what should we think of fault-tolerance with regard to typos when entering Chinese customer data into a database? What is the influence of typos in an ideographic writing system on searching, matching, enriching and correcting customer data?</p>
<p>There is still a lot of work to be done in international data quality. For more information, check out the Human Inference website on <a href="http://www.humaninference.com/Our%20products/HIquality%20Name%20Worldwide.aspx" target="_blank">HIquality Name Worldwide</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/04/09/chinglish-the-most-delightful-side-effect-of-internationalization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stop using Customer Relationship Management systems &#8211; and learn about possibilities to make dealing with customer information easier</title>
		<link>http://www.datavaluetalk.com/2010/03/09/stop-using-customer-relationship-management-systems-and-learn-about-possibilities-to-make-dealing-with-customer-information-easier/</link>
		<comments>http://www.datavaluetalk.com/2010/03/09/stop-using-customer-relationship-management-systems-and-learn-about-possibilities-to-make-dealing-with-customer-information-easier/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 11:24:54 +0000</pubDate>
		<dc:creator>Vincent van Hunnik</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Data Services]]></category>
		<category><![CDATA[campaign management]]></category>
		<category><![CDATA[contact details]]></category>
		<category><![CDATA[CRM-system]]></category>
		<category><![CDATA[mass mailing]]></category>
		<category><![CDATA[self service]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1433</guid>
		<description><![CDATA[
Have you ever tried to get contact details in and out of a CRM system, and ended up with a bigger mess? I have. The concept is easy: store all information about prospects and customers in one system, allowing you to have your communication efforts streamlined.
Reality, however, is harder: contact details entered on your website [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-thumbnail wp-image-1436" title="knock_1691050" src="http://www.datavaluetalk.com/wp-content/uploads/2010/03/knock_1691050-150x150.jpg" alt="knock_1691050" width="150" height="150" /></p>
<p>Have you ever tried to get contact details in and out of a CRM system, and ended up with a bigger mess? I have. The concept is easy: store all information about prospects and customers in one system, allowing you to have your communication efforts streamlined.</p>
<p>Reality, however, is harder: contact details entered on your website should be fed to the system automatically. Sending your periodic newsletter should be based on the details in your CRM system. Not to mention dealing with information on bounces. Integrating your CRM system(s) with mass mailing, campaign management and self service portals is helpful, but for some reason the major means of transporting lead and customer information still seems to be Excel&#8230; Leaving you with the necessity to mass import results, new contacts and changed information.<span id="more-1433"></span></p>
<p>What you really want is the ability to throw whatever information you have at the system, and let the system determine whether you already know someone and take care of adding information to existing contacts. In addition, you do not want to spend time maintaining the information you have. Actually, wouldn’t it be nice if your website could automatically add leads to your CRM system (e.g. via the Salesforce Web-to-Lead function) without adding duplicates?</p>
<p>The good news is that this can be done. There is tooling out there that keeps your contact data up-to-date, prevents adding duplicate records and makes dealing with contact data easy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/03/09/stop-using-customer-relationship-management-systems-and-learn-about-possibilities-to-make-dealing-with-customer-information-easier/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adieu Marcel &#8230;..</title>
		<link>http://www.datavaluetalk.com/2010/03/02/adieu-marcel/</link>
		<comments>http://www.datavaluetalk.com/2010/03/02/adieu-marcel/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 14:57:34 +0000</pubDate>
		<dc:creator>Jacques Baron</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Data Services]]></category>
		<category><![CDATA[civil registry]]></category>
		<category><![CDATA[French names]]></category>
		<category><![CDATA[processing French data]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1424</guid>
		<description><![CDATA[
Everybody who has ever been on holiday in France has probably had a neighbour named Gaston, Jacques, Louis, Claire or Françoise. We are used to those first names; they evoke the “France profonde”: sleepy villages at the end of a road, films by Pagnol or Rohmer, walks along the Seine in the shadow of [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-thumbnail wp-image-1431" title="french-waiter 3" src="http://www.datavaluetalk.com/wp-content/uploads/2010/03/french-waiter-3-150x150.jpg" alt="french-waiter 3" width="150" height="150" /></p>
<p>Everybody who has ever been on holiday in France has probably had a neighbour named Gaston, Jacques, Louis, Claire or Françoise. We are used to those first names; they evoke the “France profonde”: sleepy villages at the end of a road, films by Pagnol or Rohmer, walks along the Seine in the shadow of “Notre Dame” in the spring, coffee on a terrace on the Boulevard Saint-Germain where an obsequious garçon named Marcel is looking at your girlfriend or wife in a way you do not really appreciate. This particular image of France is in danger. In a few years our entire frame of reference could have disappeared.</p>
<p>Nowadays French parents let their imagination run free when choosing first names for their children. Looking at recent entries in the civil registry, you will find rather unusual first names like Bulle, Héribert, Loeva, Hermès, Evolène, and Argan.<br />
These first names have all kinds of origins. For example, they can be a combination of first names (Timéo, which is derived from Timothée and Théo), or they can be different spellings of known first names (Lilou becomes Lee-Lou). We can also find names from Greek or Celtic mythology or even from literature, like Arwen, a character from the novel The Lord of the Rings.<span id="more-1424"></span></p>
<p>This interest in uncommon first names will of course have consequences for the processing of French data, especially if you take into consideration that these “new” first names, with a frequency of less than 3000, are now in the majority. But this diversity will not necessarily be a curse. Maybe we will be delivered from ambiguous names, of which we never know whether they are first names or surnames. Consider that the 3 most common surnames in France are Martin, Bernard and Thomas. <br />
But don’t worry; in order to keep the challenge going when you process French data, names like Jacqueline-Germain, Jean Marie Marie Luce, Louise Alexandrine will of course not disappear entirely.<br />
And the next time you are in France, enjoying a salade Niçoise with a cold glass of rosé, overlooking a harbor where small fishing boats are dancing on the lazy waves, you will just have to get used to the fact that the waitress’s first name is not Marie but Fanchon or Eole.</p>
<p>Source: Le Parisien, 19-02-2010, and “Les 4 000 plus beaux prénoms rares” by Stéphanie Rapoport, published by First, 8,90 €</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/03/02/adieu-marcel/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Fødselsnummer &#8211; Crossing centuries in Norway</title>
		<link>http://www.datavaluetalk.com/2010/02/15/f%c3%b8dselsnummer-crossing-centuries-in-norway/</link>
		<comments>http://www.datavaluetalk.com/2010/02/15/f%c3%b8dselsnummer-crossing-centuries-in-norway/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 12:57:54 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[personal identification number]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1332</guid>
		<description><![CDATA[
The Norwegian Fødselsnummer (Birthnumber) is an 11-digit number with 2 control digits. The 10th digit is a control digit calculated with a weighted modulo 11 variant over the first 9 digits. The 11th digit is a control digit calculated with another weighted modulo 11 variant over the first 9 digits combined with the 10th control [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datavaluetalk.com/wp-content/uploads/2010/02/Norwegianbirthnummer.jpg"><img class="alignleft size-medium wp-image-1333" src="http://www.datavaluetalk.com/wp-content/uploads/2010/02/Norwegianbirthnummer-300x177.jpg" alt="Norwegian Fødselsnummer examples" width="300" height="177" /></a></p>
<p>The <a href="http://no.wikipedia.org/wiki/F%C3%B8dselsnummer">Norwegian Fødselsnummer</a> (Birthnumber) is an 11-digit number with 2 control digits. The 10th digit is a control digit calculated with a weighted modulo 11 variant over the first 9 digits. The 11th digit is a control digit calculated with another weighted modulo 11 variant over the first 9 digits combined with the 10th control digit.</p>
<p>As in other countries, this number is based on the <a href="http://www.datavaluetalk.com/2010/02/01/is-270368a172x-a-correct-finnish-henkilotunnus/" target="_blank">date of birth</a>. The first 6 digits represent the birth date as “ddmmyy”. The problem with a 6-digit date is that you cannot identify the century: is a Fødselsnummer starting with 121009 someone born in 1909 or 2009? The Norwegian government has solved this by assigning ranges of the following 3 digits (the individual number) to particular eras. If you were born between 1854 and 1899, your individual number must lie between 500 and 749; if you were born between 1900 and 1999, it lies between 000 and 499; and for those born more recently, between 2000 and 2039, it lies between 500 and 999. There are some exceptions for individual numbers between 900 and 999.<span id="more-1332"></span></p>
<p><a href="http://www.datavaluetalk.com/2010/01/19/why-there-are-maximum-of-females-in-a-country/" target="_blank">As in other countries</a>, odd individual numbers are given to males and even ones to females.</p>
<p>Be aware that when validating national personal identification numbers that contain a date part, like the Fødselsnummer, you cannot rely on the control digits alone. The Fødselsnummer “31046812355” is completely valid if we look at the 10th and 11th control digits; however, the birth date April 31, 1968 never occurred!</p>
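<p>The difference between checking the control digits and checking the date can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights are the commonly documented ones (the text above only says &#8220;weighted modulo 11&#8221;), the function names are made up for this example, and the exceptions for individual numbers between 900 and 999 are ignored.</p>

```python
from datetime import date

# Commonly documented weights for the two control digits (an assumption here;
# the post only says "weighted modulo 11").
K1_WEIGHTS = (3, 7, 6, 1, 8, 9, 4, 5, 2)
K2_WEIGHTS = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)


def control_digit(digits, weights):
    """Return the modulo 11 control digit, or None if no single digit fits."""
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    digit = (11 - remainder) % 11
    return None if digit == 10 else digit


def controls_ok(number: str) -> bool:
    """Check only the 10th and 11th control digits."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return False
    digits = [int(c) for c in number]
    return (control_digit(digits[:9], K1_WEIGHTS) == digits[9]
            and control_digit(digits[:10], K2_WEIGHTS) == digits[10])


def birth_year(number: str):
    """Resolve the century from the individual number (digits 7 to 9).

    Uses only the three ranges named in the post; the exceptions for
    individual numbers 900-999 are ignored in this sketch.
    """
    yy, individual = int(number[4:6]), int(number[6:9])
    if individual in range(0, 500):
        return 1900 + yy
    if individual in range(500, 750) and yy in range(54, 100):
        return 1800 + yy
    if individual in range(500, 1000) and yy in range(0, 40):
        return 2000 + yy
    return None


def birth_date_ok(number: str) -> bool:
    """Check that the ddmmyy part is a real calendar date."""
    year = birth_year(number)
    if year is None:
        return False
    try:
        date(year, int(number[2:4]), int(number[0:2]))
    except ValueError:
        return False
    return True


print(controls_ok("31046812355"))    # True: both control digits match
print(birth_date_ok("31046812355"))  # False: April has no 31st day
```

<p>A validator would presumably require both checks to pass before accepting a number.</p>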
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/02/15/f%c3%b8dselsnummer-crossing-centuries-in-norway/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>New Matching Engines go beyond apples and oranges</title>
		<link>http://www.datavaluetalk.com/2010/02/11/new-matching-engines-go-beyond-apples-and-oranges/</link>
		<comments>http://www.datavaluetalk.com/2010/02/11/new-matching-engines-go-beyond-apples-and-oranges/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 14:01:05 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[apples and oranges]]></category>
		<category><![CDATA[atomic string comparison]]></category>
		<category><![CDATA[cultural differences]]></category>
		<category><![CDATA[information retrieval]]></category>
		<category><![CDATA[intelligent matching methods]]></category>
		<category><![CDATA[Lucene]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1323</guid>
		<description><![CDATA[
Professional matching engines are becoming more and more intelligent. Within Human Inference, we also see that our matching techniques are capable of using more and more intelligence, and, needless to say, we incorporate and use this intelligence in our engines in order to adapt to the way that humans do their matching.
Traditional data quality or matching engines were based on atomic [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.datavaluetalk.com/wp-content/uploads/2010/02/beyond-apples-and-oranges2.jpg"><img class="alignleft size-medium wp-image-1328" src="http://www.datavaluetalk.com/wp-content/uploads/2010/02/beyond-apples-and-oranges2-300x215.jpg" alt="Beyond apples and oranges" width="300" height="215" /></a></p>
<p>Professional matching engines are becoming more and more intelligent. Within Human Inference, we also see that our matching techniques are capable of using more and more intelligence, and, needless to say, we incorporate and use this intelligence in our engines in order to adapt to the way that humans do their matching.</p>
<p>Traditional data quality or matching engines were based on atomic string comparison functions like match-codes, phonetic comparison, Levenshtein string distance, n-gram comparisons or similar functions. These kinds of functions are relatively easy to implement and to use, although a significant amount of plumbing is needed to get reasonable results. Open source projects like the <a href="http://lucene.apache.org/java/docs/" target="_blank">Lucene search engine</a>, and variants, provide a solid and proven set of these functions. The drawback of these functions is that it is not always clear for what purpose one should use a particular function. An even larger issue is the fact that these low-level DQ functions cannot distinguish between apples and oranges: you end up comparing family names with street names. We still see that BI vendors, for example, claim to provide data quality functionality, while they only provide these atomic comparisons.<span id="more-1323"></span></p>
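<p>To make the notion of an atomic comparison function concrete, here is a minimal Python sketch of the Levenshtein string distance. It is a textbook dynamic-programming version, not the implementation of any particular engine, and the sample values are hypothetical.</p>

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


# Hypothetical values: the function has no idea that one string is a
# family name and the other a street name. It only counts edits.
family_name = "Baker"
street_name = "Baker Street"
print(levenshtein(family_name, street_name))
print(levenshtein("Jansen", "Janssen"))
```

<p>The function will happily score any two strings against each other, which is exactly the apples-and-oranges problem: deciding what kind of value each string represents has to happen somewhere else.</p>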
<p>Within Human Inference we have been developing matching engines that look beyond these primary functions for years: engines capable of identifying given names, surnames, family names, postal codes, titles, initials, etc. The true benefit of this approach is that matching results are significantly better, because you are comparing apples with apples and oranges with oranges. The glue and plumbing needed in this approach to validate street or family names stays completely under the hood for the data stewards. With a correct set of reference data, the right mix of atomic functions and, not least, vivid domain knowledge, these matching engines are capable of quickly and adequately finding duplicates, beyond the ones that have simple typos.</p>
<p>The complexity in matching apples starts when you take into account the variants in apples or, to speak in data quality terms, when you take into account that people in each country or region have more or less subtle differences in how they use names, streets, measurements and writing systems.</p>
<p>The moment you value these differences you also recognize new opportunities. You will notice that by looking at an apple, you get information on oranges. By looking at the name Белоусовa (Beloussowa), you might recognize a family name and that you’re dealing with a female. By looking at the number 681012-2355, you might recognize that this is a valid <a href="http://www.datavaluetalk.com/2010/01/19/why-there-are-maximum-of-females-in-a-country/" target="_blank">Swedish personnummer</a>, and that the birth date of this male is October 12, 1968. By looking at an email address like <a href="mailto:Winfried.vanHolland@humaninference.com">Winfried.vanHolland@humaninference.com</a> you might recognize a given name “Winfried”, that you’re dealing with a male, that he has the surname “van Holland” and that he is working for a company called Human Inference, and I leave it up to you to guess which country he comes from&#8230; By retrieving additional information out of obvious information, the matching moves beyond the apples and oranges, and becomes easier, faster and more accurate.</p>
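<p>As an illustration of getting extra information out of a single value, the personnummer example can be sketched in Python. The sketch assumes the usual Luhn checksum over the 10 digits and the convention that an odd second-to-last digit indicates a male; the function names are made up for this example, and the century is left open because the 10-digit form does not carry it.</p>

```python
def luhn_ok(digits: str) -> bool:
    """Luhn check over a 10-digit string, weights 2,1,2,1,... from the left."""
    total = 0
    for position, char in enumerate(digits):
        value = int(char) * (2 if position % 2 == 0 else 1)
        total += value // 10 + value % 10  # add the digits of the product
    return total % 10 == 0


def read_personnummer(number: str):
    """Return (yy, mm, dd, sex) for a 'yymmdd-nnnc' string, or None."""
    digits = number.replace("-", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return None
    if not luhn_ok(digits):
        return None
    # The second-to-last digit is odd for males and even for females.
    sex = "male" if int(digits[8]) % 2 == 1 else "female"
    return int(digits[0:2]), int(digits[2:4]), int(digits[4:6]), sex


print(read_personnummer("681012-2355"))
```

<p>For the number from the text this yields the two-digit year 68, month 10, day 12 and &#8220;male&#8221;, which is the kind of derived information a matching engine can use alongside the name fields.</p>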
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/02/11/new-matching-engines-go-beyond-apples-and-oranges/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is 270368A172X a correct Finnish Henkilötunnus?</title>
		<link>http://www.datavaluetalk.com/2010/02/01/is-270368a172x-a-correct-finnish-henkilotunnus/</link>
		<comments>http://www.datavaluetalk.com/2010/02/01/is-270368a172x-a-correct-finnish-henkilotunnus/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 16:10:46 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Finland]]></category>
		<category><![CDATA[Human Inference]]></category>
		<category><![CDATA[National Identification Numbers]]></category>
		<category><![CDATA[personal identification number]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1311</guid>
		<description><![CDATA[
The Finnish national personal identification number, the Henkilötunnus (also known as Hetu or Ht), has the following format &#8211; ddmmyyc999C. For details on how to calculate the control character, I refer to the overview blog on National Identification Numbers.
Validating the Hetu 270368A172X shows that it is indeed a correct number. The number 270368172 indeed yields 29 for [...]]]></description>
			<content:encoded><![CDATA[<div class="mceTemp"><img class="alignleft size-full wp-image-1320" title="FinlandHetu270368A172X-150x150" src="http://www.datavaluetalk.com/wp-content/uploads/2010/02/FinlandHetu270368A172X-150x150.jpg" alt="FinlandHetu270368A172X-150x150" width="150" height="150" /></div>
<p>The Finnish national personal identification number, the <a title="Henkiklötonnus" href="http://fi.wikipedia.org/wiki/Henkil%C3%B6tunnus" target="_blank">Henkilötunnus</a> (also known as Hetu or Ht), has the following format &#8211; ddmmyyc999C. For details on how to calculate the control character, I refer to the overview blog on <a href="http://www.datavaluetalk.com/2010/01/19/why-there-are-maximum-of-females-in-a-country/" target="_blank">National Identification Numbers</a>.</p>
<p>Validating the Hetu 270368A172X shows that it is indeed a correct number. The number 270368172 indeed yields 29 for the modulo 31 proof, which is represented by the control character &#8220;X&#8221; in the checksum list. The number shows that this is the 86th girl born on the 27th of March 2068.</p>
<p>The latter is exactly where the discussion on validity starts. Although the number itself is well formed and passes all the automatic checks, encountering it in a data quality assessment will raise your digital eyebrow. In the data quality world we would nowadays say that this Hetu is wrong, that it cannot be correct.</p>
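<p>The gap between &#8220;well formed&#8221; and &#8220;plausible&#8221; can be sketched in Python. This is an illustration only: it assumes the classic century signs (+ for the 1800s, - for the 1900s, A for the 2000s) and the usual list of 31 control characters, and the function names are made up for this example.</p>

```python
from datetime import date

# Control characters indexed by the modulo 31 remainder.
CONTROL_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"
# Classic century signs only; any other sign is rejected in this sketch.
CENTURIES = {"+": 1800, "-": 1900, "A": 2000}


def parse_hetu(hetu: str):
    """Return (birth_date, individual_number) if well formed, else None."""
    if len(hetu) != 11 or hetu[6] not in CENTURIES:
        return None
    body = hetu[:6] + hetu[7:10]  # ddmmyy followed by the individual number
    if not (body.isascii() and body.isdigit()):
        return None
    if CONTROL_CHARS[int(body) % 31] != hetu[10]:
        return None
    year = CENTURIES[hetu[6]] + int(hetu[4:6])
    try:
        born = date(year, int(hetu[2:4]), int(hetu[0:2]))
    except ValueError:
        return None
    return born, int(hetu[7:10])


def is_plausible(hetu: str, today: date) -> bool:
    """Well formed and not born in the future relative to 'today'."""
    parsed = parse_hetu(hetu)
    return parsed is not None and today >= parsed[0]


born, individual = parse_hetu("270368A172X")
print(born, individual)
print(is_plausible("270368A172X", date(2010, 2, 1)))
```

<p>The number parses without complaint, yet the plausibility check fails for any reference date before 2068, which is the point made above.</p>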
<p>So always use a bit of human inference when dealing with Finnish national personal identification numbers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/02/01/is-270368a172x-a-correct-finnish-henkilotunnus/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Remarkable facts on Dutch National Personal Identification Number (Burgerservicenummer BSN)</title>
		<link>http://www.datavaluetalk.com/2010/01/19/remarkable-facts-on-dutch-national-personal-identification-number-burgerservicenummer-bsn/</link>
		<comments>http://www.datavaluetalk.com/2010/01/19/remarkable-facts-on-dutch-national-personal-identification-number-burgerservicenummer-bsn/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 15:35:34 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[11-proof]]></category>
		<category><![CDATA[Personal Identification Numbers]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1293</guid>
		<description><![CDATA[
The national personal identification number in the Netherlands is called the Burgerservicenummer (abbreviated to BSN, introduced in November 2007). It is a 9-digit number that can be validated by a weighted 11-proof. Basically, each digit is multiplied by a weighting factor, and the sum of these products [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-thumbnail wp-image-1307" title="bsn" src="http://www.datavaluetalk.com/wp-content/uploads/2010/01/bsn1-150x150.jpg" alt="bsn" width="150" height="150" /></p>
<p>The national <a href="http://en.wikipedia.org/wiki/Personal_identification_number" target="_blank">personal identification number</a> in the Netherlands is called the <a href="http://www.bprbzk.nl/BSN" target="_blank">Burgerservicenummer</a> (abbreviated BSN, introduced in November 2007). It is a 9-digit number that can be validated by a weighted 11-proof. Basically, every digit position gets a weighting factor; when each digit is multiplied by its weight and the products are added up, the final result must be exactly divisible by 11.</p>
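<p>The weights themselves are not listed here. A common description of the BSN 11-proof uses the weights 9 down to 2 for the first eight digits and -1 for the last one; the sketch below assumes exactly that, and the function name is made up for illustration.</p>

```python
# Assumed weights: 9..2 for the first eight digits, -1 for the ninth.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def bsn_passes_11_proof(bsn):
    """Return True if the 9-digit string satisfies the weighted 11-proof."""
    if len(bsn) != 9 or any(c not in "0123456789" for c in bsn):
        return False
    total = sum(w * int(c) for w, c in zip(BSN_WEIGHTS, bsn))
    # Note: 000000000 also passes arithmetically; whether it is issued
    # is a separate question that this check does not answer.
    return total % 11 == 0
```

<p>For 112682765 the weighted sum is 143, which is 13 times 11, so the number passes.</p>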
<p>A nice effect of this weighted 11-proof is that any 2 valid numbers differ in at least 2 digits. You need at least 2 changes to get from one number to another &#8211; either 2 completely different digits (e.g., 1126827<strong>65</strong> and 1126827<strong>77</strong>) or one digit that moves while another changes (e.g., 4270965<strong>0</strong><span style="color: #ff0000">9</span> and 4270965<span style="color: #ff0000">1</span><strong>0</strong>).</p>
<p>Mathematically there can still be two consecutive numbers, like 4270961<strong>69</strong> and 4270961<strong>70</strong>, which nevertheless need 2 changes to get from one to the other.<span id="more-1293"></span></p>
<p>This effect helps prevent mistakes while typing these numbers: you need to make more than one mistake, and have some bad luck, to end up with a number that matches the proof.</p>
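<p>To see this protection at work, the sketch below generates every single-digit typo of a valid number and checks that none of them passes. It assumes the commonly described weights (9 down to 2, then -1); the helper names are illustrative.</p>

```python
# Assumed weights: 9..2 for the first eight digits, -1 for the ninth.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def passes(bsn):
    """Weighted 11-proof on a string of 9 digits."""
    return sum(w * int(c) for w, c in zip(BSN_WEIGHTS, bsn)) % 11 == 0


def single_digit_typos(bsn):
    """Yield every string that differs from bsn in exactly one digit."""
    for i, original in enumerate(bsn):
        for d in "0123456789":
            if d != original:
                yield bsn[:i] + d + bsn[i + 1:]


def typos_that_slip_through(bsn):
    """List the single-digit typos that would still pass the proof."""
    return [t for t in single_digit_typos(bsn) if passes(t)]
```

<p>Each of the 81 one-digit variants of 112682765 fails, because a change in one digit shifts the weighted sum by an amount that is never a multiple of 11.</p>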
<p>For those who like statistics, there are exactly 90909090 possible combinations &#8211; which in itself is a nice number, but one that doesn&#8217;t match the proof. The first possible number is 000000012 (assuming that 000000000 is not used); the last is 999999990.</p>
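<p>The count can be reproduced without walking through a billion candidates. The sketch below tracks, digit by digit, how many prefixes lead to each remainder modulo 11; it assumes the commonly described weights (9 down to 2, then -1), and the function names are illustrative.</p>

```python
# Assumed weights: 9..2 for the first eight digits, -1 for the ninth.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def count_valid_bsn():
    """Count 9-digit strings that pass the 11-proof, excluding 000000000."""
    # ways[r] = number of digit prefixes whose weighted sum is r modulo 11
    ways = [1] + [0] * 10
    for w in BSN_WEIGHTS:
        nxt = [0] * 11
        for r, n in enumerate(ways):
            for d in range(10):
                nxt[(r + w * d) % 11] += n
        ways = nxt
    return ways[0] - 1  # remainder 0 passes; drop the all-zero string


def first_valid_bsn():
    """Smallest non-zero number that passes, as a 9-digit string."""
    n = 1
    while True:
        s = f"{n:09d}"
        if sum(w * int(c) for w, c in zip(BSN_WEIGHTS, s)) % 11 == 0:
            return s
        n += 1
```

<p>Under these assumptions <code>count_valid_bsn()</code> returns 90909090 and <code>first_valid_bsn()</code> returns 000000012, matching the figures above.</p>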
<p>For more on Personal Identification Numbers I refer you to another summary blog post on European numbers or to a handsome <a href="http://prezi.com/csnv3cynv4ai/" target="_blank">presentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/01/19/remarkable-facts-on-dutch-national-personal-identification-number-burgerservicenummer-bsn/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why there is a maximum number of (fe)males in a country</title>
		<link>http://www.datavaluetalk.com/2010/01/19/why-there-are-maximum-of-females-in-a-country/</link>
		<comments>http://www.datavaluetalk.com/2010/01/19/why-there-are-maximum-of-females-in-a-country/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 13:38:24 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[identification]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[privacy-sensitive]]></category>
		<category><![CDATA[social security number]]></category>
		<category><![CDATA[unique identification]]></category>

		<guid isPermaLink="false">http://www.datavaluetalk.com/?p=1288</guid>
		<description><![CDATA[Within Europe there is no such thing as a European Social Security Number or European Identification Number. A lot of countries have their own system, and other countries are struggling to get a system into place.
The struggle of some countries has to do with historical reasons and with privacy aspects. Unique identification is not always used in favour of [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone" src="http://4.bp.blogspot.com/_jQS2yW8CbuY/Sb43ZcL28WI/AAAAAAAAADg/cNBvLb2bq6o/s320/CartaoCidadao_f.jpg" alt="" width="320" height="207" />Within Europe there is no such thing as a European Social Security Number or European Identification Number. A lot of countries have their own system, and other countries are struggling to get a system into place.</p>
<p>The struggle of some countries has to do with historical reasons and with privacy aspects. Unique identification is not always used in favour of the community. And some of the identification systems in use contain privacy-sensitive information, among others date of birth, gender and/or place of birth, while older systems might even contain religious or other privacy-sensitive information.</p>
<p>A wide range of countries use the combination of date of birth, gender identification and the political region where you were born. In such a mechanism it is most common that part of the identification number is a 2-digit or 3-digit serial number that identifies the unique male or female born on a specific date (or in a specific month). Some countries assign odd serial numbers to males and even ones to females. Bulgaria is the only one that wants &#8220;odd&#8221; females. Some countries prefer to divide by range (0-499 male, 500-999 female). And some countries, like Norway, make nice combinations to include the century of birth or period of birth in the serial number.<span id="more-1288"></span></p>
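<p>The two gender conventions mentioned above can be sketched in a few lines of Python. The function names and parameters are invented for illustration and do not describe any particular country&#8217;s rules.</p>

```python
# Illustrative helpers; names and defaults are assumptions, not a standard.
def gender_by_parity(serial, odd_is_male=True):
    """Odd serials are male by default; pass odd_is_male=False to flip."""
    is_odd = serial % 2 == 1
    return "male" if is_odd == odd_is_male else "female"


def gender_by_range(serial, first_female=500):
    """Serials below first_female are male, the rest female."""
    return "male" if serial in range(first_female) else "female"
```

<p>Either way, a 3-digit serial number leaves room for 500 males and 500 females per date, which is where the capacity limit comes from.</p>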
<p>This &#8216;number&#8217; generation has the effect that pretty soon you hit the maximum number of citizens that the system can handle for a specific day. Some systems run out of numbers if more than 500 males or females are born on one day. The Danish system ran into that situation in 2007, when due to immigration the population exceeded what the system could handle for January 1st 1965! The Danish system (CPR-nummer) has a 3-digit serial number where one of the digits is also the control digit (which then diminishes the possible numbers from 500 to fewer than 50).</p>
<p>It is remarkable to see what some countries do to solve the &#8216;century&#8217; issue, where people born in the 19th, 20th or 21st century would otherwise end up with the same ID: they add 20 or 40 to the month. The same goes for foreigner identification, e.g. Sweden adds 60 to the day of birth. And Sweden again adds 20 to the month to distinguish persons from organisations.</p>
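<p>Reading such a field back means peeling off the offset again. The helper below is a hypothetical sketch: it only separates the offset from the plain month or day, and deliberately does not say which offset stands for which century or category, since that differs per country.</p>

```python
# Hypothetical helper: splits an encoded month or day into (offset, value).
def split_offset(value, offsets):
    """Return (offset, plain_value) for the largest listed offset that fits."""
    for offset in sorted(offsets, reverse=True):
        if value - offset >= 1:
            return offset, value - offset
    return 0, value  # no offset applied
```

<p>For example, a month field of 52 with offsets 20 and 40 splits into offset 40 and month 12, and a day field of 75 with offset 60 splits into offset 60 and day 15.</p>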
<p>If you want to see the details of these systems, have a look at <a href="http://prezi.com/csnv3cynv4ai/">http://prezi.com/csnv3cynv4ai/</a> or <a href="http://en.wikipedia.org/wiki/National_identification_number">http://en.wikipedia.org/wiki/National_identification_number</a>. Be prepared: there have definitely been PhDs around to invent these systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.datavaluetalk.com/2010/01/19/why-there-are-maximum-of-females-in-a-country/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
