Incomplete analysis – finding patterns in noise

Kaiser Fung, author of Numbers Rule Your World, posted a blog entry about ‘Muzzling Data’.

The entry discusses an automated analysis, done by zip code, that projected how far the average lifespan of individuals in a zip code deviates when broken out by first name. He goes on to show how and why this type of analysis is incomplete. Without a complete view of the data (i.e., how much lifespan varies across the population overall), it is easy to find patterns in the noise of the data. He suggests that this type of incomplete analysis might yield headlines such as:

“Your first name reduces your life expectancy!!”, or “Margaret, it’s time to become Elizabeth!”. And why not “James, if you want to live longer, become Elizabeth now!”

Analysts need to ensure that the patterns they identify are not just noise produced by an artifact of their methodology or by an incomplete analysis.
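
To make the point concrete, here is a minimal simulation sketch. It is not the analysis from Fung's entry: the mean lifespan, the standard deviation, the group size, and the names beyond the three quoted above are all made-up assumptions. Every group is drawn from the same distribution, so any difference between names is noise by construction. Expressing each deviation in standard errors is one simple way to bring the overall variability back into view.

```python
import random
import statistics


def simulate_group_means(names, group_size, mean, sd, seed=0):
    """Draw every group's lifespans from the SAME normal distribution."""
    rng = random.Random(seed)
    return {
        name: statistics.fmean([rng.gauss(mean, sd) for _ in range(group_size)])
        for name in names
    }


def deviations_in_standard_errors(group_means, group_size, mean, sd):
    """Express each group's deviation from the overall mean in standard errors."""
    standard_error = sd / group_size ** 0.5
    return {name: (m - mean) / standard_error for name, m in group_means.items()}


# Hypothetical inputs, chosen only for illustration (not real lifespan data).
NAMES = ["James", "Margaret", "Elizabeth", "John", "Mary", "Robert"]
GROUP_SIZE = 25        # assumed number of people per first name in one zip code
MEAN_LIFESPAN = 78.0   # assumed overall mean, in years
LIFESPAN_SD = 12.0     # assumed overall standard deviation, in years

means = simulate_group_means(NAMES, GROUP_SIZE, MEAN_LIFESPAN, LIFESPAN_SD)
z_scores = deviations_in_standard_errors(
    means, GROUP_SIZE, MEAN_LIFESPAN, LIFESPAN_SD
)

# Rank the names by simulated average lifespan, shortest first.
for name in sorted(means, key=means.get):
    print(f"{name:10s} mean={means[name]:5.1f}  deviation={z_scores[name]:+.2f} SE")
```

The ranking that gets printed looks like a finding, yet no name has any effect in this simulation. A deviation of a standard error or two in either direction is what chance alone tends to produce in groups this small, which is why a ranking without the population's variability is incomplete.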



