Incomplete analysis – finding patterns in noise

Kaiser Fung, author of Numbers Rule Your World, posted a blog entry about ‘Muzzling Data’.

The entry discusses an automated analysis, run by zip code, that reported the deviation in average lifespan for individuals in a zip code, broken out by first name. He goes on to show how and why this type of analysis is incomplete. Without a complete view of the data (i.e. how much lifespan varies across the population overall), it is easy to find patterns in the noise of the data. He suggests that this type of incomplete analysis might yield headlines such as:

“Your first name reduces your life expectancy!!”, or “Margaret, it’s time to become Elizabeth!”. And why not “James, if you want to live longer, become Elizabeth now!”

The analyst needs to ensure that the patterns they identify are not just noise, produced as an artifact of their methodology or of an incomplete analysis.
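To make the point concrete, here is a minimal sketch of how a "first name effect" can appear in data that contains none. It is not the analysis from Fung's post. The function name `simulate_name_effects`, the mean of 78 years, the standard deviation of 12 years, and the group size of 25 are all assumptions chosen for illustration. Every lifespan is drawn from the same distribution, so any gap between names is noise by construction.

```python
import random
import statistics


def simulate_name_effects(names, people_per_name, mean=78.0, sd=12.0, seed=0):
    """Draw lifespans for every name from ONE shared distribution.

    The mean and sd defaults are illustrative assumptions, not real
    demographic figures. Returns {name: (deviation, z_score)}.
    """
    rng = random.Random(seed)
    lifespans = {
        name: [rng.gauss(mean, sd) for _ in range(people_per_name)]
        for name in names
    }

    # The "complete view": how much lifespan varies across everyone.
    everyone = [age for group in lifespans.values() for age in group]
    overall_mean = statistics.fmean(everyone)
    overall_sd = statistics.stdev(everyone)

    # Rough yardstick for how far a group mean wanders by chance alone.
    standard_error = overall_sd / people_per_name ** 0.5

    report = {}
    for name, group in lifespans.items():
        deviation = statistics.fmean(group) - overall_mean
        report[name] = (deviation, deviation / standard_error)
    return report


if __name__ == "__main__":
    names = ["Margaret", "Elizabeth", "James"]
    for name, (deviation, z) in simulate_name_effects(names, 25).items():
        print(f"{name}: {deviation:+.1f} years vs. overall (z = {z:+.2f})")
```

Reported alone, the deviation column can look like a finding. Set against the standard error, which depends on the overall variability that the incomplete analysis leaves out, a gap of a year or two for a small group is the sort of difference chance produces on its own.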


Filed under Data, Systems Engineering
