Incomplete analysis – finding patterns in noise

Kaiser Fung, author of Numbers Rule Your World, posted a blog entry about ‘Muzzling Data’: http://junkcharts.typepad.com/numbersruleyourworld/2012/03/we-sometimes-need-a-muzzle-.html

The entry discusses an automated data analysis done by zip code, which reported the deviation in average lifespan for individuals in a zip code, broken out by first name. He goes on to show how and why this type of analysis is incomplete. Without a complete view of the data (i.e. how much lifespan varies across the population overall), there is no way to tell whether a given deviation is unusually large, so it is easy to find patterns in the noise. He suggests that this type of incomplete analysis might yield headlines such as:

“Your first name reduces your life expectancy!!”, or “Margaret, it’s time to become Elizabeth!”. And why not “James, if you want to live longer, become Elizabeth now!”

Analysts need to ensure that the patterns they identify are not simply noise, produced by an artifact of their methodology or by an incomplete analysis.
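To make the point concrete, here is a minimal simulation sketch. It is not taken from Kaiser Fung's post: the function name `simulate_name_deviations`, the lifespan mean of 78 years, the standard deviation of 12 years, the group size of 25, and the names beyond Margaret, Elizabeth, and James are all assumptions chosen for illustration. Every name draws lifespans from the same distribution, so any deviation between names is pure noise by construction.

```python
import random
import statistics


def simulate_name_deviations(names, group_size, mean=78.0, sd=12.0, seed=0):
    """Draw lifespans for every name from ONE shared distribution.

    The mean and sd are illustrative assumptions, not real demographic data.
    Returns ({name: (deviation_in_years, deviation_in_standard_errors)},
    standard_error).
    """
    rng = random.Random(seed)
    lifespans = {
        name: [rng.gauss(mean, sd) for _ in range(group_size)]
        for name in names
    }

    # The "complete view": how much lifespans vary across the whole population.
    all_values = [v for values in lifespans.values() for v in values]
    overall_mean = statistics.fmean(all_values)
    overall_sd = statistics.stdev(all_values)

    # Expected spread of a group average when the group has group_size people.
    standard_error = overall_sd / group_size ** 0.5

    results = {}
    for name, values in lifespans.items():
        deviation = statistics.fmean(values) - overall_mean
        results[name] = (deviation, deviation / standard_error)
    return results, standard_error


# Hypothetical names; the first three appear in the headlines above.
names = ["Margaret", "Elizabeth", "James", "Mary", "John", "Robert"]
results, standard_error = simulate_name_deviations(names, group_size=25)

for name, (deviation, z) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:10s} deviation {deviation:+6.2f} years ({z:+.2f} standard errors)")
```

Looking only at the deviation in years, some names will appear to "cost" or "add" years of life even though no name has any effect in this simulation. Comparing each deviation to the standard error, which requires knowing the overall variability, is one simple way to judge whether a gap is larger than noise alone would be expected to produce.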


Filed under Data, Systems Engineering
