Category Archives: Uncategorized

Data Visualization Guidance Compiled

Very helpful quick-reference for articles from Nature Methods about creating visualizations.


Filed under Uncategorized

Personal Annual Reports

I had never seen this site before, but I came across it while watching something about Information Visualization:

I guess I should probably know Nicholas Felton, from his high-profile “about me”:

He is the co-founder of, and currently a member of the product design team at Facebook. His work has been profiled in publications including the Wall Street Journal, Wired and Good Magazine and has been recognized as one of the 50 most influential designers in America by Fast Company.

Anyway, he has been producing ‘Personal Annual Reports’ which reflect each year’s activities.  I haven’t dug too deeply, but they struck me as quite interesting and worth a deeper dive.

Filed under Data Visualization, Uncategorized

Open data sources on the web

A colleague pointed me at this post, as we are working together on a project related to open data sources:

I had already heard of many of these data sources, but a few were new to me.  For our project, we have been looking at the OpenNYC data sets, which contain some really interesting data domains.  Hopefully we can publish our analysis of the open data sets and the obstacles to integration in the near future.

(Yet another data source list here:

Filed under Uncategorized

Polyglot Persistence and the breaking of the ‘Shared Database Integration’ model

A colleague sent me this link the other day, and I appreciate the author’s point of view.  In the discussion that ensued, we started talking about the topics a bit more in-depth.  One item that came up is whether, in this polyglot world, every application would need to track every other’s use of every data item.  My response was:

While you don’t need every application to know every other application’s use of every data item, you do need (I contend) a unified (mental) model of the business constraints/requirements of the data items which are shared (or related/referenced).  This is a gap in the way the original author was describing this polyglot persistence (IMHO) – he seems to leave that as an ‘exercise for the reader’.

Contrived example:

If you have two medical insurance applications, “Registration” and “Reimbursement”, then they need to agree on the higher-order (conceptual and/or logical) data model against which both applications would operate.  For example, does Reimbursement make a check out to the patient, or to the person who heads the household?  If Registration doesn’t have a concept of ‘head of household’, then how would Reimbursement be able to implement that?

The abstraction capability of a web service _does_ enable an application to conceal the nitty-gritty of exactly how it stores its data (and so does SQL), but there has to be some exposed (and agreed-upon) data model.
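
To make the contrived example a bit more concrete, here is a minimal Python sketch of what "agreeing on the model, hiding the storage" might look like. Every name in it (Member, Household, RegistrationService, ReimbursementService, payee_for) is hypothetical, and the rule that the head of household gets the check is just an assumption for illustration, not anything from the original article.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Shared conceptual model: both applications agree on these concepts,
# not on how either one stores them. (All names are hypothetical.)
@dataclass(frozen=True)
class Member:
    member_id: str
    name: str
    household_id: str


@dataclass(frozen=True)
class Household:
    household_id: str
    head_member_id: str  # the 'head of household' concept both apps rely on


class RegistrationService:
    """Owns member data; its storage (here, plain dicts) stays hidden."""

    def __init__(self) -> None:
        self._members: Dict[str, Member] = {}
        self._households: Dict[str, Household] = {}

    def register(self, member: Member, is_head: bool = False) -> None:
        self._members[member.member_id] = member
        if is_head:
            self._households[member.household_id] = Household(
                member.household_id, member.member_id
            )

    def get_member(self, member_id: str) -> Member:
        return self._members[member_id]

    def head_of_household(self, member_id: str) -> Optional[Member]:
        household = self._households.get(self._members[member_id].household_id)
        return self._members[household.head_member_id] if household else None


class ReimbursementService:
    """Depends only on Registration's exposed model, not its storage."""

    def __init__(self, registration: RegistrationService) -> None:
        self._registration = registration

    def payee_for(self, patient_id: str) -> Member:
        # Business rule (an assumption for this sketch): pay the head of
        # household when one is known, otherwise pay the patient.
        head = self._registration.head_of_household(patient_id)
        if head is not None:
            return head
        return self._registration.get_member(patient_id)


registration = RegistrationService()
registration.register(Member("m1", "Pat", "h1"), is_head=True)
registration.register(Member("m2", "Sam", "h1"))
reimbursement = ReimbursementService(registration)
print(reimbursement.payee_for("m2").name)
```

The point of the sketch is that Reimbursement can only implement its rule because Registration exposes a ‘head of household’ concept at all; swapping the dicts for a relational or document store would not change that agreement.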

I contend that this agreement at the conceptual/logical level is both the reason that ‘Shared Database Integration’ has been so favored, and also the reason people rail against it and claim that the model is too monolithic and slow to change.  I have rarely encountered an organization that is cognizant of (let alone able to effectively express) the data relationships in its business.  This lack of understanding (I claim) is the root cause of the glacial speed at which data models typically evolve.

Filed under Uncategorized

ORACLE and Big Data

A colleague pointed me at these two resources from the recently held Oracle OpenWorld, as we are discussing Big Data with our clients.

Oracle Openworld

This is the link to the Big Data keynote presentation from OpenWorld, which covered Oracle’s jump into Big Data.  Check out the link below (it runs for 60 minutes but covers a lot of material with a real use-case example).

Below is additional information related to Oracle’s Big Data solutions, including Oracle Loader for Hadoop, Oracle NoSQL Database, and Oracle R Enterprise (based on R, an open source statistical and graphics language).

Filed under Uncategorized

Rebranding as Data Scientists

In a previous post, The Data Scientist, I talked about the term and where it fits into the current paradigm.  The topic arose again this week in a post from Kaiser Fung, with an amusing twist: rebranding.

“You have to give it to the computer scientists. They are like the branding agencies of the engineering world. Everything they touch turns to PR gold. Steve Jobs, of course, was their standard bearer, with his infamous “reality distortion field”. Now, they are invading the statistician’s turf, and have already re-branded us as “data scientists”. MIT Technology Review noted this event recently”

I am amused, as Kaiser is, by how rebranding can hype ideas, terms, jobs, and technology.  Here is my take:

I agree with the MIT article that it is not so much that ‘data scientists’ do anything differently than statisticians, in terms of their techniques.

However, there is a clear gap between the ‘stats folks’ and the ‘business folks’.  One group speaks math, the other speaks English.  This is the void which I think needs to be filled.  My own personal (and COMPLETELY biased) mental model of ‘data scientist’ is a cross between statistics, data management, and system engineering.  The system engineering (a systemic viewpoint of SE, not a systematic viewpoint of SE) is the key to bridging the void.

This relates to the Susan Holmes statement (that >80% of a statistician’s time is spent preparing the data).  I would contend that it _should_ be more like 40% prepping and applying stats, and 60% describing/conveying what it means.


Filed under Uncategorized