Month: April 2012

The many faces of statistics/data science: Can’t we all just get along and learn from each other?

Two blog posts in the last 24 hours caught my attention. The first was this post by Jeff Leek, noting that many fields are applied statistics by another name (and I’d add operations research to his list). The second is a post on Cloudera’s blog on constructing case-control studies. It is generally excellent, but it contains this rather unfortunate (in my view) statement:

Analyzing a case-control study is a problem for a statistician. Constructing a case-control study is a problem for a data scientist.

First of all, this ignores what biostatisticians have been doing in collaboration with epidemiologists for decades. The design of a study, as any statistician understands, is just as important as the analysis, if not more so, and statisticians have been at the forefront of pushing good study design. Second, it shows a fundamental lack of understanding of the breadth of what statistics as a discipline encompasses. Third, it comes close to illustrating Jeff’s point about fields that are considered distinct but are essentially “applied statistics”. There seems to be a strong push to claim a new field as different from, and sexier than, what has come before (an issue of branding and worth, perhaps?) without understanding what is already out there.

Statistics as a field has been guilty of this as well. The most obvious and wasteful consequence is “re-inventing the wheel” rather than leveraging the power of other discoveries. Ownership of an idea is a powerful concept, but we must recognize that while translating a concept for a new audience is useful and extremely necessary, merely claiming ownership while willfully ignoring the developments made by colleagues in another field is wasteful and disingenuous.

A recent discussion with a colleague reinforced this point even within statistics. Some of the newer developments in a relatively new methodologic space follow the same lines of theoretical development as those in an older methodologic space. The new guys are coming up against the same brick walls as the earlier researchers, and the new researchers seem not to understand the path already travelled (since the keywords are different and not necessarily directly related, Google Scholar fails).

The bottom line here is the strong need for more cross-talk between disciplines, more collaboration among researchers, a greater understanding of the knowledge already out there, and more breadth in our own training and knowledge.