Solved – a data scientist

careers, definition, terminology

Having recently graduated from my PhD program in statistics, I had spent the last couple of months searching for work in the field. Almost every company I considered had a job posting with the title "Data Scientist". In fact, it felt like the days of seeing job titles such as Statistical Scientist or Statistician were long gone. Had being a data scientist really replaced being a statistician, or were the titles synonymous, I wondered?

Well, most of the qualifications for the jobs felt like things that would qualify under the title of statistician. Most jobs wanted a PhD in statistics ($\checkmark$), and most required an understanding of experimental design ($\checkmark$), linear regression and ANOVA ($\checkmark$), generalized linear models ($\checkmark$), and other multivariate methods such as PCA ($\checkmark$), as well as knowledge of a statistical computing environment such as R or SAS ($\checkmark$). It sounds like data scientist is really just a code name for statistician.

However, every interview I went to started with the question: "So are you familiar with machine learning algorithms?" More often than not, I found myself having to try and answer questions about big data, high performance computing, and topics on neural networks, CART, support vector machines, boosting trees, unsupervised models, etc. Sure, I convinced myself that these were all statistical questions at heart, but at the end of every interview I couldn't help but leave feeling like I knew less and less about what a data scientist is.

I am a statistician, but am I a data scientist? I work on scientific problems, so I must be a scientist! And I also work with data, so I must be a data scientist! And according to Wikipedia, most academics would agree with me (https://en.wikipedia.org/wiki/Data_science, etc.):

Although use of the term "data science" has exploded in business
environments, many academics and journalists see no distinction
between data science and statistics.

But if I am going on all these job interviews for a data scientist position, why does it feel like they are never asking me statistical questions?

Well, after my last interview I did what any good scientist would do and sought out data to solve this problem (hey, I am a data scientist after all). However, countless Google searches later, I ended up right where I started, once again grappling with the definition of a data scientist. I didn't know exactly what a data scientist was, since there were so many definitions (http://blog.udacity.com/2014/11/data-science-job-skills.html, http://www-01.ibm.com/software/data/infosphere/data-scientist/), but it seemed like everyone was telling me I wanted to be one.

Well, at the end of the day, what I figured out was that "what is a data scientist" is a very hard question to answer. Heck, Amstat devoted two entire months to trying to answer this question.

Well, for now, I have to be a sexy statistician to be a data scientist, but hopefully the Cross Validated community can shed some light and help me understand what it means to be a data scientist. Aren't all statisticians data scientists?


(Edit/Update)

I thought this might spice up the conversation. I just received an email from the American Statistical Association about a job posting with Microsoft looking for a Data Scientist. Here is the link: Data Scientist Position. I think this is interesting because the role hits on a lot of the specific traits we have been talking about, but I think many of them require a very rigorous background in statistics, and they contradict many of the answers posted below. In case the link goes dead, here are the qualities Microsoft seeks in a data scientist:

Core Job Requirements and Skills:

Business Domain Experience using Analytics

  • Must have experience across several relevant business domains in the utilization of critical thinking skills to conceptualize complex business problems and their solutions using advanced analytics in large scale real-world business data sets
  • The candidate must be able to independently run analytic projects and help our internal clients understand the findings and translate them into action to benefit their business.

Predictive Modeling

  • Experience across industries in predictive modeling
  • Business problem definition and conceptual modeling with the client to elicit important relationships and to define the system scope

Statistics/Econometrics

  • Exploratory data analytics for continuous and categorical data
  • Specification and estimation of structural model equations for enterprise and consumer behavior, production cost, factor demand, discrete choice, and other technology relationships as needed
  • Advanced statistical techniques to analyze continuous and categorical data
  • Time series analysis and implementation of forecasting models
  • Knowledge and experience in working with multiple variables problems
  • Ability to assess model correctness and conduct diagnostic tests
  • Capability to interpret statistics or economic models
  • Knowledge and experience in building discrete event simulation, and dynamic simulation models

Data Management

  • Familiarity with use of T-SQL and analytics for data transformation and the application of exploratory data analysis techniques for very large real-world data sets
  • Attention to data integrity including data redundancy, data accuracy, abnormal or extreme values, data interactions and missing values.

Communication and Collaboration Skills

  • Work independently and able to work with a virtual project team that will research innovative solutions to challenging business problems
  • Collaborate with partners, apply critical thinking skills, and drive analytic projects end-to-end
  • Superior communication skills, both verbal and written
  • Visualization of analytic results in a form that is consumable by a diverse set of stakeholders

Software Packages

  • Advanced Statistical/Econometric software packages: Python, R, JMP, SAS, Eviews, SAS Enterprise Miner
  • Data exploration, visualization, and management: T-SQL, Excel, PowerBI, and equivalent tools

Qualifications:

  • Minimum 5+ years of related experience required
  • Post graduate degree in quantitative field is desirable.

Best Answer

There are a few humorous definitions which were not yet given:

Data Scientist: Someone who does statistics on a Mac.

I like this one, as it plays nicely on the more-hype-than-substance angle.

Data Scientist: A Statistician who lives in San Francisco.

Similarly, this riffs on the West Coast flavour of all this.

Personally, I find the discussion (in general, and here) somewhat boring and repetitive. When I was thinking about what I wanted to do (maybe a quarter century or longer ago), I aimed for quantitative analyst. That is still what I do (and love!), and it mostly overlaps with and covers what was given here in various answers.

(Note: There is an older source for quote two but I can't find it right now.)
