
What is data science?

And what is it not?
58 Answers
Luis Otavio Martins

I just left an interview where they asked me the same question. After reading the other 41 answers, I will try to give a simpler and more accurate one:

WHAT IS

  1. It is a bit of a misnomer and a buzzword that the media uses to describe everything. However, it’s good to have this discussion so we can reach an agreement.
  2. The question is about Data science, so I will not talk about Data Scientists. Go to What is a data scientist? if you are interested.
  3. The biggest error that I found in most of the answers was some version of “Data Science is when you are dealing with Big Data, large amounts of data”. That is not true. Data Science can be applied to a data set with one thousand lines; there is no problem with this.
  4. If we are going to call it a “science”, we need to consider the definitions of science and the scientific method. By those definitions, Data Science is not only about practical or empirical methods; it needs scientific foundations.
  5. No one talked about the difference between Data and Information.
    1. Data is a raw, unorganized set of things that needs to be processed to have meaning.
    2. Information is data that has been processed, organized, structured or presented in a given context so as to make it useful.
    3. Based on this, we would have Data science and Information science. Right now, people tend to talk about Data science as if it included Information science.
  6. It has clearly been used in many fields for years:
    1. Statistics/Mathematics
    2. Business analytics
    3. Market intelligence
    4. Strategic Consulting
    5. Many others…
    6. The craziest part is that you see professionals in these areas updating their resumes with something like “I worked with Data Science…”
  7. The origin of data science, put simply: two sides that were not fully connected had to merge in the new fast-paced, technological world:
    1. Statistics/mathematics: formulate proper models to generate insights.
    2. Computer science: build the bridge between the models and the data so that results arrive in a feasible time.
  8. Topics/tools that a person needs to understand, or have some knowledge of, when working with Data Science:
    1. Linear algebra
    2. Non-linear systems
    3. Analytical geometry
    4. Optimization
    5. Calculus
    6. Statistics
    7. Programming languages (R, Python, SAS)
    8. Software: Excel, SPSS by IBM
    9. General platforms: Watson Analytics by IBM, Azure Machine Learning, Google Cloud machine learning
    10. Data visualization: Power BI, Tableau, R/Python using plotly/ggplot
    11. Machine Learning (supervised, unsupervised and reinforcement learning)
    12. Big Data
    13. Big Data Frameworks (Hadoop and Spark)
    14. Hardware (CPU, GPU, TPU, FPGA, ASIC)
  9. One picture is worth ten thousand words: Drew Conway’s Data Science Venn Diagram. Substantive expertise (or domain expertise) is the specific knowledge of the area in which you are applying Data Science. To learn more about the lack of substantive expertise in data science, see: What's Missing in Data Science Talks - As Risky As It Gets
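
To make the data versus information distinction in point 5 concrete, here is a minimal Python sketch. The sales records, the store names and the `summarize` helper are all invented for illustration; the only point is that the raw tuples carry little meaning until they are grouped and summarized in a context.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: unorganized (store, sale amount) records, invented for illustration.
raw_records = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 100.0),
    ("south", 90.0),
    ("north", 110.0),
]

def summarize(records):
    """Group raw records by store and return the average sale per store."""
    grouped = defaultdict(list)
    for store, amount in records:
        grouped[store].append(amount)
    return {store: mean(amounts) for store, amounts in grouped.items()}

# The processed, organized result is what point 5 calls information.
information = summarize(raw_records)
for store, average in sorted(information.items()):
    print(f"{store}: average sale {average:.2f}")
```

Note that the data set here has five lines, not millions, which also fits point 3: nothing about the processing step requires Big Data.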

WHAT IS NOT

  1. Machine Learning is not a branch of Data science. Machine Learning originated in Artificial Intelligence, and Data science only uses ML as a tool, because it produces amazing and autonomous results for specific tasks.
  2. It’s not the salvation of companies that never measured anything and now want to get insights from their data. “Garbage in, garbage out”: Data science will only be as good as the data generated in the coming years.
  3. It is not just presenting data in a few Excel charts without any insight about the data.
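
As a small illustration of “garbage in, garbage out” from point 2, the sketch below compares an average computed on unchecked data with one computed after basic cleaning. The readings, the `-999.0` placeholder code and the `clean` helper are assumptions made up for this example, not something taken from a real system.

```python
from statistics import mean

# Hypothetical sensor readings, invented for illustration.
# None marks a missing value; -999.0 is an assumed "no reading" placeholder code.
readings = [21.5, None, 22.0, -999.0, 21.0]

def clean(values, sentinel=-999.0):
    """Drop missing values and the placeholder code before any analysis."""
    return [v for v in values if v is not None and v != sentinel]

# Naive analysis: only skips None, so the placeholder is treated as a real measurement.
naive_average = mean(v for v in readings if v is not None)

# Same analysis on cleaned data.
cleaned_average = mean(clean(readings))

print(f"naive average:   {naive_average}")
print(f"cleaned average: {cleaned_average}")
```

The model or statistic is identical in both cases; only the quality of the input changes the answer.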

Always upvote answers that you find useful. Everyone can be wrong so be respectful and polite.

Quora User
As this is a very open-ended, generic question, I would like to give a very broad answer, taken from my blog: What is Data Science? by Pronojit Saha on Journey to planet Datum & Beyond

Data Science is the practice of:
  1. Asking questions (formulating hypotheses), answers to which solve known problems or unearth unknown solutions that in turn drive business value,
  2. Defining the data needed or working with an existing data set and employing tools (computer science based) to collect, store and explore such data generally in huge volume & variety (often more than 1 TB and 1000s of dimensions),
  3. Identifying the type of analysis to be done to get to the answers and performing such analysis by implementing various algorithms/tools (statistics based), often in a distributed and parallel architecture,
  4. Communicating the insights gathered from the analysis in the form of simple stories/visualizations/dashboards (the Data Product) that a non-data scientist can understand and build a conversation around. (Keep in mind that a product can also be a piece of code that is internal to a company and is used by various departments. The presentation, maintenance, scalability, etc. of the code are then the product features, something many organizations do not practice.)
  5. Building a higher level abstraction that does steps 2-3-4 in an autonomous way, analyzing & taking actions on new data as they are fed to the system.
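
The five steps above can be sketched end to end in a few lines of Python. Everything specific here is an assumption for illustration: the question, the invented numbers, the choice of Pearson correlation as the analysis, and the 0.7 cutoff for calling a relationship "strong". It is a toy, not a recommended methodology, and it does not handle constant inputs (which would divide by zero).

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Step 3: the analysis, here a Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def communicate(r, threshold=0.7):
    """Step 4: turn the statistic into a sentence a non-specialist can read.

    The 0.7 threshold is an arbitrary assumption for this sketch.
    """
    strength = "strong" if abs(r) >= threshold else "weak or moderate"
    return f"Ad spend and sales show a {strength} linear relationship (r = {r:.2f})."

def run_pipeline(xs, ys):
    """Step 5: one callable that repeats the analysis and reporting on new data."""
    return communicate(pearson(xs, ys))

# Step 1: the question (hypothetical): does advertising spend move with weekly sales?
# Step 2: the data. These numbers are invented for illustration only.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]

print(run_pipeline(ad_spend, sales))
```

In practice each step is far larger (collection and storage alone can be a project), but the shape is the same: question, data, analysis, communication, then automation.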
A data scientist performs research and analysis on data and helps companies improve their business by predicting growth, trends and business insights based on huge amounts of data.

Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations across an organization's leadership structure.

Successful big data scientists will be in high demand and will be able to earn very nice salaries. But in order to be successful, big data scientists need to have a wide range of skills that until now did not even fit into one department.

Learning how to become a data scientist can be quite costly, with an average cost of $9,600 (according to Harvard Extension School). But if you know which skills employers are looking for, you can find many free resources online. That is exactly what we did for you!

Below is the required skill set for becoming a data scientist, with the top 2-3 free resources to learn each skill online.

1. Python
Learn Python Programming From Scratch by Udemy
Learn to program in Python by CodeCademy
LearnPython.org interactive Python tutorial

2. Machine Learning
Machine learning online
Operational Intelligence and Machine Data with Splunk

3. R Language
R Basics – R Programming Language Introduction by Udemy
Introduction to R at DataCamp
Learn R at Code school

4. Big Data
Big Data University
Big Data and Hadoop Essentials by Udemy
Basic overview of Big Data Hadoop by Udemy

5. Statistics
Statistics One by Coursera
Statistics and Probability
Probability & Statistics

6. Data Mining
Data Mining and Web Scraping: How to Convert Sites into Data by Udemy
Data Mining by Coursera

7. SQL
Interactive Online SQL Training for Beginners
Sachin Quickly Learns (SQL) – Structured Query Language by Udemy
SQL Tutorial by w3schools

8. Java
Learn Java: The Java Programming Tutorial For Beginners by Udemy
Learn Java – Free Interactive Java Tutorial
Learn Java Programming From Scratch – Udemy

For more information: How to become data scientist for free and from scratch