Edernal Blog

What is Data Science?

Put simply, as I understand it, data science is basically: collecting data -> modeling -> forecasting a conclusion.

That is pretty much it. The whole pipeline, from collecting data to forecasting, is long and hard, but it is an interesting one for sure.

Going a little Deeper

Here is what the average pipeline / workflow looks like.

Business Understanding

We first need to understand the business domain and what the business requires us to do. If we are not clear about what the business does, or what it expects from which type of data, we won’t be able to move forward or draw meaningful conclusions.

Data Collection

We collect data as per the requirement. This data could come from databases, PDF files, plain text files, or anything else of value.
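
As a rough sketch of what collection can look like, the snippet below pulls rows from two of the source types mentioned above: a database and a plain text (CSV) file. The table name `users`, the columns `user_id` and `balance`, and all the sample values are made up for illustration. An in-memory SQLite database and an in-memory string stand in for a real database and a real file.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a database (in-memory SQLite stands in for a real one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, balance REAL)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 250.0), (2, 90.5)])
query = "SELECT user_id, balance FROM users ORDER BY user_id"
db_rows = [
    {"user_id": user_id, "balance": balance}
    for user_id, balance in conn.execute(query)
]
conn.close()

# Hypothetical source 2: a plain text file in CSV format.
# The last row has no balance, so it is stored as None (a missing value).
csv_text = "user_id,balance\n3,1200.0\n4,\n"
file_rows = [
    {
        "user_id": int(row["user_id"]),
        "balance": float(row["balance"]) if row["balance"] else None,
    }
    for row in csv.DictReader(io.StringIO(csv_text))
]

# Combine everything into one raw data set for the later steps.
raw_data = db_rows + file_rows
print(len(raw_data), "rows collected")
```

Both sources end up in the same shape (a list of dictionaries), which makes the later steps easier. That shape is a choice for this example, not a requirement.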

Data Understanding

Once we have collected the data, we need to understand it: what the features (columns) are, where values are missing, and how important each feature is or what can be inferred from it. (A feature might be a column in a database, like user ID or user balance.)
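
To make this concrete, here is a small sketch that summarizes a data set: which features exist, what type their values have, and how many values are missing in each. The rows and the column names `user_id`, `balance`, and `country` are invented for the example, and `None` is assumed to mark a missing value.

```python
# Hypothetical raw rows; None marks a missing value.
rows = [
    {"user_id": 1, "balance": 250.0, "country": "IN"},
    {"user_id": 2, "balance": None, "country": "US"},
    {"user_id": 3, "balance": 1200.0, "country": None},
    {"user_id": 4, "balance": None, "country": "IN"},
]


def summarize(rows):
    """Return {feature: {"missing": count, "types": sorted type names}}."""
    summary = {}
    for row in rows:
        for feature, value in row.items():
            info = summary.setdefault(feature, {"missing": 0, "types": set()})
            if value is None:
                info["missing"] += 1
            else:
                info["types"].add(type(value).__name__)
    # Convert the sets to sorted lists so the result is easy to print.
    return {
        feature: {"missing": info["missing"], "types": sorted(info["types"])}
        for feature, info in summary.items()
    }


summary = summarize(rows)
for feature, info in summary.items():
    print(feature, info)
```

Here `balance` turns out to be missing in half of the rows, which is exactly the kind of thing we want to know before modeling.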

EDA - Exploratory Data Analysis / Data Processing

Turning raw data -> quality data. Most of the time is spent on this step, since getting quality data is the most important aspect of data science.
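
A minimal sketch of one such cleaning pass is shown below. It assumes two common problems, exact duplicate rows and missing values, and fixes them by dropping the duplicates and filling missing balances with the median. The data is made up, and median filling is just one possible choice; a real project would pick a strategy based on the data.

```python
from statistics import median

# Hypothetical raw rows: one duplicate and one missing balance.
raw = [
    {"user_id": 1, "balance": 100.0},
    {"user_id": 2, "balance": None},
    {"user_id": 2, "balance": None},  # exact duplicate of the previous row
    {"user_id": 3, "balance": 300.0},
    {"user_id": 4, "balance": 500.0},
]


def clean(rows):
    # 1. Drop exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    # 2. Fill missing balances with the median of the known balances.
    known = [r["balance"] for r in unique if r["balance"] is not None]
    fill = median(known)
    return [
        {**r, "balance": fill if r["balance"] is None else r["balance"]}
        for r in unique
    ]


quality = clean(raw)
for row in quality:
    print(row)
```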

Data Modeling

Applying an algorithm to the data we collected to get the desired output.
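
As an illustration, the sketch below uses one of the simplest algorithms, a straight line fitted by ordinary least squares, and then uses the fitted line to forecast a new value. The data (months as a customer versus balance) is invented, and in practice a library such as scikit-learn would usually do the fitting; plain Python is used here only to show what happens inside.

```python
# Hypothetical training data: x = months as a customer, y = account balance.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [110.0, 205.0, 290.0, 410.0, 495.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (x with y) and of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)


def predict(x):
    """Forecast y for a new x using the fitted line."""
    return slope * x + intercept


print(f"slope={slope}, intercept={intercept}")
print(f"forecast for month 6: {predict(6.0):.1f}")
```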

Data Evaluation

In many companies this is the last step: evaluating whether the output of the model is satisfactory or not. If it is not satisfactory, we have to go back to EDA -> Data Modeling and loop until we have something satisfactory.
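
The loop described above can be sketched roughly as follows. The example scores each candidate model on data it has not seen, using mean absolute error, and stops at the first one that is good enough. The test data, the two candidate models, and the threshold `MAX_ACCEPTABLE_ERROR` are all assumptions made for the example; in reality the business decides what counts as satisfactory.

```python
# Hypothetical held-out data the models have not seen during training.
test_xs = [6.0, 7.0, 8.0]
test_ys = [600.0, 690.0, 800.0]


def mean_absolute_error(predict, xs, ys):
    """Average size of the prediction error, in the same units as y."""
    return sum(abs(predict(x) - y) for x, y in zip(xs, ys)) / len(xs)


# Two hypothetical candidate models, from crude to better.
# The numbers inside them are made up for illustration.
candidates = {
    "constant": lambda x: 300.0,
    "linear": lambda x: 97.5 * x + 9.5,
}

MAX_ACCEPTABLE_ERROR = 20.0  # assumed threshold

chosen = None
for name, model in candidates.items():
    error = mean_absolute_error(model, test_xs, test_ys)
    print(f"{name}: mean absolute error = {error:.1f}")
    if error <= MAX_ACCEPTABLE_ERROR:
        chosen = name
        break  # satisfactory, so stop looping

print("chosen model:", chosen)
```

If no candidate passes, `chosen` stays `None`, which is the signal to go back to the earlier steps and try again.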

Deployment

I am not sure about this step: do we deploy the model, like an OpenAI model, so that others can use it? To my understanding, this should require frontend development, whether it be web dev or a native app.
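
One common approach, sketched here under assumptions, is to put the model behind a small service that accepts a request and returns a prediction, so that a web or native frontend can call it. The function below shows only the request-handling part. The JSON field names `months` and `forecast`, and the model coefficients, are hypothetical; a real deployment would plug this into a web framework.

```python
import json


def predict(months: float) -> float:
    # Hypothetical trained model: the coefficients are made up for illustration.
    return 97.5 * months + 9.5


def handle_request(body: str) -> str:
    """Turn a JSON request like {"months": 6} into a JSON response."""
    try:
        months = float(json.loads(body)["months"])
    except (ValueError, KeyError, TypeError):
        # Bad JSON, a missing field, or a value that is not a number.
        return json.dumps({"error": 'expected JSON like {"months": 6}'})
    return json.dumps({"forecast": predict(months)})


print(handle_request('{"months": 6}'))
print(handle_request("not json"))
```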

Things to Learn

Keep in mind that there are many roles in this field, so what you need to learn may differ from role to role.

  • Maths

    • Linear Algebra
    • Calculus
    • Statistics
    • Probability
  • Programming

    • Especially Python. We can use any language, but Python’s community support and library ecosystem are massive, and it is easy enough to get work done quickly.
  • ML (Machine Learning)

    • Mainly deals with numeric data sets
  • DL (Deep Learning)

    • Commonly used for image-based data sets, among others
  • NLP (Natural Language Processing)

    • Deals with textual data sets
  • Generative AI