Day 1: Data Science Cohort
What is Data Science?
Simply put, as I understand it, data science is basically: collecting data -> modeling -> forecasting conclusions.
That is pretty much it. The whole pipeline from collecting data to forecasting is long and hard, but it is an interesting one for sure.
Going a Little Deeper
Here is what the average pipeline / workflow looks like.
Business Understanding
We first need to understand the business domain and what the business requires us to do. If we are not clear about what the business does, or what it expects from which type of data, we won't be able to move forward or predict meaningful conclusions.
Data Collection
We collect data as per the requirements. This data could come from databases, PDF files, plain text files, or anything else of value.
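As a small illustration of the collection step, here is a minimal sketch of loading a CSV export into Python with the standard `csv` module. The file contents, the column names (`user_id`, `balance`, `age`) and the helper `load_rows` are all hypothetical; `io.StringIO` stands in for a real file so the example runs on its own.

```python
import csv
import io

# Stand-in for a real export; in practice this would be open("users.csv", newline="").
# The columns and values here are made up for illustration.
RAW_CSV = """user_id,balance,age
1,250.0,34
2,,29
3,1200.5,
4,80.0,41
"""

def load_rows(source):
    """Read CSV text into a list of dicts, one per row; empty cells stay as ''."""
    reader = csv.DictReader(source)
    return [dict(row) for row in reader]

rows = load_rows(io.StringIO(RAW_CSV))
print(len(rows), rows[0])
```

Note that every value arrives as a string and missing cells come through as `''`, which is exactly what the next steps have to deal with.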
Data Understanding
Once we have collected the data, we need to understand its features, columns, missing values, and the importance of each feature. (A feature might be a column in a database, like user ID or user balance.)
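A first pass at understanding the data can be as simple as profiling each column. The sketch below assumes rows shaped like a freshly loaded CSV (string values, `""` for missing) and reports, per feature, how many values are missing, how many distinct values there are, and whether the values look numeric. The data and the `profile` helper are hypothetical; libraries such as pandas offer richer versions of this.

```python
# Hypothetical rows, as they might look right after collection
# (all values are strings; "" marks a missing value).
rows = [
    {"user_id": "1", "balance": "250.0", "age": "34"},
    {"user_id": "2", "balance": "", "age": "29"},
    {"user_id": "3", "balance": "1200.5", "age": ""},
    {"user_id": "4", "balance": "80.0", "age": "41"},
]

def is_number(value):
    """True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def profile(rows):
    """Summarise each column (feature): missing count, distinct values, numeric or not."""
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        present = [v for v in values if v != ""]
        summary[column] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "numeric": all(is_number(v) for v in present),
        }
    return summary

for column, stats in profile(rows).items():
    print(column, stats)
```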
EDA - Exploratory Data Analysis / Data Processing
This step turns raw data into quality data. Most of the time is spent here, since getting quality data is the most important aspect of data science.
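To make "raw data -> quality data" concrete, here is a minimal cleaning sketch: it drops exact duplicate rows, converts a column to numbers, and fills missing values with the median of the observed ones. The data and the `clean` helper are hypothetical, and median imputation is just one common choice; dropping incomplete rows or using the mean are equally valid depending on the problem.

```python
from statistics import median

# Hypothetical raw rows with a missing value and a duplicated record.
raw = [
    {"user_id": "1", "balance": "250.0"},
    {"user_id": "2", "balance": ""},
    {"user_id": "3", "balance": "1200.5"},
    {"user_id": "3", "balance": "1200.5"},  # exact duplicate row
    {"user_id": "4", "balance": "80.0"},
]

def clean(rows, column):
    """Remove duplicate rows, then convert `column` to float, filling gaps with the median."""
    # 1. Drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    # 2. Median of the values that are actually present.
    fill = median(float(r[column]) for r in unique if r[column] != "")
    # 3. Build cleaned copies with numeric values in `column`.
    return [
        {**row, column: float(row[column]) if row[column] != "" else fill}
        for row in unique
    ]

cleaned = clean(raw, "balance")
print(cleaned)
```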
Data Modeling
Applying algorithms to the data we collected to get the desired output.
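One of the simplest modeling algorithms is fitting a straight line with ordinary least squares: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. A minimal sketch, assuming already-clean numeric pairs (the numbers are made up and lie exactly on y = 2x + 1):

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Forecast y for a new x using the fitted line."""
    slope, intercept = model
    return slope * x + intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
model = fit_line(xs, ys)
print(model)              # (2.0, 1.0)
print(predict(model, 6))  # 13.0
```

Real projects usually reach for a library (for example scikit-learn) rather than hand-rolling this, but the idea of "learn parameters from data, then forecast" is the same.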
Data Evaluation
In many companies this is the last step: evaluating whether the model's output is satisfactory. If it is not, we have to go back to EDA -> Data Modeling and loop until we have something satisfactory.
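The evaluate-and-loop idea can be sketched as: measure each candidate model's error on data it was not trained on, and stop at the first one that clears an agreed bar. This is a simplified, hypothetical version; the data, the 0.5 threshold and the two candidates (a "predict the average" baseline and a least-squares line) are assumptions, and a real loop would also revisit EDA rather than only swapping models. Mean squared error is used as the metric.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between true and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def fit_mean(xs, ys):
    """Baseline: always predict the training average."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# Hypothetical train/test split.
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
test_x, test_y = [5, 6], [11, 13]

THRESHOLD = 0.5  # hypothetical "satisfactory" bar agreed with the business
CANDIDATES = [("mean baseline", fit_mean), ("linear", fit_line)]

def evaluate_until_satisfactory(candidates):
    """Try candidates in order; return the first (name, error) under THRESHOLD."""
    for name, fit in candidates:
        model = fit(train_x, train_y)
        error = mean_squared_error(test_y, [model(x) for x in test_x])
        print(f"{name}: MSE={error:.2f}")
        if error <= THRESHOLD:
            return name, error
    return None, None  # nothing good enough: back to EDA / modeling

print(evaluate_until_satisfactory(CANDIDATES))
```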
Deployment
I'm not sure yet whether we deploy the model (like an OpenAI model) so that others can use it. As I understand it, this would require frontend development, whether that is web development or a native app.
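One way a model can be made usable by others is to wrap it in a small HTTP API that a web or native frontend (or any other program) can call. The sketch below uses only Python's standard library (`http.server`, `json`); the hard-coded line parameters, port 8000 and the `handle_prediction` / `serve` names are hypothetical, and a production deployment would normally use a proper web framework or a managed serving platform.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical trained parameters, e.g. saved after the modeling step.
MODEL = {"slope": 2.0, "intercept": 1.0}

def handle_prediction(body: bytes) -> bytes:
    """Turn a JSON request like {"x": 6} into a JSON response like {"prediction": 13.0}."""
    payload = json.loads(body)
    prediction = MODEL["slope"] * float(payload["x"]) + MODEL["intercept"]
    return json.dumps({"prediction": prediction}).encode("utf-8")

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body, run the model, and send back JSON.
        length = int(self.headers.get("Content-Length", 0))
        response = handle_prediction(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)

def serve(port=8000):
    """Block and serve predictions on localhost until interrupted."""
    HTTPServer(("localhost", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server; a frontend could then POST `{"x": 6}` to `http://localhost:8000` and receive `{"prediction": 13.0}` back.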
Things to Learn
Keep in mind that there are many roles in this field, so what you need to learn may differ from role to role.
- Maths
  - Linear Algebra
  - Calculus
  - Statistics
  - Probability
- Programming
  - Especially Python: even though we can use any language, Python's community support and library ecosystem are massive, and it is easy enough to get work done fast.
- ML (Machine Learning)
  - Mainly deals with numeric (tabular) data sets
- DL (Deep Learning)
  - Commonly used with image-based data sets (and also with text and audio)
- NLP (Natural Language Processing)
  - Deals with textual data sets
- Generative AI
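The ML, DL and NLP items above are split by the kind of data they usually work with, but in every case the data has to become numbers before a model can use it. A rough sketch of how a tabular row, a tiny grayscale image and a sentence might each be turned into a numeric vector; all values are made up, and bag-of-words is only one simple text encoding (modern NLP and deep learning models typically use learned embeddings instead).

```python
# Tabular (classic ML): a row is already mostly numeric features.
row = {"age": 34, "balance": 250.0}
tabular_vector = [float(row["age"]), float(row["balance"])]

# Image (DL): a 3x3 grayscale image is a grid of pixel intensities (0-255);
# flattening it and scaling to [0, 1] gives a vector of 9 numbers.
image = [
    [0, 255, 0],
    [255, 255, 255],
    [0, 255, 0],
]
image_vector = [pixel / 255 for image_row in image for pixel in image_row]

# Text (NLP): a bag-of-words vector counts how often each vocabulary word appears.
def bag_of_words(text, vocabulary):
    words = text.lower().split()
    return [words.count(word) for word in vocabulary]

vocabulary = ["data", "science", "is", "fun"]
text_vector = bag_of_words("Data science is fun and data is everywhere", vocabulary)

print(tabular_vector)  # [34.0, 250.0]
print(image_vector)
print(text_vector)     # [2, 1, 2, 1]
```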