Many questions arise at the beginning of each data science project. Do I need to train a machine learning model, or do ETL operations suffice? Do I need a labelled data set? What if I do not have one? What should I do if the classes are unevenly distributed? Or if training examples for one of the classes are completely missing? Is the procedure different for structured and unstructured data? When should I use time series analysis? Do I really need real-time execution in deployment? And there are probably many more questions like these.
While the general development of a data science project is relatively standard, following for example the CRISP-DM cycle, each project usually needs some customization: that special ingredient to adapt to the particular dataset, goals, constraints, domain knowledge, or even budget.
These are the slides for the webinar in which Rosaria Silipo, author of "Practicing Data Science", answered some of the common questions we ask at the beginning of each data science project.
The webinar is available here: https://www.youtube.com/watch?v=YE02NMRkfEc