The 80% data cleaning fallacy

December 12, 2023

It is common to hear that the real fight when developing a data product is data cleaning, because it takes about 80% of our time.

For algorithm development, that is true.

Nevertheless, about 50%-60% of the workload of a data product falls on the shoulders of the IT team. That means the 80% we describe to our bosses as the real struggle is just the tip of the iceberg: about a third (~35% at maximum) of the work of creating a data product!
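To make the arithmetic concrete, here is a minimal sketch in Python. It assumes that everything not handled by the IT team counts as data work, and that cleaning takes 80% of that data work. The function name and this split are illustrative assumptions, not figures measured in this post.

```python
def cleaning_share_of_product(it_share, cleaning_share_of_data_work=0.80):
    """Share of the whole data product spent on data cleaning.

    Assumption: the workload is split only between the IT team and
    the data team, so the data team's share is 1 - it_share.
    """
    data_share = 1.0 - it_share
    return cleaning_share_of_data_work * data_share


# IT carries 50%-60% of the workload, per the estimate above.
low = cleaning_share_of_product(0.60)   # IT takes 60% of the work
high = cleaning_share_of_product(0.50)  # IT takes 50% of the work

print(f"Cleaning is roughly {low:.0%} to {high:.0%} of the whole product")
```

Under these assumptions the result lands between 32% and 40%, in the neighborhood of the rough one-third figure quoted above. If other teams also take a share of the workload, the number drops further.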

When the data part of the data product is finished, the worst part is still pending: building the back end, the front end, and the systems that let it run properly in an automated way. And cross your fingers that everyone is aligned; otherwise your product vision will be lost in time, like tears in rain (yes, quoting Blade Runner, 1982).
