10 Reasons Data Science Projects Fail … and How You Can Succeed

Now is the time to invest in data science; in fact, most companies are already using big data analytics. There are great visions of data science addressing risk, churn and customer satisfaction. Yet businesses face a well-known failure rate for data science projects. Failure here means a project that isn’t completed within budget or on schedule and fails to deliver what it set out to. Why is that? Here is a round-up of reasons from industry experts. Read on for the pitfalls … and a way to avoid them.

Stuck on you

A root cause of failure is the difficulty of grafting modern big data practices onto existing infrastructure and into company cultures that are ill-prepared to embrace big data, says Matt Asay.


Let the force be with you

Ronda Swaney finds that a sea change in company mindset is the only way to transition to a data-driven business. Trusting data to guide decision-making is an entirely new way to think.

Instead, business leaders surveyed trust their gut and what experience has taught them, not their data.


Can’t touch that

Out-of-reach data impedes industries trying to capture value from data, according to a McKinsey Global Institute report. This data is siloed in legacy systems and within different branches of a company or government agency.


Ain’t nobody got time for that

“Businesses will need one million data scientists by 2018,” declares KDnuggets, referring to a shortage of US workers with analytical skills and data management and interpretation capabilities. “This is America’s Hottest Job,” brags a Bloomberg article on a recent surge in job listings and salaries for data scientists, although the URL brands it a “demand for data geeks.”


Off the mark

Mired in muddy data, spending 80% of their time finding, cleaning and reorganizing data, data scientists compromise to meet deadlines and settle for “good enough,” rather than optimal results. Armand Ruiz: “Hasty decisions during model development can lead to widely different outputs.”


Missed the bus

The flip side of investing time massaging data is that key elements of the pipeline are often given short shrift. Ganes Kesari says that data wrangling, exploratory analysis, imputation and engineering are all steps that often get skipped, to the detriment of machine learning projects. In a project conducted by the Cork University Business School, executives found that only 3% of their enterprise data met basic quality standards, and the cost of working with flawed data is staggering.


One by one

Data scientists usually focus on one model at a time: “If something goes wrong, they are forced to start all over again,” rues Ruiz.


Out of touch

When data scientists strive for perfection, they create what Daniel Carroll calls a black hole for investment: a “pristine model” with “nice scores and beautiful underlying data.” In the process they may miss the obvious, discovering too late that the model neither addresses nor solves the customer’s actual problem.


Hold my hand

When the data is passed over to the data science team, it’s often a one-way hand-off. Without continuous iteration and dialogue with business professionals, machine learning models remain in isolation. In fact, data science projects are doomed from the start when they are treated as a technical, not business, initiative and expected to pinpoint earth-shattering insights on their own.


Getting there

It’s one thing to pull off a data science project. It’s another to bring it to fruition as a contributor to the bottom line. Nick Heudecker: “Organizations can succeed but they need a plan to get to production. Most don’t plan and treat big data as technology retail therapy.”

The solutions are straightforward:

Gather the best of the best: Draw on the best complement of models to form an ensemble. Apply the same principle to people: Ensure that the data science team has the broadest coverage you can muster, suggests Eric Luellen.
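
To make the ensemble idea concrete, here is a minimal sketch of one simple way to combine models: a majority vote. The three models and their churn predictions are made up for illustration, and real ensembles often use weighted or stacked schemes instead.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label that most models predicted for one record."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from three different models for four customers.
model_a = ["churn", "stay", "stay", "churn"]
model_b = ["churn", "churn", "stay", "stay"]
model_c = ["stay", "churn", "stay", "churn"]

# Combine the models record by record.
ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(ensemble)
```

With an odd number of models and two possible labels there are no ties, which keeps the sketch simple. Each model is wrong in a different place, and the vote smooths that out.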

Pay attention to the entire pipeline. Bad data adversely affects work quality and insights into how to improve. Steps worth attention include data cleaning at multiple steps along the pipeline, imputation, balancing, feature engineering, feature stacking, embedding and selecting the right features.
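
As an illustration of one of these steps, imputation, the sketch below fills missing values with the median of the values that were observed. The column name and numbers are invented, and median filling is only one of several possible strategies.

```python
from statistics import median

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing readings.
monthly_spend = [42.0, None, 38.5, 51.0, None, 40.0]
cleaned = impute_median(monthly_spend)
print(cleaned)
```

The median is used here rather than the mean because it is less sensitive to extreme values, but the right choice depends on the data and on what the model will do with it.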

Take the load off. If infrastructure maintenance is not your priority, let the provisioning, load balancing, high availability and network management be managed securely by a machine learning provider that can.

Look out for the unusual. When data scientists are encouraged to find what’s out of place in the data, what surprises them, then they might just light upon an anomaly worth following up on.
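
A quick first pass at spotting what is out of place can be automated. The sketch below flags values that sit far from the median, using the median absolute deviation; the order counts are invented, and the 0.6745 scaling and 3.5 cut-off are a common rule of thumb rather than a fixed standard.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the score is undefined.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]

# Hypothetical daily order counts with one surprising day.
daily_orders = [102, 98, 105, 99, 101, 310, 97]
flagged = flag_anomalies(daily_orders)
print(flagged)
```

A flagged value is a prompt for a conversation, not a verdict: it may be a data error, or it may be the anomaly worth following up on.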

Keep your business goals at the forefront of all of your machine learning efforts. Build models, put them in front of business users and keep iterating. Involve business users in defining the problem and reviewing the analysis. Results are best when the business side can use them to improve performance.
