“Poor data quality is enemy number one to the widespread, profitable use of machine learning.”
Data is key to the success of any machine learning process. Bad data does its damage twice: first in the historical data used to train the predictive model, and second in the new data that model uses to make future decisions. A strong predictive model needs data that meets two criteria: the data must be accurate, and it must be the right data. The right data is a large volume of unbiased data that covers the entire range of inputs for which the model is being developed. Most data quality work focuses on one criterion or the other, but for machine learning you must work on both simultaneously.
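To make the two criteria concrete, here is a minimal sketch of checking both on the same column of data. Everything in it is an assumption for illustration: the function names `quality_report` and `coverage_report`, the sample `ages` column, and the valid range of 18 to 98 are hypothetical and do not come from any particular library or dataset.

```python
from typing import List, Optional, Sequence


def quality_report(values: Sequence[Optional[float]], low: float, high: float) -> dict:
    """Criterion 1 (is the data accurate?): count missing and out-of-range values."""
    missing = sum(1 for v in values if v is None)
    out_of_range = sum(
        1 for v in values if v is not None and not (low <= v <= high)
    )
    return {"missing": missing, "out_of_range": out_of_range}


def coverage_report(
    values: Sequence[Optional[float]], low: float, high: float, bins: int = 5
) -> List[int]:
    """Criterion 2 (is it the right data?): count valid values per equal-width bin.

    Empty or sparse bins point to parts of the input range the model
    would rarely or never see during training.
    """
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        if v is None or not (low <= v <= high):
            continue  # invalid values are reported by quality_report instead
        idx = min(int((v - low) / width), bins - 1)  # put v == high in the last bin
        counts[idx] += 1
    return counts


# Hypothetical sample: customer ages, assumed valid between 18 and 98.
ages = [23, 25, None, 31, 29, 27, 140, 24, 35, 38]

quality = quality_report(ages, low=18, high=98)
coverage = coverage_report(ages, low=18, high=98, bins=4)

print(quality)   # {'missing': 1, 'out_of_range': 1}
print(coverage)  # [7, 1, 0, 0]
```

In this made-up sample the first check flags two bad records, while the second shows that even the clean records fall almost entirely in the lowest quarter of the range. Fixing only the bad records would still leave the model with no examples for older customers, which is why the two checks are run together.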