Evaluating data quality issues from an industrial data set

Gernot Liebchen and Bheki Twala (Brunel, UK), Mark Stephens (EDS, UK)

OBJECTIVE – To compare three noise handling methods.

METHOD – A large software management dataset will be cleared of cases containing missing values and then subjected to three noise handling techniques: polishing, noise elimination and robust algorithms. Each technique will in turn produce a list of the cases it singles out as noisy, and these lists will be examined by a metrics expert familiar with the problem domain.
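
The first two steps of this pipeline (dropping cases with missing values, then singling out suspect cases) can be illustrated with a minimal sketch. It is not the procedure used in the study: the abstract does not name the underlying learner, so a leave-one-out nearest-neighbour vote stands in for the noise elimination filter here, and the function names `drop_missing` and `flag_noisy`, the parameter `k` and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def drop_missing(cases):
    """Keep only cases whose features and label contain no None."""
    return [(x, y) for x, y in cases if y is not None and None not in x]


def flag_noisy(cases, k=3):
    """Return indices of cases whose label disagrees with the majority
    label of their k nearest neighbours (leave-one-out vote).

    Assumption: this neighbour vote is a stand-in for whatever
    classifier a real noise elimination filter would use.
    """
    flagged = []
    for i, (x, y) in enumerate(cases):
        # Distance from case i to every other case, paired with its label.
        others = [(dist(x, xj), yj)
                  for j, (xj, yj) in enumerate(cases) if j != i]
        others.sort(key=lambda pair: pair[0])
        votes = Counter(label for _, label in others[:k])
        majority = votes.most_common(1)[0][0]
        if majority != y:
            flagged.append(i)
    return flagged


# Hypothetical toy data: (feature vector, class label).
raw_cases = [
    ((1.0, 1.1), "low"),
    ((0.9, 1.0), "low"),
    ((1.1, 0.9), "low"),
    ((1.0, 1.0), "high"),   # label conflicts with its neighbours
    ((5.0, 5.1), "high"),
    ((5.1, 4.9), "high"),
    ((4.9, 5.0), "high"),
    ((None, 5.0), "high"),  # missing value, removed before filtering
]

clean = drop_missing(raw_cases)
noisy_indices = flag_noisy(clean, k=3)
print(noisy_indices)
```

The returned indices play the role of the "list of singled out noisy cases" that would be handed to the domain expert for review; a filter of this kind only proposes candidates and does not by itself establish that a case is noisy.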