Martin Shepperd is Professor of Software Technology and Modelling in the Department of Computer Science at Brunel University London. I am also a Fellow of the British
My research publications can be viewed (and many downloaded) via Google Scholar or ResearchGate.
I am a member of the Brunel Software Engineering Laboratory (BSEL). My research interests include empirical software engineering and machine learning. In 2014, together with Tracy Hall and David Bowes, I published a paper entitled "Researcher bias: The use of machine learning in software defect prediction", which describes the results of a meta-analysis of computational experiments in software engineering showing that research group is a far more important explanatory variable than the choice of algorithm under investigation. This triggered a response
from Tantithamthavorn, McIntosh, Hassan and Matsumoto, "Comments on 'Researcher bias: The use of machine learning in software defect prediction'" in 2016, in which they suggest the effect is less strong than we propose due to problems of collinearity. We have responded (in a recently accepted rejoinder for TSE) indicating that, whilst we appreciate their interest and scientific discourse, we disagree with their arguments because (i) they use a small subset of our data, and (ii) collinearity for categorical variables with a sparse design matrix will almost invariably mean some levels are a linear combination of others. In such cases the corresponding coefficients are not estimated, but the remaining estimates are unaffected, unlike the case of collinearity for continuous variables.
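To see the aliasing point concretely, here is a minimal sketch with made-up data (not our actual meta-analysis matrix). Each categorical factor is dummy coded, and because the full set of dummies for a factor sums to the intercept column, the design matrix is rank deficient: some columns are exact linear combinations of others, so their coefficients cannot be estimated.

```python
# Sketch with hypothetical data: a dummy-coded design matrix where some
# columns are linear combinations of others, detected via an exact rank
# computation (Gaussian elimination over rational arithmetic).
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix of integers/rationals, computed exactly."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        # Find a pivot row at or below row r for this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Columns: intercept, research-group dummies G1..G3, metric dummies M1..M2.
# Each row is one study; the data are sparse (few studies per combination).
X = [
    # 1  G1 G2 G3 M1 M2
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
]
# G1+G2+G3 and M1+M2 each reproduce the intercept column, so two of the
# six columns are redundant: the rank is 4, not 6.
print(rank(X), "of", len(X[0]))  # -> 4 of 6
```

The aliased levels simply drop out of the fit; the coefficients for the remaining, independent columns are unchanged.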
Recently I have been working with my former PhD student Dr Boyce Sigweni on realistic ways to validate software project prediction systems. Our initial ideas were published at EASE 2016, where we were very pleased to be awarded the prize for best short paper. Our point is that cross-validation techniques which fail to account for time are biased, in that they consistently underestimate the true variance and measures of location of the model errors. Our approach is called Grow-One-At-a-Time (GOAT)!
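The idea can be sketched as follows (hypothetical data and a deliberately naive mean-based predictor, purely for illustration): projects are ordered by completion date, each project is predicted using only projects finished before it, and so the training set grows one project at a time.

```python
# Sketch of time-aware "grow one at a time" validation on hypothetical
# (completion_date, effort) pairs. Only earlier projects may be trained on.

def goat_errors(projects):
    """projects: list of (date, effort) tuples.
    Predicts each project's effort as the mean effort of all projects
    completed earlier, growing the training set one project at a time,
    and returns the absolute errors."""
    projects = sorted(projects)                # order by completion date
    errors = []
    for i in range(1, len(projects)):          # first project has no history
        history = [effort for _, effort in projects[:i]]
        prediction = sum(history) / len(history)
        errors.append(abs(projects[i][1] - prediction))
    return errors

data = [(2001, 10.0), (2002, 12.0), (2003, 20.0), (2004, 24.0)]
print(goat_errors(data))  # -> [2.0, 9.0, 10.0]
```

Unlike standard leave-one-out cross-validation, no future project ever leaks into a training set, which mirrors how a prediction system would actually be used.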
Our work on replication and blind analysis with Dr Davide Fucci (Oulu) as the lead author recently received the Best Full Paper Award at ESEM 2016.
D. Fucci, G. Scanniello, S. Romano, M. Shepperd, B. Sigweni, F. Uyaguari, et al., "An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach," 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Ciudad Real, Spain, 2016.

Recently I have completed an analysis of imbalanced learners for dealing with highly imbalanced software defect data sets. This work was with Prof Qinbao Song and Yuchen Guo (Xi'an Jiaotong University). The paper "A Comprehensive Investigation of the Role of Imbalanced Learning for Software Defect Prediction" is under review; however, a draft is available. Comments welcome.
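To illustrate why imbalance matters at all (with made-up numbers, not results from the paper): when only a few percent of modules are defective, a trivial learner that always predicts "clean" looks highly accurate yet finds no defects, which is why rebalancing techniques such as random oversampling of the minority class are studied.

```python
# Minimal illustration (hypothetical data) of the class-imbalance problem
# in defect prediction, and of random oversampling as one rebalancing step.
import random

labels = [1] * 5 + [0] * 95          # 5 defective modules, 95 clean ones

# A learner that always predicts "clean" scores 95% accuracy...
accuracy_always_clean = labels.count(0) / len(labels)
print(accuracy_always_clean)          # -> 0.95
# ...but its recall on the defective class is zero.

# Random oversampling: duplicate minority-class examples until balanced.
minority = [label for label in labels if label == 1]
extra = random.choices(minority, k=len(labels) - 2 * len(minority))
oversampled = labels + extra
print(oversampled.count(1), oversampled.count(0))  # -> 95 95
```

Accuracy alone is therefore a misleading measure on such data; this is only a toy sketch of one rebalancing strategy, not the comparative analysis the paper reports.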
Click here for details of 2017-18 FYPs.
For examples of problematic empirical research we can study the careers of Dr Johnny Researcher and his colleague Prof Suzie Important-Person, including two of their recent publications. The first shows some nice data trawling and statistical mumbo jumbo, whilst the second is a classic work of flawed and under-powered experimental design.

Researcher, J. and Important-Person, S., "An Empirical Study of How X47 is a Strong Determinant of Y", IEEE Transactions on Sophistry, Vol. 11, No. 4, December 2015.
Email: [first name] DOT [surname] @brunel.ac.uk