Applying Wave Processing Techniques to Clustering of Gene Expressions
Paul D. O’Neill1, George D. Magoulas2, Xiaohui Liu1
1 School of Information Systems, Computing and Maths
Brunel University, Uxbridge,
Middlesex, UB8 3PH, UK
2 School of Computer Science and Information Systems
Birkbeck College, University of London
Malet Street, London WC1E 7HX, UK
ABSTRACT
This paper examines the current process of clustering gene expression time series data and proposes a novel application of filtering techniques to reduce the noise commonly found in this type of data. At present, most noise reduction performed on gene expression data is restricted to individual points of expression, for example the removal of background noise. This paper proposes that the multiple samples of each gene can be treated as a waveform, so that standard wave smoothing techniques such as a moving average or Fourier transform filtering can improve the quality of the data. This hypothesis has been tested on three gene expression datasets: a synthetic dataset, a Human Herpesvirus 8 experiment and a Yeast cell cycle experiment. The results show that these techniques generally improve the clustering of a dataset, as illustrated by contrasting the quality of the clusters generated by k-means, partitioning around medoids and hierarchical clustering algorithms. The improvements are demonstrated using measures including homogeneity, separation and a weighted-kappa based metric. The clustering results are also examined biologically, by contrasting the effect that filtering has on common proximity metrics used by clustering algorithms and verifying the outcome against domain knowledge.
KEYWORDS: Gene Expression, Clustering, Digital Filtering, Pre-Processing, Time Series.
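As a rough illustration of the idea of treating a gene's samples as a waveform, the two smoothing techniques named in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the window length `window`, the number of retained frequency components `keep`, and the edge-padding strategy are all assumptions made for the example.

```python
import numpy as np

def moving_average(profile, window=3):
    """Smooth one expression profile with a centred moving average.

    `window` is a hypothetical parameter; it must be odd so that the
    output has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    profile = np.asarray(profile, dtype=float)
    pad = window // 2
    # Repeat the edge values so the first and last points can be averaged.
    padded = np.pad(profile, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def fourier_lowpass(profile, keep=3):
    """Smooth one expression profile by discarding high frequencies.

    Only the first `keep` Fourier coefficients (including the constant
    term) are retained; `keep` is a hypothetical parameter.
    """
    profile = np.asarray(profile, dtype=float)
    spectrum = np.fft.rfft(profile)
    spectrum[keep:] = 0.0  # zero every component above the cut-off
    return np.fft.irfft(spectrum, n=profile.size)

# Usage: a synthetic noisy sine wave standing in for one gene's time series.
t = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
clean = np.sin(t)
noisy = clean + np.random.default_rng(0).normal(0.0, 0.3, t.size)
smoothed_ma = moving_average(noisy, window=3)
smoothed_ft = fourier_lowpass(noisy, keep=2)
```

In a clustering pipeline, a filter of this kind would be applied to each gene's profile before the proximity metric is computed. The parameter values trade noise removal against loss of genuine signal and would need to be chosen for each dataset.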