Applying Wave Processing Techniques to Clustering of Gene Expressions

 

Paul D. O’Neill¹, George D. Magoulas², Xiaohui Liu¹

1 School of Information Systems, Computing and Maths

Brunel University, Uxbridge,

Middlesex, UB8 3PH, UK

 

2 School of Computer Science and Information Systems

Birkbeck College, University of London

Malet Street, London WC1E 7HX, UK

ABSTRACT

This paper examines the current process of clustering gene expression time series data and proposes a novel application of filtering techniques to reduce the noise commonly found in this type of data. At present, most noise reduction performed on gene expression data is restricted to individual expression measurements, for example the removal of background noise. This paper proposes that the multiple samples of each gene can be treated as a waveform, and that standard wave smoothing techniques, such as a moving average or Fourier transform filtering, can therefore improve the quality of the data. This hypothesis has been tested on three gene expression datasets: a synthetic dataset, a Human Herpesvirus 8 dataset, and a yeast cell cycle dataset. The paper shows that these techniques generally improve the results of clustering the datasets, as illustrated by contrasting the quality of the clusters generated by k-means, partitioning around medoids, and hierarchical clustering algorithms. The improvements are demonstrated using measures including homogeneity, separation, and a weighted-kappa based metric. The clustering results are also verified biologically by contrasting the effect that filtering has on the proximity metrics commonly used by clustering algorithms and checking the outcome against domain knowledge.

KEYWORDS: Gene Expression, Clustering, Digital Filtering, Pre-Processing, Time Series.
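
As a rough illustration of the idea summarised in the abstract, the sketch below treats one gene's time series as a waveform and smooths it in two ways: a centred moving average and a Fourier low-pass filter. It is a minimal sketch, not the implementation used in the paper. The function names, the window size, the `keep` cut-off, the edge handling, and the sample profile are all assumptions made for illustration.

```python
import cmath
import math


def moving_average(profile, window=3):
    """Smooth one expression profile with a centred moving average.

    Assumption: the window shrinks at both ends, so the output has the
    same length as the input. Other edge treatments are possible.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        lo = max(0, i - half)
        hi = min(len(profile), i + half + 1)
        smoothed.append(sum(profile[lo:hi]) / (hi - lo))
    return smoothed


def fourier_lowpass(profile, keep=2):
    """Low-pass filter a profile by zeroing high-frequency DFT terms.

    Frequency indices 0..keep are retained together with their
    conjugate mirrors, so the reconstructed signal stays real.
    """
    n = len(profile)
    # Forward discrete Fourier transform (direct O(n^2) form).
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(profile))
        for k in range(n)
    ]
    # Drop every coefficient whose frequency index exceeds `keep`.
    filtered = [
        c if min(k, n - k) <= keep else 0
        for k, c in enumerate(spectrum)
    ]
    # Inverse transform; the imaginary parts are numerical residue.
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n)
             for k, c in enumerate(filtered)) / n).real
        for t in range(n)
    ]


# Hypothetical noisy profile for a single gene over eight time points.
noisy = [0.0, 1.2, 0.8, 2.1, 1.9, 3.2, 2.8, 4.1]
print(moving_average(noisy, window=3))
print(fourier_lowpass(noisy, keep=2))
```

In a pipeline of the kind the abstract describes, each gene's profile would be filtered in this way before the proximity metric and the clustering algorithm are applied. The direct transform is used here only to keep the sketch dependency-free; a fast Fourier transform routine would be the usual choice for longer series.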