Identifying pattern in microarray expression series using algorithmic information theory

ORAL

Abstract

We introduce a method of detecting pattern in data series independent of the nature of the pattern. This is achieved by calculating a lower bound on the Algorithmic Information Content (AIC) of the data series, the exact value of the AIC being fundamentally uncomputable. This bound also provides us with a measure of the algorithmic compressibility. Data series that are highly compressible are more likely to result from simple underlying mechanisms than series that are incompressible. We show that the compression in bits is a universal currency by which we can order data series according to their significance, even if they come from different experiments or exhibit different kinds of pattern or noise. We test our method on microarray time series of the yeast cell cycle and show that it is very successful at blindly selecting genes identified by independent experimental studies, without making any assumptions about what kind of pattern these data series contain.
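
The idea of ranking series by compression in bits can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a general-purpose compressor (zlib) as a crude, computable stand-in for algorithmic compressibility, and the equal-width discretisation, the shuffle baseline, and all function names are assumptions made purely for illustration.

```python
import random
import zlib


def compressed_bits(symbols):
    """Size in bits of the zlib-compressed sequence of small integers (0-255)."""
    return 8 * len(zlib.compress(bytes(symbols), 9))


def discretise(series, levels=4):
    """Map real values to integer levels by equal-width binning (an assumption)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / levels or 1.0  # guard against a constant series
    return [min(int((x - lo) / width), levels - 1) for x in series]


def compression_saving(series, levels=4, shuffles=200, seed=0):
    """Bits saved relative to the mean over random shuffles of the same values.

    Shuffling keeps the value distribution but destroys any ordering,
    so a large positive saving suggests pattern in the ordering itself.
    """
    rng = random.Random(seed)
    symbols = discretise(series, levels)
    observed = compressed_bits(symbols)
    total = 0
    for _ in range(shuffles):
        shuffled = symbols[:]
        rng.shuffle(shuffled)
        total += compressed_bits(shuffled)
    return total / shuffles - observed


# Hypothetical example data: one periodic series and one noise series.
periodic = [0, 1, 2, 3, 2, 1] * 20
noise_rng = random.Random(1)
noise = [noise_rng.random() for _ in range(120)]

# Rank the series by saving in bits, largest first.
ranking = sorted(
    [("periodic", compression_saving(periodic)),
     ("noise", compression_saving(noise))],
    key=lambda item: item[1],
    reverse=True,
)
print(ranking)
```

Because the score is expressed in bits relative to a shuffled baseline, series of different origin can be placed on one scale in this sketch. Real microarray time series are far shorter than the example data, where the fixed overhead of a general-purpose compressor would dominate, so a purpose-built encoding would be needed in practice.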

Authors

  • Sebastian Ahnert

    University of Cambridge, UK

  • Karen Willbrand

    Ecole des Mines de Paris, Fontainebleau, France

  • Francis Brown

    Ecole Normale Superieure, Paris, France

  • Thomas Fink

    Institut Curie, Paris, France