The Footprint Sorting Problem
Claudia Fried, Wim Hordijk, Sonja J. Prohaska, Claus R. Stadler, and Peter F. Stadler
Journal of Chemimal Information and Computer Sciences 44, 332-338, 2004.
Phylogenetic footprints are short pieces of non-coding DNA sequence in the vicinity of a gene that are conserved between evolutionary distant species. A seemingly simple problem is to sort footprints in their order along the genomes. It is complicated by the fact that not all footprints are co-linear: they may cross each other. The problem thus becomes the identification of the crossing footprints, the sorting of the remaining co-linear cliques, and finally the insertion of the non-colinear ones at ``reasonable'' positions. We show that solving the footprint sorting problem requires the solution of the ``Minimum Weight Vertex Feedback Set Problem'', which is known to be NP-complete and APX-hard. Nevertheless good approximations can be obtained for datasets of interest. The remaining steps of the sorting process are straight forward: computation of the transitive closure of an acyclic graph, linear extension of the resulting partial order, and finally sorting w.r.t. the linear extension. Alternatively, the footprint sorting problem can be rephrased as a combinatorial optimization problem for which approximate solutions can be obtained by means of general purpose heuristics. Footprint sortings obtained with different methods can be compared using a version of multiple sequence alignment that allows the identification of unambiguously ordered sub-lists.
Due to an unfortunate copyright policy by the publisher, it is not possible to provide an electronic copy here... :-(