Statistical considerations underpinning an alignment-free sequence comparison method

Jing, Junmei, Burden, Conrad J., Forêt, Sylvain, and Wilson, Susan R. (2010) Statistical considerations underpinning an alignment-free sequence comparison method. Journal of the Korean Statistical Society, 39 (3). pp. 325-335.

DOI: 10.1016/j.jkss.2010.02.009

View at Publisher Website: http://dx.doi.org/10.1016/j.jkss.2010.02...

Abstract

The D2 statistic is defined as the number of word matches of prespecified length k, with up to t mismatches, shared between two given sequences. This statistic finds its application in alignment-free comparisons of biological sequences. It has two main advantages over alignment-based methods for nucleotide and amino-acid sequence comparisons, such as BLAST (basic local alignment search tool). These are (i) D2 does not assume that homologous segments are contiguous, and (ii) the algorithm is computationally extremely fast, the runtime being proportional to the size of the sequences in the case of exact matches. This review article summarises results to date on determining the distributional properties of the D2 statistic for a range of biologically relevant parameters, describes existing applications of the method, and outlines future research directions.
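To make the definition concrete, the following is a minimal Python sketch of the word-match count described in the abstract. It is not taken from the article: the function names (`kmers`, `d2_exact`, `d2_mismatch`) are hypothetical, and the mismatch variant uses a naive pairwise comparison chosen for clarity rather than speed. For exact matches (t = 0) the count equals the sum, over all words, of the product of that word's occurrence counts in the two sequences, which is why it can be computed in time proportional to the sequence lengths for fixed k.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping words of length k in seq (empty if len(seq) < k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def d2_exact(a, b, k):
    """Count exact k-word matches between a and b.

    Every pair (position in a, position in b) whose k-words are identical
    contributes 1, so the total is sum_w count_a(w) * count_b(w).
    """
    counts_a = Counter(kmers(a, k))
    counts_b = Counter(kmers(b, k))
    return sum(n * counts_b[w] for w, n in counts_a.items())


def d2_mismatch(a, b, k, t):
    """Count k-word matches with up to t mismatches (naive pairwise sketch).

    Assumption: a "mismatch" is a position at which the two words differ
    (Hamming distance). Runtime grows with len(a) * len(b) * k.
    """
    total = 0
    for u in kmers(a, k):
        for v in kmers(b, k):
            if sum(x != y for x, y in zip(u, v)) <= t:
                total += 1
    return total


if __name__ == "__main__":
    print(d2_exact("ACGT", "ACGA", 2))        # exact 2-word matches
    print(d2_mismatch("ACGT", "ACGA", 2, 1))  # allowing one mismatch per word
```

With t = 0 the two functions agree, which gives a simple consistency check on the naive version.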

ID Code: 17014
Item Type: Article (Refereed Research - C1)
Keywords: Sequence comparison; k-word match; D2 distribution; D2 statistics
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010402 Biostatistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%
Deposited On: 15 May 2011 18:37
Last Modified: 07 May 2013 01:35