#sequence-alignment #fasta #constant #count #sites #multiple #cases

yanked constant_sites

Count the constant sites for each nucleotide in a (FASTA) multiple sequence alignment

0.1.0 Oct 21, 2019

#20 in #sites

MIT license

8KB
82 lines

count_constant_sites

Given a FASTA file containing a multiple sequence alignment of nucleotides, this tool counts the sites in the alignment that are constant. The output is a single line suitable for IQ-TREE's -fconst option: four comma-separated numbers giving the counts of constant A, C, G and T sites, in that order.

A constant site is one where the entire column of the alignment is the same nucleotide. The comparison is case-insensitive. Only A, C, G and T are counted (i.e. gaps and ambiguous nucleotides are not considered).
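The counting rule described above can be illustrated with a minimal sketch. This is not the crate's actual source: the function names `count_constant_sites` and `format_fconst` are hypothetical, the sequences are assumed to be already read from the FASTA file into strings, and a column containing a gap or ambiguity code is assumed to be non-constant.

```rust
/// Count constant sites per nucleotide, returned in A, C, G, T order.
/// Assumption (not confirmed by the README): a column that contains a gap
/// or an ambiguity code anywhere is treated as non-constant.
fn count_constant_sites(seqs: &[&str]) -> [usize; 4] {
    let mut counts = [0usize; 4];
    let rows: Vec<&[u8]> = seqs.iter().map(|s| s.as_bytes()).collect();
    // Only columns present in every sequence can be compared.
    let width = rows.iter().map(|r| r.len()).min().unwrap_or(0);
    for col in 0..width {
        // Uppercase the first row's character so matching ignores case.
        let first = rows[0][col].to_ascii_uppercase();
        let idx = match first {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => continue, // gap or ambiguous nucleotide: skip this column
        };
        // The site is constant only if every row matches the first row.
        if rows.iter().all(|r| r[col].to_ascii_uppercase() == first) {
            counts[idx] += 1;
        }
    }
    counts
}

/// Format the counts as four comma-separated numbers (A,C,G,T).
fn format_fconst(counts: [usize; 4]) -> String {
    counts
        .iter()
        .map(|c| c.to_string())
        .collect::<Vec<_>>()
        .join(",")
}
```

With the three aligned sequences `ACGT`, `acgA` and `ACG-`, the first three columns are constant (one A, one C, one G) and the last is not, so the formatted result under these assumptions is `1,1,1,0`.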

TODO:

  • extend to work with protein alphabets

Dependencies

~2.6–3.5MB
~63K SLoC