RMAP

The RMAP short read mapping algorithm

RMAP is designed to accurately map reads from next-generation sequencing technologies. It can map reads with or without error probability information (quality scores) and supports mapping of paired-end and bisulfite-treated reads. There are no limitations on read width or number of mismatches.

Download RMAP-2.1 or view the RMAP source code on GitHub.

System requirements

64-bit machine and GCC version ≥ 4.1 (to support TR1)

Install

To install rmap, download the compressed archive and unpack it:

$ tar -jxvf rmap-2.1.tar.bz2

Change into the unpacked source directory and run:

$ make install

Quick usage guide

Here are some examples showing how to run RMAP. Complete parameter lists can be found by typing the program name with -help in the shell. More details are described in the RMAP manual.

To map next-generation sequencing reads to a reference genome, use -o to specify the output filename (BED format) and -c to specify the target file or the directory that contains the chromosome sequence files (FASTA format). The last parameter is a FASTA/FASTQ file that contains the read sequences. Additionally, you can add -v to show mapping progress. Please note that each read sequence must occupy exactly one line: RMAP will stop and show an error message if a read sequence spans several lines.

$ rmap -o mapped_locations.bed -c chromosomes_dir reads.fa

Or

$ rmap -o mapped_locations.bed -c chromosomes_dir reads.fq
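Because each read sequence must occupy one line, a FASTA file whose sequences are wrapped across several lines has to be unwrapped before mapping. The following is a minimal sketch of one way to do that with awk; the function name unwrap_fasta and the file names are illustrative and are not part of RMAP. It applies to FASTA input only.

```shell
# unwrap_fasta: print every FASTA record as one header line plus one sequence line.
# Hypothetical helper, not shipped with RMAP. Reads the named files, or stdin if none.
unwrap_fasta() {
    awk '
        /^>/ { if (seq != "") print seq; print; seq = ""; next }  # header: flush previous sequence
             { seq = seq $0 }                                      # sequence line: append to record
        END  { if (seq != "") print seq }                          # flush the last record
    ' "$@"
}

# Example use (file names are placeholders):
#   unwrap_fasta wrapped_reads.fa > reads.fa
```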

To set the number of allowed mismatches in the mapping, use -m (default: 10). To specify the seed structure, set the number of seeds (-S, default: 3) and the seed weight (-h, default: 11).

$ rmap -S 4 -h 8 -m 20 -o mapped_locations.bed -c chromosomes_dir reads.fa

To output ambiguously mapped reads, use -a followed by a filename. This file (amb_mapped.txt in the example) contains only read names. By default, reads that map to two or more locations are considered ambiguously mapped. With -M x in the command, reads that map more than x times are reported in the amb_mapped.txt file, while reads that map fewer than x times are reported in the output file with every mapped location.

$ rmap -a amb_mapped.txt -M 10 -o mapped_locations.bed -c chromosomes_dir reads.fa
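To see how many locations were reported for each read after a run with -M, one can count the lines per read name in the output file. The sketch below assumes the output follows the standard BED layout, with the read name in the fourth tab-separated column, and that read names contain no whitespace; the function name count_locations is hypothetical.

```shell
# count_locations: print "read_name<TAB>number_of_lines" for a BED file.
# Assumes the read name is BED column 4 (standard BED) and has no whitespace.
count_locations() {
    cut -f 4 "$1" | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Example use (file name taken from the command above):
#   count_locations mapped_locations.bed
```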

To use full quality score information (a PRB file from the Illumina/Solexa pipeline, which holds four quality scores for each nucleotide), use -p prb_filename.

$ rmap -p reads.prb -o mapped_locations.bed -c chromosomes_dir reads.fa

With quality score information (the input file must be a FASTQ or PRB file), one can also use the wildcard matching method (-W), with or without a user-defined cutoff, or the weight-matrix matching method (-Q).

$ rmap -W -o mapped_locations.bed -c chromosomes_dir reads.fq
$ rmap -P 10 -o mapped_locations.bed -c chromosomes_dir reads.fq
$ rmap -Q -p reads.prb -o mapped_locations.bed -c chromosomes_dir reads.fa

To map paired-end reads, use rmappe and specify the minimum and maximum separation between ends. The default values for -min-sep and -max-sep are 0 and 200, respectively. Please note that there should be only one input file and it should contain both ends. The two ends are concatenated into one read sequence, i.e., paired ends of 36 nt each should appear as 72 nt reads in the pe_reads.fa file.

$ rmappe -min-sep 200 -max-sep 600 -o mapped_pe_locations.bed -c chromosomes_dir pe_reads.fa
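If the two ends are stored in separate FASTA files, they need to be merged into a single file of concatenated reads first. The sketch below shows one possible way, assuming both files list the same reads in the same order with one sequence line per read. The function name join_ends and the file names are hypothetical. This guide does not say whether the second end needs to be re-oriented before concatenation, so the sketch leaves it unchanged; consult the RMAP manual on that point.

```shell
# join_ends: concatenate matching records of two single-line FASTA files.
# Hypothetical helper, not part of RMAP. Sequences are joined as-is (no reverse complement).
join_ends() {
    paste "$1" "$2" | awk -F '\t' '
        /^>/ { print $1; next }   # header line: keep the name from the first file
             { print $1 $2 }      # sequence line: first end followed by second end
    '
}

# Example use (file names are placeholders):
#   join_ends reads_end1.fa reads_end2.fa > pe_reads.fa
```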

To map bisulfite-treated reads, use rmapbs. There is no need to convert Cs to Ts in the reads or the reference genome. Please note that rmapbs can only map single-end bisulfite-treated reads.

$ rmapbs -o mapped_bs_locations.bed -c chromosomes_dir bs_reads.fa

Citations

Smith AD, Chung WY, Hodges E, Kendall J, Hannon G, Hicks J, Xuan Z and Zhang MQ (2009) Updates to the RMAP short-read mapping software. Bioinformatics 25(21):2841-2842. [PDF]

Smith AD, Xuan Z and Zhang MQ (2008) Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics 9:128. [PDF]