To run MacSyFinder on your own dataset right after installing it, follow these steps:
Type: “macsyfinder -h” to see all options available. All command-line options are described in the Command-line options section.
On a “metagenomic” dataset, for example:
“macsyfinder --db-type unordered --sequence-db metagenome.fasta all” will search the metagenomic dataset for all systems modelled in the .xml files placed in the default definition folder.
“macsyfinder --db-type unordered --sequence-db metagenome.fasta -d mydefinitions/ all” will search it for all systems modelled in the .xml files placed in the “mydefinitions” folder.
On a completely assembled genome (where the gene order is known, and is relevant for systems detection):
“macsyfinder --db-type ordered-replicon --sequence-db mygenome.fasta -d mydefinitions/ SystemA SystemB” will detect the systems “SystemA” and “SystemB” in a complete genome from “SystemA.xml” and “SystemB.xml” definition files placed in the folder “mydefinitions”.
See Input dataset for more on input datasets.
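The commands above differ only in the dataset type, the sequence file, the definition folder and the requested systems. As a minimal sketch, the shell function below assembles such a command and prints it instead of running it, so that it can be checked first. The function name build_macsyfinder_cmd is hypothetical and is not part of MacSyFinder; only the options shown above are used.

```shell
#!/bin/sh
# build_macsyfinder_cmd is a hypothetical helper (not shipped with MacSyFinder).
# Usage: build_macsyfinder_cmd DB_TYPE SEQUENCE_DB DEF_DIR SYSTEM [SYSTEM ...]
build_macsyfinder_cmd() {
    db_type=$1
    sequence_db=$2
    def_dir=$3
    shift 3
    # "$*" joins the remaining arguments (system names or "all") with spaces.
    printf 'macsyfinder --db-type %s --sequence-db %s -d %s %s\n' \
        "$db_type" "$sequence_db" "$def_dir" "$*"
}

# Same command as the complete-genome example above, printed rather than executed.
build_macsyfinder_cmd ordered-replicon mygenome.fasta mydefinitions/ SystemA SystemB
```

This sketch does not quote its arguments, so paths containing spaces would need extra care before the printed command is run.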
Note
System names are case-sensitive on the command line. The name of a system is the name of its definition file without the suffix (.xml by default), for example “toto” for a system defined in “toto.xml”.
The “all” keyword detects all systems available in the definition folder in a single run. See the Command-line options.
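Since system names are derived from the definition file names, the names accepted on the command line can be listed directly from the definition folder. The following is a small sketch that assumes the default .xml suffix; list_systems is a hypothetical helper, not a MacSyFinder command, and “mydefinitions” is the example folder used above.

```shell
#!/bin/sh
# list_systems is a hypothetical helper (not part of MacSyFinder).
# It prints one system name per line for the given definition folder,
# assuming definition files use the default ".xml" suffix.
list_systems() {
    for def_file in "$1"/*.xml; do
        # When no file matches, the pattern stays unexpanded: skip it.
        [ -e "$def_file" ] || continue
        # Strip the folder and the suffix: mydefinitions/toto.xml gives toto.
        basename "$def_file" .xml
    done
}

list_systems mydefinitions
```

The printed names keep the case of the file names, which is the spelling expected on the command line.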
We included a test dataset in the MacSyFinder package. By default, it is installed in /share/macsyfinder or /usr/share/macsyfinder, but it may be located elsewhere if a different location was specified during installation.
This dataset is used to detect CRISPR-Cas subtypes with the definitions in the /share/macsyfinder/DEF folder and the profiles in the /share/macsyfinder/profiles folder. This classification was previously described in Makarova et al. 2011. The profiles come from the TIGRFAM database (release 13 of August 15 2012), and some of them were specifically designed for CRISPR-Cas classification (Haft et al., 2005). The definitions are detailed in the MacSyFinder paper.
The three replicons of the test dataset were concatenated in a single fasta file, following the “gembase” format proposed here, and thus MacSyFinder will treat them separately for systems inference.
To run the detection and classification of all subtypes, type:
“macsyfinder --db-type gembase --sequence-db /share/macsyfinder/sequence_data/datatest_gembase.fasta all”
To run the detection of the Type-IE subtype only, type:
“macsyfinder --db-type gembase --sequence-db /share/macsyfinder/sequence_data/datatest_gembase.fasta CAS-TypeIE”
A sample topology file is included at /share/macsyfinder/sequence_data/datatest_gembase.topology and follows the convention described here. It makes it possible to specify a different topology, “linear” or “circular”, for each replicon in the “gembase” format. Otherwise, the topology is set to “circular” by default. The topology can also be specified on the command line (see the Command-line options).
To run the detection using the topology file, type:
“macsyfinder --db-type gembase --sequence-db /share/macsyfinder/sequence_data/datatest_gembase.fasta --topology-file /share/macsyfinder/sequence_data/datatest_gembase.topology all”
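Because the test dataset may be installed under a different prefix, the paths in the commands above may need adjusting. As a hedged sketch, the script below looks for the test fasta file in the two default locations mentioned earlier and prints the corresponding command with the topology file. find_test_data is a hypothetical helper, not part of MacSyFinder, and the command is only printed, not executed.

```shell
#!/bin/sh
# find_test_data is a hypothetical helper (not part of MacSyFinder).
# Arguments: candidate installation folders, tried in order.
# Prints the first folder that contains the test fasta file.
find_test_data() {
    for dir in "$@"; do
        if [ -f "$dir/sequence_data/datatest_gembase.fasta" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Try the two default locations; add your own prefix if you chose one at install time.
if data_dir=$(find_test_data /share/macsyfinder /usr/share/macsyfinder); then
    seq_dir="$data_dir/sequence_data"
    # Printed rather than executed, so the paths can be checked first.
    printf 'macsyfinder --db-type gembase --sequence-db %s --topology-file %s all\n' \
        "$seq_dir/datatest_gembase.fasta" "$seq_dir/datatest_gembase.topology"
else
    echo "test dataset not found; check your installation prefix" >&2
fi
```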
To get an idea of what should be detected with the test dataset above, open the expected JSON result file in MacSyView, the web-browser application for visualizing MacSyFinder results: /share/macsyfinder/sequence_data/results.macsyfinder.json.
A screenshot of MacSyView is included here.