.. MacSyFinder - Detection of macromolecular systems in protein datasets using systems modelling and similarity search. Authors: Sophie Abby, Bertrand Néron Copyright © 2014 Institut Pasteur, Paris. See the COPYRIGHT file for details MacsyFinder is distributed under the terms of the GNU General Public License (GPLv3). See the COPYING file for details. .. _quickstart: MacSyFinder Quick Start ======================= In order to run MacSyFinder on your favorite dataset as soon as you have installed it, you can simply follow the next steps: * Type: "``macsyfinder -h``" to see all options available. All command-line options are described in the :ref:`Command-line options section <command-line-label>`. * On a "metagenomic" dataset for example: "``macsyfinder --db-type unordered --sequence-db metagenome.fasta all``" will detect all systems modelled in .xml files placed in the default definition folder in a metagenomic dataset. "``macsyfinder --db-type unordered --sequence-db metagenome.fasta -d mydefinitions/ all``" will detect all systems modelled in .xml files placed in the *"mydefinitions"* folder. * On a completely assembled genome (where the gene order is known, and is relevant for systems detection): "``macsyfinder --db-type ordered-replicon --sequence-db mygenome.fasta -d mydefinitions/ SystemA SystemB``" will detect the systems *"SystemA"* and *"SystemB"* in a complete genome from *"SystemA.xml"* and *"SystemB.xml"* definition files placed in the folder *"mydefinitions"*. See :ref:`input-dataset-label` for more on input datasets. .. The systems available for detection are the: - "Flagellum" -- the bacterial flagellum, involved in motility - "T1SS" -- the type 1 secretion system, involved in the secretion of degrading enzymes, toxins,... - "T2SS" -- the type 2 secretion system, also involved in the secretion of degrading enzymes, toxins,... - "T3SS" -- the type 3 secretion, related to the flagellum and dedicated to the secretion into eukaryotic cells - "cT4SS" -- the conjugative type 4 secretion system, involved in the transfer of genetic material to other cells - "pT4SSi" -- the MPFi-like T4SS dedicated to protein secretion - "pT4SSt" -- the MPFt-like T4SS dedicated to protein secretion - "T5aSS" -- the "classical" autotransporter - "T5bSS" -- the "two-partner" secretion system - "T5cSS" -- the "trimeric" autotransporter - "T6SS" -- the type 6 secretion system, involved in protein secretion into bacterial and eukaryotic cells - "T4P" -- the type IV pilus, involved in twitching motility, adhesion to cells,... - "Tad" -- the Tad pilus, involved in adhesion,... .. note:: Systems have to be spelled in a case-sensitive way to run their detection from the command-line. The name of the system corresponds to the suffix defined for xml files (.xml by default), for example *"toto"* for a system defined in *"toto.xml"*. The *"all"* keyword allows to detect all systems available in the definition folder in a single run. See the :ref:`Command-line options <command-line-label>`. .. _datatest: First trial with a test dataset ******************************* We included a test dataset in the MacSyFinder package. **By default, it will be installed** in /share/macsyfinder or /usr/share/macsyfinder. But it can be located elsewhere if it was specified during installation. This dataset consists in the detection of CRISPR-Cas SubTypes with the definitions in the /share/macsyfinder/DEF folder, using the profiles in the /share/macsyfinder/profiles folder. This classification was previously described in `Makarova et al. 2011 <http://www.ncbi.nlm.nih.gov/pubmed/21552286>`_, and the profiles are from the `TIGRFAM database <http://www.jcvi.org/cgi-bin/tigrfams/index.cgi>`_ (release 13 of August 15 2012) and some of them were specifically designed for CRISPR-Cas classification (`Haft et. al, 2005 <http://www.ncbi.nlm.nih.gov/pubmed/16292354>`_). The definitions are detailed in the MacSyFinder's paper. As a sequence dataset, we propose three replicons in /share/macsyfinder/sequence_data/datatest_gembase.fasta: - *Escherichia coli* str. K-12 substr. MG1655 chromosome (ESCO001c01a). Genbank accession number: `NC_000913 <http://www.ncbi.nlm.nih.gov/nuccore/NC_000913>`_. - *Haloarcula marismortui* ATCC 43049 plasmid pNG400 (HAMA001p04a). Genbank accession number: `NC_006392 <http://www.ncbi.nlm.nih.gov/nuccore/NC_006392>`_. - *Legionella pneumophila* str. Paris, complete genome (LEPN003c01a). Genbank accession number: `NC_006368 <http://www.ncbi.nlm.nih.gov/nuccore/NC_006368>`_. They were concatenated in a single fasta file, following the "gembase" format proposed :ref:`here <gembase_convention>`, and thus MacSyfinder will treat the three different replicons separately for systems inference. To run the detection and classification of all subtypes, type:: "macsyfinder --db-type gembase --sequence-db /share/macsyfinder/sequence_data/datatest_gembase.fasta all" To run the detection of the Type-IE subtype only, type:: "macsyfinder --db-type gembase --sequence-db /share/macsyfinder/sequence_data/datatest_gembase.fasta CAS-TypeIE" A sample topology file is included /share/macsyfinder/sequence_data/datatest_gembase.topology, and follows the convention in :ref:`here <topology-files>`. It allows to specify a different topology "linear" or "circular" for each replicon in the "gembase" format. Otherwise, by default the topology is set to "circular". It can also be specified in the commmand-line (see the :ref:`Command-line options <command-line-label>`). To run the detection using the topology file, type:: "macsyfinder --db-type gembase --sequence-db /share/macsyfinder/sequence_data/datatest_gembase.fasta --topology-file /share/macsyfinder/sequence_data/datatest_gembase.topology all" Visualizing expected results with MacSyView ******************************************* To have an idea of what should be detected with the above test dataset, run :ref:`MacSyView <macsyview>`, the web-browser application for MacSyFinder's results visualization. To do that, open the expected JSON result file with MacSyView: /share/macsyfinder/sequence_data/results.macsyfinder.json. A screenshot of MacSyView is included :ref:`here <screenshot>`.