MacSyFinder is run from the command line using a variety of input files and options. See Input dataset for more details.
First, MacSyFinder searches for the components of each system using sequence similarity searches.
From the list of systems to detect, a non-redundant list of components to search is built. For each system, the list includes:
- mandatory components
- accessory components
- forbidden components
- homologs and/or analogs of these three types of components, when they are defined as “exchangeable”
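Building the non-redundant list amounts to collecting every component name, including exchangeable alternatives, across all requested systems and dropping duplicates. The sketch below illustrates this with hypothetical `Gene` and `System` classes and toy component names; it is not MacSyFinder's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    # Hypothetical: names of homologs/analogs declared as exchangeable for this gene.
    exchangeables: list[str] = field(default_factory=list)


@dataclass
class System:
    name: str
    mandatory: list[Gene] = field(default_factory=list)
    accessory: list[Gene] = field(default_factory=list)
    forbidden: list[Gene] = field(default_factory=list)


def components_to_search(systems: list[System]) -> list[str]:
    """Return the non-redundant list of component names to search, in first-seen order."""
    seen: dict[str, None] = {}  # a dict deduplicates while keeping insertion order
    for system in systems:
        for gene in system.mandatory + system.accessory + system.forbidden:
            seen.setdefault(gene.name, None)
            for alt in gene.exchangeables:
                seen.setdefault(alt, None)
    return list(seen)


# Toy models: "geneC" is shared, and "geneA" is mandatory in one system but forbidden in the other.
sys_x = System("SysX", mandatory=[Gene("geneA", ["geneA_alt"]), Gene("geneB")], accessory=[Gene("geneC")])
sys_y = System("SysY", mandatory=[Gene("geneD")], accessory=[Gene("geneC")], forbidden=[Gene("geneA")])
print(components_to_search([sys_x, sys_y]))
# ['geneA', 'geneA_alt', 'geneB', 'geneC', 'geneD']
```

Each profile then needs to be searched only once, even if several systems share it.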
HMMER is run with the corresponding set of HMM profiles, and the hits are filtered according to user-defined criteria (see Hmmer options and HMMReport API). This step and the extraction of significant hits can be run in parallel (-w command-line option). See Command-line options and the search_genes API for more details.
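The filtering step can be pictured as applying per-hit thresholds to each profile's report, with reports processed independently so they can be distributed over several workers. In this minimal sketch the `Hit` class, the two thresholds (independent e-value and profile coverage) and the `filter_all` helper are hypothetical illustrations; the criteria actually available are those described in Hmmer options.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    profile: str             # name of the HMM profile (component)
    seq_id: str              # identifier of the matched protein
    i_evalue: float          # independent e-value of the hit
    profile_coverage: float  # fraction of the profile length covered by the alignment


def filter_hits(hits: list[Hit], max_i_evalue: float, min_profile_coverage: float) -> list[Hit]:
    """Keep only the hits that pass both thresholds."""
    return [
        h for h in hits
        if h.i_evalue <= max_i_evalue and h.profile_coverage >= min_profile_coverage
    ]


def filter_all(reports: dict[str, list[Hit]], workers: int, **criteria: float) -> dict[str, list[Hit]]:
    """Filter each profile's hits independently, using a pool of `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(filter_hits, hits, **criteria) for name, hits in reports.items()}
        return {name: fut.result() for name, fut in futures.items()}


reports = {
    "geneA": [Hit("geneA", "prot_1", 1e-20, 0.9), Hit("geneA", "prot_7", 0.5, 0.9)],
    "geneB": [Hit("geneB", "prot_2", 1e-8, 0.3), Hit("geneB", "prot_3", 1e-5, 0.8)],
}
kept = filter_all(reports, workers=2, max_i_evalue=1e-3, min_profile_coverage=0.5)
print({name: [h.seq_id for h in hits] for name, hits in kept.items()})
# {'geneA': ['prot_1'], 'geneB': ['prot_3']}
```

Threads are used here only to show that the per-profile work is independent. A thread pool mainly pays off when each task waits on an external process, such as an HMMER search, rather than on pure Python computation.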
The following steps depend on whether the input dataset is ordered (complete or nearly complete genome(s)) or unordered (metagenomes or unassembled genomes) (see Input dataset). For ordered datasets, the hits from the previous step are used to build clusters of co-localized genes, as defined in the XML files. These clusters are then checked against the model specifications, such as the minimal quorum of “Mandatory” or “Accessory” genes, or the absence of “Forbidden” components. When the gene order is unknown, the analysis is less powerful: depending on the type of dataset, the presence of systems can only be suggested on the basis of the quorum of genes. The results are reported in tabular and graphical form (see Output format).
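For ordered datasets, the clustering and checking steps can be sketched as follows: hits are sorted by gene position, consecutive hits separated by at most `max_gap` genes are grouped into one cluster, and each cluster is then tested for forbidden components and for the mandatory/total quorum. All names and thresholds (`max_gap`, `min_mandatory`, `min_total`) are hypothetical, the replicon is assumed to be linear, and this is a simplified illustration rather than MacSyFinder's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str      # component name matched by the hit
    position: int  # rank of the hit's gene along the (linear) replicon


def build_clusters(hits: list[Hit], max_gap: int) -> list[list[Hit]]:
    """Group hits separated by at most max_gap intervening genes."""
    clusters: list[list[Hit]] = []
    for hit in sorted(hits, key=lambda h: h.position):
        if clusters and hit.position - clusters[-1][-1].position <= max_gap + 1:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    return clusters


def satisfies_model(cluster: list[Hit], mandatory: set[str], accessory: set[str],
                    forbidden: set[str], min_mandatory: int, min_total: int) -> bool:
    """Reject clusters containing a forbidden gene, then apply the quorum rules."""
    genes = {h.gene for h in cluster}
    if genes & forbidden:
        return False
    n_mandatory = len(genes & mandatory)
    n_accessory = len(genes & accessory)
    return n_mandatory >= min_mandatory and n_mandatory + n_accessory >= min_total


hits = [Hit("geneA", 10), Hit("geneB", 12), Hit("geneC", 13), Hit("geneA", 40), Hit("geneF", 41)]
clusters = build_clusters(hits, max_gap=3)
verdicts = [
    satisfies_model(c, {"geneA", "geneB"}, {"geneC"}, {"geneF"}, min_mandatory=2, min_total=3)
    for c in clusters
]
print(verdicts)
# [True, False]: the second cluster contains the forbidden geneF
```

In unordered mode there are no positions to cluster on, so only the quorum part of such a check can be applied.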
Note
When the “multi_loci” option is turned on, a single “multi-loci” system is assessed per replicon, even if it could correspond to multiple scattered systems. The “single-locus” mode is therefore a more powerful mode of detection.
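The difference between the two modes can be illustrated by pooling clusters: in single-locus mode each cluster is assessed on its own, whereas in multi-loci mode all clusters of a replicon are merged into one candidate. In the toy example below, two clusters that each fail the quorum pass once merged, even though nothing shows that they belong to the same system. The model sets, thresholds and the `assess` helper are hypothetical, and forbidden genes are ignored for brevity.

```python
from collections import defaultdict

# Hypothetical toy model: component sets and quorum thresholds are illustrative.
MANDATORY = {"geneA", "geneB"}
ACCESSORY = {"geneC"}
MIN_MANDATORY, MIN_TOTAL = 2, 3


def quorum_met(genes: set[str]) -> bool:
    n_mandatory = len(genes & MANDATORY)
    return n_mandatory >= MIN_MANDATORY and n_mandatory + len(genes & ACCESSORY) >= MIN_TOTAL


def assess(clusters: list[tuple[str, set[str]]], multi_loci: bool) -> list[tuple[str, bool]]:
    """clusters: (replicon, genes found in the cluster) pairs."""
    if not multi_loci:
        # Single-locus: each cluster is a candidate system on its own.
        return [(replicon, quorum_met(genes)) for replicon, genes in clusters]
    # Multi-loci: all clusters of a replicon are merged into one candidate system.
    pooled: dict[str, set[str]] = defaultdict(set)
    for replicon, genes in clusters:
        pooled[replicon] |= genes
    return [(replicon, quorum_met(genes)) for replicon, genes in pooled.items()]


clusters = [("chrom1", {"geneA"}), ("chrom1", {"geneB", "geneC"})]
print(assess(clusters, multi_loci=False))  # [('chrom1', False), ('chrom1', False)]
print(assess(clusters, multi_loci=True))   # [('chrom1', True)]
```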
Warning
Consecutive systems are handled and detected as separate systems, but more complex cases, i.e. when the components of different systems are intermingled, are not considered.
Note
The “unordered” mode of detection is less powerful, as a single occurrence of a given system is filled for the entire dataset with hits whose origin is unknown. Systems assessments in this mode should therefore be interpreted with caution.