Configuration API
Options to run MacSyFinder can be specified in a Configuration file. The API described below handles all configuration options for MacSyFinder. The Config object provides some default values, and performs some validations of the values.
Config API reference
-
class macsypy.config.Config(cfg_file='', sequence_db=None, db_type=None, replicon_topology=None, topology_file=None, inter_gene_max_space=None, min_mandatory_genes_required=None, min_genes_required=None, max_nb_genes=None, multi_loci=None, hmmer_exe=None, index_db_exe=None, e_value_res=None, i_evalue_sel=None, coverage_profile=None, def_dir=None, res_search_dir=None, res_search_suffix=None, profile_dir=None, profile_suffix=None, res_extract_suffix=None, out_dir=None, log_level=None, log_file=None, worker_nb=None, config_file=None, previous_run=None, build_indexes=None)[source]
Parse configuration files and handle the configuration according to the following file location precedence:
/etc/macsyfinder/macsyfinder.conf < ~/.macsyfinder/macsyfinder.conf < .macsyfinder.conf
If a configuration file is given on the command-line, this file will be used.
Finally, the arguments passed on the command line have the highest priority.
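The precedence described above can be illustrated with the standard-library configparser module, whose read() method applies a list of files in order, so that values from later files override earlier ones. This is only a sketch of the idea and not MacSyFinder's implementation: the resolve_option helper, the [base] section name, and the temporary files are assumptions made for the example.

```python
import configparser
import os
import tempfile


def resolve_option(config_files, section, option, cmd_line_value=None):
    """Return the effective value: command line first, then the last file that sets it.

    Hypothetical helper, not part of macsypy.
    """
    if cmd_line_value is not None:
        return cmd_line_value
    parser = configparser.ConfigParser()
    # read() skips files that do not exist and lets later files override earlier ones
    parser.read(config_files)
    return parser.get(section, option, fallback=None)


with tempfile.TemporaryDirectory() as tmp:
    system_wide = os.path.join(tmp, "system.conf")
    local = os.path.join(tmp, "local.conf")
    missing = os.path.join(tmp, "absent.conf")  # never created on purpose
    with open(system_wide, "w") as f:
        f.write("[base]\ndb_type = unordered\n")
    with open(local, "w") as f:
        f.write("[base]\ndb_type = gembase\n")

    # lowest-priority file first, highest-priority file last
    from_files = resolve_option([system_wide, missing, local], "base", "db_type")
    from_cmd_line = resolve_option([system_wide, local], "base", "db_type", "ordered_replicon")
```

Here from_files takes the value of the last file that defines the option, while from_cmd_line ignores the files because an explicit value was supplied.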
-
__init__(cfg_file='', sequence_db=None, db_type=None, replicon_topology=None, topology_file=None, inter_gene_max_space=None, min_mandatory_genes_required=None, min_genes_required=None, max_nb_genes=None, multi_loci=None, hmmer_exe=None, index_db_exe=None, e_value_res=None, i_evalue_sel=None, coverage_profile=None, def_dir=None, res_search_dir=None, res_search_suffix=None, profile_dir=None, profile_suffix=None, res_extract_suffix=None, out_dir=None, log_level=None, log_file=None, worker_nb=None, config_file=None, previous_run=None, build_indexes=None)[source]
Parameters: |
- cfg_file (string) – the path to the MacSyFinder configuration file to use
- previous_run (string) – the path to the results directory of a previous run
- sequence_db (string) – the path to the sequence input dataset (fasta format)
- db_type (string) – the type of dataset to deal with.
“unordered_replicon” corresponds to a non-assembled genome,
“unordered” to a metagenomic dataset,
“ordered_replicon” to an assembled genome, and
“gembase” to a set of replicons where sequence identifiers follow the convention “>RepliconName_SequenceID”.
- replicon_topology (string) – the topology (‘linear’ or ‘circular’) of the replicons. This option is meaningful only if the db_type is ‘ordered_replicon’ or ‘gembase’
- topology_file (string) – a tabular file mapping replicon names to their topology (e.g. “RepliconA linear”)
- inter_gene_max_space (list of list of 2 elements [[ string system, integer space] , ...]) –
- min_mandatory_genes_required (list of list of 2 elements [[ string system, integer ] , ...]) –
- min_genes_required (list of list of 2 elements [[ string system, integer ] , ...]) –
- max_nb_genes (list of list of 2 elements [[ string system, integer ] , ...]) –
- multi_loci (string) –
- hmmer_exe (string) – the Hmmer “hmmsearch” executable
- index_db_exe (string) – the indexer executable (“makeblastdb” or “formatdb”)
- e_value_res (float) – maximal e-value for hits to be reported during Hmmer search
- i_evalue_sel (float) – maximal independent e-value for Hmmer hits to be selected for system detection
- coverage_profile (float) – minimal profile coverage required in the hit alignment to allow the hit selection for system detection
- def_dir (string) – the path to the directory containing systems definition files (.xml)
- res_search_dir (string) – the path to the directory where to store MacSyFinder search results directories.
- out_dir (string) – the path to the directory where the results are written. By default the directory is named macsyfinder-{date}; this option
overrides that behavior. If out_dir is set, the directory is created if it does not exist; if it already exists, it must be empty.
If both out_dir and res_search_dir are set, res_search_dir is ignored.
- res_search_suffix (string) – the suffix to give to Hmmer raw output files
- res_extract_suffix (string) – the suffix to give to filtered hits output files
- profile_dir (string) – path to the profiles directory
- profile_suffix (string) – the suffix of profile files. For each ‘Gene’ element, the corresponding profile is searched for in ‘profile_dir’, in a file whose name is the Gene name followed by the profile suffix.
- log_level (int) – the level of log output
- log_file (string) – the path to the directory to write MacSyFinder log files
- worker_nb (int) – maximal number of processes to be used in parallel (multi-thread run, 0 uses all available cores)
- build_indexes (boolean) – build the indexes from the sequence dataset in fasta format
|
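The per-system parameters (inter_gene_max_space, min_mandatory_genes_required, min_genes_required, max_nb_genes) are given as lists of [system, integer] pairs and are later queried by system name. A minimal sketch of that pattern follows. The PerSystemOption class, the choice to return None when no value was set, and the system names are all hypothetical and do not reproduce the internals of Config.

```python
class PerSystemOption:
    """Store [[system, value], ...] pairs and look a value up by system name.

    Hypothetical illustration, not part of macsypy.
    """

    def __init__(self, pairs=None):
        self._values = {}
        for system, value in pairs or []:
            # cast to int so that values read from text (e.g. "10") are accepted
            self._values[system] = int(value)

    def get(self, system, default=None):
        """Return the value set for this system, or default if none was given."""
        return self._values.get(system, default)


# system names below are placeholders chosen for the example
inter_gene_max_space = PerSystemOption([["T2SS", 5], ["Flagellum", "10"]])
```

Calling inter_gene_max_space.get("T2SS") returns the integer stored for that system; a system with no entry yields the default.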
-
__weakref__
list of weak references to the object (if defined)
-
_validate(cmde_line_opt, cmde_line_values)[source]
Get all configuration values and check the validity of their values.
Create the working directory
Parameters: |
- cmde_line_opt (dict, all values are cast in string) – the options from the command line
- cmde_line_values (dict, values are not cast) – the options from the command line
|
Returns: | all the options for this execution
|
Return type: | dictionary
|
-
build_indexes[source]
Returns: | True if the indexes must be rebuilt, False otherwise |
Return type: | boolean |
-
coverage_profile[source]
Returns: | the coverage threshold used to select a hit for systems detection and for the Hmmer report (filtered hits) |
Return type: | float |
-
db_type[source]
Returns: | the type of the input sequence dataset. The allowed values are: ‘unordered_replicon’, ‘ordered_replicon’, ‘gembase’, ‘unordered’ |
Return type: | string |
-
def_dir[source]
Returns: | the path to the directory where the definitions of secretion systems (.xml files) are stored |
Return type: | string |
-
e_value_res[source]
Returns: | The e_value threshold used by Hmmer to report hits in the Hmmer raw output file |
Return type: | float |
-
hmmer_dir[source]
Returns: | the name of the directory where the hmmer results are stored |
Return type: | string |
-
hmmer_exe[source]
Returns: | the name of the binary to execute for homology search from HMM protein profiles (Hmmer) |
Return type: | string |
-
i_evalue_sel[source]
Returns: | the i_evalue threshold used to select a hit for systems detection and for the Hmmer report (filtered hits) |
Return type: | float |
-
index_db_exe[source]
Returns: | the name of the binary to index the input sequences dataset for Hmmer |
Return type: | string |
-
inter_gene_max_space(system)[source]
Parameters: | system (string) – the name of a system |
Returns: | the maximum number of components with no match allowed between two genes with a match to consider them contiguous (at the system level) |
Return type: | integer |
-
max_nb_genes(system)[source]
Parameters: | system (string) – the name of a system |
Returns: | the maximum number of genes to assess the system presence |
Return type: | integer |
-
min_genes_required(system)[source]
Parameters: | system (string) – the name of a system |
Returns: | the genes (mandatory+accessory) quorum to assess the system presence |
Return type: | integer |
-
min_mandatory_genes_required(system)[source]
Parameters: | system (string) – the name of a system |
Returns: | the mandatory genes quorum to assess the system presence |
Return type: | integer |
-
multi_loci(system)[source]
Parameters: | system (string) – the name of a system |
Returns: | True if the system is allowed to be encoded at several loci (multi_loci), False otherwise |
Return type: | boolean |
-
previous_run[source]
Returns: | the path to the previous run directory to use (to recover Hmmer raw output) |
Return type: | string |
-
profile_dir[source]
Returns: | the path to the directory containing the HMM protein profiles that correspond to each Gene |
Return type: | string |
-
profile_suffix[source]
Returns: | the suffix for profile files |
Return type: | string |
-
replicon_topology[source]
Returns: | the topology of the replicons. Two values are supported: ‘linear’ (default) and ‘circular’. Only relevant for ‘ordered’ datasets |
Return type: | string |
-
res_extract_suffix[source]
Returns: | the suffix of extract files (tabulated files after HMM output parsing and filtering of hits) |
Return type: | string |
-
res_search_dir[source]
Returns: | the path to the directory to store results of MacSyFinder runs |
Return type: | string |
-
res_search_suffix[source]
Returns: | the suffix for Hmmer raw output files |
Return type: | string |
-
save(dir_path)[source]
Save the configuration used for this run to a file in INI format in the directory dir_path.
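Writing options to an INI file can be sketched with the standard-library configparser module. The example below is an assumption-laden illustration, not the body of Config.save: the save_options function, the [base] section, the macsyfinder.conf file name, and the decision to skip unset options are all hypothetical.

```python
import configparser
import os
import tempfile


def save_options(options, dir_path, file_name="macsyfinder.conf"):
    """Write a dict of options to an INI file in dir_path and return its path.

    Hypothetical helper; the section and file names are assumptions.
    """
    parser = configparser.ConfigParser()
    # configparser stores strings only; options left unset (None) are skipped
    parser["base"] = {key: str(value) for key, value in options.items() if value is not None}
    path = os.path.join(dir_path, file_name)
    with open(path, "w") as ini_file:
        parser.write(ini_file)
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_options({"db_type": "gembase", "worker_nb": 4, "log_file": None}, tmp)
    # read the file back to check the round trip
    check = configparser.ConfigParser()
    check.read(saved)
    reloaded = dict(check["base"])
```

Note that values come back as strings after the round trip, so a consumer of the saved file has to cast numeric options itself.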
-
sequence_db[source]
Returns: | the path to the input sequence dataset (in fasta format) |
Return type: | string |
-
topology_file[source]
Returns: | the path to the file of replicons topology. |
Return type: | string |
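The topology file maps replicon names to a topology, as in the “RepliconA linear” example given for the topology_file parameter. A reader for such a file might look like the sketch below. It assumes whitespace-separated columns and treats lines starting with # as comments; both are assumptions, and parse_topology is a hypothetical function, not part of macsypy.

```python
def parse_topology(lines):
    """Map replicon names to 'linear' or 'circular' from whitespace-separated lines.

    Hypothetical helper; the separator and comment syntax are assumptions.
    """
    topologies = {}
    for line_nb, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError("line {}: expected 'name topology'".format(line_nb))
        name, topology = fields[0], fields[1].lower()
        if topology not in ("linear", "circular"):
            raise ValueError("line {}: unknown topology '{}'".format(line_nb, topology))
        topologies[name] = topology
    return topologies


topologies = parse_topology(["RepliconA linear", "", "RepliconB circular"])
```

Any iterable of lines works as input, so an open file object can be passed directly.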
-
worker_nb[source]
Returns: | the maximum number of parallel jobs |
Return type: | int |
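For the worker_nb parameter, a value of 0 means that all cores are used. One way to turn the configured value into an actual process count is sketched below with multiprocessing.cpu_count() from the standard library. The effective_worker_nb function and the rejection of negative values are assumptions for the example, not documented MacSyFinder behavior.

```python
import multiprocessing


def effective_worker_nb(worker_nb):
    """Return the number of processes to start; 0 means one per available core.

    Hypothetical helper, not part of macsypy.
    """
    if worker_nb < 0:
        raise ValueError("worker_nb must be >= 0")
    if worker_nb == 0:
        return multiprocessing.cpu_count()
    return worker_nb
```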
-
working_dir[source]
Returns: | the path to the working directory to use for this run |
Return type: | string |