Stores a set of contiguous hits. A Cluster object can be in one of several states, depending on which systems the genes it contains belong to:
- ineligible: not a cluster to analyze
- clear: a single system is represented in the cluster
- ambiguous: several systems are represented in the cluster, so it might need disambiguation
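As a rough illustration of the three states, the sketch below classifies a cluster from the names of the systems its hits belong to. The function `cluster_state`, its arguments and the eligibility rule (at least one hit from a system being searched) are assumptions made for this example; they are not taken from the macsypy code.

```python
def cluster_state(hit_system_names, systems_to_detect):
    """Classify a cluster from the system names attached to its hits.

    Hypothetical helper: the rules below are assumptions for illustration.
    """
    wanted = set(systems_to_detect)
    # Keep only the systems that are actually searched in this run.
    represented = {name for name in hit_system_names if name in wanted}
    if not represented:
        return "ineligible"   # nothing to analyze
    if len(represented) == 1:
        return "clear"        # a single system is represented
    return "ambiguous"        # several systems: may need disambiguation
```

For instance, under these assumptions `cluster_state(["T2SS", "T4P"], ["T2SS", "T4P"])` would be classified as ambiguous.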
Parameters: | systems_to_detect (a list of macsypy.system.System) – the list of systems to be detected in this run |
---|
Returns: | the length of the Cluster, i.e., the number of hits stored in it |
---|---|
Return type: | integer |
Returns a printable representation of the hits stored in the Cluster: their components, with the corresponding sequence identifiers and positions.
Adds a Hit to a Cluster. Hits are always added at the end of the cluster (appended to the list of hits). Thus, the ‘begin’ and ‘end’ positions of the Cluster are always the positions of the first and last hits, respectively.
Parameters: | hit (a macsypy.report.Hit) – the Hit to add |
---|---|
Raise: | a macsypy.macsypy_error.SystemDetectionError |
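The append-only behaviour described above can be pictured with a minimal stand-in class. `SketchCluster` is hypothetical, hits are reduced to plain integer positions, and the condition that triggers `SystemDetectionError` in macsypy is not reproduced here because the text does not state it.

```python
class SketchCluster:
    """Hypothetical stand-in for a cluster; hits are plain integer positions."""

    def __init__(self):
        self.hits = []
        self.begin = 0
        self.end = 0

    def add(self, position):
        # Hits are always appended, never inserted in the middle.
        self.hits.append(position)
        # 'begin' and 'end' follow the first and last stored hits.
        self.begin = self.hits[0]
        self.end = self.hits[-1]

    def __len__(self):
        # The length of the cluster is the number of hits stored in it.
        return len(self.hits)


cluster = SketchCluster()
for position in (12, 13, 15):
    cluster.add(position)
```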
Returns: | the list of the names of compatible systems represented by the cluster |
---|---|
Return type: | string |
Returns: | the name of the putative system represented by the cluster |
---|---|
Return type: | string |
Checks the status of the cluster with regard to the systems that have hits in it. Updates the systems represented and assigns a putative system (self._putative_system), which is the system with the most hits in the cluster. The systems represented are stored in a dictionary in the self.systems variable. This function can be forced to run again, even if it has already run for the cluster, with the option force=True.
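The choice of the putative system amounts to a majority vote over the hits. The sketch below shows one way to do this with `collections.Counter`; the function name `summarize_systems` and the use of plain system names are assumptions, and how macsypy breaks ties is not stated in this section.

```python
from collections import Counter


def summarize_systems(hit_system_names):
    """Return (systems represented, putative system) for a list of hits.

    Hypothetical helper: each hit is reduced to the name of its system.
    """
    systems = Counter(hit_system_names)
    # The putative system is the one with the most hits (None if no hit).
    putative = systems.most_common(1)[0][0] if systems else None
    return dict(systems), putative
```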
Deals with sets of clusters found in a dataset. Designed to store only clusters from the same replicon.
Parameters: | cfg (macsypy.config.Config) – The configuration object built from default and user parameters. |
---|
This function takes into account the circularity of the replicon by merging clusters when appropriate (typically at the replicon’s ends). It must be called only if the replicon_topology is set to “circular”.
Parameters: |
|
---|
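To make the idea of merging across the origin concrete, here is a small sketch. It assumes 1-based gene positions, clusters given as sorted lists of positions, and a merge criterion based on the number of genes separating the last cluster from the first one; the names `circularize_sketch`, `replicon_length` and `max_gap` are hypothetical, and the actual criterion used by macsypy may differ.

```python
def circularize_sketch(clusters, replicon_length, max_gap):
    """Merge the last and first clusters if they are close across the origin.

    clusters: list of sorted position lists, ordered along the replicon.
    """
    if len(clusters) < 2:
        return clusters
    first, last = clusters[0], clusters[-1]
    # Genes between the last hit and the end, plus genes before the first hit.
    gap = (replicon_length - last[-1]) + (first[0] - 1)
    if gap <= max_gap:
        # The merged cluster starts before the origin and ends after it.
        return [last + first] + clusters[1:-1]
    return clusters
```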
Creates and stores the names of detected systems. Ensures the uniqueness of the names.
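One simple way to guarantee unique names is to append an index and increase it until the name is unused. The class below is a sketch of that idea only: `NameGeneratorSketch`, its `get_name` method and the `replicon_system_index` name format are assumptions, not the macsypy implementation.

```python
class NameGeneratorSketch:
    """Hypothetical generator that never returns the same name twice."""

    def __init__(self):
        self._seen = set()

    def get_name(self, replicon_name, system_name):
        index = 1
        name = f"{replicon_name}_{system_name}_{index}"
        # Increase the index until the candidate name has not been used yet.
        while name in self._seen:
            index += 1
            name = f"{replicon_name}_{system_name}_{index}"
        self._seen.add(name)
        return name
```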
This class is instantiated for a specific system that has been asked for detection. It can be filled step by step with hits. A decision can then be made according to the parameters defined, e.g. the quorum of genes.
Parameters: | system (macsypy.system.System) – the system to “fill” with hits. |
---|
Returns: | Information on the component content of the SystemOccurence. |
---|---|
Return type: | string |
Parameters: | gene_dict (dict) – a dictionary with gene’s names as keys and number of occurrences as values |
---|---|
Returns: | the list of genes with no occurrence in the gene counter. |
Return type: | list |
Returns the length of the system, all loci gathered, in terms of protein number (even those not matching any system gene)
Parameters: | rep_info (a namedTuple “RepliconInfo” macsypy.database.RepliconInfo) – an entry extracted from the macsypy.database.RepliconDB |
---|---|
Return type: | integer |
Counts the number of genes with at least one occurrence in a dictionary with a counter of genes.
Parameters: | gene_dict (dict) – a dictionary with gene’s names as keys and number of occurrences as values |
---|---|
Return type: | integer |
Counts the number of matches in a dictionary with a counter of genes, independently of the number of genes matched.
Parameters: | gene_dict (dict) – a dictionary with gene’s names as keys and number of occurrences as values |
---|---|
Return type: | integer |
Counts the number of genes with no occurrence in the gene counter.
Parameters: | gene_dict (dict) – a dictionary with gene’s names as keys and number of occurrences as values |
---|---|
Return type: | integer |
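The counting helpers described above all operate on the same kind of dictionary (gene names as keys, numbers of occurrences as values). The sketch below shows what each one computes; the function names are hypothetical and chosen for readability, not copied from macsypy.

```python
def count_found_genes(gene_dict):
    """Number of genes with at least one occurrence."""
    return sum(1 for n in gene_dict.values() if n > 0)


def count_matches(gene_dict):
    """Total number of matches, whatever the number of genes matched."""
    return sum(gene_dict.values())


def count_missing_genes(gene_dict):
    """Number of genes with no occurrence."""
    return sum(1 for n in gene_dict.values() if n == 0)


def list_missing_genes(gene_dict):
    """Names of the genes with no occurrence."""
    return [gene for gene, n in gene_dict.items() if n == 0]


# Example counter: gspD matched twice, gspE once, gspF never.
example = {"gspD": 2, "gspE": 1, "gspF": 0}
```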
When a decision is made, the status (self.status) of the macsypy.search_systems.SystemOccurence is set either to:
- “single_locus” when a complete system in the form of a single cluster was found
- “multi_loci” when a complete system in the form of several clusters was found
- “uncomplete” when no system was assessed (quorum not reached)
- “empty” when no gene for this system was found
- “exclude” when no system was assessed (at least one forbidden gene was found)
Returns: | a printable message of the output decision with this SystemOccurrence |
---|---|
Return type: | string |
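The status values listed above can be read as the outcome of a short decision procedure. The sketch below is one plausible ordering of the checks, not the macsypy algorithm: the function `decide`, its arguments, the two quorum thresholds and the precedence given to forbidden genes are all assumptions made for illustration.

```python
def count_found(gene_dict):
    """Number of genes seen at least once in a gene counter."""
    return sum(1 for n in gene_dict.values() if n > 0)


def decide(mandatory, accessory, forbidden, nb_loci,
           min_mandatory_genes_required, min_genes_required):
    """Return a status string from gene counters (hypothetical rule order)."""
    nb_mandatory = count_found(mandatory)
    nb_accessory = count_found(accessory)
    nb_forbidden = count_found(forbidden)
    if nb_mandatory + nb_accessory + nb_forbidden == 0:
        return "empty"        # no gene for this system was found
    if nb_forbidden > 0:
        return "exclude"      # at least one forbidden gene was found
    if (nb_mandatory < min_mandatory_genes_required
            or nb_mandatory + nb_accessory < min_genes_required):
        return "uncomplete"   # quorum not reached
    return "single_locus" if nb_loci == 1 else "multi_loci"


def is_complete(status):
    """A system occurrence is complete in the two 'found' states only."""
    return status in ("single_locus", "multi_loci")
```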
Adds hits from a cluster to a system occurrence and checks their status according to the system definition. The state of the system occurrence is set to “no_decision” after this function is called.
Parameters: | cluster (macsypy.search_systems.Cluster) – the set of contiguous genes to treat for macsypy.search_systems.SystemOccurence inclusion. |
---|
Adds hits to a system occurrence and checks their status according to the system definition. The state of the system occurrence is set to “no_decision” after this function is called.
Note
Forbidden genes will only be included if they do belong to the current system (and not to another specified with “system_ref” in the current system’s definition).
Parameters: | hits – a list of Hits to treat for macsypy.search_systems.SystemOccurence inclusion. |
---|
This function fills the SystemOccurrence with genes putatively coming from other systems (feature “multi_system”). Those genes are used only if the occurrence of the corresponding gene was not yet filled with a gene from a cluster of the system.
Parameters: | multi_systems_hits – a list of hits of genes that are “multi_system” which correspond to mandatory or accessory genes from the current system for which to fill a SystemOccurrence |
---|
Parameters: | forbid_exclude (boolean) – exclude the forbidden components if set to True. False by default. |
---|---|
Returns: | A dictionary ready for printing in the system summary, with the occurrences of genes (mandatory, accessory and, if specified, forbidden) in the system occurrence. |
Parameters: | gene (macsypy.gene.Gene, or macsypy.gene.Homolog or macsypy.gene.Analog object) – the gene whose gene reference is requested |
---|---|
Returns: | object macsypy.gene.Gene or None |
Return type: | macsypy.gene.Gene object or None |
Raise: | KeyError if the system does not contain the requested gene.
Gives a summary of the system occurrence in terms of gene content and localization.
Parameters: |
|
---|---|
Returns: | a tabulated summary of the macsypy.search_systems.SystemOccurence |
Return type: | string |
Returns a string with the description of the summary returned by self.get_summary()
Return type: | string |
---|
Gives a summary of the system occurrence in terms of gene content only (specific to “unordered” datasets).
Parameters: | replicon_name (string) – the name of the replicon |
---|---|
Returns: | a tabulated summary of the macsypy.search_systems.SystemOccurence |
Return type: | string |
Assigns a name to the system occurrence for an “unordered” dataset, generating a generic name based on the system name and the given suffix.
Parameters: | suffix (string) – the suffix to be used for generating the systemOccurrence’s name |
---|---|
Returns: | the name assigned to the macsypy.search_systems.SystemOccurence for a system in an “unordered” dataset |
Return type: | string |
Assigns a unique name to the system occurrence using the class macsypy.search_systems.SystemNameGenerator. Generates the name if it is not already set.
Parameters: | replicon_name (string) – the name of the replicon |
---|---|
Returns: | the unique name of the macsypy.search_systems.SystemOccurence |
Return type: | string |
Test for SystemOccurrence completeness.
Returns: | True if the state of the SystemOccurrence is “single_locus” or “multi_loci”, False otherwise. |
---|---|
Return type: | boolean |
Analyzes sets of contiguous hits (clusters) stored in a ClustersHandler for system detection:
Only for “ordered” datasets representing a whole replicon. Reports system occurrences.
Parameters: |
|
---|---|
Returns: | a set of system occurrences filled with hits found in clusters |
Return type: | a list of macsypy.search_systems.SystemOccurence |
Gets sets of contiguous hits according to the minimal inter_gene_max_space between two genes. Only for “ordered” datasets.
Parameters: |
|
---|---|
Returns: | a set of clusters and a dictionary with “multi_system” genes stored in a system-wise way for further utilization. |
Return type: |
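Grouping hits into clusters by a maximal spacing can be sketched in a few lines. The example assumes hits are reduced to integer gene positions and that the spacing between two hits is the number of genes lying between them; `build_clusters_sketch` is a hypothetical name, and the handling of “multi_system” genes or of isolated hits in macsypy is not reproduced.

```python
def build_clusters_sketch(positions, inter_gene_max_space):
    """Group gene positions into clusters of neighbouring hits."""
    clusters = []
    for pos in sorted(positions):
        # Number of genes lying between this hit and the previous one.
        if clusters and pos - clusters[-1][-1] - 1 <= inter_gene_max_space:
            clusters[-1].append(pos)   # close enough: extend the cluster
        else:
            clusters.append([pos])     # too far: start a new cluster
    return clusters
```

With a maximal spacing of 2, positions 1, 2 and 4 would fall in the same cluster, while a hit at position 20 would start a new one.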
This disambiguation step is used on clusters with hits for multiple systems (when cluster.state is set to “ambiguous”). It returns a “cleansed” list of clusters, ready to use for system occurrence detection (and that are “clear” cases). It:
Parameters: | cluster (macsypy.search_systems.Cluster) – the cluster to “disambiguate” |
---|
Returns a list of the best-matching hits from a putatively redundant list of hits. Analyzes quorum and co-localization if required for system detection. By default, hits are already sorted by position, and the hit with the best score is kept, then the best i-evalue. Possible criteria are:
Parameters: |
|
---|---|
Returns: | the list of best matching hits |
Return type: | list of macsypy.report.Hit |
Raise: |
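The default selection described above (best score first, then best i-evalue) can be sketched as follows. This example assumes that redundant hits are hits sharing the same position, that a higher score is better and that a lower i-evalue is better; `HitSketch` and `best_hits_sketch` are hypothetical stand-ins, not the macsypy.report.Hit class or the actual function.

```python
from collections import namedtuple

# Hypothetical, reduced view of a hit.
HitSketch = namedtuple("HitSketch", ["gene", "position", "score", "i_eval"])


def best_hits_sketch(hits):
    """Keep one hit per position: highest score, then lowest i-evalue."""
    best = {}
    for hit in hits:
        kept = best.get(hit.position)
        # Prefer the higher score; on equal scores, prefer the lower i-evalue.
        if kept is None or (-hit.score, hit.i_eval) < (-kept.score, kept.i_eval):
            best[hit.position] = hit
    # Return the selected hits sorted by position.
    return [best[pos] for pos in sorted(best)]
```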
Returns the intersection of the two input systems lists.
Parameters: | systems_list1, systems_list2 – two lists of systems |
---|---|
Returns: | a list of systems, or an empty list if no common system |
Return type: | a list of macsypy.system.System |
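A minimal sketch of such an intersection is given below. It assumes the systems are hashable (plain names are used here instead of macsypy.system.System objects) and keeps the order of the first list; `intersect_sketch` is a hypothetical name, and the ordering of the result in macsypy is not stated in this section.

```python
def intersect_sketch(systems_list1, systems_list2):
    """Return the systems present in both lists, in the order of the first."""
    in_second = set(systems_list2)
    return [system for system in systems_list1 if system in in_second]
```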
Runs search of systems from a set of hits. Criteria for system assessment will depend on the kind of input dataset provided:
- analyze quorum and co-localization for “ordered_replicon” and “gembase” datasets.
- analyze quorum only (and in a limited way) for “unordered_replicon” and “unordered” datasets.
Parameters: |
|
---|
Parameters: |
|
---|
Parameters: |
|
---|
Builds a counter of systems per replicon, with different “states” separated (single-locus vs multi-loci systems)
Returns: | the counter of systems |
---|---|
Return type: | Counter |
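As an illustration, a counter that keeps single-locus and multi-loci systems apart could be built as below. The input format (pairs of system name and status) and the use of `(system name, status)` tuples as keys are assumptions for this sketch; the key format used by macsypy is not given here.

```python
from collections import Counter


def count_systems_sketch(occurrences):
    """Count complete systems, keeping the two 'found' states separate.

    occurrences: iterable of (system_name, status) pairs (hypothetical format).
    """
    counter = Counter()
    for system_name, status in occurrences:
        # Only complete systems are counted.
        if status in ("single_locus", "multi_loci"):
            counter[(system_name, status)] += 1
    return counter
```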
Writes a report of the sequences forming the detected systems, with information on their status in the system, their localization on replicons, and statistics on the Hits.
Parameters: |
|
---|
Writes a report with the summary of systems detected in replicons. For each system, a summary is produced, including:
- the number of mandatory/accessory genes in the reference system (as defined in XML files)
- the number of mandatory/accessory genes detected
- the number and list of missing genes
- the number of loci encoding the system
Parameters: |
|
---|
Generates the report in JSON format.
Parameters: |
|
---|
Writes a tabulated output with the number of detected systems for each replicon.
Parameters: |
|
---|---|
Return type: | string |
Mandatory and accessory genes only are reported in the “json” and “report” output, but all hits matching a system component are reported in the “summary”.
Parameters: | systems_occurences_list (list of macsypy.search_systems.SystemOccurence) – the list of system’s occurrences to consider |
---|
Generates the report in JSON format.
Parameters: | path (string) – the path to a file where to write the report in json format |
---|
Writes a report of the sequences forming the detected systems, with information on their status in the system, their localization on replicons, and statistics on the Hits.
Parameters: |
|
---|
Writes a report with the summary for putative systems in an unordered dataset. For each system, a summary is produced, including:
- the number of mandatory/accessory genes in the reference system (as defined in XML files)
- the number of mandatory/accessory genes detected
Parameters: |
|
---|
Encapsulates a macsypy.report.Hit. This class stores a Hit that has been attributed to a detected system. Thus, it also stores:
It also aims at storing information for results extraction:
Parameters: |
|
---|