ngesh package

Submodules

ngesh.output module

Module with auxiliary functions for output generation.

ngesh.output.tree2nexus(tree: ete3.coretype.tree.TreeNode) → str

Returns a string with the representation of a tree in NEXUS format.

Parameters: tree – The ete3 tree whose NEXUS representation will be returned.
Returns: A string with the full representation of the tree in NEXUS format.

ngesh.output.tree2wordlist(tree: ete3.coretype.tree.TreeNode) → str

Returns a string with the representation of a tree in wordlist format.

Parameters: tree – The ete3 tree whose wordlist representation, in CSV format, will be returned.
Returns: A string with the full representation of the tree in CSV format.
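As a rough illustration of the kind of text a NEXUS export contains, the sketch below assembles a minimal NEXUS data block from a plain dictionary of taxa and binary character states. It does not call ngesh: the function name build_nexus, the input dictionary, and the exact layout of the block are assumptions for illustration, and the string returned by tree2nexus may be organised differently.

```python
def build_nexus(matrix):
    """Return a minimal NEXUS DATA block for a taxon -> state-string mapping."""
    taxa = sorted(matrix)
    nchar = len(matrix[taxa[0]])
    if any(len(matrix[t]) != nchar for t in taxa):
        raise ValueError("all taxa must have the same number of characters")
    width = max(len(t) for t in taxa)
    lines = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(taxa)} NCHAR={nchar};",
        '  FORMAT DATATYPE=STANDARD MISSING=? GAP=- SYMBOLS="01";',
        "  MATRIX",
    ]
    # One row per taxon: padded name followed by its character states.
    lines += [f"    {t.ljust(width)}  {matrix[t]}" for t in taxa]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Hypothetical taxa and states, for illustration only.
example = build_nexus({"L01": "0110", "L02": "0101", "L03": "1001"})
```

Printing example gives a block that most NEXUS-aware tools should be able to read, which is a convenient way to check what a downstream program expects before exporting real simulated data.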

ngesh.random_tree module

Random Phylogenetic Tree Generator.

This script provides functions to generate random phylogenetic trees under a Yule (birth-only) or birth-death model, with configurable generation parameters and limits on the number of leaves and/or the evolution time.

ngesh.random_tree.add_characters(tree: ete3.coretype.tree.TreeNode, num_characters: int, k: float, th: float, mut_exp: float = 1.0, k_hgt: Optional[float] = None, th_hgt: Optional[float] = None, seed: Optional[Hashable] = None) → ete3.coretype.tree.TreeNode

Add random characters to the nodes of a tree.

Characters are added according to parameters of gamma distributions which are related to the length of each branch. The two possible events are mutation (assumed to be always to a new character, i.e., no parallel evolution) and horizontal gene transfer. No perturbation, such as the simulation of errors in sequencing/data collection, is performed by this function.

Parameters:
  • tree – The ete3 tree to which characters will be added. Any previous characters will be overridden.
  • num_characters – The number of characters for each taxon.
  • k – The k parameter for the gamma distribution related to mutation events.
  • th – The theta parameter for the gamma distribution related to mutation events.
  • k_hgt – The k parameter for the gamma distribution related to horizontal gene transfer events. Defaults to None (if HGT should be modelled but the user is unsure about an appropriate value for k_hgt, it is suggested to set it to 1.5 times k).
  • th_hgt – The theta parameter for the gamma distribution related to horizontal gene transfer events. Defaults to None (if HGT should be modelled but the user is unsure about an appropriate value for th_hgt, it is suggested to set it to the same value as th).
  • mut_exp – The exponent for correction of mutation probability of each character. Defaults to 1.0 (no correction).
  • seed – An optional seed for the random number generator. Defaults to None.
Returns:

The provided tree, with random characters added.
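The gamma parameters can be hard to choose without a feel for the distribution. The sketch below uses only the standard library to draw from a gamma distribution with shape k and scale th, and applies the suggestion from the parameter list (k_hgt as 1.5 times k, th_hgt equal to th). The helper names and the values k=4.0 and th=1.0 are hypothetical, and the way ngesh relates the sampled values to branch lengths is not reproduced here.

```python
import random


def suggested_hgt_params(k, th):
    """Return (k_hgt, th_hgt) following the documented suggestion."""
    return 1.5 * k, th


def sample_gamma(k, th, n, seed=None):
    """Draw n values from Gamma(shape=k, scale=th) with a local generator."""
    rng = random.Random(seed)
    return [rng.gammavariate(k, th) for _ in range(n)]


k, th = 4.0, 1.0  # hypothetical values, for illustration only
k_hgt, th_hgt = suggested_hgt_params(k, th)

mutation_draws = sample_gamma(k, th, 10_000, seed=42)
hgt_draws = sample_gamma(k_hgt, th_hgt, 10_000, seed=42)

# The mean of a gamma distribution is k * th, so the sample means
# should be close to 4.0 and 6.0 respectively.
mean_mut = sum(mutation_draws) / len(mutation_draws)
mean_hgt = sum(hgt_draws) / len(hgt_draws)
```

Comparing the two sample means shows the effect of the suggested setting: with th unchanged, raising k by half raises the expected value by half as well.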

ngesh.random_tree.gen_tree(birth: float = 1.0, death: float = 0.5, method: str = 'standard', min_leaves: Optional[int] = None, num_leaves: Optional[int] = None, max_time: Optional[float] = None, lam: float = 0.0, prune: bool = False, labels: Optional[str] = 'enum', seed: Optional[Hashable] = None) → ete3.coretype.tree.TreeNode

Return a random phylogenetic tree.

At least one stopping criterion must be provided; the tree is returned as soon as that criterion, or any one of the given criteria, is met.

This function wraps the internal __gen_tree() function, which cannot guarantee that a valid tree will be generated given the user parameters and the random sampling. It retries until a valid tree (reproducible, given a seed) is produced, up to an internal maximum number of attempts.

Parameters:
  • birth – The birth rate (lambda) for the generated tree. Defaults to 1.0.
  • death – The death rate (mu) for the generated tree. Must be explicitly set to zero for Yule model (i.e., birth only). Defaults to 0.5.
  • method – The generation method to use. Available methods are “standard” and “fast” (contributed by Nicola de Maio). Defaults to “standard”.
  • min_leaves – A stopping criterion with the minimum number of extant leaves. The generated tree will have at least the number of requested extant leaves (possibly more, as the last speciation event might produce more leaves than the minimum specified). Defaults to None.
  • num_leaves – A stopping criterion with the number of leaves in the tree, including non-extant ones (note that this differs from min_leaves). The generated tree will have exactly the number of requested leaves, performing pruning if necessary. Note that, if combined with prune, this option might result in trees with fewer nodes than specified. Defaults to None.
  • max_time – A stopping criterion with the maximum allowed time for evolution. Defaults to None.
  • lam – The expected value of the Poisson distribution sampled during speciation to set the number of descendants, with a minimum of two. Should be used if more than two descendants are to be allowed. Defaults to zero, meaning that all speciation events will have two and only two descendants.
  • prune – A flag indicating whether any non-extant leaves should be pruned from the tree before it is returned. Defaults to False.
  • labels – The model to be used for generating random labels, either “enum” (for enumerated labels), “human” (for random single names), “bio” (for random biological names), or None. Defaults to “enum”.
  • seed – An optional seed for the random number generator. Defaults to None.
Returns:

The tree randomly generated according to the parameters.
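The interplay between the rates, the stopping criteria, and the retry loop described above can be sketched with the standard library alone. The example below is not the ngesh implementation: it tracks only the number of extant lineages rather than a tree topology, and the function name and the max_attempts parameter are hypothetical stand-ins for the internal attempt limit.

```python
import random


def simulate_extant_count(birth=1.0, death=0.5, min_leaves=None,
                          max_time=None, seed=None, max_attempts=100):
    """Simulate lineage counts under a birth-death process.

    Returns (extant, elapsed_time) for the first attempt that does not
    go extinct, or raises RuntimeError after max_attempts failures.
    """
    if min_leaves is None and max_time is None:
        raise ValueError("at least one stopping criterion is required")
    rng = random.Random(seed)
    for _ in range(max_attempts):
        extant, elapsed = 1, 0.0
        while extant > 0:
            # Waiting time to the next event over all extant lineages.
            wait = rng.expovariate(extant * (birth + death))
            if max_time is not None and elapsed + wait > max_time:
                elapsed = max_time
                break
            elapsed += wait
            # Pick the event type in proportion to the two rates.
            if rng.random() < birth / (birth + death):
                extant += 1
            else:
                extant -= 1
            if min_leaves is not None and extant >= min_leaves:
                break
        if extant > 0:
            return extant, elapsed
    raise RuntimeError("no valid tree within the attempt limit")


result = simulate_extant_count(min_leaves=10, seed="demo")
repeat = simulate_extant_count(min_leaves=10, seed="demo")
yule = simulate_extant_count(death=0.0, min_leaves=5, seed=1)
```

Because a lineage can die before the stopping criterion is reached, a single run may end with no extant leaves at all; the outer loop simply tries again with the same generator, which keeps the result reproducible for a given seed. Setting death to zero removes that failure mode, which is the Yule case.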

ngesh.random_tree.label_tree(tree: ete3.coretype.tree.TreeNode, model: str = 'enum', seed: Optional[Hashable] = None)

Labels the nodes of a tree according to a model.

Linguistic labels are unique names generated in a way intended to be readable.

Please note that the tree object is changed in place (no return).

Parameters:
  • tree – The tree whose nodes will be labeled in place.
  • model – A string indicating which model for label generation should be used. Possible values are “enum” (for enumerated labels), “human” (for random single names), and “bio” (for random biological names).
  • seed – An optional seed for the random number generator, only used in case of linguistic and biological labels. Defaults to None.

ngesh.random_tree.simulate_bad_sampling(tree: ete3.coretype.tree.TreeNode, cutoff: Optional[float] = None, seed: Optional[Hashable] = None)

Modify a tree in place simulating bad sampling.

Bad sampling is currently simulated with a uniform distribution, i.e., all existing leaves have the same probability of being removed. Note that if a full simulation of tree topology and characters is performed, this task must be carried out after the simulation of character evolution, as otherwise the characters would fit the sampled tree and not the original one.

Because the bad sampling simulation is itself based on random numbers, it is possible, though unlikely, that no leaf is actually removed.

Parameters:
  • tree – ETE3 Tree object for bad sampling simulation.
  • cutoff – The approximate percentage of extant leaves to remove from the tree before returning, simulating uniform bad sampling. As this is performed randomly, there is no guarantee that any leaf will actually be removed. Defaults to None (no bad sampling simulation).
  • seed – An optional seed for the random number generator. Defaults to None.
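The uniform removal described above can be illustrated on a plain list of leaf names. This is a minimal sketch, not the ngesh code: it assumes cutoff is a fraction between 0 and 1 applied to each leaf independently, and the function name drop_leaves and the leaf labels are hypothetical.

```python
import random


def drop_leaves(leaves, cutoff=None, seed=None):
    """Remove each leaf independently with probability `cutoff`, in place."""
    if cutoff is None:
        return leaves  # no bad sampling requested
    rng = random.Random(seed)
    # Every leaf faces the same removal probability (uniform sampling).
    leaves[:] = [leaf for leaf in leaves if rng.random() >= cutoff]
    return leaves


taxa = [f"L{i:02d}" for i in range(1, 21)]
untouched = drop_leaves(list(taxa), cutoff=None)
sampled = drop_leaves(list(taxa), cutoff=0.25, seed=7)
emptied = drop_leaves(list(taxa), cutoff=1.0, seed=7)
```

Since each leaf is tested independently, the number of removed leaves varies from run to run around the requested proportion, and a small tree with a low cutoff can come back unchanged.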

Module contents

ngesh __init__.py

ngesh.add_characters(tree: ete3.coretype.tree.TreeNode, num_characters: int, k: float, th: float, mut_exp: float = 1.0, k_hgt: Optional[float] = None, th_hgt: Optional[float] = None, seed: Optional[Hashable] = None) → ete3.coretype.tree.TreeNode

Add random characters to the nodes of a tree.

Characters are added according to parameters of gamma distributions which are related to the length of each branch. The two possible events are mutation (assumed to be always to a new character, i.e., no parallel evolution) and horizontal gene transfer. No perturbation, such as the simulation of errors in sequencing/data collection, is performed by this function.

Parameters:
  • tree – The ete3 tree to which characters will be added. Any previous characters will be overridden.
  • num_characters – The number of characters for each taxon.
  • k – The k parameter for the gamma distribution related to mutation events.
  • th – The theta parameter for the gamma distribution related to mutation events.
  • k_hgt – The k parameter for the gamma distribution related to horizontal gene transfer events. Defaults to None (if HGT should be modelled but the user is unsure about an appropriate value for k_hgt, it is suggested to set it to 1.5 times k).
  • th_hgt – The theta parameter for the gamma distribution related to horizontal gene transfer events. Defaults to None (if HGT should be modelled but the user is unsure about an appropriate value for th_hgt, it is suggested to set it to the same value as th).
  • mut_exp – The exponent for correction of mutation probability of each character. Defaults to 1.0 (no correction).
  • seed – An optional seed for the random number generator. Defaults to None.
Returns:

The provided tree, with random characters added.

ngesh.gen_tree(birth: float = 1.0, death: float = 0.5, method: str = 'standard', min_leaves: Optional[int] = None, num_leaves: Optional[int] = None, max_time: Optional[float] = None, lam: float = 0.0, prune: bool = False, labels: Optional[str] = 'enum', seed: Optional[Hashable] = None) → ete3.coretype.tree.TreeNode

Return a random phylogenetic tree.

At least one stopping criterion must be provided; the tree is returned as soon as that criterion, or any one of the given criteria, is met.

This function wraps the internal __gen_tree() function, which cannot guarantee that a valid tree will be generated given the user parameters and the random sampling. It retries until a valid tree (reproducible, given a seed) is produced, up to an internal maximum number of attempts.

Parameters:
  • birth – The birth rate (lambda) for the generated tree. Defaults to 1.0.
  • death – The death rate (mu) for the generated tree. Must be explicitly set to zero for Yule model (i.e., birth only). Defaults to 0.5.
  • method – The generation method to use. Available methods are “standard” and “fast” (contributed by Nicola de Maio). Defaults to “standard”.
  • min_leaves – A stopping criterion with the minimum number of extant leaves. The generated tree will have at least the number of requested extant leaves (possibly more, as the last speciation event might produce more leaves than the minimum specified). Defaults to None.
  • num_leaves – A stopping criterion with the number of leaves in the tree, including non-extant ones (note that this differs from min_leaves). The generated tree will have exactly the number of requested leaves, performing pruning if necessary. Note that, if combined with prune, this option might result in trees with fewer nodes than specified. Defaults to None.
  • max_time – A stopping criterion with the maximum allowed time for evolution. Defaults to None.
  • lam – The expected value of the Poisson distribution sampled during speciation to set the number of descendants, with a minimum of two. Should be used if more than two descendants are to be allowed. Defaults to zero, meaning that all speciation events will have two and only two descendants.
  • prune – A flag indicating whether any non-extant leaves should be pruned from the tree before it is returned. Defaults to False.
  • labels – The model to be used for generating random labels, either “enum” (for enumerated labels), “human” (for random single names), “bio” (for random biological names), or None. Defaults to “enum”.
  • seed – An optional seed for the random number generator. Defaults to None.
Returns:

The tree randomly generated according to the parameters.

ngesh.set_seeds(seed: Optional[Hashable] = None)

Set seeds globally from the user provided one.

The function takes care of reproducibility and allows strings and floats to be used as seeds for NumPy as well.
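Turning an arbitrary string or float into a usable seed takes a little care: Python's built-in hash() of a string is salted per process by default, and NumPy's legacy seeding expects an integer between 0 and 2**32 - 1. The sketch below shows one way to derive a stable 32-bit integer from such a value. It is an assumption about how this could be done, not a description of what set_seeds does internally, and both helper names are hypothetical.

```python
import hashlib
import random


def seed_to_int(seed):
    """Map a string, float, or int seed to a stable 32-bit integer."""
    digest = hashlib.sha256(repr(seed).encode("utf-8")).digest()
    # Keep four bytes so the value fits a 32-bit seed (0 .. 2**32 - 1).
    return int.from_bytes(digest[:4], "big")


def make_rng(seed=None):
    """Return a random.Random seeded reproducibly, or unseeded for None."""
    if seed is None:
        return random.Random()
    return random.Random(seed_to_int(seed))


a = make_rng("my-seed")
b = make_rng("my-seed")
draws_a = [a.random() for _ in range(3)]
draws_b = [b.random() for _ in range(3)]
```

The two generators produce identical draws in any process, and the same integer could be handed to a NumPy generator if one is in use.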

ngesh.show_random_tree()

Shows a random tree using ETE3.

This function is intended for a quick demonstration of the library.

ngesh.simulate_bad_sampling(tree: ete3.coretype.tree.TreeNode, cutoff: Optional[float] = None, seed: Optional[Hashable] = None)

Modify a tree in place simulating bad sampling.

Bad sampling is currently simulated with a uniform distribution, i.e., all existing leaves have the same probability of being removed. Note that if a full simulation of tree topology and characters is performed, this task must be carried out after the simulation of character evolution, as otherwise the characters would fit the sampled tree and not the original one.

Because the bad sampling simulation is itself based on random numbers, it is possible, though unlikely, that no leaf is actually removed.

Parameters:
  • tree – ETE3 Tree object for bad sampling simulation.
  • cutoff – The approximate percentage of extant leaves to remove from the tree before returning, simulating uniform bad sampling. As this is performed randomly, there is no guarantee that any leaf will actually be removed. Defaults to None (no bad sampling simulation).
  • seed – An optional seed for the random number generator. Defaults to None.

ngesh.sorted_newick(newick)

Build a sorted representation of a Newick string.

An internal function parses a Newick string by identifying tokens with a regular expression, which might fail for complex trees such as those carrying information other than branch length and node name.
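A regular-expression tokenizer of the kind mentioned above can be sketched as follows. This is an independent illustration, not the ngesh code: the name sort_newick and the token pattern are assumptions, children are ordered by plain string comparison, and quoted labels or bracketed comments are not handled.

```python
import re

# A token is either a single structural character or a run of anything else
# (a node name, optionally followed by ":" and a branch length).
TOKEN = re.compile(r"[(),;]|[^(),;]+")
DELIMITERS = {"(", ")", ",", ";"}


def sort_newick(newick):
    """Return the Newick string with the children of every node sorted."""
    tokens = TOKEN.findall(newick.strip())
    pos = 0

    def parse():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children.append(parse())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(parse())
            pos += 1  # consume ")"
        label = ""
        if pos < len(tokens) and tokens[pos] not in DELIMITERS:
            label = tokens[pos]
            pos += 1
        if children:
            return "(" + ",".join(sorted(children)) + ")" + label
        return label

    return parse() + ";"


flat = sort_newick("((B,A),(D,C));")
mixed = sort_newick("(C,(B:1,A:2)X:1);")
```

Two trees that differ only in the order of their children yield the same sorted string, which makes the result convenient for equality checks in tests.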

Parameters: newick – The Newick tree to be sorted.
Returns: A corresponding but sorted Newick tree.

ngesh.tree2nexus(tree: ete3.coretype.tree.TreeNode) → str

Returns a string with the representation of a tree in NEXUS format.

Parameters: tree – The ete3 tree whose NEXUS representation will be returned.
Returns: A string with the full representation of the tree in NEXUS format.

ngesh.tree2wordlist(tree: ete3.coretype.tree.TreeNode) → str

Returns a string with the representation of a tree in wordlist format.

Parameters: tree – The ete3 tree whose wordlist representation, in CSV format, will be returned.
Returns: A string with the full representation of the tree in CSV format.