Beginners’ guide

The ngesh library is designed to generate simulated data in a reproducible manner: the pseudo-random generator seeds are used not only in a way that guarantees reproducibility (that is, the same seed returns the same tree) but also incrementally, so that similar trees are returned when different parameters are used with the same seed.
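
The two properties described above can be illustrated with the standard library alone. The following is a minimal sketch, not ngesh's actual implementation: the helper draw_events() is hypothetical, and it only assumes that a generator is seeded from the string form of the seed and that events are drawn in sequence.

```python
import random


def draw_events(seed, n_events):
    """Return n_events pseudo-random draws from a generator seeded with str(seed).

    Hypothetical helper for illustration; it is not part of ngesh.
    """
    rng = random.Random(str(seed))  # string seeds are hashed deterministically
    return [rng.random() for _ in range(n_events)]


short_run = draw_events("ngesh", 5)
long_run = draw_events("ngesh", 6)

# Same seed and same parameters: identical results.
print(short_run == draw_events("ngesh", 5))
# Same seed, one more event requested: the earlier draws reappear as a prefix.
print(long_run[:5] == short_run)
```

Because the longer run reuses every draw of the shorter one, a simulation built on such a sequence changes only where the additional events take effect.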

A pure random tree can be generated with a call to the ngesh.gen_tree() function. Either a minimum number of leaves (min_leaves) or maximum time (max_time) must be specified as a stopping criterion.

>>> import ngesh
>>> tree1 = ngesh.gen_tree(min_leaves=5)
>>> print(tree1)

      /-L1
   /-|
  |   \-L2
--|
  |   /-L3
   \-|
     |   /-L4
      \-|
         \-L5

>>> tree2 = ngesh.gen_tree(max_time=1.0)
>>> print(tree2)

      /-L1
   /-|
--|   \-L2
  |
   \-L3

The main parameters are the birth and death rates, which default to 1.0 and 0.5 respectively and are the first and second positional parameters. A Yule (pure-birth) tree can be simulated by setting the death rate to 0.0 (e.g., ngesh.gen_tree(1.0, 0.0, min_leaves=5)).
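
To show how the two rates interact, the sketch below simulates only the number of living lineages in a birth-death process, using the standard library. It is an illustration under stated assumptions, not the algorithm used by ngesh: the function simulate_lineage_counts() and its arguments are hypothetical, and no tree topology is built.

```python
import random


def simulate_lineage_counts(birth=1.0, death=0.5, min_leaves=5, seed=None):
    """Track (time, living lineages) until min_leaves is reached or all die out.

    Hypothetical helper for illustration; it is not part of ngesh.
    """
    rng = random.Random(seed)
    alive, time = 1, 0.0
    history = [(time, alive)]
    while 0 < alive < min_leaves:
        # The waiting time to the next event is exponential, with a rate
        # proportional to the number of living lineages.
        time += rng.expovariate(alive * (birth + death))
        # The event is a birth with probability birth / (birth + death).
        if rng.random() < birth / (birth + death):
            alive += 1
        else:
            alive -= 1
        history.append((time, alive))
    return history


# With a death rate of 0.0 (a Yule process) the count can only grow.
yule = simulate_lineage_counts(1.0, 0.0, min_leaves=5, seed="ngesh")
print([count for _, count in yule])
```

With a non-zero death rate the process may go extinct before the requested number of leaves is reached, which is why the loop also stops when no lineage is left.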

For reproducibility, the main parameter is the seed, which takes any Python object that can be represented as a string. That the generated trees are equivalent can be verified by generating trees from the same seed with different labels:

>>> tree3 = ngesh.gen_tree(1.0, 0.33, min_leaves=5, labels="human", seed="ngesh")
>>> print(tree3)

      /-Fupsikmu
   /-|
  |   \-Enafu
--|
  |   /-Zuhu
   \-|
     |   /-Poizes
      \-|
         \-Buge

>>> tree4 = ngesh.gen_tree(1.0, 0.33, min_leaves=5, labels="bio", seed="ngesh")
>>> print(tree4)

      /-Ennapus satvo
   /-|
  |   \-Sbuges asecis
--|
  |   /-Spoizes rovues
   \-|
     |   /-Spupsicmus essis
      \-|
         \-Zusis spesus

The progression of the random generation can be verified by expanding the number of leaves in the tree with human labels above:

>>> tree5 = ngesh.gen_tree(1.0, 0.33, min_leaves=6, labels="human", seed="ngesh")
>>> print(tree5)

      /-Fupsikmu
   /-|
  |   \-Enafu
--|
  |   /-Zuhu
  |  |
   \-|      /-Poizes
     |   /-|
      \-|   \-Buge
        |
         \-Hesi

Note that a single new taxon was created, “Hesi”, while the rest of the tree is equivalent to the one above.

All trees are normal ETE3 Tree objects, which means that all methods from ETE3, including those for visualization and export, can be used directly. For example, we can easily iterate over all the descendants to grab the node distances, that is, the length of the branch connecting each node to its parent. Note that internal nodes are not named by default:

>>> for node in tree5.iter_descendants():
...     print([node.name, node.dist])

["", 0.8181260108242158]
["", 1.3503435886693707]
["Fupsikmu", 1.1736487266924596]
["Enafu", 1.1736487266924596]
["Zuhu", 0.6414311488473046]
["", 0.5001786833989259]
["", 0.07984912684272014]
["Hesi", 0.1412524654483787]
["Poizes", 0.06140333860565855]
["Buge", 0.06140333860565855]
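
Since each distance is the length of the branch leading to a node, the age of a node relative to the root is obtained by summing the lengths along its path. The sketch below does this in plain Python with the values printed above. The parent of each node is inferred from the printed tree5 and from the level-order listing, and the identifiers "root" and "n1" to "n4" for the unnamed internal nodes are hypothetical labels added for illustration.

```python
# (parent, branch length) for every non-root node, copied from the listing above.
branches = {
    "n1": ("root", 0.8181260108242158),
    "n2": ("root", 1.3503435886693707),
    "Fupsikmu": ("n1", 1.1736487266924596),
    "Enafu": ("n1", 1.1736487266924596),
    "Zuhu": ("n2", 0.6414311488473046),
    "n3": ("n2", 0.5001786833989259),
    "n4": ("n3", 0.07984912684272014),
    "Hesi": ("n3", 0.1412524654483787),
    "Poizes": ("n4", 0.06140333860565855),
    "Buge": ("n4", 0.06140333860565855),
}


def distance_to_root(name):
    """Sum the branch lengths from a node up to the root."""
    total = 0.0
    while name in branches:  # the root itself has no entry
        parent, length = branches[name]
        total += length
        name = parent
    return total


leaves = ["Fupsikmu", "Enafu", "Zuhu", "Hesi", "Poizes", "Buge"]
for leaf in leaves:
    print("%10s %.6f" % (leaf, distance_to_root(leaf)))
```

All six leaves end up at the same distance from the root (about 1.9918), as expected for a tree in which no lineage went extinct.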

Random characters, matching the topology, can be added with the ngesh.add_characters() function. A new tree is returned (characters are not added in place), and it is necessary to specify at least the number of characters to be simulated, along with the k and theta arguments for the gamma distribution related to mutation events. Note that the function also allows simulating events equivalent to horizontal gene transfer.

>>> tree_char = ngesh.add_characters(tree5, 10, 2.0, 1.0)
>>> for node in tree_char.iter_descendants():
...     if node.name:
...         print("%10s - " % node.name, " ".join(["%02i" % v for v in node.chars]))

  Fupsikmu -  15 28 02 17 04 05 06 07 29 09 10 11 30 13 19
     Enafu -  15 16 02 17 04 24 06 25 26 09 27 11 12 13 19
      Zuhu -  00 01 02 31 20 05 06 21 32 09 22 23 33 13 14
      Hesi -  00 01 02 03 20 05 06 21 08 09 22 23 12 13 14
    Poizes -  00 01 02 03 20 05 06 21 08 09 22 23 12 13 14
      Buge -  00 01 02 03 20 05 06 21 08 09 22 23 12 13 14
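
For readers unfamiliar with the gamma parameters, k is the shape and theta the scale of the distribution, so that its mean is k * theta. The sketch below draws values with the same k=2.0 and theta=1.0 used above, relying only on the standard library. How ngesh uses these values internally is not shown here, and the helper sample_gamma() is hypothetical.

```python
import random


def sample_gamma(k, theta, n, seed=None):
    """Draw n values from a gamma distribution with shape k and scale theta.

    Hypothetical helper for illustration; it is not part of ngesh.
    """
    rng = random.Random(seed)
    # random.gammavariate takes the shape first and the scale second.
    return [rng.gammavariate(k, theta) for _ in range(n)]


values = sample_gamma(k=2.0, theta=1.0, n=20000, seed="ngesh")
mean = sum(values) / len(values)
print(mean)  # expected to be close to k * theta = 2.0
```

A larger k moves the bulk of the distribution away from zero, while theta stretches or compresses it.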

The trees can be exported to different formats, as described in the module's documentation and performed in the tests.

Newick representations of trees can be “sorted”, solving comparison issues of these structures (remember that phylogenetic trees are like “hanging mobiles”). The module is self-contained and can be called from the command line: