You can specify a wide range of non-homogeneous models by combining different options.
This option shares the same parameters as the homogeneous case, since the same kind of model is used for each branch. The additional options are the following:
The '*' wildcard can be used, as in *theta* for all the parameters whose name has theta in it.
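As a rough illustration of what such a wildcard selects, the sketch below applies shell-style matching to a few parameter names that appear in this section. It uses Python's fnmatch module, which is an assumption made for illustration only and not how Bio++ implements the matching (fnmatch also gives special meaning to '?' and '[', which may not apply here).

```python
from fnmatch import fnmatchcase

# Parameter names borrowed from the examples in this section;
# the list itself is only illustrative.
parameter_names = ["T92.theta", "T92.kappa", "GTR.theta1", "TN93.theta1"]


def select(pattern, names):
    """Return the names matched by a '*'-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


matched = select("*theta*", parameter_names)
print(matched)
```

Every name containing "theta" is selected, whatever precedes or follows it; T92.kappa is left out.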
Bio++ provides a general syntax to specify almost any non-homogeneous model.
You now have to configure each model individually, using the syntax introduced for the homogeneous case, except that models are numbered, for instance:
model1 = T92(theta=0.39, kappa=2.79)
An additional option is available to attach the model to branches in the tree, each branch being specified by the id of its upper node:
You can also make a given model share parameters with another one by writing for instance:
model2 = T92(theta=0.39, kappa=model1.T92.kappa)
Please note the syntax: parameters are referred to as [model name].[parameter name] in that case. Only parameters from identical models can be aliased in this manner. To link parameters from different models, you have to use the more general option (warning: this is currently a beta feature!):
model1 = T92(theta=0.4, kappa=4)
model2 = GTR(theta=0.4, a = 1.1, b=0.4, c=0.4, d=0.25, e=0.1)
nonhomogeneous.alias=GTR.theta1->T92.theta1
This option can be used to link parameters of the root frequencies if the model is non-stationary:
nonhomogeneous.root_freq=Full(init=balanced)
nonhomogeneous.alias=Full.theta1->GTR.theta1_1
Note that this option is only available with the 'general' nonhomogeneous substitution models and will be ignored if used with "one_per_branch".
Finally, you may find the following options useful:
To define constraints for sites between submodels, we can set "paths" that any site must follow. For example, in the following description:
nonhomogeneous = general
nonhomogeneous.number_of_models = 3
model1=T92()
model2=MixedModel(model=T92(kappa=Simple(values=(4,10,20),probas=(0.1,0.5,0.4))))
model3=MixedModel(model=TN93(theta1=Simple(values=(0.1,0.5,0.9),probas=(0.3,0.2,0.5))))
model1.nodes_id=0:1
model2.nodes_id=2:3
model3.nodes_id=4:5
In this case, on branches 2 & 3 a site follows any submodel of model 2 (but the same submodel on both branches), and on branches 4 & 5, a site follows any submodel of model 3 (the same on both branches as well). But there is no constraint between models 2 & 3, which means that a site can follow any submodel of model 2 and any submodel of model 3.
Suppose the user wants a site with T92.kappa=4 in model 2 to have TN93.theta1=0.5 in model 3, a site with T92.kappa=10 in model 2 to have TN93.theta1=0.9 in model 3, and the other cases to be free (which here means that T92.kappa=20 in model 2 is linked with TN93.theta1=0.1 in model 3). This can be achieved with the declarations:
site.number_of_paths=2
site.path1=model2[T92.kappa_1] & model3[TN93.theta1_2]
site.path2=model2[T92.kappa_2] & model3[TN93.theta1_3]
The third path (for the remaining submodels) is automatically computed.
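The way the remaining path follows from the declared ones can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the Bio++ implementation. The labels T92.kappa_3 and TN93.theta1_1 are assumed by extending the numbering used in the declarations, and remaining_path is a hypothetical helper name.

```python
# Submodel labels of the two mixed models, numbered as in the path declarations.
# T92.kappa_3 and TN93.theta1_1 are assumed from the same numbering scheme.
submodels = {
    "model2": ["T92.kappa_1", "T92.kappa_2", "T92.kappa_3"],
    "model3": ["TN93.theta1_1", "TN93.theta1_2", "TN93.theta1_3"],
}

# The two declared paths (site.path1 and site.path2).
declared_paths = [
    {"model2": ["T92.kappa_1"], "model3": ["TN93.theta1_2"]},
    {"model2": ["T92.kappa_2"], "model3": ["TN93.theta1_3"]},
]


def remaining_path(submodels, paths):
    """Collect, for each model, the submodels that no declared path uses."""
    rest = {}
    for model, names in submodels.items():
        used = {name for path in paths for name in path.get(model, [])}
        rest[model] = [name for name in names if name not in used]
    return rest


third_path = remaining_path(submodels, declared_paths)
print(third_path)
```

The leftover submodels of each model are grouped into one additional path, so every submodel belongs to exactly one path.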
It is possible to link mixtures of submodels. For example,
site.path1=model2[T92.kappa_1] & model3[TN93.theta1_2] & model3[TN93.theta1_3]
means that a site that has T92.kappa=4 in model2 has either TN93.theta1=0.5 or TN93.theta1=0.9 in model3.
Because of these constraints, the probabilities of the submodels are linked. In the first example, the probability of T92.kappa=4 in model 2 equals the probability of TN93.theta1=0.5 in model 3. Since this contradicts the probabilities declared in models 2 and 3, the reference probabilities are those of the first numbered mixed model, here model 2. In that case, the probabilities declared in model 3 may be of no use. In the second example, however, the probability of submodel T92.kappa=4 equals the sum of the probabilities of submodels TN93.theta1=0.5 and TN93.theta1=0.9, and the relative proportions of these two submodels in the declaration of model 3 are used to split it. Their respective probabilities are then 0.1*0.2/(0.2+0.5)=0.0286 and 0.1*0.5/(0.2+0.5)=0.0714.
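The arithmetic of the second example can be checked with a short Python sketch. It only reproduces the computation described above, using the probabilities declared for models 2 and 3. The function name split_path_probability is hypothetical and is not part of Bio++.

```python
# Submodel probabilities as declared in model2 and model3.
kappa_probs = {"T92.kappa_1": 0.1, "T92.kappa_2": 0.5, "T92.kappa_3": 0.4}
theta1_probs = {"TN93.theta1_1": 0.3, "TN93.theta1_2": 0.2, "TN93.theta1_3": 0.5}


def split_path_probability(reference_prob, linked_probs):
    """Split the reference probability among linked submodels,
    in proportion to their declared probabilities."""
    total = sum(linked_probs.values())
    return {name: reference_prob * p / total for name, p in linked_probs.items()}


# site.path1 links T92.kappa_1 with TN93.theta1_2 and TN93.theta1_3.
linked = {name: theta1_probs[name] for name in ("TN93.theta1_2", "TN93.theta1_3")}
result = split_path_probability(kappa_probs["T92.kappa_1"], linked)

for name, prob in result.items():
    print(f"{name}: {prob:.4f}")
```

The two values sum to 0.1, the probability of T92.kappa=4 in the reference model.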
Concerning the optimization procedure, this choice may entail the non-identifiability of several parameters (here the probabilities in model 3), so the user should be careful about this.
Here is another example, for mixtures of mixed models, where the submodels are referred to by name:
nonhomogeneous = general
nonhomogeneous.number_of_models = 2
model1=LLG08_UL2()
model2=LLG08_UL3()
site.number_of_paths=2
site.path1=model1[LLG08_UL2.M2] & model2[LLG08_UL3.Q1]
site.path2=model1[LLG08_UL2.M1] & model2[LLG08_UL3.Q2] & model2[LLG08_UL3.Q3]
When the non-homogeneity option is "one_per_branch", each site is constrained to follow the same submodel from leaves to root.
In case of nonstationary models, the ancestral frequencies are distinct parameters. If a model is assumed to be stationary, the “None” parameter value can be used, which is strictly equivalent to setting nonhomogeneous.stationary=yes.
When the model is a mixture model, there is no single set of equilibrium frequencies. With this option, the root frequencies are therefore set to the average of the equilibrium frequencies of the submodels, weighted by the respective probabilities of the submodels.
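The weighted average described here can be sketched as follows. The frequency vectors and probabilities are made-up values for a four-state alphabet, chosen only for illustration, and average_frequencies is a hypothetical helper, not a Bio++ function.

```python
def average_frequencies(submodel_freqs, probs):
    """Average the submodels' equilibrium frequencies,
    weighting each submodel by its probability."""
    n_states = len(submodel_freqs[0])
    return [
        sum(p * freqs[i] for freqs, p in zip(submodel_freqs, probs))
        for i in range(n_states)
    ]


# Hypothetical equilibrium frequencies of three submodels (order: A, C, G, T).
submodel_freqs = [
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.25, 0.25],
    [0.40, 0.30, 0.20, 0.10],
]
# Hypothetical submodel probabilities, summing to 1.
probs = [0.3, 0.2, 0.5]

root_freqs = average_frequencies(submodel_freqs, probs)
print([round(f, 4) for f in root_freqs])
```

Because the submodel probabilities sum to 1 and each frequency vector sums to 1, the averaged root frequencies also sum to 1.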
Since version 0.4.0, BppSuite uses the keyval syntax to set up root frequencies.
The frequencies set used can be any of those described in the Frequencies sets section, depending on the alphabet used.