Running Phylip's contrast application for trait pairs from R

Scott Chamberlain




Here is some code to run Phylip's contrast application from R and get the output within R to easily manipulate yourself. Importantly, the code is written specifically for trait pairs only as the regular expression work in the code specifically grabs data from contast results when only two traits are input. You could easily change the code to do N traits. Note that the p-value calculated for the chi-square statistic is not output from contrast, but is calculated within the function 'PhylipWithinSpContr'. In the code below there are two functions that make a lot of busy work easier: 'WritePhylip' and 'PhylipWithinSpContr'. The first function is nice because the formatting required for data input to Phylip programs is so, well, awkward  - and this function does it for you. The second function runs contrast and retrieves the output data. The example data set I produce in the code below has multiple individuals per species, so that contrasts are calculated taking into account within species variation. Get Phylip's contrast documentation here.

Note that the data input format allows only 10 characters for the species name, so I suggest if your species names are longer than 10 characters use the function abbreviate, for example, to shorten all names to no longer than 10 characters. Also, within the function WritePhylip I concatenate species names and their number of individuals per species leaving plenty of space.

Also, mess around with the options in the "system" call to get what you want. For example, I used "R", "W" and "Y", meaning replace old outfile (R), then turn on within species analyses (W), then accept all options (Y). E..g, if you don't have an old outfile, then you obviously don't need to replace the old file with the "R" command.

(p.s. I have not tried this on a windows machine).




Here is example output:


> datout
names2 dat...1. dat...2.
1 VarAIn_VarAest 0.000110 -0.000017
2 VarAIn_VarAest -0.000017 0.000155
3 VarAIn_VarEest 0.790783 -0.063097
4 VarAIn_VarEest -0.063097 0.981216
5 VarAIn_VarAreg 1.000000 -0.107200
6 VarAIn_VarAreg -0.151800 1.000000
7 VarAIn_VarAcorr 1.000000 -0.127600
8 VarAIn_VarAcorr -0.127600 1.000000
9 VarAIn_VarEreg 1.000000 -0.064300
10 VarAIn_VarEreg -0.079800 1.000000
11 VarAIn_VarEcorr 1.000000 -0.071600
12 VarAIn_VarEcorr -0.071600 1.000000
13 VarAOut_VarEest 0.790734 -0.063104
14 VarAOut_VarEest -0.063104 0.981169
15 VarAOut_VarEreg 1.000000 -0.064300
16 VarAOut_VarEreg -0.079800 1.000000
17 VarAOut_VarEcorr 1.000000 -0.071600
18 VarAOut_VarEcorr -0.071600 1.000000
19 logL_withvar_df -68.779770 6.000000
20 logL_withoutvar_df -68.771450 3.000000
21 chisq_df -0.016640 3.000000
22 chisq_p 1.000000 -999.000000

Posted in  Phylogenetics R


blog comments powered by Disqus
Fork me on GitHub