Prof. Dr. W. F. Martin
Red and Problematic Green Phylogenetic Signals among Thousands of Nuclear Genes from the Photosynthetic and Apicomplexa-Related Chromera velia.
Christian Woehle, Tal Dagan, William F. Martin, and Sven B. Gould
Genome Biol Evol 3:1220-30 (2011).
< UPDATE 2012 >
117 likely contaminated contigs (113 of them among the clustered contigs) were removed from the datasets. They are listed in "contamination.txt". We apologize for any inconvenience.
All contigs we used for our analysis. The header starts with our own identifier followed by the corresponding GenBank id.
Our translations of the contigs. The methods are described in the paper. The header starts with our own identifier followed by the GenBank id. The tab-separated terms that follow are:
- The first term indicates whether the protein was predicted de novo (denovo) or from the best BLAST hit (blast, fulllength, frameshift).
- The second term indicates whether a full N-terminus (start), a full C-terminus (end), both (full) or neither (fragment) was predicted.
- The last term indicates which reading frame we used.
For proteins predicted from the best BLAST hit, the first term was set to "fulllength" if the length of the best BLAST hit on the contig sequence is at least 90% of the length of the hit protein and, in addition, "full" (see above) was predicted. If hits to the best-hit protein occurred in different reading frames, it was set to "frameshift".
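The header layout and the "fulllength" rule described above can be sketched in Python as follows. This is only an illustration, not the code used for the paper. It assumes that the three terms are the last three tab-separated fields of the header, and that the hit length and the protein length are given in the same units. The names `TranslationHeader`, `parse_translation_header` and `is_fulllength` are hypothetical.

```python
from typing import NamedTuple


class TranslationHeader(NamedTuple):
    ids: str           # our identifier plus GenBank id, kept as one unparsed string
    prediction: str    # "denovo", "blast", "fulllength" or "frameshift"
    completeness: str  # "start", "end", "full" or "fragment"
    frame: str         # reading frame, kept as text (its encoding is not specified here)


def parse_translation_header(line: str) -> TranslationHeader:
    """Split a FASTA header into the id part and the three trailing terms."""
    fields = line.rstrip("\r\n").lstrip(">").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 tab-separated fields, got {len(fields)}")
    # Assumption: the three terms are the last three tab-separated fields;
    # everything before them is treated as the identifier part.
    *id_fields, prediction, completeness, frame = fields
    return TranslationHeader("\t".join(id_fields), prediction, completeness, frame)


def is_fulllength(hit_length: int, protein_length: int, completeness: str) -> bool:
    """Return True if the hit covers at least 90% of the hit protein and "full" was predicted."""
    # Assumption: both lengths use the same units.
    # Integer arithmetic avoids floating-point rounding at exactly 90%.
    return completeness == "full" and 10 * hit_length >= 9 * protein_length
```

With a made-up header such as `>cv1<TAB>GB1<TAB>fulllength<TAB>full<TAB>2`, `parse_translation_header` would return the identifier part unchanged and the three terms as separate fields.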
Translated C. velia contigs corresponding to the 3151 clusters used in the publication (3038 after removal of the contaminated contigs). The header consists of our identifier followed by the matching GenBank id.
"Nuclear-encoded, predicted plastid proteins" as in the supplementary data, but with GenBank ids added.
"Secretory sequences" as in the supplementary data, but with GenBank ids added.
GenBank ids with the corresponding versions of our identifiers (tab-separated).
The protein database we used, with our own sequence headers.
The EST database we used, with our own sequence headers.
Assignment data for recovering the original headers from our own sequence headers (protein_database.fa.gz; EST_database.fa.gz), together with additional features.
We merged multiple identical sequences into single entries; the corresponding own headers are separated by ";".
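A merged entry can be expanded back into its individual own headers by splitting on ";". The Python sketch below is a hedged illustration. The function names and the `assignment` dictionary (own header mapped to original header) are hypothetical, and the layout of the assignment file itself is not assumed here.

```python
from typing import Dict, List


def split_merged_header(header: str) -> List[str]:
    """Return the individual own headers contained in one merged entry."""
    parts = header.strip().lstrip(">").split(";")
    # Drop empty pieces, for example those caused by a trailing ";".
    return [part.strip() for part in parts if part.strip()]


def original_headers(merged_header: str, assignment: Dict[str, str]) -> List[str]:
    """Look up the original header for every own header in a merged entry.

    `assignment` is a hypothetical dict built beforehand from the assignment data.
    Own headers without an entry are skipped.
    """
    return [assignment[own] for own in split_merged_header(merged_header) if own in assignment]
```

For example, with made-up headers, `split_merged_header("p1;p2;p3")` gives the three own headers as separate list items, which can then be looked up one by one.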