
    Supplementary resources    
    Woehle et al. 2011    
Red and Problematic Green Phylogenetic Signals among Thousands of Nuclear Genes from the Photosynthetic and Apicomplexa-Related Chromera velia.
Christian Woehle, Tal Dagan, William F. Martin, and Sven B. Gould
Genome Biol Evol 3:1220-30 (2011).

< UPDATE 2012 >
117 likely contaminated contigs (113 of them among the clustered contigs) were removed from the datasets. They are listed in "contamination.txt". We apologize for any inconvenience.

supplementary_data.doc
Supplementary data.

Cvelia_contigs.fa
All contigs we used for our analysis. The header starts with our own identifier followed by the corresponding GenBank id.

Cvelia_translated.fa
Our translations of the contigs. The methods are described in the paper. The header starts with our own identifier followed by the GenBank id. The three tab-separated terms that follow are:
  • The first term indicates whether the protein was predicted de novo (denovo) or from the best BLAST hit (blast,fulllength,frameshift).
    In the latter case it was marked "fulllength" if the length of the best BLAST hit on the contig sequence is at least 90% of the length of the hit protein and "full" (see the second term) was also predicted. If the best hit had hits in different reading frames, it was marked "frameshift".

  • The second term indicates whether a full N-terminus (start), a full C-terminus (end), both (full), or neither (fragment) was predicted.

  • The last term indicates which reading frame we used.
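
The header layout described above can be read with a few lines of Python. This is only a minimal sketch: it assumes the three terms are the last three tab-separated fields of the header line and treats everything before them as the identifier part, because the exact separator between our identifier and the GenBank id is not specified here. The function name, the field names, and the example header are hypothetical.

```python
from typing import NamedTuple


class TranslatedHeader(NamedTuple):
    ids: str           # our identifier plus GenBank id, left unsplit (separator not assumed)
    method: str        # denovo, blast, fulllength or frameshift
    completeness: str  # start, end, full or fragment
    frame: str         # reading frame, kept as text because its encoding is not assumed


def parse_translated_header(line: str) -> TranslatedHeader:
    """Split a FASTA header line into the identifier part and the three trailing terms."""
    fields = line.rstrip("\r\n").lstrip(">").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 tab-separated fields, got {len(fields)}")
    # Everything before the last three fields is treated as the identifier part.
    ids = "\t".join(fields[:-3])
    method, completeness, frame = fields[-3:]
    return TranslatedHeader(ids, method, completeness, frame)


# Hypothetical header, made up for illustration only.
example = parse_translated_header(">our_id_1 genbank_id_1\tfulllength\tfull\t1\n")
is_complete = example.completeness == "full"
```

Keeping the identifier part unsplit avoids guessing a delimiter; once the real headers have been inspected, it can be divided further.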

Cvelia_clustered.fa
Translated C. velia contigs corresponding to the 3151 clusters used in the publication (3038 after removal of the contaminated contigs). The header consists of our identifier followed by the matching GenBank id.

plastid_proteins.fa
"Nuclear-encoded, predicted plastid proteins" as in the supplementary data, but with GenBank ids added.

secretory_proteins.fa
"Secretory sequences" as in the supplementary data, but with GenBank ids added.

GBtoOUR.txt
GenBank ids with the corresponding versions of our identifiers (tab-separated).
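
A mapping file of this kind can be loaded into a dictionary for lookups. The sketch below assumes two tab-separated columns per line, with the GenBank id first and our identifier second (as the file name suggests, but not confirmed here); the function name and the example ids are hypothetical.

```python
from typing import Dict, Iterable


def load_gb_to_our(lines: Iterable[str]) -> Dict[str, str]:
    """Build a GenBank id -> our identifier lookup from tab-separated lines."""
    mapping: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or incomplete lines
        # Assumed column order: GenBank id first, our identifier second.
        mapping[fields[0]] = fields[1]
    return mapping


# Hypothetical content, made up for illustration only.
demo = load_gb_to_our(["genbank_id_1\tour_id_1\n", "\n", "genbank_id_2\tour_id_2\n"])
```

With the real file, the same function can be called as `load_gb_to_our(open("GBtoOUR.txt"))`, since an open text file iterates over its lines.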

protein_database.fa.gz
The protein database we used, with our own sequence headers.

EST_database.fa.gz
The EST database we used, with our own sequence headers.

assignment_database.txt.gz
Assignment data for recovering the original headers from our own sequence headers (protein_database.fa.gz; EST_database.fa.gz), plus additional features.
We merged multiple identical sequences into single entries under our own headers, separated by ";".


    Files    
Cvelia_clustered.fa
Cvelia_contigs.fa
Cvelia_translated.fa
EST_database.fa.gz
GBtoOUR.txt
assignment_database.txt.gz
contamination.txt
plastid_proteins.fa
protein_database.fa.gz
secretory_proteins.fa
supplementary_data.doc
