How to contribute annotations to the PR2 reference database


  1. Contact a member of the Core Team.
  2. Explain which group you want to annotate. It can be a genus, a class or any other taxonomic level.
  3. We will send you an Excel file and a fasta file containing all existing PR2 sequences for the group you are an expert in.
  4. Alternatively, you can download the data from the web interface.
  5. Follow the instructions below to update or add data.
  6. Send back the updated Excel file.
  7. Your contribution will be added to the next release of PR2 (we make 2 to 3 releases per year).
  8. You will be acknowledged as a contributor on the PR2 web site.

Files provided

Two files will be provided to you:

  • An Excel file with 2 sheets (taxonomy, sequences)
  • A fasta file with the current taxonomy

Please edit the Excel file by marking all your changes in yellow.

Excel - Taxonomy - do not edit

Taxonomy sheet.
  • This sheet provides a summary of the current taxonomy of the group with the number of sequences for each species (n).
  • Please do not edit this sheet; it is provided for your information only.

Excel - Sequences - edit only this sheet

Sequences sheet.
  • Each sequence has a unique identifier (pr2_accession) which is based on the GenBank accession (genbank_accession).
  • For each sequence, the full taxonomic path is provided along with metadata (see here for a full description of the fields).

Modifying or adding entries

Only change entries in the Sequences sheet.

Update sequence entries.
  • You can
    • modify the taxonomy of a given entry
    • add new metadata. If your metadata do not fit the existing columns, just add more columns and we will see how to incorporate them.
  • You can change the ranks (supergroup to genus) if necessary, but you must make sure that:
    • you follow the PR2 conventions exactly, as detailed here (see the second paragraph). In particular, any taxonomic name can appear in only one column (taxonomic level). Use the _X convention to distinguish different levels that would otherwise share the same name.
    • you are consistent across all sequences belonging to the same taxon.
  • You can also add new species as needed.
  • Please see the figure above for some examples of changes
    • 1 - These entries are unchanged
    • 2 - These entries have been reassigned to a new species
    • 3 - These are new entries. Provide the following information:
      • genbank_accession. We will download the sequence and all GenBank metadata, so you do not need to do so.
      • taxonomic assignment. If the species is already present in the database, you can just provide the species name.
      • 18S coordinates. If the sequence is not limited to the 18S but also contains the ITS, please provide the start and end coordinates of the 18S rRNA gene on the sequence.
    • 4 - You can also indicate whether the new sequence is a reference sequence. Reference sequences are of high quality, preferably full length, and representative of a given taxon.
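
Before sending the file back, the two taxonomy rules above (a name may appear in only one rank column, and all sequences of the same taxon must share the same path) can be checked automatically. The following is only a minimal sketch, not part of the PR2 tooling. It assumes that the Sequences sheet has already been loaded as a list of dictionaries, one per row, and that the rank columns carry the names listed in RANKS. The column names, the helper functions, the accessions and the taxon names are all hypothetical and should be adapted to your file.

```python
from collections import defaultdict

# Assumed rank columns, ordered from highest to lowest (adapt to your sheet).
RANKS = ["supergroup", "division", "class", "order", "family", "genus", "species"]


def names_in_several_ranks(rows, ranks=RANKS):
    """Return {name: [ranks]} for every name used in more than one rank column."""
    seen = defaultdict(set)
    for row in rows:
        for rank in ranks:
            name = row.get(rank)
            if name:  # skip empty cells
                seen[name].add(rank)
    return {
        name: sorted(cols, key=ranks.index)
        for name, cols in seen.items()
        if len(cols) > 1
    }


def inconsistent_paths(rows, ranks=RANKS):
    """Return {species: set of paths} for species assigned to more than one path."""
    paths = defaultdict(set)
    for row in rows:
        species = row.get("species")
        if species:
            paths[species].add(tuple(row.get(rank) for rank in ranks))
    return {species: p for species, p in paths.items() if len(p) > 1}


# Hypothetical rows: placeholder accessions and taxon names, not real PR2 data.
rows = [
    {"pr2_accession": "ACC0001", "supergroup": "Supergroup_A",
     "division": "Examplophyta", "class": "Examplophyta",  # same name in 2 ranks
     "order": "Order_A", "family": "Family_A",
     "genus": "Genus_a", "species": "Genus_a_sp1"},
    {"pr2_accession": "ACC0002", "supergroup": "Supergroup_A",
     "division": "Examplophyta", "class": "Examplophyta_X",  # _X convention
     "order": "Order_A", "family": "Family_A",
     "genus": "Genus_a", "species": "Genus_a_sp1"},
]

for name, cols in names_in_several_ranks(rows).items():
    print(f"'{name}' appears in several rank columns: {', '.join(cols)}")

for species, found in inconsistent_paths(rows).items():
    print(f"'{species}' has {len(found)} different taxonomic paths")
```

In this made-up example, the first row breaks both rules. Changing its class to Examplophyta_X, as in the second row, makes both checks return nothing.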
Daniel Vaulot
CNRS, France

Focusing on marine (pico)phytoplankton.