PR2 version 4.12


Database structure

  • Table pr2_main - add fields
    • gene - 18S_RNA, 16S_RNA
    • organelle - nucleus, plastid, mitochondria, nucleomorph, apicoplast (left empty for cyanobacteria)
  • Table pr2_metadata - add or modify fields
    • gb_organelle - import the corresponding gb field
    • pr2_sequence_origin - add other possibilities such as genome and metagenome
    • pr2_continent, pr2_country, pr2_country_lat, pr2_country_lon - geographical origin extracted from gb_country field
    • pr2_location, pr2_location_lat, pr2_location_lon - geographical origin extracted from gb_country field
    • pr2_ocean, pr2_sea, pr2_sea_lat, pr2_sea_lon - extracted from gb_country field and gb_isolation_source
  • Table pr2_sequences - add fields
  • Table pr2_taxonomy - add fields
    • taxon_trophic_mode - detailed trophic mode (e.g. “C-fixation constitutive; Mixotroph”)

Clean up

  • 1692 sequences that had more than 2 consecutive “NN” have been removed
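
The filter can be sketched as follows. This is a minimal illustration, not the script used for the release: it assumes that "more than 2 consecutive N" means a run of 3 or more N, and the accessions and sequences are hypothetical.

```python
import re

# Assumption: a sequence is rejected when it contains a run of 3 or more N
# (one reading of "more than 2 consecutive N"); adjust the quantifier if needed.
AMBIGUOUS_RUN = re.compile(r"N{3,}", re.IGNORECASE)


def has_ambiguous_run(sequence: str) -> bool:
    """Return True if the sequence contains a run of 3 or more N."""
    return AMBIGUOUS_RUN.search(sequence) is not None


def filter_sequences(records: dict) -> dict:
    """Keep only the records whose sequence has no long run of N."""
    return {acc: seq for acc, seq in records.items() if not has_ambiguous_run(seq)}


# Hypothetical accessions and toy sequences, for illustration only
records = {
    "seq_ok": "ACGTNACGTNNACGT",   # isolated N and NN are tolerated
    "seq_bad": "ACGTNNNACGT",      # NNN triggers removal
}
kept = filter_sequences(records)
print(sorted(kept))
```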

Files provided

  • We are now providing separate files for 18S nuclear and 16S plastid sequences for UTAX, dada2, fasta and mothur/Qiime formats.
  • The merged file contains both 18S and 16S sequences.
  • The metadata file is no longer provided since metadata can be found in the merged file.
  • The whole pr2 database is also provided as an SQLite file. It contains the different tables making up pr2.
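
As a rough sketch of how the SQLite file might be queried from Python, the example below builds a tiny in-memory stand-in for the pr2_main table and counts sequences per gene and organelle. The table name and the gene and organelle fields come from these notes; the pr2_accession key, the sample rows, and the use of NULL for an empty organelle are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the distributed file. To query the real database,
# pass the path of the downloaded SQLite file to sqlite3.connect() instead.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pr2_main (
    pr2_accession TEXT PRIMARY KEY,  -- hypothetical key name
    gene TEXT,
    organelle TEXT                   -- assumed NULL when left empty
);
INSERT INTO pr2_main VALUES
    ('A1', '18S_RNA', 'nucleus'),
    ('A2', '18S_RNA', 'nucleus'),
    ('A3', '16S_RNA', 'plastid'),
    ('A4', '16S_RNA', NULL);
""")


def list_tables(connection):
    """Return the names of all tables stored in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def count_by_gene_and_organelle(connection):
    """Return (gene, organelle, n) rows, one per gene/organelle combination."""
    query = """
        SELECT gene, organelle, COUNT(*) AS n
        FROM pr2_main
        GROUP BY gene, organelle
        ORDER BY gene, organelle
    """
    return connection.execute(query).fetchall()


tables = list_tables(con)
counts = count_by_gene_and_organelle(con)
for gene, organelle, n in counts:
    print(gene, organelle, n)
```

Listing the tables first is a convenient way to check which of the pr2 tables a given release of the file actually contains before writing joins against them.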

Taxonomy changed

  • Apicomplexa
    • Taxonomy completely revised following del Campo et al. (2019)
    • New sequences: 2619
    • Updated sequences: 5889+239
    • Removed sequences: 89
  • Stramenopiles - Higher ranks changed according to Massana et al. 2014, Derelle et al. 2016, and Adl et al. 2019, as compiled by R. Massana.
  • Diatoms - Chaetoceros - 196 new sequences have been added from Gaonkar et al. (2019) with help from B. Edvardsen
  • Chlorophyta
    • Mamiellophyceae - Micromonas clades have been updated according to Tragin and Vaulot 2019.
    • Prasinophytes clade IX - Separation between clades IXA and IXB removed pending further analysis.
  • Cryptophyceae - Cryptomonadales moved from family to order.
  • Cercozoa - Class Chlorarachniophyceae replaces Filosa-Chlorarachnea

Plastid 16S sequences and cyanobacteria

Data originate from the PhytoRef database (Decelle et al. 2015). Taxonomy has been harmonized with the PR2 taxonomy framework, in particular by going from 12 to 8 taxonomy levels. This integration of plastid sequences should be helpful to researchers who obtain metabarcodes for both 16S and 18S rRNA.

  • 16S plastid sequences added: 6049
  • 16S cyanobacteria sequences added: 42


Sequence geo-localisation

Following up on the very good post by Margaret Brisbin, the GeoNames server and the fuzzywuzzy Python library have been used to provide information on the geographical origin of sequences. Country and/or ocean, with the corresponding coordinates, are now provided for 90,788 GenBank entries.
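
The matching step can be illustrated with the sketch below. It is not the script used for PR2: to stay self-contained it swaps in difflib from the Python standard library for fuzzywuzzy and a small hard-coded gazetteer for the geoname server. The gazetteer entries, their approximate coordinates, the locate_country helper, and the 0.8 cutoff are all hypothetical.

```python
import difflib

# Hypothetical mini-gazetteer: country name -> approximate (latitude, longitude).
# In practice these values would come from a geographical name service.
GAZETTEER = {
    "France": (46.0, 2.0),
    "Japan": (36.0, 138.0),
    "Norway": (62.0, 10.0),
}


def locate_country(gb_country: str, cutoff: float = 0.8):
    """Return (country, lat, lon) for a GenBank country string, or None.

    GenBank country values usually read "Country: region, locality",
    so only the part before the first colon is matched.
    """
    candidate = gb_country.split(":", 1)[0].strip()
    # Fuzzy matching tolerates small misspellings in the submitted value
    matches = difflib.get_close_matches(candidate, list(GAZETTEER), n=1, cutoff=cutoff)
    if not matches:
        return None
    name = matches[0]
    lat, lon = GAZETTEER[name]
    return name, lat, lon


for value in ["France: Roscoff", "Norwey", "Atlantis"]:
    print(value, "->", locate_country(value))
```

A fairly strict cutoff is used here so that unrelated names are left unmatched rather than assigned to the wrong country; unmatched values can then be reviewed by hand.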


R Scripts used

Daniel Vaulot
CNRS, France

Focusing on marine (pico)phytoplankton.