PR2 version 4.12
Contributors
- Javier del Campo - U of Miami - Apicomplexa
- Ramon Massana - ICM, Barcelona - Stramenopiles
- Daniel Vaulot - CNRS Roscoff - Metadata geolocalisation - based on an idea of Margaret Brisbin
- Johan Decelle and Daniel Vaulot - CNRS Grenoble and Roscoff - Plastid sequences (PhytoRef database)
- Chetan Gaonkar - Naples - Chaetoceros - with help from B. Edvardsen
Database structure
- Table pr2_main - add fields
  - gene - 18S_RNA, 16S_RNA
  - organelle - nucleus, plastid, mitochondria, nucleomorph, apicoplast (left empty for cyanobacteria)
- Table pr2_metadata - add or modify fields
  - gb_organelle - import the corresponding gb field
  - pr2_sequence_origin - add other possibilities such as genome and metagenome
  - pr2_continent, pr2_country, pr2_country_lat, pr2_country_lon - geographical origin extracted from gb_country field
  - pr2_location, pr2_location_lat, pr2_location_lon - geographical origin extracted from gb_country field
  - pr2_ocean, pr2_sea, pr2_sea_lat, pr2_sea_lon - extracted from gb_country field and gb_isolation_source
- Table pr2_sequences - add fields
  - sequence_hash - hash value of the sequence (using R function digest::sha1, see https://cran.r-project.org/web/packages/openssl/vignettes/crypto_hashing.html)
- Table pr2_taxonomy - add fields
  - taxon_trophic_mode - detailed trophic mode (e.g. “C-fixation constitutive; Mixotroph”)
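To illustrate what a field like sequence_hash makes possible, here is a minimal Python sketch that computes a SHA-1 digest for each sequence so that identical sequences can be grouped. It is not the PR2 pipeline: the function name `sequence_sha1` and the normalisation step (upper-casing, stripping whitespace) are assumptions made for this example, and the digests produced by Python's `hashlib` will not necessarily match values computed with the R function, which may hash a serialised R object rather than the raw string.

```python
import hashlib


def sequence_sha1(sequence: str) -> str:
    """Return the SHA-1 hex digest of a nucleotide sequence string."""
    # Assumption: remove whitespace and upper-case before hashing, so that
    # formatting differences do not produce different hashes.
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha1(normalized.encode("ascii")).hexdigest()


# Hypothetical toy records: seq1 and seq2 differ only in formatting.
seqs = {"seq1": "ACGTACGT", "seq2": "acgt acgt", "seq3": "ACGTACGA"}
hashes = {name: sequence_sha1(seq) for name, seq in seqs.items()}
```

Comparing hashes rather than full sequences keeps duplicate detection cheap, since each digest is a fixed 40-character string whatever the sequence length.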
Clean up
- 1692 sequences that had more than 2 consecutive “NN” have been removed
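A filter of this kind can be sketched in Python as follows. This is only an illustration, not the script used for PR2: it reads the criterion as "a run of more than `max_run` consecutive N", and both the function names and the default threshold are assumptions that can be adjusted.

```python
import re


def has_n_run(sequence: str, max_run: int = 2) -> bool:
    """Return True if the sequence has more than `max_run` consecutive N."""
    # N{3,} matches three or more N in a row when max_run is 2.
    pattern = "N{%d,}" % (max_run + 1)
    return re.search(pattern, sequence.upper()) is not None


def drop_ambiguous(records: dict, max_run: int = 2) -> dict:
    """Keep only the records whose sequence has no long run of N."""
    return {
        name: seq
        for name, seq in records.items()
        if not has_n_run(seq, max_run)
    }
```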
Files provided
- We are now providing separate files for 18S nuclear and 16S plastid sequences for UTAX, dada2, fasta and mothur/Qiime formats.
- The merged file contains both 18S and 16S sequences.
- The metadata file is no longer provided, since the metadata can be found in the merged file.
- The whole pr2 database is also provided as an SQLite file. It contains the different tables making up pr2.
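As a starting point for exploring the SQLite file, the following Python sketch lists the tables and the columns of one table using only the standard `sqlite3` module. The helper names `list_tables` and `table_columns` are made up for this example, and the file name in the usage comment is hypothetical; the code makes no assumption about which columns the PR2 tables actually contain.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def table_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of one table."""
    # PRAGMA statements do not accept bound parameters, so the table name
    # is quoted as an identifier instead.
    quoted = '"' + table.replace('"', '""') + '"'
    # Each row of table_info describes one column; index 1 is its name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]


# Usage (hypothetical file name):
# conn = sqlite3.connect("pr2.sqlite")
# print(list_tables(conn))
# print(table_columns(conn, "pr2_main"))
```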
Taxonomy changed
- Apicomplexa
  - Taxonomy completely revised following del Campo et al. (2019)
  - New sequences: 2619
  - Updated sequences: 5889+239
  - Removed sequences: 89
- Stramenopiles - Higher ranks changed according to Massana et al. 2014, Derelle et al. 2016, and Adl et al. 2019, compiled by R. Massana.
- Diatoms - Chaetoceros - 196 new sequences have been added from Gaonkar et al. (2018) with help from B. Edvardsen
- Chlorophyta
  - Mamiellophyceae - Micromonas clades have been updated according to Tragin and Vaulot 2019.
  - Prasinophytes clade IX - Separation between clades IXA and IXB removed pending further analysis.
- Cryptophyceae - Cryptomonadales moved from family to order.
- Cercozoa - Class Chlorarachniophyceae replaces Filosa-Chlorarachnea
Plastid 16S sequences and cyanos
Data originate from the PhytoRef database (Decelle et al. 2015). Taxonomy has been harmonized with the PR2 taxonomy framework, in particular by going from 12 to 8 taxonomy levels. This integration of plastid sequences should be helpful to researchers who obtain metabarcodes for both 16S and 18S rRNA.
- 16S plastid sequences added: 6049
- 16S cyanobacteria sequences added: 42
Metadata
Sequence geo-localisation: Following up on the very good post of Margaret Brisbin, the GeoNames server (http://www.geonames.org/export/geonames-search.html) and the fuzzywuzzy Python library have been used to provide information about the geographic origin of sequences. Country and/or ocean, with the corresponding coordinates, are now provided for 90,788 GenBank entries.
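The fuzzy-matching step can be sketched as follows. This is an illustration only, not the code used for PR2: it swaps in `difflib` from the Python standard library in place of fuzzywuzzy so that it runs without extra dependencies, the `COUNTRIES` list is a tiny made-up stand-in for a real gazetteer such as GeoNames, and the function name and cutoff are assumptions.

```python
import difflib

# Tiny illustrative gazetteer (assumption); a real run would use a full list.
COUNTRIES = ["France", "Spain", "Norway", "United States", "Japan"]


def match_country(gb_country: str, cutoff: float = 0.8):
    """Return the best-matching country name, or None if nothing is close."""
    # GenBank country values usually look like "Country: locality",
    # so only the part before the first colon is compared.
    candidate = gb_country.split(":", 1)[0].strip().lower()
    lookup = {name.lower(): name for name in COUNTRIES}
    hits = difflib.get_close_matches(candidate, list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None
```

A fuzzy match tolerates misspellings in free-text fields, while the cutoff keeps unrelated strings from being assigned to a country; matched names can then be looked up to obtain coordinates.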
References
- Adl, S.M., Bass, D., Lane, C.E., Lukeš, J., Schoch, C.L., Smirnov, A., Agatha, S. et al. 2019. Revisions to the Classification, Nomenclature, and Diversity of Eukaryotes. J. Eukaryot. Microbiol. 66:4–119.
- Decelle, J., Romac, S., Stern, R.F., Bendif, E.M., Zingone, A., Audic, S., Guiry, M.D. et al. 2015. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy. Mol. Ecol. Resour. 15:1435–1445.
- Derelle, R., López-García, P., Timpano, H. & Moreira, D. 2016. A Phylogenomic Framework to Study the Diversity and Evolution of Stramenopiles (=Heterokonts). Mol. Biol. Evol. 33:2890–8.
- Gaonkar, C.C., Piredda, R., Minucci, C., Mann, D.G., Montresor, M., Sarno, D. & Kooistra, W.H.C.F. 2018. Annotated 18S and 28S rDNA reference sequences of taxa in the planktonic diatom family Chaetocerotaceae. PLoS One. 13:e0208929.
- Massana, R., del Campo, J., Sieracki, M.E., Audic, S. & Logares, R. 2014. Exploring the uncultured microeukaryote majority in the oceans: reevaluation of ribogroups within stramenopiles. ISME J. 8:854–66.
- del Campo, J., Heger, T., Rodríguez-Martínez, R., Worden, A.Z., Richards, T.A., Massana, R. & Keeling, P.J. 2018. A new framework for the study of apicomplexan diversity across environments. bioRxiv. DOI: 10.1101/494880.