Database fields

The pr2database package provides two data frames, each returned by a function:

  • pr2_database(): the main PR2 reference database containing both 18S rRNA and plastid 16S rRNA sequences
  • pr2_taxonomy(): the PR2 taxonomy table


The PR2 reference database is provided as a data frame returned by the function pr2database::pr2_database(). It is a join of the following tables:

  • pr2_main
  • pr2_taxonomy
  • pr2_sequence
  • pr2_metadata
  • pr2_countries
  • pr2_traits
  • pr2_assign_silva
  • eukribo_v2

The metadata contains several types of fields:

  • gb_ : originating from the GenBank entry
  • eukref_ : annotated by the EukRef project
  • pr2_ : annotated by PR2, such as latitude and longitude
  • eukribo_ : from the EukRibo database
  • silva_ : from the Silva database
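As a rough illustration of this naming convention, the sketch below groups field names by their prefix. It is plain Python working on a list of column names copied from the table in this document (for example after exporting the data frame to a file). The helper `group_by_prefix` is hypothetical and is not part of pr2database.

```python
from collections import defaultdict

# Prefixes documented for the metadata fields (without the trailing underscore).
PREFIXES = ("gb", "eukref", "pr2", "eukribo", "silva")


def group_by_prefix(columns):
    """Group column names by documented prefix; anything else goes under 'other'."""
    groups = defaultdict(list)
    for name in columns:
        prefix, sep, _rest = name.partition("_")
        key = prefix if sep and prefix in PREFIXES else "other"
        groups[key].append(name)
    return dict(groups)


# A few field names taken from the field table in this document.
columns = [
    "pr2_accession", "genbank_accession", "gb_country", "eukref_notes",
    "pr2_ocean", "eukribo_V4", "silva_taxonomy", "species",
]
groups = group_by_prefix(columns)
for key, names in groups.items():
    print(key, names)
```

Prefix matching is only a heuristic: pr2_accession also starts with pr2_ although it is the accession number rather than an annotated metadata field.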

Detailed description of fields

Fields Comment
pr2_accession PR2 specific accession number
genbank_accession Genbank accession number (without the version)
start Start of sequence in Genbank entry
end End of sequence in Genbank entry
label Label explaining origin of sequence
* G - genomic sequence containing a described intron (rDNA)
* R - the same genomic sequence with the intron(s) removed (rRNA)
* U - no intron described, but intron(s) may be present
* UC - introns were detected in silico and removed from the sequence (putative rRNA)
gene 18S_rRNA or 16S_rRNA
organelle nucleus, plastid, mitochondria, nucleomorph, apicoplast (left empty for cyanobacteria)
reference_sequence 1 if this is a reference sequence that can be used, for example, for alignments
added_version PR2 version when sequence was added
remark Remark concerning the sequence
domain rank 1
supergroup rank 2
division rank 3
subdivision rank 4
class rank 5
order rank 6
family rank 7
genus rank 8
species Assigned species - rank 9
reference Reference in the literature concerning the taxonomy
sequence Sequence
sequence_length Length of sequence
ambiguities Number of ambiguities
sequence_hash Hash value of sequence
gb_date Genbank: Date
gb_locus Genbank: Locus
gb_definition Genbank: Definition
gb_organism Genbank: Organism
gb_taxonomy Genbank: Taxonomy
gb_strain Genbank: Strain
gb_culture_collection Genbank: Culture Collection
gb_clone Genbank: Clone
gb_isolate Genbank: Isolate
gb_isolation_source Genbank: Isolation Source
gb_specimen_voucher Genbank: Voucher
gb_host Genbank: Host
gb_collection_date Genbank: Date of Collection
gb_environmental_sample Genbank: Environmental Sample
gb_country Genbank: Country
gb_lat_lon Genbank: Latitude and longitude
gb_collected_by Genbank: Collected by
gb_note Genbank: Note
gb_references Genbank: Full references, not parsed
gb_publication Genbank: Publication
gb_authors Genbank: Authors
gb_journal Genbank: Journal
pubmed_id Genbank: Pubmed ID
eukref_name Eukref: Name used in EukRef, usually either the species name or the clone name
eukref_source Eukref: Source of the sequence: Isolate or Environmental
eukref_env_material Eukref: uses ENVO keywords
eukref_env_biome Eukref: uses ENVO keywords
eukref_biotic_relationship Eukref: e.g. parasite
eukref_specific_host Eukref: Specific Host annotated
eukref_geo_loc_name Eukref: Location name annotated
eukref_notes Eukref: Notes made during Eukref annotation
pr2_sample_type PR2: e.g. culture, isolate, environmental, unknown
pr2_sample_method PR2: e.g. filtration, flow cytometry sorting
pr2_latitude PR2: Parsed from GenBank entry
pr2_longitude PR2: Parsed from GenBank entry
pr2_ocean PR2: e.g. Arctic Ocean
pr2_sea PR2: e.g. North Sea
pr2_sea_lat PR2: latitude of sea or ocean
pr2_sea_lon PR2: longitude of sea or ocean
pr2_continent PR2: e.g. Asia
pr2_country PR2: e.g. France
pr2_country_geocode PR2: 2-letter code from GeoNames - e.g. FR
pr2_country_lat PR2: latitude of country
pr2_country_lon PR2: longitude of country
pr2_location PR2: from gb_country field - e.g. Paris, France
pr2_location_geoname PR2: e.g. Paris
pr2_location_geotype PR2: e.g. bay
pr2_location_lat PR2: latitude of location
pr2_location_lon PR2: longitude of location
pr2_sequence_origin PR2: clone library, metabarcode, PCR
pr2_size_fraction PR2: Name of size fraction, e.g. pico, nano
pr2_size_fraction_min PR2: Minimum size filtered, e.g. 0.2 µm
pr2_size_fraction_max PR2: Maximum size filtered, e.g. 20 µm
mixoplankton From the Mixoplankton database (MDB)
* CM - Constitutive Mixoplankton
* GNCM - Generalist Non-Constitutive Mixoplankton
* pSNCM - plastidic Specialist Non-Constitutive Mixoplankton
* eSNCM - endosymbiotic Specialist Non-Constitutive Mixoplankton
metadata_remark PR2: Any remark on metadata
eukribo_UniEuk_taxonomy_string Taxonomy assignment from EukRibo database
eukribo_V4 Information about the presence and completeness of the V4 region, from the EukRibo database
eukribo_V9 Information about the presence and completeness of the V9 region, from the EukRibo database
silva_taxonomy Taxonomy from Silva version 138
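
To show how a few of the fields above might be combined, here is a minimal Python sketch that keeps reference sequences for one gene and decodes their label. It works on plain dictionaries rather than on the data frame itself. The rows are invented for illustration, the helper `select_references` is hypothetical, and the use of an empty value for non-reference sequences is an assumption (the table only documents the value 1).

```python
# Meaning of the `label` codes, copied from the field table.
LABEL_MEANINGS = {
    "G": "genomic sequence containing a described intron (rDNA)",
    "R": "genomic rRNA sequence without the intron(s)",
    "U": "no intron described, but intron(s) may be present",
    "UC": "introns detected in silico and removed (putative rRNA)",
}


def select_references(records, gene="18S_rRNA"):
    """Keep records flagged as reference sequences (value 1) for the given gene."""
    return [
        r for r in records
        if r["gene"] == gene and r["reference_sequence"] == 1
    ]


# Invented rows for illustration only; these are not real PR2 entries.
# Assumption: non-reference sequences carry an empty value (None here).
records = [
    {"pr2_accession": "EXAMPLE_1", "gene": "18S_rRNA", "label": "U", "reference_sequence": 1},
    {"pr2_accession": "EXAMPLE_2", "gene": "16S_rRNA", "label": "U", "reference_sequence": 1},
    {"pr2_accession": "EXAMPLE_3", "gene": "18S_rRNA", "label": "UC", "reference_sequence": None},
]

selected = select_references(records)
for r in selected:
    print(r["pr2_accession"], "-", LABEL_MEANINGS.get(r["label"], "unknown label"))
```

The same filter can be applied to plastid sequences by passing gene="16S_rRNA".
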
Daniel Vaulot
CNRS, France

Focusing on marine (pico)phytoplankton.