Database fields
The pr2database package provides two data frames, each returned by a function:
- pr2_database(): the main PR2 reference database containing both 18S rRNA and plastid 16S rRNA sequences
- pr2_taxonomy(): the PR2 taxonomy table
pr2_database()
The PR2 reference database is provided as a data frame returned by the function pr2database::pr2_database(). It is a join between the following tables:
- pr2_main
- pr2_taxonomy
- pr2_sequence
- pr2_metadata
- pr2_countries
- pr2_traits
- pr2_assign_silva
- eukribo_v2
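As a rough illustration of what such a join looks like, here is a minimal base R sketch using toy versions of three of the tables. The key columns (pr2_accession and species), the use of left joins, and all values are assumptions made for illustration; the actual join keys are not stated here and may differ.

```r
# Toy stand-ins for three of the tables; columns and values are made up.
pr2_main <- data.frame(
  pr2_accession = c("TOY0001.1.1800_U", "TOY0002.1.1750_U"),
  gene          = c("18S_rRNA", "18S_rRNA"),
  species       = c("species_a", "species_b"),
  stringsAsFactors = FALSE
)
pr2_sequence <- data.frame(
  pr2_accession = c("TOY0001.1.1800_U", "TOY0002.1.1750_U"),
  sequence      = c("ACGTACGT", "TTGACCA"),
  stringsAsFactors = FALSE
)
pr2_taxonomy <- data.frame(
  species = c("species_a", "species_b"),
  genus   = c("genus_a", "genus_b"),
  stringsAsFactors = FALSE
)

# Left joins (all.x = TRUE) keep every row of the main table.
# The keys used here are assumptions, not the documented PR2 keys.
pr2 <- merge(pr2_main, pr2_sequence, by = "pr2_accession", all.x = TRUE)
pr2 <- merge(pr2, pr2_taxonomy, by = "species", all.x = TRUE)
```

In practice the joined data frame is obtained directly from pr2database::pr2_database(), so no manual join is needed.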
The metadata contains several types of fields:
- gb_ : originating from the GenBank entry
- eukref_ : annotated by the Eukref project
- pr2_ : annotated by PR2, such as latitude and longitude
- eukribo_ : from the EukRibo database
- silva_ : from the Silva database
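Because the origin of each metadata field is encoded in its prefix, columns can be selected by name. The following is a minimal base R sketch on a toy data frame that stands in for the result of pr2database::pr2_database(); the rows and values are made up, and only a few of the documented columns are included.

```r
# Toy stand-in for the data frame returned by pr2database::pr2_database();
# values are invented for illustration.
pr2 <- data.frame(
  pr2_accession  = c("TOY0001.1.1800_U", "TOY0002.1.1750_U"),
  gb_organism    = c("Toy organism A", "Toy organism B"),
  gb_country     = c("France", NA),
  eukref_source  = c("Isolate", "Environmental"),
  pr2_latitude   = c(48.7, NA),
  pr2_longitude  = c(-3.9, NA),
  silva_taxonomy = c(NA, NA),
  stringsAsFactors = FALSE
)

# Group column names by prefix (the text before the first underscore).
field_prefix <- sub("_.*$", "", names(pr2))
fields_by_source <- split(names(pr2), field_prefix)

# Keep the accession plus every GenBank-derived column.
gb_cols <- names(pr2)[startsWith(names(pr2), "gb_")]
pr2_gb <- pr2[, c("pr2_accession", gb_cols)]
```

Note that grouping by prefix also places pr2_accession under pr2, although it is an identifier rather than an annotated metadata field.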
Detailed description of fields
| Fields | Comment |
|---|---|
| pr2_accession | PR2 specific accession number |
| genbank_accession | Genbank accession number (without the version) |
| start | Start of sequence in Genbank entry |
| end | End of sequence in Genbank entry |
| label | Label explaining origin of sequence. G: genomic sequence containing a described intron (rDNA); R: the previous genomic rRNA sequence, without the intron(s); U: no intron described, but intron(s) may be present; UC: introns were detected in silico and removed from the sequence (putative rRNA) |
| gene | 18S_rRNA or 16S_rRNA |
| organelle | nucleus, plastid, mitochondria, nucleomorph, apicoplast (left empty for cyanobacteria) |
| reference_sequence | Set to 1 if this is a reference sequence that can be used, for example, for alignments |
| added_version | PR2 version when sequence was added |
| remark | Remark concerning the sequence |
| domain | rank 1 |
| supergroup | rank 2 |
| division | rank 3 |
| subdivision | rank 4 |
| class | rank 5 |
| order | rank 6 |
| family | rank 7 |
| genus | rank 8 |
| species | Assigned species - rank 9 |
| reference | Reference in the literature concerning the taxonomy |
| sequence | Sequence |
| sequence_length | Length of sequence |
| ambiguities | Number of ambiguities |
| sequence_hash | Hash value of sequence |
| gb_date | Genbank: Date |
| gb_locus | Genbank: Locus |
| gb_definition | Genbank: Definition |
| gb_organism | Genbank: Organism |
| gb_taxonomy | Genbank: Taxonomy |
| gb_strain | Genbank: Strain |
| gb_culture_collection | Genbank: Culture Collection |
| gb_clone | Genbank: Clone |
| gb_isolate | Genbank: Isolate |
| gb_isolation_source | Genbank: Isolation Source |
| gb_specimen_voucher | Genbank: Voucher |
| gb_host | Genbank: Host |
| gb_collection_date | Genbank: Date of Collection |
| gb_environmental_sample | Genbank: Environmental Sample |
| gb_country | Genbank: Country |
| gb_lat_lon | Genbank: Latitude and longitude |
| gb_collected_by | Genbank: Collected by |
| gb_note | Genbank: Note |
| gb_references | Genbank: Full references not parsed |
| gb_publication | Genbank: Publication |
| gb_authors | Genbank: Authors |
| gb_journal | Genbank: Journal |
| pubmed_id | Genbank: Pubmed ID |
| eukref_name | Eukref: Name used in EukRef, usually either the species name or the clone name |
| eukref_source | Eukref: Source of the sequence: Isolate or Environmental |
| eukref_env_material | Eukref: uses ENVO keywords |
| eukref_env_biome | Eukref: uses ENVO keywords |
| eukref_biotic_relationship | Eukref: e.g. parasite |
| eukref_specific_host | Eukref: Specific Host annotated |
| eukref_geo_loc_name | Eukref: Location name annotated |
| eukref_notes | Eukref: Notes made during Eukref annotation |
| pr2_sample_type | PR2: e.g. culture, isolate, environmental, unknown |
| pr2_sample_method | PR2: e.g. filtration, flow cytometry sorting |
| pr2_latitude | PR2: Parsed from GenBank entry |
| pr2_longitude | PR2: Parsed from GenBank entry |
| pr2_ocean | PR2: e.g. Arctic Ocean |
| pr2_sea | PR2: e.g. North Sea |
| pr2_sea_lat | PR2: latitude of sea or ocean |
| pr2_sea_lon | PR2: longitude of sea or ocean |
| pr2_continent | PR2: e.g. Asia |
| pr2_country | PR2: e.g. France |
| pr2_country_geocode | PR2: 2-letter code from GeoNames, e.g. FR |
| pr2_country_lat | PR2: latitude of country |
| pr2_country_lon | PR2: longitude of country |
| pr2_location | PR2: from gb_country field - e.g. Paris, France |
| pr2_location_geoname | PR2: e.g. Paris |
| pr2_location_geotype | PR2: e.g. bay |
| pr2_location_lat | PR2: latitude of location |
| pr2_location_lon | PR2: longitude of location |
| pr2_sequence_origin | PR2: clone library, metabarcode, PCR |
| pr2_size_fraction | PR2: Name of size fraction, e.g. pico, nano |
| pr2_size_fraction_min | PR2: Minimum size filtered, e.g. 0.2 µm |
| pr2_size_fraction_max | PR2: Maximum size filtered, e.g. 20 µm |
| mixoplankton | From the Mixoplankton database (MDB). CM: Constitutive Mixoplankton; GNCM: Generalist Non-Constitutive Mixoplankton; pSNCM: plastidic Specialist Non-Constitutive Mixoplankton; eSNCM: endosymbiotic Specialist Non-Constitutive Mixoplankton |
| metadata_remark | PR2: Any remark on metadata |
| eukribo_UniEuk_taxonomy_string | Taxonomy assignment from EukRibo database |
| eukribo_V4 | Information about presence and completeness of the V4 region from EukRibo database |
| eukribo_V9 | Information about presence and completeness of the V9 region from EukRibo database |
| silva_taxonomy | Taxonomy from Silva version 138 |
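The fields above can be combined to subset the database and build taxonomy strings. The sketch below uses a toy data frame with a few of the documented columns; all values are placeholders, and in real use the data frame would come from pr2database::pr2_database(). It keeps nuclear 18S rRNA sequences, picks out reference sequences, and assembles FASTA-style records whose header carries the nine taxonomic ranks. The header layout is an illustrative choice, not a format defined by PR2.

```r
# Toy stand-in for pr2database::pr2_database(); values are placeholders.
pr2 <- data.frame(
  pr2_accession      = c("TOY0001.1.1800_U", "TOY0002.1.1500_U", "TOY0003.1.1750_U"),
  gene               = c("18S_rRNA", "16S_rRNA", "18S_rRNA"),
  organelle          = c("nucleus", "plastid", "nucleus"),
  reference_sequence = c(1, NA, NA),
  sequence           = c("ACGTACGT", "GGCATTA", "TTGACCA"),
  stringsAsFactors = FALSE
)

# The nine taxonomic ranks, in order (rank 1 to rank 9).
ranks <- c("domain", "supergroup", "division", "subdivision", "class",
           "order", "family", "genus", "species")
# Fill each rank column with placeholder names such as "domain_a".
for (r in ranks) pr2[[r]] <- paste0(r, "_", c("a", "b", "c"))

# Nuclear 18S rRNA sequences only; which() drops rows where the test is NA.
pr2_18s <- pr2[which(pr2$gene == "18S_rRNA" & pr2$organelle == "nucleus"), ]

# Reference sequences are flagged with reference_sequence equal to 1.
pr2_refs <- pr2[which(pr2$reference_sequence == 1), ]

# One semicolon-separated taxonomy string per sequence.
tax_string <- do.call(paste, c(pr2_18s[ranks], sep = ";"))

# FASTA-style records: header line, then the sequence.
fasta <- paste0(">", pr2_18s$pr2_accession, " ", tax_string, "\n", pr2_18s$sequence)
writeLines(fasta)
```

Wrapping the conditions in which() matters because columns such as reference_sequence can contain missing values, and logical indexing with NA would otherwise introduce empty rows.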