Only queries galah for taxa not already in `taxonomy_file`. Can return a list covering several levels of the taxonomic hierarchy, with the 'best' match at each level. For example, if `"genus"` is provided in `needed_ranks`, the returned list will have an element `genus`. That element contains, in a column named `taxa`, the best result at genus level or higher for each of the original names provided. A higher rank is used only where no genus-level match was available.
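The 'best match at that level, or higher' rule can be pictured with a minimal R sketch. The sketch assumes that a single taxon's hierarchy is held as a named list. It uses a simplified rank vector in place of `envClean::lurank`. `best_at_rank()` is a hypothetical helper, not part of envClean, and the package's own implementation may differ.

```r
# Hypothetical helper, not part of envClean: return the name at `needed_rank`,
# or at the nearest higher rank that has a value.
# `ranks` is a simplified stand-in for envClean::lurank (highest rank first).
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus",
           "species", "subspecies")

best_at_rank <- function(hierarchy, needed_rank, ranks) {
  # needed_rank and every rank above it, ordered lowest first
  candidates <- rev(ranks[seq_len(match(needed_rank, ranks))])
  for (r in candidates) {
    value <- hierarchy[[r]]
    if (!is.null(value) && !is.na(value)) {
      return(list(returned_rank = r, taxa = value))
    }
  }
  # nothing available at or above needed_rank
  list(returned_rank = NA_character_, taxa = NA_character_)
}

# a genus-level match has no species, so a species request falls back to genus
eucalyptus <- list(kingdom = "Plantae", genus = "Eucalyptus", species = NA)
best_at_rank(eucalyptus, "species", ranks)
```

For this hierarchy the sketch reports `returned_rank` as `"genus"`. That mirrors the Eucalyptus row in the species-level `lutaxa` in the examples.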
make_taxonomy(
  df = NULL,
  taxa_col = "original_name",
  taxonomy_file = tempfile(),
  force_new = list(original_name = NULL, timediff = as.difftime(26, units = "weeks")),
  remove_taxa = c("bold:", "BOLD:", "unverified", "annual herb", "annual grass",
    "incertae sedis", "\\?", "another species", "not naturalised in SA",
    "unidentified", "unverified", "annual tussock grass", "*no id", "spec\\."),
  remove_strings = c("\\s\\-\\-\\s.*", "\\ssp\\.$", "\\sssp\\.$",
    "\\sspec\\.$", "dead"),
  not_names = c("sp", "ssp", "var", "subsp", "subspecies", "form", "race", "nov", "aff",
    "cf", "lineage", "group", "et", "al", "and", "pl", "revised", "nov"),
  tri_strings = c("\\sssp\\s", "\\sssp\\.\\s", "\\svar\\s",
    "\\svar\\.\\s", "\\ssubsp\\.", "\\ssubspecies", "\\sform\\)",
    "\\sform\\s", "\\sf\\.", "\\srace\\s", "\\srace\\)",
    "\\sp\\.v\\."),
  atlas = c("Australia"),
  tweak_species = TRUE,
  return_taxonomy = TRUE,
  limit = TRUE,
  needed_ranks = c("species"),
  overrides = NULL
)
`df`: Dataframe with `taxa_col`. Can be `NULL` only if `taxonomy_file` already exists.
`taxa_col`: Character or index. Name or index of the column with taxa names. Each unique taxon in this column will be queried against `galah::search_taxa()` and will appear in the results list element `lutaxa` in a column called `original_name`.
`taxonomy_file`: Character. File path to save results to. The file type is ignored; results are saved as a .parquet file.
`force_new`: List with element `timediff` and any column name from `taxonomy_file`. If `taxonomy_file` already exists, a column name may appear in both `force_new` and `taxonomy_file`. For any such column, the levels within that column that match `force_new` will be requeried. Likewise, any `original_name` that has not been searched within the last `timediff` will be requeried. Set either to `NULL` to ignore it.
`remove_taxa`: Character. Regular expressions. Rows whose `taxa_col` value matches any of `remove_taxa` are removed (the whole row is removed).
`remove_strings`: Character. Text that matches `remove_strings` is removed from `taxa_col` before searching (the text, not the row, is removed).
`not_names`: Character. Text that matches `not_names` is removed from `original_name` as non-name words. This happens before a word count is used to guess whether `original_name` is trinomial (the `original_is_tri` field in `lutaxa`).
`tri_strings`: Character. Text that matches `tri_strings` is used to indicate that `original_name` is trinomial (the `original_is_tri` field in output `lutaxa`).
`atlas`: Character. Name of the galah atlas to use.
`return_taxonomy`: Logical. If `TRUE`, a list is returned containing the best match for each `original_name` in `lutaxa`. It also contains additional elements, each named for its rank (see `envClean::lurank`), with unique rows for that rank. There is one element per rank provided in `needed_ranks`.
`limit`: Logical. If `TRUE`, the returned list will be limited to those `original_name`s in `df`.
`needed_ranks`: Character vector of ranks required in the returned list. Can be `"all"` or any combination of ranks from `envClean::lurank` greater than or equal to subspecies.
`overrides`: Used to override results returned by `galah::search_taxa()`. Dataframe with (at least) the columns `taxa_col` and `taxa_to_search`. It can also contain any number of `use_x` columns, where `x` is any of kingdom, phylum, class, order, family, genus, species, subspecies, variety and form. A two-step process then attempts to find better results than if searched on `taxa_col`. Step 1 searches for `taxa_to_search` instead of `taxa_col`. If any `use_x` columns are present, step 2 checks that the results from step 1 have a result at `x`. If not, level `x` results will be taken from `use_x`.
`tweak_species`: Logical. If `TRUE` (default) and the returned `species` value ends in a full stop, the value in the `species` column is taken directly from the `scientific_name` column. See details.
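The difference between `remove_taxa` (whole rows dropped) and `remove_strings` (only the matching text dropped) can be illustrated with a minimal base R sketch. It uses a subset of the default patterns and made-up names. It also assumes the patterns are applied with `grepl()` and `gsub()`. Using `perl = TRUE` is a choice made here, not necessarily what envClean does.

```r
# Hypothetical sketch, not envClean's code: remove_taxa drops rows,
# remove_strings edits text. Only a subset of the default patterns is used.
remove_taxa <- c("unidentified", "incertae sedis", "\\?")
remove_strings <- c("\\s\\-\\-\\s.*", "\\ssp\\.$", "dead")

taxa <- c("Eucalyptus sp.", "unidentified shrub", "Acacia ?",
          "Melithreptus gularis -- honeyeater")

# remove_taxa: drop any name (row) matching one of the patterns
taxa <- taxa[!grepl(paste(remove_taxa, collapse = "|"), taxa, perl = TRUE)]

# remove_strings: delete only the matching text; the name is kept
for (pattern in remove_strings) {
  taxa <- gsub(pattern, "", taxa, perl = TRUE)
}
taxa
```

Here "unidentified shrub" and "Acacia ?" disappear entirely. The other two names survive, with " sp." and " -- honeyeater" trimmed.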
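How `force_new` decides what to requery might be sketched as below. The sketch assumes that each previously searched name carries a search time in a `stamp` column, as in the `raw` output in the examples. It also assumes that 'not searched within `timediff`' means older than `timediff` relative to now. `needs_requery()` and the data are hypothetical; this is not envClean's implementation.

```r
# Hypothetical sketch, not envClean's code. `lutaxa` stands in for the saved
# taxonomy_file; `stamp` is assumed to hold the time each name was searched.
needs_requery <- function(lutaxa, force_new, now = Sys.time()) {
  requery <- rep(FALSE, nrow(lutaxa))
  # column matches: requery rows whose value is listed in force_new
  for (col in intersect(names(force_new), names(lutaxa))) {
    if (!is.null(force_new[[col]])) {
      requery <- requery | lutaxa[[col]] %in% force_new[[col]]
    }
  }
  # age: requery names searched longer ago than timediff
  if (!is.null(force_new$timediff)) {
    requery <- requery | (now - lutaxa$stamp) > force_new$timediff
  }
  as.vector(requery)
}

lutaxa <- data.frame(
  original_name = c("Eucalyptus", "Thinornis cucullatus", "Gehyra montium"),
  stamp = as.POSIXct(c("2024-05-01", "2023-06-01", "2024-05-20"), tz = "UTC")
)
now <- as.POSIXct("2024-06-01", tz = "UTC")

# default-style force_new: only the name searched over 26 weeks ago is requeried
needs_requery(lutaxa,
              list(original_name = NULL, timediff = as.difftime(26, units = "weeks")),
              now = now)

# force one name regardless of its age
needs_requery(lutaxa, list(original_name = "Gehyra montium", timediff = NULL), now = now)
```

Comparing two `difftime` values works across units (days against weeks here), so `timediff` can be given in whatever units are convenient.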
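Step 2 of the `overrides` process can be sketched as a fill-in of missing levels. This minimal version makes three assumptions:

- Step 1 has already produced one row of results per `original_name`.
- The overrides are keyed on `original_name`, the name `taxa_col` takes in outputs.
- `taxa_to_search` can be left out, because the search itself is not shown.

`fill_from_use_cols()` and the data are hypothetical.

```r
# Hypothetical sketch of step 2 of `overrides`, not envClean's code.
# `step1` stands in for results after searching taxa_to_search (step 1);
# the data are made up for illustration.
fill_from_use_cols <- function(results, overrides, key = "original_name") {
  idx <- match(results[[key]], overrides[[key]])
  for (use_col in grep("^use_", names(overrides), value = TRUE)) {
    x <- sub("^use_", "", use_col)
    if (!x %in% names(results)) results[[x]] <- NA_character_
    replacement <- overrides[[use_col]][idx]
    # only fill levels where step 1 gave no result
    fill <- is.na(results[[x]]) & !is.na(replacement)
    results[[x]][fill] <- replacement[fill]
  }
  results
}

step1 <- data.frame(
  original_name = c("Charadrius rubricollis", "Galaxias sp. nov. Hunter"),
  genus = c("Thinornis", NA),
  species = c("Thinornis cucullatus", NA),
  stringsAsFactors = FALSE
)
overrides <- data.frame(
  original_name = c("Charadrius rubricollis", "Galaxias sp. nov. Hunter"),
  use_genus = c("Charadrius", "Galaxias"),
  stringsAsFactors = FALSE
)
fill_from_use_cols(step1, overrides)
```

The first row keeps its step 1 genus (Thinornis) because a result already exists at that level. Only the missing genus in the second row is taken from `use_genus`.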
`NULL` or list (depending on `return_taxonomy`). Writes `taxonomy_file`. In all outputs, `taxa_col` is renamed `original_name`. Note that `taxa_col`, as `original_name`, will have any quotes removed.
If a list, then elements:
raw - the 'raw' results returned from `galah::search_taxa()`, tweaked as follows:
  - column `rank` is an ordered factor as per `envClean::lurank`;
  - `rank_adj` is a new column that reflects the `rank` column, except that any rank lower than subspecies becomes subspecies;
  - `original_is_tri` is a new column.
needed_ranks - one element, named for its rank (e.g. `genus`), for each rank specified in `needed_ranks`. Below, `needed_rank` refers to that element's rank. Each element is a list with:
  lutaxa - dataframe. For each unique name in `taxa_col`, the best taxonomic bin (`taxa`) to use, taking into account each level of `needed_ranks`. Columns:
    original_name - unique values from `taxa_col`
    match_type - directly from `galah::search_taxa()`
    matched_rank - `rank` column from `galah::search_taxa()`
    returned_rank - the rank of the `taxa` returned for each `original_name`. This will never be lower than `needed_rank`. It may be higher than `needed_rank` if no match was available at `needed_rank`. Use this 'rank' to filter bins in a cleaning workflow.
    taxa - the best taxon available for `original_name` at `needed_rank`, possibly taking into account `overrides`
    override - is `taxa` the result of an override?
    original_is_tri - Experimental. Is `original_name` a trinomial? This flags cases where the matched rank is higher than subspecies but `original_name` is probably a subspecies. The guess is based on a word count after removing:
      - `not_names`;
      - numbers;
      - punctuation;
      - capitalised words that are not the first word;
      - single-letter 'words'.
    A match to `tri_strings` overrides the guess, flagging `TRUE`. Note that this is only an (informed) guess at whether `original_name` is trinomial.
  taxonomy - dataframe. For each `taxa` in `lutaxa`, a row of taxonomic hierarchy.
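The `original_is_tri` guess described above can be sketched in base R. This is an illustration only: `guess_is_tri()` is hypothetical. The sketch assumes that:

- three or more remaining words means a trinomial;
- `tri_strings` are checked against the untouched name;
- `not_names` are compared as whole, case-insensitive words.

envClean's own rules may differ, so its output will not always agree with this sketch.

```r
# Hypothetical sketch of the original_is_tri guess; not envClean's code.
# Defaults taken from the make_taxonomy() usage above.
guess_is_tri <- function(original_name,
                         not_names = c("sp", "ssp", "var", "subsp", "subspecies",
                                       "form", "race", "nov", "aff", "cf",
                                       "lineage", "group", "et", "al", "and",
                                       "pl", "revised"),
                         tri_strings = c("\\sssp\\s", "\\sssp\\.\\s", "\\svar\\s",
                                         "\\svar\\.\\s", "\\ssubsp\\.",
                                         "\\ssubspecies", "\\sform\\)",
                                         "\\sform\\s", "\\sf\\.", "\\srace\\s",
                                         "\\srace\\)", "\\sp\\.v\\.")) {
  vapply(original_name, function(x) {
    # any tri_strings match overrides the word-count guess
    if (any(vapply(tri_strings, grepl, logical(1), x = x))) return(TRUE)
    # replace numbers and punctuation with spaces, then split into words
    words <- strsplit(trimws(gsub("[[:punct:][:digit:]]", " ", x)), "\\s+")[[1]]
    # drop non-name words such as "sp" or "revised"
    words <- words[!tolower(words) %in% not_names]
    # drop capitalised words other than the first (e.g. collector names)
    words <- words[seq_along(words) == 1 | !grepl("^[A-Z]", words)]
    # drop single-letter 'words'
    words <- words[nchar(words) > 1]
    length(words) >= 3
  }, logical(1), USE.NAMES = FALSE)
}

guess_is_tri(c("Eucalyptus viminalis",
               "Melithreptus gularis laetior",
               "Spyridium eriocephalum var. glabrisepalum",
               "Gehyra montium (revised)"))
```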
If `tweak_species = TRUE`, the `galah::search_taxa()` result in the `species` column is replaced with the result in the `scientific_name` column wherever the `species` value ends in a full stop. This attempts to deal with instances where `galah::search_taxa()` returns odd results in `species` but good results in `scientific_name`. For example, `galah::search_taxa("Acacia sp. Small Red-leaved Wattle (J.B.Williams 95033)")` returns `spec.` in the `species` column. It returns `Acacia sp. Small Red-leaved Wattle (J.B.Williams 95033)` in the `scientific_name` column.
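The `tweak_species` replacement can be sketched as follows. The sketch assumes the raw results are a dataframe with `species` and `scientific_name` columns, as described above. `tweak_species_col()` is a hypothetical helper, not an envClean function, and the second row of data is made up.

```r
# Hypothetical helper, not an envClean function: where `species` ends in a
# full stop, replace it with the value from `scientific_name`.
tweak_species_col <- function(raw) {
  odd <- !is.na(raw$species) & grepl("\\.$", raw$species)
  raw$species[odd] <- raw$scientific_name[odd]
  raw
}

raw <- data.frame(
  scientific_name = c("Acacia sp. Small Red-leaved Wattle (J.B.Williams 95033)",
                      "Eucalyptus viminalis"),
  species = c("spec.", "Eucalyptus viminalis"),
  stringsAsFactors = FALSE
)
tweak_species_col(raw)$species
```

Only the first row changes, because only its `species` value ("spec.") ends in a full stop.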
The previous `envClean::make_taxonomy()` function is still available via `envClean::make_gbif_taxonomy()`.
# setup
library("envClean")
temp_file <- tempfile()
taxa_df <- tibble::tibble(taxa = c("Charadrius rubricollis"
, "Thinornis cucullatus"
, "Melithreptus gularis laetior"
, "Melithreptus gularis gularis"
, "Eucalyptus viminalis"
, "Eucalyptus viminalis cygnetensis"
, "Eucalyptus"
, "Charadrius mongolus all subspecies"
, "Bettongia lesueur Barrow and Boodie Islands subspecies"
, "Lagorchestes hirsutus Central Australian subspecies"
, "Perameles gunnii Victorian subspecies"
, "Pterostylis sp. Rock ledges (pl. 185, Bates & Weber 1990)"
, "Spyridium glabrisepalum"
, "Spyridium eriocephalum var. glabrisepalum"
, "Petrogale lateralis (MacDonnell Ranges race)"
, "Gehyra montium (revised)"
, "Korthalsella japonica f. japonica"
, "Galaxias sp. nov. 'Hunter'"
, "Not a taxa"
)
)
# make taxonomy (returns list and writes taxonomy_file)
taxonomy <- make_taxonomy(df = taxa_df
, taxa_col = "taxa"
, taxonomy_file = temp_file
, needed_ranks = c("kingdom", "genus", "species", "subspecies")
)
#> Joining with `by = join_by(original_name)`
#> Matched 17 of 19 taxonomic search terms in selected atlas (Australia).
#> 2 unmatched search terms:
#> • "Galaxias sp. nov. Hunter", "Not a taxa"
#>
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c801f265321.parquet
#> The following were completely unmatched: Galaxias sp. nov. Hunter and Not a taxa. Consider providing more taxonomic levels, or an override, for each unmatched taxa?
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
taxonomy$raw
#> # A tibble: 19 × 19
#> original_name search_term scientific_name scientific_name_auth…¹
#> <chr> <chr> <chr> <chr>
#> 1 Bettongia lesueur Barrow … Bettongia … Bettongia lesu… NA
#> 2 Charadrius mongolus all s… Charadrius… Charadrius (Ch… Pallas, 1776
#> 3 Charadrius rubricollis Charadrius… Phalaropus lob… (Linnaeus, 1758)
#> 4 Eucalyptus Eucalyptus Eucalyptus L'Hér.
#> 5 Eucalyptus viminalis Eucalyptus… Eucalyptus vim… Labill.
#> 6 Eucalyptus viminalis cygn… Eucalyptus… Eucalyptus vim… Boomsma
#> 7 Galaxias sp. nov. Hunter Galaxias s… NA NA
#> 8 Gehyra montium (revised) Gehyra mon… Gehyra montium Storr, 1982
#> 9 Korthalsella japonica f. … Korthalsel… Korthalsella j… (Thunb.) Engl.
#> 10 Lagorchestes hirsutus Cen… Lagorchest… Lagorchestes h… Gould, 1844
#> 11 Melithreptus gularis gula… Melithrept… Melithreptus (… (Gould, 1837)
#> 12 Melithreptus gularis laet… Melithrept… Melithreptus (… Gould, 1875
#> 13 Not a taxa Not a taxa NA NA
#> 14 Perameles gunnii Victoria… Perameles … Perameles gunn… NA
#> 15 Petrogale lateralis (MacD… Petrogale … Petrogale late… Gould, 1842
#> 16 Pterostylis sp. Rock ledg… Pterostyli… Pterostylis R.Br.
#> 17 Spyridium eriocephalum va… Spyridium … Spyridium erio… J.M.Black
#> 18 Spyridium glabrisepalum Spyridium … Spyridium erio… J.M.Black
#> 19 Thinornis cucullatus Thinornis … Thinornis cucu… (Vieillot, 1818)
#> # ℹ abbreviated name: ¹scientific_name_authorship
#> # ℹ 15 more variables: taxon_concept_id <chr>, rank <ord>, match_type <chr>,
#> # kingdom <chr>, phylum <chr>, class <chr>, order <chr>, family <chr>,
#> # genus <chr>, species <chr>, vernacular_name <chr>, stamp <dttm>,
#> # subspecies <chr>, rank_adj <ord>, original_is_tri <lgl>
taxonomy$kingdom
#> $lutaxa
#> # A tibble: 17 × 6
#> original_name match_type matched_rank returned_rank taxa original_is_tri
#> <chr> <chr> <ord> <ord> <chr> <lgl>
#> 1 Bettongia lesueu… exactMatch subspecies kingdom Anim… TRUE
#> 2 Charadrius mongo… higherMat… species kingdom Anim… FALSE
#> 3 Charadrius rubri… exactMatch species kingdom Anim… FALSE
#> 4 Eucalyptus exactMatch genus kingdom Plan… FALSE
#> 5 Eucalyptus vimin… exactMatch species kingdom Plan… FALSE
#> 6 Eucalyptus vimin… exactMatch subspecies kingdom Plan… TRUE
#> 7 Gehyra montium (… canonical… species kingdom Anim… FALSE
#> 8 Korthalsella jap… higherMat… species kingdom Plan… TRUE
#> 9 Lagorchestes hir… canonical… species kingdom Anim… TRUE
#> 10 Melithreptus gul… exactMatch subspecies kingdom Anim… TRUE
#> 11 Melithreptus gul… exactMatch subspecies kingdom Anim… TRUE
#> 12 Perameles gunnii… exactMatch subspecies kingdom Anim… TRUE
#> 13 Petrogale latera… canonical… species kingdom Anim… TRUE
#> 14 Pterostylis sp. … exactMatch genus kingdom Plan… FALSE
#> 15 Spyridium erioce… exactMatch variety kingdom Plan… TRUE
#> 16 Spyridium glabri… exactMatch variety kingdom Plan… TRUE
#> 17 Thinornis cucull… exactMatch species kingdom Anim… FALSE
#>
#> $taxonomy
#> # A tibble: 2 × 2
#> taxa kingdom
#> <chr> <chr>
#> 1 Animalia Animalia
#> 2 Plantae Plantae
#>
taxonomy$genus
#> $lutaxa
#> # A tibble: 17 × 6
#> original_name match_type matched_rank returned_rank taxa original_is_tri
#> <chr> <chr> <ord> <ord> <chr> <lgl>
#> 1 Bettongia lesueu… exactMatch subspecies genus Bett… TRUE
#> 2 Charadrius mongo… higherMat… species genus Char… FALSE
#> 3 Charadrius rubri… exactMatch species genus Phal… FALSE
#> 4 Eucalyptus exactMatch genus genus Euca… FALSE
#> 5 Eucalyptus vimin… exactMatch species genus Euca… FALSE
#> 6 Eucalyptus vimin… exactMatch subspecies genus Euca… TRUE
#> 7 Gehyra montium (… canonical… species genus Gehy… FALSE
#> 8 Korthalsella jap… higherMat… species genus Kort… TRUE
#> 9 Lagorchestes hir… canonical… species genus Lago… TRUE
#> 10 Melithreptus gul… exactMatch subspecies genus Meli… TRUE
#> 11 Melithreptus gul… exactMatch subspecies genus Meli… TRUE
#> 12 Perameles gunnii… exactMatch subspecies genus Pera… TRUE
#> 13 Petrogale latera… canonical… species genus Petr… TRUE
#> 14 Pterostylis sp. … exactMatch genus genus Pter… FALSE
#> 15 Spyridium erioce… exactMatch variety genus Spyr… TRUE
#> 16 Spyridium glabri… exactMatch variety genus Spyr… TRUE
#> 17 Thinornis cucull… exactMatch species genus Thin… FALSE
#>
#> $taxonomy
#> # A tibble: 13 × 7
#> taxa kingdom phylum class order family genus
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Bettongia Animalia Chordata Mammalia Diprotodontia Potoroi… Bett…
#> 2 Charadrius Animalia Chordata Aves Charadriiformes Charadr… Char…
#> 3 Phalaropus Animalia Chordata Aves Charadriiformes Scolopa… Phal…
#> 4 Eucalyptus Plantae Charophyta Equisetopsida Myrtales Myrtace… Euca…
#> 5 Gehyra Animalia Chordata Reptilia Squamata Gekkoni… Gehy…
#> 6 Korthalsella Plantae Charophyta Equisetopsida Santalales Santala… Kort…
#> 7 Lagorchestes Animalia Chordata Mammalia Diprotodontia Macropo… Lago…
#> 8 Melithreptus Animalia Chordata Aves Passeriformes Melipha… Meli…
#> 9 Perameles Animalia Chordata Mammalia Peramelemorphia Peramel… Pera…
#> 10 Petrogale Animalia Chordata Mammalia Diprotodontia Macropo… Petr…
#> 11 Pterostylis Plantae Charophyta Equisetopsida Asparagales Orchida… Pter…
#> 12 Spyridium Plantae Charophyta Equisetopsida Rosales Rhamnac… Spyr…
#> 13 Thinornis Animalia Chordata Aves Charadriiformes Charadr… Thin…
#>
taxonomy$species
#> $lutaxa
#> # A tibble: 17 × 6
#> original_name match_type matched_rank returned_rank taxa original_is_tri
#> <chr> <chr> <ord> <ord> <chr> <lgl>
#> 1 Bettongia lesueu… exactMatch subspecies species Bett… TRUE
#> 2 Charadrius mongo… higherMat… species species Char… FALSE
#> 3 Charadrius rubri… exactMatch species species Phal… FALSE
#> 4 Eucalyptus exactMatch genus genus Euca… FALSE
#> 5 Eucalyptus vimin… exactMatch species species Euca… FALSE
#> 6 Eucalyptus vimin… exactMatch subspecies species Euca… TRUE
#> 7 Gehyra montium (… canonical… species species Gehy… FALSE
#> 8 Korthalsella jap… higherMat… species species Kort… TRUE
#> 9 Lagorchestes hir… canonical… species species Lago… TRUE
#> 10 Melithreptus gul… exactMatch subspecies species Meli… TRUE
#> 11 Melithreptus gul… exactMatch subspecies species Meli… TRUE
#> 12 Perameles gunnii… exactMatch subspecies species Pera… TRUE
#> 13 Petrogale latera… canonical… species species Petr… TRUE
#> 14 Pterostylis sp. … exactMatch genus genus Pter… FALSE
#> 15 Spyridium erioce… exactMatch variety species Spyr… TRUE
#> 16 Spyridium glabri… exactMatch variety species Spyr… TRUE
#> 17 Thinornis cucull… exactMatch species species Thin… FALSE
#>
#> $taxonomy
#> # A tibble: 14 × 8
#> taxa kingdom phylum class order family genus species
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Bettongia lesueur Animalia Chordata Mammal… Dipr… Potor… Bett… Betton…
#> 2 Charadrius mongolus Animalia Chordata Aves Char… Chara… Char… Charad…
#> 3 Phalaropus lobatus Animalia Chordata Aves Char… Scolo… Phal… Phalar…
#> 4 Eucalyptus Plantae Charophyta Equise… Myrt… Myrta… Euca… NA
#> 5 Eucalyptus viminalis Plantae Charophyta Equise… Myrt… Myrta… Euca… Eucaly…
#> 6 Gehyra montium Animalia Chordata Reptil… Squa… Gekko… Gehy… Gehyra…
#> 7 Korthalsella japonica Plantae Charophyta Equise… Sant… Santa… Kort… Kortha…
#> 8 Lagorchestes hirsutus Animalia Chordata Mammal… Dipr… Macro… Lago… Lagorc…
#> 9 Melithreptus gularis Animalia Chordata Aves Pass… Melip… Meli… Melith…
#> 10 Perameles gunnii Animalia Chordata Mammal… Pera… Peram… Pera… Perame…
#> 11 Petrogale lateralis Animalia Chordata Mammal… Dipr… Macro… Petr… Petrog…
#> 12 Pterostylis Plantae Charophyta Equise… Aspa… Orchi… Pter… NA
#> 13 Spyridium eriocephalum Plantae Charophyta Equise… Rosa… Rhamn… Spyr… Spyrid…
#> 14 Thinornis cucullatus Animalia Chordata Aves Char… Chara… Thin… Thinor…
#>
taxonomy$subspecies
#> $lutaxa
#> # A tibble: 17 × 6
#> original_name match_type matched_rank returned_rank taxa original_is_tri
#> <chr> <chr> <ord> <ord> <chr> <lgl>
#> 1 Bettongia lesueu… exactMatch subspecies subspecies Bett… TRUE
#> 2 Charadrius mongo… higherMat… species species Char… FALSE
#> 3 Charadrius rubri… exactMatch species species Phal… FALSE
#> 4 Eucalyptus exactMatch genus genus Euca… FALSE
#> 5 Eucalyptus vimin… exactMatch species species Euca… FALSE
#> 6 Eucalyptus vimin… exactMatch subspecies subspecies Euca… TRUE
#> 7 Gehyra montium (… canonical… species species Gehy… FALSE
#> 8 Korthalsella jap… higherMat… species species Kort… TRUE
#> 9 Lagorchestes hir… canonical… species species Lago… TRUE
#> 10 Melithreptus gul… exactMatch subspecies subspecies Meli… TRUE
#> 11 Melithreptus gul… exactMatch subspecies subspecies Meli… TRUE
#> 12 Perameles gunnii… exactMatch subspecies subspecies Pera… TRUE
#> 13 Petrogale latera… canonical… species species Petr… TRUE
#> 14 Pterostylis sp. … exactMatch genus genus Pter… FALSE
#> 15 Spyridium erioce… exactMatch variety subspecies Spyr… TRUE
#> 16 Spyridium glabri… exactMatch variety subspecies Spyr… TRUE
#> 17 Thinornis cucull… exactMatch species species Thin… FALSE
#>
#> $taxonomy
#> # A tibble: 16 × 9
#> taxa kingdom phylum class order family genus species subspecies
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Bettongia lesueur… Animal… Chord… Mamm… Dipr… Potor… Bett… Betton… Bettongia…
#> 2 Charadrius mongol… Animal… Chord… Aves Char… Chara… Char… Charad… NA
#> 3 Phalaropus lobatus Animal… Chord… Aves Char… Scolo… Phal… Phalar… NA
#> 4 Eucalyptus Plantae Charo… Equi… Myrt… Myrta… Euca… NA NA
#> 5 Eucalyptus vimina… Plantae Charo… Equi… Myrt… Myrta… Euca… Eucaly… NA
#> 6 Eucalyptus vimina… Plantae Charo… Equi… Myrt… Myrta… Euca… Eucaly… Eucalyptu…
#> 7 Gehyra montium Animal… Chord… Rept… Squa… Gekko… Gehy… Gehyra… NA
#> 8 Korthalsella japo… Plantae Charo… Equi… Sant… Santa… Kort… Kortha… NA
#> 9 Lagorchestes hirs… Animal… Chord… Mamm… Dipr… Macro… Lago… Lagorc… NA
#> 10 Melithreptus gula… Animal… Chord… Aves Pass… Melip… Meli… Melith… Melithrep…
#> 11 Melithreptus gula… Animal… Chord… Aves Pass… Melip… Meli… Melith… Melithrep…
#> 12 Perameles gunnii … Animal… Chord… Mamm… Pera… Peram… Pera… Perame… Perameles…
#> 13 Petrogale lateral… Animal… Chord… Mamm… Dipr… Macro… Petr… Petrog… NA
#> 14 Pterostylis Plantae Charo… Equi… Aspa… Orchi… Pter… NA NA
#> 15 Spyridium eriocep… Plantae Charo… Equi… Rosa… Rhamn… Spyr… Spyrid… Spyridium…
#> 16 Thinornis cuculla… Animal… Chord… Aves Char… Chara… Thin… Thinor… NA
#>
# query more taxa (results are added to taxonomy_file, but only the new taxa are returned with the default `limit = TRUE`)
more_taxa <- tibble::tibble(original_name = c("Amytornis whitei"
, "Amytornis striatus"
, "Amytornis modestus (North, 1902)"
, "Amytornis modestus modestus"
, "Amytornis modestus cowarie"
)
)
taxonomy <- make_taxonomy(df = more_taxa
, taxonomy_file = temp_file
, needed_ranks = c("species")
)
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c801f265321.parquet
#> The following were completely unmatched: Galaxias sp. nov. Hunter and Not a taxa. Consider providing more taxonomic levels, or an override, for each unmatched taxa?
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
taxonomy$species
#> $lutaxa
#> # A tibble: 5 × 6
#> original_name match_type matched_rank returned_rank taxa original_is_tri
#> <chr> <chr> <ord> <ord> <chr> <lgl>
#> 1 Amytornis modestu… canonical… species species Amyt… FALSE
#> 2 Amytornis modestu… exactMatch subspecies species Amyt… TRUE
#> 3 Amytornis modestu… exactMatch subspecies species Amyt… TRUE
#> 4 Amytornis striatus exactMatch species species Amyt… FALSE
#> 5 Amytornis whitei exactMatch species species Amyt… FALSE
#>
#> $taxonomy
#> # A tibble: 3 × 8
#> taxa kingdom phylum class order family genus species
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Amytornis modestus Animalia Chordata Aves Passeriformes Maluri… Amyt… Amytor…
#> 2 Amytornis striatus Animalia Chordata Aves Passeriformes Maluri… Amyt… Amytor…
#> 3 Amytornis whitei Animalia Chordata Aves Passeriformes Maluri… Amyt… Amytor…
#>
# no dataframe supplied - all results in taxonomy_file returned
taxonomy <- make_taxonomy(taxonomy_file = temp_file
, needed_ranks = c("subspecies")
)
#> Joining with `by = join_by(original_name)`
taxonomy$subspecies
#> $lutaxa
#> # A tibble: 22 × 6
#> original_name match_type matched_rank returned_rank taxa original_is_tri
#> <chr> <chr> <ord> <ord> <chr> <lgl>
#> 1 Amytornis modest… canonical… species species Amyt… FALSE
#> 2 Amytornis modest… exactMatch subspecies subspecies Amyt… TRUE
#> 3 Amytornis modest… exactMatch subspecies subspecies Amyt… TRUE
#> 4 Amytornis striat… exactMatch species species Amyt… FALSE
#> 5 Amytornis whitei exactMatch species species Amyt… FALSE
#> 6 Bettongia lesueu… exactMatch subspecies subspecies Bett… TRUE
#> 7 Charadrius mongo… higherMat… species species Char… FALSE
#> 8 Charadrius rubri… exactMatch species species Phal… FALSE
#> 9 Eucalyptus exactMatch genus genus Euca… FALSE
#> 10 Eucalyptus vimin… exactMatch species species Euca… FALSE
#> # ℹ 12 more rows
#>
#> $taxonomy
#> # A tibble: 21 × 9
#> taxa kingdom phylum class order family genus species subspecies
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Amytornis modestus Animal… Chord… Aves Pass… Malur… Amyt… Amytor… NA
#> 2 Amytornis modestu… Animal… Chord… Aves Pass… Malur… Amyt… Amytor… Amytornis…
#> 3 Amytornis modestu… Animal… Chord… Aves Pass… Malur… Amyt… Amytor… Amytornis…
#> 4 Amytornis striatus Animal… Chord… Aves Pass… Malur… Amyt… Amytor… NA
#> 5 Amytornis whitei Animal… Chord… Aves Pass… Malur… Amyt… Amytor… NA
#> 6 Bettongia lesueur… Animal… Chord… Mamm… Dipr… Potor… Bett… Betton… Bettongia…
#> 7 Charadrius mongol… Animal… Chord… Aves Char… Chara… Char… Charad… NA
#> 8 Phalaropus lobatus Animal… Chord… Aves Char… Scolo… Phal… Phalar… NA
#> 9 Eucalyptus Plantae Charo… Equi… Myrt… Myrta… Euca… NA NA
#> 10 Eucalyptus vimina… Plantae Charo… Equi… Myrt… Myrta… Euca… Eucaly… NA
#> # ℹ 11 more rows
#>
# overrides
overrides <- envClean::taxonomy_overrides
# without overrides, C. rubricollis is binned to Phalaropus lobatus at species level!
taxonomy <- make_taxonomy(df = overrides
, taxonomy_file = temp_file
, needed_ranks = c("species", "subspecies")
)
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c801f265321.parquet
#> The following were completely unmatched: Galaxias sp. nov. Hunter and Not a taxa. Consider providing more taxonomic levels, or an override, for each unmatched taxa?
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
taxonomy$species$lutaxa %>%
dplyr::filter(grepl("rubricollis", original_name))
#> # A tibble: 1 × 6
#> original_name match_type matched_rank returned_rank taxa original_is_tri
#> <chr> <chr> <ord> <ord> <chr> <lgl>
#> 1 Charadrius rubric… exactMatch species species Phal… FALSE
# add in override - C. rubricollis is binned to T. cucullatus at species level
taxonomy <- make_taxonomy(df = overrides
, taxonomy_file = temp_file
, needed_ranks = c("species", "subspecies")
, overrides = overrides
)
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name, returned_rank)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c801f265321.parquet
#> The following were completely unmatched: Galaxias sp. nov. Hunter and Not a taxa. Consider providing more taxonomic levels, or an override, for each unmatched taxa?
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
taxonomy$species$lutaxa %>%
dplyr::filter(grepl("rubricollis", original_name))
#> # A tibble: 1 × 7
#> original_name match_type matched_rank returned_rank taxa original_is_tri
#> <chr> <chr> <ord> <ord> <chr> <lgl>
#> 1 Charadrius rubric… exactMatch species species Thin… FALSE
#> # ℹ 1 more variable: override <lgl>
# tweak_species example
make_taxonomy(df = tibble::tibble(original_name = "Acacia sp. Small Red-leaved Wattle (J.B.Williams 95033)")
, tweak_species = FALSE
)$raw %>%
dplyr::select(original_name, scientific_name, species)
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c8078765c97.parquet
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> # A tibble: 1 × 3
#> original_name scientific_name species
#> <chr> <chr> <chr>
#> 1 Acacia sp. Small Red-leaved Wattle (J.B.Williams 9503… Acacia sp. Sma… spec.
make_taxonomy(df = tibble::tibble(original_name = "Acacia sp. Small Red-leaved Wattle (J.B.Williams 95033)")
, tweak_species = TRUE
)$raw %>%
dplyr::select(original_name, scientific_name, species)
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c802c0321c2.parquet
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> # A tibble: 1 × 3
#> original_name scientific_name species
#> <chr> <chr> <chr>
#> 1 Acacia sp. Small Red-leaved Wattle (J.B.Williams 9503… Acacia sp. Sma… Acacia…
# clean up
rm(taxonomy)
unlink(paste0(temp_file, ".parquet"))