Only queries GBIF for taxa not already in taxonomy_file.
get_taxonomy(
  df,
  taxa_col = "original_name",
  taxonomy_file = tempfile(),
  force_new = list(original_name = NULL, timediff = as.difftime(26, units = "weeks")),
  remove_taxa = c("BOLD:", "dead", "unverified", "annual herb", "annual grass", "\\?"),
  remove_strings = c("\\sx\\s.*", "\\sX\\s.*", "\\s\\-\\-\\s.*",
    "\\s\\(.*\\)", "\\ssp\\.$", "\\sssp\\.$", "\\sspec\\.$"),
  remove_dead = FALSE,
  ...
)
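GBIF is only queried for taxa not already in taxonomy_file, and force_new can widen that set. The base R sketch below shows one way to make that selection. It is not envClean's code. The helper taxa_to_query() and the searched column (a POSIXct time of each taxon's last query) are hypothetical.

```r
# Hypothetical sketch of the requery selection described for get_taxonomy().
# `existing` stands in for a previously saved taxonomy_file; its `searched`
# column (POSIXct time of the last GBIF query) is an assumption.
taxa_to_query <- function(taxa, existing, force_new,
                          taxa_col = "original_name", now = Sys.time()) {
  # Taxa never queried before: not yet in the saved results
  new_taxa <- setdiff(taxa, existing$original_name)

  # Taxa explicitly forced via the element named after taxa_col (NULL = none)
  forced <- intersect(taxa, force_new[[taxa_col]])

  # Taxa whose last query is older than force_new$timediff (NULL = skip)
  stale <- character(0)
  if (!is.null(force_new$timediff)) {
    age <- as.numeric(difftime(now, existing$searched, units = "secs"))
    limit <- as.numeric(force_new$timediff, units = "secs")
    stale <- intersect(taxa, existing$original_name[age > limit])
  }

  unique(c(new_taxa, forced, stale))
}

existing <- data.frame(
  original_name = c("Acacia pycnantha", "Eucalyptus leucoxylon", "Poa labillardierei"),
  searched = as.POSIXct(c("2020-01-01", "2024-05-20", "2024-05-20"), tz = "UTC")
)
force_new <- list(original_name = "Poa labillardierei",
                  timediff = as.difftime(26, units = "weeks"))
taxa <- c("Acacia pycnantha", "Eucalyptus leucoxylon",
          "Poa labillardierei", "Dodonaea viscosa")

# New: Dodonaea; forced: Poa; stale (> 26 weeks old): Acacia
to_query <- taxa_to_query(taxa, existing, force_new,
                          now = as.POSIXct("2024-06-01", tz = "UTC"))
```

If both elements of force_new are set to NULL, only the criterion of absence from the saved results applies.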
df: Dataframe with taxa column.
taxa_col: Character. Name of the column with taxa names. Each unique taxon in this column will appear in the results in a column called original_name.
taxonomy_file: Character. Path to save results to.
force_new: List with elements taxa_col and timediff (a difftime object). If taxonomy_file already exists, any taxa listed in the taxa_col element that match taxa in taxonomy_file will be requeried. Likewise, any original_name that has not been searched within the last timediff will be requeried. Note that the element name taxa_col should be the value supplied to the taxa_col argument (original_name by default, as in the usage above). Set either element to NULL to ignore it.
remove_taxa: Character. Regular expressions to match against taxa names. Any row whose name matches is removed before searching.
remove_strings: Character. Regular expressions to match against taxa names. Any matching text is removed from the name before searching, but the row itself is retained.
...: Arguments passed to rgbif::name_backbone_checklist().
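The base R example below uses the default patterns to show the difference between remove_taxa (drop the whole row) and remove_strings (trim the name, keep the row). The clean_taxa() helper is hypothetical. It also assumes that remove_taxa is applied before remove_strings, and the order used inside get_taxonomy() may differ.

```r
# Default patterns copied from the get_taxonomy() usage
remove_taxa <- c("BOLD:", "dead", "unverified", "annual herb", "annual grass", "\\?")
remove_strings <- c("\\sx\\s.*", "\\sX\\s.*", "\\s\\-\\-\\s.*",
                    "\\s\\(.*\\)", "\\ssp\\.$", "\\sssp\\.$", "\\sspec\\.$")

# Hypothetical helper illustrating the documented behaviour
clean_taxa <- function(taxa, remove_taxa, remove_strings) {
  # remove_taxa: drop any name matching at least one pattern
  drop <- Reduce(`|`, lapply(remove_taxa, grepl, x = taxa))
  kept <- taxa[!drop]
  # remove_strings: strip matching text from each remaining name
  for (pattern in remove_strings) {
    kept <- gsub(pattern, "", kept)
  }
  kept
}

taxa <- c("Eucalyptus leucoxylon", "Acacia sp.", "Eucalyptus x trivalva",
          "BOLD:AAA1234", "Dodonaea viscosa (dead)", "Poa sp.?",
          "Melaleuca lanceolata -- flowering")

cleaned <- clean_taxa(taxa, remove_taxa, remove_strings)
# "BOLD:", "dead" and "?" rows are dropped; " x ...", " -- ..." and " sp."
# text is trimmed from the names that remain
```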
Returns a dataframe of results from envClean::get_gbif_tax(), modified so that the rank column is lowercase and an ordered factor with levels as per envClean::lurank. Also writes taxonomy_file and gsub("\\.", "_accepted.", taxonomy_file).
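The sketch below shows two output details: the second file name produced by the documented gsub() expression, and a lowercase ordered rank factor. The file name "taxonomy.parquet" is only an example. rank_levels is a hypothetical stand-in for envClean::lurank, so the real levels and their order may differ.

```r
# 1. Name of the second ("accepted") file, using the documented expression
taxonomy_file <- "taxonomy.parquet"
accepted_file <- gsub("\\.", "_accepted.", taxonomy_file)

# 2. Lowercase, ordered rank column. `rank_levels` is a hypothetical
#    stand-in for envClean::lurank (lowest rank first here).
rank_levels <- c("subspecies", "species", "genus", "family", "order",
                 "class", "phylum", "kingdom")
rank <- factor(tolower(c("SPECIES", "GENUS", "SUBSPECIES")),
               levels = rank_levels, ordered = TRUE)

# Ordered factors support comparisons, e.g. "at species level or finer"
at_or_below_species <- rank <= "species"
```

gsub() rewrites every "." in the path, not only the one before the file extension. A dot in a directory name is therefore changed too.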
Common names (vernacularName) are no longer supported here. Use get_gbif_common() on a downstream result instead. It may help to keep a usageKey through the cleaning process for use when getting common names. Part of the reason for removing that functionality here was the ambiguity over which key to use, particularly for species versus subspecies.