Only queries GBIF for taxa not already in taxonomy_file.

get_taxonomy(
  df,
  taxa_col = "original_name",
  taxonomy_file = tempfile(),
  force_new = list(original_name = NULL, timediff = as.difftime(26, units = "weeks")),
  remove_taxa = c("BOLD:", "dead", "unverified", "annual herb", "annual grass", "\\?"),
  remove_strings = c("\\sx\\s.*", "\\sX\\s.*", "\\s\\-\\-\\s.*",
    "\\s\\(.*\\)", "\\ssp\\.$", "\\sssp\\.$", "\\sspec\\.$"),
  remove_dead = FALSE,
  ...
)

Arguments

df

Dataframe with taxa column.

taxa_col

Character. Name of the column containing taxa names. Each unique taxon in this column will appear in the results in a column called original_name.

taxonomy_file

Character. Path to save results to.

force_new

List with two elements: one named as per the taxa_col argument (original_name by default) and timediff, a difftime. If taxonomy_file already exists, any names in the first element that match taxa in taxonomy_file will be requeried. Likewise, any original_name that has not been searched within timediff will be requeried. Set either element to NULL to ignore it.

remove_taxa

Character. Regular expressions matched against the taxa names. Any row with a match is removed before searching.

remove_strings

Character. Regular expressions matched against the taxa names. Any matching text is removed from the name before searching; the row itself is kept.

...

Arguments passed to rgbif::name_backbone_checklist().
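
The difference between remove_taxa and remove_strings can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the patterns are the defaults from the usage above, the input names are hypothetical, and the use of grepl() and gsub() is an assumption about how such filtering might be done.

```r
# Default patterns copied from the usage above
remove_taxa <- c("BOLD:", "dead", "unverified", "annual herb", "annual grass", "\\?")
remove_strings <- c("\\sx\\s.*", "\\sX\\s.*", "\\s\\-\\-\\s.*",
                    "\\s\\(.*\\)", "\\ssp\\.$", "\\sssp\\.$", "\\sspec\\.$")

# Hypothetical input names, for illustration only
taxa <- c("Eucalyptus sp.",
          "Acacia pycnantha (cultivated)",
          "BOLD:AAA1234",
          "Poa ?",
          "Eucalyptus camaldulensis x Eucalyptus viminalis")

# remove_taxa: drop every entry that matches any of the patterns
drop <- grepl(paste(remove_taxa, collapse = "|"), taxa)
kept <- taxa[!drop]

# remove_strings: strip the matching text but keep the entry
searched <- kept
for (pattern in remove_strings) {
  searched <- gsub(pattern, "", searched)
}

print(data.frame(original_name = kept, searched = searched))
```

In this sketch "BOLD:AAA1234" and "Poa ?" are dropped entirely, while the remaining three names are shortened to "Eucalyptus", "Acacia pycnantha" and "Eucalyptus camaldulensis".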

Value

Dataframe. Results from envClean::get_gbif_tax(), with the rank column converted to lowercase and to an ordered factor as per envClean::lurank. Also writes two files: taxonomy_file and gsub("\\.", "_accepted.", taxonomy_file).
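
To see which second path the gsub() expression above produces, it can be evaluated on a few example paths. This is a small sketch using only base R; accepted_path() is a hypothetical helper defined here for illustration and is not part of envClean.

```r
# Hypothetical helper wrapping the expression quoted in the Value section
accepted_path <- function(taxonomy_file) {
  gsub("\\.", "_accepted.", taxonomy_file)
}

accepted_path("taxonomy.rds")          # "taxonomy_accepted.rds"
accepted_path("out/v1.2/taxonomy.rds") # "out/v1_accepted.2/taxonomy_accepted.rds"
accepted_path("taxonomy")              # "taxonomy" (no dot, so unchanged)
```

Because gsub() replaces every dot, a path with a single dot (the file extension) gives the most predictable result.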

Details

Common names (vernacularName) are no longer supported here. Use get_gbif_common() on a downstream result. It may be helpful to keep a usageKey through the cleaning process for use in retrieving common names. Part of the reason for removing that functionality here was the ambiguity over which key to use, particularly for species versus subspecies.
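
The requery rule described for force_new (query a name if it is new, explicitly forced, or not searched recently enough) can be sketched in base R. This is an illustration of the documented behaviour only, not the package's code: the cached data frame, its searched column and the fixed now timestamp are all assumptions made for the example.

```r
# Hypothetical cache standing in for the contents of taxonomy_file;
# the `searched` column name is an assumption for illustration.
now <- as.POSIXct("2024-06-01", tz = "UTC")
cached <- data.frame(
  original_name = c("Acacia pycnantha", "Eucalyptus leucoxylon", "Poa annua"),
  searched = as.POSIXct(c("2024-05-01", "2023-01-01", "2024-04-01"), tz = "UTC")
)

# Force one name explicitly, and anything older than 26 weeks
force_new <- list(original_name = "Poa annua",
                  timediff = as.difftime(26, units = "weeks"))

# Names wanted in the current run (hypothetical)
wanted <- c("Acacia pycnantha", "Eucalyptus leucoxylon",
            "Poa annua", "Themeda triandra")

# Not in the cache at all
is_new <- !wanted %in% cached$original_name

# Explicitly listed in force_new
is_forced <- wanted %in% force_new$original_name

# In the cache, but last searched longer ago than timediff
last_searched <- cached$searched[match(wanted, cached$original_name)]
age_weeks <- as.numeric(difftime(now, last_searched, units = "weeks"))
limit_weeks <- as.numeric(force_new$timediff, units = "weeks")
is_stale <- !is.na(age_weeks) & age_weeks > limit_weeks

to_query <- wanted[is_new | is_forced | is_stale]
print(to_query)
```

Here "Acacia pycnantha" is the only name that would be skipped: it is in the cache, not forced, and was searched within the last 26 weeks.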