Queries galah only for taxa not already in taxonomy_file. Can return a list with one element per requested rank, each holding the 'best' match at that rank. For example, if 'genus' is provided in needed_ranks, the returned list will have an element 'genus' that contains, in a column named taxa and for each of the original names provided, the best result at genus level or, where no genus-level match was available, at a higher level.

make_taxonomy(
  df = NULL,
  taxa_col = "original_name",
  taxonomy_file = tempfile(),
  force_new = list(original_name = NULL, timediff = as.difftime(26, units = "weeks")),
  remove_taxa = c("bold:", "BOLD:", "unverified", "annual herb", "annual grass",
    "incertae sedis", "\\?", "another species", "not naturalised in SA",
    "unidentified", "unverified", "annual tussock grass", "*no id", "spec\\."),
  remove_strings = c("\\s\\-\\-\\s.*", "\\ssp\\.$", "\\sssp\\.$",
    "\\sspec\\.$", "dead"),
  not_names = c("sp", "ssp", "var", "subsp", "subspecies", "form", "race", "nov", "aff",
    "cf", "lineage", "group", "et", "al", "and", "pl", "revised", "nov"),
  tri_strings = c("\\sssp\\s", "\\sssp\\.\\s", "\\svar\\s",
    "\\svar\\.\\s", "\\ssubsp\\.", "\\ssubspecies", "\\sform\\)",
    "\\sform\\s", "\\sf\\.", "\\srace\\s", "\\srace\\)",
    "\\sp\\.v\\."),
  atlas = c("Australia"),
  tweak_species = TRUE,
  return_taxonomy = TRUE,
  limit = TRUE,
  needed_ranks = c("species"),
  overrides = NULL
)

Arguments

df

Dataframe with taxa_col. Can be NULL only if taxonomy_file already exists.

taxa_col

Character or index. Name or index of the column with taxa names. Each unique taxon in this column will be queried against galah::search_taxa() and appear in the results list element lutaxa in a column called original_name.

taxonomy_file

Character. File path to save results to. Any file extension is ignored; results are saved as a .parquet file.

force_new

List with elements timediff and any column name from taxonomy_file. If taxonomy_file already exists and a column name in force_new matches a column in taxonomy_file, matching levels within that column will be requeried. Likewise, any original_name that has not been searched within timediff will be requeried. Set either to NULL to ignore.

remove_taxa

Character. Regular expressions. Rows whose taxa_col value matches any element of remove_taxa are removed (the whole row is removed).

remove_strings

Character. Text that matches remove_strings is removed from the taxa_col before searching (text, not row, is removed).

not_names

Character. Text that matches not_names is removed from original_name before a word count that is used to guess whether the original_name is trinomial (original_is_tri field in lutaxa).

tri_strings

Character. Text that matches tri_strings is used to indicate if the original_name is trinomial (original_is_tri field in output lutaxa).

atlas

Character. Name of galah atlas to use.

return_taxonomy

Logical. If TRUE, a list is returned containing the best match for each original_name in lutaxa and additional elements named for their rank (see envClean::lurank) with unique rows for that rank. One element per rank provided in needed_ranks

limit

Logical. If TRUE the returned list will be limited to those original_names in df

needed_ranks

Character vector of ranks required in the returned list. Can be "all" or any combination of ranks from envClean::lurank greater than or equal to subspecies.

overrides

Used to override results returned by galah::search_taxa(). Dataframe with (at least) columns: taxa_col and taxa_to_search. Can also contain any number of use_x columns where x is any of kingdom, phylum, class, order, family, genus, species, subspecies, variety and form. A two step process then attempts to find better results than if searched on taxa_col. Step 1 searches for taxa_to_search instead of taxa_col. If any use_x columns are present, step 2 then checks that the results from step 1 have a result at x. If not, level x results will be taken from use_x.

tweak_species

Logical. If TRUE (default) and the returned species column result ends in a full stop, the values returned in the species column will be directly taken from the scientific_name column. See details.
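
The difference between remove_taxa (drops whole rows) and remove_strings (strips matching text but keeps the row) can be sketched in base R. This is only an illustration under assumptions: the input names are made up, only a subset of the default patterns is used, and the use of perl = TRUE and trimws() is a choice made here, not necessarily what make_taxonomy() does internally.

```r
# Hypothetical input names, for illustration only
taxa <- c("Eucalyptus viminalis sp.",
          "unidentified grass",
          "Acacia dead",
          "Acacia pycnantha -- planted")

# A subset of the default patterns from the usage section
remove_taxa    <- c("unidentified", "\\?")
remove_strings <- c("\\s\\-\\-\\s.*", "\\ssp\\.$", "dead")

# Step 1: remove_taxa removes the whole row when any pattern matches
drop_row <- grepl(paste(remove_taxa, collapse = "|"), taxa, perl = TRUE)
kept <- taxa[!drop_row]

# Step 2: remove_strings removes only the matching text from the kept names
cleaned <- gsub(paste(remove_strings, collapse = "|"), "", kept, perl = TRUE)
cleaned <- trimws(cleaned)  # assumption: tidy leftover whitespace

print(data.frame(before = kept, after = cleaned))
```

Here "unidentified grass" is dropped entirely, while the other three names survive with the unwanted text stripped.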

Value

NULL or list (depending on return_taxonomy). Writes taxonomy_file. In all outputs taxa_col is renamed original_name, with any quotes removed. If a list, it has elements:

  • raw - the 'raw' results returned from galah::search_taxa(), tweaked by: column rank is an ordered factor as per envClean::lurank; rank_adj is a new column that will reflect the rank column unless rank is less than subspecies, in which case it will be subspecies; and original_is_tri is a new column

  • needed_ranks - One element for each rank specified in needed_ranks.

    • lutaxa - dataframe. For each unique name in taxa_col (as original_name), the best taxonomic bin to use, taking into account each level of needed_ranks

      • original_name - unique values from taxa_col

      • match_type - directly from galah::search_taxa()

      • matched_rank - rank column from galah::search_taxa()

      • returned_rank - the rank of the taxa returned for each original_name. This will never be lower than needed_rank but may be higher than needed_rank if no match was available at needed_rank. Use this 'rank' to filter bins in a cleaning workflow

      • taxa - the best taxa available for original_name at needed_rank, perhaps taking into account overrides

      • override - is the taxa the result of an override?

      • original_is_tri - Experimental. Is the original_name a trinomial? Highlights cases where the matched rank is > subspecies but the original_name is probably a subspecies. Guesses are based on a word count after removal of: not_names; numbers; punctuation; capitalised words that are not the first word; and single letter 'words'. tri_strings override the guess - flagging TRUE. Note, clearly, this is only an (informed) guess at whether the original_name is trinomial.

    • taxonomy - dataframe. For each taxa in lutaxa a row of taxonomic hierarchy
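
The relationship between matched_rank, rank_adj and returned_rank described above can be sketched as follows. This is a simplified illustration, not the package's implementation: rank_levels is a hypothetical, shortened ordering (finest to coarsest) standing in for envClean::lurank, and returned_rank() is a helper name invented here.

```r
# Hypothetical, simplified ordering from finest to coarsest rank.
# The real ordering comes from envClean::lurank.
rank_levels <- c("form", "variety", "subspecies", "species", "genus",
                 "family", "order", "class", "phylum", "kingdom")

# Hypothetical helper: the rank returned for a match, given the rank needed
returned_rank <- function(matched_rank, needed_rank) {
  matched <- match(matched_rank, rank_levels)
  needed  <- match(needed_rank, rank_levels)
  floor_rank <- match("subspecies", rank_levels)

  # rank_adj: anything finer than subspecies is treated as subspecies
  rank_adj <- pmax(matched, floor_rank)

  # never finer than needed_rank, but coarser when the match itself is coarser
  rank_levels[pmax(rank_adj, needed)]
}

matched <- c("variety", "species", "genus", "subspecies")
returned_rank(matched, "subspecies")
returned_rank(matched, "species")
returned_rank(matched, "kingdom")
```

With needed_rank "subspecies", a variety-level match is returned at subspecies while species-level and genus-level matches keep their own (higher) rank, which mirrors the returned_rank column in the example output.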

Details

The argument tweak_species replaces the galah::search_taxa() result in the species column with the result in the scientific_name column where the species result ends in a full stop. This attempts to deal with instances where galah::search_taxa() returns odd results in species but good results in scientific_name, e.g. galah::search_taxa("Acacia sp. Small Red-leaved Wattle (J.B.Williams 95033)") returns spec. in the species column but Acacia sp. Small Red-leaved Wattle (J.B.Williams 95033) in the scientific_name column.
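
A minimal base R sketch of the tweak_species rule, assuming a data frame shaped like the raw results. The first row uses the values quoted above; the second row is made up for contrast. The actual code inside make_taxonomy() may differ.

```r
# Stand-in for the raw search results (second row is illustrative only)
raw <- data.frame(
  scientific_name = c("Acacia sp. Small Red-leaved Wattle (J.B.Williams 95033)",
                      "Eucalyptus viminalis"),
  species = c("spec.", "Eucalyptus viminalis"),
  stringsAsFactors = FALSE
)

# Rows where the species result ends in a full stop
ends_in_stop <- !is.na(raw$species) & grepl("\\.$", raw$species)

# For those rows only, take species directly from scientific_name
raw$species[ends_in_stop] <- raw$scientific_name[ends_in_stop]

print(raw$species)
```

Only the first row changes; species values that do not end in a full stop are left as returned.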

The previous envClean::make_taxonomy() function is still available as envClean::make_gbif_taxonomy().
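
The requery rule described for taxonomy_file and force_new (query names that are new, or that were last searched longer ago than the time limit) can be sketched like this. The cache data frame is a made-up stand-in for the contents of taxonomy_file, using the stamp column seen in the raw results, and the variable names are illustrative rather than the package's own.

```r
now <- Sys.time()

# Made-up stand-in for previously saved results
cache <- data.frame(
  original_name = c("Eucalyptus", "Thinornis cucullatus"),
  stamp = now - as.difftime(c(1, 40), units = "weeks"),
  stringsAsFactors = FALSE
)

requested <- c("Eucalyptus", "Thinornis cucullatus", "Amytornis whitei")
timediff  <- as.difftime(26, units = "weeks")

# Names never searched before
new_names <- setdiff(requested, cache$original_name)

# Names whose last search is older than timediff
age   <- difftime(now, cache$stamp, units = "weeks")
stale <- cache$original_name[age > timediff]

# Only these would be sent to the atlas again
to_query <- union(new_names, intersect(requested, stale))
print(to_query)
```

In this sketch "Eucalyptus" is served from the saved results, while the new name and the name last searched 40 weeks ago are queried again.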

Examples


  # setup
  # library("envClean")

  temp_file <- tempfile()

  taxa_df <- tibble::tibble(taxa = c("Charadrius rubricollis"
                                     , "Thinornis cucullatus"
                                     , "Melithreptus gularis laetior"
                                     , "Melithreptus gularis gularis"
                                     , "Eucalyptus viminalis"
                                     , "Eucalyptus viminalis cygnetensis"
                                     , "Eucalyptus"
                                     , "Charadrius mongolus all subspecies"
                                     , "Bettongia lesueur Barrow and Boodie Islands subspecies"
                                     , "Lagorchestes hirsutus Central Australian subspecies"
                                     , "Perameles gunnii Victorian subspecies"
                                     , "Pterostylis sp. Rock ledges (pl. 185, Bates & Weber 1990)"
                                     , "Spyridium glabrisepalum"
                                     , "Spyridium eriocephalum var. glabrisepalum"
                                     , "Petrogale lateralis (MacDonnell Ranges race)"
                                     , "Gehyra montium (revised)"
                                     , "Korthalsella japonica f. japonica"
                                     , "Galaxias sp. nov. 'Hunter'"
                                     , "Not a taxa"
                                     )
                            )

  # make taxonomy (returns list and writes taxonomy_file)
  taxonomy <- make_taxonomy(df = taxa_df
                            , taxa_col = "taxa"
                            , taxonomy_file = temp_file
                            , needed_ranks = c("kingdom", "genus", "species", "subspecies")
                            )
#> Joining with `by = join_by(original_name)`
#> Matched 17 of 19 taxonomic search terms in selected atlas (Australia).
#> 2 unmatched search terms:
#>  "Galaxias sp. nov. Hunter", "Not a taxa"
#> 
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c801f265321.parquet
#> The following were completely unmatched: Galaxias sp. nov. Hunter and Not a taxa. Consider providing more taxonomic levels, or an override, for each unmatched taxa?
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
  taxonomy$raw
#> # A tibble: 19 × 19
#>    original_name              search_term scientific_name scientific_name_auth…¹
#>    <chr>                      <chr>       <chr>           <chr>                 
#>  1 Bettongia lesueur Barrow … Bettongia … Bettongia lesu… NA                    
#>  2 Charadrius mongolus all s… Charadrius… Charadrius (Ch… Pallas, 1776          
#>  3 Charadrius rubricollis     Charadrius… Phalaropus lob… (Linnaeus, 1758)      
#>  4 Eucalyptus                 Eucalyptus  Eucalyptus      L'Hér.                
#>  5 Eucalyptus viminalis       Eucalyptus… Eucalyptus vim… Labill.               
#>  6 Eucalyptus viminalis cygn… Eucalyptus… Eucalyptus vim… Boomsma               
#>  7 Galaxias sp. nov. Hunter   Galaxias s… NA              NA                    
#>  8 Gehyra montium (revised)   Gehyra mon… Gehyra montium  Storr, 1982           
#>  9 Korthalsella japonica f. … Korthalsel… Korthalsella j… (Thunb.) Engl.        
#> 10 Lagorchestes hirsutus Cen… Lagorchest… Lagorchestes h… Gould, 1844           
#> 11 Melithreptus gularis gula… Melithrept… Melithreptus (… (Gould, 1837)         
#> 12 Melithreptus gularis laet… Melithrept… Melithreptus (… Gould, 1875           
#> 13 Not a taxa                 Not a taxa  NA              NA                    
#> 14 Perameles gunnii Victoria… Perameles … Perameles gunn… NA                    
#> 15 Petrogale lateralis (MacD… Petrogale … Petrogale late… Gould, 1842           
#> 16 Pterostylis sp. Rock ledg… Pterostyli… Pterostylis     R.Br.                 
#> 17 Spyridium eriocephalum va… Spyridium … Spyridium erio… J.M.Black             
#> 18 Spyridium glabrisepalum    Spyridium … Spyridium erio… J.M.Black             
#> 19 Thinornis cucullatus       Thinornis … Thinornis cucu… (Vieillot, 1818)      
#> # ℹ abbreviated name: ¹​scientific_name_authorship
#> # ℹ 15 more variables: taxon_concept_id <chr>, rank <ord>, match_type <chr>,
#> #   kingdom <chr>, phylum <chr>, class <chr>, order <chr>, family <chr>,
#> #   genus <chr>, species <chr>, vernacular_name <chr>, stamp <dttm>,
#> #   subspecies <chr>, rank_adj <ord>, original_is_tri <lgl>
  taxonomy$kingdom
#> $lutaxa
#> # A tibble: 17 × 6
#>    original_name     match_type matched_rank returned_rank taxa  original_is_tri
#>    <chr>             <chr>      <ord>        <ord>         <chr> <lgl>          
#>  1 Bettongia lesueu… exactMatch subspecies   kingdom       Anim… TRUE           
#>  2 Charadrius mongo… higherMat… species      kingdom       Anim… FALSE          
#>  3 Charadrius rubri… exactMatch species      kingdom       Anim… FALSE          
#>  4 Eucalyptus        exactMatch genus        kingdom       Plan… FALSE          
#>  5 Eucalyptus vimin… exactMatch species      kingdom       Plan… FALSE          
#>  6 Eucalyptus vimin… exactMatch subspecies   kingdom       Plan… TRUE           
#>  7 Gehyra montium (… canonical… species      kingdom       Anim… FALSE          
#>  8 Korthalsella jap… higherMat… species      kingdom       Plan… TRUE           
#>  9 Lagorchestes hir… canonical… species      kingdom       Anim… TRUE           
#> 10 Melithreptus gul… exactMatch subspecies   kingdom       Anim… TRUE           
#> 11 Melithreptus gul… exactMatch subspecies   kingdom       Anim… TRUE           
#> 12 Perameles gunnii… exactMatch subspecies   kingdom       Anim… TRUE           
#> 13 Petrogale latera… canonical… species      kingdom       Anim… TRUE           
#> 14 Pterostylis sp. … exactMatch genus        kingdom       Plan… FALSE          
#> 15 Spyridium erioce… exactMatch variety      kingdom       Plan… TRUE           
#> 16 Spyridium glabri… exactMatch variety      kingdom       Plan… TRUE           
#> 17 Thinornis cucull… exactMatch species      kingdom       Anim… FALSE          
#> 
#> $taxonomy
#> # A tibble: 2 × 2
#>   taxa     kingdom 
#>   <chr>    <chr>   
#> 1 Animalia Animalia
#> 2 Plantae  Plantae 
#> 
  taxonomy$genus
#> $lutaxa
#> # A tibble: 17 × 6
#>    original_name     match_type matched_rank returned_rank taxa  original_is_tri
#>    <chr>             <chr>      <ord>        <ord>         <chr> <lgl>          
#>  1 Bettongia lesueu… exactMatch subspecies   genus         Bett… TRUE           
#>  2 Charadrius mongo… higherMat… species      genus         Char… FALSE          
#>  3 Charadrius rubri… exactMatch species      genus         Phal… FALSE          
#>  4 Eucalyptus        exactMatch genus        genus         Euca… FALSE          
#>  5 Eucalyptus vimin… exactMatch species      genus         Euca… FALSE          
#>  6 Eucalyptus vimin… exactMatch subspecies   genus         Euca… TRUE           
#>  7 Gehyra montium (… canonical… species      genus         Gehy… FALSE          
#>  8 Korthalsella jap… higherMat… species      genus         Kort… TRUE           
#>  9 Lagorchestes hir… canonical… species      genus         Lago… TRUE           
#> 10 Melithreptus gul… exactMatch subspecies   genus         Meli… TRUE           
#> 11 Melithreptus gul… exactMatch subspecies   genus         Meli… TRUE           
#> 12 Perameles gunnii… exactMatch subspecies   genus         Pera… TRUE           
#> 13 Petrogale latera… canonical… species      genus         Petr… TRUE           
#> 14 Pterostylis sp. … exactMatch genus        genus         Pter… FALSE          
#> 15 Spyridium erioce… exactMatch variety      genus         Spyr… TRUE           
#> 16 Spyridium glabri… exactMatch variety      genus         Spyr… TRUE           
#> 17 Thinornis cucull… exactMatch species      genus         Thin… FALSE          
#> 
#> $taxonomy
#> # A tibble: 13 × 7
#>    taxa         kingdom  phylum     class         order           family   genus
#>    <chr>        <chr>    <chr>      <chr>         <chr>           <chr>    <chr>
#>  1 Bettongia    Animalia Chordata   Mammalia      Diprotodontia   Potoroi… Bett…
#>  2 Charadrius   Animalia Chordata   Aves          Charadriiformes Charadr… Char…
#>  3 Phalaropus   Animalia Chordata   Aves          Charadriiformes Scolopa… Phal…
#>  4 Eucalyptus   Plantae  Charophyta Equisetopsida Myrtales        Myrtace… Euca…
#>  5 Gehyra       Animalia Chordata   Reptilia      Squamata        Gekkoni… Gehy…
#>  6 Korthalsella Plantae  Charophyta Equisetopsida Santalales      Santala… Kort…
#>  7 Lagorchestes Animalia Chordata   Mammalia      Diprotodontia   Macropo… Lago…
#>  8 Melithreptus Animalia Chordata   Aves          Passeriformes   Melipha… Meli…
#>  9 Perameles    Animalia Chordata   Mammalia      Peramelemorphia Peramel… Pera…
#> 10 Petrogale    Animalia Chordata   Mammalia      Diprotodontia   Macropo… Petr…
#> 11 Pterostylis  Plantae  Charophyta Equisetopsida Asparagales     Orchida… Pter…
#> 12 Spyridium    Plantae  Charophyta Equisetopsida Rosales         Rhamnac… Spyr…
#> 13 Thinornis    Animalia Chordata   Aves          Charadriiformes Charadr… Thin…
#> 
  taxonomy$species
#> $lutaxa
#> # A tibble: 17 × 6
#>    original_name     match_type matched_rank returned_rank taxa  original_is_tri
#>    <chr>             <chr>      <ord>        <ord>         <chr> <lgl>          
#>  1 Bettongia lesueu… exactMatch subspecies   species       Bett… TRUE           
#>  2 Charadrius mongo… higherMat… species      species       Char… FALSE          
#>  3 Charadrius rubri… exactMatch species      species       Phal… FALSE          
#>  4 Eucalyptus        exactMatch genus        genus         Euca… FALSE          
#>  5 Eucalyptus vimin… exactMatch species      species       Euca… FALSE          
#>  6 Eucalyptus vimin… exactMatch subspecies   species       Euca… TRUE           
#>  7 Gehyra montium (… canonical… species      species       Gehy… FALSE          
#>  8 Korthalsella jap… higherMat… species      species       Kort… TRUE           
#>  9 Lagorchestes hir… canonical… species      species       Lago… TRUE           
#> 10 Melithreptus gul… exactMatch subspecies   species       Meli… TRUE           
#> 11 Melithreptus gul… exactMatch subspecies   species       Meli… TRUE           
#> 12 Perameles gunnii… exactMatch subspecies   species       Pera… TRUE           
#> 13 Petrogale latera… canonical… species      species       Petr… TRUE           
#> 14 Pterostylis sp. … exactMatch genus        genus         Pter… FALSE          
#> 15 Spyridium erioce… exactMatch variety      species       Spyr… TRUE           
#> 16 Spyridium glabri… exactMatch variety      species       Spyr… TRUE           
#> 17 Thinornis cucull… exactMatch species      species       Thin… FALSE          
#> 
#> $taxonomy
#> # A tibble: 14 × 8
#>    taxa                   kingdom  phylum     class   order family genus species
#>    <chr>                  <chr>    <chr>      <chr>   <chr> <chr>  <chr> <chr>  
#>  1 Bettongia lesueur      Animalia Chordata   Mammal… Dipr… Potor… Bett… Betton…
#>  2 Charadrius mongolus    Animalia Chordata   Aves    Char… Chara… Char… Charad…
#>  3 Phalaropus lobatus     Animalia Chordata   Aves    Char… Scolo… Phal… Phalar…
#>  4 Eucalyptus             Plantae  Charophyta Equise… Myrt… Myrta… Euca… NA     
#>  5 Eucalyptus viminalis   Plantae  Charophyta Equise… Myrt… Myrta… Euca… Eucaly…
#>  6 Gehyra montium         Animalia Chordata   Reptil… Squa… Gekko… Gehy… Gehyra…
#>  7 Korthalsella japonica  Plantae  Charophyta Equise… Sant… Santa… Kort… Kortha…
#>  8 Lagorchestes hirsutus  Animalia Chordata   Mammal… Dipr… Macro… Lago… Lagorc…
#>  9 Melithreptus gularis   Animalia Chordata   Aves    Pass… Melip… Meli… Melith…
#> 10 Perameles gunnii       Animalia Chordata   Mammal… Pera… Peram… Pera… Perame…
#> 11 Petrogale lateralis    Animalia Chordata   Mammal… Dipr… Macro… Petr… Petrog…
#> 12 Pterostylis            Plantae  Charophyta Equise… Aspa… Orchi… Pter… NA     
#> 13 Spyridium eriocephalum Plantae  Charophyta Equise… Rosa… Rhamn… Spyr… Spyrid…
#> 14 Thinornis cucullatus   Animalia Chordata   Aves    Char… Chara… Thin… Thinor…
#> 
  taxonomy$subspecies
#> $lutaxa
#> # A tibble: 17 × 6
#>    original_name     match_type matched_rank returned_rank taxa  original_is_tri
#>    <chr>             <chr>      <ord>        <ord>         <chr> <lgl>          
#>  1 Bettongia lesueu… exactMatch subspecies   subspecies    Bett… TRUE           
#>  2 Charadrius mongo… higherMat… species      species       Char… FALSE          
#>  3 Charadrius rubri… exactMatch species      species       Phal… FALSE          
#>  4 Eucalyptus        exactMatch genus        genus         Euca… FALSE          
#>  5 Eucalyptus vimin… exactMatch species      species       Euca… FALSE          
#>  6 Eucalyptus vimin… exactMatch subspecies   subspecies    Euca… TRUE           
#>  7 Gehyra montium (… canonical… species      species       Gehy… FALSE          
#>  8 Korthalsella jap… higherMat… species      species       Kort… TRUE           
#>  9 Lagorchestes hir… canonical… species      species       Lago… TRUE           
#> 10 Melithreptus gul… exactMatch subspecies   subspecies    Meli… TRUE           
#> 11 Melithreptus gul… exactMatch subspecies   subspecies    Meli… TRUE           
#> 12 Perameles gunnii… exactMatch subspecies   subspecies    Pera… TRUE           
#> 13 Petrogale latera… canonical… species      species       Petr… TRUE           
#> 14 Pterostylis sp. … exactMatch genus        genus         Pter… FALSE          
#> 15 Spyridium erioce… exactMatch variety      subspecies    Spyr… TRUE           
#> 16 Spyridium glabri… exactMatch variety      subspecies    Spyr… TRUE           
#> 17 Thinornis cucull… exactMatch species      species       Thin… FALSE          
#> 
#> $taxonomy
#> # A tibble: 16 × 9
#>    taxa               kingdom phylum class order family genus species subspecies
#>    <chr>              <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>   <chr>     
#>  1 Bettongia lesueur… Animal… Chord… Mamm… Dipr… Potor… Bett… Betton… Bettongia…
#>  2 Charadrius mongol… Animal… Chord… Aves  Char… Chara… Char… Charad… NA        
#>  3 Phalaropus lobatus Animal… Chord… Aves  Char… Scolo… Phal… Phalar… NA        
#>  4 Eucalyptus         Plantae Charo… Equi… Myrt… Myrta… Euca… NA      NA        
#>  5 Eucalyptus vimina… Plantae Charo… Equi… Myrt… Myrta… Euca… Eucaly… NA        
#>  6 Eucalyptus vimina… Plantae Charo… Equi… Myrt… Myrta… Euca… Eucaly… Eucalyptu…
#>  7 Gehyra montium     Animal… Chord… Rept… Squa… Gekko… Gehy… Gehyra… NA        
#>  8 Korthalsella japo… Plantae Charo… Equi… Sant… Santa… Kort… Kortha… NA        
#>  9 Lagorchestes hirs… Animal… Chord… Mamm… Dipr… Macro… Lago… Lagorc… NA        
#> 10 Melithreptus gula… Animal… Chord… Aves  Pass… Melip… Meli… Melith… Melithrep…
#> 11 Melithreptus gula… Animal… Chord… Aves  Pass… Melip… Meli… Melith… Melithrep…
#> 12 Perameles gunnii … Animal… Chord… Mamm… Pera… Peram… Pera… Perame… Perameles…
#> 13 Petrogale lateral… Animal… Chord… Mamm… Dipr… Macro… Petr… Petrog… NA        
#> 14 Pterostylis        Plantae Charo… Equi… Aspa… Orchi… Pter… NA      NA        
#> 15 Spyridium eriocep… Plantae Charo… Equi… Rosa… Rhamn… Spyr… Spyrid… Spyridium…
#> 16 Thinornis cuculla… Animal… Chord… Aves  Char… Chara… Thin… Thinor… NA        
#> 

  # query more taxa (results are added to taxonomy_file but only the new taxa are returned; default `limit = TRUE`)
  more_taxa <- tibble::tibble(original_name = c("Amytornis whitei"
                                                , "Amytornis striatus"
                                                , "Amytornis modestus (North, 1902)"
                                                , "Amytornis modestus modestus"
                                                , "Amytornis modestus cowarie"
                                                )
                              )

  taxonomy <- make_taxonomy(df = more_taxa
                            , taxonomy_file = temp_file
                            , needed_ranks = c("species")
                            )
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c801f265321.parquet
#> The following were completely unmatched: Galaxias sp. nov. Hunter and Not a taxa. Consider providing more taxonomic levels, or an override, for each unmatched taxa?
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`

  taxonomy$species
#> $lutaxa
#> # A tibble: 5 × 6
#>   original_name      match_type matched_rank returned_rank taxa  original_is_tri
#>   <chr>              <chr>      <ord>        <ord>         <chr> <lgl>          
#> 1 Amytornis modestu… canonical… species      species       Amyt… FALSE          
#> 2 Amytornis modestu… exactMatch subspecies   species       Amyt… TRUE           
#> 3 Amytornis modestu… exactMatch subspecies   species       Amyt… TRUE           
#> 4 Amytornis striatus exactMatch species      species       Amyt… FALSE          
#> 5 Amytornis whitei   exactMatch species      species       Amyt… FALSE          
#> 
#> $taxonomy
#> # A tibble: 3 × 8
#>   taxa               kingdom  phylum   class order         family  genus species
#>   <chr>              <chr>    <chr>    <chr> <chr>         <chr>   <chr> <chr>  
#> 1 Amytornis modestus Animalia Chordata Aves  Passeriformes Maluri… Amyt… Amytor…
#> 2 Amytornis striatus Animalia Chordata Aves  Passeriformes Maluri… Amyt… Amytor…
#> 3 Amytornis whitei   Animalia Chordata Aves  Passeriformes Maluri… Amyt… Amytor…
#> 

  # no dataframe supplied - all results in taxonomy_file returned
  taxonomy <- make_taxonomy(taxonomy_file = temp_file
                            , needed_ranks = c("subspecies")
                            )
#> Joining with `by = join_by(original_name)`

  taxonomy$subspecies
#> $lutaxa
#> # A tibble: 22 × 6
#>    original_name     match_type matched_rank returned_rank taxa  original_is_tri
#>    <chr>             <chr>      <ord>        <ord>         <chr> <lgl>          
#>  1 Amytornis modest… canonical… species      species       Amyt… FALSE          
#>  2 Amytornis modest… exactMatch subspecies   subspecies    Amyt… TRUE           
#>  3 Amytornis modest… exactMatch subspecies   subspecies    Amyt… TRUE           
#>  4 Amytornis striat… exactMatch species      species       Amyt… FALSE          
#>  5 Amytornis whitei  exactMatch species      species       Amyt… FALSE          
#>  6 Bettongia lesueu… exactMatch subspecies   subspecies    Bett… TRUE           
#>  7 Charadrius mongo… higherMat… species      species       Char… FALSE          
#>  8 Charadrius rubri… exactMatch species      species       Phal… FALSE          
#>  9 Eucalyptus        exactMatch genus        genus         Euca… FALSE          
#> 10 Eucalyptus vimin… exactMatch species      species       Euca… FALSE          
#> # ℹ 12 more rows
#> 
#> $taxonomy
#> # A tibble: 21 × 9
#>    taxa               kingdom phylum class order family genus species subspecies
#>    <chr>              <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>   <chr>     
#>  1 Amytornis modestus Animal… Chord… Aves  Pass… Malur… Amyt… Amytor… NA        
#>  2 Amytornis modestu… Animal… Chord… Aves  Pass… Malur… Amyt… Amytor… Amytornis…
#>  3 Amytornis modestu… Animal… Chord… Aves  Pass… Malur… Amyt… Amytor… Amytornis…
#>  4 Amytornis striatus Animal… Chord… Aves  Pass… Malur… Amyt… Amytor… NA        
#>  5 Amytornis whitei   Animal… Chord… Aves  Pass… Malur… Amyt… Amytor… NA        
#>  6 Bettongia lesueur… Animal… Chord… Mamm… Dipr… Potor… Bett… Betton… Bettongia…
#>  7 Charadrius mongol… Animal… Chord… Aves  Char… Chara… Char… Charad… NA        
#>  8 Phalaropus lobatus Animal… Chord… Aves  Char… Scolo… Phal… Phalar… NA        
#>  9 Eucalyptus         Plantae Charo… Equi… Myrt… Myrta… Euca… NA      NA        
#> 10 Eucalyptus vimina… Plantae Charo… Equi… Myrt… Myrta… Euca… Eucaly… NA        
#> # ℹ 11 more rows
#> 

  # overrides
  overrides <- envClean::taxonomy_overrides

  # C. rubricollis binned to Phalaropus lobatus at species level!
  taxonomy <- make_taxonomy(df = overrides
                            , taxonomy_file = temp_file
                            , needed_ranks = c("species", "subspecies")
                            )
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c801f265321.parquet
#> The following were completely unmatched: Galaxias sp. nov. Hunter and Not a taxa. Consider providing more taxonomic levels, or an override, for each unmatched taxa?
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`

  taxonomy$species$lutaxa %>%
    dplyr::filter(grepl("rubricollis", original_name))
#> # A tibble: 1 × 6
#>   original_name      match_type matched_rank returned_rank taxa  original_is_tri
#>   <chr>              <chr>      <ord>        <ord>         <chr> <lgl>          
#> 1 Charadrius rubric… exactMatch species      species       Phal… FALSE          

  # add in override - C. rubricollis is binned to T. cucullatus at species level
  taxonomy <- make_taxonomy(df = overrides
                            , taxonomy_file = temp_file
                            , needed_ranks = c("species", "subspecies")
                            , overrides = overrides
                            )
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name, returned_rank)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c801f265321.parquet
#> The following were completely unmatched: Galaxias sp. nov. Hunter and Not a taxa. Consider providing more taxonomic levels, or an override, for each unmatched taxa?
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`

  taxonomy$species$lutaxa %>%
    dplyr::filter(grepl("rubricollis", original_name))
#> # A tibble: 1 × 7
#>   original_name      match_type matched_rank returned_rank taxa  original_is_tri
#>   <chr>              <chr>      <ord>        <ord>         <chr> <lgl>          
#> 1 Charadrius rubric… exactMatch species      species       Thin… FALSE          
#> # ℹ 1 more variable: override <lgl>


  # tweak_species example
  make_taxonomy(df = tibble::tibble(original_name = "Acacia sp. Small Red-leaved Wattle (J.B.Williams 95033)")
                , tweak_species = FALSE
                )$raw %>%
    dplyr::select(original_name, scientific_name, species)
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c8078765c97.parquet
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> # A tibble: 1 × 3
#>   original_name                                          scientific_name species
#>   <chr>                                                  <chr>           <chr>  
#> 1 Acacia sp. Small Red-leaved Wattle (J.B.Williams 9503… Acacia sp. Sma… spec.  

  make_taxonomy(df = tibble::tibble(original_name = "Acacia sp. Small Red-leaved Wattle (J.B.Williams 95033)")
                , tweak_species = TRUE
                )$raw %>%
    dplyr::select(original_name, scientific_name, species)
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> saving results to H:/temp/nige\RtmpYfG5ks\file4c802c0321c2.parquet
#> Joining with `by = join_by(original_name)`
#> Joining with `by = join_by(original_name)`
#> # A tibble: 1 × 3
#>   original_name                                          scientific_name species
#>   <chr>                                                  <chr>           <chr>  
#> 1 Acacia sp. Small Red-leaved Wattle (J.B.Williams 9503… Acacia sp. Sma… Acacia…

  # clean up
  rm(taxonomy)
  unlink(paste0(temp_file, ".parquet"))