Useful for preparing data from several different data sources into a common structure that can be read collectively via arrow::open_dataset().

remap_data_names(
  this_name,
  df_to_remap,
  data_map = NULL,
  out_file = NULL,
  exclude_cols = c("order", "epsg", "desc", "data_name_use", "url"),
  add_month = !is.null(data_map),
  add_year = !is.null(data_map),
  add_occ = !is.null(data_map),
  occ_cols = c("occ_derivation", "quantity"),
  absences = c("0", "none detected", "none observed", "None detected", "ABSENT"),
  previous = c("delete", "move"),
  compare_previous = TRUE,
  compare_cols = c("data_name", "survey"),
  ...
)

Arguments

this_name

Character. Name of the data source.

df_to_remap

Dataframe containing the columns to select and (potentially) rename.

data_map

Dataframe or NULL. Mapping of fields to retrieve. See the example envImport::data_map.

out_file

Character. Name of file to save. If NULL, this will be here::here("ds", this_name, "this_name.parquet")

add_month, add_year

Logical. Add a year and/or month column to the returned data frame (requires a date field to be specified by data_map).

add_occ

Logical. Make an occ column (occurrence) of 1 = detected, 0 = not detected? Due to the plethora of ways original data sets record numbers and absences, this should not be considered 100% reliable.

absences

Character. If add_occ, which values are considered absences?

previous

Character. What to do with any previous out_file. Default is 'delete'. The alternative, 'move', renames the previous file, keeping it in the same location, to gsub("\\.parquet", paste0("moved__", format(now(), "%Y%m%d_%H%M%S"), ".parquet"), out_file).

compare_previous

Logical. If TRUE, a comparison of records per compare_cols will be made between the new and previous out_file. Ignored unless previous == "move".

compare_cols

Character. If compare_previous, which columns to compare. Default is c("data_name", "survey").

...

Not used

exclude_names

Character. Column names in namesmap to exclude from the combined data.
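
The renaming described under previous = "move" can be sketched in base R. This is only an illustration: the out_file path is hypothetical, Sys.time() is used as a base R stand-in for now(), and the code inside remap_data_names() may differ.

```r
# Hypothetical path, for illustration only
out_file <- file.path("ds", "example", "example.parquet")

# Sys.time() is a base R stand-in for now()
stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")

# Same pattern and replacement as given for `previous`:
# the extension is swapped for "moved__<timestamp>.parquet"
moved_file <- gsub("\\.parquet", paste0("moved__", stamp, ".parquet"), out_file)

moved_file
```

The moved file stays in the same directory as out_file; only the end of the file name changes.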

Value

Tibble with selected, renamed, adjusted and aligned columns

Details

Includes code from the Stack Exchange network post by Dan.
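
As a rough illustration of the idea behind add_occ and absences, the sketch below flags a record as not detected when its value matches one of the absences strings. derive_occ() is a hypothetical helper, not part of envImport, and the logic actually used (including how occ_cols is applied) may differ.

```r
# Hypothetical helper, not part of envImport: one possible way to turn a
# quantity-like column into an occurrence flag (1 = detected, 0 = not detected).
derive_occ <- function(quantity,
                       absences = c("0", "none detected", "none observed",
                                    "None detected", "ABSENT")) {
  # Compare as character so numeric and text encodings are treated alike.
  # Assumption: anything not listed in `absences` (including NA) counts
  # as detected.
  as.integer(!as.character(quantity) %in% absences)
}

occ <- derive_occ(c("3", "0", "ABSENT", "present", NA))
occ
```

Matching here is exact and case sensitive, which is one reason a derived occ column should be checked against the original data rather than trusted outright.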

See also

Other Help with combining data sources: get_data()
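
Examples

A minimal sketch of why a common structure is useful: once every source has been written to its own parquet file with the same columns, the whole folder can be opened as one dataset. The data, the folder layout under tempdir() and the column choice are made up for illustration, and the files are written directly with arrow::write_parquet() rather than by remap_data_names().

```r
combined <- NULL

# Only runs when the arrow and dplyr packages are installed
if (requireNamespace("arrow", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {

  # Hypothetical layout: <ds_dir>/<this_name>/<this_name>.parquet
  ds_dir <- file.path(tempdir(), "ds_example")

  # Two made-up sources that already share column names and types
  sources <- list(
    source_a = data.frame(data_name = "source_a",
                          survey = c("s1", "s2"),
                          occ = c(1L, 0L)),
    source_b = data.frame(data_name = "source_b",
                          survey = "s3",
                          occ = 1L)
  )

  for (this_name in names(sources)) {
    dir.create(file.path(ds_dir, this_name),
               recursive = TRUE, showWarnings = FALSE)
    arrow::write_parquet(
      sources[[this_name]],
      file.path(ds_dir, this_name, paste0(this_name, ".parquet"))
    )
  }

  # Open all sources collectively and pull them into memory
  combined <- dplyr::collect(arrow::open_dataset(ds_dir))
}
```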