R/remap_data_names.R
remap_data_names.Rd
Useful for preparing data from several different data sources into a common
structure that can be read collectively via arrow::open_dataset().
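As a rough illustration of why a common structure matters, the sketch below writes two toy sources with identical columns to parquet and opens them as one dataset. The directory layout, file names and columns are assumptions made for the example; they are not produced by remap_data_names() here.

```r
# Sketch only: toy sources with a shared structure, read back collectively.
# Paths and column names are illustrative, not part of envImport.
library(arrow)

ds_dir <- file.path(tempdir(), "ds_sketch")
dir.create(file.path(ds_dir, "source_a"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(ds_dir, "source_b"), recursive = TRUE, showWarnings = FALSE)

source_a <- data.frame(
  data_name = "source_a",
  taxa = c("sp1", "sp2"),
  year = c(2020L, 2021L)
)
source_b <- data.frame(
  data_name = "source_b",
  taxa = "sp3",
  year = 2019L
)

file_a <- file.path(ds_dir, "source_a", "source_a.parquet")
file_b <- file.path(ds_dir, "source_b", "source_b.parquet")
write_parquet(source_a, file_a)
write_parquet(source_b, file_b)

# Both files share one set of columns, so they open as a single dataset
combined <- open_dataset(c(file_a, file_b))
names(combined)
nrow(combined)
```

If the sources had differently named columns, they would first need to be mapped to common names, which is the step this function is intended to cover.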
remap_data_names(
  this_name,
  df_to_remap,
  data_map = NULL,
  out_file = NULL,
  exclude_cols = c("order", "epsg", "desc", "data_name_use", "url"),
  add_month = !is.null(data_map),
  add_year = !is.null(data_map),
  add_occ = !is.null(data_map),
  occ_cols = c("occ_derivation", "quantity"),
  absences = c("0", "none detected", "none observed", "None detected", "ABSENT"),
  previous = c("delete", "move"),
  compare_previous = TRUE,
  compare_cols = c("data_name", "survey"),
  ...
)
Character. Name of the data source.
Dataframe containing the columns to select and (potentially) rename
Dataframe or NULL. Mapping of fields to retrieve. See example
envImport::data_map
Character. Name of file to save. If NULL, this will be
here::here("ds", this_name, "this_name.parquet")
Logical. Add a year and/or month column to the returned data frame (requires a
date field to be specified by data_map).
Logical. Make an occ (occurrence) column of 1 = detected, 0 = not detected?
Because original data sets record numbers and absences in many different ways,
this should not be considered 100% reliable.
Character. If add_occ is TRUE, which values are considered absences?
Character. What to do with any previous out_file. Default is 'delete'. The
alternative 'move' renames the previous file, in the same location, to
gsub("\\.parquet", paste0("moved__", format(now(), "%Y%m%d_%H%M%S"), ".parquet"), out_file)
Logical. If TRUE, a comparison of records per compare_cols will be made between
the new and previous out_file. Ignored unless previous == "move".
Character. If compare_previous is TRUE, which columns to compare. Default is
c("data_name", "survey").
Not used
Character. Column names in data_map to exclude from the combined data.
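To make the previous = "move" renaming described above concrete, the sketch below evaluates the same gsub() pattern on a hypothetical out_file. The path is invented for illustration, and a fixed timestamp stands in for now() so that the result is reproducible.

```r
# Sketch only: evaluate the renaming pattern on a hypothetical out_file.
# A fixed timestamp replaces now() so the result does not change between runs.
out_file <- file.path("ds", "source_a", "source_a.parquet")
stamp <- as.POSIXct("2024-01-31 09:15:00", tz = "UTC")

moved_file <- gsub(
  "\\.parquet",  # the dot is escaped so only a literal ".parquet" matches
  paste0("moved__", format(stamp, "%Y%m%d_%H%M%S"), ".parquet"),
  out_file
)

moved_file
```

With these inputs the new name is "ds/source_a/source_amoved__20240131_091500.parquet": only the extension is replaced, so the original file stem and the moved__ prefix run together.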
Tibble with selected, renamed, adjusted and aligned columns
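The following base R sketch illustrates the kind of adjusted columns that add_year, add_month and add_occ describe. It is not the package's implementation: the column names date and quantity are assumptions for the example, and how occ_cols are actually combined is not documented here.

```r
# Sketch only: toy data with hypothetical column names (date, quantity).
absences <- c("0", "none detected", "none observed", "None detected", "ABSENT")

df <- data.frame(
  date = as.Date(c("2021-03-14", "2019-11-02", "2020-07-30")),
  quantity = c("3", "none detected", "0"),
  stringsAsFactors = FALSE
)

# Year and month derived from the date field
df$year <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))

# occ: 0 where the recorded value is one of the listed absences, otherwise 1
df$occ <- ifelse(df$quantity %in% absences, 0L, 1L)

df
```

Matching against a fixed list of absence strings is exact and case-sensitive in this sketch, which is one reason a derived occ column should be checked against the original records.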
Includes code from the Stack Exchange network post by Dan.
Other Help with combining data sources:
get_data()