Title: | Obtaining Stars from Flat Tables |
---|---|
Description: | Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure. Frequently, the operations to be performed aim to transform a flat table into a star schema. Transformations can be carried out using professional extract, transform and load tools or tools intended for data transformation for end users. With the tools mentioned, this transformation can be carried out, but it requires a lot of work. The main objective of this package is to define transformations that allow obtaining stars from flat tables easily. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context. |
Authors: | Jose Samos [aut, cre] , Universidad de Granada [cph] |
Maintainer: | Jose Samos <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.5.9000 |
Built: | 2024-10-29 03:16:07 UTC |
Source: | https://github.com/josesamos/starschemar |
Transforms numeric type attributes of dimensions into character type. In a star_schema, numerical data are measurements located in the facts. Numerical data in dimensions are usually codes, or day, week, month or year numbers. Some tools treat any numerical data as a measurement, so it is appropriate to transform the numerical data of dimensions into character data.
character_dimensions(st, length_integers = list(), NA_replacement_value = NULL)
## S3 method for class 'star_schema'
character_dimensions(st, length_integers = list(), NA_replacement_value = NULL)
st | A star_schema object. |
length_integers | A list that gives the width for some fields, for example list(week = 2). |
NA_replacement_value | A string, value to replace NA values. |
It allows indicating the width of some fields, padding with zeros on the left. This is useful to make the alphabetical order of the result correspond to the numerical order.
It also allows indicating the literal to be used when the numerical value is not defined.
If a role playing dimension has been defined, the transformation is performed on it.
A star_schema object.
Other star schema and constellation definition functions: constellation(), role_playing_dimension(), snake_case(), star_schema()
st <- star_schema(mrs_age_test, dm_mrs_age) |>
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("date", "week", "year")
  ) |>
  character_dimensions(length_integers = list(week = 2),
                       NA_replacement_value = "Unknown")
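The zero-padding idea can be illustrated without the package. The following base R sketch is not how character_dimensions is implemented; it only shows, on a hypothetical vector of week numbers, why padding to a fixed width makes alphabetical order agree with numerical order, and how an undefined value can be replaced by a literal.

```r
# Hypothetical week numbers from a dimension; NA marks an undefined value.
week <- c(1L, 2L, 10L, NA)

# Plain conversion: "10" sorts before "2" alphabetically.
plain <- as.character(week)

# Left-pad with zeros to width 2, then replace the undefined value.
padded <- formatC(week, width = 2, flag = "0")
padded[is.na(week)] <- "Unknown"

# method = "radix" sorts in the C locale, so the result does not depend
# on the collation settings of the session.
sort(plain, method = "radix")                 # "1" "10" "2"
sort(padded[!is.na(week)], method = "radix")  # "01" "02" "10"
```

With the padded values, sorting the character column gives the same order as sorting the original numbers.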
constellation S3 class
Creates a constellation object from a list of star_schema objects. All dimensions with the same name in the star schemas have to be conformable.
constellation(lst, name = NULL)
lst | A list of star_schema objects. |
name | A string. |
A constellation object.
Other star schema and constellation definition functions: character_dimensions(), role_playing_dimension(), snake_case(), star_schema()
ct <- constellation(list(st_mrs_age, st_mrs_cause), name = "mrs")
multistar
Once we have refined the format or content of facts and dimensions, we can obtain a multistar. A multistar only distinguishes between general and conformed dimensions; each dimension has its own data. It can contain multiple fact tables.
constellation_as_multistar(ct)
## S3 method for class 'constellation'
constellation_as_multistar(ct)
ct | A constellation object. |
A multistar object.
Other results export functions: constellation_as_tibble_list(), multistar_as_flat_table(), star_schema_as_flat_table(), star_schema_as_multistar(), star_schema_as_tibble_list()
ms <- ct_mrs |> constellation_as_multistar()
tibble list
Once we have refined the format or content of facts and dimensions, we can obtain a tibble list with them. Role playing dimensions can be optionally included.
constellation_as_tibble_list(ct, include_role_playing = FALSE)
## S3 method for class 'constellation'
constellation_as_tibble_list(ct, include_role_playing = FALSE)
ct | A constellation object. |
include_role_playing | A boolean. |
A list of tibble objects.
Other results export functions: constellation_as_multistar(), multistar_as_flat_table(), star_schema_as_flat_table(), star_schema_as_multistar(), star_schema_as_tibble_list()
tl <- ct_mrs |>
  constellation_as_tibble_list()
tl <- ct_mrs |>
  constellation_as_tibble_list(include_role_playing = TRUE)
Constellation for the Mortality Reporting System considering age and cause classification.
ct_mrs
A constellation object.
# Defined by:
ct_mrs <- constellation(list(st_mrs_age, st_mrs_cause), name = "mrs")
Constellation for the Mortality Reporting System considering age and cause classification, built from test data.
ct_mrs_test
A constellation object.
# Defined by:
ct_mrs_test <- constellation(list(st_mrs_age_test, st_mrs_cause_test), name = "mrs_test")
dimensional_model object
To define a dimension in a dimensional_model object, we have to define its name and the set of attributes that make it up.
define_dimension(st, name = NULL, attributes = NULL)
## S3 method for class 'dimensional_model'
define_dimension(st, name = NULL, attributes = NULL)
st | A dimensional_model object. |
name | A string, name of the dimension. |
attributes | A vector of attribute names. |
To get a star schema (a star_schema object) we need a flat table (implemented through a tibble) and a dimensional_model object. The definition of dimensions in the dimensional_model object is made from the flat table column names. Using the dput function we can list the column names of the flat table so that we do not have to type their names.
A dimensional_model object.
Other star definition functions: define_fact(), dimensional_model()
# dput(colnames(mrs_age))
#
# c(
#   "Reception Year",
#   "Reception Week",
#   "Reception Date",
#   "Data Availability Year",
#   "Data Availability Week",
#   "Data Availability Date",
#   "Year",
#   "WEEK",
#   "Week Ending Date",
#   "REGION",
#   "State",
#   "City",
#   "Age Range",
#   "Deaths"
# )
dm <- dimensional_model() |>
  define_dimension(name = "When",
                   attributes = c("Week Ending Date",
                                  "WEEK",
                                  "Year")) |>
  define_dimension(name = "When Available",
                   attributes = c("Data Availability Date",
                                  "Data Availability Week",
                                  "Data Availability Year")) |>
  define_dimension(name = "Where",
                   attributes = c("REGION",
                                  "State",
                                  "City")) |>
  define_dimension(name = "Who",
                   attributes = c("Age Range"))
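As a small illustration of the dput tip, the sketch below builds a hypothetical three-column flat table (the data frame `ft` and its values are made up for this example and stand in for a real tibble) and prints its column names in a form that can be pasted into an `attributes = c(...)` argument.

```r
# Hypothetical flat table; check.names = FALSE keeps the spaces in the names.
ft <- data.frame(
  `Week Ending Date` = "1962-01-06",
  City = "Boston",
  Deaths = 5,
  check.names = FALSE
)

# dput() writes an R expression that recreates the vector of column names.
dput(colnames(ft))
# c("Week Ending Date", "City", "Deaths")
```

Copying the printed vector avoids typing errors in long or irregular column names.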
dimensional_model object
To define facts in a dimensional_model object, the essential data are a name and a set of measurements, which can be empty (a fact without explicit measurements). Each measurement requires an associated aggregation function, which by default is SUM.
define_fact(
  st,
  name = NULL,
  measures = NULL,
  agg_functions = NULL,
  nrow_agg = "nrow_agg"
)
## S3 method for class 'dimensional_model'
define_fact(
  st,
  name = NULL,
  measures = NULL,
  agg_functions = NULL,
  nrow_agg = "nrow_agg"
)
st | A dimensional_model object. |
name | A string, name of the fact. |
measures | A vector of measure names. |
agg_functions | A vector of aggregation function names. If none is indicated, the default is SUM. Additionally they can be MAX or MIN. |
nrow_agg | A string, measurement name for the number of rows aggregated. |
To get a star schema (a star_schema object) we need a flat table (implemented through a tibble) and a dimensional_model object. The definition of facts in the dimensional_model object is made from the flat table column names. Using the dput function we can list the column names of the flat table so that we do not have to type their names.
Associated with each measurement there is an aggregation function that can be SUM, MAX or MIN. Mean is not among the possible aggregation functions because calculating the mean over subsets of data does not necessarily yield the mean of the total data.
An additional measurement corresponding to the number of aggregated rows is always added which, together with SUM, allows us to obtain the mean if needed.
A dimensional_model object.
Other star definition functions: define_dimension(), dimensional_model()
# dput(colnames(mrs_age))
#
# c(
#   "Reception Year",
#   "Reception Week",
#   "Reception Date",
#   "Data Availability Year",
#   "Data Availability Week",
#   "Data Availability Date",
#   "Year",
#   "WEEK",
#   "Week Ending Date",
#   "REGION",
#   "State",
#   "City",
#   "Age Range",
#   "Deaths"
# )
dm <- dimensional_model() |>
  define_fact(
    name = "mrs_age",
    measures = c("Deaths"),
    agg_functions = c("SUM"),
    nrow_agg = "nrow_agg"
  )
dm <- dimensional_model() |>
  define_fact(
    name = "mrs_age",
    measures = c("Deaths")
  )
dm <- dimensional_model() |>
  define_fact(name = "Factless fact")
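The reasoning about the mean can be checked with a few numbers. The base R sketch below uses made-up values and hypothetical names (`deaths`, `group`, `agg`); it is independent of the package and only shows that averaging per-group means differs from the overall mean, while SUM combined with the number of aggregated rows recovers it.

```r
# Hypothetical detail rows: deaths per record, in two groups of different size.
deaths <- c(10, 20, 30, 100)
group  <- c("a", "a", "a", "b")

# Aggregate per group, keeping SUM and the number of aggregated rows.
agg <- data.frame(
  group    = c("a", "b"),
  deaths   = as.vector(tapply(deaths, group, sum)),
  nrow_agg = as.vector(tapply(deaths, group, length))
)

# Averaging the group means: (20 + 100) / 2 = 60.
mean_of_means <- mean(agg$deaths / agg$nrow_agg)

# SUM together with the row count: 160 / 4 = 40, the mean of all rows.
overall_mean <- sum(agg$deaths) / sum(agg$nrow_agg)
```

Because sums and row counts can be added across any partition of the data, the mean can always be derived from them after further aggregation.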
dimensional_model S3 class
An empty dimensional_model object is created, to which definitions of facts and dimensions can be added.
dimensional_model()
To get a star schema (a star_schema object) we need a flat table (implemented through a tibble) and a dimensional_model object. The definition of facts and dimensions in the dimensional_model object is made from the flat table columns. Each attribute can only appear once in the definition.
A dimensional_model object.
Other star definition functions: define_dimension(), define_fact()
dm <- dimensional_model()
dimensional_query S3 class
An empty dimensional_query object is created where you can select fact measures, dimension attributes and filter dimension rows.
dimensional_query(ms = NULL)
ms | A multistar object. |
A dimensional_query object.
Other query functions: filter_dimension(), run_query(), select_dimension(), select_fact()
# ms_mrs <- ct_mrs |>
#   constellation_as_multistar()
# dq <- dimensional_query(ms_mrs)
Definition of facts and dimensions for the Mortality Reporting System considering the age classification.
dm_mrs_age
A dimensional_model object.
# Defined by:
dm_mrs_age <- dimensional_model() |>
  define_fact(
    name = "mrs_age",
    measures = c("Deaths"),
    agg_functions = c("SUM"),
    nrow_agg = "nrow_agg"
  ) |>
  define_dimension(
    name = "when",
    attributes = c("Week Ending Date", "WEEK", "Year")
  ) |>
  define_dimension(
    name = "when_available",
    attributes = c(
      "Data Availability Date",
      "Data Availability Week",
      "Data Availability Year"
    )
  ) |>
  define_dimension(
    name = "where",
    attributes = c("REGION", "State", "City")
  ) |>
  define_dimension(
    name = "who",
    attributes = c("Age Range")
  )
Definition of facts and dimensions for the Mortality Reporting System considering the cause classification.
dm_mrs_cause
A dimensional_model object.
# Defined by:
dm_mrs_cause <- dimensional_model() |>
  define_fact(
    name = "mrs_cause",
    measures = c("Pneumonia and Influenza Deaths", "Other Deaths")
  ) |>
  define_dimension(
    name = "when",
    attributes = c("Week Ending Date", "WEEK", "Year")
  ) |>
  define_dimension(
    name = "when_received",
    attributes = c("Reception Date", "Reception Week", "Reception Year")
  ) |>
  define_dimension(
    name = "when_available",
    attributes = c(
      "Data Availability Date",
      "Data Availability Week",
      "Data Availability Year"
    )
  ) |>
  define_dimension(
    name = "where",
    attributes = c("REGION", "State", "City")
  )
Export the selected attributes of a dimension, without repeated combinations, to enrich the dimension.
enrich_dimension_export(st, name = NULL, attributes = NULL)
## S3 method for class 'star_schema'
enrich_dimension_export(st, name = NULL, attributes = NULL)
st | A star_schema object. |
name | A string, name of the dimension. |
attributes | A vector of attribute names. |
Role dimensions cannot be exported; you have to work with the associated role playing dimension.
A tibble object.
Other dimension enrichment functions: enrich_dimension_import(), enrich_dimension_import_test()
tb <- enrich_dimension_export(st_mrs_age,
                              name = "when_common",
                              attributes = c("week", "year"))
tibble to enrich a dimension
A tibble is attached to a dimension of a star schema. It contains dimension attributes and new attributes. If values associated with all rows in the dimension are included in the tibble, the dimension is enriched with the new attributes.
enrich_dimension_import(st, name = NULL, tb)
## S3 method for class 'star_schema'
enrich_dimension_import(st, name = NULL, tb)
st | A star_schema object. |
name | A string, name of the dimension. |
tb | A tibble object. |
Role dimensions cannot be directly enriched. If a role playing dimension is enriched, the new attributes are also added to the associated role dimensions.
A star_schema object.
Other dimension enrichment functions: enrich_dimension_export(), enrich_dimension_import_test()
tb <- enrich_dimension_export(st_mrs_age,
                              name = "when_common",
                              attributes = c("week", "year"))
# Add new columns with meaningful data (these are not), possibly exporting
# data to a file, populating it and importing it.
tb <- tibble::add_column(tb, x = "x", y = "y", z = "z")
st <- enrich_dimension_import(st_mrs_age, name = "when_common", tb)
tibble to test to enrich a dimension
A tibble is attached to a dimension of a star schema. It contains dimension attributes and new attributes. If values associated with all rows in the dimension are included in the tibble, the dimension is enriched with the new attributes. This function checks that there are values for all instances. It returns the dimension instances that do not match the imported data.
enrich_dimension_import_test(st, name = NULL, tb)
## S3 method for class 'star_schema'
enrich_dimension_import_test(st, name = NULL, tb)
st | A star_schema object. |
name | A string, name of the dimension. |
tb | A tibble object. |
A dimension object.
Other dimension enrichment functions: enrich_dimension_export(), enrich_dimension_import()
tb <- enrich_dimension_export(st_mrs_age,
                              name = "when_common",
                              attributes = c("week", "year"))
# Add new columns with meaningful data (these are not), possibly exporting
# data to a file, populating it and importing it.
tb <- tibble::add_column(tb, x = "x", y = "y", z = "z")[-1, ]
tb2 <- enrich_dimension_import_test(st_mrs_age, name = "when_common", tb)
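The kind of check described here (finding the dimension instances that have no counterpart in the imported data) can be sketched in base R as an anti-join on the shared attributes. This is an illustration under assumptions, not the package's implementation: the data frames `dimension` and `tb`, their values, and the composite-key approach are all hypothetical.

```r
# Hypothetical dimension attributes and an enrichment table missing week "02".
dimension <- data.frame(week = c("01", "02", "03"), year = "1962",
                        stringsAsFactors = FALSE)
tb <- data.frame(week = c("01", "03"), year = "1962", season = "winter",
                 stringsAsFactors = FALSE)

# Build a composite key from the attributes present on both sides.
by <- intersect(names(dimension), names(tb))
key_dim <- do.call(paste, c(unname(as.list(dimension[by])), sep = "\r"))
key_tb  <- do.call(paste, c(unname(as.list(tb[by])), sep = "\r"))

# Dimension instances with no counterpart in the imported data.
unmatched <- dimension[!(key_dim %in% key_tb), ]
```

An empty `unmatched` result would mean that every dimension row has values in the imported table, which is the condition for the enrichment to be complete.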
Allows you to define selection conditions for dimension rows.
filter_dimension(dq, name = NULL, ...)
## S3 method for class 'dimensional_query'
filter_dimension(dq, name = NULL, ...)
dq | A dimensional_query object. |
name | A string, name of the dimension. |
... | Conditions, defined in exactly the same way as in dplyr::filter. |
Conditions can be defined on any attribute of the dimension (not only on attributes selected in the query for the dimension). The selection is made based on the function dplyr::filter. Conditions are defined in exactly the same way as in that function.
A dimensional_query object.
Other query functions: dimensional_query(), run_query(), select_dimension(), select_fact()
dq <- dimensional_query(ms_mrs) |>
  filter_dimension(name = "when", when_happened_week <= "03") |>
  filter_dimension(name = "where", city == "Boston")
Filter fact rows based on dimension conditions in a star schema. Dimensions remain unchanged.
filter_fact_rows(st, name = NULL, ...)
## S3 method for class 'star_schema'
filter_fact_rows(st, name = NULL, ...)
st | A star_schema object. |
name | A string, name of the dimension. |
... | Conditions, defined in exactly the same way as in dplyr::filter. |
Filtered rows can be deleted using the incremental_refresh_star_schema function.
A star_schema object.
Other incremental refresh functions: get_star_schema(), get_star_schema_names(), incremental_refresh_constellation(), incremental_refresh_star_schema(), purge_dimensions_constellation(), purge_dimensions_star_schema()
st <- st_mrs_age |>
  filter_fact_rows(name = "when", week <= "03") |>
  filter_fact_rows(name = "where", city == "Bridgeport")
st2 <- st_mrs_age |>
  incremental_refresh_star_schema(st, existing = "delete")
Estimation of the long-term health impacts of exposure to air pollution in London from 2016 to 2050.
ft_datagov_uk
A tibble.
The original dataset contains 68 files, corresponding to 34 London areas and 2 pollutants: pollutant and zone are indicated in the name of each file. Each file has several sheets with different variables. It has been transformed into a flat table considering a single variable and defining the area and the pollutant as columns.
https://data.world/datagov-uk/fd864906-8456-46a8-9a01-0dcb2dbd87b9
Classification of London's boroughs into zones and sub-regions.
ft_london_boroughs
A tibble.
https://en.wikipedia.org/wiki/List_of_sub-regions_used_in_the_London_Plan
City, state and county for US cities. It only includes those that appear in the Mortality Reporting System.
ft_usa_city_county
A tibble.
https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
Name and abbreviation of US states.
ft_usa_states
A tibble.
https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
Get a conformed dimension of a constellation given its name.
get_conformed_dimension(ct, name)
## S3 method for class 'constellation'
get_conformed_dimension(ct, name)
ct | A constellation object. |
name | A string, name of the dimension. |
A dimension_table object.
Other data cleaning functions: get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
d <- ct_mrs |> get_conformed_dimension("when")
Get the names of the conformed dimensions of a constellation.
get_conformed_dimension_names(ct)
## S3 method for class 'constellation'
get_conformed_dimension_names(ct)
ct | A constellation object. |
A vector of dimension names.
Other data cleaning functions: get_conformed_dimension(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
d <- ct_mrs |> get_conformed_dimension_names()
Get a dimension of a star schema given its name.
get_dimension(st, name)
## S3 method for class 'star_schema'
get_dimension(st, name)
st | A star_schema object. |
name | A string, name of the dimension. |
Role dimensions can be obtained but not role playing dimensions. Role dimensions get their instances from the associated role playing dimensions.
A dimension_table object.
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
d <- st_mrs_age |> get_dimension("when")
Get the names of the attributes in a dimension.
get_dimension_attribute_names(st, name)
## S3 method for class 'star_schema'
get_dimension_attribute_names(st, name)
st | A star_schema object. |
name | A string, name of the dimension. |
A vector of attribute names.
Other rename functions: get_measure_names(), rename_dimension(), rename_dimension_attributes(), rename_fact(), rename_measures()
attribute_names <- st_mrs_age |> get_dimension_attribute_names("when")
Get the names of the dimensions of a star schema.
get_dimension_names(st)
## S3 method for class 'star_schema'
get_dimension_names(st)
st | A star_schema object. |
Role playing dimensions are not considered.
A vector of dimension names.
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
dn <- st_mrs_age |> get_dimension_names()
Get the names of the measures in the facts.
get_measure_names(st)
## S3 method for class 'star_schema'
get_measure_names(st)
st | A star_schema object. |
A vector of measure names.
Other rename functions: get_dimension_attribute_names(), rename_dimension(), rename_dimension_attributes(), rename_fact(), rename_measures()
measure_names <- st_mrs_age |> get_measure_names()
Get a star schema of a constellation given its name.
get_star_schema(ct, name)
## S3 method for class 'constellation'
get_star_schema(ct, name)
ct | A constellation object. |
name | A string, name of the star schema. |
A star_schema object.
Other incremental refresh functions: filter_fact_rows(), get_star_schema_names(), incremental_refresh_constellation(), incremental_refresh_star_schema(), purge_dimensions_constellation(), purge_dimensions_star_schema()
d <- ct_mrs |> get_star_schema("mrs_age")
Get the names of the star schemas in a constellation.
get_star_schema_names(ct)
## S3 method for class 'constellation'
get_star_schema_names(ct)
ct | A constellation object. |
A vector of star schema names.
Other incremental refresh functions: filter_fact_rows(), get_star_schema(), incremental_refresh_constellation(), incremental_refresh_star_schema(), purge_dimensions_constellation(), purge_dimensions_star_schema()
d <- ct_mrs |> get_star_schema_names()
Incrementally refresh a star schema in a constellation with the content of a new star schema that is integrated into the first.
incremental_refresh_constellation(ct, st, existing = "ignore")
## S3 method for class 'constellation'
incremental_refresh_constellation(ct, st, existing = "ignore")
ct | A constellation object. |
st | A star_schema object. |
existing | A string, operation to be performed with records in the fact table whose keys match. |
Once the dimensions are integrated, if there are records in the fact table whose keys match the new ones, the new ones can be ignored, the existing ones can be replaced by the new ones, all of them can be grouped using the aggregation functions, or they can be deleted. Therefore, the possible values of the existing parameter are: "ignore", "replace", "group" or "delete".
A constellation object.
Other incremental refresh functions: filter_fact_rows(), get_star_schema(), get_star_schema_names(), incremental_refresh_star_schema(), purge_dimensions_constellation(), purge_dimensions_star_schema()
ct <- ct_mrs |>
  incremental_refresh_constellation(st_mrs_age_w10, existing = "replace")
ct <- ct_mrs |>
  incremental_refresh_constellation(st_mrs_cause_w10, existing = "group")
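To make the four values of the existing parameter concrete, here is a minimal base R sketch on a toy fact table. It is not the package's implementation, and it rests on several assumptions that the source does not state: facts are identified by a single hypothetical `key` column, the aggregation function is SUM, new rows with unseen keys are appended in every case, and "delete" removes the existing rows whose keys match. The function name `refresh` is made up for this example.

```r
# Hypothetical existing and incoming fact rows.
old <- data.frame(key = c(1, 2), deaths = c(10, 20))
new <- data.frame(key = c(2, 3), deaths = c(5, 7))

refresh <- function(old, new,
                    existing = c("ignore", "replace", "group", "delete")) {
  existing <- match.arg(existing)
  hit   <- old$key %in% new$key            # existing rows whose key reappears
  fresh <- new[!(new$key %in% old$key), ]  # incoming rows with unseen keys
  out <- switch(existing,
    ignore  = rbind(old, fresh),           # keep existing, drop matching new
    replace = rbind(old[!hit, ], new),     # matching new rows win
    group   = aggregate(deaths ~ key, data = rbind(old, new), FUN = sum),
    delete  = rbind(old[!hit, ], fresh)    # assumption: matching rows removed
  )
  out <- out[order(out$key), ]
  rownames(out) <- NULL
  out
}

refresh(old, new, "ignore")$deaths   # 10 20 7
refresh(old, new, "replace")$deaths  # 10 5 7
refresh(old, new, "group")$deaths    # 10 25 7
refresh(old, new, "delete")$key      # 1 3
```

Only the row with key 2 appears on both sides, so it is the only one whose treatment differs between the four options.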
For a dimension, given the primary key of two records, it adds an update to the set of updates that modifies the combination of values of the rest of attributes of the first record so that they become the same as those of the second.
match_records(updates, dimension, old, new)
## S3 method for class 'record_update_set'
match_records(updates, dimension, old, new)
updates | A record_update_set object. |
dimension | A dimension_table object. |
old | A number, primary key of the record to update. |
new | A number, primary key of the record from which the values are taken. |
Primary keys are only used to get the combination of values easily. The update is defined exclusively from the rest of values.
It is especially useful when two records should be only one because a data error generated two.
A record_update_set object.
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
dim_names <- st_mrs_age |>
  get_dimension_names()
where <- st_mrs_age |>
  get_dimension("where")
# head(where, 2)
updates <- record_update_set() |>
  match_records(dimension = where,
                old = 1,
                new = 2)
Given a list of dimension record update operations, they are applied on the conformed dimensions of the constellation object. Update operations must be defined with the set of functions available for that purpose.
modify_conformed_dimension_records(ct, updates = record_update_set())
## S3 method for class 'constellation'
modify_conformed_dimension_records(ct, updates = record_update_set())
ct | A constellation object. |
updates | A record_update_set object. |
When dimensions are defined, records can be detected that must be modified as part of the data cleaning process, frequently to unify two or more records due to data errors or missing data. This is not trivial because facts must be adapted to the new set of dimension instances.
This operation allows us to unify records and automatically propagate modifications to facts in star schemas.
A constellation object.
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
ct <- ct_mrs |> modify_conformed_dimension_records(updates_st_mrs_age)
ct <- ct_mrs |> modify_conformed_dimension_records(updates_st_mrs_age)
Applies a list of dimension record update operations to the dimensions of the star_schema object. Update operations must be defined with the set of functions available for that purpose.
modify_dimension_records(st, updates = record_update_set())

## S3 method for class 'star_schema'
modify_dimension_records(st, updates = record_update_set())
st |
A star_schema object. |
updates |
A record_update_set object. |
When dimensions are defined, records can be detected that must be modified as part of the data cleaning process: frequently to unify two or more records due to data errors or missing data. This is not immediate because facts must be adapted to the new set of dimension instances.
This operation allows us to unify records and automatically propagate modifications to facts.
The list of update operations can be applied repeatedly to new data received to be incorporated into the star_schema object.
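To illustrate why facts must be adapted when records are unified, here is a minimal base R sketch on toy tables. The table and column names (`where_key`, `city`, `n_deaths`) are hypothetical, and the code only mirrors the idea; it is not the package's implementation.

```r
# Toy dimension: records 1 and 2 describe the same city (a data error).
where <- data.frame(
  where_key = 1:3,
  city = c("Boston", "Bostn", "Chicago"),
  stringsAsFactors = FALSE
)

# Toy facts referencing the dimension through its surrogate key.
facts <- data.frame(
  where_key = c(1, 2, 2, 3),
  n_deaths = c(10, 4, 6, 8)
)

# Unify record 2 into record 1: remap the foreign key in facts ...
old_key <- 2
new_key <- 1
facts$where_key[facts$where_key == old_key] <- new_key

# ... drop the redundant dimension record ...
where <- where[where$where_key != old_key, ]

# ... and re-aggregate facts, since several rows may now share a key.
facts <- aggregate(n_deaths ~ where_key, data = facts, FUN = sum)
```

With the package, this whole sequence is expected to be handled by the update operations passed in `updates`.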
A star_schema object.
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), record_update_set(), update_record(), update_selection(), update_selection_general()
st <- st_mrs_age |>
  modify_dimension_records(updates_st_mrs_age)
Selection of data from the 122 Cities Mortality Reporting System, for the first 11 weeks of 1962.
mrs
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Selection of data from the 122 Cities Mortality Reporting System by age group, for the first 9 weeks of 1962.
mrs_age
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Two additional dates have been generated, which were not present in the original dataset.
Selection of data from the 2 Cities Mortality Reporting System by age group, for the first 3 weeks of 1962.
mrs_age_test
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Two additional dates have been generated, which were not present in the original dataset.
Selection of data from the 3 Cities Mortality Reporting System by age group, for week 4 of 1962. It also includes some isolated data from previous weeks that is supposed to be corrections for data errors.
mrs_age_w_test
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Two additional dates have been generated, which were not present in the original dataset.
Selection of data from the 122 Cities Mortality Reporting System by age group, for week 10 of 1962. It also includes some isolated data from previous weeks that is supposed to be corrections for data errors.
mrs_age_w10
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Two additional dates have been generated, which were not present in the original dataset.
Selection of data from the 122 Cities Mortality Reporting System by age group, for week 11 of 1962. It also includes some isolated data from previous weeks that is supposed to be corrections for data errors.
mrs_age_w11
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Two additional dates have been generated, which were not present in the original dataset.
Selection of data from the 122 Cities Mortality Reporting System by cause, for the first 9 weeks of 1962.
mrs_cause
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Two additional dates have been generated, which were not present in the original dataset.
Selection of data from the 2 Cities Mortality Reporting System by cause, for the first 3 weeks of 1962.
mrs_cause_test
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Two additional dates have been generated, which were not present in the original dataset.
Selection of data from the 3 Cities Mortality Reporting System by cause, for week 4 of 1962. It also includes some isolated data from previous weeks that is supposed to be additional data not considered before.
mrs_cause_w_test
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Two additional dates have been generated, which were not present in the original dataset.
Selection of data from the 122 Cities Mortality Reporting System by cause, for week 10 of 1962. It also includes some isolated data from previous weeks that is supposed to be additional data not considered before.
mrs_cause_w10
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Two additional dates have been generated, which were not present in the original dataset.
Selection of data from the 122 Cities Mortality Reporting System by cause, for week 11 of 1962. It also includes some isolated data from previous weeks that is supposed to be additional data not considered before.
mrs_cause_w11
A tibble.
The original dataset begins in 1962. For each week, in 122 US cities, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.
Two additional dates have been generated, which were not present in the original dataset.
Multistar for the Mortality Reporting System considering age and cause classification. It is the result obtained in the vignette.
ms_mrs
A multistar object.
# Defined by:
ms_mrs <- ct_mrs |>
  constellation_as_multistar()
Multistar for the Mortality Reporting System considering age and cause classification data test.
ms_mrs_test
A multistar object.
# Defined by:
ms_mrs_test <- ct_mrs_test |>
  constellation_as_multistar()
multistar as a flat table
We can obtain a flat table, implemented using a tibble, from a multistar (which can be the result of a query). If it only has one fact table, it is not necessary to provide its name.
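Conceptually, flattening amounts to joining the selected fact table with its dimensions and dropping the surrogate keys. The following base R sketch shows this on toy tables with hypothetical names; it is an illustration of the idea, not the code used by the package.

```r
# Toy dimension and fact table (hypothetical names and values).
where <- data.frame(
  where_key = 1:2,
  city = c("Boston", "Chicago"),
  stringsAsFactors = FALSE
)
facts <- data.frame(
  where_key = c(1, 2, 1),
  n_deaths = c(10, 8, 12)
)

# Join facts with the dimension, then drop the surrogate key.
ft <- merge(facts, where, by = "where_key")
ft <- ft[, c("city", "n_deaths")]
```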
multistar_as_flat_table(ms, fact = NULL)

## S3 method for class 'multistar'
multistar_as_flat_table(ms, fact = NULL)
ms |
A multistar object. |
fact |
A string, name of the fact. |
A tibble.
Other results export functions: constellation_as_multistar(), constellation_as_tibble_list(), star_schema_as_flat_table(), star_schema_as_multistar(), star_schema_as_tibble_list()
ft <- ms_mrs |>
  multistar_as_flat_table(fact = "mrs_age")

ms <- dimensional_query(ms_mrs) |>
  select_dimension(name = "where",
                   attributes = c("city", "state")) |>
  select_dimension(name = "when",
                   attributes = c("when_happened_year")) |>
  select_fact(name = "mrs_age",
              measures = c("n_deaths")) |>
  select_fact(
    name = "mrs_cause",
    measures = c("pneumonia_and_influenza_deaths", "other_deaths")
  ) |>
  filter_dimension(name = "when", when_happened_week <= "03") |>
  filter_dimension(name = "where", city == "Boston") |>
  run_query()

ft <- ms |>
  multistar_as_flat_table()
Delete instances of dimensions not related to facts in a constellation.
purge_dimensions_constellation(ct)

## S3 method for class 'constellation'
purge_dimensions_constellation(ct)
ct |
A constellation object. |
A constellation object.
Other incremental refresh functions: filter_fact_rows(), get_star_schema(), get_star_schema_names(), incremental_refresh_constellation(), incremental_refresh_star_schema(), purge_dimensions_star_schema()
ct <- ct_mrs |>
  purge_dimensions_constellation()
Delete instances of dimensions not related to facts in a star schema.
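As a rough idea of what purging means, the base R sketch below keeps only the dimension instances whose key appears in the fact table. Table and column names are hypothetical and the code is not taken from the package.

```r
# Toy dimension with one instance ("Denver") that no fact row references.
where <- data.frame(
  where_key = 1:3,
  city = c("Boston", "Chicago", "Denver"),
  stringsAsFactors = FALSE
)
facts <- data.frame(
  where_key = c(1, 1, 2),
  n_deaths = c(10, 12, 8)
)

# Keep only the dimension instances referenced by some fact row.
where <- where[where$where_key %in% facts$where_key, ]
```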
purge_dimensions_star_schema(st)

## S3 method for class 'star_schema'
purge_dimensions_star_schema(st)
st |
A star_schema object. |
A star_schema object.
Other incremental refresh functions: filter_fact_rows(), get_star_schema(), get_star_schema_names(), incremental_refresh_constellation(), incremental_refresh_star_schema(), purge_dimensions_constellation()
st <- st_mrs_age |>
  purge_dimensions_star_schema()
record_update_set S3 class
A record_update_set object is created. It stores updates on dimension records.
record_update_set()
Each update is made up of a dimension name, an old value set, and a new value set.
When the update is applied, all the dimension records that have the combination of old values are modified with the new values provided.
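The sketch below shows, in base R, how one such update could be applied to a toy dimension. The list layout with `dimension`, `old` and `new` is a hypothetical representation chosen for illustration; the package's internal format may differ.

```r
# Hypothetical representation of one update (not the package's internal format).
update <- list(
  dimension = "where",
  old = c(city = "Bostn", state = "MA"),
  new = c(city = "Boston", state = "MA")
)

where <- data.frame(
  city = c("Boston", "Bostn", "Bostn", "Chicago"),
  state = c("MA", "MA", "NH", "IL"),
  stringsAsFactors = FALSE
)

# Select the records that match the whole combination of old values.
matches <- rep(TRUE, nrow(where))
for (attribute in names(update$old)) {
  matches <- matches & where[[attribute]] == update$old[[attribute]]
}

# Overwrite the matching records with the new values.
for (attribute in names(update$new)) {
  where[matches, attribute] <- update$new[[attribute]]
}
```

Only the record with both `city == "Bostn"` and `state == "MA"` is modified; the record with state `"NH"` is left untouched because it does not match the full combination.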
A record_update_set object.
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), update_record(), update_selection(), update_selection_general()
updates <- record_update_set()
Set new name for a dimension.
rename_dimension(st, name, new_name)

## S3 method for class 'star_schema'
rename_dimension(st, name, new_name)
st |
A star_schema object. |
name |
A string, name of the dimension. |
new_name |
A string, new name of the dimension. |
A star_schema object.
Other rename functions: get_dimension_attribute_names(), get_measure_names(), rename_dimension_attributes(), rename_fact(), rename_measures()
st <- st_mrs_age |>
  rename_dimension(name = "when", new_name = "when_happened")
Set new names of some attributes in a dimension.
rename_dimension_attributes(st, name, attributes, new_names)

## S3 method for class 'star_schema'
rename_dimension_attributes(st, name, attributes, new_names)
st |
A star_schema object. |
name |
A string, name of the dimension. |
attributes |
A vector of attribute names. |
new_names |
A vector of new attribute names. |
A star_schema object.
Other rename functions: get_dimension_attribute_names(), get_measure_names(), rename_dimension(), rename_fact(), rename_measures()
st <- st_mrs_age |>
  rename_dimension_attributes(
    name = "when",
    attributes = c("week", "year"),
    new_names = c("w", "y")
  )
Set new name for facts.
rename_fact(st, name)

## S3 method for class 'star_schema'
rename_fact(st, name)
st |
A star_schema object. |
name |
A string, new name of the fact. |
A star_schema object.
Other rename functions: get_dimension_attribute_names(), get_measure_names(), rename_dimension(), rename_dimension_attributes(), rename_measures()
st <- st_mrs_age |>
  rename_fact("age")
Set new names of some measures in facts.
rename_measures(st, measures, new_names)

## S3 method for class 'star_schema'
rename_measures(st, measures, new_names)
st |
A star_schema object. |
measures |
A vector of measure names. |
new_names |
A vector of new measure names. |
A star_schema object.
Other rename functions: get_dimension_attribute_names(), get_measure_names(), rename_dimension(), rename_dimension_attributes(), rename_fact()
st <- st_mrs_age |>
  rename_measures(measures = c("deaths"), new_names = c("n_deaths"))
star_schema object
Given a list of star_schema dimension names, all with the same structure, a role playing dimension with the indicated name and attributes is generated. The original dimensions become role dimensions defined from the new role playing dimension.
role_playing_dimension(st, dim_names, name = NULL, attributes = NULL)

## S3 method for class 'star_schema'
role_playing_dimension(st, dim_names, name = NULL, attributes = NULL)
st |
A star_schema object. |
dim_names |
A vector of dimension names. |
name |
A string, name of the role playing dimension. |
attributes |
A vector of attribute names of the role playing dimension. |
After definition, all role dimensions have the same virtual instances (those of the role playing dimension). The foreign keys in facts are adapted to this new situation.
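The following base R sketch illustrates the idea on two toy date dimensions: the role playing dimension holds the union of their instances, and each foreign key in facts is remapped to it. All names (`when_common`, `rekey`, the key columns) are hypothetical and the code is not the package's implementation.

```r
# Two toy date dimensions with the same structure (hypothetical data).
when <- data.frame(
  when_key = 1:2,
  date = c("1962-01-06", "1962-01-13"),
  stringsAsFactors = FALSE
)
when_available <- data.frame(
  when_available_key = 1:2,
  date = c("1962-01-13", "1962-01-20"),
  stringsAsFactors = FALSE
)
facts <- data.frame(
  when_key = c(1, 2),
  when_available_key = c(1, 2),
  n_deaths = c(5, 7)
)

# The role playing dimension holds the union of the instances of both roles.
when_common <- data.frame(
  date = sort(unique(c(when$date, when_available$date))),
  stringsAsFactors = FALSE
)
when_common$key <- seq_len(nrow(when_common))

# Remap a foreign key: old key -> date -> key in the common dimension.
rekey <- function(keys, dim_keys, dim_dates) {
  dates <- dim_dates[match(keys, dim_keys)]
  when_common$key[match(dates, when_common$date)]
}

facts$when_key <- rekey(facts$when_key, when$when_key, when$date)
facts$when_available_key <- rekey(
  facts$when_available_key,
  when_available$when_available_key,
  when_available$date
)
```

After this step both foreign keys refer to the same set of instances, so the same date has the same key regardless of the role it plays.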
A star_schema object.
Other star schema and constellation definition functions: character_dimensions(), constellation(), snake_case(), star_schema()
st <- star_schema(mrs_age, dm_mrs_age) |>
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("Date", "Week", "Year")
  )

st <- star_schema(mrs_cause, dm_mrs_cause) |>
  role_playing_dimension(
    dim_names = c("when", "when_received", "when_available"),
    name = "when_common",
    attributes = c("date", "week", "year")
  )
Once we have selected the facts, dimensions and defined the conditions on the instances, we can execute the query to obtain the result.
run_query(dq, unify_by_grain = TRUE)

## S3 method for class 'dimensional_query'
run_query(dq, unify_by_grain = TRUE)
dq |
A dimensional_query object. |
unify_by_grain |
A boolean, unify facts with the same grain. |
Optionally, we can indicate that facts with the same grain should not be unified.
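To make the notion of unifying by grain concrete, the base R sketch below takes two toy query results that share the same grain (one row per city) and shows both outcomes: merged into a single fact table, or kept separate. The data values are invented, and this is only a conceptual illustration rather than the package's code.

```r
# Two toy query results with the same grain: one row per city.
age <- data.frame(
  city = c("Boston", "Chicago"),
  n_deaths = c(30, 20),
  stringsAsFactors = FALSE
)
cause <- data.frame(
  city = c("Boston", "Chicago"),
  pneumonia_and_influenza_deaths = c(3, 2),
  other_deaths = c(27, 18),
  stringsAsFactors = FALSE
)

# Unified: a single fact table holding the measures of both.
unified <- merge(age, cause, by = "city")

# Not unified: the fact tables are kept separate.
separate <- list(mrs_age = age, mrs_cause = cause)
```

With the package, the second behaviour would be requested by ending the query with `run_query(unify_by_grain = FALSE)`.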
A multistar object.
Other query functions: dimensional_query(), filter_dimension(), select_dimension(), select_fact()
ms <- dimensional_query(ms_mrs) |>
  select_dimension(name = "where",
                   attributes = c("city", "state")) |>
  select_dimension(name = "when",
                   attributes = c("when_happened_year")) |>
  select_fact(
    name = "mrs_age",
    measures = c("n_deaths"),
    agg_functions = c("MAX")
  ) |>
  select_fact(
    name = "mrs_cause",
    measures = c("pneumonia_and_influenza_deaths", "other_deaths")
  ) |>
  filter_dimension(name = "when", when_happened_week <= "03") |>
  filter_dimension(name = "where", city == "Boston") |>
  run_query()
To add a dimension to a dimensional_query object, we have to define its name and a subset of the dimension attributes. If only the name of the dimension is indicated, all its attributes are added.
select_dimension(dq, name = NULL, attributes = NULL)

## S3 method for class 'dimensional_query'
select_dimension(dq, name = NULL, attributes = NULL)
dq |
A dimensional_query object. |
name |
A string, name of the dimension. |
attributes |
A vector of attribute names. |
A dimensional_query object.
Other query functions: dimensional_query(), filter_dimension(), run_query(), select_fact()
dq <- dimensional_query(ms_mrs) |>
  select_dimension(name = "where",
                   attributes = c("city", "state")) |>
  select_dimension(name = "when")
To define the fact to be queried, we indicate its name and, optionally, a vector of names of selected measures and another of aggregation functions.
select_fact(dq, name = NULL, measures = NULL, agg_functions = NULL)

## S3 method for class 'dimensional_query'
select_fact(dq, name = NULL, measures = NULL, agg_functions = NULL)
dq |
A dimensional_query object. |
name |
A string, name of the fact. |
measures |
A vector of measure names. |
agg_functions |
A vector of aggregation function names. If none is indicated, those defined in the fact table are considered. |
If no measure name is indicated, only the measure corresponding to the number of aggregated rows is included; that measure is always included.
If no aggregation function is indicated, those defined for the measures are used.
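As an illustration of what this selection implies when the query runs, the base R sketch below aggregates a toy measure with a chosen function and adds a count of the aggregated rows. The column name `nrow_agg` is a hypothetical placeholder, and the code is a conceptual sketch rather than the package's implementation.

```r
# Toy fact rows at a finer grain than the query (hypothetical data).
rows <- data.frame(
  city = c("Boston", "Boston", "Chicago"),
  n_deaths = c(10, 12, 8),
  stringsAsFactors = FALSE
)

# Aggregate the selected measure with the chosen function (here MAX) per city ...
result <- aggregate(n_deaths ~ city, data = rows, FUN = max)

# ... and always add the number of aggregated rows (hypothetical column name).
counts <- aggregate(n_deaths ~ city, data = rows, FUN = length)
result$nrow_agg <- counts$n_deaths
```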
A dimensional_query object.
Other query functions: dimensional_query(), filter_dimension(), run_query(), select_dimension()
dq <- dimensional_query(ms_mrs) |>
  select_fact(
    name = "mrs_age",
    measures = c("n_deaths"),
    agg_functions = c("MAX")
  )

dq <- dimensional_query(ms_mrs) |>
  select_fact(name = "mrs_age",
              measures = c("n_deaths"))

dq <- dimensional_query(ms_mrs) |>
  select_fact(name = "mrs_age")
Transform fact, dimension, measurement, and attribute names according to the snake case style.
snake_case(st)

## S3 method for class 'star_schema'
snake_case(st)
st |
A star_schema object. |
This style is suitable if we are going to work with databases.
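For readers unfamiliar with the style, here is a small base R approximation of a snake case conversion. The helper `to_snake_case()` is defined here only for illustration; the package may apply different or more complete rules.

```r
# Minimal snake case conversion in base R (an approximation for illustration).
to_snake_case <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # split camelCase boundaries
  x <- gsub("[^A-Za-z0-9]+", "_", x)            # spaces and punctuation to "_"
  x <- gsub("^_+|_+$", "", x)                   # trim leading and trailing "_"
  tolower(x)
}

converted <- to_snake_case(c("When Common", "Data Availability Week", "nDeaths"))
```

Names without spaces or capital letters map directly to valid column and table identifiers in most database systems, which is why the style is convenient there.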
A star_schema object.
Other star schema and constellation definition functions: character_dimensions(), constellation(), role_playing_dimension(), star_schema()
st <- star_schema(mrs_age, dm_mrs_age) |>
  snake_case()

st <- star_schema(mrs_age, dm_mrs_age) |>
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("Date", "Week", "Year")
  ) |>
  snake_case()
Star Schema for the Mortality Reporting System considering the age classification.
st_mrs_age
A star_schema object.
# Defined by:
st_mrs_age <- star_schema(mrs_age, dm_mrs_age) |>
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("date", "week", "year")
  ) |>
  snake_case() |>
  character_dimensions(NA_replacement_value = "Unknown",
                       length_integers = list(week = 2))
Star Schema for the Mortality Reporting System considering the age classification data test.
st_mrs_age_test
A star_schema object.
# Defined by:
st_mrs_age_test <- star_schema(mrs_age_test, dm_mrs_age) |>
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("date", "week", "year")
  ) |>
  snake_case() |>
  character_dimensions(NA_replacement_value = "Unknown",
                       length_integers = list(week = 2))
Star Schema for the Mortality Reporting System considering the age classification data test, for week 4 of 1962. It also includes some isolated data from previous weeks that is supposed to be corrections for data errors.
st_mrs_age_w_test
A star_schema object.
# Defined by:
st_mrs_age_w_test <- star_schema(mrs_age_w_test, dm_mrs_age) |>
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("date", "week", "year")
  ) |>
  snake_case() |>
  character_dimensions(NA_replacement_value = "Unknown",
                       length_integers = list(week = 2))
Star Schema for the Mortality Reporting System considering the age classification data, for week 10 of 1962. It also includes some isolated data from previous weeks that is supposed to be corrections for data errors.
st_mrs_age_w10
A star_schema object.
# Defined by:
st_mrs_age_w10 <- star_schema(mrs_age_w10, dm_mrs_age) |>
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("date", "week", "year")
  ) |>
  snake_case() |>
  character_dimensions(NA_replacement_value = "Unknown",
                       length_integers = list(week = 2))
Star Schema for the Mortality Reporting System considering the age classification data, for week 11 of 1962. It also includes some isolated data from previous weeks that is supposed to be corrections for data errors.
st_mrs_age_w11
A star_schema object.
# Defined by:
st_mrs_age_w11 <- star_schema(mrs_age_w11, dm_mrs_age) |>
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("date", "week", "year")
  ) |>
  snake_case() |>
  character_dimensions(NA_replacement_value = "Unknown",
                       length_integers = list(week = 2))
Star Schema for the Mortality Reporting System considering the cause classification.
st_mrs_cause
A star_schema object.
# Defined by:
st_mrs_cause <- star_schema(mrs_cause, dm_mrs_cause) |>
  snake_case() |>
  character_dimensions(
    NA_replacement_value = "Unknown",
    length_integers = list(
      week = 2,
      data_availability_week = 2,
      reception_week = 2
    )
  ) |>
  role_playing_dimension(
    dim_names = c("when", "when_received", "when_available"),
    name = "when_common",
    attributes = c("date", "week", "year")
  )
Star Schema for the Mortality Reporting System considering the cause classification data test.
st_mrs_cause_test
A star_schema object.
# Defined by:
st_mrs_cause_test <- star_schema(mrs_cause_test, dm_mrs_cause) |>
  snake_case() |>
  character_dimensions(
    NA_replacement_value = "Unknown",
    length_integers = list(
      week = 2,
      data_availability_week = 2,
      reception_week = 2
    )
  ) |>
  role_playing_dimension(
    dim_names = c("when", "when_received", "when_available"),
    name = "when_common",
    attributes = c("date", "week", "year")
  )
Star Schema for the Mortality Reporting System considering the cause classification data test, for week 4 of 1962. It also includes some isolated data from previous weeks that is supposed to be additional data not considered before.
st_mrs_cause_w_test
A star_schema object.
# Defined by:
st_mrs_cause_w_test <- star_schema(mrs_cause_w_test, dm_mrs_cause) |>
  snake_case() |>
  character_dimensions(
    NA_replacement_value = "Unknown",
    length_integers = list(
      week = 2,
      data_availability_week = 2,
      reception_week = 2
    )
  ) |>
  role_playing_dimension(
    dim_names = c("when", "when_received", "when_available"),
    name = "when_common",
    attributes = c("date", "week", "year")
  )
Star Schema for the Mortality Reporting System considering the cause classification data, for week 10 of 1962. It also includes some isolated data from previous weeks that is supposed to be additional data not considered before.
st_mrs_cause_w10
A star_schema object.
# Defined by:
st_mrs_cause_w10 <- star_schema(mrs_cause_w10, dm_mrs_cause) |>
  snake_case() |>
  character_dimensions(
    NA_replacement_value = "Unknown",
    length_integers = list(
      week = 2,
      data_availability_week = 2,
      reception_week = 2
    )
  ) |>
  role_playing_dimension(
    dim_names = c("when", "when_received", "when_available"),
    name = "when_common",
    attributes = c("date", "week", "year")
  )
Star Schema for the Mortality Reporting System considering the cause classification data, for week 11 of 1962. It also includes some isolated data from previous weeks that is supposed to be additional data not considered before.
st_mrs_cause_w11
A star_schema object.
# Defined by:
st_mrs_cause_w11 <- star_schema(mrs_cause_w11, dm_mrs_cause) |>
  snake_case() |>
  character_dimensions(
    NA_replacement_value = "Unknown",
    length_integers = list(
      week = 2,
      data_availability_week = 2,
      reception_week = 2
    )
  ) |>
  role_playing_dimension(
    dim_names = c("when", "when_received", "when_available"),
    name = "when_common",
    attributes = c("date", "week", "year")
  )
star_schema S3 class
Creates a star_schema object from a flat table (implemented by a tibble) and a dimensional_model object.
star_schema(ft, sd)
ft |
A tibble (flat table). |
sd |
A dimensional_model object. |
Transforms the flat table data according to the facts and dimension definitions of the dimensional_model object. Each dimension is generated with a surrogate key which is a foreign key in facts.
Facts only contain measurements and foreign keys.
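The transformation can be pictured with a small base R sketch: each dimension is the set of distinct attribute values plus a generated surrogate key, and the fact table keeps only those keys and the measures. Table and column names are hypothetical, and this is a conceptual illustration rather than the package's code.

```r
# Toy flat table (hypothetical columns).
ft <- data.frame(
  city = c("Boston", "Boston", "Chicago"),
  week = c("01", "02", "01"),
  n_deaths = c(10, 12, 8),
  stringsAsFactors = FALSE
)

# Each dimension: distinct attribute values plus a surrogate key.
where <- unique(ft["city"])
where$where_key <- seq_len(nrow(where))

when <- unique(ft["week"])
when$when_key <- seq_len(nrow(when))

# Facts: replace the dimension attributes by foreign keys, keep the measure.
facts <- merge(merge(ft, where, by = "city"), when, by = "week")
facts <- facts[, c("where_key", "when_key", "n_deaths")]
```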
A star_schema object.
Other star schema and constellation definition functions: character_dimensions(), constellation(), role_playing_dimension(), snake_case()
st <- star_schema(mrs_age, dm_mrs_age)
Once we have refined the format or content of facts and dimensions, we can again obtain a flat table, implemented using a tibble, from a star schema.
star_schema_as_flat_table(st)

## S3 method for class 'star_schema'
star_schema_as_flat_table(st)
st |
A star_schema object. |
A tibble.
Other results export functions: constellation_as_multistar(), constellation_as_tibble_list(), multistar_as_flat_table(), star_schema_as_multistar(), star_schema_as_tibble_list()
ft <- st_mrs_age |> star_schema_as_flat_table()
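Conceptually, going back to a flat table means joining the facts with each dimension on its surrogate key and then dropping the keys. A minimal base R sketch under that assumption follows; the where and facts data frames and the where_key column are hypothetical, and this is not how the package itself is implemented.

```r
# Hypothetical dimension with a surrogate key.
where <- data.frame(
  where_key = 1:3,
  state = c("CT", "CT", "MA"),
  city  = c("Bridgeport", "Hartford", "Boston"),
  stringsAsFactors = FALSE
)

# Hypothetical facts: foreign key plus one measurement.
facts <- data.frame(
  where_key = c(1L, 2L, 3L, 1L),
  deaths    = c(46, 47, 262, 43)
)

# Join facts to the dimension on the surrogate key, then drop the key.
flat <- merge(facts, where, by = "where_key")
flat <- flat[, c("state", "city", "deaths")]
```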
Export a star schema as a multistar
Once we have refined the format or content of facts and dimensions, we can obtain a multistar. A multistar only distinguishes between general and conformed dimensions; each dimension has its own data. It can contain multiple fact tables.
star_schema_as_multistar(st)

## S3 method for class 'star_schema'
star_schema_as_multistar(st)
st |
A star_schema object. |
A multistar object.
Other results export functions: constellation_as_multistar(), constellation_as_tibble_list(), multistar_as_flat_table(), star_schema_as_flat_table(), star_schema_as_tibble_list()
ms <- st_mrs_age |> star_schema_as_multistar()
Export a star schema as a tibble list
Once we have refined the format or content of facts and dimensions, we can obtain a tibble list with them. Role playing dimensions can be optionally included.
star_schema_as_tibble_list(st, include_role_playing = FALSE)

## S3 method for class 'star_schema'
star_schema_as_tibble_list(st, include_role_playing = FALSE)
st |
A star_schema object. |
include_role_playing |
A boolean. |
A list of tibble objects.
Other results export functions: constellation_as_multistar(), constellation_as_tibble_list(), multistar_as_flat_table(), star_schema_as_flat_table(), star_schema_as_multistar()
tl <- st_mrs_age |>
  star_schema_as_tibble_list()

tl <- st_mrs_age |>
  star_schema_as_tibble_list(include_role_playing = TRUE)
Transformations that allow obtaining star schemas from flat tables.
Star schemas can be defined from flat tables, and several star schemas can form a constellation (star schema and constellation definition functions). Dimensions contain data without duplicates, and data cleaning operations can be applied to them (data cleaning functions). Dimensions can be enriched with additional columns, sometimes computed by functions and sometimes explicitly defined by the user (dimension enrichment functions). When new data is obtained, the existing data must be refreshed with it by means of incremental refresh operations, or data that is no longer necessary must be deleted (incremental refresh functions). Finally, the results can be exported to be consulted with other tools (results export functions) or queried through the defined query functions (query functions).
Starting from a flat table, a dimensional model is defined by specifying the attributes that make up each of the dimensions and the measurements in the facts. The result is a dimensional_model object. This is carried out through the dimensional model definition functions.
A star schema is defined from a flat table and a dimensional model definition. Once defined, a star schema can be transformed by defining role playing dimensions, changing the writing style of element names or the type of dimension attributes. These operations are carried out through the star schema definition and transformation functions.
Once a star schema is defined, we can rename its elements. Renaming attributes of dimensions and measures of facts is necessary because the definition operations only allow us to select columns of a flat table. For completeness, dimensions and facts can also be renamed. These operations are carried out through the star schema rename functions.
Based on various star schemas, a constellation can be defined in which star schemas share common dimensions. Dimensions with the same name must be shared. It is defined by the constellation definition function.
Once the star schemas and constellations are defined, data cleaning operations can be carried out on dimensions. There are three groups of functions: one to obtain dimensions of star schemas and constellations; another to define data cleaning operations over dimensions; and one more to apply operations to star schemas or constellations.
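Obtaining dimensions: get_dimension_names(), get_dimension(), get_conformed_dimension_names(), get_conformed_dimension()
Update definition functions: record_update_set(), match_records(), update_record(), update_selection(), update_selection_general()
Modification application functions: modify_dimension_records(), modify_conformed_dimension_records()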
To enrich a dimension with new attributes related to others already included in it, first, we export the attributes on which the new ones depend, then we define the new attributes, and import the table with all the attributes to be added to the dimension.
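The export, define and import cycle described above can be sketched in base R. This is only an illustration on a hypothetical dimension, not the package's enrichment functions; the state_name column and its values are assumptions made for the example.

```r
# Hypothetical dimension to enrich.
where <- data.frame(
  where_key = 1:3,
  state = c("CT", "CT", "MA"),
  city  = c("Bridgeport", "Hartford", "Boston"),
  stringsAsFactors = FALSE
)

# 1. Export the distinct values of the attribute the new one depends on.
export <- unique(where["state"])

# 2. Define the new attribute for each exported value (assumed lookup).
lookup <- c(CT = "Connecticut", MA = "Massachusetts")
export$state_name <- unname(lookup[export$state])

# 3. Import: add the new column to the dimension by matching on state.
where$state_name <- export$state_name[match(where$state, export$state)]
```

Working on the exported distinct values keeps the manual step small: each new value is defined once, however many dimension records share the attribute it depends on.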
When new data is obtained, an incremental refresh of the data can be carried out, both of the dimensions and of the facts. Incremental refresh can be applied to both star schemas and constellations through the incremental refresh functions.
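As a rough illustration of what refreshing both dimensions and facts involves, the base R sketch below appends unseen dimension values with new surrogate keys and then appends the new fact rows. All data and names are hypothetical, and the sketch simply appends facts; how the package treats new facts that match existing ones is not described here, so that part is an assumption.

```r
# Hypothetical existing dimension and facts.
where <- data.frame(
  where_key = 1:2,
  city = c("Bridgeport", "Hartford"),
  stringsAsFactors = FALSE
)
facts <- data.frame(where_key = c(1L, 2L), deaths = c(46, 47))

# New flat data: one known city and one not yet in the dimension.
new_ft <- data.frame(
  city = c("Hartford", "Boston"),
  deaths = c(50, 262),
  stringsAsFactors = FALSE
)

# Dimension refresh: append unseen values with the next surrogate keys.
unseen <- setdiff(new_ft$city, where$city)
where <- rbind(where, data.frame(
  where_key = max(where$where_key) + seq_along(unseen),
  city = unseen,
  stringsAsFactors = FALSE
))

# Fact refresh: translate attributes to keys and append the new rows.
new_facts <- data.frame(
  where_key = where$where_key[match(new_ft$city, where$city)],
  deaths = new_ft$deaths
)
facts <- rbind(facts, new_facts)
```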
Sometimes the data refresh consists of eliminating data that is no longer necessary, generally because it corresponds to a period that is no longer analysed, although there can be other reasons. A dedicated function is available to select this data.
Once the fact data is removed (using the other incremental refresh functions), the data of the dimensions that is no longer needed can also be removed.
Once the data has been properly structured and transformed, it can be exported to be consulted with other tools or with R. Various export formats have been defined, both for star schemas and for constellations, through the results export functions.
There are many multidimensional query tools available. The exported data, once stored in files, can be used directly from them. Basic queries on data in the multistar format can also be performed from R through the query functions.
For a dimension, given the primary key of one record, adds an update to the set of updates that changes the values of the remaining attributes of that record to the given values.
update_record(updates = NULL, dimension, old, values = vector())

## S3 method for class 'record_update_set'
update_record(updates = NULL, dimension, old, values = vector())
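updates |
A record_update_set object. |
dimension |
A dimension table (as obtained with get_dimension()), the dimension to update. |
old |
A number, primary key of the record to modify. |
values |
A vector of character values. |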
The primary key is only used to obtain the record's current combination of values easily. The update itself is defined exclusively from the rest of the values.
A record_update_set object.
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_selection(), update_selection_general()
dim_names <- st_mrs_age |>
  get_dimension_names()

where <- st_mrs_age |>
  get_dimension("where")

# head(where, 2)

updates <- record_update_set() |>
  update_record(
    dimension = where,
    old = 1,
    values = c("1", "CT", "Bridgeport")
  )
For a dimension, given a vector of column names, a vector of old values and a vector of new values, adds an update to the set of updates that replaces, in all the records that have the combination of old values in those columns, the old values with the new values in the same columns.
update_selection(
  updates = NULL,
  dimension,
  columns = vector(),
  old_values = vector(),
  new_values = vector()
)

## S3 method for class 'record_update_set'
update_selection(
  updates = NULL,
  dimension,
  columns = vector(),
  old_values = vector(),
  new_values = vector()
)
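updates |
A record_update_set object. |
dimension |
A dimension table (as obtained with get_dimension()), the dimension to update. |
columns |
A vector of column names. |
old_values |
A vector of character values. |
new_values |
A vector of character values. |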
A record_update_set object.
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection_general()
dim_names <- st_mrs_age |>
  get_dimension_names()

where <- st_mrs_age |>
  get_dimension("where")

# head(where, 2)

updates <- record_update_set() |>
  update_selection(
    dimension = where,
    columns = c("city"),
    old_values = c("Bridgepor"),
    new_values = c("Bridgeport")
  )
For a dimension, given a vector of column names, a vector of old values for those columns, another vector of column names, and a vector of new values for those columns, adds an update to the set of updates that modifies all the records that have the combination of old values in the first set of columns, writing the new values into the second set of columns.
update_selection_general(
  updates = NULL,
  dimension,
  columns_old = vector(),
  old_values = vector(),
  columns_new = vector(),
  new_values = vector()
)

## S3 method for class 'record_update_set'
update_selection_general(
  updates = NULL,
  dimension,
  columns_old = vector(),
  old_values = vector(),
  columns_new = vector(),
  new_values = vector()
)
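updates |
A record_update_set object. |
dimension |
A dimension table (as obtained with get_dimension()), the dimension to update. |
columns_old |
A vector of column names. |
old_values |
A vector of character values. |
columns_new |
A vector of column names. |
new_values |
A vector of character values. |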
A record_update_set object.
Other data cleaning functions: get_conformed_dimension(), get_conformed_dimension_names(), get_dimension(), get_dimension_names(), match_records(), modify_conformed_dimension_records(), modify_dimension_records(), record_update_set(), update_record(), update_selection()
dim_names <- st_mrs_age |>
  get_dimension_names()

where <- st_mrs_age |>
  get_dimension("where")

# head(where, 2)

updates <- record_update_set() |>
  update_selection_general(
    dimension = where,
    columns_old = c("state", "city"),
    old_values = c("CT", "Bridgepor"),
    columns_new = c("city"),
    new_values = c("Bridgeport")
  )
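The effect that such an update is meant to have once applied can be sketched on a plain data frame. The apply_selection() helper below is hypothetical and not part of the package: it modifies the data immediately, whereas the package functions only add the update to a record_update_set that is applied later by the modification application functions.

```r
# Hypothetical dimension containing a misspelled city.
where <- data.frame(
  where_key = 1:3,
  state = c("CT", "CT", "MA"),
  city  = c("Bridgepor", "Hartford", "Boston"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows whose columns_old equal old_values get
# new_values written into columns_new.
apply_selection <- function(dimension, columns_old, old_values,
                            columns_new, new_values) {
  hit <- rep(TRUE, nrow(dimension))
  for (i in seq_along(columns_old)) {
    hit <- hit & dimension[[columns_old[i]]] == old_values[i]
  }
  for (j in seq_along(columns_new)) {
    dimension[hit, columns_new[j]] <- new_values[j]
  }
  dimension
}

fixed <- apply_selection(
  where,
  columns_old = c("state", "city"),
  old_values  = c("CT", "Bridgepor"),
  columns_new = c("city"),
  new_values  = c("Bridgeport")
)
```

Selecting on state and city but writing only city shows why the two column vectors are separate: the selection can be more specific than the change.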
Example of updates on some dimensions of the star schema for Mortality Reporting System by age.
updates_st_mrs_age
A record_update_set object.
# Defined by:
(dim_names <- st_mrs_age |>
  get_dimension_names())

where <- st_mrs_age |>
  get_dimension("where")
when <- st_mrs_age |>
  get_dimension("when")
who <- st_mrs_age |>
  get_dimension("who")

updates_st_mrs_age <- record_update_set() |>
  update_selection_general(
    dimension = where,
    columns_old = c("state", "city"),
    old_values = c("CT", "Bridgepor"),
    columns_new = c("city"),
    new_values = c("Bridgeport")
  ) |>
  match_records(dimension = when, old = 37, new = 36) |>
  update_record(
    dimension = when,
    old = 73,
    values = c("1962-02-17", "07", "1962")
  ) |>
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("<1 year"),
    new_values = c("1: <1 year")
  ) |>
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("1-24 years"),
    new_values = c("2: 1-24 years")
  ) |>
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("25-44 years"),
    new_values = c("3: 25-44 years")
  ) |>
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("45-64 years"),
    new_values = c("4: 45-64 years")
  ) |>
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("65+ years"),
    new_values = c("5: 65+ years")
  )
Example of updates on some dimensions of the star schema for Mortality Reporting System by age test.
updates_st_mrs_age_test
A record_update_set object.
# Defined by:
(dim_names <- st_mrs_age_test |>
  get_dimension_names())

where <- st_mrs_age_test |>
  get_dimension("where")
when <- st_mrs_age_test |>
  get_dimension("when")
who <- st_mrs_age_test |>
  get_dimension("who")

updates_st_mrs_age_test <- record_update_set() |>
  update_selection_general(
    dimension = where,
    columns_old = c("state", "city"),
    old_values = c("CT", "Bridgepor"),
    columns_new = c("city"),
    new_values = c("Bridgeport")
  ) |>
  match_records(dimension = when, old = 4, new = 3) |>
  update_record(
    dimension = when,
    old = 9,
    values = c("1962-01-20", "03", "1962")
  ) |>
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("<1 year"),
    new_values = c("1: <1 year")
  ) |>
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("1-24 years"),
    new_values = c("2: 1-24 years")
  ) |>
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("25-44 years"),
    new_values = c("3: 25-44 years")
  ) |>
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("45-64 years"),
    new_values = c("4: 45-64 years")
  ) |>
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("65+ years"),
    new_values = c("5: 65+ years")
  )