R/transformation.R
data-normalization.Rd
dc_cosine
is the cosine transformation.
dc_logistic
is the logistic transformation.
dc_zscore
is the z-score transformation.
dc_dist_canberra
computes the Canberra distance between 2 numeric vectors.
dc_dist_cosine
computes the cosine angle distance between 2 numeric vectors.
dc_dist_euclidean
computes the Euclidean distance between 2 numeric vectors.
dc_dist_pearson
computes the Pearson correlation distance between 2 numeric vectors.
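The four distances listed above can be sketched in base R as follows. This is only an illustration under the usual textbook definitions, not dacol's actual source: the helper names (`canberra_dist`, `cosine_dist`, `euclidean_dist`, `pearson_dist`) are hypothetical, each returns a single number, and the package may scale or orient its results differently (for example, returning the angle itself rather than one minus the cosine).

```r
# Hypothetical base-R helpers; textbook definitions assumed, not dacol's code.

# Canberra: sum of |x - y| / (|x| + |y|), treating 0/0 terms as 0.
canberra_dist <- function(x, y) {
  denom <- abs(x) + abs(y)
  terms <- ifelse(denom == 0, 0, abs(x - y) / denom)
  sum(terms)
}

# Cosine distance, assumed here to be 1 minus the cosine of the angle.
cosine_dist <- function(x, y) {
  1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Euclidean: square root of the summed squared differences.
euclidean_dist <- function(x, y) sqrt(sum((x - y)^2))

# Pearson correlation distance, assumed here to be 1 minus the correlation.
pearson_dist <- function(x, y) 1 - stats::cor(x, y, method = "pearson")

x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)

euclidean_dist(x, y)  # sqrt(30)
canberra_dist(x, y)   # 4/3, since every term is 1/3
cosine_dist(x, y)     # 0 up to floating-point error (y is a multiple of x)
pearson_dist(x, y)    # 0 up to floating-point error
```

Because `y` is an exact multiple of `x`, the cosine and Pearson distances vanish while the Euclidean and Canberra distances do not, which shows that the first two measure direction and the last two measure magnitude.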
dc_cosine(x, max = 100)
dc_logistic(x, max = 100)
dc_zscore(x)
dc_dist_canberra(x, y)
dc_dist_cosine(x, y)
dc_dist_euclidean(x, y)
dc_dist_pearson(x, y)
dc_trim_outlier(x, fraction = 0.01)
dc_normalize_ptile(x, fraction = 0.01)
get_confidence_interval(x, level = 0.95)
dc_decile_band(x, n = NA)
dc_decile_ptile(x, band_ptile = c(seq(0, 0.95, 0.05)))
dc_rank_ptile(x, level_rank = c(1, 2, 3, 4, seq(5, 100, 5)))
dc_mode(x, na.rm = FALSE)
dc_ceiling(x, digits = 0, na.rm = FALSE)
Argument | Description |
---|---|
x | A numeric vector |
max | A numeric value |
y | A numeric vector |
fraction | The percentile value (0 to 0.5) to trim out |
level | The CI level (0.5 to 1.0) of observations to be measured |
band_ptile | The percentile band (0.0 to 1.0) |
level_rank | The rank level (0.0 to 1.0) for calculating percentile |
na.rm | A logical value indicating whether NA values should be stripped before the computation proceeds |
digits | As in base::round(): an integer indicating the number of decimal places (round) or significant digits (signif) to be used. Negative values are allowed |
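To make the `fraction` argument concrete, here is a minimal sketch of percentile-based outlier handling. It assumes a clamping (winsorizing) behaviour, where values below the `fraction` quantile or above the `1 - fraction` quantile are pulled back to those bounds. The page does not say whether `dc_trim_outlier` clamps, drops, or sets such values to NA, so the hypothetical `clamp_ptile` below illustrates the idea only.

```r
# Hypothetical helper; clamping is an assumption, not dacol's documented behaviour.
clamp_ptile <- function(x, fraction = 0.01) {
  stopifnot(fraction >= 0, fraction <= 0.5)
  # Lower and upper cut-offs taken from the empirical quantiles of x.
  bounds <- stats::quantile(x, probs = c(fraction, 1 - fraction),
                            na.rm = TRUE, names = FALSE)
  # Pull anything outside [lower, upper] back to the nearest bound.
  pmin(pmax(x, bounds[1]), bounds[2])
}

x <- c(1:99, 1000)         # one extreme value at the top
y <- clamp_ptile(x, 0.05)  # same length as x, extremes clamped
range(y)
```

The output keeps the same length as the input, which is what makes this style of function usable inside `mutate()`.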
Returns a numeric vector after normalization, or the distance between 2 vectors.
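As an example of the normalization case, a z-score transformation returns a vector of the same length as its input, centred at 0 with unit spread. The sketch below uses the sample standard deviation from `stats::sd()`. Whether `dc_zscore` uses the sample or the population form is not stated here, so treat the hypothetical `zscore` helper as an assumption.

```r
# Hypothetical helper; the sample standard deviation (n - 1) is an assumption.
zscore <- function(x) (x - mean(x)) / stats::sd(x)

v <- c(2, 4, 4, 4, 5, 5, 7, 9)
z <- zscore(v)

length(z)  # same length as the input
mean(z)    # 0 up to floating-point error
sd(z)      # 1 up to floating-point error
```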
dc_ceiling
is similar to base::ceiling(), with support for rounding up at a given decimal place.
dc_mode
computes the statistical mode.
dc_rank_ptile
adds columns with ranked percentiles.
dc_decile_band
adds columns with decile bands.
dc_decile_ptile
adds columns with decile percentiles.
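The mode and decimal-aware ceiling described above can be approximated in base R. The helpers `stat_mode` and `ceiling_digits` are hypothetical names, written from the descriptions rather than from dacol's source. In particular, breaking ties by first occurrence and handling `digits` by scaling by a power of ten are assumptions.

```r
# Hypothetical helpers; tie-breaking and digit handling are assumptions.

# Most frequent value; ties go to the value that appears first in x.
stat_mode <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Ceiling at a given decimal place; negative digits round up to tens, hundreds, ...
ceiling_digits <- function(x, digits = 0) {
  scale <- 10^digits
  ceiling(x * scale) / scale
}

stat_mode(c(1, 2, 2, 3, 3, 3))  # 3
ceiling_digits(123, -1)         # 130
ceiling_digits(1.21, 1)         # 1.3
ceiling_digits(4.2)             # 5, same as base::ceiling()
```

With `digits = -1`, as in the `dc_ceiling(x3, -1)` call in the example, values are rounded up to the next multiple of ten.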
library(dacol)
library(dplyr)

max = 30
dta1 = tibble(
  x1 = seq(-1.2*max, 1.2*max, length.out = 200),
  x2 = seq(1, max, length.out = 200),
  x3 = sample(200)
)

dta1 = mutate(dta1,
  # Transformation
  y_cosine = dc_cosine(x1, max),
  y_logistic = dc_logistic(x2, max),
  y_zcore = dc_zscore(x2),
  # Distance between two vector columns
  y_dist_canb = dc_dist_canberra(x2, x3),
  y_dist_cos = dc_dist_cosine(x2, y_zcore),
  y_dist_euc = dc_dist_euclidean(x2, y_zcore),
  y_dist_pear = dc_dist_pearson(x2, y_zcore),
  # Manage outliers
  y_trim = dc_trim_outlier(x3, 0.01),
  y_norm = dc_normalize_ptile(x3, 0.01),
  # Stats measures
  y_mode = dc_mode(x3),
  y_ceil = dc_ceiling(x3, -1),
  # Band segmentation
  y_dec_band1 = dc_decile_band(x3),
  y_dec_band2 = dc_decile_band(x3, c(seq(0, 0.9, 0.1))),
  y_dec_ptile1 = dc_decile_ptile(x3),
  y_dec_ptile2 = dc_decile_ptile(x3, c(seq(0, 0.9, 0.1)))
)
#> Warning: the condition has length > 1 and only the first element will be used