--- title: "Neat Data for Presentation" output: rmarkdown::html_vignette author: Shiva vignette: > %\VignetteIndexEntry{Neat Data for Presentation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} require(neatR) ``` We use R extensively not just for intensive computation, but also for presentation. Javascript visualization libraries in R and elegant ways to present data using R markdown makes R one stop shop for analytics and data science. We spend most of the time preparing, cleaning, analyzing and modeling the data. However, the last leg of analytics, which is presentation of results don't get enough attention most of the times. neatR package helps in formatting the results by providing simple utility functions covering common use cases. ### Formatting dates Often, we encounter dates which are either in `mm/dd/yyyy` or `dd/mm/yyyy` format and wondering what is the month or what is the date especially if there are no date values after 12th day of a month. An unambiguous approach would be to show the date in `mmm dd, yyyy` format with day of week which is easier to grasp. ```{r echo = TRUE} ndate(Sys.Date() - 3) ndate(Sys.Date() - 1) ndate(Sys.Date()) ndate(Sys.Date() + 1) ndate(Sys.Date() + 4) ``` To just get the date without the day of week, set `display.weekday` to `FALSE` ```{r echo = TRUE} ndate(Sys.Date(), display.weekday = FALSE) ``` When we are looking at the monthly data, abbreviating the date to mmm'yy is an elegant way to show the date and often helpful for charts. ```{r echo = TRUE} ndate(Sys.Date(), display.weekday = FALSE, is.month = TRUE) ``` To see the context of the date with respect to current date (referring dates within 1 week before or after current date), use the `nday` function. Day of week with context based on current date, `reference.alias` can be directly used on dates or timestamps. ```{r echo = TRUE} nday(Sys.Date(), reference.alias = FALSE) nday(Sys.Date(), reference.alias = TRUE) nday(Sys.time(), reference.alias = TRUE) ``` Below is another example with context based on current date. ```{r echo = TRUE} x <- seq(Sys.Date() - 10, Sys.Date() + 10, by = '1 day') nday(x, reference.alias = TRUE) ``` ### Formatting timestamp Timestamps are feature rich representation of date and time. ```{r echo = TRUE} ntimestamp(Sys.time()) ``` To format only date from the timestamp, we can use `ndate` function. ```{r echo = TRUE} ndate(Sys.time()) ``` To extract and format only the time from timestamp, we can do the following, ```{r echo = TRUE} ntimestamp(Sys.time(), display.weekday = FALSE, include.date = FALSE, include.timezone = FALSE) ``` Note: Hours are shown based on 12H clock format with AM / PM suffix. Components of time can be toggled on or off based on preference. ```{r echo = TRUE} ntimestamp(Sys.time(), include.date = FALSE, display.weekday = FALSE, include.hours = TRUE, include.minutes = TRUE, include.seconds = FALSE, include.timezone = FALSE) ``` Timezone can be toggled on or off using `include.timezone` parameter. ```{r echo = TRUE} ntimestamp(Sys.time(), include.timezone = FALSE) ``` ### Formatting number Most of the times, we deal with large numbers which are shown in scientific format from the output of a statistical model or just the raw data itself. `nnumber` can format the numeric data and show them in easily readable way. By default, the numbers are formatted in a more appropriate unit that best represents individual values. See the below example, ```{r echo = TRUE} x <- c(10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000) nnumber(x) nnumber(x, digits = 0) ``` `nnumber` can automatically determine best single unit to display all the numbers by setting `unit = 'auto'`. In the below example the unit of thousand seem to best fit most of the numbers. Any number lower than 0.1K are displayed as '<0.1K' for easier reference. ```{r echo = TRUE} x <- c(1e6, 99e3, 76e3, 42e3, 12e3, 789, 53) nnumber(x, unit = 'auto') ``` We can specify the units in which the number to be formatted, ```{r echo = TRUE} nnumber(123456789.123456, unit = 'Mn') ``` Default units are, 'K' for thousand, 'Mn' for million, 'Bn' for billion, 'Tn' for trillions. The unit labeling can be customized using `unit.labels` which is a list encompassing values and labels. ```{r echo = TRUE} nnumber(123456789.123456, unit = 'M', unit.labels = list(million = 'M')) ``` Below example, gives customization of all units. ```{r echo = TRUE} x <- c(10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000) nnumber(x, unit.labels = list(thousand = 'K', million = 'M', billion= 'B', trillion = T)) ``` Along with the formatted number, we can (optionally) add a prefix or suffix. ```{r echo = TRUE} nnumber(123456789.123456, unit = 'M', unit.labels = list(million = 'M'), prefix = '$ ') nnumber(123456789.123456, unit = 'M', unit.labels = list(million = 'M'), suffix = ' CAD') nnumber(123456789.123456, unit = 'M', unit.labels = list(million = 'M'), prefix = '$ ', suffix = ' CAD') ``` Sometimes, we are interested in showing the number as it is, which can be done by setting `unit = ''`. `thousand.separator` parameter is useful in separating the thousands which makes it easy to read the numbers. ```{r echo = TRUE} nnumber(123456789.123456, digits = 2, unit = '', thousand.separator = ',') ``` `thousand.separator` can take the following values `",", ".", "'", " ", "_", ""` The parameter `unit` can take any of the following values, `custom`: Unit is customized for each individual values. This is the default value to the `unit` parameter. `auto`: A single unit that best represents the overall data is automatically detected and applied based on majority of the values. `K`: The numbers are displayed in thousands. `Mn`: The number are displayed in millions. `Bn`: The number are displayed in billions. `Tn`: The number are displayed in trillions. If the unit labels are customized and provided via a list, for an example: `unit.labels = list(thousand = 'k')` then this string `k` to be provided for the `unit`. ### Formatting percentages Percentage data can come in two types, with or without multiplied by 100. For an example, 22.8% can be stored as 22.8 or 0.228 ```{r echo = TRUE} npercent(22.8, is.decimal = FALSE) npercent(0.228, is.decimal = TRUE) ``` By default, `is.decimal` is set as TRUE and decimal digits is set to 1. It is also useful to show if the percent is a positive number by adding a prefix of plus sign. This is the default behavior of the npercent function, which can be set to FALSE ```{r echo = TRUE} npercent(0.228, plus.sign = TRUE) npercent(0.228, plus.sign = FALSE) ``` When the percentages are high (especially while calculating growth from time A to time B), it would be easy to read this as 'nX'. ```{r echo = TRUE} tesla_2017 <- 20 tesla_2023 <- 200 g <- (tesla_2023 - tesla_2017)/tesla_2017 npercent(g, plus.sign = TRUE) npercent(g, plus.sign = TRUE, factor.out = TRUE) ``` ### Formatting string Formatting character vectors or string can be done with case type, options to remove special characters and selecting only english characters and numbers from the string. Below are the available `case` conversions, `lower`: converts string to lower case. `upper`: converts string to upper case. `title`: converts string to title case (first letter of each word is capitalized except stop words. Based on `tools::toTitleCase`). `start`: converts string to start case (first letter of each word is capitalized and rest of the letters are in lower case). `initcap`: converts string to initcap case (first letter of first word is capitalized and rest of the letters are in lower case). ```{r echo = TRUE} nstring(' All MOdels are wrong. some ARE useful!!! â', case = 'title', remove.specials = TRUE) ``` To exclude any special characters and retain only numbers and english alphabets, we can set `en.only` parameter to `TRUE` ```{r echo = TRUE} nstring(' All MOdels are wrong. some ARE useful!!! â ', case = 'title', remove.specials = TRUE, en.only = TRUE) ``` By default, Trailing and leading white spaces are removed and extra white spaces are reduced to single white space.