In a previous post, I described how I was captivated by the virtual landscape imagined by the RStudio education team while looking for resources on the RStudio website. In this post, I’ll take a look at Cheatsheets, another amazing resource hiding in plain sight.
Apparently, some time ago when I wasn’t paying much attention, cheat sheets evolved from the homemade study notes of students with highly refined visual cognitive skills, but a relatively poor grasp of algebra or history or whatever, into an essential software learning tool. I don’t know how this happened in general, but master cheat sheet artist Garrett Grolemund has passed along some of the lore of the cheat sheet at RStudio. Garrett writes:
- R Style Guide - This resource is more than a cheat sheet. Google's internal R user community put together this guide for clean R code that covers syntax & conventions that are unique to R. I include it here because I've referred to it quite a bit in my own work. Your code will be easy to read & maintain if you follow these guidelines.
- Base R Cheat Sheet. Topics include statistics and distributions; converting between common data types, where as.logical yields Boolean values (TRUE, FALSE, TRUE) and you can always go from a higher value in the table to a lower value; and the environment, including variable assignment and ls to list all variables.
Descriptive Statistics in R for Matrix Objects. A matrix may look like a data frame, but it is not one. In a matrix object, the data are split into rows and columns even though they are stored as a single vector. With a data frame, you can use $ to extract data, but you cannot extract parts of a matrix using $. You can use square brackets to retrieve the contents of any row.
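The difference between the two kinds of object can be sketched in a few lines of base R. The matrix, its column names and the variable names below are made up for illustration.

```r
# A matrix is a single vector of six values arranged in 2 rows and 3 columns
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# The same values as a data frame
df <- as.data.frame(m)

# $ works on the data frame ...
col_a_df <- df$a

# ... but raises an error on the matrix, which we capture here
dollar_fails <- inherits(try(m$a, silent = TRUE), "try-error")

# Square brackets work on the matrix: [row, column]
first_row <- m[1, ]
col_a_m   <- m[, "a"]

# Simple descriptive statistics per column of the matrix
col_means <- colMeans(m)
col_sds   <- apply(m, 2, sd)

print(first_row)
print(col_means)
```

Leaving one index empty, as in `m[1, ]`, selects everything along that dimension.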
One day I put two and two together and realized that our Winston Chang, who I had known for a couple of years, was the same “W Chang” that made the LaTeX cheatsheet that I’d used throughout grad school. It inspired me to do something similarly useful, so I tried my hand at making a cheatsheet for Winston and Joe’s Shiny package. The Shiny cheatsheet ended up being the first of many. A funny thing about the first cheatsheet is that I was working next to Hadley at a co-working space when I made it. In the time it took me to put together the cheatsheet, he wrote the entire first version of the tidyr package from scratch.
It is now hard to imagine getting by without cheat sheets. It seems as if they are becoming an expected adjunct to the documentation. But, as Garrett explains in the README for the cheat sheets GitHub repository, they are not documentation!
RStudio cheat sheets are not meant to be text or documentation! They are scannable visual aids that use layout and visual mnemonics to help people zoom to the functions they need. … Cheat sheets fall squarely on the human-facing side of software design.
Cheat sheets live in the space where human factors engineering gets a boost from artistic design. If R packages were airplanes then pilots would want cheat sheets to help them master the controls.
The RStudio site contains sixteen RStudio produced cheat sheets and nearly forty contributed efforts, some of which are displayed in the graphic above. The Data Transformation cheat sheet is a classic example of a straightforward mnemonic tool. It is likely that even someone who is just beginning to work with dplyr will immediately grok that it organizes functions that manipulate tidy data. The cognitive load then is to remember how functions are grouped by task. The cheat sheet offers a canonical set of classes: “manipulate cases”, “manipulate variables” etc. to facilitate the process. Users that work with dplyr on a regular basis will probably just need to glance at the cheat sheet after a relatively short time.
The Shiny cheat sheet is a little more ambitious. It works on multiple levels and goes beyond categories to also suggest process and workflow.
The Apply functions cheat sheet takes on an even more difficult task. For most of us, internally visualizing multi-level data structures is difficult enough; imagining how data elements flow under transformations is a serious cognitive load. I, for one, really appreciate the help.
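To give a flavor of the kind of data flow involved, here is a small base R sketch. It is not taken from the cheat sheet itself, and the data and variable names are made up.

```r
# A list with elements of different lengths
scores <- list(alice = c(90, 80, 70), bob = c(60, 75))

# lapply() returns a list with one result per element
ranges <- lapply(scores, range)

# sapply() simplifies the result to a named vector where it can
means <- sapply(scores, mean)

# apply() works over the margins of a matrix: 1 = rows, 2 = columns
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)

print(means)
print(row_sums)
print(col_sums)
```

The shape of the output depends on both the function used and the shape of the input, which is the bookkeeping a visual aid can help with.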
Cheat sheets are immensely popular. And even in this ebook age where nearly everything you can look at is online, and conference attending digital natives travel light, the cheat sheets as artifacts retain considerable appeal. Not only are they useful tools and geek art (Take a look at cartography) for decorating a workplace, my guess is that they are perceived as runes of power enabling the cognoscenti to grasp essential knowledge and project it in the world.
R Studio Stats Cheat Sheet
When in-person conferences resume, I fully expect the heavy paper copies to disappear soon after we put them out at the RStudio booth.
This cheat sheet gives an overview of the most common statistics functions for spreadsheets in English and Dutch (LibreOffice Calc, Excel), and for the R programming language.
x denotes a cell range (spreadsheet) or list/array/table (R).
Function | R | Spreadsheet (EN) | Spreadsheet (NL) |
---|---|---|---|
Mean, average | mean(x) | =AVERAGE(x) | =GEMIDDELDE(x) |
Population variance | – | =VAR.P(x) | =VAR.P(x) |
Population standard deviation | – | =STDEV.P(x) | =STDEV.P(x) |
Sample variance | var(x) | =VAR(x) , =VAR.S(x) | =VAR(x) , =VAR.S(x) |
Sample standard deviation | sd(x) | =STDEV(x) , =STDEV.S(x) | =STDEV(x) , =STDEV.S(x) |
Median | median(x) | =MEDIAN(x) | =MEDIAAN(x) |
Minimum | min(x) | =MIN(x) | =MIN(x) |
Maximum | max(x) | =MAX(x) | =MAX(x) |
Quartile | – | =QUARTILE(x, type) † | =KWARTIEL(x, type) † |
Percentile | quantile(x, alphas) ‡ | =PERCENTILE(x, alpha) ‡ | =PERCENTIEL(x, alpha) ‡ |
† type: 0 = min, 1 = 25% (1st quartile), 2 = 50% (median), 3 = 75% (3rd quartile), 4 = max
‡ alpha is a number in [0, 1] denoting the percentile rank (0 = minimum, .5 = median, 1 = max). In R, you can specify an array of the desired percentiles, e.g. quantile(x, c(0, .33, .67, 1)).
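The R column of the table has no entry for population variance, population standard deviation or quartiles. A minimal sketch of how they can be derived from the functions that are listed, using made-up data. The variable names are illustrative, and the quartile values assume the default method of quantile(), which may differ from what a particular spreadsheet function returns.

```r
# Made-up sample data, purely for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample versions divide by n - 1
sample_var <- var(x)
sample_sd  <- sd(x)

# Population versions: rescale the sample variance so it divides by n
pop_var <- var(x) * (n - 1) / n
pop_sd  <- sqrt(pop_var)

# Quartiles through quantile(): min, Q1, median, Q3, max
quartiles <- quantile(x, c(0, .25, .5, .75, 1))

print(c(mean = mean(x), median = median(x), pop_var = pop_var, pop_sd = pop_sd))
print(quartiles)
```

Note that quantile() returns a named vector, so wrap it in unname() if only the numbers are needed.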
- x denotes the cell range (spreadsheet) or list/array/table (R) containing values of the independent variable.
- y denotes the cell range (spreadsheet) or list/array/table (R) containing values of the dependent variable.
Function | R | Spreadsheet (EN) | Spreadsheet (NL) |
---|---|---|---|
Pearson’s correlation coefficient (R) | cor(x, y) | =PEARSON(y, x) | =PEARSON(y, x) |
Determination coefficient (R²) | – | =RSQ(y, x) | =R.KWADRAAT(y, x) |
Covariance (sample) | cov(x, y) | =COVARIANCE.S(x, y) | =COVARIANTIE.S(x, y) |
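A short sketch of these measures in R, using made-up data. For a simple linear regression with one independent variable, the determination coefficient equals the squared Pearson correlation, so it can be computed with cor() or read from a fitted lm() model. The variable names are illustrative.

```r
# Made-up paired observations, purely for illustration
x <- c(1, 2, 3, 4, 5)   # independent variable
y <- c(2, 4, 5, 4, 5)   # dependent variable

# Pearson's correlation coefficient
r <- cor(x, y)

# Determination coefficient, computed two ways
r2_from_cor <- cor(x, y)^2
r2_from_lm  <- summary(lm(y ~ x))$r.squared

# Sample covariance (divides by n - 1)
cov_xy <- cov(x, y)

print(c(r = r, r2 = r2_from_cor, cov = cov_xy))
```

cov() in R is the sample covariance, so compare it with the sample version of the spreadsheet function rather than the population one.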
- X is a normally distributed stochastic variable with mean m and standard deviation s, or X ~ Nor(m, s).
- x is a number drawn from X.
- P(X < x) is the probability that a number drawn from X is smaller than x (left tail probability).
- Z follows the standard normal distribution, or Z ~ Nor(0, 1).
- z is a number drawn from Z.
- P(Z < z) is the probability that a number drawn from Z is smaller than z (left tail probability).
Function | R | Spreadsheet (EN) | Spreadsheet (NL) |
---|---|---|---|
z-transformation | z <- (x - m)/s | =STANDARDIZE(x, m, s) | =NORMALISEREN(x, m, s) |
P(Z < z) | pnorm(z) | =NORMSDIST(z) | =STAND.NORM.VERD(z) |
P(X < x) | pnorm(x, m, s) | =NORMDIST(x, m, s) | =NORM.VERD(x, m, s) |
z so P(Z < z) = p | qnorm(p) | =NORM.S.INV(p) | =NORM.S.INV(p) |
x so P(X < x) = p | qnorm(p, m, s) | =NORMINV(p, m, s) | =NORM.INV.N(p, m, s) |
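The R column of this table can be tried out directly. A minimal sketch with made-up values for m, s and x, showing that pnorm() and qnorm() are inverses of each other and that the z-transformation links the general and the standard normal distribution:

```r
# Made-up parameters, purely for illustration
m <- 100   # mean
s <- 15    # standard deviation
x <- 115   # observed value

# z-transformation
z <- (x - m) / s

# Left tail probabilities: both calls give the same result
p_left_z <- pnorm(z)
p_left_x <- pnorm(x, m, s)

# Inverse: recover z and x from the probability
z_back <- qnorm(p_left_z)
x_back <- qnorm(p_left_x, m, s)

# Right tail probability is the complement of the left tail
p_right <- 1 - pnorm(x, m, s)

print(c(z = z, p_left = p_left_x, p_right = p_right))
```

pnorm(x, m, s, lower.tail = FALSE) gives the right tail probability directly.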
- Van Der Elst, J. (2012). Statistiek met Excel. Derde druk. Uitgeverij De Boeck.