Check that the datasauRus package is installed by loading it:
library(datasauRus)
If the error there is no package called ‘datasauRus’ appears, the package needs to be installed. Use this:
install.packages("datasauRus")
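The check and the installation can also be combined so that the package is only installed when it is missing. This is a minimal sketch: the helper name needs_install() is hypothetical (it is not part of any package), and the CRAN mirror URL is an assumption that you may want to change.

```r
# Hypothetical helper: TRUE when a package cannot be found in the library
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Install datasauRus only if it is missing (the mirror is an assumption)
if (needs_install("datasauRus")) {
  install.packages("datasauRus", repos = "https://cloud.r-project.org")
}

library(datasauRus)
```

requireNamespace() returns FALSE instead of raising an error when the package is absent, which is why it suits this kind of test better than library().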
Since we are dealing with a tibble, we can just type its name:
datasaurus_dozen
Only the first 10 rows are displayed.
| dataset | x | y |
|---|---|---|
| dino | 55.3846 | 97.1795 |
| dino | 51.5385 | 96.0256 |
| dino | 46.1538 | 94.4872 |
| dino | 42.8205 | 91.4103 |
| dino | 40.7692 | 88.3333 |
| dino | 38.7179 | 84.8718 |
| dino | 35.6410 | 79.8718 |
| dino | 33.0769 | 77.5641 |
| dino | 28.9744 | 74.4872 |
| dino | 26.1538 | 71.4103 |
# dim() returns the dimensions of the data frame, i.e. the number of rows and columns
dim(datasaurus_dozen)
## [1] 1846 3
# ncol() returns only the number of columns
ncol(datasaurus_dozen)
## [1] 3
# nrow() returns only the number of rows
nrow(datasaurus_dozen)
## [1] 1846
Assign datasaurus_dozen to the datasaurus_dozen object. This populates the Global Environment, where the object then becomes visible:
datasaurus_dozen <- datasaurus_dozen
unique(datasaurus_dozen$dataset) %>% length()
## [1] 13
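The same count can be obtained in other ways. The sketch below assumes that dplyr and datasauRus are installed. It uses n_distinct() for the number of datasets and count() for the number of rows in each of them. The object names n_datasets and rows_per_dataset are only illustrative.

```r
library(dplyr)
library(datasauRus)

# Number of distinct values in the dataset column
n_datasets <- n_distinct(datasaurus_dozen$dataset)
n_datasets

# Number of rows (observations) in each dataset
rows_per_dataset <- count(datasaurus_dozen, dataset)
rows_per_dataset
```

count() is a shortcut for group_by() followed by summarise(n = n()), so it is also a gentle preview of the grouped summaries.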
Compute the mean of the x & y columns for each dataset. For this, you need to group_by() the appropriate column and then summarise(). In summarise() you can define as many new columns as you wish. There is no need to call it for every single variable.
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(mean_x = mean(x),
            mean_y = mean(y))
| dataset | mean_x | mean_y |
|---|---|---|
| away | 54.26610 | 47.83472 |
| bullseye | 54.26873 | 47.83082 |
| circle | 54.26732 | 47.83772 |
| dino | 54.26327 | 47.83225 |
| dots | 54.26030 | 47.83983 |
| h_lines | 54.26144 | 47.83025 |
| high_lines | 54.26881 | 47.83545 |
| slant_down | 54.26785 | 47.83590 |
| slant_up | 54.26588 | 47.83150 |
| star | 54.26734 | 47.83955 |
| v_lines | 54.26993 | 47.83699 |
| wide_lines | 54.26692 | 47.83160 |
| x_shape | 54.26015 | 47.83972 |
Compute the standard deviation of the x & y columns in the same way:
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(sd_x = sd(x),
            sd_y = sd(y))
| dataset | sd_x | sd_y |
|---|---|---|
| away | 16.76983 | 26.93974 |
| bullseye | 16.76924 | 26.93573 |
| circle | 16.76001 | 26.93004 |
| dino | 16.76514 | 26.93540 |
| dots | 16.76774 | 26.93019 |
| h_lines | 16.76590 | 26.93988 |
| high_lines | 16.76670 | 26.94000 |
| slant_down | 16.76676 | 26.93610 |
| slant_up | 16.76885 | 26.93861 |
| star | 16.76896 | 26.93027 |
| v_lines | 16.76996 | 26.93768 |
| wide_lines | 16.77000 | 26.93790 |
| x_shape | 16.76996 | 26.93000 |
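Use summarise_if() so that we exclude the dataset column and compute the statistics for the others:
# funs() is deprecated since dplyr 0.8.0; a named list does the same job
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise_if(is.double, list(mean = mean, sd = sd))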
| dataset | x_mean | y_mean | x_sd | y_sd |
|---|---|---|---|---|
| away | 54.26610 | 47.83472 | 16.76983 | 26.93974 |
| bullseye | 54.26873 | 47.83082 | 16.76924 | 26.93573 |
| circle | 54.26732 | 47.83772 | 16.76001 | 26.93004 |
| dino | 54.26327 | 47.83225 | 16.76514 | 26.93540 |
| dots | 54.26030 | 47.83983 | 16.76774 | 26.93019 |
| h_lines | 54.26144 | 47.83025 | 16.76590 | 26.93988 |
| high_lines | 54.26881 | 47.83545 | 16.76670 | 26.94000 |
| slant_down | 54.26785 | 47.83590 | 16.76676 | 26.93610 |
| slant_up | 54.26588 | 47.83150 | 16.76885 | 26.93861 |
| star | 54.26734 | 47.83955 | 16.76896 | 26.93027 |
| v_lines | 54.26993 | 47.83699 | 16.76996 | 26.93768 |
| wide_lines | 54.26692 | 47.83160 | 16.77000 | 26.93790 |
| x_shape | 54.26015 | 47.83972 | 16.76996 | 26.93000 |
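In recent dplyr releases the scoped verbs such as summarise_if() are superseded by across(). The sketch below is one possible equivalent, assuming dplyr 1.0.0 or later. The object name stats_by_dataset is only illustrative. Note that the columns come out grouped per variable (x_mean, x_sd, y_mean, y_sd) rather than per statistic.

```r
library(dplyr)
library(datasauRus)

# Apply mean() and sd() to every double column, one row per dataset
stats_by_dataset <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(across(where(is.double), list(mean = mean, sd = sd)))

stats_by_dataset
```

The names of the list elements (mean, sd) become the suffixes of the new columns, which is the same naming rule as in the summarise_if() call.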
Plot datasaurus_dozen with ggplot such that the aesthetics are aes(x = x, y = y), with the geometry geom_point(). The ggplot() and geom_point() functions must be linked with a + sign.
ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point()
Colour the points by the dataset column:
ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point()
Plot one dataset per facet:
ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  facet_wrap(~ dataset, ncol = 3)
Apply theme_void() and remove the legend:
ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  theme_void() +
  theme(legend.position = "none") +
  facet_wrap(~ dataset, ncol = 3)
Install the gganimate package; its dependencies will be automatically installed.
install.packages("gganimate")
Map the dataset variable to the frame argument in the aes() function call:
library(gganimate)
p <- ggplot(datasaurus_dozen, aes(x = x, y = y, frame = dataset)) +
  geom_point() +
  theme_gray(20) +
  theme(legend.position = "none")
gganimate(p, title_frame = TRUE, "./img/dino.gif")
## Executing:
## convert -loop 0 -delay 100 Rplot1.png Rplot2.png Rplot3.png
## Rplot4.png Rplot5.png Rplot6.png Rplot7.png Rplot8.png
## Rplot9.png Rplot10.png Rplot11.png Rplot12.png Rplot13.png
## 'dino.gif'
## Output at: dino.gif
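The frame aesthetic and the gganimate() function used above belong to an early interface of the package. As far as we know, gganimate 1.0 and later replaced them with transition functions plus animate() and anim_save(). The sketch below shows what a roughly equivalent animation could look like under that assumption. It also assumes that a GIF renderer such as the gifski package is installed, and the output file name dino.gif is only illustrative.

```r
library(ggplot2)
library(gganimate)
library(datasauRus)

# One animation state per dataset; the title shows the current state
p <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_gray(20) +
  theme(legend.position = "none") +
  transition_states(dataset, transition_length = 1, state_length = 1) +
  ggtitle("{closest_state}")

# Render the animation, then write it to disk
anim <- animate(p, nframes = 100, fps = 10)
anim_save("dino.gif", animation = anim)
```

Unlike the older call, transition_states() interpolates the points between two datasets, so the dinosaur morphs into the next shape instead of jumping to it.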
never trust summary statistics alone; always visualize your data | Alberto Cairo
from this post