Check that the datasauRus package is installed:
library(datasauRus)
If the error
there is no package called ‘datasauRus’
appears, it means that the package needs to be installed. Use this:
install.packages("datasauRus")
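The check-then-install step can also be scripted so that the package is only installed when it is actually missing. This is a minimal sketch using base R's `requireNamespace()`; the helper name `needs_install()` is hypothetical and not part of any package.

```r
# Hypothetical helper (not from any package): returns TRUE when `pkg`
# cannot be loaded, i.e. when it still needs to be installed.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}
```

A possible use is `if (needs_install("datasauRus")) install.packages("datasauRus")`, followed by `library(datasauRus)` as usual.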
Since we are dealing with a tibble, we can just type
datasaurus_dozen
and only the first 10 rows are displayed.
dataset | x | y |
---|---|---|
dino | 55.3846 | 97.1795 |
dino | 51.5385 | 96.0256 |
dino | 46.1538 | 94.4872 |
dino | 42.8205 | 91.4103 |
dino | 40.7692 | 88.3333 |
dino | 38.7179 | 84.8718 |
dino | 35.6410 | 79.8718 |
dino | 33.0769 | 77.5641 |
dino | 28.9744 | 74.4872 |
dino | 26.1538 | 71.4103 |
# dim() returns the dimensions of the data frame, i.e. the number of rows and columns
dim(datasaurus_dozen)
## [1] 1846 3
# ncol() returns only the number of columns
ncol(datasaurus_dozen)
## [1] 3
# nrow() returns only the number of rows
nrow(datasaurus_dozen)
## [1] 1846
Assign datasaurus_dozen to the datasaurus_dozen object. This aims at populating the Global Environment:
datasaurus_dozen <- datasaurus_dozen
unique(datasaurus_dozen$dataset) %>% length()
## [1] 13
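The same count can be obtained without chaining `unique()` and `length()`. The sketch below assumes that dplyr and datasauRus are installed; `n_datasets` and `rows_per_dataset` are names chosen here for illustration.

```r
library(dplyr)
library(datasauRus)

# n_distinct() counts the unique values of a vector in one call
n_datasets <- n_distinct(datasaurus_dozen$dataset)

# count() additionally reports how many rows each dataset contains
rows_per_dataset <- count(datasaurus_dozen, dataset)
```

Printing `rows_per_dataset` is a quick way to see whether the datasets have comparable sizes before comparing their statistics.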
Compute the mean of the x & y columns. For this, you need to group_by() the appropriate column and then summarise().
In summarise() you can define as many new columns as you wish. No need to call it for every single variable.
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(mean_x = mean(x),
            mean_y = mean(y))
dataset | mean_x | mean_y |
---|---|---|
away | 54.26610 | 47.83472 |
bullseye | 54.26873 | 47.83082 |
circle | 54.26732 | 47.83772 |
dino | 54.26327 | 47.83225 |
dots | 54.26030 | 47.83983 |
h_lines | 54.26144 | 47.83025 |
high_lines | 54.26881 | 47.83545 |
slant_down | 54.26785 | 47.83590 |
slant_up | 54.26588 | 47.83150 |
star | 54.26734 | 47.83955 |
v_lines | 54.26993 | 47.83699 |
wide_lines | 54.26692 | 47.83160 |
x_shape | 54.26015 | 47.83972 |
Compute the standard deviation of the x & y columns in the same way:
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(sd_x = sd(x),
            sd_y = sd(y))
dataset | sd_x | sd_y |
---|---|---|
away | 16.76983 | 26.93974 |
bullseye | 16.76924 | 26.93573 |
circle | 16.76001 | 26.93004 |
dino | 16.76514 | 26.93540 |
dots | 16.76774 | 26.93019 |
h_lines | 16.76590 | 26.93988 |
high_lines | 16.76670 | 26.94000 |
slant_down | 16.76676 | 26.93610 |
slant_up | 16.76885 | 26.93861 |
star | 16.76896 | 26.93027 |
v_lines | 16.76996 | 26.93768 |
wide_lines | 16.77000 | 26.93790 |
x_shape | 16.76996 | 26.93000 |
Compute both the mean and the standard deviation in one go using summarise_if, so we exclude the dataset column and compute the others:
datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise_if(is.double, funs(mean = mean, sd = sd))
dataset | x_mean | y_mean | x_sd | y_sd |
---|---|---|---|---|
away | 54.26610 | 47.83472 | 16.76983 | 26.93974 |
bullseye | 54.26873 | 47.83082 | 16.76924 | 26.93573 |
circle | 54.26732 | 47.83772 | 16.76001 | 26.93004 |
dino | 54.26327 | 47.83225 | 16.76514 | 26.93540 |
dots | 54.26030 | 47.83983 | 16.76774 | 26.93019 |
h_lines | 54.26144 | 47.83025 | 16.76590 | 26.93988 |
high_lines | 54.26881 | 47.83545 | 16.76670 | 26.94000 |
slant_down | 54.26785 | 47.83590 | 16.76676 | 26.93610 |
slant_up | 54.26588 | 47.83150 | 16.76885 | 26.93861 |
star | 54.26734 | 47.83955 | 16.76896 | 26.93027 |
v_lines | 54.26993 | 47.83699 | 16.76996 | 26.93768 |
wide_lines | 54.26692 | 47.83160 | 16.77000 | 26.93790 |
x_shape | 54.26015 | 47.83972 | 16.76996 | 26.93000 |
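In recent dplyr releases `funs()` is deprecated and the scoped verbs such as `summarise_if()` are superseded by `across()`. The following sketch shows what the equivalent call might look like, assuming dplyr 1.0.0 or later; `dozen_stats` is a name chosen here for illustration.

```r
library(dplyr)
library(datasauRus)

# across() applies a named list of functions to every selected column;
# where(is.double) selects the numeric columns x and y.
dozen_stats <- datasaurus_dozen %>%
  group_by(dataset) %>%
  summarise(across(where(is.double), list(mean = mean, sd = sd)))
```

The resulting columns follow the `{column}_{function}` naming pattern, but they are grouped by column (x_mean, x_sd, y_mean, y_sd) rather than by function as in the table above.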
Plot the datasaurus_dozen with ggplot such that the aesthetics are aes(x = x, y = y) with the geometry geom_point().
The ggplot() and geom_point() functions must be linked with a + sign.
ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point()
Colour the points according to the dataset column:
ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point()
Display one dataset per facet:
ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  facet_wrap(~ dataset, ncol = 3)
Apply theme_void and remove the legend:
ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  theme_void() +
  theme(legend.position = "none") +
  facet_wrap(~ dataset, ncol = 3)
Install the gganimate package; its dependencies will be installed automatically:
install.packages("gganimate")
Map the dataset variable to the frame argument in the aes() function call:
library(gganimate)
p <- ggplot(datasaurus_dozen, aes(x = x, y = y, frame = dataset)) +
  geom_point() +
  theme_gray(20) +
  theme(legend.position = "none")
gganimate(p, title_frame = TRUE, "./img/dino.gif")
## Executing:
## convert -loop 0 -delay 100 Rplot1.png Rplot2.png Rplot3.png
## Rplot4.png Rplot5.png Rplot6.png Rplot7.png Rplot8.png
## Rplot9.png Rplot10.png Rplot11.png Rplot12.png Rplot13.png
## 'dino.gif'
## Output at: dino.gif
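The `gganimate()` function and the `frame` aesthetic belong to the early API of the package. From gganimate 1.0.0 onward, animations are described with transition functions instead. The sketch below is one possible translation, assuming gganimate 1.0.0 or later and the gifski package for GIF rendering; the output file name and the transition timings are arbitrary choices, not taken from the original.

```r
library(ggplot2)
library(gganimate)
library(datasauRus)

# transition_states() replaces the old `frame` aesthetic:
# each value of `dataset` becomes one state of the animation.
p <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_gray(20) +
  theme(legend.position = "none") +
  transition_states(dataset, transition_length = 1, state_length = 2) +
  ggtitle("{closest_state}")  # show the current dataset name as title

# Render the animation (assumes gifski is installed) and save it
anim <- animate(p, renderer = gifski_renderer())
anim_save("dino.gif", animation = anim)
```

Unlike the older approach, which showed one static frame per dataset, `transition_states()` interpolates the point positions between states, so the points appear to move from one dataset to the next.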
never trust summary statistics alone; always visualize your data | Alberto Cairo