The grammar of graphics: an introduction using ggplot2

author: Diogo Melo
date: 2019/10/29

Grammar of graphics


Elements of the grammar

ggplot2

“ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.”


Data

Data - wide format

head(iris, 10)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa

Data - narrow format

library(tidyr)
head(
  gather(iris, trait, value, Sepal.Length:Petal.Width), 
  10)
   Species        trait value
1   setosa Sepal.Length   5.1
2   setosa Sepal.Length   4.9
3   setosa Sepal.Length   4.7
4   setosa Sepal.Length   4.6
5   setosa Sepal.Length   5.0
6   setosa Sepal.Length   5.4
7   setosa Sepal.Length   4.6
8   setosa Sepal.Length   5.0
9   setosa Sepal.Length   4.4
10  setosa Sepal.Length   4.9

Narrow/wide conversion
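
The conversion can go in both directions. A minimal sketch using tidyr's pivot_longer() and pivot_wider(); the `id` column is an assumption added here so that each measurement can be matched back to its flower, and the object names are only illustrative:

```r
library(tidyr)

# Add a row id so each measurement can be traced back to its flower;
# `id` is an assumption of this sketch, not a column of iris.
wide_iris <- cbind(id = seq_len(nrow(iris)), iris)

# wide -> narrow: one row per (flower, trait) pair
narrow_iris <- pivot_longer(wide_iris,
                            cols = Sepal.Length:Petal.Width,
                            names_to = "trait",
                            values_to = "value")

# narrow -> wide: spread the traits back into one column each
wide_again <- pivot_wider(narrow_iris,
                          names_from = trait,
                          values_from = value)

dim(narrow_iris)
dim(wide_again)
```

Without an identifier such as `id`, pivot_wider() cannot tell which values belong to the same flower, because Species alone does not identify a row.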

Aesthetics
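
An aesthetic is a visual property of the plot, such as position, color, size or shape. A minimal sketch of the difference between mapping an aesthetic to a column and setting it to a constant, using iris as in the other slides; the color name, point size and object names are arbitrary choices for illustration:

```r
library(ggplot2)

# Mapped aesthetic: color varies with a data column, so it goes inside aes()
mapped <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Species))

# Fixed aesthetic: a single constant value for every point,
# so it goes outside aes()
fixed <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(color = "steelblue", size = 2)

mapped  # printing a plot object draws it
```

Only the mapped version gets a legend, since a legend explains how data values translate into colors.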

Geometries
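
A geometry decides how the mapped data is drawn. As a rough illustration, the sketch below keeps the data and the aesthetic mapping fixed and changes only the geom; the object names are hypothetical:

```r
library(ggplot2)

# Shared data and aesthetic mapping; no geometry yet
base <- ggplot(iris, aes(Species, Sepal.Length))

# Same data and mapping, three different geometries
as_points  <- base + geom_point()
as_boxplot <- base + geom_boxplot()
as_violin  <- base + geom_violin()

as_boxplot  # printing a plot object draws it
```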

Minimal ggplot

General rule:

ggplot(input_data_frame, aes(x = x_axis_column, 
                             y = y_axis_column,
                             group = grouping_column, 
                             color = color_column)) +
  geom_plot_type(options that do not depend on the data, 
                 aes(options that depend on the data))
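
One possible way to fill every slot of the template, sketched with the gapminder data used later in this presentation. The choice of columns, the alpha value and the name `trajectories` are assumptions made for illustration:

```r
library(ggplot2)
library(gapminder)

# Data frame: gapminder. x, y, group and color each map to one column.
trajectories <- ggplot(gapminder, aes(x = year,
                                      y = lifeExp,
                                      group = country,
                                      color = continent)) +
  # alpha does not depend on the data, so it stays outside aes()
  geom_line(alpha = 0.4)

trajectories
```

Here `group = country` draws one line per country, while `color = continent` only controls how those lines are colored.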

Scatter plot

library(ggplot2)
ggplot(data = iris, aes(Sepal.Length, Sepal.Width)) + geom_point()

Scatter plot - Species mapped to color

ggplot(data = iris, aes(Sepal.Length, Sepal.Width, 
                        color = Species)) + 
  geom_point()

Histogram

ggplot(diamonds, aes(price)) + geom_histogram(bins = 500)

Facets

Facets

ggplot(data = iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_point(aes(color = Species)) + facet_wrap(~Species)

Statistics

Statistics - nonlinear regression (loess)

ggplot(data = iris, aes(Sepal.Length, Sepal.Width, 
                        color = Species)) + 
  geom_point() + geom_smooth()

Statistics - linear regression (lm)

ggplot(data = iris, aes(Sepal.Length, Sepal.Width, 
                        color = Species)) + 
  geom_point() + geom_smooth(method = "lm")

Statistics - boxplot + jitter

library(tidyr)
narrow_iris = pivot_longer(iris, -Species)
ggplot(narrow_iris, aes(Species, value)) + 
  geom_boxplot() + 
  geom_jitter(width = 0.2, height = 0) + 
  facet_wrap(~name, scales = "free")

Statistics - boxplot + dotplot

library(tidyr)
narrow_iris = pivot_longer(iris, -Species)
ggplot(narrow_iris, aes(Species, value)) + 
  geom_boxplot() + 
  geom_dotplot(binaxis = 'y', dotsize = 0.5, stackdir = 'center') + 
  facet_wrap(~name, scales = "free")

Coordinates
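
The coordinate system controls how positions are laid out on the page. A small sketch with two coordinate functions from ggplot2, coord_flip() and coord_cartesian(); the axis limits and object names are arbitrary choices for illustration:

```r
library(ggplot2)

boxes <- ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot()

# Swap the x and y axes to get horizontal boxplots
flipped <- boxes + coord_flip()

# Zoom in on a range of the y axis without discarding observations,
# so the boxplot statistics are still computed from all the data
zoomed <- boxes + coord_cartesian(ylim = c(5, 7))

flipped
```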

Theme

ggplot - plot objects

library(gapminder)
meu_grafico = ggplot(gapminder, aes(x = log(gdpPercap), y = log(lifeExp))) + 
  geom_point(aes(color = continent))

meu_grafico

ggplot - plot objects

Some common options:

meu_grafico = meu_grafico + 
    labs(x = "GDP per capita", 
         y = "Life expectancy")

meu_grafico = meu_grafico + 
    theme(text = element_text(size = 30), 
          legend.title = element_text(face = "italic")) + 
    scale_color_discrete(name = "Continent")

meu_grafico

ggplot - themes

Ready-made themes!

library(cowplot)
meu_grafico = meu_grafico + theme_cowplot()

meu_grafico

ggplot - themes

Ready-made themes!

library(ggthemes)
meu_grafico = ggplot(gapminder, 
 aes(x = log(gdpPercap), 
     y = log(lifeExp))) + 
  geom_point(size = 3, 
  aes(shape = continent)) + 
  theme_wsj()

meu_grafico

ggplot - other types of plots

Documentation