Successfully reported this slideshow.
Upcoming SlideShare
×

# Grammar of Graphics - Darya Vanichkina

Presented as part of ResBaz 2018 in Sydney.

• Full Name
Comment goes here.

Are you sure you want to Yes No

### Grammar of Graphics - Darya Vanichkina

1. 1. The Grammar of Graphics Darya Vanichkina
2. 2. … a personal commentary/interpretation of … http://vita.had.co.nz/papers/layered-grammar.html
3. 3. Grammar “the fundamental principles or rules of an art or science” (OED Online 1989)
4. 4. Simple case
5. 5. Simple case
6. 6. Components of the grammar Data Variables of interest Aesthetics x-axis y-axis colour ﬁll size labels alpha shape line width line type Geometries point line histogram bar boxplot Facets columns rows Statistics binning smoothing descriptive inferential Coordinates cartesian ﬁxed polar limits Themes Not data, but important for overall impact Adapted from DataCamp
7. 7. Exploring the diamonds data
8. 8. Relationship between table (width of top of diamond relative to widest point) and price? diamonds %>% ggplot(aes(x = table, y = price)) + geom_point()
9. 9. Relationship between carat (weight) and price? diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point()
10. 10. Relationship between carat (weight) and price? [How many diamonds?] diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point(alpha = 0.1)
11. 11. Relationship between carat (weight) and price? [Lots of small diamonds!] diamonds %>% ggplot(aes(x = carat, y = price)) + geom_hex()
12. 12. Components of the grammar Data Variables of interest Aesthetics x-axis y-axis colour ﬁll size labels alpha shape line width line type Geometries point line histogram bar boxplot Facets columns rows Statistics binning smoothing descriptive inferential Coordinates cartesian ﬁxed polar limits Themes Not data, but important for overall impact Adapted from DataCamp
13. 13. Relationship between clarity and price? Worse ——————————————————> Better diamonds %>% ggplot(aes(x = clarity, y = price)) + geom_boxplot()
14. 14. How many diamonds with each clarity? Worse ——————————————————> Better diamonds %>% ggplot(aes(x = clarity, fill = clarity)) + geom_bar()
15. 15. Relationship between cut and price? Worse ——————————————————> Better diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()
16. 16. Relationship between cut and price? Worse ——————————————————> Better diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()
17. 17. Relationship between cut and price? Worse ——————————————————> Better diamonds %>% ggplot(aes(x = cut, y = price)) + geom_violin()
18. 18. Components of the grammar Data Variables of interest Aesthetics x-axis y-axis colour ﬁll size labels alpha shape line width line type Geometries point line histogram bar boxplot Facets columns rows Statistics binning smoothing descriptive inferential Coordinates cartesian ﬁxed polar limits Themes Not data, but important for overall impact Adapted from DataCamp
19. 19. Relationship between price and carat grouped by cut and clarity? diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point() + facet_grid(cut~clarity)
20. 20. Relationship between price and carat grouped by cut and clarity (with trend)? diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point() + facet_grid(cut~clarity) + geom_smooth(method = "lm")
21. 21. Relationship between table parameter (width of top of diamond relative to widest point, converted on the ﬂy into a factor) and price, faceted by cut? diamonds %>% mutate(TableIsSmall = factor(table < median(table)))%>% ggplot(aes(x = carat, y = price, colour = TableIsSmall)) + geom_point() + facet_grid(.~cut)
22. 22. Relationship between table parameter (width of top of diamond relative to widest point, converted on the ﬂy into a factor) and price, faceted by cut? diamonds %>% mutate(TableIsSmall = factor(table < median(table)))%>% ggplot(aes(x = carat, y = price, colour = TableIsSmall)) + geom_point() + facet_grid(.~cut) Most diamonds with a small table (TableIsBig == FALSE) have an “Ideal” cut…
23. 23. Components of the grammar Data Variables of interest Aesthetics x-axis y-axis colour ﬁll size labels alpha shape line width line type Geometries point line histogram bar boxplot Facets columns rows Statistics binning smoothing descriptive inferential Coordinates cartesian ﬁxed polar limits Themes Not data, but important for overall impact Adapted from DataCamp
24. 24. Relationship between cut and price? Worse ——————————————————> Better diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()
25. 25. Relationship between cut and price? [log-scale] Worse ——————————————————> Better diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()+ scale_y_log10() Worse ——————————————————> Better diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()
26. 26. How many diamonds with each clarity? Worse ——————————————————> Better diamonds %>% ggplot(aes(x = clarity, fill = clarity))+geom_bar()
27. 27. How many diamonds with each clarity? Worse ——————————————————> Better diamonds %>% ggplot(aes(x = clarity, fill = clarity))+geom_bar() diamonds %>% ggplot(aes(x = clarity, fill = clarity)) +geom_bar(width=1) + coord_polar()
28. 28. Components of the grammar Data Variables of interest Aesthetics x-axis y-axis colour ﬁll size labels alpha shape line width line type Geometries point line histogram bar boxplot Facets columns rows Statistics binning smoothing descriptive inferential Coordinates cartesian ﬁxed polar limits Themes Not data, but important for overall impact Adapted from DataCamp
29. 29. Finishing up with the diamonds data diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point() + facet_grid(cut~clarity, scales = "free_x") + geom_smooth(method = “lm") diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()+ scale_y_log10()
30. 30. Finishing up with the diamonds data diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point() + facet_grid(cut~clarity, scales = "free_x") + geom_smooth(method = "lm") + theme_bw() diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()+ scale_y_log10() + theme_minimal()
31. 31. Just changing the themediamonds %>% ggplot(aes(x = carat, y = price)) + geom_point(col = "red") + facet_grid(cut~clarity, scales = "free_x") + geom_smooth(method = "lm") + theme_black() diamonds %>% ggplot(aes(x = cut, y = price)) + geom_boxplot()+ scale_y_log10() + theme_dark()
32. 32. The Grammar of Graphics Data Variables of interest Aesthetics x-axis y-axis colour ﬁll size labels alpha shape line width line type Geometries point line histogram bar boxplot Facets columns rows Statistics binning smoothing descriptive inferential Coordinates cartesian ﬁxed polar limits Themes Not data, but important for overall impact
33. 33. Limitations
34. 34. Limitations https://www.andrewheiss.com/blog/2017/08/10/exploring-minards-1812-plot-with-ggplot2/
35. 35. The Grammar of Graphics Data Variables of interest Aesthetics x-axis y-axis colour ﬁll size labels alpha shape line width line type Geometries point line histogram bar boxplot Facets columns rows Statistics binning smoothing descriptive inferential Coordinates cartesian ﬁxed polar limits Themes Not data, but important for overall impact https://github.com/dvanic/18ResBazPresentation