- Understanding the Grammar of Graphics
- Terminology for Interactive Graphics
- Interactive Graphic Examples
- Leading Platforms and Packages
- Future Research

Matthew Sigal (msigal@yorku.ca)

York University

The grammar of graphics takes us beyond a

- Chart typologies versus general graphics in graphical software
  - Software will necessarily offer **fewer charts** than people want
  - Software will lack deep **structure**, and so be inefficient
- Wilkinson's monograph aimed to replace chart typologies with an overarching language
- However, the *Grammar of Graphics* was primarily a theoretical treatise
- It was later implemented by Wilkinson in both the proprietary Graphics Production Language of SPSS and nViZn, the backbone of the SPSS Visualization Designer application

The grammar is broken up into three components:

- **Specification:** Translating what we expect to happen into a formal language
- **Assembly:** The coordination of the specified attributes
- **Display:** The actual rendering of the graphic onto a display system

**Assembly** and **Display** are typically products of the software and hardware we use, so Wilkinson's
primary emphasis is on **Specification**.

- **Algebra:** Operations that combine variables and specify graph dimensionality
- **Scales:** Represent variables on measured dimensions
- **Statistics:** Functions that allow graphs to change appearance and representation schemes
- **Geometry:** Creation of geometric objects from variables
- **Coordinates:** Coordinate systems (from polar to complex map projections)
- **Aesthetics:** Sensory attributes used to represent graphics
- **Facets and Guides:** Allow for coordination between graphs and tables, and annotations

**Important notes:**

- Difference between "data", "varset", "graph", and "graphic"
- One way process (with iteration)
- Order is important!

The first step is to extract data into variables.

- The variable mapping function returns a single value in the range for every index.
- Data can be broadly defined:
  - a relational database
  - indexing a stream of words
  - a picture
  - can be the product of bootstrapping, or even metadata.
- Can apply variable transformations (mathematical, statistical, multivariate)
- Output of this stage is a `varset`

We then can apply various algebraic techniques to the varset, which will define the structure (or frame) of our plot.

**Three primary operators**:

- Cross (*): crosses all values of X with all values of Y, and a result exists for every case.
  - e.g., a two-dimensional scatterplot depicting city population for 2000 and 2010
- Nest (/): nests all values of X within all values of Z; results only exist for particular combinations.
  - e.g., facet by group variable; city/country produces separate plots for USA and Canada
- Blend (+): combines all values of X with all values of Y on the same dimension.
  - e.g., plot the combined population for cities in 2000 and 2010
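
The three operators can be mimicked on a small data frame. The following is a rough base R sketch rather than Wilkinson's formal algebra: the `cities` data frame and all of its population figures are invented for illustration, and ordinary data-frame operations stand in for the varset operators.

```r
# Hypothetical data: the population figures (in thousands) are invented.
cities <- data.frame(
  city    = c("Toronto", "Montreal", "Chicago", "Boston"),
  country = c("Canada", "Canada", "USA", "USA"),
  pop2000 = c(100, 200, 300, 400),
  pop2010 = c(110, 190, 320, 450),
  stringsAsFactors = FALSE
)

# Cross (pop2000 * pop2010): each case becomes a pair of values,
# giving a two-dimensional frame (one scatterplot point per city).
cross_frame <- cities[, c("pop2000", "pop2010")]

# Nest (city / country): cities are only defined within their own country,
# so only the observed combinations exist (4 here, not 4 x 2 = 8).
nest_frame <- split(cities$city, cities$country)

# Blend (pop2000 + pop2010): both variables share a single dimension,
# so their values are stacked onto one axis (8 values).
blend_frame <- data.frame(
  city = rep(cities$city, times = 2),
  year = rep(c(2000, 2010), each = nrow(cities)),
  pop  = c(cities$pop2000, cities$pop2010)
)
```

Cross adds a dimension, nest restricts the combinations that exist, and blend lengthens a single dimension.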

These are functions that are used to map varsets to dimensions (size, shape, and location).

- For example, with categorical data, we could do this based upon natural (alphabetic) order, relative frequency, or even length of string
- General scale types:
  - Categorical
  - Linear
  - Time
  - "One-Bend" (e.g., logarithmic, power)
  - "Two-Bend" (e.g., arcsine, logit/probit, probability)
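
As a small illustration of these scale types, the base R sketch below orders one categorical variable in the three ways mentioned above and applies a one-bend and two two-bend transformations. The fruit names and numeric inputs are made-up example values, not data from the talk.

```r
# Categorical scale: three possible orderings of the same categories.
# The fruit names are hypothetical example values.
x <- c("kiwi", "apple", "banana", "apple", "kiwi", "apple")
by_alpha  <- sort(unique(x))                            # natural (alphabetic) order
by_freq   <- names(sort(table(x), decreasing = TRUE))   # relative frequency
by_length <- unique(x)[order(nchar(unique(x)))]         # length of string

# "One-bend" scale: the logarithm compresses large values.
log_scaled <- log10(c(1, 10, 100, 1000))

# "Two-bend" scales: logit and arcsine stretch both ends of the unit interval.
p <- c(0.1, 0.5, 0.9)
logit_scaled   <- qlogis(p)       # log(p / (1 - p))
arcsine_scaled <- asin(sqrt(p))
```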

Statistical operations can be employed to reduce the number of rows in the varset.

- These are methods that can alter the positions of the geometric plot symbols.
- Five primary methods:
  - **Bin** (rect/tri/hex/quantile/boundary/voronoi/dot/stem)
  - **Summary** (count/proportion/sum/mean/median/mode/sd/se/range/leaf)
  - **Region** (spread/confi)
  - **Smooth** (linear/quadratic/cubic/log/spline/density)
  - **Link** (join/sequence/mst/hull/tsp/complete/neighbor)
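
To make the row-reduction idea concrete, here is a minimal base R sketch of three of these methods applied to the built-in `iris` data. The choice of variables and bin boundaries is arbitrary and only meant as an illustration.

```r
# Bin: count observations falling into intervals of Sepal.Length.
bins <- table(cut(iris$Sepal.Length, breaks = seq(4, 8, by = 1)))

# Summary: collapse 150 rows to one mean per species.
means <- aggregate(Sepal.Width ~ Species, data = iris, FUN = mean)

# Smooth: a linear fit summarizes the x-y relationship with two coefficients.
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
```

In each case the 150 original rows are replaced by a much smaller set of values, which is what the geometry stage then draws.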

These functions create graph objects that can be represented by magnitudes in a space.

- These are not actually visible (as they don't yet have aesthetic attributes)!
- Functions: point/line/area/interval/path/schema
- Partitions: polygon/contour
- Networks: edges

- Geometric objects can impose collision modifiers to avoid overlap (e.g., jitter)
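
A collision modifier such as jitter can be sketched in base R with the `jitter()` function. This is only an illustration of the idea on the `iris` data; the amount of noise is left at the function's default.

```r
set.seed(1)  # make the random offsets reproducible

# Many iris flowers share identical (length, width) measurements,
# so their points would be drawn exactly on top of one another.
xy <- iris[, c("Sepal.Length", "Sepal.Width")]
n_overlapping <- sum(duplicated(xy))

# Jitter adds a small amount of random noise to each coordinate.
xy_jittered <- data.frame(
  Sepal.Length = jitter(xy$Sepal.Length),
  Sepal.Width  = jitter(xy$Sepal.Width)
)
n_overlapping_after <- sum(duplicated(xy_jittered))
```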

Our next step is to choose and apply a coordinate system.

- These are sets that locate points in space, and are amenable to transformation.
- Planar transformations:
  - Isometry (reflect, rotate, translate) and Similarity (dilate)
  - Affine (shear, stretch), Projective (project), and Conformal
- Projections on a plane:
  - Perspective projections; Triangular coordinates; Map projections
- 3D and high dimensional coordinate systems
- For example, a pie chart is simply a stacked bar chart plotted in polar coordinates, with bar height mapped to the angle of the slice.
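
The pie chart example can be reproduced in `ggplot2`, which is one implementation of these ideas. The sketch below uses the built-in `iris` data purely for illustration: it first builds a single stacked bar and then changes only the coordinate system.

```r
library(ggplot2)

# A single stacked bar: one x position, segments sized by species count.
stacked_bar <- ggplot(iris, aes(x = factor(1), fill = Species)) +
  geom_bar(width = 1)

# The same graph in polar coordinates: segment height becomes slice angle.
pie <- stacked_bar + coord_polar(theta = "y")
```

Printing `stacked_bar` and `pie` side by side shows that the data, statistics, and geometry are unchanged; only the coordinate system differs.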

| Form | Surface | Motion | Sound | Text |
|---|---|---|---|---|
| Position | Color | Direction | Tone | Label |
| Size | * Hue | Speed | Volume | |
| Shape | * Brightness | Acceleration | Rhythm | |
| * Polygon | * Saturation | | Voice | |
| * Glyph | Texture | | | |
| * Image | * Pattern | | | |
| Rotation | * Granularity | | | |
| Resolution | * Orientation | | | |
| | Blur | | | |
| | Transparency | | | |

(Wilkinson, 2005, p. 274)

In GPL, any statistical graphic can be expressed in terms of six statements:

- **DATA:** These expressions involve the creation of variables from datasets
- **TRANS:** Apply variable transformations (for instance, rank)
- **ELEMENT:** Define graphs (e.g., points) and their aesthetic attributes (e.g., color)
- **SCALE:** Apply scale transformations (for instance, log)
- **GUIDE:** Define guides to aid interpretation (e.g., axes, legends, et cetera)
- **COORD:** Define the coordinate system (e.g., Cartesian, polar)

DATA: x = "SepalLength"

DATA: y = "SepalWidth"

DATA: z = "species"

TRANS: x = x

TRANS: y = y

ELEMENT: point(position(x*y), color(z))

COORD: rect(dim(1,2))

SCALE: linear(dim(1))

SCALE: linear(dim(2))

GUIDE: axis(dim(1), label("Sepal Length"))

GUIDE: axis(dim(2), label("Sepal Width"))

However, as most of these actions would be the default of a well-organized graphical system, only the ELEMENT statement is truly necessary.

*component*, and those components can be *combined* in many different ways to produce a huge variety of plots.

-Murrell, 2011

`gg` in `ggplot2`:

General Principles for `ggplot2`:

- Define the data you want to plot and create a plot template with `ggplot()`
- Specify the aesthetics of the shapes that will be used to represent the data with `aes()`
- Specify the graphical shapes (`geoms`) that will be used to view the data
  - Add them with the appropriate function; e.g. `geom_point()` or `geom_line()`
- Call the object to render and view it

Components of a `ggplot2` object:

- One or more `Layers` consisting of:
  - `Data`: What we want to see!
  - `Mapping`: Defines the aesthetics of the graphic
  - `Stat`: Statistical transformations of the data (e.g., binning or averaging)
  - `Geom`: Geometric objects that are drawn to represent the data (simple or complex)
  - `Position`: Position adjustments for each geom (e.g., jitter, dodge, stack)
- `Scale`: Controls mapping between data and aesthetics (variable or constant; colour/position)
- `Themes`: Relatively new `ggplot2` feature that allows for visual adjustments of a plot object
- `Coord`: The coordinate system (provides axes and gridlines)
- `Facet`: Allows us to break up the data into subsets
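
Each of these components can be spelled out explicitly. The sketch below uses `ggplot2`'s lower-level `layer()` function, which the usual `geom_*()` helpers wrap, so that every part of the layer is named. The particular choices (jittered points, the "Set1" palette, faceting by species) are arbitrary illustrations.

```r
library(ggplot2)

p <- ggplot() +
  layer(
    data     = iris,                                   # Data
    mapping  = aes(x = Sepal.Length, y = Sepal.Width,
                   colour = Species),                  # Mapping
    stat     = "identity",                             # Stat (no transformation)
    geom     = "point",                                # Geom
    position = "jitter"                                # Position
  ) +
  scale_colour_brewer(palette = "Set1") +              # Scale
  coord_cartesian() +                                  # Coord
  facet_wrap(~ Species) +                              # Facet
  theme_bw()                                           # Theme
```

In everyday use, `geom_point(position = "jitter")` would replace the `layer()` call; the long form is shown only to expose the parts.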

`GPL` and `ggplot2`

(Based upon Wickham, 2010)

Building the grouped scatterplot:

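```
library(ggplot2)

# Use the built-in iris data as the plotting data frame
dat <- iris

# Map sepal length/width to position and species to colour,
# then draw one point per flower
p1 <- ggplot(data = dat,
             aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  theme_bw()

# Printing the object renders the plot
p1
```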

Is this a perfect implementation of the Grammar of Graphics?

- In the theoretical grammar, we have data, a mapping of data to graph, and the graph itself
- In `ggplot2`, we have to deal with:
  - Constructing the data into an interpretable format (a `data.frame` object)
  - The R syntax for `ggplot2`
  - The underlying `ggplot2` object
  - The generated graph, which is only constructed when `print()` or `ggsave()` is called

Is this problematic?

- No!
- `ggplot2` maintains the core beliefs of the system
  - The object created has a hierarchical structure, even if it is not immediately apparent
    - Look at `str(plot)`
  - The `+` operator allows us to make changes to the general plot object
  - Additional geom calls add layers that allow us to build up a graphic
- The "pipeline" is less restrictive (e.g., we can apply aesthetics before coordinates)
- While not as flexible as Wilkinson's theoretical framework, it is substantially more practical.
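
The points about the plot object and the `+` operator can be checked directly. This is a small sketch on the `iris` data; the object names `template` and `built` are illustrative only.

```r
library(ggplot2)

# The template holds data and aesthetic mappings but no layers yet.
template <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))
n_before <- length(template$layers)

# `+` returns a modified copy; each geom call appends one layer.
built <- template +
  geom_point(aes(colour = Species)) +
  geom_smooth(method = "lm", formula = y ~ x)
n_after <- length(built$layers)

# Nothing is drawn until the object is printed (or passed to ggsave()).
# print(built)
```

Because `+` leaves `template` untouched, the same template can be reused to build several different graphics.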

Dynamic, interactive visualizations...

to explore the data for themselves.

-Murray, 2013, p. 2

Overview first, zoom and filter, then details-on-demand.

-Shneiderman, 1996, p. 337

(the "Visual Information Seeking Mantra")

- It is now easier than ever to provide online supplements for research
- Justifiability: The reader can "see it for themselves"
- Discoverability: Allows access to views and projections of the data that were previously hard to conceptualize
- Narrative: We can curate a virtual experience to convey a meaningful story, especially to an audience that we might not have been able to reach otherwise

- **Selection** – the ability of users to dynamically subset the data
- **Slice** – live faceting of the data
- **Probing** – the generation of "ToolTips" for particular data points
- **Panning and Zooming** – allows for the traversal of complex datasets
- **Drill Down** – allows for the navigation of a categorical hierarchy
- **Modification** – provides a sandbox for users to explore

The most basic interactions allow the user to dynamically alter the parameters of a plot. This feature is already built into RStudio with the `manipulate` package.

For example, the following code allows users to dynamically alter:

- The x-axis limits
- Chart type
- Axes and axes labelling

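```
library(manipulate)  # interactive controls; only works inside RStudio

manipulate(
  # Redraw the scatterplot whenever a control changes
  plot(iris$Sepal.Length, iris$Sepal.Width,
       xlim = c(x.min, x.max), type = type, axes = axes, ann = label),
  x.min = slider(0, 10, initial = 4),   # lower x-axis limit
  x.max = slider(0, 10, initial = 8),   # upper x-axis limit
  type = picker("points" = "p", "line" = "l", "none" = "n",
                initial = "points"),    # plot type
  axes = checkbox(TRUE, "Draw Axes"),
  label = checkbox(TRUE, "Draw Axes Labels")
)
```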

`manipulate` package

While easy to use, `manipulate` unfortunately has some drawbacks:

- It is not designed for presentation, and is only available from within the GUI
- Interactive graphics cannot be distributed outside of an R code snippet
- Limited number of parameters are manipulatable
- Limited number of choices in terms of how to manipulate parameters

`clickme` Scatterplot

A more interesting example comes courtesy of Nacho Caballero's `clickme` package.

The goals of this package are to:

- Create easily sharable dynamic plots
- Allow for different types of plots via templates (presently only points are supported)
- Incorporate optional parameters to change how the visualization behaves

For example, here are the results of conducting multidimensional scaling on the `iris` dataset:
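
The interactive plot itself is not reproduced here. As a static stand-in, the scaling step can be sketched in base R. This assumes classical (metric) MDS on Euclidean distances between the four numeric measurements; the slides do not state which variant or distance was actually used.

```r
# Classical (metric) MDS on the four numeric iris measurements.
distances <- dist(iris[, 1:4])       # Euclidean distances between flowers
mds <- cmdscale(distances, k = 2)    # 150 x 2 matrix of coordinates

# Static stand-in for the interactive scatterplot, coloured by species.
plot(mds, col = as.integer(iris$Species), pch = 19,
     xlab = "Dimension 1", ylab = "Dimension 2")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```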

Interactivity can also be harnessed for pedagogical purposes. For instance, while teaching introductory statistics, we might want to visually demonstrate how skewness and kurtosis affect a distribution.

We can do this live via the `shiny` package, which provides a web application framework for R with "reactive bindings".

(This is an approximation based upon the sinh-arcsinh transformation; Jones & Pewsey, 2009)
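
The code behind the demonstration is not shown in the slides. The sketch below is one way to write the sinh-arcsinh transformation of standard normal draws, using the parametrization in which `epsilon` controls skewness and `delta` controls tail weight. The function name `sinh_arcsinh` and the chosen parameter values are illustrative assumptions.

```r
# Sinh-arcsinh transformation of standard normal draws.
# epsilon shifts skewness (0 = symmetric); delta controls tail weight
# (1 = normal tails, < 1 heavier, > 1 lighter).
sinh_arcsinh <- function(z, epsilon = 0, delta = 1) {
  sinh((asinh(z) + epsilon) / delta)
}

set.seed(42)
z <- rnorm(10000)
symmetric    <- sinh_arcsinh(z)                   # defaults leave z unchanged
right_skewed <- sinh_arcsinh(z, epsilon = 0.75)
heavy_tailed <- sinh_arcsinh(z, delta = 0.6)

# Overlay the three densities to see the effect of each parameter.
plot(density(symmetric), main = "Sinh-arcsinh transformed normals",
     xlim = c(-6, 6))
lines(density(right_skewed), lty = 2)
lines(density(heavy_tailed), lty = 3)
```

In a `shiny` application, `epsilon` and `delta` would typically be bound to slider inputs so that the density is redrawn whenever either value changes.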

- I am a big fan of `R` (if it should be done, there probably is a package)
- However, there are a wide variety of other languages to choose from when thinking about implementing interactive graphics.
- Many are based upon JavaScript and are designed to be stand-alone web applications.

`D3.js` (Data-Driven Documents)

- Based upon `Protovis` from the Stanford Visualization Group
- Popular for constructing interactive networks and maps
- Combination of HTML, JavaScript, CSS, and D3
- Output is rendered in SVG (lossless)
- Extremely versatile, but applications are typically directed

`sigma.js`: A library for interactive networks

`Polymaps`: US Unemployment 2009 Example

`rCharts`: `Highcharts.js`

`rCharts`: `NVD3.js`

- Local application written in Java, offers linked plots, some interactions, and queries.

- Small Java applet for visualizing probabilistic concepts.

- Interactive applets for business analytics.
- Latest version features R integration!

**Processing** and **Processing.js**:

- Language designed for interactive graphics.

- Public hosting of data and graphic templates, sponsored by IBM.

**R Packages**

- `Acinonyx`, aka "iPlots eXtreme" - designed for large data (development limbo?)
  - `install.packages("Acinonyx", repos = "http://rforge.net")`
- `animint` has a similar feature set to `clickme`, but targeted specifically for `ggplot2` graphics.
  - `require(devtools)` and then `install_github("animint", "tdhock")`
- `animation` (self-explanatory; no interactivity)
- `cranvas` reimplements `GGobi` (parallel coordinate plots; limited interactivity)
- `d3Network` allows for the creation of D3-based force-directed graphs in R
  - `require(devtools); install_github("d3Network", "christophergandrud")`
- `gridSVG` creates interactive `ggplot2` + `D3` objects
  - `install.packages("gridSVG", repos="http://R-Forge.R-project.org")`

- One of Hadley Wickham's latest projects is reimplementing the grammar of graphics with interactive applications on the web in mind. This project, called `ggvis`, is still in development, but already has a plethora of examples.
- Refines the grammar of graphics (and one day might replace `ggplot2`)
- In `ggplot2`, geom is kind of abstract (e.g., `geom_histogram()` combines `geom_bar()` and `stat_bin()`). In `ggvis`, pure geoms are called "marks", and combined geoms and stats are referred to as "branches".
- No `qplot()` (or overloaded +)!
- Rendered plots can be drawn on the canvas or as SVG
- Interactivity! e.g. `mark_symbol(props(size = input_slider(100, 1000)))`

```
# Installation:
library(devtools)
install_github(c("assertthat", "testthat"))
install_github(c("httpuv", "shiny", "ggvis"), "rstudio")
```

- Build model-specific but data-generic applications available for distribution
- Encourage researchers to provide online resources for readers to interact with
- In particular, plan to code specific applications to aid in outlier detection and influence diagnostics for structural equation models.

- Many JavaScript libraries are available to create interactive graphics; however, they do not follow the tenets of the grammar of graphics, and are often a challenge to adapt to new datasets.
  - `rCharts` mitigates this somewhat by allowing us to use JS libraries from within `R`.

- `ggvis` and `ggplot2` are both attempts at implementing Wilkinson's Grammar of Graphics in `R`.
  - `ggvis`, the later of the two packages, features a core restructuring that brings it closer to the ideals of the underlying grammar by removing or elaborating upon some of the ambiguities of `ggplot2`.
  - It also demonstrates how interaction can be embedded within that grammar.
  - This can basically be thought of as a loop, where parameters can be changed on the fly, producing an instantaneous rerendering of the plot.
  - Just as entire plots can be thought of as embedded geoms (see `ggsubplot`), we can think of an interactive plot as a Grammar of Graphics pipeline that is continuously rendering.

- There are many potential applications for these new technologies ready to be adapted for analysis and implementation.

**Matthew J. Sigal, MA**

Department of Psychology

262 Behavioural Science Building

York University, 4700 Keele St.

Toronto, ON, Canada M3J 1P3

(416) 736-2100 x66163

matthewsigal@gmail.com / msigal@yorku.ca

http://www.matthewsigal.com

http://www.dfconsulting.org

- Hadley Wickham's `ggplot2` and `ggvis`
- Nacho Caballero's `clickme`
  - `require(devtools); install_github("clickme", "nachocab")`
- RStudio's `manipulate` and `shiny`
- Ramnath Vaidyanathan's `rCharts`
  - `require(devtools); install_github('rCharts', 'ramnathv')`
- Slides made with RStudio via Ramnath Vaidyanathan's `slidify`
  - `require(devtools); install_github('slidify', 'ramnathv');`
  - `require(devtools); install_github('slidifyLibraries', 'ramnathv')`

- Cleveland, W. S. (1994). *The elements of graphing data*. Hobart Press.
- Few, S. (2009). *Now you see it*. Analytics Press.
- Fry, B. (2007). *Visualizing data: Exploring... data with the Processing environment*. O’Reilly.
- Jones, M. C. & Pewsey, A. (2009). Sinh-arcsinh distributions. *Biometrika*, *96*, 761–780.
- Murray, S. (2013). *Interactive data visualization for the web*. O’Reilly.
- Murrell, P. (2011). *R graphics*. Chapman and Hall/CRC, 2nd edition.
- Shneiderman, B. (1996). The eyes have it. *Proc. IEEE Visual Languages*, 336–343.
- Theus, M. & Urbanek, S. (2009). *Interactive graphics for data analysis*. Taylor & Francis.
- Wickham, H. (2009). *ggplot2: Elegant graphics for data analysis*. Springer, 2nd edition.
- Wilkinson, L. (2005). *The grammar of graphics*. Springer, 2nd edition.
- Yau, N. (2011). *Visualize this*. Wiley.

Slides available at: http://mattsigal.github.io/InteractiveGraphics/