Data Visualization with the Altair Package in Python

Altair is a declarative statistical visualization library for Python.

For more details on the library, see the Altair documentation.

To install the package use the following code:

Code
pip install altair

You only need to install a package once, but you need to import it again every time you start a new session.

Altair data management

When specifying data in Altair, we use pandas DataFrame objects.

Passing a pandas DataFrame causes Altair to embed the entire dataset, in JSON format, in the chart specification.

The syntax for Altair is analogous to that of the ggplot() function in R.

With ggplot(), one provides the following:

  • data
  • a geom function
  • a collection of mappings

The altair syntax is given as follows:

Code
(alt.Chart(<data>)
    .encode(<ENCODINGS>)
    .mark_*())

Aesthetic mappings

An encoding maps a column of your data to a visual property (an aesthetic) of the objects in your plot. These properties include:

  • size
  • shape
  • color of your points

Basic Scatter Plot

To demonstrate how to use the Altair library, we load the faithful dataset from this repository:

Code
import pandas as pd

import altair as alt

url = "https://github.com/byuidatascience/data4python4ds/raw/master/data-raw/faithful/faithful.csv"

faithful = pd.read_csv(url)

faithful.head()
   eruptions  waiting
0      3.600       79
1      1.800       54
2      3.333       74
3      2.283       62
4      4.533       85

A basic scatter plot is given in Figure 1:

Code
scatter = (alt.Chart(faithful)
              .encode(
                  x="waiting",
                  y="eruptions")
              .mark_circle())
scatter
Figure 1: A scatter plot for eruptions versus waiting time

Figure 1 shows a positive, roughly linear relationship between eruption duration and waiting time for the Old Faithful geyser.
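The direction of that relationship can also be checked numerically. The sketch below is only a rough illustration: it uses the five rows printed by faithful.head() earlier, so the coefficient describes that small preview and not the full dataset. The variable names preview and r are chosen here for illustration.

```python
import pandas as pd

# The five rows shown by faithful.head() above, retyped by hand.
preview = pd.DataFrame({
    "eruptions": [3.600, 1.800, 3.333, 2.283, 4.533],
    "waiting": [79, 54, 74, 62, 85],
})

# Pearson correlation between the two columns; a value near 1 means
# a strong positive linear association within these five rows.
r = preview["eruptions"].corr(preview["waiting"])
print(round(r, 3))
```

With the full data loaded as in the code above, the same call on faithful gives the coefficient for the whole dataset.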

We demonstrate the use of aesthetics in the Altair library using the mpg dataset.

Code
url = "https://github.com/byuidatascience/data4python4ds/raw/master/data-raw/mpg/mpg.csv"
mpg = pd.read_csv(url)

We plot hwy against displ in the mpg dataset and color the points by the class of the car (see Figure 2).

Code
scat_mpg = (alt.Chart(mpg)
               .encode(
                   x="displ",
                   y="hwy",
                   color="class")
               .mark_circle())
scat_mpg
Figure 2: Scatter plot with an additional aesthetic

One can also set the encoding properties of a mark manually, as shown in the code below:

Code
scat_mpgb = (alt.Chart(mpg)
                .encode(
                    x="displ",
                    y="hwy",
                    color=alt.value("blue"))
                .mark_circle())
scat_mpgb
Figure 3: Scatter plot with manual encoding

Another way to add variables is with facets. This works especially well for categorical variables.

Code
scat_facet = (alt.Chart(mpg)
                 .encode(
                     x="displ",
                     y="hwy")
                 .mark_circle()
                 .facet(
                     facet="class",
                     columns=3))
scat_facet
Figure 4: Scatter plot with faceting

One can also draw a scatter plot with a superimposed line of fit.

Code
base_chart = (alt.Chart(mpg)
  .encode(
    x = "displ",
    y = "hwy"
    ))

chart_line = base_chart.mark_circle() + base_chart.transform_loess("displ", "hwy").mark_line()
chart_line
Figure 5: Scatter plot with a superimposed line

Note: A mark is a geometrical object that a plot uses to represent data.

A sample line graph, with one loess line per value of drv, is given by the following code:

Code
chart_line = (alt.Chart(mpg)
                 .transform_loess("displ", "hwy", groupby=["drv"])
                 .encode(
                     x="displ",
                     y="hwy",
                     strokeDash="drv")
                 .mark_line())
chart_line

An alternative to the Altair library in Python is the plotnine package.

We briefly illustrate the plotnine library. For more details, read the plotnine documentation.

A Grammar of Graphics for Python

plotnine is an implementation of a grammar of graphics in Python that is based on ggplot2.

The grammar allows you to compose plots by explicitly mapping variables in a dataframe to the visual objects that make up the plot.

To install plotnine run the following code:

Code
!pip install plotnine 
Code
from plotnine import ggplot, geom_point, aes, stat_smooth, facet_wrap, theme_minimal
from plotnine.data import mtcars

The code for a basic plot is:

Code
(
    ggplot(mtcars, aes("wt", "mpg", color="factor(gear)"))
    + geom_point()
    + stat_smooth(method="lm")
    + facet_wrap("gear")
    + theme_minimal()
)