Legos

Author

Termeh Shafie
(adapted from original script by Mine Çetinkaya-Rundel)

Here, we work with (simulated) data from Lego sales in 2018 for a sample of customers who bought Legos in the US.

Data and Packages

We’ll use the tidyverse package for most of the data wrangling and visualisation. The data are provided as a file for you to import.

library(tidyverse)

The following variables are available in the data set:

Answer the following questions using pipelines.

What are the three most common first names of purchasers?
What are the three most common themes of Lego sets purchased?
Within the most common theme of Lego sets purchased, what is the most common subtheme?
Create a new variable called age_group and group the ages into the following categories: “18 and under”, “19 - 25”, “26 - 35”, “36 - 50”, “51 and over”. Hint: Use the case_when() function.
Which age group has purchased the highest number of Lego sets? Hint: You will need to consider the quantity of each purchase.
Which age group has spent the most money on Legos? Hint: You will need to consider the quantity of each purchase as well as the price of the Lego sets.
Which Lego theme has made the most money for Lego?
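The questions about the most common first names and themes follow the same counting pattern. The sketch below is a minimal illustration that assumes the data contain columns named `first_name` and `theme`. The variable list is not reproduced here, so these names are assumptions. A small made-up tibble, `lego_toy`, stands in for the real data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for the real Lego sales data;
# the column names first_name and theme are assumptions.
lego_toy <- tibble(
  first_name = c("Ann", "Ben", "Ann", "Cid", "Ben", "Ann", "Dee", "Cid", "Ben", "Ann"),
  theme      = c("City", "Star Wars", "City", "Friends", "City",
                 "Ninjago", "Star Wars", "Friends", "Star Wars", "City")
)

# count() tallies rows per value, sort = TRUE puts the largest counts first,
# and slice_head(n = 3) keeps the top three rows.
top_names <- lego_toy |>
  count(first_name, sort = TRUE) |>
  slice_head(n = 3)

top_themes <- lego_toy |>
  count(theme, sort = TRUE) |>
  slice_head(n = 3)

top_names
top_themes
```

`slice_head()` breaks ties by row order. If several values share third place, `slice_max(n, n = 3)` keeps all of the tied values instead.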
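The subtheme question needs two steps. First, find the single most common theme. Second, filter to that theme and count its subthemes. The sketch below assumes columns named `theme` and `subtheme` and uses made-up rows in place of the real data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; theme and subtheme are assumed column names.
lego_toy <- tibble(
  theme    = c("City", "City", "City", "Friends", "City", "Friends"),
  subtheme = c("Police", "Fire", "Police", "Heartlake", "Police", "Heartlake")
)

# Step 1: pull out the name of the most common theme as a plain string.
top_theme <- lego_toy |>
  count(theme, sort = TRUE) |>
  slice_head(n = 1) |>
  pull(theme)

# Step 2: keep only rows from that theme and count its subthemes.
top_subtheme <- lego_toy |>
  filter(theme == top_theme) |>
  count(subtheme, sort = TRUE) |>
  slice_head(n = 1)

top_subtheme
```

If the real data contain missing subthemes, `count()` reports them as a separate `NA` group, so check whether that group ends up at the top.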
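For the `age_group` variable, `case_when()` checks its conditions in order and uses the first one that is `TRUE`. Each band can therefore be written with an upper bound alone. The sketch below assumes an `age` column measured in whole years and uses made-up ages.

```r
library(dplyr)
library(tibble)

# Hypothetical ages; `age` is an assumed column name holding whole years.
customers_toy <- tibble(age = c(16, 18, 19, 25, 26, 35, 36, 50, 51, 70, NA))

customers_toy <- customers_toy |>
  mutate(age_group = case_when(
    age <= 18 ~ "18 and under",
    age <= 25 ~ "19 - 25",   # only reached when age > 18
    age <= 35 ~ "26 - 35",
    age <= 50 ~ "36 - 50",
    age > 50  ~ "51 and over"
    # no catch-all condition, so a missing age stays NA rather than
    # being put in the oldest group
  ))

customers_toy
```

The final condition is written as `age > 50` rather than a catch-all `TRUE`. A catch-all would silently label customers with a missing age as "51 and over".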
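The two age-group questions differ only in what is summed. One row can record several sets bought at once, so counting rows is not enough. The number of sets is the sum of `quantity`, and the money spent is the sum of `quantity * us_price`. This sketch assumes the column names `age_group`, `quantity` and `us_price`, and uses made-up rows.

```r
library(dplyr)
library(tibble)

# Hypothetical purchase rows; age_group, quantity and us_price are assumed
# column names (age_group as created with case_when()).
sales_toy <- tibble(
  age_group = c("19 - 25", "19 - 25", "26 - 35", "36 - 50", "36 - 50"),
  quantity  = c(1, 3, 2, 1, 1),
  us_price  = c(19.99, 9.99, 49.99, 119.99, 29.99)
)

# Sum quantities for the number of sets, and price times quantity for spending.
by_age <- sales_toy |>
  group_by(age_group) |>
  summarise(
    sets_bought  = sum(quantity),
    amount_spent = sum(quantity * us_price)
  )

by_age |> arrange(desc(sets_bought))
by_age |> arrange(desc(amount_spent))
```

In this toy data, the group that buys the most sets ("19 - 25") is not the group that spends the most ("36 - 50"). That is why the second question needs the price as well as the quantity.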
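Revenue per theme uses the same idea: compute `quantity * us_price` for each purchase, add it up within each theme, and sort. The sketch below assumes the columns `theme`, `quantity` and `us_price`, and the rows are invented for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical rows; theme, quantity and us_price are assumed column names.
sales_toy <- tibble(
  theme    = c("City", "Star Wars", "City", "Technic"),
  quantity = c(2, 1, 1, 1),
  us_price = c(39.99, 159.99, 19.99, 99.99)
)

# Revenue for a row is price times quantity; sum it per theme and sort
# so the top-earning theme comes first.
theme_revenue <- sales_toy |>
  group_by(theme) |>
  summarise(revenue = sum(quantity * us_price)) |>
  arrange(desc(revenue))

theme_revenue
```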

Which area code has spent the most money on Legos? In the US, the area code is the first 3 digits of a phone number.

Hint: The str_sub() function will be helpful here!

Come up with a question you want to answer using these data, and write it down. Then, create a data visualization that answers the question, and explain how your visualization answers the question.
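For the area-code question, one approach is to derive an `area_code` column with `str_sub()` from stringr and then reuse the revenue-per-group pattern. This sketch assumes `phone_number` is stored as text in the form "555-123-4567", so the area code is its first three characters. Other formats, such as "(555) 123-4567", would need different positions. The column names and rows are assumptions.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical rows; phone_number is assumed to be a character string
# formatted like "555-123-4567".
sales_toy <- tibble(
  phone_number = c("212-555-0101", "415-555-0199", "212-555-0142"),
  quantity     = c(1, 2, 1),
  us_price     = c(49.99, 29.99, 19.99)
)

area_spend <- sales_toy |>
  # characters 1 to 3 of the phone number are the area code
  mutate(area_code = str_sub(phone_number, start = 1, end = 3)) |>
  group_by(area_code) |>
  summarise(amount_spent = sum(quantity * us_price)) |>
  arrange(desc(amount_spent))

area_spend
```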
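As one possible open-ended question, you could ask "Do Lego sets with more pieces cost more?" The sketch below answers it with a ggplot2 scatter plot. It assumes the data contain columns named `pieces` and `us_price`. The five sets are made up.

```r
library(ggplot2)
library(tibble)

# Made-up sets; pieces and us_price are assumed column names.
sets_toy <- tibble(
  pieces   = c(60, 150, 300, 500, 1100),
  us_price = c(9.99, 19.99, 29.99, 49.99, 99.99)
)

# One point per set: piece count on the x-axis, price on the y-axis.
p <- ggplot(sets_toy, aes(x = pieces, y = us_price)) +
  geom_point() +
  labs(
    x = "Number of pieces",
    y = "Price (USD)",
    title = "Do Lego sets with more pieces cost more?"
  )

p
```

If the points rise from left to right, sets with more pieces tend to be more expensive. In the real sales data, the same set may appear in many purchase rows. Keeping one row per set first, for example with dplyr's `distinct()`, would stop popular sets from being over-represented.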