library(tidyverse)
Legos
Here, we work with (simulated) data from Lego sales in 2018 for a sample of customers who bought Legos in the US.
Data and Packages
We’ll use the tidyverse package for much of the data wrangling and visualisation. The data set is provided for you to import.
The following variables are available in the data set:
first_name: First name of customer
last_name: Last name of customer
age: Age of customer
phone_number: Phone number of customer
set_id: Set ID of lego set purchased
number: Item number of lego set purchased
theme: Theme of lego set purchased
subtheme: Sub theme of lego set purchased
year: Year of purchase
name: Name of lego set purchased
pieces: Number of pieces of legos in set purchased
us_price: Price of set purchase in US Dollars
image_url: Image URL of lego set purchased
quantity: Quantity of lego set(s) purchased
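To make the layout concrete, here is a minimal sketch of a few of these variables as a tibble. Every value is invented for illustration, the name lego_toy is hypothetical, and the phone number format (digits separated by hyphens, stored as text) is an assumption, not something the data description confirms.

```r
library(tidyverse)

# Toy rows with invented values; the real data set has more variables
# (set_id, subtheme, pieces, image_url, and so on).
lego_toy <- tibble(
  first_name   = c("Ana", "Ben", "Ana"),
  age          = c(17, 34, 52),
  phone_number = c("919-555-0100", "212-555-0101", "919-555-0102"), # assumed format
  theme        = c("City", "Star Wars", "City"),
  us_price     = c(19.99, 49.99, 9.99),
  quantity     = c(1, 2, 1)
)

# One line per column: its name, its type, and the first few values
glimpse(lego_toy)
```

Running glimpse() on the real data after importing it is a quick way to check that each column has the type you expect before starting the exercises.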
Exercises
Answer the following questions using pipelines.
What are the three most common first names of purchasers?
What are the three most common themes of Lego sets purchased?
Among the most common theme of Lego sets purchased, what is the most common subtheme?
Create a new variable called age_group and group the ages into the following categories: “18 and under”, “19 - 25”, “26 - 35”, “36 - 50”, “51 and over”. Hint: Use the case_when() function.
Which age group has purchased the highest number of Lego sets? Hint: You will need to consider quantity of purchases.
Which age group has spent the most money on Legos? Hint: You will need to consider quantity of purchases as well as price of lego sets.
Which Lego theme has made the most money for Lego?
Which area code has spent the most money on Legos? In the US the area code is the first 3 digits of a phone number. Hint: The str_sub() function will be helpful here!
Come up with a question you want to answer using these data, and write it down. Then, create a data visualization that answers the question, and explain how your visualization answers the question.
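The functions named in the hints can be tried out on a small made-up tibble before using them on the real data. The sketch below is one possible approach, not a reference solution: the object names (toy, name_counts, toy_grouped, spend_by_area) are hypothetical, all values are invented, and it assumes phone numbers are stored as text that starts with the area code.

```r
library(tidyverse)

# Toy data with invented values, standing in for the real sales data
toy <- tibble(
  first_name   = c("Ana", "Ben", "Ana", "Cy"),
  age          = c(17, 34, 52, 34),
  phone_number = c("919-555-0100", "212-555-0101",
                   "919-555-0102", "212-555-0103"), # assumed format
  us_price     = c(19.99, 49.99, 9.99, 29.99),
  quantity     = c(1, 2, 1, 3)
)

# count() tallies rows per value; sort = TRUE puts the most frequent first,
# and slice_head() keeps the top rows
name_counts <- toy %>%
  count(first_name, sort = TRUE) %>%
  slice_head(n = 3)

# case_when() checks the conditions in order and uses the first one that is TRUE
toy_grouped <- toy %>%
  mutate(
    age_group = case_when(
      age <= 18 ~ "18 and under",
      age <= 25 ~ "19 - 25",
      age <= 35 ~ "26 - 35",
      age <= 50 ~ "36 - 50",
      age >= 51 ~ "51 and over"
    )
  )

# str_sub() extracts characters by position; here, the first three.
# Money spent per row is price times quantity.
spend_by_area <- toy %>%
  mutate(area_code = str_sub(phone_number, 1, 3)) %>%
  group_by(area_code) %>%
  summarise(total_spent = sum(us_price * quantity)) %>%
  arrange(desc(total_spent))

name_counts
toy_grouped
spend_by_area
```

Because case_when() stops at the first matching condition, each upper bound only needs to be checked once, in increasing order. If the real phone numbers are formatted differently (for example with a leading parenthesis), the positions passed to str_sub() would need to change.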