Data Analyst Assignment – UK

What we’re looking for (and what we’re not looking for)
We’re really interested in the process you take to tackle this task and how well you can communicate your findings. We’re not interested in the findings themselves so please don’t worry if you’re not able to draw any concrete conclusions at the end of it. A finding that you need more data or that more investigation is required is perfectly valid. If you can explain what you’ve done and why you are or are not able to draw conclusions from your work that’s the important thing.
Data Analyst Assignment – UK

Data Analyst Assignment

Completing the task
When you’ve finished the task please upload any files that are needed to understand your final results or how you got there with the link you have been provided. We definitely want to see your working (e.g. SQL queries and data manipulation) as much as your final insights so if they’re in separate files send them all through. A zipped folder is ideal if possible.

We know everybody has different commitments and unforeseen things come up all the time so if you are unable to finish this at the expected time just let us know and we can make arrangements for you to submit at a later date.

Getting help
If you get stuck at any point whether in connecting to the database under standing what the data represents or interpreting our expectation please feel free to contact owen@dice.fm and we can answer any questions or jump on a call to troubleshoot or explain further.

The task
What we would like from you is to understand more about retention of DICE fans so we would like you to use the data provided (explained below) to answer the question “are some segments of DICE fans retained better than others?”

In answering that question, we’d like you to address these points:
● How did you define retention, and why did you make that choice?
● If you see differences in retention between groups what do you think might explain that? If you do not see any differences why do you think that is? You don’t need to prove any of these ideas with data.
● If you had more time and more data what other segments would you like to look into. What other data points might be required to do this?
You can take any approach you like to tackle the task and can use whatever technology suits you best to present your results (options include slide presentations R Mark down note books Jupyter notebooks or spreadsheets). Here are a few pointers though:
● We’re interested in your ability to work with SQL so even if it’s possible to extract all data from the database and manipulate it elsewhere we’d advise against doing so.
● Simple segmentation is enough – for example age groups cities or fans who did/did not perform a given action. There’s no need to train an ML model to segment users as you’ll only be working with limited data and we’re more interested in your ability to communicate insights than to use complex models.
● There’s no need to exhaustively test every possible segment. We’ve given you a few fields so that you can select what you think is likely to be most important. Don’t worry if the ones you investigate first don’t reveal any clear trends – we’ll be more interested in why you chose the approach you did, rather than the outcome.
● We’d love to see a few visualisations of the data to help get your points across so please do include a few charts that you think communicate the most important points effectively.

Data Analyst Assignment – UK

The data
The data available for this task all pertains to DICE fans who purchased their first ticket with us between (inclusive).

Data are stored in the analyst_task schema, which consists of the following tables:
● transactions gives data on the transaction that represents a fan’s first purchase on DICE.
Note that later transactions for the same users are not included. The columns in the table are:
○ user_id – a unique identifier string for the fan who made the purchase, and foreign key for the users table
○ event_id – a unique identifier string for the event for which tickets were purchased and foreign key for the events table
○ time_stamp – the timestamp of the purchase, in UTC
○ purchase_app_type – a string showing where the fan made the purchase, either on the DICE mobile_app or web_app
○ count_tickets – an integer giving the number of tickets purchased in the transaction
● users gives some basic information about the fans registered with DICE. The columns in the table are:
○ user_id – a unique identifier string for the fan
○ registration_date – the date on which the fan registered, in UTC
○ date_of_birth – the date of birth provided by the fan on signup, in UTC
● events gives some top-level information on the events (gigs) listed on DICE The columns in the table are:
○ event_id – a unique identifier string for the event
○ event_date – a timestamp giving the start time of the event in UTC
○ venue_city – a string showing the city in which the event is occurring
○ venue_country – a string showing the country in which the event is occurring
○ event_type – a string showing the top-level classification of the event into either “music” or “culture”
○ event_category – a string showing the second-level classification of the event
● event_genres gives a third level of classification (after event_type and event_category) above. The columns in this table are:
○ event_id – a unique identifier string for the event, and foreign key for the events table
○ genre – a string representing the genre (lowest level classification) of the event
● user_activity is a daily record of activity for each user in the users table from 1 January 2019 or the date they registered whichever is latest, to 31 December 2019. These are activities occurring on the date shown not a cumulative total. Users appear in this table every day after they registered regardless of whether any activity was recorded. The columns in this table are:
○ date – the date, in UTC
○ user_id – a unique identifier string for the fan, and foreign key for the users table
○ count_events_viewed – an integer giving the number of events (i.e. gigs) viewed on by the fan on DICE
○ count_events_saved – an integer giving the number of events (i.e. gigs) saved by the fan to their profile on DICE
○ count_tickets_purchased – an integer giving the number of tickets purchased on DICE
○ count_tickets_purchased_mobile – an integer giving the number of tickets purchased through the DICE mobile app only
○ count_tickets_purchased_web – an integer giving the number of tickets purchased through the DICE website only
○ count_mobile_sessions – an integer giving the number of sessions (i.e. any activity) recorded by the fan using the DICE mobile app. A session ends where there is 30 minutes or more without activity.
○ count_web_sessions – an integer giving the number of sessions (i.e. any activity) recorded by the fan using the DICE website. A session ends where there is 30 minutes or more without activity.
○ count_waitinglists_joined – an integer giving the number of events (gigs) for which the fan joined the waitinglist (i.e. where the event was sold out and they’ve requested tickets if anyone returns tickets previously purchased)

Data Analyst Assignment – UK

Data Analyst Assignment

Connecting to the database
The data for the task is in an Amazon Redshift database. You will be sent a username and password to access it in a separate email.

Data Analyst Assignment – UK

You can connect to the database via an SQL client of your choice (such as Postico [mac only] D Beaver or many others). The details required for connection are:
host: candidates .cmgjrptvm5nu.eu-west-1. redshift.amazonaws com
port: 5439
database: dev
user:
password:

You may also need:
Please include any SQL queries and csv files used to construct your analysis with your final submission You can also use these credentials to connect directly from R Mark down or Jupyter notebooks if you choose to use either of those and feel comfortable doing so but it is not a requirement to do so.