(1) Email-essay scenario: Your boss has tasked you with leading the data science team’s effort for this project. (Or, your team for the Netflix prize has put you in charge.) Last week, you worked on defining the project’s objectives and the questions that need to be answered. Now it’s time to make a plan for the first two weeks of work, which will focus on defining what data is needed to answer the business questions and reach the objectives, and on gathering that data. Your boss has asked you to send a proposed plan as an email, including:
- What datasets will be needed
- Why these datasets? How does the information that they contain inform the decision or answer business questions?
- Which datasets exist internally?
- If any datasets don’t already exist, specify how they will be collected.
* Use your knowledge of the cases and of how businesses work to imagine what data likely already exists internally at Salesforce and Netflix. This week’s video “Delivering High Quality Analytics at Netflix” will give you a sense of what sorts of data exist at Netflix and help you imagine what data may exist at Salesforce.
- Minimum 300 words
- Minimum 2 references (the book can count as one) with in-line citations as appropriate
- Reference list
(3) Data description exercise
For one dataset specified in your email, write up a partial data encyclopedia and dictionary. Examples of what counts as one dataset:
- History of salaries and bonuses for each employee
- All customer ratings for each video
Include (see Bartlett 12.2 for more details):
- Purpose of Dataset
- Source of dataset
- Time window (that the dataset represents)
- Cost of data (to the company)
- Collection techniques (see also Bartlett Chapter 10)
- Collection tools
- For each column in the dataset:
  - Variable Classification (see Bartlett Table 12.1, p. 247)
For any details you cannot find in the cases or through research, make up a reasonable description.
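If it helps to organize the write-up, the fields listed above can be sketched as a small data structure. The following is only a minimal illustration in Python, not a required format: the class names, the `missing_fields` helper, and every value in the sample entry are hypothetical, and the classification labels (`nominal`, `ordinal`) are generic measurement levels that may not match the categories in Bartlett Table 12.1.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ColumnEntry:
    """One row of the data dictionary: a column and its variable classification."""
    name: str
    classification: str  # hypothetical labels; use the categories from your own reference


@dataclass
class DatasetEntry:
    """One data encyclopedia entry, mirroring the fields listed in the exercise."""
    purpose: str
    source: str
    time_window: str
    cost: str
    collection_techniques: str
    collection_tools: str
    columns: List[ColumnEntry] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        missing = [
            name for name, value in asdict(self).items()
            if name != "columns" and not value
        ]
        if not self.columns:
            missing.append("columns")
        return missing


# Made-up sample entry for "All customer ratings for each video".
ratings = DatasetEntry(
    purpose="Show which videos customers rate highly",
    source="Internal ratings log (assumed to exist)",
    time_window="",  # left blank on purpose to show the completeness check
    cost="Already collected; storage cost only (assumption)",
    collection_techniques="Logged when a customer submits a rating (assumption)",
    collection_tools="In-house event logging (assumption)",
    columns=[
        ColumnEntry("customer_id", "nominal"),
        ColumnEntry("video_id", "nominal"),
        ColumnEntry("rating", "ordinal"),
    ],
)

print(ratings.missing_fields())
```

Here the check reports `time_window` as the only field still to be filled in, which is a quick way to confirm that an entry covers every item in the list before submitting.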
An example of this exercise and of last week’s work follows: