Northeastern University Data Analysis Technique Assignment

(1) Email-essay scenario: The data science team you work on (at Salesforce, Netflix, or of your own imagining) is interested in using a data analysis technique new to the team/company. You have been tasked with gaining a general overview of the technique and writing a medium-length email (~3 paragraphs) to summarize your findings. Include:

  • a description of the technique for a technical audience: an introduction to the method that serves as a jumping-off point for learning more detail and discussing the method
  • an explanation of the method’s value
  • the types of data it can be used for
  • the method’s limitations

Choosing a technique: Choose a data analysis technique you are interested in learning more about, based on your analytics skill level. For example, if you’ve never done a regression, pick regression, correlation, or something simpler. If you have more analytics experience, use this as an opportunity to learn more about a method you’re interested in or dig deeper into a method from another class. You are welcome to describe how this analysis method could be used for the data/business question from your case study.

Some ideas: regression, cross-validation, filtering (signal processing), deep learning, principal components analysis (PCA), random forests, randomization techniques

Requirements:

  • Use in-line citations where appropriate
  • Include a reference list/bibliography
  • Minimum 300 words

(4) Short Answer

Prompt:

Answer the following questions based on this week’s reading:

  1. What is the purpose of statistical diagnostics? (refer to Bartlett Chapter 8)
  2. What needs to be considered when choosing training data for machine learning algorithms? (refer to machine learning reading)
  3. What is the purpose of cross-validation? (refer to cross-validation videos)

Requirements:

  • Use in-line citations where appropriate
  • Include a reference list/bibliography
  • Minimum 300 words (total)

Required reading: https://www.technologyreview.com/s/608248/biased-algorithms-are-everywhere-and-no-one-seems-to-care/

https://www.mckinsey.com/business-functions/risk/our-insights/controlling-machine-learning-algorithms-and-their-biases

https://www.nature.com/articles/d41586-018-05707-8