STAT 250 Confidence Intervals

Instructions for completing this lab are in the zip file, and the lab requirements are also listed below.

## Skills

* Recognize and extract key information from a research scenario

* Given a study objective, determine whether significance testing is appropriate

* Identify potential types of error given a significance test

* Construct a confidence interval for a population mean

* Given a confidence level ($1-\alpha$), determine the critical value ($t^*$) needed to construct the confidence interval

* Construct and interpret a one sample interval for the mean based on the t-distribution

You may work in pairs from *within the same section*, but include both partners’ names on the documents and write a short description of how each person contributed to the lab. Labs completed with pairs from different sections will not be graded. Each person is responsible for submitting the documents to the assignment (i.e., each person in the pair submits the same documents to iLearn).

Here are some symbols that you might find useful:

$H_0$ $H_1$ $\sigma$ $\mu$ $\mu_0$ $\mu_{word}$ $\neq$

***

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, comment = "", warning = FALSE, message = FALSE)
require(rmarkdown, quietly = TRUE)
```

## Scenario 1: Revisiting our Advertising Campaign

Recall: A non-profit that delivers clean water devices world-wide generally averages 11 thousand dollars in donations daily. In hopes of expanding, the non-profit recently ran a print campaign to raise awareness and drive traffic to their website, hopefully resulting in an increase in donations. The ads ran each Sunday. You work for the company and have been asked to see if the campaign has resulted in an increase in the true mean daily donation. You randomly sample 40 days from the six months following the campaign start and determine a mean daily donation of 11.4 thousand dollars with a sample standard deviation of 1.57 thousand dollars.

1.1. In the problem above, bold the parameter of interest and italicize the research question. You may need to use a combination of `*` and `_` if they overlap — check the RMarkdown resources if you don’t recall how to use either `*` or `_`.

1.2. Calculate the 95% confidence interval ‘by hand’ for the true mean daily donation levels after the campaign started. Modify the code chunk below to enter the appropriate values for the [placeholders] (remove the entire placeholder, including brackets, and replace with correct value/name). Add comments to the code, using `#`s to describe what each line of code is doing.

```{r}
xbar<-[placeholder] # describe here
s<-[placeholder] # describe here
n<-[placeholder] # describe here
se<-[placeholder]/sqrt([placeholder]) # describe here
(tcrit<-qt(1-[placeholder]/2, [placeholder]-1)) # describe here
```
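As a quick illustration of how `qt()` supplies the critical value $t^*$, the sketch below uses a hypothetical $n-1 = 20$ degrees of freedom (not this scenario's sample size) and three common confidence levels. The standard normal critical values are printed alongside for comparison; $t^*$ comes out a bit larger, which widens the interval to account for estimating $\sigma$ with $s$.

```r
# Hypothetical degrees of freedom for illustration (n = 21); not the lab's n
deg_free    <- 20
conf_levels <- c(0.90, 0.95, 0.99)
alpha       <- 1 - conf_levels

# t* is the 1 - alpha/2 quantile of the t distribution with deg_free df
t_star <- qt(1 - alpha / 2, deg_free)
round(t_star, 3)                  # 1.725 2.086 2.845

# Standard normal critical values at the same levels, for comparison
round(qnorm(1 - alpha / 2), 3)    # 1.645 1.960 2.576
```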

Use the R objects created above to fill in the [placeholders] below to calculate the lower and upper bounds of the confidence interval.

```{r}
(lower<-[placeholder]-[placeholder]*[placeholder]) # describe here
(upper<-[placeholder]+[placeholder]*[placeholder]) # describe here
```
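Written as a formula, the two chunks above compute the one-sample $t$ interval, where $\bar{x}$, $s$, and $n$ are the sample mean, sample standard deviation, and sample size, and $t^*$ is the $1-\alpha/2$ quantile of the $t$ distribution with $n-1$ degrees of freedom:

$$
\left(\bar{x} - t^*\frac{s}{\sqrt{n}},\;\; \bar{x} + t^*\frac{s}{\sqrt{n}}\right)
$$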

1.3. Write your final confidence interval below, using in-line code.

> ([placeholder], [placeholder])

1.4. Based on the confidence interval, do you think the true mean daily donation after the campaign could be the **same** as the value it was before the campaign started, 11 thousand dollars? Why or why not? Possible words/values to insert include: \$11.4k, 1.57, 40, \$11k, 5%, 95%, true mean, sample mean, population, sample. Bold your inserted/selected words/phrases.

> Our interval [DID/DID NOT] capture the null value (before-campaign mean of [placeholder]). Assuming our sample is one of the [placeholder] that would contain the [placeholder], the mean daily donation after the ad campaign [COULD BE THE SAME / COULD NOT BE THE SAME] as the mean daily donation before the campaign started.

Recall: You decide to take a second sample to revisit and revise your question. This time, instead of selecting *any* day in the six months of the campaign, you randomly select *only* from days that the print campaign ran (i.e., a random selection of Sundays from the six months). You randomly select 10 days on which the print advertisement appeared and find a sample mean daily donation of 11.94 thousand dollars and a sample standard deviation of 1.57 thousand dollars.

1.5. Should you calculate a confidence interval for this sample with what you currently know about this scenario? Why or why not?

1.6. Assuming your sample is roughly symmetric, calculate the 95% confidence interval ‘by hand’ (using calculations in R) for this new sample.

```{r}
[insert code needed to answer 1.6 here]
```

1.7. Interpret your confidence interval from 1.6. Edit the [placeholder]s to add in the specific values/context from this question, using in-line code where possible. The placeholders may stand in for a word, phrase, value, values, or mathematical notation. Bold your inserted answers.

> Based on our [placeholder], we are [placeholder] confident that the [insert parameter of interest] is between [lower bound] and [upper bound] [units].
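The confidence level describes the procedure rather than any single interval: if we repeatedly drew random samples and built a 95% $t$ interval from each one, about 95% of those intervals would contain the true mean. The simulation sketch below illustrates this under assumed conditions (a normal population with a made-up mean of 11 and standard deviation of 1.5, samples of size 40); none of these values are estimates from the lab data.

```r
# Assumed population for illustration only (not estimated from the lab)
set.seed(250)
true_mu <- 11
sigma   <- 1.5
n_sim   <- 40
reps    <- 5000

covered <- replicate(reps, {
  x  <- rnorm(n_sim, mean = true_mu, sd = sigma)  # one new random sample
  ci <- t.test(x, conf.level = 0.95)$conf.int     # its two-sided 95% t interval
  ci[1] <= true_mu && true_mu <= ci[2]            # did this interval capture true_mu?
})

# Long-run proportion of intervals that captured true_mu (close to 0.95)
mean(covered)
```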

1.8. Based on the **confidence level**, do you think that the non-profit has significantly increased their mean daily donation rate by running the ad campaign? Why or why not?

1.9. Does this conclusion agree with your hypothesis test from Lab #4? What differences between how the confidence interval and the hypothesis test were set up could lead them to give different results?

***

## Scenario 2: Run Differential

Baseball is a game for statistics lovers. One particular statistic is the mean run differential: for each game, subtract the opponent’s score from your team’s score, then average these differences across games. In other words, it measures by how much a particular team wins or loses games, on average. You’d like to know if your team is, on average, winning games (having a mean run differential greater than zero). You recorded the run differential for your favorite team for a random sample of games throughout the season. We can read them into R as follows:

```{r}
run.diff<-c(3, 1, -1, -4, -2, 5, 2, 1, 1, -1, 3, -4, 4, 2, 1, 1, -1, 6, 10, 2, -1, -2, 3, 2, 1, -1, 3, 2, -1, -2, 1, -3, -1, 1, -4)
```
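To make the definition concrete, here is a sketch with made-up scores for five games (the objects `team_runs` and `opponent_runs` are hypothetical and not part of the lab data): each game's run differential is the team's runs minus its opponent's runs, so wins are positive and losses are negative.

```r
# Made-up scores for five games (illustration only)
team_runs     <- c(5, 2, 7, 3, 4)
opponent_runs <- c(2, 3, 1, 1, 8)

# Per-game run differential: positive = win, negative = loss
run_diff_example <- team_runs - opponent_runs
run_diff_example         # 3 -1  6  2 -4

# Mean run differential across these games
mean(run_diff_example)   # 1.2
```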

2.1. Can you perform a hypothesis test?

2.2. Calculate the appropriate descriptive statistics for `run.diff`.

<Insert code chunk here>

2.3. Conduct a hypothesis test, using the `t.test()` function. Modify the code below by replacing the [placeholder]s.

```{r}
t.test([placeholder], mu=[placeholder], alternative="[placeholder]")
```

2.4. Calculate the confidence interval, using the `t.test()` function. Modify the code below by replacing the [placeholder]s.

```{r}
t.test([placeholder], alternative="[placeholder]", conf.level=[placeholder])$conf.int
```

*Note: your confidence interval should be two actual values. If you are seeing something else, go back and review your notes.*
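The note above is a hint about the `alternative` argument. The sketch below uses a small made-up vector `ex_x` (not `run.diff`) to show that a two-sided `t.test()` interval matches the "by hand" formula from Scenario 1, while a one-sided alternative returns `Inf` as one of its bounds.

```r
# Made-up data for illustration only (not run.diff)
ex_x <- c(2, -1, 3, 0, 4, 1, -2, 5, 2, 1)

# Two-sided 95% interval from t.test()
ci_two <- t.test(ex_x, alternative = "two.sided", conf.level = 0.95)$conf.int

# The same interval "by hand": xbar +/- t* * s / sqrt(n)
ex_n     <- length(ex_x)
ex_se    <- sd(ex_x) / sqrt(ex_n)
ex_tcrit <- qt(1 - 0.05 / 2, ex_n - 1)
by_hand  <- mean(ex_x) + c(-1, 1) * ex_tcrit * ex_se

all.equal(as.numeric(ci_two), by_hand)   # TRUE: both approaches agree

# One-sided alternative: a lower bound only; the upper end is Inf
ci_one <- t.test(ex_x, alternative = "greater", conf.level = 0.95)$conf.int
ci_one
```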

2.5. What can you conclude from your hypothesis test and confidence interval about the true mean run differential for your favorite team?

***

## Scenario 3: Range Shifts

As the world warms, the geographic ranges of species might shift toward cooler areas. This could take the form of migration to higher latitudes or moving up in elevation from a species’ native range. Chen et al. (2011) studied recent changes in the highest elevation at which species occur. Typically, higher elevations are cooler than lower elevations. The researchers want to know if species have shifted upwards towards cooler elevations, i.e., whether their elevational range shift is greater than zero meters. Below are the changes in highest elevation for 31 taxa in a given location (e.g., one data point might be plants in Switzerland), in meters, over the late 1900s and early 2000s. (Many taxa were surveyed, including plants, vertebrates, and arthropods; the taxa included were selected in an unbiased way.) Positive numbers indicate upward shifts in elevation, and negative numbers indicate shifts to lower elevations. (Chen et al. 2011. Science)

58.9, 7.8, 108.6, 44.8, 11.1, 19.2, 61.9, 30.5, 12.7, 35.8, 7.4, 39.3, 24.0, 62.1, 24.3, 55.3, 32.7, 65.3, -19.3, 7.6, -5.2, -2.1, 31.0, 69.0, 88.6, 39.5, 20.7, 89.0, 69.0, 64.9, 64.8
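Unlike Scenario 2, these data are given as plain text. One way to read them into R is sketched below; the object name `elev.shift` is a hypothetical choice that mirrors the `run.diff` naming from Scenario 2, and the values are copied from the list above (meters).

```r
# Hypothetical object name; values copied from the list above (meters)
elev.shift <- c(58.9, 7.8, 108.6, 44.8, 11.1, 19.2, 61.9, 30.5, 12.7, 35.8,
                7.4, 39.3, 24.0, 62.1, 24.3, 55.3, 32.7, 65.3, -19.3, 7.6,
                -5.2, -2.1, 31.0, 69.0, 88.6, 39.5, 20.7, 89.0, 69.0, 64.9,
                64.8)

length(elev.shift)   # 31 taxa
```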

3.1. In the text above, bold the null value and italicize the research question.

3.2. Calculate the relevant summary statistics.

< Insert code chunk here; annotate your code to describe what descriptive statistic each line calculates >

Next, we want to conduct a hypothesis test.

3.3. Have the conditions been met to perform a hypothesis test?

> Insert answer, and, if needed, code chunk here

3.4. State the population of interest and the parameter of interest.

3.5. State the null and alternative hypotheses (verbal and symbolic).

3.6. Execute your hypothesis test and then fill in the appropriate values below.

<Insert code chunk here>

> $\alpha =$

> $s/\sqrt{n} =$

> $t_{statistic} =$

> $p$-value $=$

> [Reject/Fail to Reject] $H_0$
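Each quantity requested in 3.6 can be computed directly and checked against `t.test()`. The sketch below uses a made-up vector `ex_y` and null value `ex_mu0 = 0` (not the elevation data) with an upper-tailed alternative; for a two-sided alternative the $p$-value would instead be `2 * pt(-abs(t_stat), df = ex_n - 1)`.

```r
# Made-up data and null value for illustration only
ex_y   <- c(12, 5, 20, -3, 9, 15, 7, 11)
ex_mu0 <- 0

res <- t.test(ex_y, mu = ex_mu0, alternative = "greater")

# The same pieces by hand
ex_n    <- length(ex_y)
ex_se   <- sd(ex_y) / sqrt(ex_n)                          # s / sqrt(n), the standard error
t_stat  <- (mean(ex_y) - ex_mu0) / ex_se                  # t statistic
p_value <- pt(t_stat, df = ex_n - 1, lower.tail = FALSE)  # upper-tail p-value, H1: mu > mu0

# Both comparisons should be TRUE
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_value)
```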

3.7. Interpret your decision (reject or fail to reject), in the context of the question. (i.e., what was your decision and how does that relate back to the central research question?)

3.8. Calculate a 95% confidence interval for the mean elevational shift, then write your interval below, using in-line code.

<Insert code chunk>

3.10. Why would we want to calculate a confidence interval? (What can a confidence interval tell us that a hypothesis test cannot?) Explain.

3.11. Based on the confidence interval and your one-sample t-test result, do you think the mean elevational shift could be above 0 m?

3.12. The expected range shift, based on tracking temperature, for Tsaratanana Massif in Madagascar is 43.6 m. Would you expect that the taxa in Tsaratanana Massif have tracked the temperature shift? Explain your reasoning.

3.13. The mean expected range shift, based on tracking temperature perfectly, is 126.97 m. Based on your confidence interval, do you think that taxa are tracking temperature well, or lagging behind? Explain your reasoning.

3.14. For each of the following, indicate if it is a correct or incorrect interpretation of the confidence interval. Provide an explanation for each.

a. 95% of the sample mean elevational range shifts will be between the upper and lower bounds of the confidence interval.

> Correct or Incorrect?

> Explain.

b. There is a 95% chance that the mean elevational range shift is between the upper and lower bounds of the confidence interval.

> Correct or Incorrect?

> Explain.

c. 95% of the time, when we calculate a confidence interval in this way, the true mean elevational range shift will actually be between the upper and lower bounds of the confidence interval. 5% of the time, it will not.

> Correct or Incorrect?

> Explain.

3.15. All else being equal, how do the following influence a confidence interval? (Choose one answer, and bold it.)

a. Increased confidence level ($1-\alpha$): [Wider, Narrower, No Influence]

b. Smaller sample size (n): [Wider, Narrower, No Influence]

c. Larger standard deviation: [Wider, Narrower, No Influence]
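To check your reasoning for 3.15, recall that the full width of a one-sample $t$ interval is $2\,t^*\,s/\sqrt{n}$, so each input can be varied while the others are held fixed. The helper `ci_width()` below is hypothetical, and the baseline values ($s = 10$, $n = 25$, 95% confidence) are made up.

```r
# Hypothetical helper: full width of a one-sample t interval, 2 * t* * s / sqrt(n)
ci_width <- function(s, n, conf) {
  tcrit <- qt(1 - (1 - conf) / 2, df = n - 1)
  2 * tcrit * s / sqrt(n)
}

# Made-up baseline, then change one input at a time
widths <- c(baseline    = ci_width(s = 10, n = 25, conf = 0.95),
            higher_conf = ci_width(s = 10, n = 25, conf = 0.99),
            smaller_n   = ci_width(s = 10, n = 10, conf = 0.95),
            larger_sd   = ci_width(s = 20, n = 25, conf = 0.95))
round(widths, 2)
```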