#### FINAL EXAM 37151 INTRODUCTION TO STATISTICS

## Final exam, 37151 Introduction to Statistics / 35513 Statistical Methods


School of Mathematical and Physical Sciences 3 June, 2021

### Instructions

The exam contains four problems worth 50 marks in total. Together with the *online graded exercises* and the *computer labs*, a total of 100 marks can be awarded for the subject. To pass the subject, you need to score at least 50 out of the 100 marks in total.

The solutions must be uploaded to Canvas (where you found this document) before the deadline of 6:30 PM AEST on 3 June (unless you have special considerations). **Late submissions will automatically be awarded 0 marks**, so make sure you upload in time. The exam should take around two hours to solve, but it is advisable to start as early as possible during the 3:30 PM to 6:30 PM exam availability window. This will give you plenty of time to upload the exam. After you have uploaded the exam, send a copy of it to 37151_exam@uts.edu.au (the same email address also applies to 35513 students).

- Write clear handwritten worked solutions on A4 paper (not provided). Scan the solutions using Cam Scanner, Genius Scan, Adobe Scan or any other scanning program you see fit, and upload them before the deadline stated above. Optionally, you can use an electronic pencil if your laptop or tablet has that functionality. Provide a **single pdf file** named EXAM_37151_studentnumber.pdf (or EXAM_35513_studentnumber.pdf if you are a 35513 student), with this page as the front page and your name and student ID written at the top of it. **If you cannot print out the first page, it is sufficient to write your full name and student ID on the top of the first page of your solutions**.
- You are not allowed to collaborate with, or take help from, any other individual. The exam is open book (do not forget to use the formula sheet!), but you have to solve the problems yourself. Anything else is regarded as academic misconduct and may have serious consequences for you.
- Staff may ask that students undertake an oral test to ensure they have completed the work on their own and to assess their knowledge of the answers they have submitted.
- Should you face any problems, I will be available at matias.quiroz@uts.edu.au and on MS Teams during the whole exam window (including special considerations).

Good luck!

### Problem 1 (5 marks)

Suppose that $X$ and $Y$ are random variables with outcomes $x = 1, 2$ and $y = 0, 1, 2$, with the (incomplete) joint probability distribution $P(X = x, Y = y)$ given in Table 1.

|         | $y = 0$ | $y = 1$ | $y = 2$ |
|---------|---------|---------|---------|
| $x = 1$ | 1/6     | 1/16    | ?       |
| $x = 2$ | ?       | 1/8     | 1/8     |

Table 1: Incomplete joint probability distribution $P(X = x, Y = y)$ for Problem 1, missing the values $P(X = 1, Y = 2)$ and $P(X = 2, Y = 0)$.

#### (a.) (1 mark)

Suppose that $P(X = 1) = 13/48$. Complete the joint probability distribution in Table 1, i.e. compute $P(X = 2, Y = 0)$ and $P(X = 1, Y = 2)$.

#### (b.) (1 mark)

Compute the marginal distribution of $Y$, i.e. $P(Y = y)$ for $y = 0, 1, 2$.

#### (c.) (1 mark)

Compute the expected value of $X$, i.e. $\mu_X = E(X)$.

#### (d.) (1 mark)

Compute the variance of $X$, i.e. $\sigma_X^2 = V(X)$.

#### (e.) (1 mark)

Compute $P(X < 2 \mid Y > 1)$.
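The arithmetic in (a.) to (e.) can be checked with exact fractions. The sketch below is one possible way to do so in Python, assuming the layout of Table 1 (known cells at $(1,0)$, $(1,1)$, $(2,1)$ and $(2,2)$); the dictionary and variable names are illustrative only.

```python
from fractions import Fraction as F

# Known cells of Table 1, keyed by (x, y)
joint = {
    (1, 0): F(1, 6), (1, 1): F(1, 16),
    (2, 1): F(1, 8), (2, 2): F(1, 8),
}

# (a.) Row x = 1 must sum to P(X = 1) = 13/48, and the whole table must sum to 1
joint[(1, 2)] = F(13, 48) - joint[(1, 0)] - joint[(1, 1)]
joint[(2, 0)] = 1 - sum(joint.values())

# (b.) Marginal distribution of Y: sum each column over x
p_y = {y: sum(joint[(x, y)] for x in (1, 2)) for y in (0, 1, 2)}

# (c.) and (d.) Marginal of X, then E(X) and V(X) = E(X^2) - E(X)^2
p_x = {x: sum(joint[(x, y)] for y in (0, 1, 2)) for x in (1, 2)}
mean_x = sum(x * p for x, p in p_x.items())
var_x = sum(x**2 * p for x, p in p_x.items()) - mean_x**2

# (e.) Y > 1 means Y = 2, so P(X < 2 | Y > 1) = P(X = 1, Y = 2) / P(Y = 2)
cond = joint[(1, 2)] / p_y[2]

print(joint[(1, 2)], joint[(2, 0)], mean_x, var_x, cond)
```

Using `Fraction` keeps every intermediate result exact, so the answers can be compared directly with hand calculations on the formula sheet.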

### Problem 2 (20 marks)

At an airport in a Scandinavian country, health care professionals randomly test people for COVID-19. Suppose that the test kit used has a 0.95 probability of detecting COVID-19 if the person is infected, and a 0.99 probability of not detecting COVID-19 if the person is not infected. Assume that 10 percent of the people who pass through the airport are infected with COVID-19.

#### (a.) (2 marks)

Suppose that a person returns a negative test result. What is the probability that the person is not infected?

#### (b.) (2 marks)

Suppose that a person returns a positive test result. What is the probability that the person is infected?

#### (c.) (4 marks)

What is the probability that a test results in a positive test result?
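Parts (a.) to (c.) combine the law of total probability with Bayes' theorem. A minimal Python sketch, assuming the three probabilities stated in the problem are the only inputs (variable names are illustrative):

```python
# Inputs taken from the problem statement
sensitivity = 0.95   # P(positive | infected)
specificity = 0.99   # P(negative | not infected)
prevalence = 0.10    # P(infected)

# (c.) Law of total probability
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_neg = 1 - p_pos

# (a.) Bayes' theorem: P(not infected | negative)
p_healthy_given_neg = specificity * (1 - prevalence) / p_neg

# (b.) Bayes' theorem: P(infected | positive)
p_infected_given_pos = sensitivity * prevalence / p_pos

print(round(p_healthy_given_neg, 4), round(p_infected_given_pos, 4), round(p_pos, 4))
```

Computing $P(\text{positive})$ first is convenient because it is the denominator in (b.), and its complement is the denominator in (a.).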

#### (d.) (4 marks)

Health care professionals randomly select 17 people for testing. What is the probability that between 2 and 4 of them have COVID-19? What is the probability that at least 13 of them do not have COVID-19? Carefully state your assumptions.
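One common model for (d.) treats the 17 people as independent, each infected with probability 0.10, so the number infected is $X \sim \text{Binomial}(17, 0.10)$. The sketch below assumes that model and reads "between 2 and 4" as inclusive; both are assumptions you would need to state.

```python
from math import comb

n, p = 17, 0.10  # assumed: independent people, each infected with probability 0.10

def binom_pmf(k):
    """P(X = k) for X ~ Binomial(n, p), X = number infected among the n tested."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# P(2 <= X <= 4), reading "between 2 and 4" as inclusive
p_2_to_4 = sum(binom_pmf(k) for k in range(2, 5))

# "At least 13 of the 17 are not infected" is the same event as X <= 4
p_at_least_13_healthy = sum(binom_pmf(k) for k in range(0, 5))

print(round(p_2_to_4, 4), round(p_at_least_13_healthy, 4))
```

Rewriting the second question in terms of the number infected lets both answers use the same binomial probability mass function.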

#### (e.) (4 marks)

Suppose that 6 out of the 17 people tested in (d.) are men, and that 11 of them are women. Suppose that a second test is carried out, where 4 out of the 17 people are randomly sampled to undergo a second round of testing. What is the probability distribution of the number of men among the 4 persons who undergo the second round of testing? Clearly state the probability distribution and its parameters.
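Because the 4 people are drawn without replacement from a fixed group of 17 (6 men, 11 women), one natural model for the number of men is a hypergeometric distribution with $N = 17$, $K = 6$ and $n = 4$. A sketch that tabulates its probability mass function exactly (names are illustrative):

```python
from fractions import Fraction
from math import comb

N, K, n = 17, 6, 4  # people tested, men among them, people re-tested

def hypergeom_pmf(k):
    """P(k men among the n re-tested), sampling without replacement."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

pmf = {k: hypergeom_pmf(k) for k in range(0, min(K, n) + 1)}
for k, prob in pmf.items():
    print(k, prob, float(prob))
```

Checking that the probabilities sum to 1 is a quick sanity test on the parameters.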

#### (f.) (4 marks)

Ramona works in a lunch restaurant at the airport. What is the probability that she has to serve at least three people before she encounters the first COVID-19 infected person? What is the probability that the 8th person she serves is the second one who has a COVID-19 infection?
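Assuming customers are independent and each is infected with probability 0.10, the position of the first infected customer is geometric, and the position of the second infected customer is negative binomial. The sketch reads "at least three people before the first infected person" as at least three uninfected customers before the first infected one; if the count is meant to include the infected customer, use `p_first_at_or_after(3)` instead. The helper name is hypothetical.

```python
from math import comb

p = 0.10  # assumed probability that a given customer is infected

def p_first_at_or_after(k):
    """P(first infected customer is the k-th or later) = P(first k - 1 are healthy)."""
    return (1 - p) ** (k - 1)

# At least three healthy customers before the first infected one
p_three_healthy_first = p_first_at_or_after(4)

# Negative binomial: P(8th customer is the 2nd infected one)
r, k = 2, 8
p_second_at_8th = comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

print(round(p_three_healthy_first, 4), round(p_second_at_8th, 4))
```

The negative binomial term counts the ways to place the first infected customer among the first 7, then requires the 8th to be infected.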

### Problem 3 (10 marks)

The Iris flower dataset was introduced by the famous British statistician Ronald Fisher. The dataset originally contains $n = 50$ samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). For each species, the length and the width (in centimeters) of the sepals and petals are measured. We here analyse one of the flower species, Iris virginica, and we focus on a single variable $X$, which is the sepal width of Iris virginica.

For the $n = 50$ Iris virginica flowers, the following statistics are obtained: $\sum_{i=1}^{n} x_i = 148.7$ and $\sum_{i=1}^{n} (x_i - \bar{x})^2 = 5.0962$. Moreover, an exploratory analysis shows that you may assume a normal population for the sepal width of Iris virginica. No previous studies are available to you, so you may not assume that the population variance is known.

#### (a.) (1 mark)

Propose an unbiased estimator for the population mean of the sepal width of Iris virginica. Report the estimate given the data above.

#### (b.) (2 marks)

What is the sampling distribution of your proposed estimator in (a.)? Carefully state your assumptions.

#### (c.) (1 mark)

Propose an unbiased estimator for the population variance of the sepal width of Iris virginica. Report the estimate given the data above.

#### (d.) (2 marks)

What is the sampling distribution of your proposed estimator in (c.)? Carefully state your assumptions. **Hint:** We have not directly encountered the sampling distribution of the estimator in (c.). However, we have encountered the sampling distribution of a properly scaled version of the estimator many times. Report the sampling distribution of this scaled estimator instead.
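For reference, a sketch of the standard textbook results behind (b.) and (d.), assuming independent observations $X_1, \dots, X_n$ from a normal population with unknown mean $\mu$ and unknown variance $\sigma^2$:

$$
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad
T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1},
$$

$$
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.
$$

With $n = 50$, both reference distributions have $n - 1 = 49$ degrees of freedom.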

#### (e.) (2 marks)

A botanist says that she wants a two-sided 95% confidence interval for the population variance of the sepal width of Iris virginica. Provide such an interval to the botanist and interpret it. Use the closest available approximation when reading the relevant quantiles from the statistical tables in the textbook. Alternatively, you may compute exact quantiles using R.

#### (f.) (2 marks)

The botanist says that she plans to pluck a single Iris virginica tomorrow. Provide an appropriate two-sided 95% interval that provides information about the sepal width of the flower she will pluck. Use the closest available approximation when reading the relevant quantiles from the statistical tables in the textbook. Alternatively, you may compute exact quantiles using R.
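In R, `qchisq` and `qt` return the required quantiles directly. The Python sketch below instead approximates them with only the standard library, by integrating the chi-square and t densities with Simpson's rule and inverting by bisection; it is a numerical approximation, not an exact routine. It assumes that (e.) uses the chi-square interval $\left[(n-1)s^2/\chi^2_{0.975,\,n-1},\ (n-1)s^2/\chi^2_{0.025,\,n-1}\right]$ and that (f.) calls for a prediction interval $\bar{x} \pm t_{0.975,\,n-1}\, s\sqrt{1 + 1/n}$. All helper names are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def bisect_quantile(cdf, p, lo, hi, tol=1e-9):
    """Solve cdf(q) = p on [lo, hi] by bisection, for an increasing cdf."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def chi2_pdf(x, k):
    """Density of the chi-square distribution with k degrees of freedom."""
    if x <= 0:
        return 0.0
    log_pdf = (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)
    return math.exp(log_pdf)

def t_pdf(t, v):
    """Density of Student's t distribution with v degrees of freedom."""
    c = math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2)) / math.sqrt(v * math.pi)
    return c * (1 + t * t / v) ** (-(v + 1) / 2)

def chi2_quantile(p, k):
    def cdf(q):
        return simpson(lambda u: chi2_pdf(u, k), 0.0, q)
    return bisect_quantile(cdf, p, 0.0, 10.0 * k)

def t_quantile(p, v):
    """Only valid for p >= 0.5, using the symmetry of the t density about 0."""
    def cdf(q):
        return 0.5 + simpson(lambda u: t_pdf(u, v), 0.0, q)
    return bisect_quantile(cdf, p, 0.0, 50.0)

# Summary statistics from the problem statement
n, sum_x, ss = 50, 148.7, 5.0962   # ss = sum of (x_i - xbar)^2 = (n - 1) * s^2
df = n - 1
xbar = sum_x / n                   # (a.) estimate of mu
s2 = ss / df                       # (c.) unbiased estimate of sigma^2
s = math.sqrt(s2)

# (e.) 95% CI for sigma^2 from (n - 1) S^2 / sigma^2 ~ chi-square(n - 1)
chi2_hi = chi2_quantile(0.975, df)   # about 70.2
chi2_lo = chi2_quantile(0.025, df)   # about 31.6
var_ci = (ss / chi2_hi, ss / chi2_lo)

# (f.) 95% prediction interval for a single new flower
t975 = t_quantile(0.975, df)
half_width = t975 * s * math.sqrt(1 + 1 / n)
pred_int = (xbar - half_width, xbar + half_width)

print(var_ci, pred_int)
```

The extra $1$ under the square root in (f.) accounts for the variability of the single new flower, on top of the uncertainty in $\bar{x}$; this is why the prediction interval is much wider than a confidence interval for $\mu$.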

### Problem 4 (15 marks)

Two statisticians, Ramon and Fernando, share salsa dancing as another common interest. Ramon claims that it is well known in the dancing community that he is a better dancer than Fernando. To test this hypothesis, they first agree that the better dancer is the one who gets the smaller proportion of rejections (a lady can either accept or reject a dance invitation) over the course of one month. After one month, Ramon reports that out of 58 dance invitations, he got 51 accepted. Fernando reports that out of 53 dance invitations, he got 44 accepted.

#### (a.) (1 mark)

Carefully state the models, including the assumptions, that you would use to carry out a statistical analysis of the situation described above.

#### (b.) (2 marks)

Write down the null and alternative hypotheses to test Ramon’s claim.

#### (c.) (3 marks)

Given your hypotheses in (b.), explain in words the two different errors that may result from this analysis. From Ramon’s ego’s point of view, which of these errors do you think is worse to commit?

#### (d.) (3 marks)

Provide a 95% confidence interval for the proportion in the dance community that would reject a dance from Fernando. Interpret the interval.
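A minimal sketch of a normal-approximation (Wald) interval, assuming Fernando's 53 invitations are independent Bernoulli trials with a common rejection probability; the formula sheet for the course may prescribe this or a different interval.

```python
import math
from statistics import NormalDist

invites, accepted = 53, 44
rejected = invites - accepted          # 9 rejections for Fernando
p_hat = rejected / invites             # sample proportion of rejections

z = NormalDist().inv_cdf(0.975)        # about 1.96
se = math.sqrt(p_hat * (1 - p_hat) / invites)
ci = (p_hat - z * se, p_hat + z * se)
print(ci)
```

With 9 rejections and 44 acceptances, both counts are reasonably large, which is the usual rule of thumb for relying on the normal approximation.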

#### (e.) (4 marks)

Test your hypotheses in (b.) using the significance level $\alpha = 0.05$. What is your conclusion regarding who is the better dancer?
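One possible formulation of (b.) is $H_0: p_R = p_F$ against $H_1: p_R < p_F$, where $p_R$ and $p_F$ are the rejection proportions for Ramon and Fernando. Under that assumption, and assuming independent invitations, a left-tailed pooled two-sample $z$-test could be sketched as follows (variable names are illustrative):

```python
import math
from statistics import NormalDist

# Rejections = invitations - accepted
x_r, n_r = 58 - 51, 58   # Ramon
x_f, n_f = 53 - 44, 53   # Fernando
p_r, p_f = x_r / n_r, x_f / n_f

# Pooled proportion under H0: p_R = p_F
p_pool = (x_r + x_f) / (n_r + n_f)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_r + 1 / n_f))
z = (p_r - p_f) / se

# Left-tailed test, H1: p_R < p_F (Ramon is rejected less often)
p_value = NormalDist().cdf(z)
reject_h0 = p_value < 0.05
print(round(z, 3), round(p_value, 3), reject_h0)
```

An unpooled standard error gives a very similar statistic here; the pooled version is shown because it uses the variance implied by $H_0$.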

#### (f.) (2 marks)

Suppose that the power of the test is $0.78$ if the true population proportion that rejects a dance from Fernando is $0.14$, and the corresponding population proportion for Ramon is $0.10$. What is the probability of a Type II error in this case?