HI6007 Statistics for Business
HOLMES INSTITUTE
FACULTY OF HIGHER EDUCATION
HI6007 Group Assignment
Due End of Week/Lecture 10
WORTH 30%
(Maximum 5 students in the group)
“This is an applied assignment, not a research assignment. You have to show that you understand the principles and techniques taught in this course. Therefore, you are expected to show all your workings, and all problems must be completed in the format taught in class, the lecture notes or the prescribed textbook. Any problems not done in the prescribed format will not be marked, regardless of the ultimate correctness of the answer.”
Instructions:
- Your assignment must be submitted in WORD format only!
- When answering questions, wherever required, you should copy and paste the Excel output (e.g., plots, regression output, etc.) to show your working/output.
- Submit your assignment through Safe-Assign on the course website, under Assignments and due dates, Assignment Final Submission, before the due date.
- You are required to keep an electronic copy of your submitted assignment in case the original submission fails and/or you are asked to resubmit.
- Please check your Holmes email regularly before your assignment mark is reported, in case you are contacted about a problem with your submission.
Important Notice:
All submitted assignments undergo plagiarism checking; if cheating is found, all submissions involved will receive a mark of zero for this assessment item.
Please read the information below carefully and respond to all questions listed.
- Many Holmes Institute instructors believe that students need to spend at least 2 hours studying outside of class for every hour of lecture, and that the number of hours students spend preparing for the exam significantly affects their marks. In contrast, a few lecturers believe that the number of preparation hours does not materially affect students’ marks and that other factors should be considered. To study the relationship between the time each student spent preparing for the exam (in hours) and the reported mark, a sample of 100 students was selected randomly from a large statistics class. The data are stored in the file named “ASSIGNMENTDATA” on the course website. Answer the 9 questions below: (22 marks)
- What type of survey method could be used? Explain your answer. (1.5 marks)
- What sampling method could be used to select the sample? Explain your answer. (1.5 marks)
- On the basis of the given data, determine which variables we should use as the dependent and independent variables, and explain why. Also, identify the data type(s) for each variable. (2 marks)
- What kinds of issues might we face in collecting the data using this type of survey method? List and explain two cases. (1 mark)
- Using 8 classes with intervals of 20 – 30, 30 – 40, etc. for both of the variables selected in question 3, develop a distribution table including class intervals, frequency, relative frequency and cumulative relative frequency for each variable. Then, draw a frequency histogram, a relative frequency histogram and a cumulative relative frequency histogram for each variable. Also, comment on the shape of the frequency histogram for each variable and provide reason(s) for your comment. (5.5 marks)
- Draw and use an appropriate scatter plot to investigate the relationship between the two variables. Also, briefly explain which variable you placed on the X axis and which on the Y axis, and why. Finally, draw the fitted line for the plotted observations. (2.5 marks)
- Present the equation of the estimated fitted (regression) line from your answer to Question f. Then, estimate the effect on the dependent variable of a one-unit increase in the independent variable. (2.5 marks)
- Prepare a numerical summary report about the data on the two variables by including the mean, median, range, variance, standard deviation, smallest and largest values, quartiles, interquartile range and the 30th percentile for each variable. (3.5 marks)
- Compute a numerical measure of the strength and direction of the linear relationship between the two variables. Also, interpret this value. (2 marks)
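The assignment itself must be answered with Excel output, so the following is only a minimal Python sketch for cross-checking results by hand. It illustrates the distribution table, the numerical summary, the least-squares line and the correlation coefficient asked for above. The `hours` and `marks` values are made up for illustration and are NOT taken from the ASSIGNMENTDATA file; the function names, the class convention (lower bound included, upper bound excluded) and the percentile rule (linear interpolation at position p(n - 1)) are assumptions, and the prescribed textbook may define classes and percentiles differently.

```python
import math
import statistics

# Made-up illustrative values; NOT taken from the ASSIGNMENTDATA file.
hours = [22, 35, 41, 48, 53, 57, 62, 68, 74, 88]
marks = [31, 38, 47, 52, 55, 61, 66, 70, 79, 93]


def frequency_table(values, start=20, width=10, n_classes=8):
    """Rows of (lower, upper, frequency, relative freq, cumulative relative freq).

    Each class is taken as lower <= x < upper (an assumed convention).
    """
    n = len(values)
    rows, cumulative = [], 0.0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        freq = sum(1 for x in values if lower <= x < upper)
        cumulative += freq / n
        rows.append((lower, upper, freq, freq / n, cumulative))
    return rows


def percentile_inc(values, p):
    """Percentile by linear interpolation at position p * (n - 1), 0 <= p <= 1."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    below = math.floor(position)
    above = min(below + 1, len(ordered) - 1)
    return ordered[below] + (position - below) * (ordered[above] - ordered[below])


def summary(values):
    """Numerical summary using sample (n - 1) variance and standard deviation."""
    q1, q3 = percentile_inc(values, 0.25), percentile_inc(values, 0.75)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "p30": percentile_inc(values, 0.30),
    }


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def correlation(x, y):
    """Pearson correlation coefficient r."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


for row in frequency_table(hours):
    print(row)
print(summary(marks))
b0, b1 = fit_line(hours, marks)
print(f"marks = {b0:.3f} + {b1:.3f} * hours, r = {correlation(hours, marks):.3f}")
```

In this sketch the slope `b1` is the estimated change in the dependent variable for a one-unit increase in the independent variable, and the sign and size of `r` describe the direction and strength of the linear relationship.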
- To determine whether or not the height of sons is related to father’s height (x1) and mother’s height (x2), data were gathered and part of the multiple regression Excel output is shown below. Fill in the missing values (marked “?”) in the table and answer the following questions. (8 marks)
SUMMARY OUTPUT

Regression Statistics
| Multiple R | 0.5169 |
| R Square | 0.2672 |
| Adjusted R Square | 0.2635 |
| Standard Error | 8.0683 |
| Observations | 400 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 2 | 9421.58 | ? | ? | 0.0000 |
| Residual | ? | 25843.41 | ? | | |
| Total | ? | 35264.98 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
| Intercept | 93.8993 | 8.0072 | 11.7269 | 0.0000 |
| X1 | 0.4849 | 0.0412 | 11.7772 | 0.0000 |
| X2 | -0.0229 | 0.0395 | -0.5811 | 0.5615 |
- What is the standard error of estimate? What does this statistic tell you? (0.5 mark)
- What is the coefficient of determination? What does this statistic tell you? (1 mark)
- What is the coefficient of determination adjusted for degrees of freedom? What do this statistic and the one referred to in part (b) tell you about how well the model fits the data? (1 mark)
- Test the overall utility of the model. What does the test result tell you? (1.5 marks)
- Interpret each of the coefficients. (2 marks)
- Do these data allow the statistics practitioner to infer that the heights of the sons and the fathers are linearly related? (1 mark)
- Do these data allow the statistics practitioner to infer that the heights of the sons and the mothers are linearly related? (1 mark)
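The missing ANOVA entries are linked to the printed values by the usual multiple-regression identities, which can be sketched in Python as below. This is only a hedged cross-check and does not replace the working expected in the prescribed format. The numbers are copied from the output above; the variable names are illustrative, and `k = 2` assumes the two predictors X1 and X2. The last three quantities recompute values already printed in the output (Standard Error, R Square, Adjusted R Square), so they act as a consistency check on the other entries.

```python
import math

# Values copied from the regression output in the question.
n = 400                   # observations
k = 2                     # predictors: X1 and X2
ss_regression = 9421.58
ss_residual = 25843.41

# Degrees of freedom for each ANOVA row.
df_regression = k
df_residual = n - k - 1
df_total = n - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Overall F statistic for H0: all slope coefficients are zero.
f_statistic = ms_regression / ms_residual

# Consistency checks against values already printed in the output.
standard_error = math.sqrt(ms_residual)
r_square = ss_regression / (ss_regression + ss_residual)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(df_regression, df_residual, df_total)
print(round(ms_regression, 4), round(ms_residual, 4), round(f_statistic, 4))
print(round(standard_error, 4), round(r_square, 4), round(adjusted_r_square, 4))
```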
END OF THE ASSIGNMENT