Trimester 3, 2025
Marks: 45% of the Total Assessment for the Course
Due Date: 11:59pm January 23rd 2026
Submit your assignment to Canvas – Assignments - Task 2. Please follow the submission instructions in Canvas.
The assignment will be marked out of a total of 100 marks and forms 45% of the total assessment for the course. ALL assignments will be checked for plagiarism automatically by the Turnitin system provided through Canvas.
Be aware that you must do your own work. If your work resembles, in code or in text, the work of others in the class or of a third party, disciplinary action will be taken. Consequences include a possible fail grade for the course, suspension from the university, or discharge from the university. Refer to your Course Outline or the Course Web Site for a copy of the "Student Misconduct, Plagiarism and Collusion" guidelines.
Late submission will be penalised according to the policy in the course outline. Please note Saturday and Sunday are included in the count of days late.
Requests for an extension to an assignment MUST be made to the course coordinator prior to the date of submission. Requests made on the day of submission or after the submission date will only be considered in exceptional circumstances. Extensions will only be granted in accordance with the official University guidelines.
The data set contains estimates of obesity levels in people from Mexico, Peru and Colombia, aged between 14 and 61, with diverse eating habits and physical condition. Data was collected using a web platform with a survey in which anonymous users answered each question; the information was then processed to obtain 17 attributes and 2111 records.
The attributes related to eating habits are: Frequent consumption of high caloric food (FAVC), Frequency of consumption of vegetables (FCVC), Number of main meals (NCP), Consumption of food between meals (CAEC), Consumption of water daily (CH2O), and Consumption of alcohol (CALC). The attributes related to physical condition are: Calories consumption monitoring (SCC), Physical activity frequency (FAF), Time using technology devices (TUE), and Transportation used (MTRANS). Other variables obtained were: Gender, Age, Height and Weight.
NObesity values are:
• Underweight Less than 18.5
• Normal 18.5 to 24.9
• Overweight 25.0 to 29.9
• Obesity I 30.0 to 34.9
• Obesity II 35.0 to 39.9
• Obesity III Higher than 40
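These bands appear to correspond to body mass index (BMI), that is, weight in kilograms divided by the square of height in metres. The following is a minimal R sketch under that assumption. The helper name bmi_class is hypothetical, the toy values are made up for illustration, and the treatment of values that fall exactly on a boundary is a choice made here rather than something the brief specifies.

```r
# Hypothetical helper: map height and weight to the categories listed above.
# Assumes Height is in metres and Weight is in kilograms.
bmi_class <- function(height_m, weight_kg) {
  bmi <- weight_kg / height_m^2
  cut(bmi,
      breaks = c(-Inf, 18.5, 25, 30, 35, 40, Inf),
      labels = c("Underweight", "Normal", "Overweight",
                 "Obesity I", "Obesity II", "Obesity III"),
      right = FALSE)  # each interval includes its lower bound only
}

# Toy values for illustration only (not taken from the data set)
toy <- data.frame(Height = c(1.60, 1.75, 1.80),
                  Weight = c(45, 70, 100))
toy$Class <- bmi_class(toy$Height, toy$Weight)
print(toy)
```

Comparing a derived class like this with the NObesity column in the provided file is one way to check how the labels were assigned.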
The data contains numerical data and continuous data. Data is provided in CSV format.
You are undertaking a consulting task for the Health Ministers in Mexico, Peru and Colombia. You are tasked with analysing this set of data and providing insights for potential strategies. You are required to provide a detailed analysis of the data set, both describing the data and providing inferences and predictions from it. Once you have described the entire set, you need to present any advanced analysis and trends you find in the data. You need to provide a professional report on the data for your client.
Key Questions you need to answer:
Describe the data. Provide a comprehensive overview of the data and its attributes, such as how many attributes there are, what type each one is, and what each one describes. Exploratory Data Analysis requires you to fully describe any trends and factors you may see. You will use graphs, tables or plots to do this.
Describe the analysis findings: What did you find, what did you predict, and what do you think is important? You will use clustering, decision trees and regression models to do this.
You have been requested to prepare a data analysis report about your work and explain your findings to your client.
The client may have limited ICT or mathematical knowledge. Therefore, the report should be technical but include clear explanations of the findings.
To prepare the report, please include the following sections:
Introduce the problem. Include background material as appropriate: who cares about this problem, what impact does it have, and what are the dimensions and structure of the data? What is it that you intend to describe and then predict? Provide your research question/s.
Describe any methods you use to get the data ready into a format that can be analysed. Be clear on this process, especially in your R code, and describe the process in your report.
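As an illustration of the kind of preparation steps that might be documented, here is a minimal R sketch. It uses a made-up toy data frame in place of the provided CSV, which would normally be loaded with read.csv(). The object names raw and clean are hypothetical, and dropping incomplete rows is only one of several possible ways to handle missing values.

```r
# Toy stand-in for the survey data (values are made up for illustration).
# In practice the data would be loaded from the provided CSV with read.csv().
raw <- data.frame(
  Gender = c("Female", "Male", "Male", "Female"),
  Age    = c(21, 23, NA, 27),
  FAVC   = c("yes", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Count missing values in each column
missing_counts <- colSums(is.na(raw))
print(missing_counts)

# Convert text columns to factors so that models treat them as categories
clean <- raw
text_cols <- sapply(clean, is.character)
clean[text_cols] <- lapply(clean[text_cols], factor)

# Drop rows with missing values (one simple option among several)
clean <- na.omit(clean)

# Inspect the resulting structure
str(clean)
```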
One-variable analysis studies one variable (one column/attribute) at a time. It is up to you to decide which attribute/variable you use for this analysis, but the attribute you select needs to be related to the research objectives. Provide tables, graphs or plots.
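A one-variable analysis might look like the following minimal R sketch, which summarises one numeric and one categorical variable. The values and the transport labels are made up for illustration and are not taken from the data set.

```r
# Toy numeric variable (made-up ages)
age <- c(19, 21, 21, 23, 26, 31, 40)
age_summary <- summary(age)  # minimum, quartiles, mean, maximum
print(age_summary)
hist(age, main = "Distribution of Age", xlab = "Age (years)")

# Toy categorical variable (made-up transport labels)
transport <- factor(c("walk", "bus", "bus", "car", "walk", "bus"))
transport_counts <- table(transport)  # frequency of each category
print(transport_counts)
barplot(transport_counts, main = "Transport used", ylab = "Count")
```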
A two-variable analysis studies the relation between two variables. It is up to you to decide which attributes/variables you use for this analysis but the attributes you select need to be related to the research objectives.
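For two numeric variables, a scatter plot and a correlation coefficient are common starting points. For a numeric variable against a categorical one, a grouped box plot is a typical choice. The R sketch below uses made-up values, and the grouping variable active is hypothetical.

```r
# Toy values for illustration only (not taken from the data set)
toy <- data.frame(
  Height = c(1.55, 1.62, 1.70, 1.75, 1.82),
  Weight = c(50, 58, 68, 77, 90),
  active = factor(c("no", "no", "yes", "yes", "yes"))  # hypothetical grouping
)

# Numeric vs numeric: Pearson correlation and scatter plot
r <- cor(toy$Height, toy$Weight)
print(r)
plot(toy$Height, toy$Weight,
     xlab = "Height (m)", ylab = "Weight (kg)",
     main = "Weight against Height")

# Numeric vs categorical: box plot of Weight within each group
boxplot(Weight ~ active, data = toy,
        xlab = "Physically active", ylab = "Weight (kg)")
```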
Briefly explain the concept of linear regression (with references). It is up to you to decide which attributes/variables you use for this analysis but the attributes you select need to be related to the research objectives.
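Simple linear regression models a response y as a straight-line function of a predictor x, y = b0 + b1*x + error, with the intercept b0 and slope b1 estimated by least squares. A minimal R sketch, assuming Weight as the response and Height as the predictor purely for illustration, with made-up values:

```r
# Toy values for illustration only (not taken from the data set)
toy <- data.frame(
  Height = c(1.55, 1.62, 1.70, 1.75, 1.82),
  Weight = c(50, 58, 68, 77, 90)
)

# Fit Weight = b0 + b1 * Height by ordinary least squares
fit <- lm(Weight ~ Height, data = toy)
print(summary(fit))  # coefficients, standard errors, R-squared

# Predict the weight for a new, hypothetical height
new_person <- data.frame(Height = 1.68)
predicted <- predict(fit, newdata = new_person)
print(predicted)
```

In a report, the slope, its p-value and the R-squared from summary() are usually the quantities to interpret for a non-technical reader.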
Briefly explain the concept of clustering and k-means (with references). Perform a clustering analysis. It is up to you to decide which attribute(s) you use for this analysis but the attribute(s) you select need to be related to the research objectives.
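K-means partitions observations into k groups by repeatedly assigning each point to its nearest cluster centre and then recomputing the centres. A minimal R sketch on made-up values follows. The choice of k = 2 and of the variables Age and Weight is an assumption for illustration only.

```r
# Toy values for illustration only (not taken from the data set)
toy <- data.frame(
  Age    = c(18, 20, 22, 45, 48, 50),
  Weight = c(55, 60, 58, 90, 95, 92)
)

# Standardise so that both variables contribute equally to distances
scaled <- scale(toy)

# Fix the random seed so the result is reproducible
set.seed(42)
km <- kmeans(scaled, centers = 2, nstart = 10)

# Attach cluster labels and inspect cluster centres (in scaled units)
toy$cluster <- km$cluster
print(toy)
print(km$centers)
```

With real data, the number of clusters is usually chosen by comparing several values of k, for example by plotting the total within-cluster sum of squares (km$tot.withinss) against k.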
Briefly explain the concept of decision trees, provide appropriate outputs and justification for your work. It is up to you to decide which attribute(s) you use for this analysis but the attribute(s) you select need to be related to the research objectives.
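A decision tree predicts a class by applying a sequence of simple splitting rules to the predictors. The R sketch below assumes the rpart package is installed (it ships with most R distributions). The data are made up, and the control settings are loosened only because the toy set is tiny; they are not recommended values for the real data.

```r
library(rpart)  # assumption: the rpart package is available

# Toy values for illustration only (not taken from the data set)
toy <- data.frame(
  FAF   = c(0, 0, 1, 1, 2, 3, 3, 2),
  FAVC  = factor(c("yes", "yes", "yes", "yes", "no", "no", "no", "no")),
  Class = factor(c("Overweight", "Overweight", "Overweight", "Overweight",
                   "Normal", "Normal", "Normal", "Normal"))
)

# Fit a classification tree; the small minsplit/minbucket values are needed
# only because this toy data set has so few rows
tree <- rpart(Class ~ FAF + FAVC, data = toy, method = "class",
              control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))
print(tree)

# Accuracy on the training data (optimistic; use a held-out set in practice)
pred <- predict(tree, newdata = toy, type = "class")
accuracy <- mean(pred == toy$Class)
print(accuracy)
```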
Sum up your findings and provide some insight into the findings.
In this part, discuss any difficulties you had performing the analysis and how you solved those difficulties. Reflect on how the analysis process went for you, what you learnt, and what you might do differently next time. Aim to write one paragraph.
For all data analysis (Sections 2, 3 & 4), you need to provide both a separate R script file and an explanation of the code (as comments in the code).
Please submit a single R code file as part of your submission for compiling and running.
Your R code MUST run.
The marking rubric is viewable on Canvas.
Your report should be 1,200+ words. The report MUST be formatted using the following guidelines:
Please follow the conventions detailed in:
Summers, J. & Smith, B., 2014, Communication Skills Handbook, 4th Ed, Wiley, Australia.
References for the explanation of decision trees and linear regression are required. These references should follow the Harvard or APA method of referencing. Note that ALL references should be from journal articles, conference papers, technical papers or a recognized expert in the field. Use the library databases or Google Scholar to find appropriate articles. DO NOT use Wikipedia as a reference.
Assignment grades will be available on Canvas within two weeks of submission. Details of marking will also be accessible via the online rubrics on Canvas.
Where an assignment is undergoing investigation for alleged plagiarism or collusion the grade for the assignment and the assignment will be withheld until the investigation has concluded.
This assignment will take many weeks to complete and will require a good understanding of data science theories and practices for successful completion. It is imperative that students take heed of the following points in relation to doing this assignment: