DATA4200 Data Acquisition and Management Assignment Help

Assessment 1 Information

Subject Code: DATA4200
Subject Name: Data Acquisition and Management
Assessment Title: Sampling and data mining project
Assessment Type: Report
Word Count: 1600 Words (+/-10%)
Weighting: 30%
Total Marks: 30
Submission: via MyKBS and Turnitin
Due Date: By Tuesday Week 5 (Report) 23:55 AEST

 

Your Task

Read the Assessment Instructions and complete sections (a) – (e).

Consider the rubric at the end of the assignment for guidance on structure and content.

  • LO3: Create analysis-ready data sets by applying and exploring basic validation, preprocessing, filtering and cleaning techniques
  • LO4: Evaluate and apply data mining software

 

Submit your written report (in Word) and your software file (e.g. Excel, Power BI) via MyKBS by Tuesday 23:55 AEST Week 5.

 

Assessment Description

 

Business Problem: Airbnb is a U.S. company that provides an online marketplace for short-term and/or holiday accommodation. Airbnb collects large volumes of data to gain insight into its clients and associated customers, such as review scores, host acceptance rates, ‘superhosts’, popular accommodation types and the density of listings in particular locations.

 

 

Data sets: We have obtained data on Airbnb listings in Melbourne with a variety of variables. Sampled datasets, the original data and data dictionary will be available from Week 4. See sections below.

Assessment Instructions

 

Analysis and Report (30 marks)

Use Microsoft Excel, Power BI or Tableau.


Recall the sampling methods below that you have learnt about in lectures.

 

A data dictionary file and the following datasets (as .csv files), which contain sample data generated using quota, systematic, simple random and stratified sampling, will be available from Week 4 (see section c. below). You will also have to access the original population dataset cleansed_listings_dec_18.csv from the source (see sections a. and e. below).
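For revision, the four sampling methods named above can be sketched in a few lines of Python. This is only an illustration of the ideas: the assessment itself is completed in Excel, Power BI or Tableau, and the sampled .csv files are supplied for you. The toy population, the field names (`id`, `room_type`, `price`) and all function names are assumptions made up for this sketch and are not taken from the Airbnb files.

```python
import random
from collections import defaultdict

# Toy population (hypothetical): 1000 listings, 600 "Entire home/apt" and 400 "Private room".
population = [
    {
        "id": i,
        "room_type": "Entire home/apt" if i % 5 < 3 else "Private room",
        "price": 60 + (i * 37) % 240,
    }
    for i in range(1000)
]


def simple_random_sample(rows, n, seed=0):
    """Draw n rows uniformly at random, without replacement."""
    return random.Random(seed).sample(rows, n)


def systematic_sample(rows, n, seed=0):
    """Take every k-th row after a random start, with k = len(rows) // n (requires n <= len(rows))."""
    k = len(rows) // n
    start = random.Random(seed).randrange(k)
    return rows[start::k][:n]


def stratified_sample(rows, n, key, seed=0):
    """Randomly sample each stratum in proportion to its share of the population.

    Rounding means the total can differ slightly from n for awkward proportions.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for group in strata.values():
        share = round(n * len(group) / len(rows))
        sample.extend(rng.sample(group, min(share, len(group))))
    return sample


def quota_sample(rows, quotas, key):
    """Fill a fixed quota per group with the first rows encountered (non-random)."""
    counts = defaultdict(int)
    sample = []
    for row in rows:
        group = row[key]
        if counts[group] < quotas.get(group, 0):
            sample.append(row)
            counts[group] += 1
    return sample


srs = simple_random_sample(population, 100)
systematic = systematic_sample(population, 100)
stratified = stratified_sample(population, 100, "room_type")
quota = quota_sample(population, {"Entire home/apt": 10, "Private room": 10}, "room_type")
```

Note that the quota sample simply takes the first rows that fit each quota, so it depends on the order of the file. That is one reason a quota sample may represent the population less well than the random methods.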

 

Create a report and include your response to the following questions:

 

  a. Access the data file cleansed_listings_dec_18.csv by going to the link provided on MyKBS under the Assessment 1 tab. You will initially download a zip folder from the Melbourne Airbnb Open Data project on Kaggle. Extract all the files within the folder and then choose the file cleansed_listings_dec_18.csv. Browse the columns and comment on which variables appear to be the most useful in terms of insights into current listings. Document that in your report. (150 words, 2 marks)

  b. List an advantage, a possible disadvantage and the limitations of each of the sampling methods. (150 words, 2 marks)

  c. Access the sampled datasets on MyKBS. Choose a number of different variables, as in part (a), then for each of the sampled datasets create summary statistics for each of those variables. Make sure that the selected variables are the same for each of the four datasets and document them in your report. (300 words, 6 marks)

  d. Interpret and compare the results of the summary stats across all four sample datasets. What conclusions can you draw from the comparison? Document your findings in your report. (500 words, 10 marks)

  e. Repeat the above for the original dataset cleansed_listings_dec_18.csv. Explain with statistical examples which sampling method's summary stats (across all chosen variables) were nearest in value to the original dataset's summary stats.

     Explain the variations in your report and include the supporting data. Explain possible ethical issues that could occur from the use of sampled data.

     Briefly evaluate the software that you have used to produce the summaries. (500 words, 10 marks)
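One possible way to judge which sample is "nearest in value" to the original dataset is to compute the same summary statistics for each sample and for the population, then compare the relative differences. The Python sketch below illustrates that idea only. The prices are made-up numbers, the sample names are placeholders, and averaging the relative gaps of the mean, median and standard deviation is an assumed comparison measure, not one prescribed by the assessment. The same calculation can be set up in Excel, Power BI or Tableau.

```python
import statistics


def summarise(values):
    """Return basic summary statistics for one numeric variable."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def relative_gap(sample_stats, population_stats, keys=("mean", "median", "stdev")):
    """Average absolute relative difference between sample and population statistics."""
    gaps = [
        abs(sample_stats[k] - population_stats[k]) / abs(population_stats[k])
        for k in keys
    ]
    return sum(gaps) / len(gaps)


# Made-up nightly prices for illustration only (not taken from the Airbnb data).
population_prices = [80, 95, 100, 110, 120, 135, 150, 160, 200, 450]
samples = {
    "simple_random": [95, 120, 150, 450],
    "systematic": [80, 110, 150, 200],
    "quota": [80, 95, 100, 110],
}

population_stats = summarise(population_prices)
gaps = {
    name: relative_gap(summarise(values), population_stats)
    for name, values in samples.items()
}

# The sample with the smallest average gap is the closest under this measure.
closest = min(gaps, key=gaps.get)
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: average relative gap = {gap:.3f}")
print("Closest to the population:", closest)
```

A single outlier (the 450 value here) can shift the mean and standard deviation considerably, so it is worth reporting the median alongside them when explaining variations between samples.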

Important Study Information

 

Academic Integrity Policy

 

KBS values academic integrity. All students must understand the meaning and consequences of cheating, plagiarism and other academic offences under the Academic Integrity and Conduct Policy.

 

What is academic integrity and misconduct? What are the penalties for academic misconduct? What are the late penalties?

How can I appeal my grade?

 

Click here for answers to these questions: http://www.kbs.edu.au/current-students/student-policies/.

 

Word Limits for Written Assessments

 

Submissions that exceed the word limit by more than 10% will cease to be marked from the point at which that limit is exceeded.

 

 

Study Assistance

 

Students may seek study assistance from their local Academic Learning Advisor or refer to the resources on the MyKBS Academic Success Centre page. Further details can be accessed at https://elearning.kbs.edu.au/course/view.php?id=1481

 

Generative AI Traffic Lights

 

Please see the level of Generative AI that this assessment has been designed to accept:

 

Traffic Light | Amount of Generative Artificial Intelligence (AI) usage | Evidence Required | This assessment ()

Level 1

Amount of AI usage: This assessment fully integrates Generative AI, encouraging you to harness the technology's full potential in collaboration with your own expertise. It will highlight your ability to demonstrate how effectively you can work alongside AI to achieve sophisticated outcomes, blending human intellect and artificial intelligence.

Evidence required: Your collaboration with AI must be clearly referenced and documented in the appendix of your submission, including all prompts and responses used for the assessment.

Level 2

Amount of AI usage: This assessment invites you to engage with Generative AI as a means of expanding your creativity and idea generation. It will highlight your ability to complement your original thinking with the capabilities of AI. For example, through brainstorming and preliminary concept development.

Evidence required: Your collaboration with AI must be clearly referenced and documented in the appendix of your submission, including all prompts and responses used for the assessment.

Level 3

Amount of AI usage: This assessment showcases your individual knowledge and skills in the absence of Generative AI support. It will highlight your personal abilities. For example, to analyse, synthesise, and create based on your own understanding and learning.

Evidence required: Use of generative AI is prohibited and may potentially result in penalties for academic misconduct, including but not limited to a mark of zero for the assessment.

 

 

Assessment Marking Guide

Section | Criteria | NN (Fail) 0-0.5 mark | P (Pass) 50%-64% | CR (Credit) 65%-74% | DN (Distinction) 75%-84% | HD (High Distinction) 85%-100%

(a) Comments on the usefulness of at least 4 variables in relation to insights (2 marks)
  NN: No comments
  P: Comments on one selected variable
  CR: Comments on two selected variables
  DN: Comments on three selected variables
  HD: Comments on at least 4 selected variables

(b) State at least 3 advantages/disadvantages and limitations (2 marks)
  NN: Not stated
  P: One advantage/disadvantage and one limitation stated
  CR: Two advantages/disadvantages and two limitations stated
  DN: Any three advantages/disadvantages and less than 3 limitations
  HD: At least 3 advantages/disadvantages and limitations stated

(c) Summary statistics for each sample across the four selected variables (6 marks)
  NN: One sample and one selected variable
  P: Two samples and two selected variables
  CR: 2-3 samples and 3 variables
  DN: Any three advantages/disadvantages and less than 3 limitations
  HD: At least 3 advantages/disadvantages and limitations stated

(d) Comparisons made of results generated above and conclusions drawn and documented (10 marks)
  NN: No or limited comparison/conclusions drawn
  P: Results compared to 2 samples and 2 selected variables with limited conclusions
  CR: Results compared to 2 samples and 2 selected variables with limited conclusions
  DN: 3-4 samples and 3 variables used in comparison of results with meaningful conclusions
  HD: 4 samples and at least 4 variables used in comparison of results with meaningful conclusions

(e) Explained with statistical examples which sampling method summary stats across all selected variables were nearest the main dataset, and variations were explained. Explain ethical issues and evaluate the software. (10 marks)
  NN: No, or very limited, explanation of the comparative variations across 0-1 selected variables. Ethics not considered. Evaluation of software not mentioned.
  P: Comparison of summary stats across one sample and just two simple variables. Ethics considered in a very general way. Evaluation of software very general.
  CR: Comparison of summary stats across at least two samples and two unrelated variables. Ethics considered in a more relevant way, but may not be practical. Evaluation of software relevant.
  DN: Comparison of summary stats across all samples and three variables. Ethics considered in a very relevant, practical and realistic way. Evaluation of software relevant and specific to this project.
  HD: Comparison of summary stats across at least four samples and at least four variables. Diverse variable choices and originality shown. Report engaging, novel and well integrated. Ethics considered in a very relevant, novel and practical way. Evaluation of software detailed, relevant and specific to this project.

 
