
ICT583 Data Science Applications Data visualization

Murdoch University

Exercise 2: Data visualization

Instructions

  1. This exercise must be done individually by each student.
  2. Write your answers in a report format. Clearly indicate each question/sub-question number, and give your code followed by the snapshot of your plots.
  3. Submit one .R file along with the report so that we can run it to check your work. Make sure your code matches the answers in your report. For example, if the report shows three separate plots, your code should produce exactly the same three separate plots.
  4. Code should be easy to read and understand. Only include code and comments necessary for the exercise.

Questions

  1. Using the famous Galton data set from the mosaicData package:

library(mosaicData)

head(Galton)

Note: You can find out more about the data set by running the command ?Galton. Always make sure you understand the data set before answering the questions.

1.1. Create a scatterplot of each male child’s height against their mother’s height (10 points)

1.2. Separate your plot into facets by nkids (10 points)

1.3. Add regression lines to all your facets (10 points)
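One possible way to approach the three parts of Question 1 is sketched below. It assumes ggplot2 is the plotting library (the sheet only names it explicitly in a later question) and that male children are the rows where sex equals "M". The object names sons, p, p_facets and p_lines are illustrative only.

```r
library(mosaicData)  # provides the Galton data set
library(ggplot2)     # assumption: ggplot2 is used for plotting

# Keep only the male children (sex is coded "M"/"F" in Galton)
sons <- subset(Galton, sex == "M")

# 1.1: son's height (y) against mother's height (x)
p <- ggplot(sons, aes(x = mother, y = height)) +
  geom_point() +
  labs(x = "Mother's height (inches)", y = "Son's height (inches)")

# 1.2: one panel per value of nkids
p_facets <- p + facet_wrap(~ nkids)

# 1.3: a least-squares line in every panel, drawn without a confidence band
p_lines <- p_facets +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p_lines)
```

Panels with very few sons may produce warnings when the line is fitted, which is worth mentioning in the report rather than hiding.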

  2. Using the storms data set from the nasaweather package:

2.1. Create a scatterplot between wind and pressure, with color being used to distinguish the type of storm. (10 points)

2.2. You might notice there are lots of overlapping data points in the scatterplot due to the comparatively large sample size. How would you improve your visualization? (10 points)
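A minimal sketch for Question 2 follows, assuming ggplot2 and an arbitrary choice of pressure on the x-axis and wind on the y-axis (the question does not fix the axes). Transparency, smaller points, jitter and faceting are common remedies for overplotting, shown here as options rather than as the expected answer. The alpha, size and jitter values are guesses to be tuned by eye.

```r
library(nasaweather)  # provides the storms data set
library(ggplot2)      # assumption: ggplot2 is used for plotting

# Shared mapping: pressure on x, wind on y, storm type as color
base <- ggplot(storms, aes(x = pressure, y = wind, color = type)) +
  labs(x = "Pressure", y = "Wind speed", color = "Storm type")

# 2.1: the plain scatterplot
p_raw <- base + geom_point()

# 2.2, option A: semi-transparent, smaller points so dense regions look darker
p_alpha <- base + geom_point(alpha = 0.3, size = 1)

# 2.2, option B: add a little random jitter to separate identical values
p_jitter <- base + geom_jitter(alpha = 0.3, size = 1, width = 0.5, height = 1)

# 2.2, option C: one panel per storm type, so the groups no longer overlap
p_facet <- base +
  geom_point(alpha = 0.3, size = 1) +
  facet_wrap(~ type)

print(p_facet)
```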

  3. Using the whately_2015 data set from the macleish package:

Using ggplot2, create a data graphic that displays the average temperature over each 10-minute interval (temperature) as a function of time (when). Show both a connected line and a fitted line. (20 points)
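The temperature graphic could be sketched as follows. The choice of smoother is an assumption: the question only asks for a fitted line, and here geom_smooth() is left to pick its default method. The colors and the name p_temp are illustrative. Run ?whately_2015 to confirm the units before labelling the axis.

```r
library(macleish)  # provides the whately_2015 data set
library(ggplot2)

p_temp <- ggplot(whately_2015, aes(x = when, y = temperature)) +
  # Connected line through every 10-minute reading
  geom_line(color = "grey60") +
  # Fitted line; with this many rows ggplot2 defaults to a GAM smoother,
  # which needs the mgcv package to be installed
  geom_smooth(se = FALSE, color = "firebrick") +
  labs(x = "Time", y = "Average temperature")

print(p_temp)
```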

  4. Use the time_series_covid_19_confirmed.csv file from LMS to create a clear bar chart that displays the latest number of COVID-19 cases for the top 10 countries. (30 points)

Note: Details of this data set can be found at https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset#time_series_covid_19_confirmed.csv

Consider how to improve the quality and aesthetics of your visualization.

The data manipulation part has to be done in R.
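The data manipulation for the COVID-19 chart might look like the sketch below. It assumes the file keeps the Kaggle layout: a "Country/Region" column, one row per province or country, and one cumulative-count column per date with the most recent date last. The helpers top_countries() and plot_top() are hypothetical, and the toy data frame contains made-up placeholder values used only to show the mechanics.

```r
library(ggplot2)

# Hypothetical helper: sum the latest cumulative counts per country
# and keep the n countries with the largest totals.
top_countries <- function(confirmed, n = 10) {
  latest_col <- names(confirmed)[ncol(confirmed)]  # assumes newest date is last
  totals <- tapply(confirmed[[latest_col]],
                   confirmed[["Country/Region"]],
                   sum, na.rm = TRUE)
  out <- data.frame(country = names(totals),
                    cases = as.vector(totals),
                    date = latest_col,
                    stringsAsFactors = FALSE)
  out <- out[order(out$cases, decreasing = TRUE), ]
  head(out, n)
}

# Hypothetical helper: horizontal bar chart, largest country at the top
plot_top <- function(top) {
  ggplot(top, aes(x = reorder(country, cases), y = cases)) +
    geom_col(fill = "steelblue") +
    coord_flip() +
    labs(x = NULL, y = "Confirmed cases",
         title = paste("Confirmed COVID-19 cases as of", top$date[1]))
}

# Toy stand-in for the real file (placeholder values, not real counts)
toy <- data.frame(
  "Province/State" = c("A1", "A2", NA, NA),
  "Country/Region" = c("Aland", "Aland", "Borduria", "Cascadia"),
  Lat = c(0, 1, 2, 3),
  Long = c(0, 1, 2, 3),
  "1/22/20" = c(1, 0, 5, 2),
  "1/23/20" = c(3, 4, 6, 2),
  check.names = FALSE
)

toy_top <- top_countries(toy, n = 2)
print(toy_top)
print(plot_top(toy_top))
```

With the real file, the same helpers would be called on `read.csv("time_series_covid_19_confirmed.csv", check.names = FALSE)`; `check.names = FALSE` stops R from rewriting column names such as "Country/Region" and the date headers.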
