#### ICT583 Data Science Applications, Exercise 1: Data manipulation

**ICT583 Data Science Applications**

**Murdoch University**

**Exercise 1: Data manipulation**

**Instructions**

- This exercise must be done **individually** by each student.
- Write your answers in a report format. Clearly indicate each question/sub-question number, and give your code followed by a snapshot of your results.
- You will submit one .R file along with the report so that we can run it to check your work. Make sure the output of your code matches the answers in your report. For example, if your report shows three separate data frames, your code should produce the same three separate data frames.
- Code should be easy to read and understand. Only include code and comments necessary for the exercise.

*Remember:*

Each of the following tasks can be performed using a single data verb function.

1. Find the average of one of the variables. **summarise()**

2. Add a new column that is the ratio between two variables. **mutate()**

3. Sort the cases in descending order of a variable. **arrange() with desc()**

4. Create a new data table that includes only those cases that meet a criterion. **filter()**

5. From a data table with three categorical variables A, B, and C, and a quantitative variable X, produce a data frame that has the same cases but only the variables A and X. **select()**
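
As a quick refresher, the five verbs above can be sketched on a small made-up data frame. The table `toy` and its columns (`A`, `B`, `C`, `X`, `Y`) are hypothetical and only stand in for a real data set; the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy table: three categorical variables (A, B, C)
# and two quantitative variables (X, Y).
toy <- tibble(
  A = c("a1", "a2", "a1", "a2"),
  B = c("b1", "b1", "b2", "b2"),
  C = c("c1", "c2", "c1", "c2"),
  X = c(10, 20, 30, 40),
  Y = c(2, 4, 5, 8)
)

# 1. Average of one variable.
avg_x <- toy %>% summarise(mean_X = mean(X))

# 2. New column holding the ratio of two variables.
with_ratio <- toy %>% mutate(ratio = X / Y)

# 3. Cases sorted in descending order of a variable.
sorted <- toy %>% arrange(desc(X))

# 4. Only the cases that meet a criterion.
large_x <- toy %>% filter(X > 15)

# 5. Same cases, but only the variables A and X.
a_and_x <- toy %>% select(A, X)
```

Each result is stored under its own name, so the original `toy` table is left unchanged and every step can be inspected separately.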

**Questions**

- Use the **nycflights13** package and the **flights** data frame to answer the following question: Which plane (specified by the tailnum variable) traveled the most times from New York City airports in 2013? (20 points)

- Use the **nycflights13** package and the **weather** table to answer the following questions: On how many days was there precipitation in the New York area in 2013? Were there differences in the mean visibility (visib) based on the day of the week and/or month of the year? (20 points)

- Define two new variables in the **Teams** data frame from the **Lahman** package: batting average (BA) and slugging percentage (SLG). Batting average is the ratio of hits (H) to at-bats (AB), and slugging percentage is total bases divided by at-bats. To compute total bases, count 1 for a single, 2 for a double, 3 for a triple, and 4 for a home run. (20 points)

- Display the top 15 teams ranked in terms of slugging percentage in MLB history. Repeat this using teams since 1969. (20 points)

- Create a factor called election that divides the yearID into four-year blocks that correspond to U.S. presidential terms. During which term have the most home runs been hit? Hint: use the **seq()** function. (20 points)
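
Several of the questions combine the single verbs with grouping, and the last one needs a year variable cut into blocks. A minimal sketch of those patterns on made-up data follows. The `games` table, its columns, and the break years are hypothetical placeholders, and this is one possible approach rather than the expected answer.

```r
library(dplyr)

# Hypothetical table: one row per observation.
games <- tibble(
  team   = c("T1", "T2", "T1", "T2", "T1", "T3"),
  date   = as.Date(c("2013-01-07", "2013-01-08", "2013-01-14",
                     "2013-02-05", "2013-02-11", "2013-02-12")),
  year   = c(1990, 1991, 1994, 1995, 1999, 2001),
  events = c(5, 7, 6, 9, 4, 8)
)

# Count how often each value occurs, most frequent first.
team_counts <- games %>% count(team, sort = TRUE)

# Derive the day of the week from a Date, then average within groups.
# Note: weekdays() returns names in the current locale's language.
by_weekday <- games %>%
  mutate(weekday = weekdays(date)) %>%
  group_by(weekday) %>%
  summarise(mean_events = mean(events))

# Split years into four-year blocks with seq() and cut().
# right = FALSE makes each block include its first year: [1989, 1993), ...
# dig.lab = 4 keeps four-digit years readable in the factor labels.
breaks <- seq(from = 1989, to = 2005, by = 4)
by_block <- games %>%
  mutate(block = cut(year, breaks = breaks, right = FALSE, dig.lab = 4)) %>%
  group_by(block) %>%
  summarise(total_events = sum(events)) %>%
  arrange(desc(total_events))
```

With real data, the start and end of `seq()` would need to be chosen so that the blocks cover every year in the table; any year outside the breaks becomes `NA` in the resulting factor.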