Modelling with Matrices
Using data from a sporting context to rank teams
Introduction
This report studies how dominance matrices can be used to rank teams in a sporting competition and to predict, from results in the current season, which team is likely to win. The dominance model is applied to a competition in which the results are known up to a certain round; it is then used to predict the results of the remaining round and, from those, the likely final standings. The report also considers circumstances that complicate the prediction, such as teams tied on the same number of wins, and how those ties affect the calculations.
It also explains why and how the method works by accounting for the changes observed at each step. The limitations of the first-order matrix are noted and then addressed with higher-order matrices. Finally, the report discusses how the dominance matrix can enhance the analysis and so improve the accuracy of the predictions.
Outline of the Problem:
Predict the final standings of the teams in a competition played in a round-robin format.
Use the dominance model to predict the results of at least 3 games of the event.
Methodology:
- Gather and prepare the dataset
- Select the test dataset
- Build the dominance matrix from the training dataset
- Use Dominance Matrix to predict the results of the test dataset
- Compare the actual result and prediction by Dominance Matrix
- Use Supremacy Model on the training dataset
- Use Supremacy Model to predict the results of the test dataset
- Compare the actual result and prediction by Supremacy model
Investigate ways of refining the dominance model which might improve the predictions made, considering:
- different supremacy models
- adding further game outcomes to the dominance matrix
- some way of incorporating winning margins
1. The sport for which we use dominance matrices to make predictions is the Big Bash League (BBL), an Australian Twenty20 cricket league (Big Bash, 2018). The league features eight city-based franchises that compete against each other. It was earlier called the KFC Twenty20 Big Bash and was later renamed. The earlier league featured six state teams, which were later replaced by 8 city teams.
It was established in 2011 by Cricket Australia, and its first tournament took place that year. The Big Bash League is an annual league played in December and January.
2. Dataset
We use the data from the BBL tournament held in 2011 to make predictions with the dominance matrix (Kaggle, 2011).
According to past statistics, five of the eight teams have won the title at least once.
Description of dataset:
MatchDateSK – Date of Match played – (YYYYMMDD) Numeric
Team 1 – Team Name – String
Team 2 – Team Name – String
Winner – Winner of the match – String
Margin – Winning Margin – (wickets/runs) String
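A team that wins batting first wins by runs, and a team that wins batting second wins by wickets. The Team 1 and Team 2 columns do not indicate batting order: in the table below, both columns record wins by runs and wins by wickets.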
The report uses only MatchDateSK, Team 1, Team 2, and Winner from season 2011.
In the first iteration, the report uses the results of the 7 round-robin matches played by each team in season 2011 (28 matches in total), with the final round of 4 matches held back as test data. To improve the model further, more games and their results can be added.
The detailed dataset of the 28 round-robin matches:
MatchDateSK | Team 1 | Team 2 | Winner | Margin |
20111216 | Syd Sixers | Heat | Syd Sixers | 7 wickets |
20111217 | Melb Stars | Syd Thunder | Syd Thunder | 6 wickets |
20111218 | Scorchers | Hurricanes | Hurricanes | 31 runs |
20111218 | Strikers | Melb Reneg | Strikers | 67 runs |
20111220 | Heat | Melb Stars | Melb Stars | 8 runs |
20111221 | Hurricanes | Syd Sixers | Hurricanes | 42 runs |
20111222 | Melb Reneg | Scorchers | Scorchers | 8 wickets |
20111223 | Syd Thunder | Strikers | Syd Thunder | 6 wickets |
20111227 | Syd Sixers | Melb Stars | Syd Sixers | 2 runs |
20111228 | Strikers | Hurricanes | Hurricanes | 14 runs |
20111229 | Scorchers | Heat | Scorchers | 10 runs |
20111230 | Syd Thunder | Melb Reneg | Melb Reneg | 6 runs |
20120101 | Hurricanes | Syd Thunder | Hurricanes | 5 wickets |
20120102 | Melb Reneg | Syd Sixers | Melb Reneg | 8 wickets |
20120103 | Heat | Strikers | Strikers | 31 runs |
20120104 | Melb Stars | Scorchers | Scorchers | 8 runs |
20120106 | Heat | Hurricanes | Heat | 3 runs |
20120107 | Melb Stars | Melb Reneg | Melb Stars | 11 runs |
20120108 | Syd Thunder | Syd Sixers | Syd Sixers | 17 runs |
20120108 | Scorchers | Strikers | Scorchers | 42 runs |
20120109 | Hurricanes | Melb Stars | Melb Stars | 19 runs |
20120110 | Strikers | Syd Sixers | Syd Sixers | 64 runs |
20120111 | Syd Thunder | Scorchers | Scorchers | 9 wickets |
20120112 | Melb Reneg | Heat | Heat | 12 runs |
20120117 | Heat | Syd Thunder | Heat | 91 runs |
20120118 | Syd Sixers | Scorchers | Syd Sixers | 1 run |
20120118 | Hurricanes | Melb Reneg | Hurricanes | 7 wickets |
20120119 | Melb Stars | Strikers | Melb Stars | 6 wickets |
The teams are assigned the following codes:
Heat – Team 1
Hurricanes – Team 2
Melb Reneg – Team 3
Melb Stars – Team 4
Strikers – Team 5
Syd Thunder – Team 6
Syd Sixers – Team 7
Scorchers – Team 8
There are 8 teams in the league, and each team plays each of the other 7 teams once.
Hence, the total number of matches is 28, i.e. (7 matches * 8 teams) / 2 teams per match.
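The count can be checked by enumerating the pairings directly. The following is a minimal sketch in Python, assuming a single round robin in which every pair of teams meets exactly once; the variable names are illustrative.

```python
from itertools import combinations

teams = range(1, 9)  # team codes 1..8 as assigned above

# In a single round robin, every unordered pair of teams is one fixture.
fixtures = list(combinations(teams, 2))

print(len(fixtures))  # number of matches found by enumeration
print(8 * 7 // 2)     # the same count from (7 matches * 8 teams) / 2
```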
MatchDateSK | Team 1 | Team 2 | Winner | Margin |
20111216 | 7 | 1 | 7 | 7 wickets |
20111217 | 4 | 6 | 6 | 6 wickets |
20111218 | 8 | 2 | 2 | 31 runs |
20111218 | 5 | 3 | 5 | 67 runs |
20111220 | 1 | 4 | 4 | 8 runs |
20111221 | 2 | 7 | 2 | 42 runs |
20111222 | 3 | 8 | 8 | 8 wickets |
20111223 | 6 | 5 | 6 | 6 wickets |
20111227 | 7 | 4 | 7 | 2 runs |
20111228 | 5 | 2 | 2 | 14 runs |
20111229 | 8 | 1 | 8 | 10 runs |
20111230 | 6 | 3 | 3 | 6 runs |
20120101 | 2 | 6 | 2 | 5 wickets |
20120102 | 3 | 7 | 3 | 8 wickets |
20120103 | 1 | 5 | 5 | 31 runs |
20120104 | 4 | 8 | 8 | 8 runs |
20120106 | 1 | 2 | 1 | 3 runs |
20120107 | 4 | 3 | 4 | 11 runs |
20120108 | 6 | 7 | 7 | 17 runs |
20120108 | 8 | 5 | 8 | 42 runs |
20120109 | 2 | 4 | 4 | 19 runs |
20120110 | 5 | 7 | 7 | 64 runs |
20120111 | 6 | 8 | 8 | 9 wickets |
20120112 | 3 | 1 | 1 | 12 runs |
20120117 | 1 | 6 | 1 | 91 runs – Test data |
20120118 | 7 | 8 | 7 | 1 run – Test data |
20120118 | 2 | 3 | 2 | 7 wickets – Test data |
20120119 | 4 | 5 | 4 | 6 wickets – Test data |
The Top 4 teams were – 2, 4, 7, 8. Hence, these teams were the semi-finalists in the tournament.
Dominance Matrix of order 1, D:
The dominance matrix is prepared by counting the match results and entering them into the matrix.
The rows and the columns of the matrix both represent the teams: the row is the winning team and the column is the defeated team.
The total sum of each row is the number of wins for that team, which can be used to rank the teams.
Each win is represented as 1 and each defeat as 0.
For example,
i(1,2) = 1 means Team 1 defeated Team 2,
and
i(8,1) = 1 means Team 8 defeated Team 1.
The first-order dominance matrix for the results of Big Bash season 2011 is given below:
Team | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total sum |
1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 3 |
2 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 5 |
3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 |
4 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 4 |
5 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 |
6 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 2 |
7 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 5 |
8 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 5 |
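The construction described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the method actually used to prepare the report: the `results` list is transcribed from the coded match table, and the function name `dominance_matrix` is hypothetical.

```python
# (team 1, team 2, winner) for all 28 matches, using the team codes above.
results = [
    (7, 1, 7), (4, 6, 6), (8, 2, 2), (5, 3, 5),
    (1, 4, 4), (2, 7, 2), (3, 8, 8), (6, 5, 6),
    (7, 4, 7), (5, 2, 2), (8, 1, 8), (6, 3, 3),
    (2, 6, 2), (3, 7, 3), (1, 5, 5), (4, 8, 8),
    (1, 2, 1), (4, 3, 4), (6, 7, 7), (8, 5, 8),
    (2, 4, 4), (5, 7, 7), (6, 8, 8), (3, 1, 1),
    (1, 6, 1), (7, 8, 7), (2, 3, 2), (4, 5, 4),
]


def dominance_matrix(results, n_teams=8):
    """Return D where D[i][j] = 1 if team i+1 defeated team j+1, else 0."""
    D = [[0] * n_teams for _ in range(n_teams)]
    for team_a, team_b, winner in results:
        loser = team_b if winner == team_a else team_a
        D[winner - 1][loser - 1] = 1
    return D


D = dominance_matrix(results)
wins = [sum(row) for row in D]  # row totals = number of wins per team

for team, (row, total) in enumerate(zip(D, wins), start=1):
    print(team, row, total)
```

The row totals computed this way are the win counts used for the first-order ranking.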
The matrix shows three teams tied for first position with 5 wins each: Team 2, Team 7 and Team 8.
Problem Outline: How to order Team 2, Team 7 and Team 8 within the top 3.
Method: We compute the square and the cube of the original matrix D, one step at a time, to break the tie between the three teams.
Step 1: Dominance matrix of order 2, D²
We square the original matrix D, which can be written as:
D * D = D²
Team | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total sum |
1 | 0 | 0 | 1 | 1 | 2 | 2 | 2 | 1 | 9 |
2 | 3 | 0 | 2 | 3 | 3 | 3 | 1 | 1 | 16 |
3 | 1 | 0 | 0 | 2 | 2 | 1 | 0 | 1 | 7 |
4 | 1 | 1 | 3 | 0 | 1 | 3 | 2 | 1 | 12 |
5 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 | 5 |
6 | 2 | 1 | 2 | 0 | 1 | 0 | 0 | 0 | 6 |
7 | 3 | 2 | 4 | 2 | 3 | 2 | 0 | 0 | 16 |
8 | 2 | 2 | 3 | 1 | 2 | 2 | 1 | 0 | 13 |
Here, i(1,5) = 2 means that Team 1 defeated two teams (Teams 2 and 6) that each defeated Team 5.
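The interpretation of a second-order entry can be checked by squaring the matrix and listing the intermediate teams. The sketch below uses plain Python lists; `matmul` is a hypothetical helper written for this illustration, not a library call.

```python
# Full-season dominance matrix D (row = winner, column = loser, teams 1..8).
D = [
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 1, 0, 0],
]


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


D2 = matmul(D, D)

# Teams k that Team 1 defeated and that in turn defeated Team 5.
via = [k + 1 for k in range(8) if D[0][k] and D[k][4]]

print(D2[0][4], via)              # second-order entry i(1,5) and the teams behind it
print([sum(row) for row in D2])   # second-order row totals
```

Each entry of D² therefore counts two-step paths: a win over a team that itself won against the column team.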
Step 2: Dominance Matrix of order 3, D³
For the last step we compute the cube of the matrix D, which can be written as:
D² * D = D³
Team | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total sum |
1 | 6 | 1 | 4 | 5 | 6 | 4 | 1 | 2 | 29 |
2 | 8 | 6 | 10 | 5 | 8 | 7 | 2 | 1 | 47 |
3 | 5 | 3 | 6 | 2 | 4 | 2 | 0 | 0 | 22 |
4 | 4 | 1 | 4 | 6 | 7 | 8 | 4 | 3 | 37 |
5 | 1 | 0 | 1 | 3 | 4 | 3 | 2 | 2 | 16 |
6 | 1 | 2 | 4 | 0 | 1 | 5 | 3 | 1 | 17 |
7 | 5 | 5 | 10 | 2 | 6 | 9 | 6 | 2 | 45 |
8 | 4 | 3 | 7 | 3 | 6 | 8 | 5 | 3 | 39 |
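The second-order totals still leave Team 2 and Team 7 level on 16, with Team 8 on 13. The third-order totals break the tie: Team 2 (47) ranks ahead of Team 7 (45), with Team 8 (39) third.
Now, let us assume that the last round (4 matches) has not yet been played. By removing the results of those 4 matches from the matrix, we can test whether the dominance matrix predicts them.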
Step 1: Dominance matrix of order 1, D':
Here is the updated dominance matrix:
Team | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total Sum |
1 | 0 | 1 | 1 | 0 | 0 | – | 0 | 0 | 2 |
2 | 0 | 0 | – | 0 | 1 | 1 | 1 | 1 | 4 |
3 | 0 | – | 0 | 0 | 0 | 1 | 1 | 0 | 2 |
4 | 1 | 1 | 1 | 0 | – | 0 | 0 | 0 | 3 |
5 | 1 | 0 | 1 | – | 0 | 0 | 0 | 0 | 2 |
6 | – | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 2 |
7 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | – | 4 |
8 | 1 | 0 | 1 | 1 | 1 | 1 | – | 0 | 5 |
“-”: These entries represent the results of the last round (4 matches), which were removed from the matrix.
For matrix calculations, each “-” must be replaced by 0; otherwise the matrix products cannot be computed. The resulting matrix D' is given in Step 1 below.
4 & 5. Problem outline: Use the dominance matrix to rank the teams on the results so far and make predictions about the outcomes of the three games yet to be played. Also, discuss second and third order influences and their significance. Choose a supremacy model to use with your data and compare its predictions to those made for the three games yet to be played in part 4.
Step 1:
Here, “0” represents either a loss or a match not yet played.
Team | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total Sum |
1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 2 |
2 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 4 |
3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 |
4 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 3 |
5 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 |
6 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 2 |
7 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 4 |
8 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 5 |
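The held-out round can also be removed programmatically. This is a minimal sketch, assuming the full-season matrix D is stored as a list of rows; the names `D_train` and `test_round` are illustrative.

```python
# Full-season dominance matrix D (row = winner, column = loser, teams 1..8).
D = [
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 1, 0, 0],
]

# (winner, loser) of the four last-round matches held back as test data.
test_round = [(1, 6), (7, 8), (2, 3), (4, 5)]

# Copy D and zero the held-out results; an unplayed match is stored as 0.
D_train = [row[:] for row in D]
for winner, loser in test_round:
    D_train[winner - 1][loser - 1] = 0

train_wins = [sum(row) for row in D_train]
print(train_wins)  # win totals after the first 6 rounds
```

Storing an unplayed match as 0 makes it indistinguishable from a loss, which is the limitation noted in Step 1.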
The dominance matrix D' shows that the top 4 teams are:
Rank 1 – Team 8
Rank 2 – Teams 2 and 7 (tied)
Rank 4 – Team 4
The remaining four teams have 2 wins each, so to rank them we generate a second-order matrix.
The dominance matrix D' will also be used to predict the results of the last round in the later steps.
As per the match schedule given in the dataset above, the last round of 4 matches is played between the following teams:
Team 1 v/s Team 6
Team 7 v/s Team 8
Team 2 v/s Team 3
Team 4 v/s Team 5
Step 2: Dominance matrix of order 2, D'²
The first-order totals in D' (results up to the second-last round) are not enough on their own to predict every match:
- Team 1 v/s Team 6: Both teams have a total of 2, so the first-order totals do not separate them.
- Team 7 v/s Team 8: Team 7 has a total of 4 and Team 8 has 5. With a difference of just 1 win, the last match decides whether the two finish level or Team 8 stays ahead.
- Team 2 v/s Team 3: Team 2 has a total of 4 and Team 3 has 2, so even if Team 3 wins the last match, Team 2 finishes ahead of Team 3.
- Team 4 v/s Team 5: Team 4 has 3 and Team 5 has 2, the same situation as Team 7 v/s Team 8.
Hence, to predict the matches and to rank the teams after 6 rounds, we need to prepare another matrix.
Squaring the dominance matrix D', we get
Team | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total Sum |
1 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 1 | 6 |
2 | 3 | 0 | 2 | 3 | 3 | 2 | 0 | 0 | 13 |
3 | 1 | 0 | 0 | 2 | 2 | 1 | 0 | 0 | 6 |
4 | 0 | 1 | 1 | 0 | 1 | 2 | 2 | 1 | 8 |
5 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 4 |
6 | 2 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 5 |
7 | 2 | 2 | 3 | 1 | 1 | 0 | 0 | 0 | 9 |
8 | 2 | 2 | 3 | 1 | 1 | 1 | 1 | 0 | 11 |
Here, i(1,5) = 1 means that Team 1 defeated one team (Team 2) that defeated Team 5.
Ranking after Round 6 :
Prediction:
Comparing the row totals of the matrix D'², we can observe that:
Between team 1 v/s team 6 = Team 1, but only narrowly (6 v/s 5), so the prediction is uncertain
Between team 7 v/s team 8 = Team 8
Between team 2 v/s team 3 = Team 2
Between team 4 v/s team 5 = Team 4
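The comparison of second-order totals for each fixture can be sketched as follows. The rule used here (pick the team with the larger second-order row total) is an assumption made for illustration, consistent with the predictions listed above; `matmul` is a hypothetical helper.

```python
# Training matrix D' (last round removed; row = winner, column = loser).
D_train = [
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 1, 0, 0],
]


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Second-order row totals of D'.
second = [sum(row) for row in matmul(D_train, D_train)]

fixtures = [(1, 6), (7, 8), (2, 3), (4, 5)]
predictions = {}
for a, b in fixtures:
    score_a, score_b = second[a - 1], second[b - 1]
    if score_a == score_b:
        predictions[(a, b)] = None  # undecided at second order
    else:
        predictions[(a, b)] = a if score_a > score_b else b
    print(f"Team {a} ({score_a}) v/s Team {b} ({score_b}) -> {predictions[(a, b)]}")
```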
To separate Team 1 and Team 3 in the ranking (both have a second-order total of 6) and to firm up the prediction for Team 1 v/s Team 6, we calculate the third-order dominance matrix.
Step 3: Calculate D'³ to predict the result of 1 v/s 6 and the ranking:
Team | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total Sum |
1 | 4 | 0 | 2 | 5 | 5 | 3 | 0 | 0 | 19 |
2 | 6 | 6 | 9 | 2 | 2 | 2 | 2 | 0 | 29 |
3 | 4 | 3 | 5 | 1 | 1 | 0 | 0 | 0 | 14 |
4 | 4 | 0 | 2 | 5 | 6 | 5 | 2 | 1 | 25 |
5 | 1 | 0 | 0 | 2 | 3 | 3 | 2 | 1 | 12 |
6 | 0 | 2 | 2 | 0 | 1 | 3 | 3 | 1 | 12 |
7 | 2 | 3 | 4 | 0 | 2 | 5 | 5 | 2 | 23 |
8 | 3 | 3 | 4 | 2 | 4 | 6 | 5 | 2 | 29 |
From the above matrix, the predicted winner of Team 1 v/s Team 6 is Team 1 (third-order totals 19 v/s 12). The third-order totals also settle the ranking of the teams.
Ranking after Round 6:
- Rank 1 – Team 8
- Rank 2 – Team 2
- Rank 3 – Team 7
- Rank 4 – Team 4
- Rank 5 – Team 1
- Rank 6 – Team 3
- Rank 7 – Team 6
- Rank 8 – Team 5
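The ranking above is consistent with ordering the teams by first-order total and breaking ties with the second-order and then the third-order total. A minimal sketch of that tie-breaking rule follows; the helper names are illustrative and the rule itself is inferred from the steps above.

```python
# Training matrix D' (last round removed; row = winner, column = loser).
D_train = [
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 1, 0, 0],
]


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def row_sums(M):
    return [sum(row) for row in M]


D2 = matmul(D_train, D_train)
D3 = matmul(D2, D_train)

# One (first, second, third) key per team; tuples compare left to right,
# so higher orders are consulted only when the lower orders are tied.
keys = list(zip(row_sums(D_train), row_sums(D2), row_sums(D3)))
ranking = sorted(range(1, 9), key=lambda team: keys[team - 1], reverse=True)

print(ranking)  # [8, 2, 7, 4, 1, 3, 6, 5]
```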
Hence, the final prediction of winners according to the dominance matrix D' is as follows:
Team 1 v/s Team 6 – Team 1
Team 2 v/s Team 3 – Team 2
Team 4 v/s Team 5 – Team 4
Team 7 v/s Team 8 – Team 8
Actual winners as per matrix D were:
Team 1 v/s Team 6 – Team 1
Team 2 v/s Team 3 – Team 2
Team 4 v/s Team 5 – Team 4
Team 7 v/s Team 8 – Team 7
So the dominance matrix in this case predicted 3 of the 4 results correctly, an accuracy of 75 %.
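The accuracy figure is the share of fixtures whose predicted winner matches the actual winner. As a small worked check in Python (the dictionary layout is an illustrative choice):

```python
# Predicted and actual winners of the four last-round fixtures.
predicted = {(1, 6): 1, (2, 3): 2, (4, 5): 4, (7, 8): 8}
actual = {(1, 6): 1, (2, 3): 2, (4, 5): 4, (7, 8): 7}

correct = sum(predicted[fixture] == actual[fixture] for fixture in predicted)
accuracy = correct / len(predicted)

print(f"{correct}/{len(predicted)} = {accuracy:.0%}")  # 3/4 = 75%
```

With only four test matches, a single result changes the figure by 25 percentage points, so it should be read as indicative only.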
6. Problem Outline: Use the supremacy model to make a prediction of the final ladder placings of all the teams in your sample. Compare your result with the actual ranking at the end of the season and discuss the result.
Based on the predicted results of the final round, the predicted final standings of the teams are:
- Team 2
- Team 8
- Team 7
- Team 4
- Team 1
- Team 3
- Team 6
- Team 5
Actual standings after all the rounds were:
- Team 2
- Team 7
- Team 8
- Team 4
- Team 1
- Team 3
- Team 6
- Team 5
Accuracy of Dominance matrix – 87.5 %
7. Problem Outline: Investigate ways of refining your dominance model which might improve the predictions made, considering:
- different supremacy models
- adding further game outcomes to the dominance matrix
- some way of incorporating winning margins
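One common way to vary a supremacy model is to combine the first-, second- and third-order totals into a single weighted score. The sketch below is illustrative only: the weights 1, 1/2 and 1/3 are an assumption and are not taken from this report, and `matmul` is a hypothetical helper. Exact fractions are used so that ties are not hidden by rounding.

```python
from fractions import Fraction

# Training matrix D' (last round removed; row = winner, column = loser).
D_train = [
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 1, 0, 0],
]


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Assumed weights for orders 1, 2 and 3 (hypothetical, chosen for illustration).
weights = [Fraction(1), Fraction(1, 2), Fraction(1, 3)]

scores = [Fraction(0)] * 8
power = D_train
for weight in weights:
    totals = [sum(row) for row in power]
    scores = [score + weight * total for score, total in zip(scores, totals)]
    power = matmul(power, D_train)  # move on to the next order

for team, score in sorted(enumerate(scores, start=1),
                          key=lambda pair: pair[1], reverse=True):
    print(team, float(score))
```

With these particular weights, Teams 2 and 8 come out exactly level on the training data, which shows that the choice of weights changes how ties are resolved and should be stated alongside any ranking.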
Ways to refine the model:
- Start with the dominance matrix of the first 4 rounds
- Update the dominance matrix with the results of each subsequent round
- Each update refines the dominance matrix and the predictions made from it
Ways to incorporate winning margin:
- In cricket, wins are recorded in terms of either wickets or runs.
- The unit depends on which team bats first: a team that wins batting first wins by runs, and a team that wins chasing wins by wickets, so the two kinds of margin must be put on a common scale before they can be used.
- Incorporating the winning margin in football, hockey or table tennis is easier, as wins there are recorded in only one manner.
- A point-difference percentage can be fed into the dominance matrix to incorporate the winning margin.
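One possible way to feed margins into the matrix is to replace each 1 with a weight that grows with the size of the win. The sketch below is hypothetical throughout: the scaling constants `RUN_SCALE` and `WICKET_SCALE` and the function `margin_weight` are assumptions made for illustration, not part of this report's model.

```python
RUN_SCALE = 100.0    # assumed: a win by 100 runs or more counts as a maximal margin
WICKET_SCALE = 10.0  # a side has 10 wickets, so 10 wickets in hand is maximal


def margin_weight(margin):
    """Map a margin string such as '31 runs' or '7 wickets' to a value in [0, 1]."""
    amount, unit = margin.split()
    scale = WICKET_SCALE if unit.startswith("wicket") else RUN_SCALE
    return min(int(amount) / scale, 1.0)


# (winner, loser, margin) for the first four matches in the coded table.
sample = [
    (7, 1, "7 wickets"),
    (6, 4, "6 wickets"),
    (2, 8, "31 runs"),
    (5, 3, "67 runs"),
]

# Weighted dominance matrix: a win scores 1 plus its normalised margin.
W = [[0.0] * 8 for _ in range(8)]
for winner, loser, margin in sample:
    W[winner - 1][loser - 1] = 1.0 + margin_weight(margin)

print(W[6][0], W[1][7])  # weighted entries for 7 beat 1 and 2 beat 8
```

Putting runs and wickets on a common 0 to 1 scale is the key design choice; a different scaling would change the weights and possibly the resulting ranking.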
8. Problem Outline: Using the results from above, summarise the findings. Comment on how accurately your models relate to the real situation. Discuss any limitations of the models, and the reasonableness of the solutions found.
Using the dominance matrix and its second- and third-order matrices, we ranked the teams after round 6, predicted the results of round 7, and predicted the ranking of the teams after round 7.
We also generated the actual ranking for the tournament after round 7.
Our model predicted the results of round 7 with 75 % accuracy and the ranks of the teams with 87.5 % accuracy.
Tuning this model with additional features is an operational overhead, as incorporating each team's winning margin, venue, weather and player details is a complex task.
Conclusion
We can conclude that the dominance matrix is a reliable technique to make predictions based on current season performances about which team might win a competition.