Logistic Regression Analysis: Predicting Match Outcomes in Professional Tennis
Abstract
In this SIP project, a logistic regression model is built in RStudio to predict the outcomes of professional tennis matches based solely on one player's performance. Statistics were collected for every Grand Slam tennis match played in 2018. The four Grand Slams are the Australian Open, the French Open, Wimbledon, and the US Open. The datasets contained serve and return statistics, information about the players and the tournaments, and much more. The data were manipulated into a single dataset suitable for fitting the logistic regression model. All four predictor variables in the model are serve or break point statistics, which gives some evidence that these statistics are good indicators of who is likely to win a match. Given a player's match statistics, the model correctly predicts 85% of match outcomes, using a cutoff of 0.5 on the predicted probability to separate a win from a loss.
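The decision rule described in the abstract can be sketched in a few lines. The project itself was carried out in R; the Python below is only an illustration of how a fitted logistic regression turns one player's match statistics into a win probability and then applies the 0.5 cutoff. The predictor names, the coefficient values, and the choice to count a probability of exactly 0.5 as a win are all assumptions made for the example, not values taken from the fitted model.

```python
import math


def win_probability(stats, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    linear = intercept + sum(
        coefficients[name] * stats[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


def predict_outcome(probability, cutoff=0.5):
    """Classify a match as a win or a loss from the predicted probability.

    Assumption: a probability exactly equal to the cutoff counts as a win.
    """
    return "win" if probability >= cutoff else "loss"


def accuracy(predicted, actual):
    """Share of matches whose predicted outcome equals the actual outcome."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# HYPOTHETICAL predictor names and coefficients, for illustration only.
# They are placeholders, not the estimates from the model in this project.
EXAMPLE_INTERCEPT = -6.0
EXAMPLE_COEFFICIENTS = {
    "first_serve_points_won_pct": 0.05,
    "second_serve_points_won_pct": 0.04,
    "break_points_saved_pct": 0.01,
    "break_points_converted_pct": 0.02,
}

# HYPOTHETICAL statistics for one player in one match (percentages).
example_stats = {
    "first_serve_points_won_pct": 75.0,
    "second_serve_points_won_pct": 55.0,
    "break_points_saved_pct": 60.0,
    "break_points_converted_pct": 40.0,
}

p = win_probability(example_stats, EXAMPLE_COEFFICIENTS, EXAMPLE_INTERCEPT)
print(predict_outcome(p))
```

Applying `predict_outcome` to every match in the dataset and passing the results to `accuracy` together with the actual outcomes gives the kind of overall figure reported above.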