Automated Inference regarding Goals in Elite Football Data


Jordet, G., Aksum, K. M., Pedersen, D. N., Walvekar, A., Trivedi, A., McCall, A., ... & Priestley, D. (2020). Scanning, contextual factors, and association with performance in English Premier League footballers: An investigation across a season. Frontiers in Psychology, 11, 553813.

Decroos, T., & Davis, J. (2019, September). Player vectors: Characterizing soccer players’ playing style from match event streams. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 569-584). Springer, Cham.

Supervisor Andreas, Summrina, Kunru, Martin
Level Master
Status Open

Goal: To automatically infer/detect goal-related patterns in elite football data

Motivation: Football (soccer) is the most popular sport in the world, with approximately 4 billion fans and an estimated market size of $1883.46 million in 2019, and is accordingly also the most studied sport in the AI literature. Work is ongoing worldwide to detect events in association football games and to use the resulting insights to improve performance, but more automation is needed (much remains hand-coded), and it is still uncertain exactly what can be done with the data.

Challenge: Several challenges exist. Recently it has become possible to use computer vision to obtain data not just for one's own team but also for opposing teams, yet it is not clear how best to use these data and tie them to performance. A single game can also produce a very large amount of data on player and ball positions. Various companies provide reports of games, but these are mostly descriptive and rely on simple metrics; more analysis should be possible. In addition, although much data is available in general, football is a game in which few goals are scored (e.g. compared to scoring in tennis, baseball, or basketball), which is a challenge for machine learning algorithms that require many examples. The game also involves many players (11 per team on the pitch), resulting in high complexity.
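
As a rough illustration of the few-goals problem, the sketch below computes inverse-frequency class weights, one common way to keep a rare class (goals) from being swamped by the common class (non-goals) during training. The data are made up for illustration (100 shots, 3 goals), and the function name `balanced_class_weights` is hypothetical; this is only one possible mitigation, not a method prescribed by the project.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class equal to n_samples / (n_classes * class_count).

    Rare classes get large weights, so each class contributes the same
    total weight to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical data: 100 shots, of which 3 are goals (label 1).
labels = [1] * 3 + [0] * 97
weights = balanced_class_weights(labels)

# After weighting, both classes carry the same total weight.
weighted_totals = {cls: weights[cls] * labels.count(cls) for cls in weights}

print("class weights:", weights)
print("total weight per class:", weighted_totals)
```

With weights like these, a single goal counts for roughly as much as thirty non-goals in the loss, which is a simple starting point before trying resampling or other approaches.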

Approach: Both theory and practice will be explored. First, we will identify theoretical gaps in the literature on how such data could be used. Second, we will explore the practical side, starting with a statistical or machine learning approach that tries to reproduce the performance of current hand-picked heuristic scores regarding goals. Third, we will explore more advanced kinds of inference, possibly using LSTMs or other techniques. Elite sports data obtained from the Norwegian Women's team, which was ranked 11th in the world in 2020, will be used.
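
The second step could start from something as small as the sketch below, which fits an ordinary least-squares model to reproduce a hand-picked score from shot features. Everything here is an assumption made for illustration: the heuristic `heuristic_score`, its coefficients, the two features (shot distance and goal angle), and the synthetic shots. The heuristic is deliberately linear so that the fit recovers it almost exactly; real heuristic scores and real match data would not behave this neatly.

```python
import random


def heuristic_score(distance_m, angle_rad):
    """Hypothetical hand-picked score: closer shots with a wider goal angle score higher."""
    return 0.6 - 0.02 * distance_m + 0.25 * angle_rad


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(features, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic shots: (distance to goal in metres, goal angle in radians).
rng = random.Random(0)
shots = [(rng.uniform(5.0, 30.0), rng.uniform(0.1, 1.2)) for _ in range(200)]

features = [[1.0, d, a] for d, a in shots]  # leading 1.0 is the intercept term
targets = [heuristic_score(d, a) for d, a in shots]

weights = fit_least_squares(features, targets)
predictions = [sum(w * x for w, x in zip(weights, f)) for f in features]
mae = sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

print("fitted weights (intercept, distance, angle):", [round(w, 4) for w in weights])
print("mean absolute error vs. heuristic:", mae)
```

A baseline like this gives a reference error against which more advanced models, such as the LSTMs mentioned above, could later be compared.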

Expected outcomes: a thesis report, code, and a video. Ideally the results should be sufficient to form the basis for a paper.