Betfair’s data scientist Kaushik Lakshman has given a little insight into building predictive sports models.
In this conversation he takes us through what goes into building a predictive model for the NBA, a league and a sport that he is passionate about.
Lakshman uses historical data together with his own statistical models and techniques to formulate predictions about NBA games, which he continually updates to keep them relevant as the season progresses.
To a noob like me, what do you use to build these predictive models?
Kaushik Lakshman: “I use a couple of different tools. The programming language that I use to code all my stuff is R, and R and Python seem to be the flavour for doing data science and modelling these days. There is no reason why you can’t use something else, but these two are open source, free and have lots of documentation. I just happen to be better at R than Python.
“In terms of the modelling and algorithms, it’s usually problem dependent: it comes down to the data set and what I’m trying to predict.”
Talking about building models from scratch to automation, is data model building accessible for novices?
KL: “I think there might be a bit of a prerequisite in terms of understanding how it all works. For somebody who has no experience in this area of work to progress to building an automated model, I think the barriers to entry are coming down.
“I think there are a few different things at play here. In particular, there’s a very popular Venn diagram that people tend to use to describe what makes an ideal data modeller: a bit of programming, a bit of statistics, a bit of domain knowledge. The right balance of these three things together will probably help you build a new model.”
You’re doing a predictive model for the 2018-19 NBA season?
KL: “I build models as part of my everyday job. It just so happens I have got my hands on a very nice NBA data set, so I thought I’d put those things together and create a model, maybe bet on it and hopefully it’s successful!”
What data points or criteria do you use to build a predictive model on the NBA?
KL: “A model is essentially a representation of factors that may influence the result. I use a little bit of my own knowledge of basketball. I’ve been playing the sport since I was 12, so I have many years of experience, understanding and intuition about the sport, and I understand a few metrics. I pick up a lot of box scores, play around with some of the variables there and try to create more variables. An example I keep using is something as simple as assist-to-turnover ratio, which is usually a hallmark of a good quality team. I try to create features like that from box scores.
“I also add a few features that are schedule related. For instance, in the NBA it is widely known that teams on the second game of a back-to-back don’t perform as well as they do in the first game. I’ll try to code these features into the data to make it as general as possible while still capturing as much information as I can.”
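The two features Lakshman mentions can be sketched in a few lines. He works in R; the example below uses Python purely for illustration, and it is not his code. The team code, dates, box-score numbers and the field names (`ast_to_ratio`, `second_of_back_to_back`) are all hypothetical.

```python
from datetime import date

# Hypothetical box-score rows, one per team per game (values invented for illustration).
games = [
    {"team": "AAA", "date": date(2018, 10, 16), "assists": 24, "turnovers": 12},
    {"team": "AAA", "date": date(2018, 10, 17), "assists": 18, "turnovers": 15},
    {"team": "AAA", "date": date(2018, 10, 20), "assists": 27, "turnovers": 9},
]

def add_features(rows):
    """Return new rows with an assist-to-turnover ratio and a back-to-back flag."""
    last_played = {}  # most recent game date seen for each team
    out = []
    for row in sorted(rows, key=lambda r: (r["team"], r["date"])):
        prev = last_played.get(row["team"])
        rest_days = (row["date"] - prev).days if prev else None
        out.append({
            **row,
            # Guard against division by zero in a turnover-free game.
            "ast_to_ratio": row["assists"] / max(row["turnovers"], 1),
            # True when the team also played on the previous calendar day.
            "second_of_back_to_back": rest_days == 1,
        })
        last_played[row["team"]] = row["date"]
    return out

features = add_features(games)
for f in features:
    print(f["date"], round(f["ast_to_ratio"], 2), f["second_of_back_to_back"])
```

One caveat for anyone adapting this: a ratio computed from a game's own box score is only known after that game, so a pre-game model would normally use an average over the team's earlier games instead.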
With changes like the NBA season starting a fortnight earlier, meaning fewer back-to-backs and shorter road trips, could your modelling be a little different from previous seasons?
KL: “Absolutely, and that’s the thing with sport: it’s a landscape that keeps changing. It’s really trying to keep optimising the best way of doing things, whether for players, audiences or television revenue. There is a little bit of continual optimising, but for the most part the sport has essentially stayed the same over the preceding five years. I won’t use data that’s older than that.
“There are still back-to-backs, but other things like three games in four nights don’t exist anymore. I can find other ways of coding that information in so it’s applicable to anything from four years ago to last year.”
What kinds of outcomes have you predicted for the season ahead?
KL: “At the moment it tells me who will win each match. I’m tweaking it slightly so it can tell me something about the margin of victory. There are a lot of bets that can be placed on the line markets, so I’ll look to tweak it to tell me the likelihood of events happening in those markets as well. This is all at an individual match level.”
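Lakshman does not say how he plans to get from a margin prediction to line-market likelihoods. One common simplification, shown here only as a sketch, is to treat the actual margin as normally distributed around the predicted margin. The function name `cover_probability`, the default spread of 12 points and the example numbers are all assumptions for illustration, not values from his model.

```python
from statistics import NormalDist

def cover_probability(predicted_margin, line, margin_sd=12.0):
    """Probability that the actual margin exceeds `line`.

    Assumes the margin is normally distributed around `predicted_margin`.
    `margin_sd` is a placeholder; in practice it would be estimated from
    the model's historical prediction errors.
    """
    return 1.0 - NormalDist(mu=predicted_margin, sigma=margin_sd).cdf(line)

# Hypothetical match: the model predicts the home team wins by 4.5 points.
win_prob = cover_probability(4.5, 0.0)    # chance of winning outright
cover_prob = cover_probability(4.5, 6.5)  # chance of winning by more than 6.5

print(round(win_prob, 3), round(cover_prob, 3))
```

Under this assumption a single margin prediction yields a probability for any line, which can then be compared with the price on offer in that market.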
Does Betfair get any benefit from your work? Does it boost their modelling credentials?
KL: “Not necessarily. This is stuff that I do outside of work. We do have data scientists whose job is to build sports models with the primary view of trying to educate some of our customers. Betfair is a bit different from other bookmakers out there in the sense that it’s an exchange, and our revenues are made from commissions on winning bets. Our strategy is to try to educate our customers to make more models and get more sophisticated with their betting so they can win more. We’ve got internal data scientists as well as some outside experts we outsource to, who build these tools which we showcase to our customers.”
What makes a more sophisticated bettor?
KL: “There are a few different ways. We’ve got a wide mix of customers. Some are born and bred into racing families, and they grow up with a lot of knowledge of the industry that they can work into betting insight. We also have a bunch of data scientists who have never watched a race in their life or even set foot on a racecourse, but they purely play with the data. Then we have people who come from a financial background who essentially trade in prices.
“There’s no one solution that fits all. It’s all about finding that edge and maintaining it.”