(Legally) Exploiting Bookmaker Differences for Profit with Selenium and Pandas
Never lose a bet again by leveraging simple maths and web scraping.
Photo by Aidan Howe on Unsplash
As a student and data science hobbyist, I’m usually on the search for projects that are both interesting, and in some way beneficial to the user. A few months ago, I built myself a financial tracker to better manage my money, and after studying the output for a little while, I came to a shocking realisation: I was gonna go broke.
Not immediately of course, but living in London is taking its toll on my bank account and my cost of living is going to be even higher next year. As a result, I’m either going to have to get a job, or severely cut my spending. I don’t have time to work alongside my studies, so I thought I was going to have to give up a few of my pastimes.
Then, one morning, I read a fascinating article on Medium by Frank Andrade that introduced me to the concept of ‘surebetting’, a method of gambling in which one takes advantage of differences in how bookies calculate odds for a particular game and places opposing bets with different bookies, resulting in a guaranteed profit.
In essence, if the reciprocals of the odds on opposing bets on different sites add to less than 1, you can place bets on the opposing outcomes, and if you calculate exactly what bets to place, you can turn a profit regardless of how the match plays out.
For example, if you were betting on Andy Murray to win or lose a particular tennis match, and the odds for Murray to win on Betfair were 2.1, and the odds for him to lose on Ladbrokes were 2.0, that’s a surebet, as:
1/2.1 + 1/2 < 1
Of course, you can’t just split the bets 50/50, but there’s a formula you can use to calculate how much to place with each bookie to turn a profit.
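For decimal odds, the formula boils down to staking in proportion to the reciprocal of each set of odds, so that every outcome pays out the same amount. Here is a minimal sketch of that idea applied to the Murray example with a total stake of 100. The function name surebet_stakes is just for illustration, and it assumes decimal odds.

```python
def surebet_stakes(odds, total_stake):
    """Split total_stake across outcomes so every outcome pays out the same."""
    inverse_sum = sum(1 / o for o in odds)
    if inverse_sum >= 1:
        raise ValueError("Not a surebet: reciprocals of the odds sum to 1 or more")
    # Each stake is proportional to the reciprocal of its odds
    stakes = [total_stake * (1 / o) / inverse_sum for o in odds]
    # Whichever outcome wins, the payout is total_stake / inverse_sum
    profit = total_stake / inverse_sum - total_stake
    return stakes, profit


stakes, profit = surebet_stakes([2.1, 2.0], total_stake=100)
print([round(s, 2) for s in stakes], round(profit, 2))
# [48.78, 51.22] 2.44
```

So roughly 48.78 goes on Murray to win and 51.22 on him to lose, and either result returns about 102.44.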
The article mentioned that you could scrape different bookmakers’ sites then run some maths to find these surebets, and I ended up going down a rabbit hole of Andrade’s writings and landed on some code he’d written to find surebets on football (soccer…) matches on a market called ‘Both Teams to Score’. This was cool, and I ran the code for a while, but there were a few things I wasn’t happy with:
- The program only checked one market
- It only had support for three bookmakers, and wasn’t easily expandable
- I had to run the program every time I wanted to find surebets
- I had to run an individual script for each bookmaker to collect data before running a different script to process it
So, I did the only logical thing: I rewrote the whole program from scratch, making it completely expandable to different sports and markets by just changing a few lines in a JSON file, ensuring that new bookmakers could be added very easily and even parallelising it with multiprocessing so the web scraping would complete as fast as possible.
How does it work?
# Main function
def get_data(queue, sport, markets=None):
    if markets is None:
        markets = []

    # Initialise the webdriver
    driver = initialise_webdriver()

    # Open page and accept cookies
    driver.get(SITE_LINK)
    accept_cookies(driver)

    # Select relevant sport from list and return availability
    sport_available = select_sport(driver, sport)

    # If sport not available, log message and return empty dictionary
    if not sport_available:
        print(f'- Betfair: No live {sport.lower()} available right now.')
        # Close the browser before bailing out so it is not left running
        driver.quit()
        queue.put({})
        return

    # Get all the odds
    try:
        odds_dict = get_all_odds(driver, markets)
    except TimeoutError:
        print('- Betfair: Timed out, returning.')
        # Close the browser before bailing out so it is not left running
        driver.quit()
        queue.put({})
        return

    # Finished with the driver. It can sleep now
    driver.quit()
Above is a snippet of my code for scraping the Betfair site, and at the time of writing, the program also scrapes Ladbrokes and bwin for data.
The program stores all the data it finds in a number of pandas DataFrames, which are then all passed back to the main script.
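To give a rough idea of how the results come back to the main script, here is a stripped-down sketch rather than the actual code from the project. The names fake_scraper and run_scrapers are hypothetical, plain dictionaries stand in for the DataFrames, and no real scraping happens.

```python
import multiprocessing as mp


def fake_scraper(queue, bookmaker, sport):
    """Hypothetical stand-in for a per-bookmaker get_data function."""
    # A real scraper would drive Selenium here; this just reports who ran
    queue.put({"bookmaker": bookmaker, "sport": sport})


def run_scrapers(bookmakers, sport):
    """Run one process per bookmaker and collect one result from each."""
    queue = mp.Queue()
    processes = [
        mp.Process(target=fake_scraper, args=(queue, name, sport))
        for name in bookmakers
    ]
    for process in processes:
        process.start()
    # Read the results before joining so a large result cannot block a child
    results = [queue.get() for _ in processes]
    for process in processes:
        process.join()
    return results


if __name__ == "__main__":
    for result in run_scrapers(["Betfair", "Ladbrokes", "bwin"], "Tennis"):
        print(result)
```

The results arrive in whatever order the processes finish, which is why each one carries the bookmaker's name. The main guard matters on Windows, where each child process re-imports the script.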
Following the scrape, it will find surebets using the maths mentioned above, and it can do this for both two- and three-way bets, such as ‘Both Teams to Score’ or ‘1X2’ (win, lose or draw).
# Formula to find surebets in dataframes
def find_surebets(surebet_df, market):
    # Separate odds into separate columns and clean
    # x column
    surebet_df[[f'{market}_x_1', f'{market}_x_2']] = (
        surebet_df[f'{market}_x']
        .apply(utils.replace_comma)
        .str.split('\n', expand=True)
        .iloc[:, 0:2]
        .apply(pd.Series)
    )
    surebet_df[f'{market}_x_1'] = (
        surebet_df[f'{market}_x_1']
        .apply(utils.convert_odds)
        .astype(float)
    )
    surebet_df[f'{market}_x_2'] = (
        surebet_df[f'{market}_x_2']
        .apply(utils.convert_odds)
        .astype(float)
    )

    # y column
    surebet_df[[f'{market}_y_1', f'{market}_y_2']] = (
        surebet_df[f'{market}_y']
        .apply(utils.replace_comma)
        .str.split('\n', expand=True)
        .iloc[:, 0:2]
        .apply(pd.Series)
    )
    surebet_df[f'{market}_y_1'] = (
        surebet_df[f'{market}_y_1']
        .apply(utils.convert_odds)
        .astype(float)
    )
    surebet_df[f'{market}_y_2'] = (
        surebet_df[f'{market}_y_2']
        .apply(utils.convert_odds)
        .astype(float)
    )

    # Add reciprocals of odds pairs (outcome 1 at one bookie, outcome 2 at the other)
    surebet_df[f'{market}_surebets_1'] = (
        (1 / surebet_df[f'{market}_x_1']) + (1 / surebet_df[f'{market}_y_2'])
    )
    surebet_df[f'{market}_surebets_2'] = (
        (1 / surebet_df[f'{market}_x_2']) + (1 / surebet_df[f'{market}_y_1'])
    )

    # Clean frame
    surebet_df = surebet_df[
        [
            'Competitors_x',
            f'{market}_x',
            'Competitors_y',
            f'{market}_y',
            f'{market}_surebets_1',
            f'{market}_surebets_2',
        ]
    ]

    # Remove non-surebets and reset index
    surebet_df = surebet_df[
        (surebet_df[f'{market}_surebets_1'] < 1) |
        (surebet_df[f'{market}_surebets_2'] < 1)
    ]
    return surebet_df.reset_index(drop=True)
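The function above handles a two-way market. For a three-way market like 1X2, the idea is the same, only with three reciprocals instead of two. As a simplified illustration in plain Python rather than pandas (the function name three_way_surebets and the odds are made up, and this is not the code from the project), the check could look something like this:

```python
from itertools import product


def three_way_surebets(odds_by_bookmaker):
    """Find bookmaker combinations whose reciprocal odds sum to less than 1.

    odds_by_bookmaker maps a bookmaker name to (home, draw, away) decimal odds.
    """
    names = list(odds_by_bookmaker)
    surebets = []
    # Try every way of assigning a bookmaker to each of the three outcomes
    for home_bk, draw_bk, away_bk in product(names, repeat=3):
        total = (
            1 / odds_by_bookmaker[home_bk][0]
            + 1 / odds_by_bookmaker[draw_bk][1]
            + 1 / odds_by_bookmaker[away_bk][2]
        )
        if total < 1:
            surebets.append(((home_bk, draw_bk, away_bk), total))
    # Lowest total first: the further below 1, the bigger the margin
    return sorted(surebets, key=lambda item: item[1])


# Made-up odds for a single match at two hypothetical bookmakers
example = {"A": (2.9, 3.3, 2.5), "B": (2.5, 3.6, 2.9)}
for combination, total in three_way_surebets(example):
    print(combination, round(total, 4))
```

With these made-up numbers, the best combination backs the home win with A and the draw and away win with B.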
Once this is done, the individual stakes for each bet can be calculated by solving a pair of simultaneous equations with SymPy.
from sympy import Eq, solve, symbols

# Unrounded calculations for two way bets
def two_way_unrounded_calculations(odds1, odds2, total_stake):
    x, y = symbols('x y')

    # Equation for total stake
    total_stake_eq = Eq(x + y - total_stake, 0)

    # Odds multiplied by their stake must be equal
    individual_stakes_eq = Eq((odds2 * y) - (odds1 * x), 0)

    # Solve equations to get stakes and return
    stakes = solve((total_stake_eq, individual_stakes_eq), (x, y))
    stake1 = float(stakes[x])
    stake2 = float(stakes[y])
    return stake1, stake2
The above function is called two_way_unrounded_calculations because bookies might find it a bit suspicious if you’re placing bets of precise decimal values, so the stakes are rounded to a base chosen by the user, and profits are calculated from there.
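As a rough illustration of that rounding step (a simplified sketch with hypothetical function names, not the exact code from the repository), it might look like this:

```python
def round_to_base(value, base):
    """Round value to the nearest multiple of base."""
    return base * round(value / base)


def rounded_two_way(odds1, odds2, stake1, stake2, base=5):
    """Round both stakes and report the profit for each outcome."""
    stake1 = round_to_base(stake1, base)
    stake2 = round_to_base(stake2, base)
    total = stake1 + stake2
    # After rounding, the two payouts are no longer equal, so report both
    profit_if_1 = odds1 * stake1 - total
    profit_if_2 = odds2 * stake2 - total
    return stake1, stake2, profit_if_1, profit_if_2


# Unrounded stakes for odds of 2.1 and 2.0 with a total stake of 100
s1, s2, p1, p2 = rounded_two_way(2.1, 2.0, 48.78, 51.22, base=1)
print(s1, s2, round(p1, 2), round(p2, 2))
# 49 51 2.9 2.0

s1, s2, p1, p2 = rounded_two_way(2.1, 2.0, 48.78, 51.22, base=5)
print(s1, s2, round(p1, 2), round(p2, 2))
# 50 50 5.0 0.0
```

Notice that a coarser base can wipe out the profit on one side entirely (the second example only breaks even if Murray loses), so it's worth checking both figures before placing anything.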
If you want to see the rest of the code, you can find it on my GitHub.
I’ve currently written in support for football and tennis, and I do have plans to add more sports once I’ve got more sites in place. Also, the code is as yet untested on macOS and Linux, as I don’t have access to those platforms. If you fancy helping out by testing/fixing stuff on those platforms, I would be more than grateful.
I would highly recommend reading the article that was the inspiration for this project, and if you also want to check out the program that got my financial ass into gear, please do that also!
Also, if you have any questions, please feel free to send them to me on GitHub or connect with me on LinkedIn!