Tuesday 24 February 2015

Lowzer's Low Country Revue: Belgium - Belgian Jupiler Pro League and its stats deficit

The Belgian Jupiler Pro League (JPL), which is Belgium's top-flight football division, has a statistics deficit when compared to other top-flight leagues in Europe. 


According to the current-season 'UEFA rankings for club competitions', the JPL is ranked 8th, seven places higher than the Dutch Eredivisie. When you go to the Eredivisie homepage, you can quickly click your way to statistics on, for example, goal attempts, dribbles and successful passes. Go on, look.

When you land on the JPL homepage, you can click on 'classement' (standings), which takes you to a page offering further statistics. Unlike the Eredivisie, however, the JPL makes only a limited number of statistics available to the public (more on this here). The categories of data on the JPL page range from timing of goals and number of clean sheets to longest losing streak. In short, the JPL does not offer the same quantity, variety, quality or availability of the type of data most often used by football analysts with a fondness for "fancy" stats.

Where are the stats, then?

The good news is that I have found one source of data on JPL games: soccerway. It has some of the data we like to see: possession, number of corners, shots taken, etc. However, and this is a big however, the data is not fully reliable.

To illustrate my point, it is best to walk you through an example. If we look at Anderlecht's 3-1 defeat of Mouscron-Péruwelz in Gameweek 1, we can first see the scorers and the times of the goals, as shown in the screenshot below:

screenshot from www.soccerway.com 

All good so far. We have three goals for Anderlecht and three pieces of data on the goals. We have one goal for Mouscron-Péruwelz and one goal scorer. Good. 

Until...

screenshot from www.soccerway.com

And here we see a problem. Anderlecht scored three goals in this game, yet according to the second screenshot they registered only two shots on target. So either they scored one of their goals with a shot off target (unlikely) or there is a mistake in the data.

How have I addressed this?

I tried unsuccessfully to get in touch with soccerway.com and/or people with more knowledge of its data than me. All leads have been inconclusive. Nobody seems to be able to verify the authenticity/definitions of soccerway's data on the JPL.

The problem kept coming up. I saw teams scoring 6 goals with 1 shot on target registered, and things like this stopped me in my tracks. So, in the name of consistency, I developed a strategy to work around the weird data from soccerway. I manually entered the soccerway data into a spreadsheet (which, if you really want it, you can get by sending me an email at sattherethinkin@gmail.com), following a few rules and assumptions which I will lay out here for reference:

  • I first looked at the score of the game
  • If the goals scored were fewer than the shots on target, I did not alter the data.
    • as an example: In Gameweek 1 Standard Liège played Sporting Charleroi and the final score was 3-0 to Standard. The soccerway stats showed that Standard registered 6 shots on target and Charleroi registered 5. I assumed this was correct because it is plausible. I was not at the game and I (perhaps foolishly, perhaps not) am giving the benefit of the doubt to soccerway.
  • If the goals scored were greater than the shots on target, I added the number of goals scored to both the team's total shots and its shots on target.
    • as an example: In Gameweek 1, in the Anderlecht v Mouscron-Péruwelz game above, I added Anderlecht's 3 goals to their total shots and also to their shots on target. Effectively I presumed that soccerway had just got it wrong.
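
The rules above could be sketched in Python roughly as follows. This is an illustration only, not the spreadsheet itself: the names TeamStats and adjust_shots are made up for the example, the total-shots figures are placeholders rather than soccerway numbers, and leaving the data alone when goals equal shots on target is an assumption, since the rules only cover the "fewer than" and "greater than" cases.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TeamStats:
    """One team's figures for one match (hypothetical structure)."""
    goals: int
    shots: int            # total shots listed for the team
    shots_on_target: int


def adjust_shots(stats: TeamStats) -> TeamStats:
    """Apply the correction rules to a single team's match figures."""
    if stats.goals > stats.shots_on_target:
        # Implausible: more goals than shots on target.
        # Add the goals to both the total shots and the shots on target.
        return replace(
            stats,
            shots=stats.shots + stats.goals,
            shots_on_target=stats.shots_on_target + stats.goals,
        )
    # Plausible (assumption: this includes goals == shots on target),
    # so the listed figures are trusted and left unchanged.
    return stats


# Anderlecht, Gameweek 1: 3 goals, 2 shots on target listed.
# The total of 10 shots is a placeholder, not a real figure.
anderlecht = adjust_shots(TeamStats(goals=3, shots=10, shots_on_target=2))

# Standard Liège, Gameweek 1: 3 goals, 6 shots on target listed.
# The total of 12 shots is again a placeholder.
standard = adjust_shots(TeamStats(goals=3, shots=12, shots_on_target=6))
```

With these inputs, the Anderlecht record ends up with 5 shots on target (2 listed plus 3 goals) and its total shots raised by 3, while the Standard record passes through untouched.
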

Why did I do it this way?

Partly because I wanted some data, and partly because I wanted to do it in a way that places some faith in soccerway (their data on the EPL, for example, seems pretty reliable) and some faith in logic. I am not presenting my method as infallible: some people may ask why I did not just add the goals scored to the shots total across the board.

I am simply trying to put some context to the figures for future reference. Because of the limitations of JPL data I have discussed, I cannot claim 100% reliability for the data I am using, but it is at least a start. We will see what insights it can bring...

This page will be left as a kind of bookmark and I will update it periodically as methods change.

More soon...


