Today we are going to be introducing how to calculate and use rolling averages in our Pandas Dataframes. To do this we are going to need a large dataset of box scores over at least a few weeks worth of games, if you do not have a dataset to accomplish this, go ahead and check out my webscraping tutorials for instructions on how to obtain one.

Web Scraping Playlist

Now that we have a dataset to use, I will be using a dataset containing all NBA games from the 2019-2020 season up until 02/13/2020. You don’t have to use the same dataset, but just be aware if you are using a different one then the numbers shown here will likely differ from your own.

Okay then, first things first we need to import pandas and load our dataset into a dataframe.

import pandas as pd

df = pd.read_excel(r"insert your dataset file here")

Next we are just going to use the head function to take a quick peek at our data and make sure it looks fine. Also, this is a dataset we previously used in a Pandas tutorial, so the fanduel fantasy points scored are already incorporated. If you are unsure about how to perform this calculation, please check out my previous Pandas tutorials for instructions on how to calculate this.

Previous Pandas Tutorial

df.head()

	Player	Match_Up	Game_Date	MIN	PTS	REB	AST	STL	BLK	TOV	FP
0	Terrance Ferguson	OKC @ NOP	02/13/2020	20	3	1	2	1	0	0	10.2
1	Danilo Gallinari	OKC @ NOP	02/13/2020	33	29	2	1	2	0	2	36.9
2	Gordon Hayward	BOS vs. LAC	02/13/2020	48	21	13	4	1	1	0	48.6
3	Enes Kanter	BOS vs. LAC	02/13/2020	20	8	8	2	0	1	2	21.6
4	Jaxson Hayes	NOP vs. OKC	02/13/2020	8	4	2	3	0	0	0	10.9

Now that we’ve taken a look at the data, we are going to need to convert the Game_Date field to an actual datetime data type for Pandas to be able to sort the games by date properly. Pandas by default reads this field in as a generic string value, so when we try to sort the games to get a proper rolling average, the games will just be sorted January-December or December-January rather than October-February.

df['Game_Date'] = df['Game_Date'].astype('datetime64')

Due to the nature of the rolling function, we will want to resort our data first by player name, then by game date in order to get the correct results. This is because the rolling function will look to the index value to ‘order’ the player fp scores, so even if we are looking at the games in chronological order, if the index contradicts this, then our rolling average window will be inaccurate. To accomplish this we will make use of the ‘ignore_index’ parameter of the sort function. This will sort the data as we see fit, then reassign the index values based on this sorted function.

df = df.sort_values(['Player', 'Game_Date'], ascending= [True, True], ignore_index=True)

And once again, we will utilize the head function to take a peak at our data to make sure it is making sense, and doing what we expect it to do.

df.head(15)

	Player	Match_Up	Game_Date	MIN	PTS	REB	AST	STL	BLK	TOV	FP
0	Aaron Gordon	ORL vs. CLE	2019-10-23	29	9	2	3	2	1	1	23.9
1	Aaron Gordon	ORL @ ATL	2019-10-26	30	10	7	3	2	0	1	27.9
2	Aaron Gordon	ORL @ TOR	2019-10-28	30	2	5	5	0	2	3	18.5
3	Aaron Gordon	ORL vs. NYK	2019-10-30	28	15	11	1	1	1	0	35.7
4	Aaron Gordon	ORL vs. MIL	2019-11-01	28	11	4	3	0	0	1	19.3
5	Aaron Gordon	ORL vs. DEN	2019-11-02	32	21	9	3	0	0	0	36.3
6	Aaron Gordon	ORL @ OKC	2019-11-05	29	15	7	3	0	0	2	25.9
7	Aaron Gordon	ORL @ DAL	2019-11-06	37	23	4	1	2	0	4	31.3
8	Aaron Gordon	ORL vs. MEM	2019-11-08	32	17	4	2	1	0	2	25.8
9	Aaron Gordon	ORL vs. IND	2019-11-10	30	13	2	3	0	1	1	21.9
10	Aaron Gordon	ORL vs. PHI	2019-11-13	36	18	13	7	1	1	2	48.1
11	Aaron Gordon	ORL vs. SAS	2019-11-15	36	14	8	4	0	1	2	30.6
12	Aaron Gordon	ORL vs. WAS	2019-11-17	37	13	9	2	0	0	0	26.8
13	Aaron Gordon	ORL @ TOR	2019-11-20	12	2	2	0	0	0	0	4.4
14	Aaron Gordon	ORL vs. TOR	2019-11-29	32	8	8	2	2	1	0	29.6

Now we can see that the sort function worked as expected, with the data being sorted first by player name, A->Z, then by game date, Old->New.

We are ready now to employ the rolling function. To do this we are going to build a little more complex of a command to accomplish this. First off we need to utilize the groupby function to group the boxscores by player to ensure we are only taking into account games that each player played in their respective rolling average calculations, the we will add the rolling function, specifying the number of games we want to incorporate as the window parameter, and the minimum periods necessary to generate a score. I personally like to use a min_period of 1 so that there is always a value, even if the player has played less than 3 games in the time frame being used in the dataset. Next we direct the rolling function to the FP column, and tell the function that the mean is what we want calculated in the rolling function. After this, we can reset the index and drop the extra ‘player’ column from grouping the data.

df['Last3']= df.groupby('Player',sort=False).rolling(window=3, min_periods=1).FP.mean().reset_index(drop='Player')

Once the rolling average for the past 3 games has been calculated we can go ahead and re-sort the data by game date from newest to oldest, set the index back to player name and take a look at the results using the head function.

df = df.sort_values('Game_Date', ascending=False)
df = df.set_index('Player')
df.head(15)

	Match_Up	Game_Date	MIN	PTS	REB	AST	STL	BLK	TOV	FP	Last3
Player
Jaxson Hayes	NOP vs. OKC	2020-02-13	8	4	2	3	0	0	0	10.9	8.633333
Derrick Favors	NOP vs. OKC	2020-02-13	20	8	9	2	0	0	1	20.8	23.466667
Lou Williams	LAC @ BOS	2020-02-13	44	35	6	8	1	2	3	60.2	42.033333
Terrance Ferguson	OKC @ NOP	2020-02-13	20	3	1	2	1	0	0	10.2	5.200000
Ivica Zubac	LAC @ BOS	2020-02-13	14	7	4	1	0	1	1	15.3	22.566667
Nerlens Noel	OKC @ NOP	2020-02-13	11	2	3	0	0	0	0	5.6	6.600000
Steven Adams	OKC @ NOP	2020-02-13	28	11	11	2	1	3	1	38.2	31.800000
Kemba Walker	BOS vs. LAC	2020-02-13	46	19	9	7	0	0	1	39.3	32.133333
Nicolo Melli	NOP vs. OKC	2020-02-13	24	6	3	0	0	3	2	16.6	21.533333
E'Twaun Moore	NOP vs. OKC	2020-02-13	20	2	4	2	1	1	1	14.8	12.600000
Danilo Gallinari	OKC @ NOP	2020-02-13	33	29	2	1	2	0	2	36.9	32.500000
Josh Hart	NOP vs. OKC	2020-02-13	27	7	8	3	0	0	1	20.1	23.566667
JJ Redick	NOP vs. OKC	2020-02-13	31	24	2	3	2	0	2	34.9	33.033333
Kawhi Leonard	LAC @ BOS	2020-02-13	46	28	11	4	1	1	3	50.2	48.066667
Daniel Theis	BOS vs. LAC	2020-02-13	24	12	5	0	1	2	2	25.0	28.333333

Now that we can see the calculated rolling averages, lets go ahead and check a player on the list here, since it’s hard to verify without seeing multiple games from a single player. I will be choosing Terrance Ferguson from the above table for demonstration. You can use the .loc function to grab a subset of data from the dataframe.

dfT = df.loc['Terrance Ferguson']

dfT.head(15)

	Match_Up	Game_Date	MIN	PTS	REB	AST	STL	BLK	TOV	FP	Last3
Player
Terrance Ferguson	OKC @ NOP	2020-02-13	20	3	1	2	1	0	0	10.2	5.200000
Terrance Ferguson	OKC vs. SAS	2020-02-11	18	0	0	1	0	0	0	1.5	4.933333
Terrance Ferguson	OKC vs. BOS	2020-02-09	26	0	2	1	0	0	0	3.9	4.833333
Terrance Ferguson	OKC vs. DET	2020-02-07	14	1	2	2	1	0	0	9.4	3.533333
Terrance Ferguson	OKC vs. CLE	2020-02-05	11	0	1	0	0	0	0	1.2	5.233333
Terrance Ferguson	OKC vs. MIA	2020-01-17	7	0	0	0	0	0	0	0.0	11.200000
Terrance Ferguson	OKC vs. TOR	2020-01-15	21	2	0	1	3	1	1	14.5	14.600000
Terrance Ferguson	OKC @ MIN	2020-01-13	28	14	3	1	0	0	0	19.1	14.166667
Terrance Ferguson	OKC vs. LAL	2020-01-11	24	0	1	2	1	1	0	10.2	11.300000
Terrance Ferguson	OKC vs. HOU	2020-01-09	22	6	1	2	0	2	3	13.2	10.900000
Terrance Ferguson	OKC @ BKN	2020-01-07	40	7	0	1	0	1	1	10.5	11.400000
Terrance Ferguson	OKC @ PHI	2020-01-06	21	6	0	0	1	0	0	9.0	11.966667
Terrance Ferguson	OKC @ CLE	2020-01-04	26	13	1	1	0	0	1	14.7	9.366667
Terrance Ferguson	OKC @ SAS	2020-01-02	29	2	1	0	2	1	0	12.2	5.833333
Terrance Ferguson	OKC vs. DAL	2019-12-31	18	0	1	0	0	0	0	1.2	3.766667

Reviewing the most recent 15 games the calculation appears to have worked accordingly, but lets double check the first 15 games as well to see if the min_periods=1 parameter worked as well.

dfT.tail(15)

	Match_Up	Game_Date	MIN	PTS	REB	AST	STL	BLK	TOV	FP	Last3
Player
Terrance Ferguson	OKC @ POR	2019-11-27	25	9	1	0	2	0	1	15.2	14.866667
Terrance Ferguson	OKC @ GSW	2019-11-25	27	6	0	1	0	2	0	13.5	12.700000
Terrance Ferguson	OKC vs. LAL	2019-11-22	32	11	2	3	0	0	2	15.9	14.333333
Terrance Ferguson	OKC @ LAL	2019-11-19	31	3	1	1	1	0	0	8.7	17.766667
Terrance Ferguson	OKC @ LAC	2019-11-18	35	5	7	0	2	0	1	18.4	18.433333
Terrance Ferguson	OKC vs. PHI	2019-11-15	39	19	1	2	1	0	0	26.2	16.433333
Terrance Ferguson	OKC vs. MIL	2019-11-10	24	5	1	1	0	1	0	10.7	11.900000
Terrance Ferguson	OKC vs. GSW	2019-11-09	20	8	2	2	0	0	1	12.4	10.500000
Terrance Ferguson	OKC @ SAS	2019-11-07	21	6	3	0	0	1	0	12.6	9.900000
Terrance Ferguson	OKC vs. ORL	2019-11-05	23	5	0	1	0	0	0	6.5	8.666667
Terrance Ferguson	OKC vs. POR	2019-10-30	35	2	3	0	1	1	1	10.6	13.300000
Terrance Ferguson	OKC @ HOU	2019-10-28	19	0	2	1	2	0	1	8.9	12.033333
Terrance Ferguson	OKC vs. GSW	2019-10-27	30	13	2	4	0	0	1	20.4	10.700000
Terrance Ferguson	OKC vs. WAS	2019-10-25	22	0	4	0	0	1	1	6.8	5.850000
Terrance Ferguson	OKC @ UTA	2019-10-23	24	2	2	1	0	0	1	4.9	4.900000

Now looking at the first 15 games, everything seems to be working as expected, with the first game scoring the same as the Last3 variable as it should since it was the only game in that window, then the second game being an average of the first 2, and finally the 3 game average beginning after the 3rd game of the season.

Analyzing Data with Pandas

High Ceilings And Floors Part One

Introduction to Pandas and Dataframes