Monte Carlo Part 2

Applying Monte Carlo(esque) Simulation to a Real-Life Scenario

Good day everyone! Today we are going to build upon our last lesson and apply what we've learned to a real-life prediction scenario for daily fantasy sports.

We will be using our tried and true SampleDataForYT dataset, but we will be slicing the data down to all games up until a specific date, then using a real player list from FanDuel for the next day to simulate actual projections.

There are a few disclaimers, though. As usual, this dataset is completely useless for actual projections; we can all do much better than simply pulling in a few rolling averages for each player. Additionally, this is not entirely a real-life scenario, as it all comes from a previous season, and the season average is still computed over the larger dataset as a whole, not the season average up until that point.

With that said, let's jump in!

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
import random

# Full-season box scores with rolling-average features already computed
dataset = pd.read_excel("SampleDataForYT.xlsx")
dataset.head()
Player Match_Up Game_Date FP Last3 Last5 Last7 SeasonAve
0 Abdel Nader OKC @ NOP 2020-02-13 19.3 11.433333 10.46 9.728571 11.038235
1 Brad Wanamaker BOS vs. LAC 2020-02-13 16.7 18.066667 17.70 21.757143 15.162745
2 Chris Paul OKC @ NOP 2020-02-13 46.6 44.333333 40.16 40.485714 36.553704
3 Daniel Theis BOS vs. LAC 2020-02-13 25.0 28.333333 22.06 23.728571 23.464583
4 Danilo Gallinari OKC @ NOP 2020-02-13 36.9 32.500000 31.56 30.914286 30.508511
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14397 entries, 0 to 14396
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Player     14397 non-null  object        
 1   Match_Up   14397 non-null  object        
 2   Game_Date  14397 non-null  datetime64[ns]
 3   FP         14397 non-null  float64       
 4   Last3      14397 non-null  float64       
 5   Last5      14397 non-null  float64       
 6   Last7      14397 non-null  float64       
 7   SeasonAve  14397 non-null  float64       
dtypes: datetime64[ns](1), float64(5), object(2)
memory usage: 899.9+ KB

For the first deviation, we will be pulling in a player list from FanDuel for January 4th, 2020. This is a completely arbitrary date, used only because I have the list handy and it falls within the date range we have already been working with.

playerList = pd.read_csv("20200104.csv")
playerList.head()
Id Position First Name Nickname Last Name FPPG Played Salary Game Team Opponent Injury Indicator Injury Details Tier Unnamed: 14 Unnamed: 15
0 42349-40199 SF Giannis Giannis Antetokounmpo Antetokounmpo 57.957575 33 12000 SA@MIL MIL SA NaN NaN NaN NaN NaN
1 42349-84669 PG Luka Luka Doncic Doncic 53.653333 30 10900 CHA@DAL DAL CHA NaN NaN NaN NaN NaN
2 42349-15557 C Andre Andre Drummond Drummond 48.348485 33 9700 DET@GS DET GS NaN NaN NaN NaN NaN
3 42349-84671 PG Trae Trae Young Young 45.159374 32 9000 IND@ATL ATL IND NaN NaN NaN NaN NaN
4 42349-55062 C Nikola Nikola Jokic Jokic 41.567648 34 8900 DEN@WAS DEN WAS NaN NaN NaN NaN NaN

As we can see, this is a standard player list from FanDuel, with two extra empty columns at the end that pandas brought in that we don't need. Before diving in, let's think about what needs to happen here.
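As a side note, one way to deal with those trailing columns is to drop everything pandas labels "Unnamed". A quick sketch on a made-up mini frame (not the real CSV):

```python
import pandas as pd

# Hypothetical mini version of the FanDuel CSV, with the same trailing junk columns
df = pd.DataFrame({
    "Nickname": ["Giannis Antetokounmpo", "Luka Doncic"],
    "Team": ["MIL", "DAL"],
    "Unnamed: 14": [None, None],
    "Unnamed: 15": [None, None],
})

# Keep only columns whose names don't start with "Unnamed"
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(list(df.columns))  # ['Nickname', 'Team']
```

In our case we don't strictly need this, since we're about to cut the frame down to a couple of columns anyway.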

First, we have a full season's worth of data here, but we only want to train on games up until the day before this player list. So we will need to slice our data based on date.

Next, we only want one data point per player on the player list, so once we have our trained model, we can feed in a single dataset with each player's most recent rolling windows.

We will be utilizing the groupby method in pandas and keeping the first record for each player after the data has been sorted from most recent to least recent.
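As a quick sketch of why that works (on a tiny made-up frame, not our real data):

```python
import pandas as pd

# Toy box scores, pre-sorted most recent game first (as our real dataset will be)
box = pd.DataFrame({
    "Player": ["Aaron Gordon", "Aaron Gordon", "Al Horford"],
    "Game_Date": pd.to_datetime(["2020-01-03", "2019-12-27", "2020-01-03"]),
    "Last3": [25.7, 38.0, 15.8],
})

# groupby preserves row order within each group, so .first() grabs
# each player's most recent game thanks to the sort
latest = box.groupby("Player").first()
print(latest.loc["Aaron Gordon", "Last3"])  # 25.7
```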

Let's start with sorting our box scores by game date.

dataset = dataset.sort_values('Game_Date', ascending=False)

Next up, we will drop all players with an injury indicator of 'O' (Out), as we know they will not be playing, and then set the index of the playerList dataframe to the player name.

playerList = playerList[playerList['Injury Indicator'] != 'O'].set_index('Nickname')

After this, if we think about the data we want to pull through from the player list, it's going to be just the player and the team. The player name is already the index, so we redefine the playerList dataframe to keep only the Team column.

playerList = playerList[['Team']]

Then we will start removing box scores that happened on January 4th and after. To accomplish this, we create a copy of our dataset dataframe, call it dataset2, and set its index to the Game_Date field.

dataset2 = dataset.copy()
dataset2.set_index('Game_Date', inplace=True)
dataset2.head()
Player Match_Up FP Last3 Last5 Last7 SeasonAve
Game_Date
2020-02-13 Abdel Nader OKC @ NOP 19.3 11.433333 10.46 9.728571 11.038235
2020-02-13 Brad Wanamaker BOS vs. LAC 16.7 18.066667 17.70 21.757143 15.162745
2020-02-13 Chris Paul OKC @ NOP 46.6 44.333333 40.16 40.485714 36.553704
2020-02-13 Daniel Theis BOS vs. LAC 25.0 28.333333 22.06 23.728571 23.464583
2020-02-13 Danilo Gallinari OKC @ NOP 36.9 32.500000 31.56 30.914286 30.508511

Now we are going to define a new dataframe (good practice when learning, so you don't accidentally overwrite anything and have to start over) and slice so that we only keep rows where the Game_Date, now the index, is less than or equal to January 3, 2020.

dataset3 = dataset2[(dataset2.index<='2020-1-3')]
dataset3.head()
Player Match_Up FP Last3 Last5 Last7 SeasonAve
Game_Date
2020-01-03 Aaron Gordon ORL vs. MIA 29.1 25.733333 31.12 27.928571 29.914286
2020-01-03 Al Horford PHI @ HOU 19.6 15.833333 21.86 23.828571 30.150000
2020-01-03 Alex Len ATL @ BOS 26.6 26.666667 26.14 22.142857 20.597436
2020-01-03 Allen Crabbe ATL @ BOS 13.2 10.600000 13.70 11.114286 10.134286
2020-01-03 Anfernee Simons POR @ WAS 4.9 10.466667 16.32 15.100000 14.944444

Now if we sort this dataframe by player name, we can see that we have every box score from each player prior to our defined date. However, we only want each player's most recent record to form our predictions.

dataset3.sort_values('Player').head(15)
Player Match_Up FP Last3 Last5 Last7 SeasonAve
Game_Date
2020-01-03 Aaron Gordon ORL vs. MIA 29.1 25.733333 31.120000 27.928571 29.914286
2019-11-13 Aaron Gordon ORL vs. PHI 48.1 31.933333 30.600000 29.800000 29.914286
2019-12-15 Aaron Gordon ORL @ NOP 17.8 30.266667 29.880000 33.014286 29.914286
2019-12-17 Aaron Gordon ORL @ UTA 14.0 21.666667 27.280000 30.128571 29.914286
2019-11-08 Aaron Gordon ORL vs. MEM 25.8 27.666667 27.720000 27.542857 29.914286
2019-12-18 Aaron Gordon ORL @ DEN 25.9 19.233333 26.140000 27.042857 29.914286
2019-11-06 Aaron Gordon ORL @ DAL 31.3 31.166667 29.700000 27.842857 29.914286
2019-12-20 Aaron Gordon ORL @ POR 33.2 24.366667 24.820000 27.928571 29.914286
2019-11-05 Aaron Gordon ORL @ OKC 25.9 27.166667 27.140000 26.785714 29.914286
2019-11-02 Aaron Gordon ORL vs. DEN 36.3 30.433333 27.540000 26.933333 29.914286
2019-12-23 Aaron Gordon ORL vs. CHI 45.2 34.766667 27.220000 29.871429 29.914286
2019-11-01 Aaron Gordon ORL vs. MIL 19.3 24.500000 25.060000 25.060000 29.914286
2019-10-30 Aaron Gordon ORL vs. NYK 35.7 27.366667 26.500000 26.500000 29.914286
2019-12-27 Aaron Gordon ORL vs. PHI 35.7 38.033333 30.800000 29.285714 29.914286
2019-10-28 Aaron Gordon ORL @ TOR 18.5 23.433333 23.433333 23.433333 29.914286

Fortunately, the season average is the same for every record of a given player, since we didn't recalculate it as a season-to-date average (in real life this would not be the case, but then we also wouldn't be slicing off data like this, so it's fine for this example). That means we can group by Player and SeasonAve and keep only the first record per group, since the data is already sorted from most recent game date to oldest.

dataset3 = dataset3.groupby(["Player","SeasonAve"]).first()
dataset3.sort_values('Player').head(25)
Match_Up FP Last3 Last5 Last7
Player SeasonAve
Aaron Gordon 29.914286 ORL vs. MIA 29.1 25.733333 31.12 27.928571
Aaron Holiday 19.195918 IND vs. DEN 19.0 24.966667 29.64 27.785714
Abdel Nader 11.038235 OKC vs. DAL 5.0 6.000000 8.58 9.000000
Al Horford 30.150000 PHI @ HOU 19.6 15.833333 21.86 23.828571
Al-Farouq Aminu 16.170588 ORL vs. TOR 12.9 11.866667 13.08 16.385714
Alec Burks 28.585714 GSW @ MIN 11.1 26.633333 25.08 31.271429
Alex Caruso 14.219149 LAL vs. DAL 12.2 14.666667 12.38 13.871429
Alex Len 20.597436 ATL @ BOS 26.6 26.666667 26.14 22.142857
Allen Crabbe 10.134286 ATL @ BOS 13.2 10.600000 13.70 11.114286
Allonzo Trier 9.850000 NYK @ WAS 0.2 7.366667 9.30 9.514286
Andre Drummond 48.035294 DET @ LAC 32.4 48.866667 43.38 45.028571
Andrew Wiggins 36.664444 MIN @ SAC 37.5 38.200000 37.12 38.557143
Anfernee Simons 14.944444 POR @ WAS 4.9 10.466667 16.32 15.100000
Anthony Davis 52.032609 LAL vs. NOP 72.1 53.033333 49.22 52.328571
Anthony Tolliver 10.405714 POR @ WAS 3.4 9.200000 9.32 9.557143
Aron Baynes 22.363636 PHX vs. NYK 39.4 28.066667 25.54 24.200000
Austin Rivers 15.638000 HOU vs. PHI 6.0 9.533333 12.48 12.400000
Avery Bradley 14.587179 LAL vs. NOP 12.9 18.133333 13.54 11.357143
Bam Adebayo 40.303704 MIA @ ORL 34.5 33.533333 33.64 37.128571
Ben McLemore 16.598077 HOU vs. PHI 3.2 12.500000 11.40 12.371429
Ben Simmons 43.598113 PHI @ HOU 79.1 53.833333 52.44 51.628571
Bismack Biyombo 18.823256 CHA @ CLE 15.2 17.833333 22.72 19.685714
Blake Griffin 26.127778 DET @ SAS 17.4 25.033333 21.58 19.900000
Bobby Portis 19.109091 NYK @ PHX 26.8 24.033333 20.10 24.342857
Bogdan Bogdanovic 24.655814 SAC vs. MEM 19.4 18.666667 24.94 25.457143

Now we can see that we no longer have any duplicates for each player, so we will go ahead and reset the index, and then set the index back to the player's name.

dataset3 = dataset3.reset_index().set_index('Player')
dataset3.head(25)
SeasonAve Match_Up FP Last3 Last5 Last7
Player
Aaron Gordon 29.914286 ORL vs. MIA 29.1 25.733333 31.12 27.928571
Aaron Holiday 19.195918 IND vs. DEN 19.0 24.966667 29.64 27.785714
Abdel Nader 11.038235 OKC vs. DAL 5.0 6.000000 8.58 9.000000
Al Horford 30.150000 PHI @ HOU 19.6 15.833333 21.86 23.828571
Al-Farouq Aminu 16.170588 ORL vs. TOR 12.9 11.866667 13.08 16.385714
Alec Burks 28.585714 GSW @ MIN 11.1 26.633333 25.08 31.271429
Alex Caruso 14.219149 LAL vs. DAL 12.2 14.666667 12.38 13.871429
Alex Len 20.597436 ATL @ BOS 26.6 26.666667 26.14 22.142857
Allen Crabbe 10.134286 ATL @ BOS 13.2 10.600000 13.70 11.114286
Allonzo Trier 9.850000 NYK @ WAS 0.2 7.366667 9.30 9.514286
Andre Drummond 48.035294 DET @ LAC 32.4 48.866667 43.38 45.028571
Andrew Wiggins 36.664444 MIN @ SAC 37.5 38.200000 37.12 38.557143
Anfernee Simons 14.944444 POR @ WAS 4.9 10.466667 16.32 15.100000
Anthony Davis 52.032609 LAL vs. NOP 72.1 53.033333 49.22 52.328571
Anthony Tolliver 10.405714 POR @ WAS 3.4 9.200000 9.32 9.557143
Aron Baynes 22.363636 PHX vs. NYK 39.4 28.066667 25.54 24.200000
Austin Rivers 15.638000 HOU vs. PHI 6.0 9.533333 12.48 12.400000
Avery Bradley 14.587179 LAL vs. NOP 12.9 18.133333 13.54 11.357143
Bam Adebayo 40.303704 MIA @ ORL 34.5 33.533333 33.64 37.128571
Ben McLemore 16.598077 HOU vs. PHI 3.2 12.500000 11.40 12.371429
Ben Simmons 43.598113 PHI @ HOU 79.1 53.833333 52.44 51.628571
Bismack Biyombo 18.823256 CHA @ CLE 15.2 17.833333 22.72 19.685714
Blake Griffin 26.127778 DET @ SAS 17.4 25.033333 21.58 19.900000
Bobby Portis 19.109091 NYK @ PHX 26.8 24.033333 20.10 24.342857
Bogdan Bogdanovic 24.655814 SAC vs. MEM 19.4 18.666667 24.94 25.457143

Now we are going to join our two dataframes together, joining the most recent predictive data to our player list data. We will be using a left join because our player list is the left dataframe in the way we have it set up. You could use a right join if you prefer by setting it up as dataset3.join(playerList, how='right').

We don't need any further parameters because the player name is the index for both dataframes; if this were not the case, we would need to specify which columns to join on.

This will move our dataset3 columns over into the playerList dataframe for every record that has a matching player name. If a player in the playerList dataframe does not match up to a player in the dataset3 dataframe, it will simply leave those columns blank. If you only want to include records where the player names match, you would want to use an inner join rather than a left/right join.
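A toy sketch of the difference, using made-up frames indexed by player name:

```python
import pandas as pd

# Hypothetical slate and stats frames, both indexed by player name
slate = pd.DataFrame({"Team": ["MIL", "DAL", "CHI"]},
                     index=["Giannis Antetokounmpo", "Luka Doncic", "Some Rookie"])
stats = pd.DataFrame({"Last3": [50.2, 49.6]},
                     index=["Giannis Antetokounmpo", "Luka Doncic"])

left = slate.join(stats, how="left")    # all 3 slate rows; NaN Last3 for the rookie
inner = slate.join(stats, how="inner")  # only the 2 matched rows
# If the names lived in columns instead of the index, you'd specify them,
# e.g. pd.merge(slate, stats, on="Nickname")
print(len(left), len(inner))  # 3 2
```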

playerList2 = playerList.join(dataset3, how='left')
playerList2.head(30)
Team SeasonAve Match_Up FP Last3 Last5 Last7
Nickname
Giannis Antetokounmpo MIL 57.629167 MIL vs. MIN 60.4 50.233333 53.24 57.100000
Luka Doncic DAL 53.890698 DAL vs. BKN 56.1 49.633333 53.56 54.014286
Andre Drummond DET 48.035294 DET @ LAC 32.4 48.866667 43.38 45.028571
Trae Young ATL 47.630000 ATL @ BOS 43.0 38.233333 45.46 47.257143
Nikola Jokic DEN 45.792727 DEN @ IND 28.4 30.200000 38.52 40.385714
Rudy Gobert UTA 41.232692 UTA @ CHI 38.4 36.766667 41.62 43.214286
Bradley Beal WAS 44.806522 WAS vs. ORL 37.3 36.000000 44.60 46.071429
Brandon Ingram NO 41.010638 NOP @ LAL 33.1 41.266667 43.38 44.171429
LaMarcus Aldridge SA 37.226000 SAS vs. OKC 44.4 46.233333 47.98 44.814286
Jrue Holiday NO 39.428261 NOP @ LAL 27.6 36.200000 39.22 37.685714
Domantas Sabonis IND 41.486538 IND vs. DEN 49.3 44.666667 39.86 42.314286
Nikola Vucevic ORL 41.504545 ORL vs. MIA 44.7 41.933333 43.46 45.371429
Jayson Tatum BOS 39.728000 BOS vs. ATL 27.3 29.600000 33.32 39.357143
Zach LaVine CHI 39.712727 CHI vs. UTA 40.3 35.666667 38.22 40.357143
Shai Gilgeous-Alexander OKC 35.169091 OKC @ SAS 48.9 47.600000 43.04 43.800000
DeMar DeRozan SA 38.976923 SAS vs. OKC 36.1 41.700000 42.14 37.942857
John Collins ATL 39.990000 ATL @ BOS 26.1 36.466667 39.72 39.985714
Donovan Mitchell UTA 37.175472 UTA @ CHI 27.3 36.766667 37.90 39.042857
Devonte' Graham CHA 34.744444 CHA @ CLE 33.9 34.666667 38.02 35.485714
Jaylen Brown BOS 32.925000 BOS vs. ATL 39.0 38.100000 37.36 39.300000
De'Aaron Fox SAC 38.942857 SAC vs. MEM 65.3 43.000000 43.48 40.542857
Khris Middleton MIL 35.570213 MIL vs. MIN 31.6 37.100000 40.00 39.000000
Chris Paul OKC 36.553704 OKC @ SAS 38.1 39.800000 41.50 37.685714
Gordon Hayward BOS 33.305405 BOS vs. ATL 29.7 34.466667 33.28 30.771429
Kevin Love CLE 34.191304 CLE vs. CHA 35.6 36.400000 36.68 35.728571
Richaun Holmes SAC 30.900000 SAC vs. MEM 31.4 35.200000 36.66 37.114286
Will Barton DEN 31.827083 DEN @ IND 35.5 35.633333 35.24 35.571429
Draymond Green GS 29.602439 GSW @ MIN 16.6 27.233333 30.56 31.128571
Buddy Hield SAC 32.250000 SAC vs. MEM 39.9 38.533333 34.18 29.314286
Jamal Murray DEN 33.845455 DEN @ IND 41.5 32.600000 28.72 29.400000

Once our join has succeeded, we will go ahead and keep only the columns we will feed into our random forest, and then take a peek at our sorted data to see if we have any blank rows from mismatched names.

playerList2 = playerList2[['Last3', 'Last5', 'Last7', 'SeasonAve']]
playerList2.sort_values('Last7')
Last3 Last5 Last7 SeasonAve
Nickname
Frank Jackson 2.566667 4.32 3.428571 10.888571
Jordan Poole 4.200000 6.52 5.014286 14.668750
Jacob Evans 8.600000 8.08 6.542857 10.250000
Kenrich Williams 4.766667 5.36 6.900000 15.900000
Nicolo Melli 3.800000 4.66 7.414286 14.125641
... ... ... ... ...
Frank Mason NaN NaN NaN NaN
Wenyen Gabriel NaN NaN NaN NaN
Mike Muscala NaN NaN NaN NaN
Drew Eubanks NaN NaN NaN NaN
Cristiano Felicio NaN NaN NaN NaN

264 rows × 4 columns

We do indeed have some blank records. These will not be able to be pushed through the random forest algorithm as they stand. We have a couple of options: we can either replace all the NaNs with a numerical value, such as 0 or the column average, or simply drop the records containing a NaN. For some datasets it may make sense to fill in a numerical value; however, in this instance we are just going to drop any row with a NaN to prepare for the random forest.
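As a quick sketch of those options on a made-up column (not our real data):

```python
import pandas as pd

# Hypothetical column with one missing value
df = pd.DataFrame({"Last7": [3.4, None, 46.1]})

filled_zero = df.fillna(0)                   # option 1: replace NaN with 0
filled_mean = df.fillna(df["Last7"].mean())  # option 2: replace NaN with the column average
dropped = df.dropna()                        # option 3: drop the incomplete rows
print(len(dropped))  # 2
```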

playerList2.dropna(inplace=True)
playerList2.sort_values('Last7')
Last3 Last5 Last7 SeasonAve
Nickname
Frank Jackson 2.566667 4.32 3.428571 10.888571
Jordan Poole 4.200000 6.52 5.014286 14.668750
Jacob Evans 8.600000 8.08 6.542857 10.250000
Kenrich Williams 4.766667 5.36 6.900000 15.900000
Nicolo Melli 3.800000 4.66 7.414286 14.125641
... ... ... ... ...
Nikola Vucevic 41.933333 43.46 45.371429 41.504545
Bradley Beal 36.000000 44.60 46.071429 44.806522
Trae Young 38.233333 45.46 47.257143 47.630000
Luka Doncic 49.633333 53.56 54.014286 53.890698
Giannis Antetokounmpo 50.233333 53.24 57.100000 57.629167

174 rows × 4 columns

Alright, now it may start to get a little confusing, so we are going to recap the process we are about to begin real quick.

We need the full dataset (minus games happening on or after January 4th) to train our model on, but we will be calculating predictions on the dataset we just finished manipulating. So we will create our features and labels numpy arrays, and split them into training and testing sets, from our initial dataset; once we get to the prediction stage, we will instead feed in the dataframe we just created.

So, on to creating our dataframe to feed our train and test datasets.

# Restrict training data to games before January 4th, matching the scenario above
dataset4 = dataset[dataset['Game_Date'] <= '2020-1-3']
dataset4 = dataset4[['Player', 'Last3', 'Last5', 'Last7', 'SeasonAve', 'FP']].set_index('Player')
featureNames = ['Last3', 'Last5', 'Last7', 'SeasonAve']
labelName = ['FP']
dfFeatures = dataset4[featureNames]
dfLabels = dataset4[labelName]
labels = np.array(dfLabels)
features = np.array(dfFeatures)

Next up, we will run a for loop similar to our last lesson's; however, we will use our new playerList2 dataframe to drive the predictions rather than our features array. The following is how you would set up your for loop if you wanted to train and test your model every. single. time. that you ran your code. This is not the most efficient approach, but it is certainly an option. Generally I like to retrain my models every couple of weeks, or when a significant change in the data has occurred, like at the beginning of the season when every new datapoint significantly alters the overall data spread.

##WHEN YOU WANT TO RETRAIN YOUR MODELS EVERY SINGLE CONTEST DAY

compareDF = playerList2.copy()
for i in range(0,10):
    n = random.randint(0,100)  # note: a repeated seed would overwrite that predict column
    train, test, trainLabels, testLabels = train_test_split(features, labels, test_size=0.4, random_state=n)
    reg = RandomForestRegressor(random_state=n)
    reg.fit(train, trainLabels.ravel())  # ravel() flattens labels to the 1d shape sklearn expects
    predictions = reg.predict(playerList2)
    compareDF[f'predict_{n}'] = predictions
compareDF.head(15)
Last3 Last5 Last7 SeasonAve predict_68 predict_96 predict_89 predict_29 predict_98 predict_92 predict_18 predict_23 predict_90 predict_86
Nickname
Giannis Antetokounmpo 50.233333 53.24 57.100000 57.629167 48.144 51.306 57.146 52.281 57.86900 57.228 56.066 51.892 57.142 55.500
Luka Doncic 49.633333 53.56 54.014286 53.890698 49.734 51.617 45.863 53.768 47.28100 55.586 51.688 53.818 53.765 54.411
Andre Drummond 48.866667 43.38 45.028571 48.035294 37.932 41.594 39.394 38.562 47.09275 44.612 42.832 37.646 37.518 45.959
Trae Young 38.233333 45.46 47.257143 47.630000 41.868 40.024 41.513 35.700 39.10100 42.033 42.509 42.135 36.573 41.767
Nikola Jokic 30.200000 38.52 40.385714 45.792727 29.098 28.041 30.592 31.797 29.65100 29.863 28.514 29.532 28.612 28.342
Rudy Gobert 36.766667 41.62 43.214286 41.232692 39.017 37.541 36.724 35.832 37.04000 37.445 37.841 36.168 35.652 35.756
Bradley Beal 36.000000 44.60 46.071429 44.806522 38.560 39.879 35.365 38.174 40.55200 37.772 40.619 38.923 42.808 39.374
Brandon Ingram 41.266667 43.38 44.171429 41.010638 33.768 34.803 34.729 36.160 34.60700 35.315 36.080 34.915 38.133 33.720
LaMarcus Aldridge 46.233333 47.98 44.814286 37.226000 41.126 44.250 42.822 43.620 45.51000 45.242 45.381 43.653 46.194 44.115
Jrue Holiday 36.200000 39.22 37.685714 39.428261 35.170 32.030 34.808 31.468 29.57300 30.936 35.881 28.728 36.331 37.532
Domantas Sabonis 44.666667 39.86 42.314286 41.486538 45.164 48.549 46.507 48.401 48.42900 48.558 48.339 47.454 48.641 47.586
Nikola Vucevic 41.933333 43.46 45.371429 41.504545 42.217 36.073 37.675 41.312 41.48200 42.468 42.038 38.837 41.434 44.360
Jayson Tatum 29.600000 33.32 39.357143 39.728000 27.192 28.217 31.819 32.099 27.76600 27.866 27.914 30.589 28.662 28.145
Zach LaVine 35.666667 38.22 40.357143 39.712727 40.118 41.449 42.567 42.143 38.98800 39.311 41.457 42.520 45.305 38.907
Shai Gilgeous-Alexander 47.600000 43.04 43.800000 35.169091 49.146 46.772 48.614 47.751 45.73300 50.535 49.171 49.988 49.694 47.597

Once our model iterations have finished running, we can go ahead and take a look at the results. You'll likely notice that these are not nearly as similar to each other as our models were last time. This is because none of these records were included in the training dataset, and as you can see, that makes a pretty big difference. When the prediction records are truly unseen, there is quite a bit more variability across models. And that's even with this small sample size of data.
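If you want to quantify that disagreement, one rough way (sketched here on a made-up stand-in for compareDF) is to look at the spread across the predict columns for each player:

```python
import pandas as pd

# Hypothetical stand-in for compareDF: three seeds' predictions for two players
compare = pd.DataFrame({
    "Last3": [50.2, 49.6],
    "predict_68": [48.1, 49.7],
    "predict_96": [51.3, 51.6],
    "predict_89": [57.1, 45.9],
}, index=["Giannis Antetokounmpo", "Luka Doncic"])

preds = compare.filter(like="predict_")  # grab only the prediction columns
mean_pred = preds.mean(axis=1)           # consensus projection per player
spread = preds.std(axis=1)               # higher std = the models disagree more
print(spread.idxmax())  # 'Giannis Antetokounmpo'
```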

dataset.sort_values('Game_Date').head()
Player Match_Up Game_Date FP Last3 Last5 Last7 SeasonAve
14396 Troy Daniels LAL @ LAC 2019-10-22 6.5 6.5 6.5 6.5 7.640625
14374 Jrue Holiday NOP @ TOR 2019-10-22 27.8 27.8 27.8 27.8 39.428261
14373 Josh Hart NOP @ TOR 2019-10-22 30.5 30.5 30.5 30.5 23.997917
14372 Jahlil Okafor NOP @ TOR 2019-10-22 12.4 12.4 12.4 12.4 17.424000
14371 JaVale McGee LAL @ LAC 2019-10-22 11.4 11.4 11.4 11.4 20.578431

Now, just for comparison, to see how accurate it really is or isn't, we are going to create a new dataframe copy of our original dataset and slice out only the games these models predicted for: the games occurring on January 4th, 2020.

datasetNew = dataset.copy()
datasetNew.set_index('Game_Date', inplace=True)
datasetNew = datasetNew[(datasetNew.index=='2020-1-4')]
datasetNew = datasetNew[['FP', 'Player']].set_index('Player')
datasetNew.head()
FP
Player
Aaron Gordon 31.4
Aaron Holiday 32.8
Alec Burks 43.9
Alex Len 39.3
Allen Crabbe 6.1

Now that we have the actual fantasy points scored on that day, let's go ahead and join all of our predictions to that dataset. Notice that we use a right join here, because we only want records that we actually have a prediction for in compareDF.

datasetNew = datasetNew.join(compareDF, how='right').drop(['Last3', 'Last5', 'Last7', 'SeasonAve'], axis=1)
datasetNew.head(25)
FP predict_68 predict_96 predict_89 predict_29 predict_98 predict_92 predict_18 predict_23 predict_90 predict_86
Nickname
Giannis Antetokounmpo 46.1 48.144 51.306 57.146 52.281 57.86900 57.228 56.066 51.892 57.142 55.500
Luka Doncic 69.4 49.734 51.617 45.863 53.768 47.28100 55.586 51.688 53.818 53.765 54.411
Andre Drummond 51.1 37.932 41.594 39.394 38.562 47.09275 44.612 42.832 37.646 37.518 45.959
Trae Young 59.8 41.868 40.024 41.513 35.700 39.10100 42.033 42.509 42.135 36.573 41.767
Nikola Jokic 33.0 29.098 28.041 30.592 31.797 29.65100 29.863 28.514 29.532 28.612 28.342
Rudy Gobert 37.4 39.017 37.541 36.724 35.832 37.04000 37.445 37.841 36.168 35.652 35.756
Bradley Beal NaN 38.560 39.879 35.365 38.174 40.55200 37.772 40.619 38.923 42.808 39.374
Brandon Ingram 31.5 33.768 34.803 34.729 36.160 34.60700 35.315 36.080 34.915 38.133 33.720
LaMarcus Aldridge 31.0 41.126 44.250 42.822 43.620 45.51000 45.242 45.381 43.653 46.194 44.115
Jrue Holiday 35.5 35.170 32.030 34.808 31.468 29.57300 30.936 35.881 28.728 36.331 37.532
Domantas Sabonis 45.2 45.164 48.549 46.507 48.401 48.42900 48.558 48.339 47.454 48.641 47.586
Nikola Vucevic 45.6 42.217 36.073 37.675 41.312 41.48200 42.468 42.038 38.837 41.434 44.360
Jayson Tatum 48.4 27.192 28.217 31.819 32.099 27.76600 27.866 27.914 30.589 28.662 28.145
Zach LaVine 53.9 40.118 41.449 42.567 42.143 38.98800 39.311 41.457 42.520 45.305 38.907
Shai Gilgeous-Alexander 32.4 49.146 46.772 48.614 47.751 45.73300 50.535 49.171 49.988 49.694 47.597
DeMar DeRozan 40.3 43.944 44.242 41.317 40.031 37.95900 39.348 38.037 39.341 39.574 38.543
John Collins NaN 39.035 38.781 41.282 39.138 30.26600 30.218 30.925 38.464 38.677 40.370
Donovan Mitchell 45.6 31.106 30.279 31.191 30.979 34.50400 37.221 34.993 30.174 34.680 30.527
Devonte' Graham 51.9 35.197 34.223 36.450 34.733 34.64500 34.877 35.350 34.472 33.593 37.336
Jaylen Brown 27.9 37.393 35.563 35.130 35.298 38.14000 36.627 33.855 34.418 34.936 35.874
De'Aaron Fox 28.1 57.422 51.091 56.294 57.122 42.39900 57.492 56.329 57.131 47.274 41.196
Khris Middleton 29.5 37.125 33.836 37.816 33.685 33.33900 33.149 33.111 33.251 32.625 35.212
Chris Paul 22.8 38.434 41.518 37.202 38.693 38.52500 38.426 38.709 37.536 37.914 35.999
Gordon Hayward 34.5 31.466 30.770 34.276 36.297 31.07300 32.245 30.507 35.598 35.456 36.008
Kevin Love 14.1 35.106 34.257 35.201 32.794 35.41200 34.797 35.469 36.260 35.687 34.653

Now, I won't bother re-running the entire thing to calculate the error for the models instead of the scores, because in this realistic-ish scenario you won't have anything to compare to. However, you'll notice there are a few records without an actual FP score; these are players who did not have an Out injury designation at the time the player list was pulled, but ended up not playing.
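If you did want to grade the models after the fact, a rough sketch (on a made-up slice standing in for datasetNew) might compute mean absolute error per model, skipping the players with no actual score:

```python
import pandas as pd

# Hypothetical slice of datasetNew: actual FP plus two model runs (NaN = didn't play)
df = pd.DataFrame({
    "FP": [46.1, 69.4, None],
    "predict_68": [48.1, 49.7, 38.6],
    "predict_96": [51.3, 51.6, 39.9],
}, index=["Giannis Antetokounmpo", "Luka Doncic", "Bradley Beal"])

scored = df.dropna(subset=["FP"])  # we can only grade players who actually played
for col in scored.filter(like="predict_").columns:
    mae = (scored[col] - scored["FP"]).abs().mean()  # mean absolute error per model
    print(col, round(mae, 2))
```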

From here, you can run any type of analysis you would like, but in the next lesson we will be going over how to run the same setup without having to train your datasets every single time.
