Monte Carlo Part 2

Applying Monte Carlo(esque) Simulation to a Real-Life Scenario

Good day everyone! Today we are going to build upon our last lesson and apply what we've learned to a real-life prediction scenario for daily fantasy sports.

We will be using our tried and true SampleDataForYT dataset, but we will be slicing the data down to all games up until a specific date, then using a real player list from FanDuel for the next day to simulate actual projections.

There are a few disclaimers, though. As usual, this dataset is completely useless for actual projections; we can all do much better than simply pulling in a few rolling averages for each player. Additionally, this is not entirely a real-life scenario, as it all comes from a previous season, and the season average is still computed over the larger dataset as a whole, not the season average up until that point.

With that said, let's jump in!

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
import random

# Full-season box scores with rolling-average features already computed
dataset = pd.read_excel("SampleDataForYT.xlsx")
dataset.head()
Player Match_Up Game_Date FP Last3 Last5 Last7 SeasonAve
0 Abdel Nader OKC @ NOP 2020-02-13 19.3 11.433333 10.46 9.728571 11.038235
1 Brad Wanamaker BOS vs. LAC 2020-02-13 16.7 18.066667 17.70 21.757143 15.162745
2 Chris Paul OKC @ NOP 2020-02-13 46.6 44.333333 40.16 40.485714 36.553704
3 Daniel Theis BOS vs. LAC 2020-02-13 25.0 28.333333 22.06 23.728571 23.464583
4 Danilo Gallinari OKC @ NOP 2020-02-13 36.9 32.500000 31.56 30.914286 30.508511
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14397 entries, 0 to 14396
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Player     14397 non-null  object        
 1   Match_Up   14397 non-null  object        
 2   Game_Date  14397 non-null  datetime64[ns]
 3   FP         14397 non-null  float64       
 4   Last3      14397 non-null  float64       
 5   Last5      14397 non-null  float64       
 6   Last7      14397 non-null  float64       
 7   SeasonAve  14397 non-null  float64       
dtypes: datetime64[ns](1), float64(5), object(2)
memory usage: 899.9+ KB

For the first deviation, we will be pulling in a player list from FanDuel for January 4th, 2020. This is a completely arbitrary date, used only because I have the list handy and it falls within the date range we have already been working with.

playerList = pd.read_csv("20200104.csv")
playerList.head()
Id Position First Name Nickname Last Name FPPG Played Salary Game Team Opponent Injury Indicator Injury Details Tier Unnamed: 14 Unnamed: 15
0 42349-40199 SF Giannis Giannis Antetokounmpo Antetokounmpo 57.957575 33 12000 SA@MIL MIL SA NaN NaN NaN NaN NaN
1 42349-84669 PG Luka Luka Doncic Doncic 53.653333 30 10900 CHA@DAL DAL CHA NaN NaN NaN NaN NaN
2 42349-15557 C Andre Andre Drummond Drummond 48.348485 33 9700 DET@GS DET GS NaN NaN NaN NaN NaN
3 42349-84671 PG Trae Trae Young Young 45.159374 32 9000 IND@ATL ATL IND NaN NaN NaN NaN NaN
4 42349-55062 C Nikola Nikola Jokic Jokic 41.567648 34 8900 DEN@WAS DEN WAS NaN NaN NaN NaN NaN

As we can see, this is a standard player list from FanDuel, with two extra empty columns at the end that pandas brought in that we don't need. Before diving in, let's think about what needs to happen here.
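As a side note, one way to deal with those trailing columns is to drop everything pandas labels "Unnamed". A quick sketch on a made-up mini frame (not the real CSV):

```python
import pandas as pd

# Hypothetical mini version of the FanDuel CSV, with the same trailing junk columns
df = pd.DataFrame({
    "Nickname": ["Giannis Antetokounmpo", "Luka Doncic"],
    "Team": ["MIL", "DAL"],
    "Unnamed: 14": [None, None],
    "Unnamed: 15": [None, None],
})

# Keep only columns whose names don't start with "Unnamed"
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(list(df.columns))  # ['Nickname', 'Team']
```

In our case we don't strictly need this, since we're about to cut the frame down to a couple of columns anyway.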

First, we have a full season's worth of data here, but we only want to train on games up until the day before this player list. So we will need to slice our data based on date.

Next, we only want one data point per player on the player list, so once we have our trained model, we can feed in a single dataset with each player's most recent rolling windows.

We will be utilizing the groupby method in pandas and keeping the first record for each player after the data has been sorted from most recent to least recent.
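As a quick sketch of why that works (on a tiny made-up frame, not our real data):

```python
import pandas as pd

# Toy box scores, pre-sorted most recent game first (as our real dataset will be)
box = pd.DataFrame({
    "Player": ["Aaron Gordon", "Aaron Gordon", "Al Horford"],
    "Game_Date": pd.to_datetime(["2020-01-03", "2019-12-27", "2020-01-03"]),
    "Last3": [25.7, 38.0, 15.8],
})

# groupby preserves row order within each group, so .first() grabs
# each player's most recent game thanks to the sort
latest = box.groupby("Player").first()
print(latest.loc["Aaron Gordon", "Last3"])  # 25.7
```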

Let's start with sorting our box scores by game date.

dataset = dataset.sort_values('Game_Date', ascending=False)

Next up, we will drop all players with an injury indicator of 'O' (Out), as we know they will not be playing, and then set the index of the playerList dataframe to the player name.

playerList = playerList[playerList['Injury Indicator'] != 'O'].set_index('Nickname')

After this, if we think about the data we want to pull through from the player list, it's going to be just the player and the team. The player name is already the index, so we redefine the playerList dataframe to keep only the Team column.

playerList = playerList[['Team']]

Then we will start removing box scores that happened on January 4th and after. To accomplish this, we create a copy of our dataset dataframe, call it dataset2, and set its index to the Game_Date field.

dataset2 = dataset.copy()
dataset2.set_index('Game_Date', inplace=True)
dataset2.head()
Player Match_Up FP Last3 Last5 Last7 SeasonAve
Game_Date
2020-02-13 Abdel Nader OKC @ NOP 19.3 11.433333 10.46 9.728571 11.038235
2020-02-13 Brad Wanamaker BOS vs. LAC 16.7 18.066667 17.70 21.757143 15.162745
2020-02-13 Chris Paul OKC @ NOP 46.6 44.333333 40.16 40.485714 36.553704
2020-02-13 Daniel Theis BOS vs. LAC 25.0 28.333333 22.06 23.728571 23.464583
2020-02-13 Danilo Gallinari OKC @ NOP 36.9 32.500000 31.56 30.914286 30.508511

Now we are going to define a new dataframe (good practice when learning, so you don't accidentally overwrite anything and have to start over) and slice so that we only keep rows where the Game_Date, now the index, is less than or equal to January 3, 2020.

dataset3 = dataset2[(dataset2.index<='2020-1-3')]
dataset3.head()
Player Match_Up FP Last3 Last5 Last7 SeasonAve
Game_Date
2020-01-03 Aaron Gordon ORL vs. MIA 29.1 25.733333 31.12 27.928571 29.914286
2020-01-03 Al Horford PHI @ HOU 19.6 15.833333 21.86 23.828571 30.150000
2020-01-03 Alex Len ATL @ BOS 26.6 26.666667 26.14 22.142857 20.597436
2020-01-03 Allen Crabbe ATL @ BOS 13.2 10.600000 13.70 11.114286 10.134286
2020-01-03 Anfernee Simons POR @ WAS 4.9 10.466667 16.32 15.100000 14.944444

Now if we sort this dataframe by player name, we can see that we have every box score from each player prior to our defined date. However, we only want each player's most recent record to form our predictions.

dataset3.sort_values('Player').head(15)
Player Match_Up FP Last3 Last5 Last7 SeasonAve
Game_Date
2020-01-03 Aaron Gordon ORL vs. MIA 29.1 25.733333 31.120000 27.928571 29.914286
2019-11-13 Aaron Gordon ORL vs. PHI 48.1 31.933333 30.600000 29.800000 29.914286
2019-12-15 Aaron Gordon ORL @ NOP 17.8 30.266667 29.880000 33.014286 29.914286
2019-12-17 Aaron Gordon ORL @ UTA 14.0 21.666667 27.280000 30.128571 29.914286
2019-11-08 Aaron Gordon ORL vs. MEM 25.8 27.666667 27.720000 27.542857 29.914286
2019-12-18 Aaron Gordon ORL @ DEN 25.9 19.233333 26.140000 27.042857 29.914286
2019-11-06 Aaron Gordon ORL @ DAL 31.3 31.166667 29.700000 27.842857 29.914286
2019-12-20 Aaron Gordon ORL @ POR 33.2 24.366667 24.820000 27.928571 29.914286
2019-11-05 Aaron Gordon ORL @ OKC 25.9 27.166667 27.140000 26.785714 29.914286
2019-11-02 Aaron Gordon ORL vs. DEN 36.3 30.433333 27.540000 26.933333 29.914286
2019-12-23 Aaron Gordon ORL vs. CHI 45.2 34.766667 27.220000 29.871429 29.914286
2019-11-01 Aaron Gordon ORL vs. MIL 19.3 24.500000 25.060000 25.060000 29.914286
2019-10-30 Aaron Gordon ORL vs. NYK 35.7 27.366667 26.500000 26.500000 29.914286
2019-12-27 Aaron Gordon ORL vs. PHI 35.7 38.033333 30.800000 29.285714 29.914286
2019-10-28 Aaron Gordon ORL @ TOR 18.5 23.433333 23.433333 23.433333 29.914286

Fortunately, the season average is the same for every record of a given player, since we didn't recalculate it as a season-to-date average (in real life this would not be the case, but then we also wouldn't be slicing off data like this, so it's fine for this example). That means we can group by Player and SeasonAve and keep only the first record per group, since the data is already sorted from most recent game date to oldest.

dataset3 = dataset3.groupby(["Player","SeasonAve"]).first()
dataset3.sort_values('Player').head(25)
Match_Up FP Last3 Last5 Last7
Player SeasonAve
Aaron Gordon 29.914286 ORL vs. MIA 29.1 25.733333 31.12 27.928571
Aaron Holiday 19.195918 IND vs. DEN 19.0 24.966667 29.64 27.785714
Abdel Nader 11.038235 OKC vs. DAL 5.0 6.000000 8.58 9.000000
Al Horford 30.150000 PHI @ HOU 19.6 15.833333 21.86 23.828571
Al-Farouq Aminu 16.170588 ORL vs. TOR 12.9 11.866667 13.08 16.385714
Alec Burks 28.585714 GSW @ MIN 11.1 26.633333 25.08 31.271429
Alex Caruso 14.219149 LAL vs. DAL 12.2 14.666667 12.38 13.871429
Alex Len 20.597436 ATL @ BOS 26.6 26.666667 26.14 22.142857
Allen Crabbe 10.134286 ATL @ BOS 13.2 10.600000 13.70 11.114286
Allonzo Trier 9.850000 NYK @ WAS 0.2 7.366667 9.30 9.514286
Andre Drummond 48.035294 DET @ LAC 32.4 48.866667 43.38 45.028571
Andrew Wiggins 36.664444 MIN @ SAC 37.5 38.200000 37.12 38.557143
Anfernee Simons 14.944444 POR @ WAS 4.9 10.466667 16.32 15.100000
Anthony Davis 52.032609 LAL vs. NOP 72.1 53.033333 49.22 52.328571
Anthony Tolliver 10.405714 POR @ WAS 3.4 9.200000 9.32 9.557143
Aron Baynes 22.363636 PHX vs. NYK 39.4 28.066667 25.54 24.200000
Austin Rivers 15.638000 HOU vs. PHI 6.0 9.533333 12.48 12.400000
Avery Bradley 14.587179 LAL vs. NOP 12.9 18.133333 13.54 11.357143
Bam Adebayo 40.303704 MIA @ ORL 34.5 33.533333 33.64 37.128571
Ben McLemore 16.598077 HOU vs. PHI 3.2 12.500000 11.40 12.371429
Ben Simmons 43.598113 PHI @ HOU 79.1 53.833333 52.44 51.628571
Bismack Biyombo 18.823256 CHA @ CLE 15.2 17.833333 22.72 19.685714
Blake Griffin 26.127778 DET @ SAS 17.4 25.033333 21.58 19.900000
Bobby Portis 19.109091 NYK @ PHX 26.8 24.033333 20.10 24.342857
Bogdan Bogdanovic 24.655814 SAC vs. MEM 19.4 18.666667 24.94 25.457143

Now we can see that we no longer have any duplicates for each player, so we will go ahead and reset the index, and then set the index back to the player's name.

dataset3 = dataset3.reset_index().set_index('Player')
dataset3.head(25)
SeasonAve Match_Up FP Last3 Last5 Last7
Player
Aaron Gordon 29.914286 ORL vs. MIA 29.1 25.733333 31.12 27.928571
Aaron Holiday 19.195918 IND vs. DEN 19.0 24.966667 29.64 27.785714
Abdel Nader 11.038235 OKC vs. DAL 5.0 6.000000 8.58 9.000000
Al Horford 30.150000 PHI @ HOU 19.6 15.833333 21.86 23.828571
Al-Farouq Aminu 16.170588 ORL vs. TOR 12.9 11.866667 13.08 16.385714
Alec Burks 28.585714 GSW @ MIN 11.1 26.633333 25.08 31.271429
Alex Caruso 14.219149 LAL vs. DAL 12.2 14.666667 12.38 13.871429
Alex Len 20.597436 ATL @ BOS 26.6 26.666667 26.14 22.142857
Allen Crabbe 10.134286 ATL @ BOS 13.2 10.600000 13.70 11.114286
Allonzo Trier 9.850000 NYK @ WAS 0.2 7.366667 9.30 9.514286
Andre Drummond 48.035294 DET @ LAC 32.4 48.866667 43.38 45.028571
Andrew Wiggins 36.664444 MIN @ SAC 37.5 38.200000 37.12 38.557143
Anfernee Simons 14.944444 POR @ WAS 4.9 10.466667 16.32 15.100000
Anthony Davis 52.032609 LAL vs. NOP 72.1 53.033333 49.22 52.328571
Anthony Tolliver 10.405714 POR @ WAS 3.4 9.200000 9.32 9.557143
Aron Baynes 22.363636 PHX vs. NYK 39.4 28.066667 25.54 24.200000
Austin Rivers 15.638000 HOU vs. PHI 6.0 9.533333 12.48 12.400000
Avery Bradley 14.587179 LAL vs. NOP 12.9 18.133333 13.54 11.357143
Bam Adebayo 40.303704 MIA @ ORL 34.5 33.533333 33.64 37.128571
Ben McLemore 16.598077 HOU vs. PHI 3.2 12.500000 11.40 12.371429
Ben Simmons 43.598113 PHI @ HOU 79.1 53.833333 52.44 51.628571
Bismack Biyombo 18.823256 CHA @ CLE 15.2 17.833333 22.72 19.685714
Blake Griffin 26.127778 DET @ SAS 17.4 25.033333 21.58 19.900000
Bobby Portis 19.109091 NYK @ PHX 26.8 24.033333 20.10 24.342857
Bogdan Bogdanovic 24.655814 SAC vs. MEM 19.4 18.666667 24.94 25.457143

Now we are going to join our two dataframes together, joining the most recent predictive data to our player list data. We will be using a left join because our player list is the left dataframe in the way we have it set up. You could use a right join if you prefer by setting it up as dataset3.join(playerList, how='right').

We don't need any further parameters because the player name is the index for both dataframes; if this were not the case, we would need to specify which columns to join on.

This will move our dataset3 columns over into the playerList dataframe for every record that has a matching player name. If a player in the playerList dataframe does not match up to a player in the dataset3 dataframe, it will simply leave those columns blank. If you only want to include records where the player names match, you would want to use an inner join rather than a left/right join.
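A toy sketch of the difference, using made-up frames indexed by player name:

```python
import pandas as pd

# Hypothetical slate and stats frames, both indexed by player name
slate = pd.DataFrame({"Team": ["MIL", "DAL", "CHI"]},
                     index=["Giannis Antetokounmpo", "Luka Doncic", "Some Rookie"])
stats = pd.DataFrame({"Last3": [50.2, 49.6]},
                     index=["Giannis Antetokounmpo", "Luka Doncic"])

left = slate.join(stats, how="left")    # all 3 slate rows; NaN Last3 for the rookie
inner = slate.join(stats, how="inner")  # only the 2 matched rows
# If the names lived in columns instead of the index, you'd specify them,
# e.g. pd.merge(slate, stats, on="Nickname")
print(len(left), len(inner))  # 3 2
```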

playerList2 = playerList.join(dataset3, how='left')
playerList2.head(30)
Team SeasonAve Match_Up FP Last3 Last5 Last7
Nickname
Giannis Antetokounmpo MIL 57.629167 MIL vs. MIN 60.4 50.233333 53.24 57.100000
Luka Doncic DAL 53.890698 DAL vs. BKN 56.1 49.633333 53.56 54.014286
Andre Drummond DET 48.035294 DET @ LAC 32.4 48.866667 43.38 45.028571
Trae Young ATL 47.630000 ATL @ BOS 43.0 38.233333 45.46 47.257143
Nikola Jokic DEN 45.792727 DEN @ IND 28.4 30.200000 38.52 40.385714
Rudy Gobert UTA 41.232692 UTA @ CHI 38.4 36.766667 41.62 43.214286
Bradley Beal WAS 44.806522 WAS vs. ORL 37.3 36.000000 44.60 46.071429
Brandon Ingram NO 41.010638 NOP @ LAL 33.1 41.266667 43.38 44.171429
LaMarcus Aldridge SA 37.226000 SAS vs. OKC 44.4 46.233333 47.98 44.814286
Jrue Holiday NO 39.428261 NOP @ LAL 27.6 36.200000 39.22 37.685714
Domantas Sabonis IND 41.486538 IND vs. DEN 49.3 44.666667 39.86 42.314286
Nikola Vucevic ORL 41.504545 ORL vs. MIA 44.7 41.933333 43.46 45.371429
Jayson Tatum BOS 39.728000 BOS vs. ATL 27.3 29.600000 33.32 39.357143
Zach LaVine CHI 39.712727 CHI vs. UTA 40.3 35.666667 38.22 40.357143
Shai Gilgeous-Alexander OKC 35.169091 OKC @ SAS 48.9 47.600000 43.04 43.800000
DeMar DeRozan SA 38.976923 SAS vs. OKC 36.1 41.700000 42.14 37.942857
John Collins ATL 39.990000 ATL @ BOS 26.1 36.466667 39.72 39.985714
Donovan Mitchell UTA 37.175472 UTA @ CHI 27.3 36.766667 37.90 39.042857
Devonte' Graham CHA 34.744444 CHA @ CLE 33.9 34.666667 38.02 35.485714
Jaylen Brown BOS 32.925000 BOS vs. ATL 39.0 38.100000 37.36 39.300000
De'Aaron Fox SAC 38.942857 SAC vs. MEM 65.3 43.000000 43.48 40.542857
Khris Middleton MIL 35.570213 MIL vs. MIN 31.6 37.100000 40.00 39.000000
Chris Paul OKC 36.553704 OKC @ SAS 38.1 39.800000 41.50 37.685714
Gordon Hayward BOS 33.305405 BOS vs. ATL 29.7 34.466667 33.28 30.771429
Kevin Love CLE 34.191304 CLE vs. CHA 35.6 36.400000 36.68 35.728571
Richaun Holmes SAC 30.900000 SAC vs. MEM 31.4 35.200000 36.66 37.114286
Will Barton DEN 31.827083 DEN @ IND 35.5 35.633333 35.24 35.571429
Draymond Green GS 29.602439 GSW @ MIN 16.6 27.233333 30.56 31.128571
Buddy Hield SAC 32.250000 SAC vs. MEM 39.9 38.533333 34.18 29.314286
Jamal Murray DEN 33.845455 DEN @ IND 41.5 32.600000 28.72 29.400000

Once our join has succeeded, we will go ahead and keep only the columns we will feed into our random forest, and then take a peek at our sorted data to see if we have any blank rows from mismatched names.

playerList2 = playerList2[['Last3', 'Last5', 'Last7', 'SeasonAve']]
playerList2.sort_values('Last7')
Last3 Last5 Last7 SeasonAve
Nickname
Frank Jackson 2.566667 4.32 3.428571 10.888571
Jordan Poole 4.200000 6.52 5.014286 14.668750
Jacob Evans 8.600000 8.08 6.542857 10.250000
Kenrich Williams 4.766667 5.36 6.900000 15.900000
Nicolo Melli 3.800000 4.66 7.414286 14.125641
... ... ... ... ...
Frank Mason NaN NaN NaN NaN
Wenyen Gabriel NaN NaN NaN NaN
Mike Muscala NaN NaN NaN NaN
Drew Eubanks NaN NaN NaN NaN
Cristiano Felicio NaN NaN NaN NaN

264 rows × 4 columns

We do indeed have some blank records. These will not be able to be pushed through the random forest algorithm as they stand. We have a couple of options: we can either replace all the NaNs with a numerical value, such as 0 or the column average, or simply drop the records containing a NaN. For some datasets it may make sense to fill in a numerical value; however, in this instance we are just going to drop any row with a NaN to prepare for the random forest.
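As a quick sketch of those options on a made-up column (not our real data):

```python
import pandas as pd

# Hypothetical column with one missing value
df = pd.DataFrame({"Last7": [3.4, None, 46.1]})

filled_zero = df.fillna(0)                   # option 1: replace NaN with 0
filled_mean = df.fillna(df["Last7"].mean())  # option 2: replace NaN with the column average
dropped = df.dropna()                        # option 3: drop the incomplete rows
print(len(dropped))  # 2
```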

playerList2.dropna(inplace=True)
playerList2.sort_values('Last7')
Last3 Last5 Last7 SeasonAve
Nickname
Frank Jackson 2.566667 4.32 3.428571 10.888571
Jordan Poole 4.200000 6.52 5.014286 14.668750
Jacob Evans 8.600000 8.08 6.542857 10.250000
Kenrich Williams 4.766667 5.36 6.900000 15.900000
Nicolo Melli 3.800000 4.66 7.414286 14.125641
... ... ... ... ...
Nikola Vucevic 41.933333 43.46 45.371429 41.504545
Bradley Beal 36.000000 44.60 46.071429 44.806522
Trae Young 38.233333 45.46 47.257143 47.630000
Luka Doncic 49.633333 53.56 54.014286 53.890698
Giannis Antetokounmpo 50.233333 53.24 57.100000 57.629167

174 rows × 4 columns

Alright, now it may start to get a little confusing, so we are going to recap the process we are about to begin real quick.

We need the full dataset (minus games happening on or after January 4th) to train our model on, but we will be calculating predictions on the dataset we just finished manipulating. So we will create our features and labels numpy arrays, and split them into training and testing sets, from our initial dataset; once we get to the prediction stage, we will instead feed in the dataframe we just created.

So, on to creating our dataframe to feed our train and test datasets.

# Restrict training data to games before January 4th, matching the scenario above
dataset4 = dataset[dataset['Game_Date'] <= '2020-1-3']
dataset4 = dataset4[['Player', 'Last3', 'Last5', 'Last7', 'SeasonAve', 'FP']].set_index('Player')
featureNames = ['Last3', 'Last5', 'Last7', 'SeasonAve']
labelName = ['FP']
dfFeatures = dataset4[featureNames]
dfLabels = dataset4[labelName]
labels = np.array(dfLabels)
features = np.array(dfFeatures)

Next up, we will run a for loop similar to our last lesson's; however, we will use our new playerList2 dataframe to drive the predictions rather than our features array. The following is how you would set up your for loop if you wanted to train and test your model every. single. time. that you ran your code. This is not the most efficient approach, but it is certainly an option. Generally I like to retrain my models every couple of weeks, or when a significant change in the data has occurred, like at the beginning of the season when every new datapoint significantly alters the overall data spread.

##WHEN YOU WANT TO RETRAIN YOUR MODELS EVERY SINGLE CONTEST DAY

compareDF = playerList2.copy()
for i in range(0,10):
    n = random.randint(0,100)  # note: a repeated seed would overwrite that predict column
    train, test, trainLabels, testLabels = train_test_split(features, labels, test_size=0.4, random_state=n)
    reg = RandomForestRegressor(random_state=n)
    reg.fit(train, trainLabels.ravel())  # ravel() flattens labels to the 1d shape sklearn expects
    predictions = reg.predict(playerList2)
    compareDF[f'predict_{n}'] = predictions
compareDF.head(15)
Last3 Last5 Last7 SeasonAve predict_68 predict_96 predict_89 predict_29 predict_98 predict_92 predict_18 predict_23 predict_90 predict_86
Nickname
Giannis Antetokounmpo 50.233333 53.24 57.100000 57.629167 48.144 51.306 57.146 52.281 57.86900 57.228 56.066 51.892 57.142 55.500
Luka Doncic 49.633333 53.56 54.014286 53.890698 49.734 51.617 45.863 53.768 47.28100 55.586 51.688 53.818 53.765 54.411
Andre Drummond 48.866667 43.38 45.028571 48.035294 37.932 41.594 39.394 38.562 47.09275 44.612 42.832 37.646 37.518 45.959
Trae Young 38.233333 45.46 47.257143 47.630000 41.868 40.024 41.513 35.700 39.10100 42.033 42.509 42.135 36.573 41.767
Nikola Jokic 30.200000 38.52 40.385714 45.792727 29.098 28.041 30.592 31.797 29.65100 29.863 28.514 29.532 28.612 28.342
Rudy Gobert 36.766667 41.62 43.214286 41.232692 39.017 37.541 36.724 35.832 37.04000 37.445 37.841 36.168 35.652 35.756
Bradley Beal 36.000000 44.60 46.071429 44.806522 38.560 39.879 35.365 38.174 40.55200 37.772 40.619 38.923 42.808 39.374
Brandon Ingram 41.266667 43.38 44.171429 41.010638 33.768 34.803 34.729 36.160 34.60700 35.315 36.080 34.915 38.133 33.720
LaMarcus Aldridge 46.233333 47.98 44.814286 37.226000 41.126 44.250 42.822 43.620 45.51000 45.242 45.381 43.653 46.194 44.115
Jrue Holiday 36.200000 39.22 37.685714 39.428261 35.170 32.030 34.808 31.468 29.57300 30.936 35.881 28.728 36.331 37.532
Domantas Sabonis 44.666667 39.86 42.314286 41.486538 45.164 48.549 46.507 48.401 48.42900 48.558 48.339 47.454 48.641 47.586
Nikola Vucevic 41.933333 43.46 45.371429 41.504545 42.217 36.073 37.675 41.312 41.48200 42.468 42.038 38.837 41.434 44.360
Jayson Tatum 29.600000 33.32 39.357143 39.728000 27.192 28.217 31.819 32.099 27.76600 27.866 27.914 30.589 28.662 28.145
Zach LaVine 35.666667 38.22 40.357143 39.712727 40.118 41.449 42.567 42.143 38.98800 39.311 41.457 42.520 45.305 38.907
Shai Gilgeous-Alexander 47.600000 43.04 43.800000 35.169091 49.146 46.772 48.614 47.751 45.73300 50.535 49.171 49.988 49.694 47.597

Once our model iterations have finished running, we can go ahead and take a look at the results. You'll likely notice that these are not nearly as similar to each other as our models were last time. This is because none of these records were included in the training dataset, and as you can see, that makes a pretty big difference. When the prediction records are truly unseen, there is quite a bit more variability across models. And that's even with this small sample size of data.
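If you want to quantify that disagreement, one rough way (sketched here on a made-up stand-in for compareDF) is to look at the spread across the predict columns for each player:

```python
import pandas as pd

# Hypothetical stand-in for compareDF: three seeds' predictions for two players
compare = pd.DataFrame({
    "Last3": [50.2, 49.6],
    "predict_68": [48.1, 49.7],
    "predict_96": [51.3, 51.6],
    "predict_89": [57.1, 45.9],
}, index=["Giannis Antetokounmpo", "Luka Doncic"])

preds = compare.filter(like="predict_")  # grab only the prediction columns
mean_pred = preds.mean(axis=1)           # consensus projection per player
spread = preds.std(axis=1)               # higher std = the models disagree more
print(spread.idxmax())  # 'Giannis Antetokounmpo'
```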

dataset.sort_values('Game_Date').head()
Player Match_Up Game_Date FP Last3 Last5 Last7 SeasonAve
14396 Troy Daniels LAL @ LAC 2019-10-22 6.5 6.5 6.5 6.5 7.640625
14374 Jrue Holiday NOP @ TOR 2019-10-22 27.8 27.8 27.8 27.8 39.428261
14373 Josh Hart NOP @ TOR 2019-10-22 30.5 30.5 30.5 30.5 23.997917
14372 Jahlil Okafor NOP @ TOR 2019-10-22 12.4 12.4 12.4 12.4 17.424000
14371 JaVale McGee LAL @ LAC 2019-10-22 11.4 11.4 11.4 11.4 20.578431

Now, just for comparison, to see how accurate it really is or isn't, we are going to create a new dataframe copy of our original dataset and slice out only the games these models predicted for: the games occurring on January 4th, 2020.

datasetNew = dataset.copy()
datasetNew.set_index('Game_Date', inplace=True)
datasetNew = datasetNew[(datasetNew.index=='2020-1-4')]
datasetNew = datasetNew[['FP', 'Player']].set_index('Player')
datasetNew.head()
FP
Player
Aaron Gordon 31.4
Aaron Holiday 32.8
Alec Burks 43.9
Alex Len 39.3
Allen Crabbe 6.1

Now that we have the actual fantasy points scored on that day, let's go ahead and join all of our predictions to that dataset. Notice that we use a right join here, because we only want records that we actually have a prediction for in compareDF.

datasetNew = datasetNew.join(compareDF, how='right').drop(['Last3', 'Last5', 'Last7', 'SeasonAve'], axis=1)
datasetNew.head(25)
FP predict_68 predict_96 predict_89 predict_29 predict_98 predict_92 predict_18 predict_23 predict_90 predict_86
Nickname
Giannis Antetokounmpo 46.1 48.144 51.306 57.146 52.281 57.86900 57.228 56.066 51.892 57.142 55.500
Luka Doncic 69.4 49.734 51.617 45.863 53.768 47.28100 55.586 51.688 53.818 53.765 54.411
Andre Drummond 51.1 37.932 41.594 39.394 38.562 47.09275 44.612 42.832 37.646 37.518 45.959
Trae Young 59.8 41.868 40.024 41.513 35.700 39.10100 42.033 42.509 42.135 36.573 41.767
Nikola Jokic 33.0 29.098 28.041 30.592 31.797 29.65100 29.863 28.514 29.532 28.612 28.342
Rudy Gobert 37.4 39.017 37.541 36.724 35.832 37.04000 37.445 37.841 36.168 35.652 35.756
Bradley Beal NaN 38.560 39.879 35.365 38.174 40.55200 37.772 40.619 38.923 42.808 39.374
Brandon Ingram 31.5 33.768 34.803 34.729 36.160 34.60700 35.315 36.080 34.915 38.133 33.720
LaMarcus Aldridge 31.0 41.126 44.250 42.822 43.620 45.51000 45.242 45.381 43.653 46.194 44.115
Jrue Holiday 35.5 35.170 32.030 34.808 31.468 29.57300 30.936 35.881 28.728 36.331 37.532
Domantas Sabonis 45.2 45.164 48.549 46.507 48.401 48.42900 48.558 48.339 47.454 48.641 47.586
Nikola Vucevic 45.6 42.217 36.073 37.675 41.312 41.48200 42.468 42.038 38.837 41.434 44.360
Jayson Tatum 48.4 27.192 28.217 31.819 32.099 27.76600 27.866 27.914 30.589 28.662 28.145
Zach LaVine 53.9 40.118 41.449 42.567 42.143 38.98800 39.311 41.457 42.520 45.305 38.907
Shai Gilgeous-Alexander 32.4 49.146 46.772 48.614 47.751 45.73300 50.535 49.171 49.988 49.694 47.597
DeMar DeRozan 40.3 43.944 44.242 41.317 40.031 37.95900 39.348 38.037 39.341 39.574 38.543
John Collins NaN 39.035 38.781 41.282 39.138 30.26600 30.218 30.925 38.464 38.677 40.370
Donovan Mitchell 45.6 31.106 30.279 31.191 30.979 34.50400 37.221 34.993 30.174 34.680 30.527
Devonte' Graham 51.9 35.197 34.223 36.450 34.733 34.64500 34.877 35.350 34.472 33.593 37.336
Jaylen Brown 27.9 37.393 35.563 35.130 35.298 38.14000 36.627 33.855 34.418 34.936 35.874
De'Aaron Fox 28.1 57.422 51.091 56.294 57.122 42.39900 57.492 56.329 57.131 47.274 41.196
Khris Middleton 29.5 37.125 33.836 37.816 33.685 33.33900 33.149 33.111 33.251 32.625 35.212
Chris Paul 22.8 38.434 41.518 37.202 38.693 38.52500 38.426 38.709 37.536 37.914 35.999
Gordon Hayward 34.5 31.466 30.770 34.276 36.297 31.07300 32.245 30.507 35.598 35.456 36.008
Kevin Love 14.1 35.106 34.257 35.201 32.794 35.41200 34.797 35.469 36.260 35.687 34.653

Now, I won't bother re-running the entire thing to calculate the error for the models instead of the scores, because in this realistic-ish scenario you won't have anything to compare to. However, you'll notice there are a few records without an actual FP score; these are players who did not have an Out injury designation at the time the player list was pulled, but ended up not playing.
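If you did want to grade the models after the fact, a rough sketch (on a made-up slice standing in for datasetNew) might compute mean absolute error per model, skipping the players with no actual score:

```python
import pandas as pd

# Hypothetical slice of datasetNew: actual FP plus two model runs (NaN = didn't play)
df = pd.DataFrame({
    "FP": [46.1, 69.4, None],
    "predict_68": [48.1, 49.7, 38.6],
    "predict_96": [51.3, 51.6, 39.9],
}, index=["Giannis Antetokounmpo", "Luka Doncic", "Bradley Beal"])

scored = df.dropna(subset=["FP"])  # we can only grade players who actually played
for col in scored.filter(like="predict_").columns:
    mae = (scored[col] - scored["FP"]).abs().mean()  # mean absolute error per model
    print(col, round(mae, 2))
```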

From here, you can run any type of analysis you would like, but in the next lesson we will be going over how to run the same setup without having to train your datasets every single time.
