Applying Monte-Carlo(esque) Simulation to Real Life Scenario (Part 2)

In this lesson we build on our previous example, except this time we train our models once and save them to our machines for later use. This is more efficient because training the models was one of the most time-consuming parts of our previous iteration. Making that a one-time step and simply loading the trained models each time can save quite a bit of time, depending on how many models you are running and how complex they are.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split 
import random

dataset = pd.read_excel("SampleDataForYT.xlsx")
#dataset.head()

playerList = pd.read_csv("20200104.csv")

#playerList.head()

dataset = dataset.sort_values('Game_Date', ascending=False)

playerList = playerList[playerList['Injury Indicator'] != 'O'].set_index('Nickname')

playerList = playerList[[ 'Team']]

dataset2 = dataset.copy()

dataset2.set_index('Game_Date', inplace=True)

#dataset2.head()

dataset3 = dataset2[(dataset2.index<='2020-1-3')]

#dataset3.head()

#dataset3.sort_values('Player').head(15)

dataset3 = dataset3.groupby(["Player","SeasonAve"]).first()

#dataset3.sort_values('Player').head(25)

dataset3 = dataset3.reset_index().set_index('Player')

#dataset3.head(25)

playerList2 = playerList.join(dataset3, how='left')

#playerList2.head(30)

playerList2 = playerList2[['Last3', 'Last5', 'Last7', 'SeasonAve']]

#playerList2.sort_values('Last7')

playerList2.dropna(inplace=True)

#playerList2.sort_values('Last7')

dataset4 = dataset[['Player', 'Last3', 'Last5', 'Last7','SeasonAve', 'FP']].set_index('Player')
#dataset4.head()

featureNames = ['Last3', 'Last5', 'Last7', 'SeasonAve']
labelName = ['FP']
dfFeatures = dataset4[featureNames]
#dfFeatures.head()

dfLabels = dataset4[labelName]
#dfLabels.head()

labels = np.array(dfLabels)
features = np.array(dfFeatures)

Alright, now that our data is prepped and ready to go, it's time to start departing from our previous approach. The first thing we need to do is import the pickle module so we can save our models for future use.

import pickle

Now, the change we are going to make to our for loop is that once we fit a model to our dataset, we pickle it and save it to a local folder.

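##WHEN YOU DON'T WANT TO RETRAIN YOUR MODELS EVERY SINGLE CONTEST DAY
# NOTE: the models folder must already exist in the working directory
for i in range(0,10):
    # n seeds both the train/test split and the forest, so each model differs
    n = random.randint(0,100)
    train, test, trainLabels, testLabels = train_test_split(features, labels, test_size=(0.4), random_state = n)
    reg = RandomForestRegressor(random_state=n)
    # ravel() flattens the (rows, 1) label array into the 1-D shape fit() expects
    reg.fit(train, trainLabels.ravel())
    # forward slashes work on Windows as well as macOS/Linux; 'wb' because pickle writes bytes
    with open(f'models/model{i}.txt', 'wb') as myFile:
        pickle.dump(reg, myFile)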
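
The save-and-reload round trip can also be tried on its own, without the dataset. This is only a minimal sketch: a plain dictionary stands in for a fitted RandomForestRegressor, and a temporary folder stands in for our models folder. Both are assumptions made for illustration, as are the names fake_model and restored.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model: any picklable Python object is handled the same way.
fake_model = {"name": "model0", "seed": 42}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model0.txt")

    # 'wb' = write bytes. Pickle output is binary, even if the file is named .txt.
    with open(path, "wb") as my_file:
        pickle.dump(fake_model, my_file)

    # 'rb' = read bytes. pickle.load rebuilds an equal copy of the saved object.
    with open(path, "rb") as my_file:
        restored = pickle.load(my_file)

print(restored == fake_model)
```

The reloaded object is a new copy that compares equal to the original, which is why a reloaded model can go straight to predicting without being fit again.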

Now that we have saved our models locally, we are going to make sure they were all saved properly by importing the os module and listing the contents of the folder we saved them to.

import os

models = os.listdir('models')

models

['model0.txt',
 'model1.txt',
 'model2.txt',
 'model3.txt',
 'model4.txt',
 'model5.txt',
 'model6.txt',
 'model7.txt',
 'model8.txt',
 'model9.txt']
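
One thing to keep in mind if you scale this up: os.listdir does not promise any particular order, and sorting the names alphabetically would put model10.txt ahead of model2.txt. Below is a rough sketch of one way to keep only the model files and order them by number. The listing is made up for illustration, and model_number is a hypothetical helper, not part of the lesson's code.

```python
import re

def model_number(file_name):
    """Return the integer in a name like 'model12.txt', or None if it does not match."""
    match = re.fullmatch(r"model(\d+)\.txt", file_name)
    return int(match.group(1)) if match else None

# Hypothetical folder listing: unordered, with a stray file and two-digit numbers.
listing = ["model10.txt", "model2.txt", "notes.md", "model0.txt", "model11.txt"]

# Keep only names that look like model files, then sort by the number inside the name.
model_files = sorted(
    (name for name in listing if model_number(name) is not None),
    key=model_number,
)
print(model_files)
```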

Now that we can see our models are indeed saved where we intended them to be, it's time to start calling them into action.

To do that, we are going to update the code from our previous lesson. We still create a compareDF to hold our results. Then we run our for loop once for each model in our models folder: build the full file path for the model, load the model with the pickle module, calculate its predictions for our player list, and add them to the dataframe as a new column, keeping just the numbered portion of the model name for the column title.

for model in models:
    print(os.path.join('models',model))

models\model0.txt
models\model1.txt
models\model2.txt
models\model3.txt
models\model4.txt
models\model5.txt
models\model6.txt
models\model7.txt
models\model8.txt
models\model9.txt

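compareDF = playerList2.copy()
for model in models:
    modelPath = os.path.join('models',model)
    # load the saved model back from disk instead of training it again
    with open(modelPath, 'rb') as myFile:
        reg = pickle.load(myFile)
    # predict from the same four feature columns the models were trained on
    predictions = reg.predict(playerList2)
    # 'model0.txt' -> '0'; dropping the extension first also handles model10 and up
    compareDF[f'predict_{os.path.splitext(model)[0][5:]}'] = predictions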

Now we will go ahead and compare our predictions back to our dataset, just as we did previously.

compareDF.head(15)

	Last3	Last5	Last7	SeasonAve	predict_0	predict_1	predict_2	predict_3	predict_4	predict_5	predict_6	predict_7	predict_8	predict_9
Nickname
Giannis Antetokounmpo	50.233333	53.24	57.100000	57.629167	48.020	45.921	56.302	45.614	49.614	49.023	57.442	56.402	56.402	49.266
Luka Doncic	49.633333	53.56	54.014286	53.890698	53.414	54.092	45.207	47.589	49.594	43.641	54.040	48.112	48.112	51.154
Andre Drummond	48.866667	43.38	45.028571	48.035294	45.758	48.525	38.774	50.783	46.318	38.721	36.804	40.561	40.561	48.775
Trae Young	38.233333	45.46	47.257143	47.630000	42.454	39.503	38.494	39.167	40.804	33.771	40.109	41.764	41.764	38.013
Nikola Jokic	30.200000	38.52	40.385714	45.792727	28.880	27.623	29.457	29.628	30.016	29.194	28.878	29.591	29.591	29.557
Rudy Gobert	36.766667	41.62	43.214286	41.232692	36.224	36.721	36.584	37.523	39.966	36.719	38.337	32.877	32.877	37.733
Bradley Beal	36.000000	44.60	46.071429	44.806522	38.611	41.158	36.464	37.119	37.083	38.721	38.253	38.598	38.598	37.461
Brandon Ingram	41.266667	43.38	44.171429	41.010638	30.962	33.092	33.252	35.190	35.199	38.062	35.300	30.588	30.588	34.534
LaMarcus Aldridge	46.233333	47.98	44.814286	37.226000	45.891	44.723	45.376	42.473	45.129	44.557	45.614	45.393	45.393	44.914
Jrue Holiday	36.200000	39.22	37.685714	39.428261	29.039	30.884	29.788	33.861	28.165	30.378	30.390	29.893	29.893	33.173
Domantas Sabonis	44.666667	39.86	42.314286	41.486538	47.929	47.896	50.568	48.904	47.114	47.131	49.190	50.210	50.210	48.082
Nikola Vucevic	41.933333	43.46	45.371429	41.504545	39.500	40.864	41.355	39.099	33.890	40.335	42.799	37.660	37.660	41.401
Jayson Tatum	29.600000	33.32	39.357143	39.728000	31.504	30.233	28.405	28.372	28.621	29.331	27.792	29.397	29.397	29.901
Zach LaVine	35.666667	38.22	40.357143	39.712727	39.558	40.813	39.945	40.111	42.163	39.952	41.662	38.578	38.578	40.870
Shai Gilgeous-Alexander	47.600000	43.04	43.800000	35.169091	46.909	46.443	49.452	47.036	47.595	46.924	46.995	46.933	46.933	41.715
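
Notice that the predict_7 and predict_8 columns match in every row of the table above. A likely cause is that random.randint(0,100) happened to draw the same n for both models, since the same n produces the same split and the same forest. A minimal sketch of one way to avoid that, assuming you want ten different seeds, is to draw them all up front without replacement. The variable names here are illustrative.

```python
import random

# random.sample draws without replacement, so all ten seeds are different.
# range(101) covers the same 0..100 values as random.randint(0, 100).
seeds = random.sample(range(101), 10)
print(seeds)

# By contrast, ten independent randint draws are allowed to repeat a value.
draws = [random.randint(0, 100) for _ in range(10)]
print(len(set(draws)), "distinct values out of", len(draws))
```

The training loop could then iterate with `for i, n in enumerate(seeds):` instead of drawing n inside the loop.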

dataset.sort_values('Game_Date').head()

	Player	Match_Up	Game_Date	FP	Last3	Last5	Last7	SeasonAve
14396	Troy Daniels	LAL @ LAC	2019-10-22	6.5	6.5	6.5	6.5	7.640625
14395	Terence Davis	TOR vs. NOP	2019-10-22	20.0	20.0	20.0	20.0	16.605660
14394	Serge Ibaka	TOR vs. NOP	2019-10-22	19.0	19.0	19.0	19.0	30.009091
14393	Quinn Cook	LAL @ LAC	2019-10-22	7.2	7.2	7.2	7.2	12.844444
14392	Patrick Patterson	LAC vs. LAL	2019-10-22	7.6	7.6	7.6	7.6	11.119444

datasetNew = dataset.copy()
datasetNew.set_index('Game_Date', inplace=True)
datasetNew = datasetNew[(datasetNew.index=='2020-1-4')]

datasetNew = datasetNew[['FP', 'Player']].set_index('Player')

datasetNew.head()

	FP
Player
Malik Beasley	11.0
Marvin Williams	11.2
Maurice Harkless	1.6
Matthew Dellavedova	5.2
Mason Plumlee	11.5

datasetNew = datasetNew.join(compareDF, how='right').drop(['Last3', 'Last5', 'Last7', 'SeasonAve'], axis=1)

datasetNew.head(25)

	FP	predict_0	predict_1	predict_2	predict_3	predict_4	predict_5	predict_6	predict_7	predict_8	predict_9
Nickname
Giannis Antetokounmpo	46.1	48.020	45.921	56.302	45.614	49.614	49.023	57.442	56.402	56.402	49.266
Luka Doncic	69.4	53.414	54.092	45.207	47.589	49.594	43.641	54.040	48.112	48.112	51.154
Andre Drummond	51.1	45.758	48.525	38.774	50.783	46.318	38.721	36.804	40.561	40.561	48.775
Trae Young	59.8	42.454	39.503	38.494	39.167	40.804	33.771	40.109	41.764	41.764	38.013
Nikola Jokic	33.0	28.880	27.623	29.457	29.628	30.016	29.194	28.878	29.591	29.591	29.557
Rudy Gobert	37.4	36.224	36.721	36.584	37.523	39.966	36.719	38.337	32.877	32.877	37.733
Bradley Beal	NaN	38.611	41.158	36.464	37.119	37.083	38.721	38.253	38.598	38.598	37.461
Brandon Ingram	31.5	30.962	33.092	33.252	35.190	35.199	38.062	35.300	30.588	30.588	34.534
LaMarcus Aldridge	31.0	45.891	44.723	45.376	42.473	45.129	44.557	45.614	45.393	45.393	44.914
Jrue Holiday	35.5	29.039	30.884	29.788	33.861	28.165	30.378	30.390	29.893	29.893	33.173
Domantas Sabonis	45.2	47.929	47.896	50.568	48.904	47.114	47.131	49.190	50.210	50.210	48.082
Nikola Vucevic	45.6	39.500	40.864	41.355	39.099	33.890	40.335	42.799	37.660	37.660	41.401
Jayson Tatum	48.4	31.504	30.233	28.405	28.372	28.621	29.331	27.792	29.397	29.397	29.901
Zach LaVine	53.9	39.558	40.813	39.945	40.111	42.163	39.952	41.662	38.578	38.578	40.870
Shai Gilgeous-Alexander	32.4	46.909	46.443	49.452	47.036	47.595	46.924	46.995	46.933	46.933	41.715
DeMar DeRozan	40.3	45.846	38.644	43.254	38.582	39.691	37.868	38.269	45.505	45.505	44.297
John Collins	NaN	39.538	30.167	30.732	31.315	29.619	36.632	39.941	30.114	30.114	30.765
Donovan Mitchell	45.6	29.653	30.150	30.764	35.792	37.174	31.432	35.533	30.085	30.085	31.880
Devonte' Graham	51.9	37.515	36.359	35.027	35.989	33.470	39.104	34.536	34.392	34.392	34.725
Jaylen Brown	27.9	38.922	34.664	36.564	38.599	36.291	36.200	36.422	36.972	36.972	29.734
De'Aaron Fox	28.1	43.047	59.524	41.899	56.430	49.339	44.586	43.075	45.276	45.276	44.776
Khris Middleton	29.5	41.324	32.948	37.726	33.896	32.535	32.643	37.300	32.364	32.364	36.606
Chris Paul	22.8	38.750	38.642	40.023	39.130	37.768	40.970	37.033	40.846	40.846	38.045
Gordon Hayward	34.5	35.018	31.095	31.078	30.433	31.381	30.300	31.134	32.538	32.538	34.857
Kevin Love	14.1	35.889	34.705	35.503	35.780	35.203	34.807	33.254	35.588	35.588	35.813
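
One way to put a number on this comparison is the mean absolute error of each model against the actual FP. The sketch below uses four rows and the first two prediction columns copied from the table above. Choosing mean absolute error is an assumption on our part rather than something from the previous lesson, and mean_abs_error is a hypothetical helper. Players with no actual FP for the date (NaN) are skipped.

```python
import math

# (actual FP, predict_0, predict_1) for a few rows copied from the table above.
rows = {
    "Giannis Antetokounmpo": (46.1, 48.020, 45.921),
    "Luka Doncic": (69.4, 53.414, 54.092),
    "Bradley Beal": (float("nan"), 38.611, 41.158),  # no actual FP for this date
    "Kevin Love": (14.1, 35.889, 34.705),
}

def mean_abs_error(rows, column):
    """Average |actual - prediction| over the players that have an actual FP."""
    errors = [
        abs(values[0] - values[column])
        for values in rows.values()
        if not math.isnan(values[0])
    ]
    return sum(errors) / len(errors)

# Column 1 holds predict_0 and column 2 holds predict_1 in each tuple.
for column, name in [(1, "predict_0"), (2, "predict_1")]:
    print(name, round(mean_abs_error(rows, column), 3))
```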

Conclusion

This method can be used for as many model iterations as you would like; we just chose 10 today to keep it quick and simple. There are pros and cons to this method, just as there are for retraining the models on every use. This method is quicker, but the saved models will not take the most recent stats into account, because they only know the data they were trained on. The previous method takes longer to run each time, but always uses the most up-to-date information to train the models.
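
A possible middle ground between the two methods is to reuse the saved models until they reach a certain age and only then retrain. Here is a rough sketch, assuming a seven-day cutoff picked purely for illustration. The is_stale helper is hypothetical, and a throwaway file stands in for a saved model.

```python
import os
import tempfile
import time

def is_stale(path, max_age_days):
    """True if the file at `path` was last modified more than `max_age_days` ago."""
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds > max_age_days * 24 * 60 * 60

# Demo on a throwaway file standing in for a saved model.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model0.txt")
    with open(path, "wb") as my_file:
        my_file.write(b"placeholder")

    fresh_result = is_stale(path, max_age_days=7)   # file was just written

    # Push the file's modified time 10 days into the past, then check again.
    ten_days_ago = time.time() - 10 * 24 * 60 * 60
    os.utime(path, (ten_days_ago, ten_days_ago))
    aged_result = is_stale(path, max_age_days=7)

print(fresh_result, aged_result)
```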
