Monte Carlo Part 3
Applying Monte-Carlo(esque) Simulation to Real Life Scenario (Part 2)
In this lesson we build on our previous example, except this time we train our models once and save them to disk for later use. Training was one of the most time-consuming parts of the previous iteration, so doing it once and simply loading the trained models each time can save quite a bit of time, depending on how many models you are running and how complex they are.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
import random
dataset = pd.read_excel("SampleDataForYT.xlsx")
#dataset.head()
playerList = pd.read_csv("20200104.csv")
#playerList.head()
dataset = dataset.sort_values('Game_Date', ascending=False)
playerList = playerList[playerList['Injury Indicator'] != 'O'].set_index('Nickname')
playerList = playerList[[ 'Team']]
dataset2 = dataset.copy()
dataset2.set_index('Game_Date', inplace=True)
#dataset2.head()
dataset3 = dataset2[(dataset2.index<='2020-1-3')]
#dataset3.head()
#dataset3.sort_values('Player').head(15)
dataset3 = dataset3.groupby(["Player","SeasonAve"]).first()
#dataset3.sort_values('Player').head(25)
dataset3 = dataset3.reset_index().set_index('Player')
#dataset3.head(25)
playerList2 = playerList.join(dataset3, how='left')
#playerList2.head(30)
playerList2 = playerList2[['Last3', 'Last5', 'Last7', 'SeasonAve']]
#playerList2.sort_values('Last7')
playerList2.dropna(inplace=True)
#playerList2.sort_values('Last7')
dataset4 = dataset[['Player', 'Last3', 'Last5', 'Last7','SeasonAve', 'FP']].set_index('Player')
#dataset4.head()
featureNames = ['Last3', 'Last5', 'Last7', 'SeasonAve']
labelName = ['FP']
dfFeatures = dataset4[featureNames]
#dfFeatures.head()
dfLabels = dataset4[labelName]
#dfLabels.head()
# Flatten to 1-D; scikit-learn expects y as a 1-D array and warns on a column vector
labels = np.array(dfLabels).ravel()
features = np.array(dfFeatures)
Alright, now that our data is prepped and ready to go, it's time to depart from our previous approach. The first thing we need to do is import the pickle module so that we can save our models for future use.
import pickle
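As a rough illustration of what pickle does, the sketch below writes an object to a file and reads it back. A plain dictionary stands in for a fitted model so the example runs without the lesson's data; the fake_model contents and the temporary folder are assumptions made for illustration only.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model; any picklable object behaves the same way
fake_model = {"name": "model0", "seed": 42, "weights": [0.25, 0.5, 0.25]}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model0.txt")

    # 'wb' because pickle writes bytes, not text
    with open(path, "wb") as my_file:
        pickle.dump(fake_model, my_file)

    # 'rb' to read the same bytes back into a Python object
    with open(path, "rb") as my_file:
        restored = pickle.load(my_file)

print(restored == fake_model)
```

The restored object is an equal copy, not the same object in memory. Two cautions are worth keeping in mind: only unpickle files you trust, since loading a pickle can run arbitrary code, and scikit-learn models pickled under one library version are not guaranteed to load correctly under another.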
The change to our for loop is that once we fit each model to our dataset, we pickle it and save it to a local folder. Note that open() does not create folders, so the models folder has to exist before the loop runs.
##WHEN YOU DON'T WANT TO RETRAIN YOUR MODELS EVERY SINGLE CONTEST DAY
for i in range(0,10):
    # Draw a random seed so each model sees a different train/test split
    n = random.randint(0,100)
    train, test, trainLabels, testLabels = train_test_split(features, labels, test_size=(0.4), random_state = n)
    reg = RandomForestRegressor(random_state=n)
    reg.fit(train, trainLabels)
    # Pickle the fitted model to models\model{i}.txt
    with open(f'models\\model{i}.txt', 'wb') as myFile:
        pickle.dump(reg, myFile)
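Two details of the loop above are easy to trip over: open() will not create the models folder, and random.randint can hand back the same seed twice, which would produce two identical models. In the compareDF output shown later in this lesson, the predict_7 and predict_8 columns match on every displayed row, which is consistent with a repeated seed. Below is a minimal sketch of one way to guard against both, with a dictionary standing in for the fitted model and distinct_seeds as a hypothetical helper name.

```python
import os
import pickle
import random
import tempfile


def distinct_seeds(count, low=0, high=100):
    """Draw `count` different integers from low..high inclusive (hypothetical helper)."""
    return random.sample(range(low, high + 1), count)


with tempfile.TemporaryDirectory() as base:
    # Temporary location for the demo; in the lesson this would be the 'models' folder
    folder = os.path.join(base, "models")
    os.makedirs(folder, exist_ok=True)  # no error if the folder is already there

    seeds = distinct_seeds(10)
    for i, n in enumerate(seeds):
        stand_in = {"seed": n}  # placeholder for the fitted reg from the lesson
        with open(os.path.join(folder, f"model{i}.txt"), "wb") as my_file:
            pickle.dump(stand_in, my_file)

    saved = sorted(os.listdir(folder))

print(seeds)
print(saved)
```

Building the path with os.path.join also picks the right separator for the operating system, so the same code works outside Windows.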
Now that we have saved our models locally, we are going to check that they were all saved properly by importing the os module and listing the contents of the folder we saved them to.
import os
models = os.listdir('models')
models
['model0.txt',
'model1.txt',
'model2.txt',
'model3.txt',
'model4.txt',
'model5.txt',
'model6.txt',
'model7.txt',
'model8.txt',
'model9.txt']
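os.listdir returns names in arbitrary order and includes every file in the folder, not only our models. With ten models that rarely matters, but a plain alphabetical sort would put model10.txt ahead of model2.txt once there are more. Here is a small sketch of filtering and ordering the listing by model number, where model_files is a hypothetical helper and the file naming pattern is taken from the saving loop.

```python
import re

MODEL_NAME = re.compile(r"^model(\d+)\.txt$")


def model_files(names):
    """Keep only names like 'model<number>.txt', ordered by that number."""
    matched = []
    for name in names:
        m = MODEL_NAME.match(name)
        if m:
            matched.append((int(m.group(1)), name))
    return [name for _, name in sorted(matched)]


# Example listing, as os.listdir('models') might return it with a stray file
listing = ["model10.txt", "model2.txt", "notes.md", "model0.txt"]
ordered = model_files(listing)
print(ordered)  # ['model0.txt', 'model2.txt', 'model10.txt']
```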
Now that we can see our models are indeed saved where we intended them to be, it's time to start calling them into action.
To do that, we are going to update our code from the previous lesson. We still create a compareDF to hold our results. Then we run a for loop once for each model in our models folder: build the full file path for the model, load it with the pickle module, calculate predictions for our player list, and add them to the dataframe as a new column, keeping just the numbered portion of the model name in the column title.
for model in models:
    print(os.path.join('models',model))
models\model0.txt
models\model1.txt
models\model2.txt
models\model3.txt
models\model4.txt
models\model5.txt
models\model6.txt
models\model7.txt
models\model8.txt
models\model9.txt
compareDF = playerList2.copy()
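for model in models:
    modelPath = os.path.join('models',model)
    # Load the saved model instead of retraining it
    with open(modelPath, 'rb') as myFile:
        reg = pickle.load(myFile)
    predictions = reg.predict(playerList2)
    # model[5:6] is the single digit after 'model', e.g. '0' from 'model0.txt'
    compareDF[f'predict_{model[5:6]}'] = predictions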
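The slice model[5:6] keeps exactly one character, which is all we need for model0 through model9. If you save more than ten models, model10.txt and model1.txt would both map to predict_1, and whichever is loaded second would overwrite the other's column. One possible alternative, sketched here with a hypothetical column_name helper, takes everything between the 'model' prefix and the file extension.

```python
import os


def column_name(filename):
    """Build a column title from a name like 'model12.txt' (hypothetical helper)."""
    stem, _ext = os.path.splitext(filename)  # 'model12.txt' -> 'model12'
    number = stem[len("model"):]             # drop the 'model' prefix -> '12'
    return f"predict_{number}"


# The slice from the lesson, for comparison
sliced = [f"predict_{name[5:6]}" for name in ("model1.txt", "model10.txt")]
named = [column_name(name) for name in ("model1.txt", "model10.txt")]

print(sliced)  # ['predict_1', 'predict_1']
print(named)   # ['predict_1', 'predict_10']
```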
Now we will go ahead and compare our predictions back to our dataset just as we had done previously.
compareDF.head(15)
Nickname | Last3 | Last5 | Last7 | SeasonAve | predict_0 | predict_1 | predict_2 | predict_3 | predict_4 | predict_5 | predict_6 | predict_7 | predict_8 | predict_9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Giannis Antetokounmpo | 50.233333 | 53.24 | 57.100000 | 57.629167 | 48.020 | 45.921 | 56.302 | 45.614 | 49.614 | 49.023 | 57.442 | 56.402 | 56.402 | 49.266 |
Luka Doncic | 49.633333 | 53.56 | 54.014286 | 53.890698 | 53.414 | 54.092 | 45.207 | 47.589 | 49.594 | 43.641 | 54.040 | 48.112 | 48.112 | 51.154 |
Andre Drummond | 48.866667 | 43.38 | 45.028571 | 48.035294 | 45.758 | 48.525 | 38.774 | 50.783 | 46.318 | 38.721 | 36.804 | 40.561 | 40.561 | 48.775 |
Trae Young | 38.233333 | 45.46 | 47.257143 | 47.630000 | 42.454 | 39.503 | 38.494 | 39.167 | 40.804 | 33.771 | 40.109 | 41.764 | 41.764 | 38.013 |
Nikola Jokic | 30.200000 | 38.52 | 40.385714 | 45.792727 | 28.880 | 27.623 | 29.457 | 29.628 | 30.016 | 29.194 | 28.878 | 29.591 | 29.591 | 29.557 |
Rudy Gobert | 36.766667 | 41.62 | 43.214286 | 41.232692 | 36.224 | 36.721 | 36.584 | 37.523 | 39.966 | 36.719 | 38.337 | 32.877 | 32.877 | 37.733 |
Bradley Beal | 36.000000 | 44.60 | 46.071429 | 44.806522 | 38.611 | 41.158 | 36.464 | 37.119 | 37.083 | 38.721 | 38.253 | 38.598 | 38.598 | 37.461 |
Brandon Ingram | 41.266667 | 43.38 | 44.171429 | 41.010638 | 30.962 | 33.092 | 33.252 | 35.190 | 35.199 | 38.062 | 35.300 | 30.588 | 30.588 | 34.534 |
LaMarcus Aldridge | 46.233333 | 47.98 | 44.814286 | 37.226000 | 45.891 | 44.723 | 45.376 | 42.473 | 45.129 | 44.557 | 45.614 | 45.393 | 45.393 | 44.914 |
Jrue Holiday | 36.200000 | 39.22 | 37.685714 | 39.428261 | 29.039 | 30.884 | 29.788 | 33.861 | 28.165 | 30.378 | 30.390 | 29.893 | 29.893 | 33.173 |
Domantas Sabonis | 44.666667 | 39.86 | 42.314286 | 41.486538 | 47.929 | 47.896 | 50.568 | 48.904 | 47.114 | 47.131 | 49.190 | 50.210 | 50.210 | 48.082 |
Nikola Vucevic | 41.933333 | 43.46 | 45.371429 | 41.504545 | 39.500 | 40.864 | 41.355 | 39.099 | 33.890 | 40.335 | 42.799 | 37.660 | 37.660 | 41.401 |
Jayson Tatum | 29.600000 | 33.32 | 39.357143 | 39.728000 | 31.504 | 30.233 | 28.405 | 28.372 | 28.621 | 29.331 | 27.792 | 29.397 | 29.397 | 29.901 |
Zach LaVine | 35.666667 | 38.22 | 40.357143 | 39.712727 | 39.558 | 40.813 | 39.945 | 40.111 | 42.163 | 39.952 | 41.662 | 38.578 | 38.578 | 40.870 |
Shai Gilgeous-Alexander | 47.600000 | 43.04 | 43.800000 | 35.169091 | 46.909 | 46.443 | 49.452 | 47.036 | 47.595 | 46.924 | 46.995 | 46.933 | 46.933 | 41.715 |
dataset.sort_values('Game_Date').head()
| | Player | Match_Up | Game_Date | FP | Last3 | Last5 | Last7 | SeasonAve |
---|---|---|---|---|---|---|---|---|
14396 | Troy Daniels | LAL @ LAC | 2019-10-22 | 6.5 | 6.5 | 6.5 | 6.5 | 7.640625 |
14395 | Terence Davis | TOR vs. NOP | 2019-10-22 | 20.0 | 20.0 | 20.0 | 20.0 | 16.605660 |
14394 | Serge Ibaka | TOR vs. NOP | 2019-10-22 | 19.0 | 19.0 | 19.0 | 19.0 | 30.009091 |
14393 | Quinn Cook | LAL @ LAC | 2019-10-22 | 7.2 | 7.2 | 7.2 | 7.2 | 12.844444 |
14392 | Patrick Patterson | LAC vs. LAL | 2019-10-22 | 7.6 | 7.6 | 7.6 | 7.6 | 11.119444 |
datasetNew = dataset.copy()
datasetNew.set_index('Game_Date', inplace=True)
datasetNew = datasetNew[(datasetNew.index=='2020-1-4')]
datasetNew = datasetNew[['FP', 'Player']].set_index('Player')
datasetNew.head()
Player | FP |
---|---|
Malik Beasley | 11.0 |
Marvin Williams | 11.2 |
Maurice Harkless | 1.6 |
Matthew Dellavedova | 5.2 |
Mason Plumlee | 11.5 |
datasetNew = datasetNew.join(compareDF, how='right').drop(['Last3', 'Last5', 'Last7', 'SeasonAve'], axis=1)
datasetNew.head(25)
Nickname | FP | predict_0 | predict_1 | predict_2 | predict_3 | predict_4 | predict_5 | predict_6 | predict_7 | predict_8 | predict_9 |
---|---|---|---|---|---|---|---|---|---|---|---|
Giannis Antetokounmpo | 46.1 | 48.020 | 45.921 | 56.302 | 45.614 | 49.614 | 49.023 | 57.442 | 56.402 | 56.402 | 49.266 |
Luka Doncic | 69.4 | 53.414 | 54.092 | 45.207 | 47.589 | 49.594 | 43.641 | 54.040 | 48.112 | 48.112 | 51.154 |
Andre Drummond | 51.1 | 45.758 | 48.525 | 38.774 | 50.783 | 46.318 | 38.721 | 36.804 | 40.561 | 40.561 | 48.775 |
Trae Young | 59.8 | 42.454 | 39.503 | 38.494 | 39.167 | 40.804 | 33.771 | 40.109 | 41.764 | 41.764 | 38.013 |
Nikola Jokic | 33.0 | 28.880 | 27.623 | 29.457 | 29.628 | 30.016 | 29.194 | 28.878 | 29.591 | 29.591 | 29.557 |
Rudy Gobert | 37.4 | 36.224 | 36.721 | 36.584 | 37.523 | 39.966 | 36.719 | 38.337 | 32.877 | 32.877 | 37.733 |
Bradley Beal | NaN | 38.611 | 41.158 | 36.464 | 37.119 | 37.083 | 38.721 | 38.253 | 38.598 | 38.598 | 37.461 |
Brandon Ingram | 31.5 | 30.962 | 33.092 | 33.252 | 35.190 | 35.199 | 38.062 | 35.300 | 30.588 | 30.588 | 34.534 |
LaMarcus Aldridge | 31.0 | 45.891 | 44.723 | 45.376 | 42.473 | 45.129 | 44.557 | 45.614 | 45.393 | 45.393 | 44.914 |
Jrue Holiday | 35.5 | 29.039 | 30.884 | 29.788 | 33.861 | 28.165 | 30.378 | 30.390 | 29.893 | 29.893 | 33.173 |
Domantas Sabonis | 45.2 | 47.929 | 47.896 | 50.568 | 48.904 | 47.114 | 47.131 | 49.190 | 50.210 | 50.210 | 48.082 |
Nikola Vucevic | 45.6 | 39.500 | 40.864 | 41.355 | 39.099 | 33.890 | 40.335 | 42.799 | 37.660 | 37.660 | 41.401 |
Jayson Tatum | 48.4 | 31.504 | 30.233 | 28.405 | 28.372 | 28.621 | 29.331 | 27.792 | 29.397 | 29.397 | 29.901 |
Zach LaVine | 53.9 | 39.558 | 40.813 | 39.945 | 40.111 | 42.163 | 39.952 | 41.662 | 38.578 | 38.578 | 40.870 |
Shai Gilgeous-Alexander | 32.4 | 46.909 | 46.443 | 49.452 | 47.036 | 47.595 | 46.924 | 46.995 | 46.933 | 46.933 | 41.715 |
DeMar DeRozan | 40.3 | 45.846 | 38.644 | 43.254 | 38.582 | 39.691 | 37.868 | 38.269 | 45.505 | 45.505 | 44.297 |
John Collins | NaN | 39.538 | 30.167 | 30.732 | 31.315 | 29.619 | 36.632 | 39.941 | 30.114 | 30.114 | 30.765 |
Donovan Mitchell | 45.6 | 29.653 | 30.150 | 30.764 | 35.792 | 37.174 | 31.432 | 35.533 | 30.085 | 30.085 | 31.880 |
Devonte' Graham | 51.9 | 37.515 | 36.359 | 35.027 | 35.989 | 33.470 | 39.104 | 34.536 | 34.392 | 34.392 | 34.725 |
Jaylen Brown | 27.9 | 38.922 | 34.664 | 36.564 | 38.599 | 36.291 | 36.200 | 36.422 | 36.972 | 36.972 | 29.734 |
De'Aaron Fox | 28.1 | 43.047 | 59.524 | 41.899 | 56.430 | 49.339 | 44.586 | 43.075 | 45.276 | 45.276 | 44.776 |
Khris Middleton | 29.5 | 41.324 | 32.948 | 37.726 | 33.896 | 32.535 | 32.643 | 37.300 | 32.364 | 32.364 | 36.606 |
Chris Paul | 22.8 | 38.750 | 38.642 | 40.023 | 39.130 | 37.768 | 40.970 | 37.033 | 40.846 | 40.846 | 38.045 |
Gordon Hayward | 34.5 | 35.018 | 31.095 | 31.078 | 30.433 | 31.381 | 30.300 | 31.134 | 32.538 | 32.538 | 34.857 |
Kevin Love | 14.1 | 35.889 | 34.705 | 35.503 | 35.780 | 35.203 | 34.807 | 33.254 | 35.588 | 35.588 | 35.813 |
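With one prediction column per model, a natural next step is to summarize the spread across models for each player and see how far the average lands from the actual score. The sketch below rebuilds three rows from the table above by hand so it runs on its own; the column names predict_mean, predict_min, predict_max, and abs_error are assumptions, not part of the original lesson.

```python
import numpy as np
import pandas as pd

pred_cols = [f"predict_{i}" for i in range(10)]
rows = {
    "Giannis Antetokounmpo": [46.1, 48.020, 45.921, 56.302, 45.614, 49.614,
                              49.023, 57.442, 56.402, 56.402, 49.266],
    "Luka Doncic": [69.4, 53.414, 54.092, 45.207, 47.589, 49.594,
                    43.641, 54.040, 48.112, 48.112, 51.154],
    # FP is NaN for this player in the table, so the actual score is missing
    "Bradley Beal": [np.nan, 38.611, 41.158, 36.464, 37.119, 37.083,
                     38.721, 38.253, 38.598, 38.598, 37.461],
}
results = pd.DataFrame.from_dict(rows, orient="index", columns=["FP"] + pred_cols)

# One row per player: actual score plus the center and range of the predictions
summary = pd.DataFrame({
    "FP": results["FP"],
    "predict_mean": results[pred_cols].mean(axis=1),
    "predict_min": results[pred_cols].min(axis=1),
    "predict_max": results[pred_cols].max(axis=1),
})
summary["abs_error"] = (summary["FP"] - summary["predict_mean"]).abs()

# mean() skips NaN, so players without an actual score do not affect the average
mean_abs_error = summary["abs_error"].mean()

print(summary.round(3))
print(round(mean_abs_error, 3))
```

The min and max give a rough range of outcomes per player, which is the point of running several models rather than one.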
Conclusion
This method can be used for as many model iterations as you would like; we chose 10 today just to keep it quick and simple. There are pros and cons to this method, just as there are to training the models on every use. This method is quicker, but the saved models will not have seen the most recent stats. The previous method takes longer to run each time, but it always trains on the most up-to-date information.