Monte Carlo Part 3

Applying Monte-Carlo(esque) Simulation to a Real-Life Scenario (Part 2)

In this lesson we will build upon our previous example, except this time we will train our models once, save them to disk, and load them back whenever we need them. This is a more efficient process, as training the models was one of the more time-consuming parts of our previous iteration. Offloading that to a one-time step and simply calling the trained models into service each time can save quite a bit of time, depending on how many models you are running and how complex they are.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
import random

# Load the historical game log and the day's player list
dataset = pd.read_excel("SampleDataForYT.xlsx")
#dataset.head()
playerList = pd.read_csv("20200104.csv")
#playerList.head()

# Most recent games first; drop players ruled out ('O' = out)
dataset = dataset.sort_values('Game_Date', ascending=False)
playerList = playerList[playerList['Injury Indicator'] != 'O'].set_index('Nickname')
playerList = playerList[['Team']]

# Keep only games played before the contest date
dataset2 = dataset.copy()
dataset2.set_index('Game_Date', inplace=True)
#dataset2.head()
dataset3 = dataset2[(dataset2.index <= '2020-1-3')]
#dataset3.head()
#dataset3.sort_values('Player').head(15)

# Take each player's most recent row of rolling averages
dataset3 = dataset3.groupby(["Player", "SeasonAve"]).first()
#dataset3.sort_values('Player').head(25)
dataset3 = dataset3.reset_index().set_index('Player')
#dataset3.head(25)

# Attach those averages to the day's player list and drop players with no history
playerList2 = playerList.join(dataset3, how='left')
#playerList2.head(30)
playerList2 = playerList2[['Last3', 'Last5', 'Last7', 'SeasonAve']]
#playerList2.sort_values('Last7')
playerList2.dropna(inplace=True)
#playerList2.sort_values('Last7')

# Build the feature and label arrays used for training
dataset4 = dataset[['Player', 'Last3', 'Last5', 'Last7', 'SeasonAve', 'FP']].set_index('Player')
#dataset4.head()
featureNames = ['Last3', 'Last5', 'Last7', 'SeasonAve']
labelName = ['FP']
dfFeatures = dataset4[featureNames]
#dfFeatures.head()
dfLabels = dataset4[labelName]
#dfLabels.head()
labels = np.array(dfLabels)
features = np.array(dfFeatures)
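As a side note, pandas offers `to_numpy()` as the idiomatic way to do this conversion, and pulling the label out as a 1-D array avoids sklearn's column-vector warning when fitting. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical two-row frame standing in for dataset4
df = pd.DataFrame({'Last3': [10.0, 20.0], 'FP': [12.0, 22.0]})

features = df[['Last3']].to_numpy()   # 2-D array of shape (n_samples, n_features)
labels = df['FP'].to_numpy()          # 1-D array, which sklearn regressors prefer
```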

Alright, now that we have our data prepped and ready to go, it's time to start diverging from our previous approach. The first thing we need to do is import the pickle module so we can save our models for future use.

import pickle
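pickle serializes nearly any Python object to bytes and restores it later; a quick round-trip sketch (with a plain dict standing in for a trained model):

```python
import pickle

# Any picklable object works; a dict stands in for a fitted model here
obj = {'n_estimators': 100, 'random_state': 7}

blob = pickle.dumps(obj)          # serialize to bytes (dump() writes to a file instead)
restored = pickle.loads(blob)     # deserialize back into an equivalent object

print(restored == obj)  # → True
```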

Now, the change we are going to make to our for loop is that once we fit a model to our dataset, we will pickle it and save it to a local folder.

##WHEN YOU DON'T WANT TO RETRAIN YOUR MODELS EVERY SINGLE CONTEST DAY
import os
os.makedirs('models', exist_ok=True)  # make sure the target folder exists
for i in range(0, 10):
    # Note: random.randint can repeat values, so two models may share a seed
    n = random.randint(0, 100)
    train, test, trainLabels, testLabels = train_test_split(features, labels, test_size=0.4, random_state=n)
    reg = RandomForestRegressor(random_state=n)
    reg.fit(train, trainLabels)
    # os.path.join keeps the path portable across operating systems
    with open(os.path.join('models', f'model{i}.txt'), 'wb') as myFile:
        pickle.dump(reg, myFile)
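One caveat worth flagging: `random.randint(0, 100)` can return the same value twice across the ten iterations, and two models built from the same seed are identical (you can see this in the prediction tables below, where two predict columns match exactly). If you want guaranteed-distinct seeds, `random.sample` is one way to get them; a sketch:

```python
import random

# Draw 10 distinct seeds instead of 10 independent (possibly repeating) draws
seeds = random.sample(range(101), 10)

print(len(set(seeds)))  # → 10, no duplicates
```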

Now that we have saved our models locally, we are going to check that they were all saved appropriately by importing the os module and listing the contents of the folder we saved them to.

import os
models = os.listdir('models')
models
['model0.txt',
 'model1.txt',
 'model2.txt',
 'model3.txt',
 'model4.txt',
 'model5.txt',
 'model6.txt',
 'model7.txt',
 'model8.txt',
 'model9.txt']
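One small caution: `os.listdir` returns names in arbitrary order, and plain lexicographic sorting would put `model10.txt` before `model2.txt` if you ever train more than ten models. A numeric sort key handles this; a sketch with hypothetical filenames:

```python
# Hypothetical listing with more than ten models
names = ['model10.txt', 'model2.txt', 'model0.txt']

# Sort by the integer between the 'model' prefix and the '.txt' extension
names.sort(key=lambda name: int(name[5:-4]))

print(names)  # → ['model0.txt', 'model2.txt', 'model10.txt']
```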

Now that we can see our models are indeed saved where we intended them to be, it's time to start calling them into action.

To do that, we are going to update the code from our previous lesson. We will still create a compareDF to hold our results. Then we initiate a for loop that runs once for each model in our models folder: build the full file path for the model, load the model with the pickle module, calculate predictions for our dataset, and append them to the dataframe, keeping just the numbered portion of the model name for the column title.

for model in models:
    print(os.path.join('models',model))
models\model0.txt
models\model1.txt
models\model2.txt
models\model3.txt
models\model4.txt
models\model5.txt
models\model6.txt
models\model7.txt
models\model8.txt
models\model9.txt
compareDF = playerList2.copy()
for model in models:
    modelPath = os.path.join('models', model)
    with open(modelPath, 'rb') as myFile:
        reg = pickle.load(myFile)
    predictions = reg.predict(playerList2)
    # 'model7.txt' -> '7'; slicing off the prefix and extension also handles multi-digit numbers
    compareDF[f'predict_{model[5:-4]}'] = predictions

Now we will go ahead and compare our predictions back to our dataset just as we had done previously.

compareDF.head(15)
Last3 Last5 Last7 SeasonAve predict_0 predict_1 predict_2 predict_3 predict_4 predict_5 predict_6 predict_7 predict_8 predict_9
Nickname
Giannis Antetokounmpo 50.233333 53.24 57.100000 57.629167 48.020 45.921 56.302 45.614 49.614 49.023 57.442 56.402 56.402 49.266
Luka Doncic 49.633333 53.56 54.014286 53.890698 53.414 54.092 45.207 47.589 49.594 43.641 54.040 48.112 48.112 51.154
Andre Drummond 48.866667 43.38 45.028571 48.035294 45.758 48.525 38.774 50.783 46.318 38.721 36.804 40.561 40.561 48.775
Trae Young 38.233333 45.46 47.257143 47.630000 42.454 39.503 38.494 39.167 40.804 33.771 40.109 41.764 41.764 38.013
Nikola Jokic 30.200000 38.52 40.385714 45.792727 28.880 27.623 29.457 29.628 30.016 29.194 28.878 29.591 29.591 29.557
Rudy Gobert 36.766667 41.62 43.214286 41.232692 36.224 36.721 36.584 37.523 39.966 36.719 38.337 32.877 32.877 37.733
Bradley Beal 36.000000 44.60 46.071429 44.806522 38.611 41.158 36.464 37.119 37.083 38.721 38.253 38.598 38.598 37.461
Brandon Ingram 41.266667 43.38 44.171429 41.010638 30.962 33.092 33.252 35.190 35.199 38.062 35.300 30.588 30.588 34.534
LaMarcus Aldridge 46.233333 47.98 44.814286 37.226000 45.891 44.723 45.376 42.473 45.129 44.557 45.614 45.393 45.393 44.914
Jrue Holiday 36.200000 39.22 37.685714 39.428261 29.039 30.884 29.788 33.861 28.165 30.378 30.390 29.893 29.893 33.173
Domantas Sabonis 44.666667 39.86 42.314286 41.486538 47.929 47.896 50.568 48.904 47.114 47.131 49.190 50.210 50.210 48.082
Nikola Vucevic 41.933333 43.46 45.371429 41.504545 39.500 40.864 41.355 39.099 33.890 40.335 42.799 37.660 37.660 41.401
Jayson Tatum 29.600000 33.32 39.357143 39.728000 31.504 30.233 28.405 28.372 28.621 29.331 27.792 29.397 29.397 29.901
Zach LaVine 35.666667 38.22 40.357143 39.712727 39.558 40.813 39.945 40.111 42.163 39.952 41.662 38.578 38.578 40.870
Shai Gilgeous-Alexander 47.600000 43.04 43.800000 35.169091 46.909 46.443 49.452 47.036 47.595 46.924 46.995 46.933 46.933 41.715
dataset.sort_values('Game_Date').head()
Player Match_Up Game_Date FP Last3 Last5 Last7 SeasonAve
14396 Troy Daniels LAL @ LAC 2019-10-22 6.5 6.5 6.5 6.5 7.640625
14395 Terence Davis TOR vs. NOP 2019-10-22 20.0 20.0 20.0 20.0 16.605660
14394 Serge Ibaka TOR vs. NOP 2019-10-22 19.0 19.0 19.0 19.0 30.009091
14393 Quinn Cook LAL @ LAC 2019-10-22 7.2 7.2 7.2 7.2 12.844444
14392 Patrick Patterson LAC vs. LAL 2019-10-22 7.6 7.6 7.6 7.6 11.119444
datasetNew = dataset.copy()
datasetNew.set_index('Game_Date', inplace=True)
datasetNew = datasetNew[(datasetNew.index=='2020-1-4')]
datasetNew = datasetNew[['FP', 'Player']].set_index('Player')
datasetNew.head()
FP
Player
Malik Beasley 11.0
Marvin Williams 11.2
Maurice Harkless 1.6
Matthew Dellavedova 5.2
Mason Plumlee 11.5
datasetNew = datasetNew.join(compareDF, how='right').drop(['Last3', 'Last5', 'Last7', 'SeasonAve'], axis=1)
datasetNew.head(25)
FP predict_0 predict_1 predict_2 predict_3 predict_4 predict_5 predict_6 predict_7 predict_8 predict_9
Nickname
Giannis Antetokounmpo 46.1 48.020 45.921 56.302 45.614 49.614 49.023 57.442 56.402 56.402 49.266
Luka Doncic 69.4 53.414 54.092 45.207 47.589 49.594 43.641 54.040 48.112 48.112 51.154
Andre Drummond 51.1 45.758 48.525 38.774 50.783 46.318 38.721 36.804 40.561 40.561 48.775
Trae Young 59.8 42.454 39.503 38.494 39.167 40.804 33.771 40.109 41.764 41.764 38.013
Nikola Jokic 33.0 28.880 27.623 29.457 29.628 30.016 29.194 28.878 29.591 29.591 29.557
Rudy Gobert 37.4 36.224 36.721 36.584 37.523 39.966 36.719 38.337 32.877 32.877 37.733
Bradley Beal NaN 38.611 41.158 36.464 37.119 37.083 38.721 38.253 38.598 38.598 37.461
Brandon Ingram 31.5 30.962 33.092 33.252 35.190 35.199 38.062 35.300 30.588 30.588 34.534
LaMarcus Aldridge 31.0 45.891 44.723 45.376 42.473 45.129 44.557 45.614 45.393 45.393 44.914
Jrue Holiday 35.5 29.039 30.884 29.788 33.861 28.165 30.378 30.390 29.893 29.893 33.173
Domantas Sabonis 45.2 47.929 47.896 50.568 48.904 47.114 47.131 49.190 50.210 50.210 48.082
Nikola Vucevic 45.6 39.500 40.864 41.355 39.099 33.890 40.335 42.799 37.660 37.660 41.401
Jayson Tatum 48.4 31.504 30.233 28.405 28.372 28.621 29.331 27.792 29.397 29.397 29.901
Zach LaVine 53.9 39.558 40.813 39.945 40.111 42.163 39.952 41.662 38.578 38.578 40.870
Shai Gilgeous-Alexander 32.4 46.909 46.443 49.452 47.036 47.595 46.924 46.995 46.933 46.933 41.715
DeMar DeRozan 40.3 45.846 38.644 43.254 38.582 39.691 37.868 38.269 45.505 45.505 44.297
John Collins NaN 39.538 30.167 30.732 31.315 29.619 36.632 39.941 30.114 30.114 30.765
Donovan Mitchell 45.6 29.653 30.150 30.764 35.792 37.174 31.432 35.533 30.085 30.085 31.880
Devonte' Graham 51.9 37.515 36.359 35.027 35.989 33.470 39.104 34.536 34.392 34.392 34.725
Jaylen Brown 27.9 38.922 34.664 36.564 38.599 36.291 36.200 36.422 36.972 36.972 29.734
De'Aaron Fox 28.1 43.047 59.524 41.899 56.430 49.339 44.586 43.075 45.276 45.276 44.776
Khris Middleton 29.5 41.324 32.948 37.726 33.896 32.535 32.643 37.300 32.364 32.364 36.606
Chris Paul 22.8 38.750 38.642 40.023 39.130 37.768 40.970 37.033 40.846 40.846 38.045
Gordon Hayward 34.5 35.018 31.095 31.078 30.433 31.381 30.300 31.134 32.538 32.538 34.857
Kevin Love 14.1 35.889 34.705 35.503 35.780 35.203 34.807 33.254 35.588 35.588 35.813
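With actuals and predictions side by side, one natural next step (not shown in the original lesson) is to score each model, e.g. by mean absolute error per predict column. A sketch with a hypothetical two-player frame in the same shape as datasetNew:

```python
import pandas as pd

# Hypothetical stand-in for datasetNew: actual FP plus two model predictions
df = pd.DataFrame(
    {'FP': [10.0, 20.0], 'predict_0': [12.0, 18.0], 'predict_1': [10.0, 25.0]},
    index=['Player A', 'Player B'],
)

# Mean absolute error per model; .mean() skips NaN rows (players with no actual FP)
mae = df.drop(columns='FP').sub(df['FP'], axis=0).abs().mean()

print(mae)  # predict_0 -> 2.0, predict_1 -> 2.5
```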

Conclusion

This method can be utilized for as many model iterations as you would like; we chose 10 today to keep it quick and simple. There are pros and cons to this method, just as there are for retraining the models on every use. This method is quicker, but it will not take the most recent stats into account when training. The previous method takes longer to run each time, but it always uses the most up-to-date information to train the models.
