High Floor and Ceiling Part 3

Now that we have a better understanding of what standard deviations and Z-scores are and how they apply to our stats, we can start working through how to put them to use and identify different types of players for different player pools.

One big issue you may have already considered is what happens when a player plays for more than one team over the course of the season. It's not uncommon for a player's productivity and consistency to change from team to team, since both depend on teammates, how often the player is used in the offensive scheme, and so on. We will take a quick look at an example here, then work toward breaking out our calculations by the combination of player and team so we have more accurate numbers to work with.

As a quick example, we will look at Marcus Morris, who had played for both the Knicks and the Clippers this season at the time the stats were pulled. He had only played a few games for the Clippers, so his standard deviation with them will likely be skewed.

import pandas as pd
df = pd.read_excel(r"C:\Users\nfwya\OneDrive\Code Backup\nba\boxScores\20200215_Pandas.xlsx")
dfM = df[df['Player'] == 'Marcus Morris Sr.']
dfM.head(17)
Player Match_Up Game_Date MIN PTS REB AST STL BLK TOV FP
49 Marcus Morris Sr. NYK vs. POR 01/01/2020 33 18 7 3 0 0 0 30.9
297 Marcus Morris Sr. NYK @ PHX 01/03/2020 35 25 3 2 2 1 0 40.6
574 Marcus Morris Sr. NYK @ LAC 01/05/2020 35 38 5 1 2 1 3 51.5
1949 Marcus Morris Sr. NYK vs. PHX 01/16/2020 33 17 3 1 0 0 1 21.1
2217 Marcus Morris Sr. NYK vs. PHI 01/18/2020 36 20 6 3 0 0 1 30.7
2476 Marcus Morris Sr. NYK @ CLE 01/20/2020 30 19 3 0 1 0 0 25.6
2717 Marcus Morris Sr. NYK vs. LAL 01/22/2020 37 20 6 1 0 0 4 24.7
2955 Marcus Morris Sr. NYK vs. TOR 01/24/2020 37 21 10 2 0 0 1 35.0
3179 Marcus Morris Sr. NYK vs. BKN 01/26/2020 34 21 4 0 1 1 1 30.8
3412 Marcus Morris Sr. NYK @ CHA 01/28/2020 35 23 4 1 1 0 4 28.3
3527 Marcus Morris Sr. NYK vs. MEM 01/29/2020 26 17 6 0 1 1 4 26.2
3894 Marcus Morris Sr. NYK @ IND 02/01/2020 35 28 6 0 1 0 2 36.2
4125 Marcus Morris Sr. NYK @ CLE 02/03/2020 36 26 2 1 0 0 1 28.9
4810 Marcus Morris Sr. LAC @ CLE 02/09/2020 22 10 4 2 3 0 1 25.8
5042 Marcus Morris Sr. LAC @ PHI 02/11/2020 35 13 5 1 0 1 1 22.5
5274 Marcus Morris Sr. LAC @ BOS 02/13/2020 42 10 8 2 0 2 2 26.6
5453 Marcus Morris Sr. NYK @ SAS 10/23/2019 39 26 4 1 3 0 1 40.3

As we can see here, Marcus was playing with the Knicks up until February 3, 2020, and starting on February 9 he is a member of the Clippers. This data was pulled in mid-February of that year, so he had only played 3 games with the Clippers at that time. His fantasy output looks pretty consistent over those 3 games, so his standard deviation with the Clippers will likely be quite different from his standard deviation with the Knicks.

Now, we need to isolate the player's team and create a new column to hold that data for each player for every game they play. In order to do that, we will utilize a lambda function as we have done in the past to pull the team abbreviation out of the match up field. Luckily for us, there is very little logic we need to build into this function, as the player's team is always listed first in the match up field regardless of whether they were home or away that day.

So let's go ahead and define that lambda function, apply it to a new column titled 'Team', then we will go ahead and calculate the mean and standard deviation using both the Player and Team fields as our index values.

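# The player's team is always the first token of Match_Up (e.g. "NYK vs. POR" -> "NYK")
teamFxn = lambda x: x['Match_Up'].split(" ")[0]
df['Team'] = df.apply(teamFxn, axis=1)
# Pass a list rather than a set so the output columns are always ordered mean, std
dfStat = df.groupby(['Player', 'Team']).FP.agg(['mean', 'std'])
df = df.merge(dfStat, left_on=['Player', 'Team'], right_index=True)
# Note: if Game_Date is stored as MM/DD/YYYY text, this sort is alphabetical, not chronological
df = df.sort_values('Game_Date')
df.head()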
Player Match_Up Game_Date MIN PTS REB AST STL BLK TOV FP Team mean std
0 Anfernee Simons POR @ NYK 01/01/2020 26 3 3 3 1 0 1 13.1 POR 14.944444 7.667123
10 CJ McCollum POR @ NYK 01/01/2020 32 17 4 4 0 0 0 27.8 POR 33.944231 10.935020
46 Kyle Korver MIL vs. MIN 01/01/2020 18 8 4 2 0 1 3 15.8 MIL 12.036957 6.266112
47 Kyle Kuzma LAL vs. PHX 01/01/2020 27 19 4 1 1 0 0 28.3 LAL 20.513636 9.673642
48 LeBron James LAL vs. PHX 01/01/2020 38 31 13 12 2 1 5 68.6 LAL 51.711765 9.848099
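
The row-wise lambda works, but the same result can also be sketched with pandas' vectorized string methods and `groupby(...).transform`, which writes the per-group mean and standard deviation straight back onto each row without a separate merge. The frame below is made up for illustration (`demo` and its values are not from the real box scores); only the column names match.

```python
import pandas as pd

# Tiny made-up frame that reuses the column names of the box-score data
demo = pd.DataFrame({
    "Player": ["A", "A", "A", "A", "B", "B"],
    "Match_Up": ["NYK vs. POR", "NYK @ PHX", "LAC @ CLE",
                 "LAC @ PHI", "POR @ NYK", "POR vs. LAL"],
    "FP": [30.0, 40.0, 25.0, 23.0, 13.0, 17.0],
})

# The team abbreviation is the first space-separated token of Match_Up
demo["Team"] = demo["Match_Up"].str.split(" ").str[0]

# transform returns one value per original row, aligned to the original index
grouped = demo.groupby(["Player", "Team"])["FP"]
demo["mean"] = grouped.transform("mean")
demo["std"] = grouped.transform("std")  # sample standard deviation (ddof=1)

print(demo)
```

Whether this is preferable to the lambda-plus-merge approach is mostly a matter of taste; on a few thousand rows both run quickly.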

Now we can go ahead and check back on our main man Marcus Morris and see if we calculated everything correctly, and if our hunch on his standard deviation being much smaller with the Clippers is true.

dfM = df[df['Player'] == 'Marcus Morris Sr.']
dfM.head(17)
Player Match_Up Game_Date MIN PTS REB AST STL BLK TOV FP Team mean std
49 Marcus Morris Sr. NYK vs. POR 01/01/2020 33 18 7 3 0 0 0 30.9 NYK 29.909302 10.926286
297 Marcus Morris Sr. NYK @ PHX 01/03/2020 35 25 3 2 2 1 0 40.6 NYK 29.909302 10.926286
574 Marcus Morris Sr. NYK @ LAC 01/05/2020 35 38 5 1 2 1 3 51.5 NYK 29.909302 10.926286
1949 Marcus Morris Sr. NYK vs. PHX 01/16/2020 33 17 3 1 0 0 1 21.1 NYK 29.909302 10.926286
2217 Marcus Morris Sr. NYK vs. PHI 01/18/2020 36 20 6 3 0 0 1 30.7 NYK 29.909302 10.926286
2476 Marcus Morris Sr. NYK @ CLE 01/20/2020 30 19 3 0 1 0 0 25.6 NYK 29.909302 10.926286
2717 Marcus Morris Sr. NYK vs. LAL 01/22/2020 37 20 6 1 0 0 4 24.7 NYK 29.909302 10.926286
2955 Marcus Morris Sr. NYK vs. TOR 01/24/2020 37 21 10 2 0 0 1 35.0 NYK 29.909302 10.926286
3179 Marcus Morris Sr. NYK vs. BKN 01/26/2020 34 21 4 0 1 1 1 30.8 NYK 29.909302 10.926286
3412 Marcus Morris Sr. NYK @ CHA 01/28/2020 35 23 4 1 1 0 4 28.3 NYK 29.909302 10.926286
3527 Marcus Morris Sr. NYK vs. MEM 01/29/2020 26 17 6 0 1 1 4 26.2 NYK 29.909302 10.926286
3894 Marcus Morris Sr. NYK @ IND 02/01/2020 35 28 6 0 1 0 2 36.2 NYK 29.909302 10.926286
4125 Marcus Morris Sr. NYK @ CLE 02/03/2020 36 26 2 1 0 0 1 28.9 NYK 29.909302 10.926286
4810 Marcus Morris Sr. LAC @ CLE 02/09/2020 22 10 4 2 3 0 1 25.8 LAC 24.966667 2.173323
5042 Marcus Morris Sr. LAC @ PHI 02/11/2020 35 13 5 1 0 1 1 22.5 LAC 24.966667 2.173323
5274 Marcus Morris Sr. LAC @ BOS 02/13/2020 42 10 8 2 0 2 2 26.6 LAC 24.966667 2.173323
5453 Marcus Morris Sr. NYK @ SAS 10/23/2019 39 26 4 1 3 0 1 40.3 NYK 29.909302 10.926286

Sure enough, Marcus' numbers for the Knicks and Clippers do differ, and his standard deviation is significantly lower with the Clippers. This brings up another point to consider: how do we account for skewed deviations when working with smaller sample sizes? And is this a big enough problem for us to even worry about?
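
As a quick sanity check before digging into sample size, the Clippers figures can be reproduced by hand from the three Clippers games in the table above. This is only a sketch; the variable names are made up for the check.

```python
import pandas as pd

# Fantasy points from the three Clippers games in the table above
lac_fp = pd.Series([25.8, 22.5, 26.6])

lac_mean = lac_fp.mean()
lac_std = lac_fp.std()            # pandas default: sample std (divides by n - 1)
lac_std_pop = lac_fp.std(ddof=0)  # population std (divides by n), for comparison

print(round(lac_mean, 6), round(lac_std, 6), round(lac_std_pop, 6))
```

The sample version matches the 2.173323 in the `std` column, which confirms that the groupby is using the n - 1 formula. With only three games, the choice of divisor alone moves the number noticeably.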

To start investigating this issue, we are going to create a new dataframe consisting of just the Player, Team, Mean, and Standard Deviation, then drop all duplicates and sort by standard deviation. This lets us see a single record for each player/team combination and review the players with the lowest standard deviations. However, we are not going to permanently drop the duplicates at this time, just create a temporary dataframe to take a quick look.

dfPlayers = df[['Player', 'Team', 'mean', 'std']]
dfPlayers.drop_duplicates().sort_values('std').head(15)
Player Team mean std
4774 Glenn Robinson III PHI 10.950000 0.353553
4765 Dewayne Dedmon ATL 31.066667 1.078579
4810 Marcus Morris Sr. LAC 24.966667 2.173323
2322 Allen Crabbe MIN 6.671429 3.526431
4902 James Ennis III ORL 9.850000 3.747666
2584 Anthony Tolliver SAC 5.550000 4.122499
2124 Chris Chiozza BKN 6.200000 4.949747
5346 Chris Chiozza WAS 14.375000 4.977880
208 Theo Pinson BKN 11.204348 5.060675
168 Matthew Dellavedova CLE 8.887500 5.075744
447 Marco Belinelli SAS 10.885366 5.539114
513 Yogi Ferrell SAC 10.096875 5.574914
318 Troy Daniels LAL 7.640625 5.629580
480 Semi Ojeleye BOS 8.165116 5.659627
112 Dzanan Musa BKN 11.570833 5.664495

Taking a look at this list of players with the lowest standard deviations, we can pretty quickly notice that it is mostly made up of players who either do not play in very many games or who we already know were traded over the course of the season. To keep track of this so we don't have to manually identify these outliers every time, we are going to introduce a new data field called Key. In this field we will concatenate the Player and Team fields, then run a quick count over our full dataframe (not the temporary one where we dropped the duplicates) to see how many games each player played for that team to produce these numbers.

# assign returns a new dataframe, which avoids the SettingWithCopyWarning that
# pandas raises when a column is set directly on a slice of df
dfPlayers = dfPlayers.assign(Key=dfPlayers['Player'] + dfPlayers['Team'])
dfPlayerTeam = dfPlayers.groupby('Key').count()
dfPlayerTeam.reset_index(inplace=True)
dfPlayerTeam = dfPlayerTeam[['Key', 'Player']]
dfPlayerTeam
Key Player
0 Aaron GordonORL 49
1 Aaron HolidayIND 49
2 Abdel NaderOKC 34
3 Al HorfordPHI 50
4 Al-Farouq AminuORL 17
... ... ...
364 Will BartonDEN 48
365 Willie Cauley-SteinDAL 7
366 Willie Cauley-SteinGSW 41
367 Yogi FerrellSAC 32
368 Zach LaVineCHI 55

369 rows × 2 columns
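
The concatenated Key does the job here. As an alternative sketch, the same count can be taken by grouping on the two columns directly, which skips the string key entirely (a key built by gluing two strings together with no separator could, in principle, collide for different player/team pairs). The frame and its values below are made up for illustration.

```python
import pandas as pd

# Made-up rows: one row per game, as in the full (non-deduplicated) dataframe
demo = pd.DataFrame({
    "Player": ["A", "A", "A", "A", "B", "B"],
    "Team": ["NYK", "NYK", "NYK", "LAC", "POR", "POR"],
    "FP": [30.9, 40.6, 51.5, 25.8, 13.1, 27.8],
})

# size() counts rows per (Player, Team) group, i.e. games played for that team
games = demo.groupby(["Player", "Team"]).size().reset_index(name="games")

print(games)
```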

Now that we have the number of games each player played for their respective team at the time this data was pulled, we can go ahead and add this column to our main dataframe for analysis.

dfPlayers = dfPlayers.merge(dfPlayerTeam, on=['Key'])
dfPlayers.columns = ['Player', 'Team', 'mean', 'std', 'Key', 'games']
dfPlayers.drop_duplicates().head()
Player Team mean std Key games
0 Anfernee Simons POR 14.944444 7.667123 Anfernee SimonsPOR 54
54 CJ McCollum POR 33.944231 10.935020 CJ McCollumPOR 52
106 Kyle Korver MIL 12.036957 6.266112 Kyle KorverMIL 46
152 Kyle Kuzma LAL 20.513636 9.673642 Kyle KuzmaLAL 44
196 LeBron James LAL 51.711765 9.848099 LeBron JamesLAL 51

After that, we can now see how many games a player has played to formulate these metrics. Let's go ahead and stick with our Marcus Morris example to see how that works.

dfPlayers.drop_duplicates().loc[dfPlayers['Player'] == 'Marcus Morris Sr.']
Player Team mean std Key games
247 Marcus Morris Sr. NYK 29.909302 10.926286 Marcus Morris Sr.NYK 43
14049 Marcus Morris Sr. LAC 24.966667 2.173323 Marcus Morris Sr.LAC 3

Now that we have the number of games as a metric, we can easily see that we shouldn't put much faith in the small standard deviation for Marcus with the Clippers, since he only has 3 games under his belt contributing to it. We can probably feel confident about his average since it is fairly close to his previous average with the Knicks, as long as there hasn't been a major injury or anything else elevating his playing time in these 3 games (but it's the Clippers, so let's be real: PG and Kawhi only play when they feel like it).
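
To get a feel for how unstable a 3-game standard deviation is, here is a small sketch that takes the 13 January and February Knicks games from the table above and computes the standard deviation of every run of 3 consecutive games. The variable names are made up for this illustration.

```python
import pandas as pd

# Fantasy points from the 13 January-February Knicks games in the table above
nyk_fp = pd.Series([30.9, 40.6, 51.5, 21.1, 30.7, 25.6, 24.7,
                    35.0, 30.8, 28.3, 26.2, 36.2, 28.9])

# Sample standard deviation over each window of 3 consecutive games;
# the first two positions have no full window, so they are dropped
rolling_std = nyk_fp.rolling(window=3).std().dropna()

print(rolling_std.round(2).tolist())
print("smallest 3-game std:", round(rolling_std.min(), 2))
print("largest 3-game std:", round(rolling_std.max(), 2))
```

Depending on which 3 games you happen to catch, the same player on the same team can look either very steady or very volatile, which is why the 3-game Clippers number shouldn't be read as a change in consistency.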

Now that we have the metrics we need, we're going to go ahead and create a new permanent dataframe, with the duplicates dropped for good this time, and we are also going to filter out players that have played fewer than 20 games for their respective team. I arbitrarily picked 20 games here; the right cutoff will depend on how many games have been played in the season and on your comfort with smaller sample sizes.

df20 = dfPlayers[dfPlayers['games'] >= 20].drop_duplicates()
df20.describe()
mean std games
count 317.000000 317.000000 317.000000
mean 23.719149 9.421841 43.952681
std 10.065573 1.846430 8.967291
min 7.640625 5.060675 20.000000
25% 15.954545 8.100564 38.000000
50% 21.855769 9.519882 46.000000
75% 29.602439 10.471087 52.000000
max 57.629167 15.366266 57.000000

As we can see here, filtering out players with fewer than 20 games leaves us with a much more reasonable set of standard deviations: the lowest is 5.06 and the average is 9.4. Let's go ahead and take a look at the 15 players with the lowest standard deviations in this filtered subset.

df20.sort_values('std').head(15)
Player Team mean std Key games
8222 Theo Pinson BKN 11.204348 5.060675 Theo PinsonBKN 23
7957 Matthew Dellavedova CLE 8.887500 5.075744 Matthew DellavedovaCLE 40
12068 Marco Belinelli SAS 10.885366 5.539114 Marco BelinelliSAS 41
12440 Yogi Ferrell SAC 10.096875 5.574914 Yogi FerrellSAC 32
10097 Troy Daniels LAL 7.640625 5.629580 Troy DanielsLAL 32
12221 Semi Ojeleye BOS 8.165116 5.659627 Semi OjeleyeBOS 43
5919 Dzanan Musa BKN 11.570833 5.664495 Dzanan MusaBKN 24
8551 Terrance Ferguson OKC 10.105000 5.686419 Terrance FergusonOKC 40
4667 Justin Jackson DAL 11.520930 5.969588 Justin JacksonDAL 43
9820 Mike Scott PHI 11.566038 6.128965 Mike ScottPHI 53
5754 Georges Niang UTA 10.667442 6.136996 Georges NiangUTA 43
12949 Ryan Arcidiacono CHI 10.051064 6.157494 Ryan ArcidiaconoCHI 47
12595 Noah Vonleh MIN 12.916000 6.225129 Noah VonlehMIN 25
14098 Jeff Green UTA 13.306667 6.254154 Jeff GreenUTA 30
106 Kyle Korver MIL 12.036957 6.266112 Kyle KorverMIL 46

Looking at this subset of players, it's not too hard to figure out why the standard deviations are so low. They just don't play a ton, and when they do they aren't a primary option, so they typically get the same looks and the same minutes every game that they do play. Another thing that may be beneficial for creating player pools would be filtering out players with an average below a certain threshold just like we did for players below the number of games threshold.

df2020 = df20[df20['mean'] >= 20]
df2020.sort_values('std').head(15)
Player Team mean std Key games
976 Terrence Ross ORL 22.060377 6.974330 Terrence RossORL 53
5031 Danilo Gallinari OKC 30.508511 7.040313 Danilo GallinariOKC 47
1992 Evan Fournier ORL 28.357407 7.425265 Evan FournierORL 54
3823 Collin Sexton CLE 28.822222 7.483710 Collin SextonCLE 54
5375 Dorian Finney-Smith DAL 21.092727 7.626364 Dorian Finney-SmithDAL 55
3380 Cedi Osman CLE 20.314815 7.631771 Cedi OsmanCLE 54
1760 Donte DiVincenzo MIL 22.754167 7.777024 Donte DiVincenzoMIL 48
7703 Maxi Kleber DAL 20.112963 7.798283 Maxi KleberDAL 54
5873 Gary Harris DEN 20.600000 7.953532 Gary HarrisDEN 46
12399 Willie Cauley-Stein GSW 23.634146 8.024388 Willie Cauley-SteinGSW 41
3985 Bojan Bogdanovic UTA 28.577358 8.046156 Bojan BogdanovicUTA 53
3471 Joe Harris BKN 23.548077 8.067254 Joe HarrisBKN 52
11908 Garrett Temple BKN 20.454167 8.076469 Garrett TempleBKN 48
5706 Glenn Robinson III GSW 23.847917 8.100564 Glenn Robinson IIIGSW 48
5988 Ivica Zubac LAC 20.785455 8.119374 Ivica ZubacLAC 55

These are going to be the players with the lowest standard deviations after we filter out all players with a per game average of less than 20 fpts. These would be good filler players to take a look at if they are cheap and have a favorable matchup because you can be pretty confident with their floor.
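
One way to put a number on that floor is sketched below, using the first three rows of the table above. Treating the floor as one standard deviation below the mean (and the ceiling as one above) is an assumption made for this illustration, not a fixed definition; a different multiple may suit your risk tolerance better.

```python
import pandas as pd

# Means and standard deviations copied from the first three rows of the table above
pool = pd.DataFrame({
    "Player": ["Terrence Ross", "Danilo Gallinari", "Evan Fournier"],
    "mean": [22.060377, 30.508511, 28.357407],
    "std": [6.974330, 7.040313, 7.425265],
})

# Assumption for illustration: floor/ceiling sit one std below/above the mean
pool["floor"] = pool["mean"] - pool["std"]
pool["ceiling"] = pool["mean"] + pool["std"]

# Highest floor first
print(pool.sort_values("floor", ascending=False))
```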

Let's also take a look at the players with the highest per game average and see how much different their standard deviations are.

df2020.sort_values('mean', ascending=False).head(15)
Player Team mean std Key games
2220 Giannis Antetokounmpo MIL 57.629167 10.903698 Giannis AntetokounmpoMIL 48
10935 James Harden HOU 57.403922 14.117804 James HardenHOU 51
9542 Luka Doncic DAL 53.890698 12.238519 Luka DoncicDAL 43
3000 Anthony Davis LAL 52.032609 13.066106 Anthony DavisLAL 46
196 LeBron James LAL 51.711765 9.848099 LeBron JamesLAL 51
13659 Karl-Anthony Towns MIN 49.260000 11.960086 Karl-Anthony TownsMIN 35
9999 Russell Westbrook HOU 49.133333 11.790963 Russell WestbrookHOU 45
4528 Kawhi Leonard LAC 48.547619 9.790494 Kawhi LeonardLAC 42
4758 Andre Drummond DET 48.367347 12.757749 Andre DrummondDET 49
9894 Trae Young ATL 47.630000 15.366266 Trae YoungATL 50
2946 Damian Lillard POR 47.620370 12.798178 Damian LillardPOR 54
13516 Kyrie Irving BKN 46.130000 14.420712 Kyrie IrvingBKN 20
8699 Nikola Jokic DEN 45.792727 13.050612 Nikola JokicDEN 55
10394 Joel Embiid PHI 45.474359 11.946123 Joel EmbiidPHI 39
1243 Bradley Beal WAS 44.806522 11.874144 Bradley BealWAS 46

The players that make up the highest average fpts list are no surprise, but some of the standard deviations may be a little surprising. While most of them are within a few points of each other, the differences can help drive which high-dollar players to roll with in GPPs vs. cash games, depending on their matchups.

Now, for strictly GPP purposes, let's take a look at the players with the HIGHEST standard deviations to see the high-risk plays.

df2020.sort_values('std', ascending=False).head(15)
Player Team mean std Key games
9894 Trae Young ATL 47.630000 15.366266 Trae YoungATL 50
13516 Kyrie Irving BKN 46.130000 14.420712 Kyrie IrvingBKN 20
10935 James Harden HOU 57.403922 14.117804 James HardenHOU 51
8245 Paul George LAC 37.467647 13.513511 Paul GeorgeLAC 34
13483 D'Angelo Russell GSW 37.981818 13.449918 D'Angelo RussellGSW 33
11014 Jabari Parker ATL 28.350000 13.209039 Jabari ParkerATL 32
2268 Hassan Whiteside POR 42.776471 13.098131 Hassan WhitesidePOR 51
3000 Anthony Davis LAL 52.032609 13.066106 Anthony DavisLAL 46
8699 Nikola Jokic DEN 45.792727 13.050612 Nikola JokicDEN 55
2591 Jordan McRae WAS 24.307143 12.865974 Jordan McRaeWAS 28
2872 Gorgui Dieng MIN 20.515556 12.837462 Gorgui DiengMIN 45
3577 Christian Wood DET 22.750980 12.827180 Christian WoodDET 51
2946 Damian Lillard POR 47.620370 12.798178 Damian LillardPOR 54
4758 Andre Drummond DET 48.367347 12.757749 Andre DrummondDET 49
10433 Clint Capela HOU 38.753846 12.661937 Clint CapelaHOU 39

This grouping of players is a nice mix of high performers and middle-of-the-pack guys, which is good since it demonstrates that there are valuable high-risk plays at all salary levels. These picks will be very matchup dependent, and likely lineup dependent for guys like Jabari Parker and Gorgui Dieng. But it is always good to know which players' output swings the most in those scenarios so you can factor that into your lineups.

As with every other metric and predictive measure, there is no one correct metric to use, merely more tools to put into your toolbox and use when appropriate. Now you can add standard deviation to that list, start looking at which types of players are consistent, and factor in how likely they are to score within a certain range given a favorable or unfavorable matchup.
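
As a rough sketch of that last idea, the mean and standard deviation can be turned into an estimated chance of reaching a target score by converting the target to a Z-score. This assumes fantasy points are roughly normally distributed, which is a simplification, and `prob_at_least` is a made-up helper name for this illustration. The means and standard deviations are taken from the tables above.

```python
from math import erf, sqrt

def prob_at_least(threshold, mean, std):
    """Estimated P(score >= threshold), assuming a normal distribution."""
    z = (threshold - mean) / std
    # Upper-tail probability of the standard normal at z
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

# Trae Young and LeBron James, using the mean/std values from the tables above
trae_70 = prob_at_least(70, 47.630000, 15.366266)
lebron_70 = prob_at_least(70, 51.711765, 9.848099)

print(f"Trae Young 70+ fpts: {trae_70:.1%}")
print(f"LeBron James 70+ fpts: {lebron_70:.1%}")
```

Under this assumption, the lower average with the wider spread ends up with the better shot at a 70-point night, which is the kind of trade-off that matters for GPPs.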
