Draft Bargains 2020 (Sleeper)

Last updated on Aug 18, 2020 16 min read R, Webscraping

(EDIT (01.09.2020): Updated Version, since today is my Fantasy Draft)

Every year I compare the rankings on the fantasy football site I play on with consensus expert rankings in order to find exploits.
If a player goes way earlier than experts rank him, I will probably not draft him at all, since he is not a good value.
If a player goes way later than experts project, I might draft him, and I can probably get him later than his ranking suggests, hence the name Draft Bargain.
In this post we will compare the consensus expert ranking on FantasyPros with the previously scraped in-draft rankings from Sleeper. If you want to know how to do the scraping, read my latest blog post.

Data Gathering

First, we should load our scraped mock ranks. Since we are only interested in skill positions and quarterbacks, we remove everything else.

library(tidyverse)  # read_csv(), dplyr verbs, stringr helpers

sleeper_ranks <- 
  #read_csv("https://maxhuebner.github.io/post/data/sleeper-mock-ranks-2020-09-01.csv") %>%
  read_csv("https://raw.githubusercontent.com/maxhuebner/maxhuebner.github.io/master/post/data/sleeper-mock-ranks-2020-09-01.csv") %>% 
  filter(pos %in% c("QB","RB","WR","TE"))

Next, we need data to compare against. Let’s scrape it from the FantasyPros website using the package rvest. The data is stored in a <table> tag, so we can use rvest’s function html_table().

library(rvest)  # read_html(), html_table()

fantasypros_url <- "https://www.fantasypros.com/nfl/rankings/half-point-ppr-cheatsheets.php"

fp_html <- read_html(fantasypros_url) %>% 
  html_table(fill = TRUE) %>% 
  .[[1]] %>% 
  as_tibble() %>% 
  janitor::clean_names()

fp_html

## # A tibble: 510 x 12
##    rank  wsid  overall_team pos   bye   best  worst avg   std_dev adp   vs_adp
##    <chr> <chr> <chr>        <chr> <chr> <chr> <chr> <chr> <chr>   <chr> <chr> 
##  1 &nbsp "&nb~ "&nbsp"      "&nb~ "&nb~ "&nb~ "&nb~ "&nb~ "&nbsp" "&nb~ "&nbs~
##  2 Tier~ ""    ""           ""    ""    ""    ""    ""    ""      ""    ""    
##  3 1     ""    "Christian ~ "RB1" "13"  "1"   "4"   "1.1" "0.3"   "1.0" "0.0" 
##  4 2     ""    "Saquon Bar~ "RB2" "11"  "1"   "5"   "2.1" "0.5"   "2.0" "0.0" 
##  5 3     ""    "Ezekiel El~ "RB3" "10"  "2"   "17"  "3.5" "1.7"   "3.0" "0.0" 
##  6 4     ""    "Alvin Kama~ "RB4" "6"   "3"   "12"  "4.7" "1.3"   "5.0" "+1.0"
##  7 5     ""    "Michael Th~ "WR1" "6"   "3"   "14"  "6.2" "2.6"   "4.0" "-1.0"
##  8 6     ""    "Dalvin Coo~ "RB5" "7"   "2"   "21"  "6.8" "2.5"   "7.0" "+1.0"
##  9 Tier~ ""    ""           ""    ""    ""    ""    ""    ""      ""    ""    
## 10 7     ""    "Derrick He~ "RB6" "7"   "2"   "28"  "8.3" "3.2"   "6.0" "-1.0"
## # ... with 500 more rows, and 1 more variable: notes <lgl>
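
To see why every column above comes back as <chr>, here is a minimal sketch on a hand-written table. The HTML string is a made-up stand-in for the real page (its layout is an assumption), but it shows the effect: a single "Tier" row is enough to keep the rank column as text.

```r
library(rvest)

# Hypothetical, hand-written HTML that only imitates the page's layout
html <- "<table>
  <tr><th>Rank</th><th>Overall (Team)</th><th>Pos</th></tr>
  <tr><td>Tier 1</td><td></td><td></td></tr>
  <tr><td>1</td><td>Christian McCaffrey CAR</td><td>RB1</td></tr>
</table>"

# html_table() returns one data frame per <table>; we take the first
toy <- read_html(html) %>%
  html_table() %>%
  .[[1]]

# The "Tier 1" row keeps the first column as character
str(toy)
```

That is why the cleaning step has to convert rank to numeric by hand.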

Unfortunately the table is not perfect, so we need to do a bit of data cleaning:

Drop rows whose rank is not numeric (as.numeric(rank) turns non-numeric values into NA)
Extract the name by matching everything up to the last . and trimming the last 2 characters
Extract the position by removing the number
Extract the team by matching at least two consecutive capital letters (only team abbreviations should match this description)

After that, we fix a few outliers so that our data is acceptable.

fp_ranks <- fp_html %>% 
  mutate(rank = as.numeric(rank),
         name = str_extract(overall_team, ".*\\."),
         name = str_sub(name, end = -3),
         pos = str_extract(pos, "[:upper:]*"),
         team = str_extract(overall_team, "[:upper:]{2,}")) %>% 
  filter(!is.na(rank),
         pos %in% c("QB","RB","WR","TE")) %>% 
  select(rank, name, pos, team) %>%
  mutate(team = if_else(team == "JAC", "JAX", team),
         team = if_else(name == "Mark Ingram II", "BAL", team)) %>% 
  add_count(team) %>% 
  filter(n > 2) %>% 
  select(-n)

fp_ranks

## # A tibble: 412 x 4
##     rank name                pos   team 
##    <dbl> <chr>               <chr> <chr>
##  1     1 Christian McCaffrey RB    CAR  
##  2     2 Saquon Barkley      RB    NYG  
##  3     3 Ezekiel Elliott     RB    DAL  
##  4     4 Alvin Kamara        RB    NO   
##  5     5 Michael Thomas      WR    NO   
##  6     6 Dalvin Cook         RB    MIN  
##  7     7 Derrick Henry       RB    TEN  
##  8     8 Davante Adams       WR    GB   
##  9     9 Joe Mixon           RB    CIN  
## 10    10 Julio Jones         WR    ATL  
## # ... with 402 more rows
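
The cleaning steps are easier to follow on a few single values. This is only a sketch: the raw strings below are assumptions about what the overall_team cells look like (full name, abbreviated name and team run together), not copies from the page.

```r
library(stringr)

# Assumed raw cell values; the real cells may differ
raw <- c("Christian McCaffreyC. McCaffrey CAR",
         "D.J. MooreD. Moore CAR",
         "Mark Ingram IIM. Ingram II BAL")

# Greedy match up to the last period, then drop the trailing initial and period
name <- str_sub(str_extract(raw, ".*\\."), end = -3)

# First run of at least two consecutive capital letters
team <- str_extract(raw, "[:upper:]{2,}")

# Position label without its number
pos <- str_extract(c("RB1", "WR12"), "[:upper:]*")

# Non-numeric ranks turn into NA (R also emits a coercion warning)
rank <- suppressWarnings(as.numeric(c("Tier 1", "1", "2")))

name  # "Christian McCaffrey" "D.J. Moore" "Mark Ingram II"
team  # "CAR" "CAR" "IIM"
```

If the cells really look like this, the "II" suffix fools the team pattern, which is presumably why Mark Ingram II gets his team set by hand and why teams with very few players are filtered out.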

Merging Data

In order to compare the data, we should merge it into a single data frame (or in this case, tibble). We want to join by name, position and team so that every player has a sleeper_rank and an fp_rank.
The only problem: the names don’t match up perfectly. To fix that, we use the awesome package fuzzyjoin by David Robinson, together with the package stringdist.

library(fuzzyjoin)
library(stringdist)

adp_tibble <- fp_ranks %>% 
  fuzzy_left_join(sleeper_ranks,
                  by = c("pos", "team", "name"),
                  # one match function per join column, in the same order as `by`
                  match_fun = list(`==`, `==`,
                                   function(x, y) stringdist(tolower(x), tolower(y),
                                                             method = "osa") <= 6)) %>%
  select(name = name.x, pos = pos.x, team = team.x,
         fp = rank.x, sleeper = rank.y) %>%
  filter(!is.na(sleeper)) %>%
  mutate(diff = sleeper-fp,
         category = as.factor(ifelse(diff > 0, "Steal", "Overhyped"))) %>% 
  arrange(abs(diff)) %>% 
  distinct(name, team, pos, .keep_all = T)

adp_tibble

## # A tibble: 255 x 7
##    name                pos   team     fp sleeper  diff category 
##    <chr>               <chr> <chr> <dbl>   <dbl> <dbl> <fct>    
##  1 Christian McCaffrey RB    CAR       1       1     0 Overhyped
##  2 Saquon Barkley      RB    NYG       2       2     0 Overhyped
##  3 Joe Mixon           RB    CIN       9       9     0 Overhyped
##  4 Kenyan Drake        RB    ARI      17      17     0 Overhyped
##  5 Kareem Hunt         RB    CLE      62      62     0 Overhyped
##  6 Matt Breida         RB    MIA      86      86     0 Overhyped
##  7 Larry Fitzgerald    WR    ARI     183     183     0 Overhyped
##  8 Brian Hill          RB    ATL     281     281     0 Overhyped
##  9 Ezekiel Elliott     RB    DAL       3       4     1 Steal    
## 10 Alvin Kamara        RB    NO        4       3    -1 Overhyped
## # ... with 245 more rows

We have three join columns. This fuzzy_left_join() works the following way:
The first and second columns (pos and team) are joined by exact match with ==. The third column (name) is joined by a function that returns TRUE if the stringdist of the two names is less than or equal to 6.
Example: stringdist("Patrick Mahomes", "Pat Mahomes") would be 4, so it would still match our superstar quarterback. A distance of 6 seems like a lot, but we can’t match everyone with a smaller value, and because position and team must also match exactly, the number of false matches is minimal.
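
Here is a quick sketch of the name comparison on its own. name_dist() is just a helper made up for this example; it wraps the same stringdist() call as the join. The second pair is an illustrative choice of two different players, not taken from the data.

```r
library(stringdist)

# Hypothetical helper mirroring the name comparison used in the join
name_dist <- function(x, y) {
  stringdist(tolower(x), tolower(y), method = "osa")
}

d_same  <- name_dist("Patrick Mahomes", "Pat Mahomes")  # delete "rick"
d_other <- name_dist("Josh Allen", "Kyle Allen")        # four substitutions

d_same   # 4
d_other  # 4
```

Both pairs fall under the threshold of 6, so the name distance alone would happily match two different quarterbacks. The exact match on position and team is what should keep pairs like the second one apart.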

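We also calculate the difference between the rankings. A negative value means the player goes too early on Sleeper; a positive value means that we might be able to snatch him a bit later. We label the players accordingly in a new column, category. Note that players with identical ranks (diff of 0) fall into "Overhyped" because of the diff > 0 test; the minimum-difference filter applied to the tables later removes them.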
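
As a tiny worked example, here is the same calculation in base R on three players whose ranks are taken from the output above (McCaffrey, Elliott, Kamara):

```r
# Ranks copied from the printed adp_tibble output
fp      <- c(1, 3, 4)   # FantasyPros expert rank
sleeper <- c(1, 4, 3)   # Sleeper in-draft rank

diff     <- sleeper - fp
category <- ifelse(diff > 0, "Steal", "Overhyped")

diff      # 0  1 -1
category  # "Overhyped" "Steal" "Overhyped"
```

Elliott goes one pick later on Sleeper than the experts rank him, so he counts as a (very small) steal, while Kamara goes one pick earlier.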

Comparing Data

After joining the data, we can analyze it. Nobody cares about sleepers that go after round 13, so we will only look at players ranked in the top 100 (in either ranking).

LIMIT <- 100

adp_compare <- adp_tibble %>% 
  arrange(desc(abs(diff))) %>% 
  filter(fp <= LIMIT | sleeper <= LIMIT)

steals <- adp_compare %>% 
  filter(category == "Steal") %>% 
  select(-category)

overhyped <- adp_compare %>% 
  filter(category == "Overhyped") %>% 
  select(-category)

steals

## # A tibble: 51 x 6
##    name              pos   team     fp sleeper  diff
##    <chr>             <chr> <chr> <dbl>   <dbl> <dbl>
##  1 Tarik Cohen       RB    CHI      88     120    32
##  2 Tyler Higbee      TE    LAR      77     106    29
##  3 Austin Hooper     TE    CLE      99     124    25
##  4 Josh Allen        QB    BUF      70      89    19
##  5 Matthew Stafford  QB    DET      90     109    19
##  6 D.J. Moore        WR    CAR      30      47    17
##  7 Odell Beckham Jr. WR    CLE      31      48    17
##  8 Courtland Sutton  WR    DEN      50      67    17
##  9 Allen Robinson    WR    CHI      24      40    16
## 10 DeVante Parker    WR    MIA      55      71    16
## # ... with 41 more rows

overhyped

## # A tibble: 54 x 6
##    name               pos   team     fp sleeper  diff
##    <chr>              <chr> <chr> <dbl>   <dbl> <dbl>
##  1 Deebo Samuel       WR    SF      123      75   -48
##  2 Mecole Hardman     WR    KC      147     100   -47
##  3 Rob Gronkowski     TE    TB      107      72   -35
##  4 Emmanuel Sanders   WR    NO      127      99   -28
##  5 Marlon Mack        RB    IND      91      69   -22
##  6 Alexander Mattison RB    MIN     113      96   -17
##  7 Aaron Rodgers      QB    GB       98      82   -16
##  8 Sony Michel        RB    NE      109      93   -16
##  9 David Montgomery   RB    CHI      52      37   -15
## 10 Devin Singletary   RB    BUF      57      42   -15
## # ... with 44 more rows
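
The "in either ranking" part matters more than it looks. A small base R sketch with three rows copied from the outputs above shows what would happen if we required both ranks to be inside the limit instead:

```r
# Rows copied from the printed outputs in this post
toy <- data.frame(
  name    = c("Tarik Cohen", "Deebo Samuel", "Brian Hill"),
  fp      = c(88, 123, 281),
  sleeper = c(120, 75, 281)
)

LIMIT <- 100

either <- subset(toy, fp <= LIMIT | sleeper <= LIMIT)  # what the post does
both   <- subset(toy, fp <= LIMIT & sleeper <= LIMIT)  # stricter alternative

either$name  # "Tarik Cohen" "Deebo Samuel"
nrow(both)   # 0
```

With & we would lose exactly the players with the biggest gaps, because a large difference usually pushes one of the two ranks past 100.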

Creating Tables

We now have two datasets, steals and overhyped, that contain the data we are interested in. However, it is not pleasant to look at the players in this format. Therefore, we will create beautiful tables using the gt package.

library(gt)
#Table Options Shared
table_init_with_options <- . %>% 
  gt(groupname_col = "pos", rownames_to_stub = T) %>% 
  tab_options(
    row_group.background.color = "#FFEFDB80",#EFFBFC
    heading.background.color = "#ebebeb",
    column_labels.background.color = "#ebebeb",
    stub.background.color = "#ebebeb",
    table.font.color = "#323232",
    table_body.hlines.color = "#989898",
    table_body.border.top.color = "#989898",
    heading.border.bottom.color = "#989898",
    row_group.border.top.color = "#989898",
    row_group.border.bottom.style = "none",
    stub.border.style = "dashed",
    stub.border.color = "#989898",
    stub.border.width = "1px",
    table.width = "60%"
  ) %>% 
  opt_all_caps() %>% 
  cols_align(align = "center", columns = c(1,3:7))

MINIMUM_DIFFERENCE <- 8

These are some options that we want for both of our tables, so we create a function for them. We also set the minimum difference to 8 so the table isn’t too crowded.
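
In case the . %>% at the start looks odd: a pipeline that begins with a dot is not run right away, magrittr turns it into a reusable function. A minimal sketch with a made-up shout() function:

```r
library(magrittr)

# A pipeline starting with `.` becomes a function of one argument
shout <- . %>% toupper() %>% paste0("!")

is.function(shout)  # TRUE
shout("steal")      # "STEAL!"
```

table_init_with_options works the same way: it waits for a data frame and then applies gt() and all the styling steps to it.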

Our table of players we don’t want to draft looks like this:

overhyped %>% 
  filter(diff <= -MINIMUM_DIFFERENCE) %>% 
  table_init_with_options() %>% 
  tab_header(
    title = md("Overhyped Players on *Sleeper.App*"),
    subtitle = "(Players that tend to go before their general ADP)"
  ) %>% 
  data_color(
    columns = vars(diff),
    colors = scales::col_numeric(
      palette = paletteer::paletteer_d(
        palette = "ggsci::red_material"
      ) %>% as.character(),
      domain = NULL,
      reverse = T
    ),
    alpha = 0.8
  )

Overhyped Players on Sleeper.App
(Players that tend to go before their general ADP)
	name	team	fp	sleeper	diff
WR
1	Deebo Samuel	SF	123	75	-48
2	Mecole Hardman	KC	147	100	-47
4	Emmanuel Sanders	NO	127	99	-28
12	Brandin Cooks	HOU	87	73	-14
17	T.Y. Hilton	IND	63	52	-11
20	D.K. Metcalf	SEA	53	44	-9
21	Marquise Brown	BAL	73	64	-9
TE
3	Rob Gronkowski	TB	107	72	-35
RB
5	Marlon Mack	IND	91	69	-22
6	Alexander Mattison	MIN	113	96	-17
8	Sony Michel	NE	109	93	-16
9	David Montgomery	CHI	52	37	-15
10	Devin Singletary	BUF	57	42	-15
11	James Conner	PIT	39	25	-14
13	Le'Veon Bell	NYJ	42	29	-13
14	J.K. Dobbins	BAL	89	76	-13
16	David Johnson	HOU	44	33	-11
18	Melvin Gordon	DEN	41	31	-10
19	Mark Ingram II	BAL	48	39	-9
22	Jonathan Taylor	IND	54	46	-8
QB
7	Aaron Rodgers	GB	98	82	-16
15	Patrick Mahomes	KC	25	14	-11

Here’s our table for potential steals:

steals %>% 
  filter(diff >= MINIMUM_DIFFERENCE) %>%
  table_init_with_options() %>% 
  tab_header(
    title = md("Potential Steals on *Sleeper.App*"),
    subtitle = "(Players that tend to go after their general ADP)"
  ) %>% 
  data_color(
    columns = vars(diff),
    colors = scales::col_numeric(
      palette = paletteer::paletteer_d(
        palette = "ggsci::green_material"
      ) %>% as.character(),
      domain = NULL
    ),
    alpha = 0.8
  )

Potential Steals on Sleeper.App
(Players that tend to go after their general ADP)
	name	team	fp	sleeper	diff
RB
1	Tarik Cohen	CHI	88	120	32
18	James White	NE	80	90	10
TE
2	Tyler Higbee	LAR	77	106	29
3	Austin Hooper	CLE	99	124	25
19	Hayden Hurst	ATL	97	107	10
20	Hunter Henry	LAC	78	87	9
QB
4	Josh Allen	BUF	70	89	19
5	Matthew Stafford	DET	90	109	19
12	Carson Wentz	PHI	83	98	15
22	Matt Ryan	ATL	75	83	8
WR
6	D.J. Moore	CAR	30	47	17
7	Odell Beckham Jr.	CLE	31	48	17
8	Courtland Sutton	DEN	50	67	17
9	Allen Robinson	CHI	24	40	16
10	DeVante Parker	MIA	55	71	16
11	JuJu Smith-Schuster	PIT	28	43	15
13	Jarvis Landry	CLE	67	81	14
14	Robert Woods	LAR	36	49	13
15	Tyler Boyd	CIN	72	85	13
16	Marvin Jones	DET	81	94	13
17	Terry McLaurin	WAS	49	59	10
21	D.J. Chark	JAX	47	55	8

Conclusion

These tables might help when drafting on Sleeper this year. You should, however, never base your whole draft around them. If you like a player and find him in the Steals table, great! You might even get him a round later than usual. If you like a player but he is in the Overhyped table, you have to decide whether you really want him, because you might have to pay a hefty price.

Standalone Overhyped Table can be found here
Standalone Steal table can be found here

fantasy-football nfl R

Max Hübner

Computer Science Student
