NFL Arrests Visualization Project

We have one dataset with the number of fans arrested at each NFL game from 2011 to 2015, and another with the NFL players arrested during that same time period. Could the rowdiness of fans be related to the bad behavior of the players on their local teams? How about local crime rates? This project builds a variety of visualizations to compare these factors; it does not attempt predictive modeling.

Visualizations

Dataset 1 - Prepping the Fan Arrests Data

import pandas as pd
import re
import numpy as np
data = pd.read_csv("nfl_arrests_2011-2015.csv", encoding = 'unicode_escape')
#Fix missing data in OT_flag and turn it into a numeric variable
data.fillna({'OT_flag':0}, inplace=True)
data['OT_flag'] = data['OT_flag'].replace(['OT'],1)
data["OT_flag"]=pd.to_numeric(data["OT_flag"])

#Update "division_game" into numeric as well
data['division_game'] = data['division_game'].replace(['n'],0)
data['division_game'] = data['division_game'].replace(['y'],1)
data["division_game"]=pd.to_numeric(data["division_game"])
data.head()
   season  week_num  day_of_week  gametime_local  home_team  away_team        home_score  away_score  OT_flag  arrests  division_game
0  2011    1         Sunday       1:15:00 PM      Arizona    Carolina         28          21          0        5.0      0
1  2011    4         Sunday       1:05:00 PM      Arizona    New York Giants  27          31          0        6.0      0
2  2011    7         Sunday       1:05:00 PM      Arizona    Pittsburgh       20          32          0        9.0      0
3  2011    9         Sunday       2:15:00 PM      Arizona    St. Louis        19          13          1        6.0      1
4  2011    13        Sunday       2:15:00 PM      Arizona    Dallas           19          13          1        3.0      0
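
The flag conversion above can also be written as one mapping step per column. The following is a minimal sketch on a toy frame: the sample values are made up for illustration, and only the two column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Toy rows mimicking the raw file: OT_flag is "OT" or missing,
# division_game is "y" or "n". These values are invented for the example.
toy = pd.DataFrame({
    "OT_flag": ["OT", np.nan, np.nan, "OT"],
    "division_game": ["n", "y", "n", "y"],
})

# eq("OT") is False for missing values, so NaN becomes 0 without a separate fillna.
toy["OT_flag"] = toy["OT_flag"].eq("OT").astype(int)

# map() turns any value outside the dictionary into NaN,
# which makes unexpected codes easy to spot afterwards.
toy["division_game"] = toy["division_game"].map({"n": 0, "y": 1})

print(toy)
```

One possible advantage of map() here is that a stray code such as "Y" would surface as a missing value instead of passing through unchanged.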
#Some observations have missing data--they should be dropped from the dataframe.
data = data[data['arrests'].notna()]
#Some games were played in London and so have missing data.
#We can impute the missing values taking the mean of the arrests of the same year for that team.

#Create a function to help with this process
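def imputeLondon(year, home, away, homescore, awayscore, OT, division):
    #Mean arrests across this team's recorded home games in the same season
    new = data[(data['home_team'] == home) & (data['season'] == year)]["arrests"].mean()
    #Use a plain list (np.array would turn every value into a string) and a fresh index label
    #(len(data) can match an existing label now that rows have been dropped).
    #week_num, day_of_week, and gametime_local are filled with 0 as placeholders.
    data.loc[data.index.max() + 1] = [year,0,0,0,home,away,homescore,awayscore,OT,new,division]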
#Use the function to fill in the missing data with imputed values.
imputeLondon(2013, "Arizona", "Houston", 30, 9, 0, 0)
imputeLondon(2013, "Jacksonville", "San Francisco", 10, 42, 0, 0)
imputeLondon(2014, "Jacksonville", "Dallas", 17, 32, 0, 0)
imputeLondon(2015, "Jacksonville", "Buffalo", 34, 31, 0, 0)
imputeLondon(2015, "Kansas City", "Detroit", 45, 10, 0, 0)
imputeLondon(2015, "Miami", "New York Jets", 14, 27, 0, 1)
imputeLondon(2014, "Oakland", "Miami", 14, 38, 0, 0)
imputeLondon(2014, "Oakland", "Denver", 17, 41, 0, 1)
imputeLondon(2014, "Oakland", "Kansas City", 24, 20, 0, 1)
imputeLondon(2011, "Tampa Bay", "Chicago", 18, 24, 0, 0)
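
The idea behind this step (append a row whose arrests value is the team's mean for that season) can be sketched on toy data as follows. The frame, the numbers, and the helper name append_imputed_game are hypothetical; this version returns a new frame instead of modifying a global one, which is a design choice of the sketch and not the notebook's approach.

```python
import pandas as pd

# Toy data (made up): one Arizona game in 2013 has no arrest count.
games = pd.DataFrame({
    "season": [2013, 2013, 2013, 2014],
    "home_team": ["Arizona", "Arizona", "Arizona", "Arizona"],
    "arrests": [4.0, None, 8.0, 10.0],
})

# Dropping the incomplete row leaves index labels 0, 2, 3.
# len(games) is now 3, which is already in use as a label.
games = games[games["arrests"].notna()]


def append_imputed_game(df, season, home_team):
    """Return df plus one row whose arrests value is the team's mean for that season."""
    same_team_season = (df["home_team"] == home_team) & (df["season"] == season)
    imputed = df.loc[same_team_season, "arrests"].mean()
    new_row = pd.DataFrame(
        [{"season": season, "home_team": home_team, "arrests": imputed}]
    )
    # ignore_index=True renumbers the rows, so the new row cannot
    # overwrite an existing label.
    return pd.concat([df, new_row], ignore_index=True)


games = append_imputed_game(games, 2013, "Arizona")
print(games)
```

Building the new row as a one-row DataFrame keeps each column's type intact, so the arrests column stays numeric.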
#Three teams had a missing year of data--we can impute this data by taking the mean of the existing years.
def imputeYear(year, home, away, homescore, awayscore, OT, division):
    #Mean arrests across all of this team's recorded home games, regardless of season
    new = data[(data['home_team'] == home)]["arrests"].mean()
    #A plain list keeps numbers as numbers (np.array would turn every value into a string)
    data.loc[data.index.max() + 1] = [year,0,0,0,home,away,homescore,awayscore,OT,new,division]
    #Keep arrests numeric in case any earlier rows were stored as strings
    data["arrests"]=pd.to_numeric(data["arrests"])

imputeYear(2012, "Baltimore", "Cincinnati", 44, 13, 0, 1)
imputeYear(2012, "Baltimore", "New England", 31, 30, 0, 0)
imputeYear(2012, "Baltimore", "Cleveland", 23, 16, 0, 1)
imputeYear(2012, "Baltimore", "Dallas", 31, 29, 0, 0)
imputeYear(2012, "Baltimore", "Oakland", 55, 20, 0, 0)
imputeYear(2012, "Baltimore", "Pittsburgh", 20, 23, 0, 1)
imputeYear(2012, "Baltimore", "Denver", 17, 34, 0, 0)
imputeYear(2012, "Baltimore", "New York Giants", 33, 14, 0, 0)

imputeYear(2015, "Chicago", "Green Bay", 23, 31, 0, 1)
imputeYear(2015, "Chicago", "Arizona", 23, 48, 0, 0)
imputeYear(2015, "Chicago", "Oakland", 22, 20, 0, 0)
imputeYear(2015, "Chicago", "Minnesota", 20, 23, 0, 1)
imputeYear(2015, "Chicago", "Denver", 15, 17, 0, 0)
imputeYear(2015, "Chicago", "San Francisco", 20, 26, 1, 0)
imputeYear(2015, "Chicago", "Washington", 21, 24, 0, 0)
imputeYear(2015, "Chicago", "Detroit", 20, 24, 0, 1)

imputeYear(2011, "Miami", "New England", 24, 38, 0, 1)
imputeYear(2011, "Miami", "Houston", 13, 23, 0, 0)
imputeYear(2011, "Miami", "Denver", 15, 18, 1, 0)
imputeYear(2011, "Miami", "Washington", 20, 9, 0, 0)
imputeYear(2011, "Miami", "Buffalo", 35, 8, 0, 1)
imputeYear(2011, "Miami", "Oakland", 34, 14, 0, 0)
imputeYear(2011, "Miami", "Philadelphia", 10, 26, 0, 0)
imputeYear(2011, "Miami", "New York Jets", 19, 17, 0, 1)
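
As an alternative to one call per game, the missing-season imputation can be driven from a list of games. This is a rough sketch on made-up numbers; the names observed, missing_games, and team_mean are illustrative assumptions and do not appear in the notebook.

```python
import pandas as pd

# Toy arrest records (made up) for seasons that do have data.
observed = pd.DataFrame({
    "season": [2011, 2013, 2011, 2013],
    "home_team": ["Baltimore", "Baltimore", "Chicago", "Chicago"],
    "arrests": [2.0, 4.0, 10.0, 20.0],
})

# Hypothetical list of games with no arrest record: (season, home_team, away_team).
missing_games = [
    (2012, "Baltimore", "Cincinnati"),
    (2012, "Baltimore", "Cleveland"),
]

# Mean arrests per home team across all seasons that have data.
team_mean = observed.groupby("home_team")["arrests"].mean()

# Build all missing rows at once and look up each team's mean.
filled = pd.DataFrame(missing_games, columns=["season", "home_team", "away_team"])
filled["arrests"] = filled["home_team"].map(team_mean)

# Columns absent from one frame (away_team in the toy data) are filled with NaN.
combined = pd.concat([observed, filled], ignore_index=True)
print(combined)
```

Computing the mean once gives the same value as recomputing it after every appended row, because adding a value equal to the mean leaves the mean unchanged.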
#For one of the visualizations, we need to sort the dataset according to team, then year, then week--then an index will need
#to be added to keep things properly sorted.
data = data.sort_values(by = ["home_team","season", "week_num"])

#Number the rows 0, 1, 2, ... in their sorted order
data["Index_num"] = range(len(data))
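
The sort-then-number step can be sketched on toy data as follows. The rows are made up, and week_num is deliberately stored as text to show one pitfall: if that column ever loses its numeric type (an assumption about what could happen when rows are appended with mixed types), the sort becomes alphabetical.

```python
import pandas as pd

# Toy rows (made up); week_num is stored as text on purpose.
toy = pd.DataFrame({
    "home_team": ["Chicago", "Arizona", "Arizona", "Arizona"],
    "season": [2011, 2011, 2011, 2012],
    "week_num": ["3", "10", "9", "1"],
})

# Text sorts character by character ("10" before "9"), so convert first.
toy["week_num"] = pd.to_numeric(toy["week_num"])

# Sort by team, then season, then week, and discard the old row labels.
toy = toy.sort_values(by=["home_team", "season", "week_num"]).reset_index(drop=True)

# A running row number that records the sorted order.
toy["Index_num"] = range(len(toy))

print(toy)
```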
#Export Dataframe to CSV
data.to_csv(r'nfl_arrests.csv', index = False)

Dataset 1 is now properly formatted and has been exported as a CSV for use in Tableau.

Dataset 2 - NFL Player Arrests

#New Dataset, NFL Player Arrests
data2 = pd.read_csv("nfl_player_arrests.csv", encoding = 'unicode_escape')
#Check out the data, look for Missing Data
data2.head()
   DATE        TEAM  NAME                 POS  CASE      CATEGORY           DESCRIPTION                                        OUTCOME
0  10/13/2020  DEN   Melvin Gordon        RB   Arrested  DUI                Suspected of drunk driving, speeding in Denver.    Resolution undetermined.
1  10/3/2020   PIT   Jarron Jones         OT   Arrested  Domestic violence  Charged with aggravated assault, strangulation...  Resolution undetermined.
2  9/11/2020   TEN   Isaiah Wilson        OT   Arrested  DUI                Pulled over, accused of drunken driving near N...  Resolution undetermined.
3  8/25/2020   CIN   Mackensie Alexander  CB   Arrested  Battery            Accused of hitting a man in the face in Collie...  Resolution undetermined.
4  8/7/2020    WAS   Derrius Guice        RB   Arrested  Domestic violence  Accused of strangulation, assault and property...  Resolution undetermined. Team released him sam...
#We need to standardize Team Names--importing a new csv file with two columns to help ease the transition
data3 = pd.read_csv("nfl_names_conversion.csv", encoding = 'unicode_escape')
data3.head()
   Team_Name  Team_City
0  ARI        Arizona
1  BAL        Baltimore
2  CAR        Carolina
3  CHI        Chicago
4  CIN        Cincinnati
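
Rows whose abbreviation has no entry in this conversion table cannot be converted, so it may be worth listing them before going further. The following is a minimal sketch with made-up rows; the "Free agent" value and the variable names are hypothetical.

```python
import pandas as pd

# Toy player rows (made up); "Free agent" stands in for any value
# that has no entry in the conversion table.
players = pd.DataFrame({"TEAM": ["DEN", "ARI", "Free agent", "CHI", "ARI"]})

conversion = pd.DataFrame({
    "Team_Name": ["ARI", "CHI", "DEN"],
    "Team_City": ["Arizona", "Chicago", "Denver"],
})

# Keep only the TEAM values that do not appear in the conversion table.
unmatched = players.loc[~players["TEAM"].isin(conversion["Team_Name"]), "TEAM"]

# Count how many rows each unmatched value accounts for.
unmatched_counts = unmatched.value_counts()
print(unmatched_counts)
```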
#For each abbreviation in this small dataset, change observations in data2 that match "Team_Name" to "Team_City".
#Also add a new column to data2 that is set to 1 for rows that were matched. This lets us easily delete all observations
#with teams outside of our dataset.

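#Look up each abbreviation in the conversion table; abbreviations with no match come back as NaN
teamcity = data2["TEAM"].map(dict(zip(data3["Team_Name"], data3["Team_City"])))
#Flag matched rows with 1 and unmatched rows with 0
data2["Found"] = teamcity.notna().astype(int)
#Replace the abbreviation with the city name wherever a match was found
data2["TEAM"] = teamcity.fillna(data2["TEAM"])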

data2.drop(data2[data2['Found'] == 0].index, inplace = True) 
data2.head(20)
    DATE        TEAM             NAME                 POS  CASE         CATEGORY            DESCRIPTION                                        OUTCOME                                            Found
0   10/13/2020  Denver           Melvin Gordon        RB   Arrested     DUI                 Suspected of drunk driving, speeding in Denver.    Resolution undetermined.                           1
1   10/3/2020   Pittsburgh       Jarron Jones         OT   Arrested     Domestic violence   Charged with aggravated assault, strangulation...  Resolution undetermined.                           1
2   9/11/2020   Tennessee        Isaiah Wilson        OT   Arrested     DUI                 Pulled over, accused of drunken driving near N...  Resolution undetermined.                           1
3   8/25/2020   Cincinnati       Mackensie Alexander  CB   Arrested     Battery             Accused of hitting a man in the face in Collie...  Resolution undetermined.                           1
4   8/7/2020    Washington       Derrius Guice        RB   Arrested     Domestic violence   Accused of strangulation, assault and property...  Resolution undetermined. Team released him sam...  1
5   7/14/2020   Houston          Kenny Stills         WR   Arrested     Disorderly conduct  Accused of felony intimidation in Louisville a...  Resolution undetermined.                           1
6   6/27/2020   Arizona          Jermiah Braswell     WR   Arrested     DUI                 Accused of driving while intoxiated after his ...  Resolution undetermined.                           1
7   6/15/2020   New York Giants  Aldrick Rosas        K    Arrested     Hit-and-run         Accused of fleeing the scene of a collision at...  Resolution undetermined.                           1
8   5/19/2020   Green Bay        Montravius Adams     DE   Arrested     Drugs               Pulled over near Perry, Ga., accused of mariju...  Resolution undetermined.                           1
9   5/16/2020   New York Giants  Deandre Baker        CB   Surrendered  Armed robbery       Accused of being involved in armed robbery of ...  Dropped by prosecutors.                            1
10  5/16/2020   Seattle          Quinton Dunbar       CB   Surrendered  Armed robbery       Accused of being involved in armed robbery of ...  Dropped by prosecutors.                            1
11  5/16/2020   Washington       Cody Latimer         WR   Arrested     Gun                 Accused of felony discharge of a weapon in Eng...  Resolution undetermined.                           1
13  4/28/2020   Kansas City      Bashaud Breeland     CB   Arrested     Drugs               Accused of marijuana possession, resisting arr...  Resolution undetermined.                           1
14  3/11/2020   Dallas           Ventell Bryant       WR   Arrested     DUI                 Pulled over in Tampa, accused of driving drunk...  Resolution undetermined.                           1
15  3/5/2020    New York Jets    Quinnen Williams     DE   Arrested     Gun                 Accused of criminal possession of pistol at La...  Resolution undetermined.                           1
17  1/17/2020   New England      Joejuan Williams     CB   Arrested     Drugs               Pulled over for speeding in Nashville, accused...  Pleaded no contest to simple possession. Diver...  1
19  1/11/2020   New England      Julian Edelman       WR   Arrested     Vandalism           Accused of jumping on the hood of a Mercedes i...  Charge dropped.                                    1
20  12/29/2019  Miami            Xavien Howard        CB   Arrested     Domestic violence   Police in Davie, Fla., say he pushed his fianc...  Dropped after woman declined to proceed with c...  1
21  12/20/2019  Pittsburgh       Kameron Kelly        S    Arrested     Disorderly conduct  Accused of making threats and resisting arrest...  Resolution undetermined. Team released him sam...  1
22  12/3/2019   Dallas           Antwaun Woods        DT   Arrested     Drugs               Pulled over for speeding in Frisco, Texas, and...  Resolution undetermined                            1
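
The preview above includes arrests from 2019 and 2020, while the fan arrests data covers 2011 to 2015. If the comparison should use the same window, the rows could be restricted by date. The following is a sketch only: the last three toy rows are made up, the calendar-year cutoffs are an assumption (an NFL season runs into the following January and February), and the notebook itself does not apply this filter.

```python
import pandas as pd

# Toy rows: the first two dates appear in the preview above, the rest are made up.
players = pd.DataFrame({
    "DATE": ["10/13/2020", "12/29/2019", "9/4/2015", "1/1/2011", "12/31/2010"],
    "TEAM": ["Denver", "Miami", "Dallas", "Chicago", "Seattle"],
})

# Parse month/day/year text into real timestamps.
players["DATE"] = pd.to_datetime(players["DATE"], format="%m/%d/%Y")

# Assumed window: calendar years 2011 through 2015; both endpoints are included.
window_start = pd.Timestamp("2011-01-01")
window_end = pd.Timestamp("2015-12-31")
in_window = players["DATE"].between(window_start, window_end)

players_2011_2015 = players[in_window]
print(players_2011_2015)
```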
#Export Dataframe to CSV
data2.to_csv(r'nfl_players.csv', index = False)

Both datasets are prepped to be used in Tableau.