NFL Arrests Visualization Project
We have a dataset of the number of fans arrested at each NFL game from 2011 to 2015, and another of the number of NFL players arrested during that same time period. Could the rowdiness of fans be related to the bad behavior of the players on their local teams? How about local crime rates? This project builds a variety of visualizations to compare these factors; it does not attempt predictive modeling.
Visualizations
Dataset 1 - Prepping the Fans Arrests Data
import pandas as pd
import re
import numpy as np
data = pd.read_csv("nfl_arrests_2011-2015.csv", encoding = 'unicode_escape')
#Fix missing data in OT_flag and turn it into a numeric variable
data.fillna({'OT_flag':0}, inplace=True)
data['OT_flag'] = data['OT_flag'].replace(['OT'],1)
data["OT_flag"]=pd.to_numeric(data["OT_flag"])
#Update "division_game" into numeric as well
data['division_game'] = data['division_game'].replace(['n'],0)
data['division_game'] = data['division_game'].replace(['y'],1)
data["division_game"]=pd.to_numeric(data["division_game"])
data.head()
| | season | week_num | day_of_week | gametime_local | home_team | away_team | home_score | away_score | OT_flag | arrests | division_game |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2011 | 1 | Sunday | 1:15:00 PM | Arizona | Carolina | 28 | 21 | 0 | 5.0 | 0 |
1 | 2011 | 4 | Sunday | 1:05:00 PM | Arizona | New York Giants | 27 | 31 | 0 | 6.0 | 0 |
2 | 2011 | 7 | Sunday | 1:05:00 PM | Arizona | Pittsburgh | 20 | 32 | 0 | 9.0 | 0 |
3 | 2011 | 9 | Sunday | 2:15:00 PM | Arizona | St. Louis | 19 | 13 | 1 | 6.0 | 1 |
4 | 2011 | 13 | Sunday | 2:15:00 PM | Arizona | Dallas | 19 | 13 | 1 | 3.0 | 0 |
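The two recoding steps above can also be written as explicit mappings. The following is a minimal sketch on a made-up three-row frame; `toy` and `encode_flags` are hypothetical names, not part of the notebook. It assumes the raw columns contain only `'OT'` or a missing value, and only `'y'` or `'n'`.

```python
import numpy as np
import pandas as pd

def encode_flags(df):
    """Return a copy with OT_flag and division_game recoded as 0/1 integers."""
    out = df.copy()
    # A missing OT_flag is treated as a game that ended in regulation (0).
    out["OT_flag"] = out["OT_flag"].eq("OT").astype(int)
    # 'n' -> 0, 'y' -> 1; any other value becomes NaN and makes astype(int) fail loudly.
    out["division_game"] = out["division_game"].map({"n": 0, "y": 1}).astype(int)
    return out

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "OT_flag": [np.nan, "OT", np.nan],
    "division_game": ["n", "y", "n"],
})
encoded = encode_flags(toy)
print(encoded)
```

Compared with chained `replace` calls, an unexpected category here raises an error instead of passing through silently.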
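#Some observations are missing the arrest count; drop them and renumber the rows.
#Renumbering matters because the imputation functions below append at data.loc[len(data)]:
#with gaps in the index, that label could already exist and an existing row would be overwritten.
data = data[data['arrests'].notna()].reset_index(drop=True)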
#Some games were played in London and so have no arrest data.
#We can impute the arrests for each one as the mean arrests at that team's home games in the same season.
#Create a function to help with this process
def imputeLondon(year, home, away, homescore, awayscore, OT, division):
    new = data[(data['home_team'] == home) & (data['season'] == year)]["arrests"].mean()
    #Assign a plain list rather than np.array: an array mixing numbers and strings casts every value to a string.
    #week_num, day_of_week and gametime_local are unknown for these games and are set to 0.
    data.loc[len(data)] = [year, 0, 0, 0, home, away, homescore, awayscore, OT, new, division]
#Use the function to fill in the missing data with imputed values.
imputeLondon(2013, "Arizona", "Houston", 30, 9, 0, 0)
imputeLondon(2013, "Jacksonville", "San Francisco", 10, 42, 0, 0)
imputeLondon(2014, "Jacksonville", "Dallas", 17, 32, 0, 0)
imputeLondon(2015, "Jacksonville", "Buffalo", 34, 31, 0, 0)
imputeLondon(2015, "Kansas City", "Detroit", 45, 10, 0, 0)
imputeLondon(2015, "Miami", "New York Jets", 14, 27, 0, 1)
imputeLondon(2014, "Oakland", "Miami", 14, 38, 0, 0)
imputeLondon(2014, "Oakland", "Denver", 17, 41, 0, 1)
imputeLondon(2014, "Oakland", "Kansas City", 24, 20, 0, 0)
imputeLondon(2011, "Tampa Bay", "Chicago", 18, 24, 0, 0)
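To make the imputation rule concrete, here is a small sketch on made-up numbers; the `games` frame and the `team_season_mean` helper are hypothetical. It computes the team-season mean and appends it as a new row, which leaves that team-season mean unchanged, so the order of the calls above should not affect the imputed values.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
games = pd.DataFrame({
    "season": [2013, 2013, 2013, 2014],
    "home_team": ["Arizona", "Arizona", "Jacksonville", "Arizona"],
    "arrests": [4.0, 8.0, 10.0, 2.0],
})

def team_season_mean(df, year, home):
    """Mean arrests over one team's recorded home games in one season."""
    mask = (df["home_team"] == home) & (df["season"] == year)
    return df.loc[mask, "arrests"].mean()

imputed = team_season_mean(games, 2013, "Arizona")

# Append the imputed game as a new row; ignore_index avoids label collisions.
new_row = pd.DataFrame([{"season": 2013, "home_team": "Arizona", "arrests": imputed}])
games = pd.concat([games, new_row], ignore_index=True)

print(imputed, team_season_mean(games, 2013, "Arizona"))
```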
#Three teams had a missing year of data; we can impute it by taking the mean over the years that do exist.
def imputeYear(year, home, away, homescore, awayscore, OT, division):
    new = data[(data['home_team'] == home)]["arrests"].mean()
    #Assign a plain list rather than np.array, which would cast the mixed numbers and strings
    #to strings (the likely source of the type errors).
    data.loc[len(data)] = [year, 0, 0, 0, home, away, homescore, awayscore, OT, new, division]
    #Defensive: make sure arrests stays numeric after the row is appended.
    data["arrests"] = pd.to_numeric(data["arrests"])
imputeYear(2012, "Baltimore", "Cincinnati", 44, 13, 0, 1)
imputeYear(2012, "Baltimore", "New England", 31, 30, 0, 0)
imputeYear(2012, "Baltimore", "Cleveland", 23, 16, 0, 1)
imputeYear(2012, "Baltimore", "Dallas", 31, 29, 0, 1)
imputeYear(2012, "Baltimore", "Oakland", 55, 20, 0, 0)
imputeYear(2012, "Baltimore", "Pittsburgh", 20, 23, 0, 1)
imputeYear(2012, "Baltimore", "Denver", 17, 34, 0, 0)
imputeYear(2012, "Baltimore", "New York Giants", 33, 14, 0, 0)
imputeYear(2015, "Chicago", "Green Bay", 23, 31, 0, 1)
imputeYear(2015, "Chicago", "Arizona", 23, 48, 0, 0)
imputeYear(2015, "Chicago", "Oakland", 22, 20, 0, 0)
imputeYear(2015, "Chicago", "Minnesota", 20, 23, 0, 1)
imputeYear(2015, "Chicago", "Denver", 15, 17, 0, 0)
imputeYear(2015, "Chicago", "San Francisco", 20, 26, 1, 0)
imputeYear(2015, "Chicago", "Washington", 21, 24, 0, 0)
imputeYear(2015, "Chicago", "Detroit", 20, 24, 0, 1)
imputeYear(2011, "Miami", "New England", 24, 38, 0, 1)
imputeYear(2011, "Miami", "Houston", 13, 23, 0, 0)
imputeYear(2011, "Miami", "Denver", 15, 18, 1, 0)
imputeYear(2011, "Miami", "Washington", 2, 9, 0, 0)
imputeYear(2011, "Miami", "Buffalo", 35, 8, 0, 1)
imputeYear(2011, "Miami", "Oakland", 34, 14, 0, 0)
imputeYear(2011, "Miami", "Philadelphia", 10, 26, 0, 0)
imputeYear(2011, "Miami", "New York Jets", 19, 17, 0, 1)
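The per-game calls above could also be expressed as data. The sketch below is one possible variant, not the notebook's method: it keeps the missing games in a list, looks up each team's overall mean once with `groupby`, and appends all rows in a single `pd.concat`. The `games` frame and its arrest counts are made up, and only a reduced set of columns is shown.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
games = pd.DataFrame({
    "season": [2011, 2013, 2014],
    "home_team": ["Baltimore", "Baltimore", "Chicago"],
    "away_team": ["Pittsburgh", "Cleveland", "Detroit"],
    "arrests": [3.0, 5.0, 7.0],
})

# (season, home, away) for games that have no arrest record.
missing = [
    (2012, "Baltimore", "Cincinnati"),
    (2012, "Baltimore", "New England"),
]

# Mean arrests per home team over every season that is present.
team_means = games.groupby("home_team")["arrests"].mean()

rows = [
    {"season": s, "home_team": h, "away_team": a, "arrests": team_means[h]}
    for s, h, a in missing
]
filled = pd.concat([games, pd.DataFrame(rows)], ignore_index=True)
print(filled)
```

Building the rows as dictionaries keeps each column's type intact, so no numeric conversion is needed afterwards.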
#For one of the visualizations, we need to sort the dataset by team, then year, then week, and then add
#a running index column to keep the rows in that order.
data = data.sort_values(by = ["home_team","season", "week_num"])
#Number the rows 0..n-1 in sorted order (assigned by position, so no column offset is hard-coded).
data["Index_num"] = np.arange(len(data))
#Export Dataframe to CSV
data.to_csv(r'nfl_arrests.csv', index = False)
Dataset 1 is now cleaned, sorted, and exported as a CSV for use in Tableau.
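The sort-then-number step can be checked on a tiny example. This sketch uses a made-up `games` frame and shows one way to get the same running index with `reset_index`; it assumes nothing about the real data beyond the three sort columns.

```python
import pandas as pd

# Hypothetical toy data, deliberately out of order.
games = pd.DataFrame({
    "home_team": ["Dallas", "Arizona", "Arizona", "Dallas"],
    "season": [2011, 2012, 2011, 2011],
    "week_num": [3, 1, 9, 1],
})

# Sort by team, then season, then week, and renumber the rows 0..n-1.
ordered = (
    games.sort_values(by=["home_team", "season", "week_num"])
         .reset_index(drop=True)
)
# Store the position as a regular column so it survives the CSV export.
ordered["Index_num"] = ordered.index
print(ordered)
```

Keeping the order in an explicit column means a downstream tool can restore it even if it re-sorts the rows on import.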
Dataset 2 - NFL Player Arrests
#New Dataset, NFL Player Arrests
data2 = pd.read_csv("nfl_player_arrests.csv", encoding = 'unicode_escape')
#Check out the data, look for Missing Data
data2.head()
| | DATE | TEAM | NAME | POS | CASE | CATEGORY | DESCRIPTION | OUTCOME |
---|---|---|---|---|---|---|---|---|
0 | 10/13/2020 | DEN | Melvin Gordon | RB | Arrested | DUI | Suspected of drunk driving, speeding in Denver. | Resolution undetermined. |
1 | 10/3/2020 | PIT | Jarron Jones | OT | Arrested | Domestic violence | Charged with aggravated assault, strangulation... | Resolution undetermined. |
2 | 9/11/2020 | TEN | Isaiah Wilson | OT | Arrested | DUI | Pulled over, accused of drunken driving near N... | Resolution undetermined. |
3 | 8/25/2020 | CIN | Mackensie Alexander | CB | Arrested | Battery | Accused of hitting a man in the face in Collie... | Resolution undetermined. |
4 | 8/7/2020 | WAS | Derrius Guice | RB | Arrested | Domestic violence | Accused of strangulation, assault and property... | Resolution undetermined. Team released him sam... |
#We need to standardize Team Names--importing a new csv file with two columns to help ease the transition
data3 = pd.read_csv("nfl_names_conversion.csv", encoding = 'unicode_escape')
data3.head()
| | Team_Name | Team_City |
---|---|---|
0 | ARI | Arizona |
1 | BAL | Baltimore |
2 | CAR | Carolina |
3 | CHI | Chicago |
4 | CIN | Cincinnati |
#Build a lookup from each abbreviation in data3 ("Team_Name") to its city ("Team_City"), and use it to
#rewrite the TEAM column in data2.
#Also add a new column to data2 that is 1 for rows that were matched. This will allow us to easily delete
#all observations with teams outside of our dataset.
name_to_city = dict(zip(data3["Team_Name"], data3["Team_City"]))
data2["Found"] = data2["TEAM"].isin(data3["Team_Name"]).astype(int)
data2["TEAM"] = data2["TEAM"].replace(name_to_city)
data2.drop(data2[data2['Found'] == 0].index, inplace = True)
data2.head(20)
| | DATE | TEAM | NAME | POS | CASE | CATEGORY | DESCRIPTION | OUTCOME | Found |
---|---|---|---|---|---|---|---|---|---|
0 | 10/13/2020 | Denver | Melvin Gordon | RB | Arrested | DUI | Suspected of drunk driving, speeding in Denver. | Resolution undetermined. | 1 |
1 | 10/3/2020 | Pittsburgh | Jarron Jones | OT | Arrested | Domestic violence | Charged with aggravated assault, strangulation... | Resolution undetermined. | 1 |
2 | 9/11/2020 | Tennessee | Isaiah Wilson | OT | Arrested | DUI | Pulled over, accused of drunken driving near N... | Resolution undetermined. | 1 |
3 | 8/25/2020 | Cincinnati | Mackensie Alexander | CB | Arrested | Battery | Accused of hitting a man in the face in Collie... | Resolution undetermined. | 1 |
4 | 8/7/2020 | Washington | Derrius Guice | RB | Arrested | Domestic violence | Accused of strangulation, assault and property... | Resolution undetermined. Team released him sam... | 1 |
5 | 7/14/2020 | Houston | Kenny Stills | WR | Arrested | Disorderly conduct | Accused of felony intimidation in Louisville a... | Resolution undetermined. | 1 |
6 | 6/27/2020 | Arizona | Jermiah Braswell | WR | Arrested | DUI | Accused of driving while intoxiated after his ... | Resolution undetermined. | 1 |
7 | 6/15/2020 | New York Giants | Aldrick Rosas | K | Arrested | Hit-and-run | Accused of fleeing the scene of a collision at... | Resolution undetermined. | 1 |
8 | 5/19/2020 | Green Bay | Montravius Adams | DE | Arrested | Drugs | Pulled over near Perry, Ga., accused of mariju... | Resolution undetermined. | 1 |
9 | 5/16/2020 | New York Giants | Deandre Baker | CB | Surrendered | Armed robbery | Accused of being involved in armed robbery of ... | Dropped by prosecutors. | 1 |
10 | 5/16/2020 | Seattle | Quinton Dunbar | CB | Surrendered | Armed robbery | Accused of being involved in armed robbery of ... | Dropped by prosecutors. | 1 |
11 | 5/16/2020 | Washington | Cody Latimer | WR | Arrested | Gun | Accused of felony discharge of a weapon in Eng... | Resolution undetermined. | 1 |
13 | 4/28/2020 | Kansas City | Bashaud Breeland | CB | Arrested | Drugs | Accused of marijuana possession, resisting arr... | Resolution undetermined. | 1 |
14 | 3/11/2020 | Dallas | Ventell Bryant | WR | Arrested | DUI | Pulled over in Tampa, accused of driving drunk... | Resolution undetermined. | 1 |
15 | 3/5/2020 | New York Jets | Quinnen Williams | DE | Arrested | Gun | Accused of criminal possession of pistol at La... | Resolution undetermined. | 1 |
17 | 1/17/2020 | New England | Joejuan Williams | CB | Arrested | Drugs | Pulled over for speeding in Nashville, accused... | Pleaded no contest to simple possession. Diver... | 1 |
19 | 1/11/2020 | New England | Julian Edelman | WR | Arrested | Vandalism | Accused of jumping on the hood of a Mercedes i... | Charge dropped. | 1 |
20 | 12/29/2019 | Miami | Xavien Howard | CB | Arrested | Domestic violence | Police in Davie, Fla., say he pushed his fianc... | Dropped after woman declined to proceed with c... | 1 |
21 | 12/20/2019 | Pittsburgh | Kameron Kelly | S | Arrested | Disorderly conduct | Accused of making threats and resisting arrest... | Resolution undetermined. Team released him sam... | 1 |
22 | 12/3/2019 | Dallas | Antwaun Woods | DT | Arrested | Drugs | Pulled over for speeding in Frisco, Texas, and... | Resolution undetermined | 1 |
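The same name standardization can be sketched as a left merge against the conversion table, using pandas' `indicator` option to flag rows with no match. This is an alternative formulation on made-up data, not the notebook's code; `players`, `lookup`, and the `"XXX"` code are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; "XXX" stands in for a team code absent from the lookup.
players = pd.DataFrame({
    "TEAM": ["DEN", "PIT", "XXX"],
    "NAME": ["Player A", "Player B", "Player C"],
})
lookup = pd.DataFrame({
    "Team_Name": ["DEN", "PIT"],
    "Team_City": ["Denver", "Pittsburgh"],
})

# indicator=True adds a "_merge" column: "both" for matched rows, "left_only" otherwise.
merged = players.merge(
    lookup, how="left", left_on="TEAM", right_on="Team_Name", indicator=True
)

# Keep only matched rows and overwrite the abbreviation with the city name.
matched = merged[merged["_merge"] == "both"].copy()
matched["TEAM"] = matched["Team_City"]
matched = matched.drop(columns=["Team_Name", "Team_City", "_merge"])
print(matched)
```

Inspecting the `left_only` rows before dropping them is a quick way to see which team codes fall outside the conversion table.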
#Export Dataframe to CSV
data2.to_csv(r'nfl_players.csv', index = False)
Both datasets are prepped to be used in Tableau.
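The player data shown above runs through 2020, while the fan data covers 2011 to 2015. If the comparison should be limited to the shared window, one possible approach is to parse `DATE` and count arrests per team and year, as in this sketch on made-up rows. It assumes dates in month/day/year form and uses the calendar year, which is only an approximation of an NFL season; the `year` and `player_arrests` column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
players = pd.DataFrame({
    "DATE": ["10/13/2020", "9/1/2013", "11/20/2013", "3/5/2015"],
    "TEAM": ["Denver", "Denver", "Denver", "Miami"],
})

# Parse the month/day/year strings and pull out the calendar year.
players["DATE"] = pd.to_datetime(players["DATE"], format="%m/%d/%Y")
players["year"] = players["DATE"].dt.year

# Keep 2011-2015 (inclusive) and count arrests per team and year.
in_window = players[players["year"].between(2011, 2015)]
counts = (
    in_window.groupby(["TEAM", "year"])
             .size()
             .reset_index(name="player_arrests")
)
print(counts)
```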