How To Loop To Scrape Different Pages With Python
So, my objective with this code is extract information about a rookie in my NFL team. I want to compare his performance with players that finished last season in top-10 statistics
Solution 1:
To load all tables from years 2011-2021 to one dataframe, you can use this example:
import pandas as pd
# url page
url_mac = "https://www.pro-football-reference.com/years/{}/passing.htm"
all_dfs = []
for year inrange(2011, 2022):
print("Getting table for year {}".format(year))
df = pd.read_html(url_mac.format(year))[0]
df = df[~df["Rk"].eq("Rk")]
df["Year"] = year
all_dfs.append(df)
df = pd.concat(all_dfs)
print(df)
df.to_csv("data.csv", index=False)
Prints:
Rk Player Tm Age Pos G GS QBrec Cmp Att Cmp% Yds TD TD% Int Int% 1D Lng Y/A AY/A Y/C Y/G Rate QBR Sk Yds.1 NY/A ANY/A Sk% 4QC GWD Year
0 1 Drew Brees* NOR 32 QB 16 16 13-3-0 468 657 71.2 5476 46 7.0 14 2.1 279 79 8.3 8.8 11.7 342.3 110.6 82.3 24 158 7.81 8.23 3.5 3 4 2011
1 2 Tom Brady* NWE 34 QB 16 16 13-3-0 401 611 65.6 5235 39 6.4 12 2.0 262 99 8.6 9.0 13.1 327.2 105.6 73.8 32 173 7.87 8.25 5.0 1 2 2011
2 3 Matthew Stafford DET 23 QB 16 16 10-6-0 421 663 63.5 5038 41 6.2 16 2.4 243 73 7.6 7.7 12.0 314.9 97.2 60.5 36 257 6.84 6.98 5.2 3 4 2011
3 4 Eli Manning* NYG 30 QB 16 16 9-7-0 359 589 61.0 4933 29 4.9 16 2.7 218 99 8.4 8.1 13.7 308.3 92.9 64.2 28 199 7.67 7.45 4.5 5 6 2011
...
and saves data.csv
(screenshot from LibreOffice):
Solution 2:
Go with pandas
to parse those tables. Just loop through the seasons in the url (don't forget to add that in the dataframes otherwise you won;t know which rows are for what seasons:
import pandas as pd
# To get last 10 seasons
years_look_back = 10
years_list = list(range(2021,2021-(years_look_back+1),-1))
years_list.reverse()
# Or just make a list of specific years:
years_list = [2001, 2005, 2008, 2010, 2011, 2015]
cols = ['Player', 'Tm','Cmp%', 'Yds', 'TD', 'Int', 'Y/A', 'Rate']
df_list = []
for year in years_list:
url_mac = f'https://www.pro-football-reference.com/years/{year}/passing.htm'
temp_df = pd.read_html(url_mac)[0][cols]
temp_df['Season'] = year
temp_df = temp_df[~temp_df["Player"].eq("Player")]
df_list.append(temp_df)
print(f'Collect: {year}')
data_radar = pd.concat(df_list)
#selecting specific player
data_mac = data_radar[data_radar['Player'] == 'Mac Jones']
Post a Comment for "How To Loop To Scrape Different Pages With Python"