Skip to content Skip to sidebar Skip to footer

How To Loop To Scrape Different Pages With Python

So, my objective with this code is extract information about a rookie in my NFL team. I want to compare his performance with players that finished last season in top-10 statistics

Solution 1:

To load all tables from years 2011-2021 to one dataframe, you can use this example:

import pandas as pd

# url page
url_mac = "https://www.pro-football-reference.com/years/{}/passing.htm"

all_dfs = []
for year inrange(2011, 2022):
    print("Getting table for year {}".format(year))
    df = pd.read_html(url_mac.format(year))[0]
    df = df[~df["Rk"].eq("Rk")]
    df["Year"] = year
    all_dfs.append(df)

df = pd.concat(all_dfs)
print(df)
df.to_csv("data.csv", index=False)

Prints:

      Rk                 Player   Tm Age  Pos   G  GS   QBrec  Cmp  Att   Cmp%   Yds  TD    TD% Int   Int%   1D Lng   Y/A   AY/A   Y/C    Y/G   Rate    QBR  Sk Yds.1   NY/A   ANY/A   Sk%  4QC  GWD  Year
0      1            Drew Brees*  NOR  32   QB  16  16  13-3-0  468  657   71.2  5476  46    7.0  14    2.1  279  79   8.3    8.8  11.7  342.3  110.6   82.3  24   158   7.81    8.23   3.5    3    4  2011
1      2             Tom Brady*  NWE  34   QB  16  16  13-3-0  401  611   65.6  5235  39    6.4  12    2.0  262  99   8.6    9.0  13.1  327.2  105.6   73.8  32   173   7.87    8.25   5.0    1    2  2011
2      3       Matthew Stafford  DET  23   QB  16  16  10-6-0  421  663   63.5  5038  41    6.2  16    2.4  243  73   7.6    7.7  12.0  314.9   97.2   60.5  36   257   6.84    6.98   5.2    3    4  2011
3      4           Eli Manning*  NYG  30   QB  16  16   9-7-0  359  589   61.0  4933  29    4.9  16    2.7  218  99   8.4    8.1  13.7  308.3   92.9   64.2  28   199   7.67    7.45   4.5    5    6  2011


...

and saves data.csv (screenshot from LibreOffice):

enter image description here

Solution 2:

Go with pandas to parse those tables. Just loop through the seasons in the url (don't forget to add that in the dataframes otherwise you won;t know which rows are for what seasons:

import pandas as pd

# To get last 10 seasons
years_look_back = 10
years_list = list(range(2021,2021-(years_look_back+1),-1))
years_list.reverse()


# Or just make a list of specific years:
years_list = [2001, 2005, 2008, 2010, 2011, 2015]

cols = ['Player', 'Tm','Cmp%', 'Yds', 'TD', 'Int', 'Y/A', 'Rate']
df_list = []
for year in years_list:
    url_mac = f'https://www.pro-football-reference.com/years/{year}/passing.htm'
    temp_df = pd.read_html(url_mac)[0][cols]
    temp_df['Season'] = year
    temp_df = temp_df[~temp_df["Player"].eq("Player")]
    df_list.append(temp_df)
    print(f'Collect: {year}')


data_radar = pd.concat(df_list)

#selecting specific player 
data_mac = data_radar[data_radar['Player'] == 'Mac Jones']

Post a Comment for "How To Loop To Scrape Different Pages With Python"