
Web Scraping In Python Using BeautifulSoup - How To Transpose Results?

I built the code below and am having trouble transposing the results. Effectively, I am looking for the following result: # Column headers: 'company name', 'Work/Life Bal

Solution 1:

Here's one possible approach.

import pandas as pd
name = ['3M','3M','3M','3M','3M','Google','Google','Google','Google','Google','Apple','Apple','Apple','Apple','Apple']
number = ['3.8','3.9','3.5','3.6','3.8','4.2','4.0','3.6','3.9','4.2','3.8','4.1','3.7','3.7','4.1']
category = ['Work/Life Balance',' Salary/Benefits','Job Security/Advancement','Management','Culture','Work/Life Balance',' Salary/Benefits','Job Security/Advancement','Management','Culture','Work/Life Balance',' Salary/Benefits','Job Security/Advancement','Management','Culture']
cols = {'Name':name,'Rating':number,'Category':category}
df = pd.DataFrame(cols)
print(df)



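from collections import defaultdict

# Outer key: company name; inner dict: category -> rating
aggregated_data = defaultdict(dict)
for _, row in df.iterrows():
    aggregated_data[row.Name][row.Category] = row.Rating

# The dict keys (companies) become columns, so transpose to get one row per company
result = pd.DataFrame(aggregated_data).T
print(result)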

Result:

        Salary/Benefits Culture Job Security/Advancement Management Work/Life Balance
3M                  3.9     3.8                      3.5        3.6               3.8
Google              4.0     4.2                      3.6        3.9               4.2
Apple               4.1     4.1                      3.7        3.7               3.8

I don't think this is the "idiomatic" approach. Because it loops over the rows with native Python data types, it's probably considerably slower than a pure pandas solution. If your data isn't that big, though, that may be fine.


Edit: I think the transpose in the last step of the code above is what puts the column names in a surprising order, so here's an approach that constructs the final dataframe from a list of dicts instead.

from collections import defaultdict

# Outer key: company name; inner dict: category -> rating
data_by_name = defaultdict(dict)
for _, row in df.iterrows():
    data_by_name[row.Name][row.Category] = row.Rating

# One dict per company: the name first, then its ratings unpacked as extra keys
aggregated_rows = [{"company name": name, **ratings} for name, ratings in data_by_name.items()]
result = pd.DataFrame(aggregated_rows)
print(result)

Result:

  company name Work/Life Balance  Salary/Benefits Job Security/Advancement Management Culture
0           3M               3.8              3.9                      3.5        3.6     3.8
1       Google               4.2              4.0                      3.6        3.9     4.2
2        Apple               3.8              4.1                      3.7        3.7     4.1
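
For comparison, here is a rough sketch of what the pure pandas route mentioned above might look like, using DataFrame.pivot. It rebuilds the same sample data so it runs on its own. The variable name `wide` and the reindex step are my own additions: pivot sorts both the row labels and the column labels, so the reindex puts them back in order of first appearance. Treat it as one possible approach under those assumptions, not the definitive one.

```python
import pandas as pd

# Same sample data as above (the leading space in ' Salary/Benefits' is kept as-is)
categories = ['Work/Life Balance', ' Salary/Benefits',
              'Job Security/Advancement', 'Management', 'Culture']
name = ['3M'] * 5 + ['Google'] * 5 + ['Apple'] * 5
number = ['3.8', '3.9', '3.5', '3.6', '3.8',
          '4.2', '4.0', '3.6', '3.9', '4.2',
          '3.8', '4.1', '3.7', '3.7', '4.1']
df = pd.DataFrame({'Name': name, 'Rating': number, 'Category': categories * 3})

# Long -> wide: one row per company, one column per category
wide = df.pivot(index='Name', columns='Category', values='Rating')

# pivot sorts the labels; restore the order in which they first appear in df
wide = wide.reindex(index=df['Name'].unique(), columns=df['Category'].unique())

# Tidy the axis names, then turn the company index into a regular column
wide.index.name = 'company name'
wide.columns.name = None
wide = wide.reset_index()
print(wide)
```

Note that pivot raises a ValueError if the same (Name, Category) pair occurs more than once. In that case pivot_table with an aggregation function would be the thing to look at, after converting the ratings from strings to numbers.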
