Python Requests.get(url) Returning Javascript Code Instead Of The Page Html
I have a very simple problem. I'm trying to get the job description from the HTML of a LinkedIn page, but instead of getting the HTML of the page I'm getting a few lines that look li
Solution 1:
Some websites present different content based on the type of browser that is accessing the site. LinkedIn is a perfect example of such behavior. If the browser has advanced capabilities, the website may present “richer” content, something more dynamic and styled. A plain HTTP client such as requests does not execute JavaScript, so it only receives the initial script and never sees the rendered page.
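To check whether a response is one of these script-only pages, a rough heuristic is to compare how much of the document is script and how much is visible text. The sketch below uses only the standard library; the function name looks_like_script_stub and the 200-character threshold are assumptions chosen for illustration, not anything LinkedIn or requests defines.

```python
from html.parser import HTMLParser


class _TextVsScript(HTMLParser):
    """Counts characters inside <script> tags versus visible text."""

    def __init__(self):
        super().__init__()
        self.current = None      # "script" or "style" while inside one
        self.script_chars = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        size = len(data.strip())
        if self.current == "script":
            self.script_chars += size
        elif self.current is None:
            self.text_chars += size   # style content is ignored entirely


def looks_like_script_stub(html, min_text_chars=200):
    """Heuristic (assumed threshold): little visible text, mostly script."""
    counter = _TextVsScript()
    counter.feed(html)
    counter.close()
    return (counter.text_chars < min_text_chars
            and counter.script_chars > counter.text_chars)
```

Calling looks_like_script_stub(response.text) on the result of requests.get(url) gives a quick hint that the page needs a real browser to render. It is only a heuristic, so a short but genuine page could still be flagged.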
To solve this problem, follow these steps:
- Download chrome-driver from here. Choose the one that matches your OS.
- Extract the driver and put it in a directory of your choice, for example /usr.
- Install Selenium, which is a Python module, by running pip install selenium. If the installation reports that the msgpack package is missing, install it first with pip install msgpack.
- Now we are ready to run the following code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

def create_browser(webdriver_path):
    # Create a Selenium-driven Chrome instance that behaves like a real browser
    browser_options = Options()
    # The headless flag runs Chrome without opening a visible window
    browser_options.add_argument("--headless")
    browser_options.add_argument("--no-sandbox")
    # Selenium 4 syntax; Selenium 3 used webdriver.Chrome(webdriver_path, chrome_options=browser_options)
    browser = webdriver.Chrome(service=Service(webdriver_path), options=browser_options)
    print("Done Creating Browser")
    return browser

url = "https://www.linkedin.com/jobs/view/inside-sales-manager-at-stericycle-1089095836/"
browser = create_browser('/usr/chromedriver')  # DON'T FORGET TO CHANGE THIS TO YOUR DIRECTORY
browser.get(url)
page_html = browser.page_source
print(page_html[-10:])  # prints dy></html>
browser.quit()  # close the browser once the HTML has been captured
Now you have the whole rendered page in page_html. I hope this answers your question!
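With the rendered HTML in hand, the remaining step is to pull out the job description. A minimal sketch using only the standard library follows. The helper extract_text_by_class is hypothetical, and the class name "description" in the example is an assumption: inspect the page to find the class LinkedIn actually uses. The parser assumes every non-void tag is properly closed, so a library such as BeautifulSoup will handle malformed markup more robustly.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collects the text inside the first element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # > 0 while inside the target element
        self.done = False    # True once the target element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


def extract_text_by_class(html, css_class):
    """Return the text of the first element with css_class, or '' if absent."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Example with a made-up fragment; "description" is an assumed class name.
sample = ('<div class="nav">Menu</div>'
          '<div class="description"><p>Manage the inside sales team.</p></div>')
print(extract_text_by_class(sample, "description"))
```

In the script above, the call would be extract_text_by_class(page_html, ...) with whatever class name the job description container really has.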