All Elements From Html Not Being Extracted By Requests And BeautifulSoup In Python
Solution 1:
requests is not quite suitable in this case: the site is quite dynamic and uses multiple XHR requests and JavaScript to build the page. A quicker and much less painful way to get the desired information is to use a real browser automated via selenium.
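To illustrate why a plain HTTP fetch comes up short, here is a minimal sketch using only the standard library. The RAW_HTML string is made up for illustration (it is not the site's real markup) and stands in for what requests would return before any JavaScript has run; ClassCounter and count_class are hypothetical helper names.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML, standing in for the response body a plain HTTP
# client receives. The markup is invented for this example.
RAW_HTML = """
<html>
  <body>
    <div id="app"></div>
    <script>
      // In a real browser, a script like this would fetch the odds
      // and build the tables inside #app.
      loadOdds("#app");
    </script>
  </body>
</html>
"""


class ClassCounter(HTMLParser):
    """Counts start tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_class(html, css_class):
    """Return how many elements in html have css_class."""
    parser = ClassCounter(css_class)
    parser.feed(html)
    return parser.count


# The container is empty until JavaScript runs, so nothing is found.
print(count_class(RAW_HTML, "odds-comparison"))
```

If the same check against the real response body also finds no matching elements, the content is being added by JavaScript after the page loads, and a static parser will never see it.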
Here is some example code to get you started; a headless PhantomJS browser is used:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.PhantomJS()
driver.get("https://www.bestodds.com.au/odds/cricket/ICC-World-Twenty20/Sri-Lanka-v-Afghanistan_71992/")
# waiting for the page to load
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".odds-comparison")))
# print the description of every odds table on the page
for comparison in driver.find_elements(By.CSS_SELECTOR, ".odds-comparison"):
    description = comparison.find_element(By.CSS_SELECTOR, ".description").text
    print(description)
driver.quit()  # end the session and shut down the browser process
It prints all the odds table descriptions on the page:
MATCH ODDS
MOST SIXES
TOP SRI LANKA BATSMAN
TOP AFGHANISTAN BATSMAN
Solution 2:
It is better to use urlopen:
from bs4 import BeautifulSoup
from urllib.request import urlopen
url = "https://www.bestodds.com.au/odds/cricket/ICC-World-Twenty20/Sri-Lanka-v-Afghanistan_71992/"
response = urlopen(url)
# name the parser explicitly so bs4 does not have to guess one
htmltext = BeautifulSoup(response, "html.parser")
print(htmltext)
After that you can find whatever you want:
Liste_page = htmltext.find('div', {"id": "pager"}).text
Tr = htmltext.find('table', {"class": "additional_data"}).find_next('tbody').text
Solution 3:
The data is most likely loaded dynamically.
It is not in the HTML that the server initially returns.
You can try to work out which requests retrieve the real data, or use e.g. selenium webdriver to simulate a real browser (the second option will be much slower).
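The first option could look something like the sketch below. Everything about the payload is an assumption: the real endpoint and its JSON schema are not known here, so SAMPLE_RESPONSE and the "markets" / "description" keys are hypothetical and would need to be replaced with whatever the browser's network tab actually shows.

```python
import json

# Hypothetical XHR response body. In practice this text would come from
# requesting the endpoint seen in the browser's network tab; the keys
# below are invented for illustration.
SAMPLE_RESPONSE = """
{
  "markets": [
    {"description": "MATCH ODDS"},
    {"description": "MOST SIXES"}
  ]
}
"""


def market_descriptions(body):
    """Extract the market descriptions from a JSON response body."""
    data = json.loads(body)
    # .get() keeps this from raising if the assumed key is missing
    return [market["description"] for market in data.get("markets", [])]


for description in market_descriptions(SAMPLE_RESPONSE):
    print(description)
```

Reading the JSON directly avoids starting a browser at all, which is why this route tends to be much faster than selenium once the right request has been identified.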
Beware that you are most likely violating the site's terms of use. This can easily get you into trouble. They may also deliberately serve you bad data.