
All Elements From Html Not Being Extracted By Requests And BeautifulSoup In Python

I am trying to scrape odds from a site that displays current odds from different agencies for an assignment on the effects of market competition. I am using Requests and BeautifulS

Solution 1:

requests is not well suited to this case: the site is highly dynamic and builds the page with multiple XHR requests and JavaScript, so the HTML that requests downloads does not contain the odds tables. A quicker and much less painful way to get the desired information is to drive a real browser with selenium.

Here is some example code to get you started. It uses headless Chrome, because the PhantomJS driver and the find_element_by_* helpers were removed in Selenium 4:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


# start Chrome without a visible window
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get("https://www.bestodds.com.au/odds/cricket/ICC-World-Twenty20/Sri-Lanka-v-Afghanistan_71992/")

# wait up to 10 seconds for the first odds table to become visible
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".odds-comparison")))

# print the description of every odds table on the page
for comparison in driver.find_elements(By.CSS_SELECTOR, ".odds-comparison"):
    description = comparison.find_element(By.CSS_SELECTOR, ".description").text
    print(description)

# shut down the browser and the driver process
driver.quit()

It prints all the odds table descriptions on the page:

MATCH ODDS
MOST SIXES
TOP SRI LANKA BATSMAN
TOP AFGHANISTAN BATSMAN
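
To confirm that the missing elements really are added by JavaScript, it can help to check whether the selector appears in the raw HTML at all. Below is a minimal sketch using only the standard library. The two HTML strings are made up for illustration and are not the site's real markup, and count_class is a hypothetical helper name:

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_class(html, css_class):
    """Return how many elements in html have css_class."""
    parser = ClassCounter(css_class)
    parser.feed(html)
    return parser.count


# Hypothetical markup: what a server might send before any script has run ...
raw_html = '<div id="app"></div><script src="bundle.js"></script>'
# ... and what the browser's DOM might look like after the scripts have run.
rendered_html = (
    '<div id="app">'
    '<div class="odds-comparison"><span class="description">MATCH ODDS</span></div>'
    '</div>'
)

print(count_class(raw_html, "odds-comparison"))       # 0
print(count_class(rendered_html, "odds-comparison"))  # 1
```

With the real page, raw_html would be the text returned by requests and rendered_html would be driver.page_source from selenium. A count of zero for the first and a non-zero count for the second suggests the tables are built client-side.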

Solution 2:

It is better to use urlopen:

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = "https://www.bestodds.com.au/odds/cricket/ICC-World-Twenty20/Sri-Lanka-v-Afghanistan_71992/"

# download the page and parse it with Python's built-in HTML parser
response = urlopen(url)
htmltext = BeautifulSoup(response, "html.parser")
print(htmltext)

After that you can find whatever you want:

Liste_page = htmltext.find('div', {"id": "pager"}).text
Tr = htmltext.find('table', {"class": "additional_data"}).find_next('tbody').text

Solution 3:

The data is most likely loaded dynamically, so it is not in the HTML that the server initially returns.

You can try to work out which requests are used to retrieve the real data, or use e.g. selenium webdriver to simulate a real browser (the second option will be much slower).
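
The first option could look roughly like the sketch below, assuming the data request returns JSON. The payload shape, the "markets" and "name" fields, and the helper names fetch_json and market_names are all hypothetical. The real endpoint and its fields have to be read from the browser's network tab:

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url):
    """Fetch a URL and decode the response body as JSON."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def market_names(payload):
    """Pull market names out of a payload shaped like the sample below."""
    return [market["name"] for market in payload.get("markets", [])]


# Hypothetical response body, standing in for what the XHR endpoint returns.
sample = json.loads('{"markets": [{"name": "MATCH ODDS"}, {"name": "MOST SIXES"}]}')
print(market_names(sample))  # ['MATCH ODDS', 'MOST SIXES']
```

Once the real request URL is known, market_names(fetch_json(url)) would replace the sample, with the field names adjusted to match what the site actually sends.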

Beware that you are most likely violating the site's terms of use. This can easily get you into trouble. They may also deliberately serve you bad data.

