Is This Site Not Suited For Web Scraping Using Beautifulsoup?

August 31, 2022

I am trying to use BeautifulSoup to get the odds for each match on the following site: https://danskespil.dk/oddset/sports/category/990/counter-strike-go/matches The goal is to end up wi

Solution 1:

The problem is at this line: r = res.get('https://danskespil.dk/oddset/sports/category/990/counter-strike-go/matches').text

The Python requests library only sends your HTTP/HTTPS request to the server and returns the raw HTML. It does not load further resources such as images and scripts, and it does not run JavaScript. Any element that a script manipulates (for example, by creating an element, setting its class name and inserting it into the DOM tree) will therefore be missing from the response, or will differ from what you see in the browser.

For example, if you GET main.html via requests, main.js is never loaded or run, so the class of the div t1 is never set to sgd-wrapper:

# main.html
<html>
   <body>
      <div id="t1"></div>
      <script src="main.js"></script>
   </body>
</html>

# in main.js
document.querySelector('#t1').classList.add('sgd-wrapper');
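
The main.html example above can be checked from Python. The following is a minimal sketch that parses the static markup the way a scraper would see it after a plain GET. It uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra installs; the ClassCollector class and the MAIN_HTML constant are names made up for this illustration.

```python
from html.parser import HTMLParser

# The static markup of main.html, as a plain HTTP GET would return it.
MAIN_HTML = """
<html>
   <body>
      <div id="t1"></div>
      <script src="main.js"></script>
   </body>
</html>
"""


class ClassCollector(HTMLParser):
    """Hypothetical helper: records class names and script sources in the markup."""

    def __init__(self):
        super().__init__()
        self.classes = set()
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Collect every class name that is present in the static HTML.
        self.classes.update((attributes.get("class") or "").split())
        # Scripts are only referenced here; nothing downloads or executes them.
        if tag == "script" and attributes.get("src"):
            self.scripts.append(attributes["src"])


collector = ClassCollector()
collector.feed(MAIN_HTML)

has_wrapper = "sgd-wrapper" in collector.classes
print(has_wrapper)        # False: main.js never ran, so the class was never added
print(collector.scripts)  # ['main.js']: the script is referenced but not executed
```

BeautifulSoup would give the same result on this input, since it also only parses the text it is handed.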

What you need to do is use headless Chrome (for example, google-chrome --headless to launch Chrome) and use the Chrome API to hook into the page loading events, then dump the complete rendered contents.
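
One simple way to try this from Python is to call headless Chrome as a subprocess with its --dump-dom flag, which prints the serialized DOM to standard output after the page has loaded. This is only a sketch of a simpler variant, not the event-hooking approach described above. It assumes a Chrome binary named google-chrome is on your PATH, and the function names build_dump_dom_command and dump_rendered_html are hypothetical. Content that the site fetches well after the initial page load may still be missing from the output.

```python
import subprocess


def build_dump_dom_command(url, chrome_binary="google-chrome"):
    """Build the argument list for a headless Chrome run that prints the DOM.

    The binary name is an assumption; on some systems it is called
    "chromium" or "chrome" instead.
    """
    return [chrome_binary, "--headless", "--dump-dom", url]


def dump_rendered_html(url, chrome_binary="google-chrome", timeout=60):
    """Run headless Chrome and return the HTML after scripts have executed."""
    result = subprocess.run(
        build_dump_dom_command(url, chrome_binary),
        capture_output=True,  # collect stdout (the DOM) and stderr
        text=True,            # decode the output as str instead of bytes
        timeout=timeout,      # give up if the browser hangs
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

The string returned by dump_rendered_html(url) can then be passed to BeautifulSoup in place of the .text of the requests response.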
