Trouble With Scraping
Tag And Datalist With Links In It

December 22, 2023

This is an example of the HTML I'm scraping with Python and BeautifulSoup:

) # your markup
print(soup.br.contents[0])
gives:
Webi English Shanghai -- Tuesday, 7 March 2017, at 2:17 p.m.

Solution 2:

In [31]: for dd in soup.find_all('dd'):
    ...:     link = dd.a.get('href')
    ...:     link_text = dd.a.text
    ...:     *_, dd_text = dd.stripped_strings
    ...:     print(link, link_text, dd_text)

out:

http://www.eslcafe.com/jobs/china/index.cgi?read=45391 Teach English in Shenyang, China: Great salary, Support, and Structured program Greenheart Travel -- Thursday, 9 February 2017, at 1:05 p.m.
dd_text is the last text node of the dd tag, so *_ collects all the text nodes that come before it.
EDIT:
In [20]: for dd in soup.find_all('dd'):
    ...:     d = {}  # store data in a dict
    ...:     d['link'] = dd.a.get('href')
    ...:     d['link_text'] = dd.a.text
    ...:     *_, dd_text = dd.stripped_strings
    ...:     d['date_text'] = dd_text
    ...:     print(d)
out:
{'date_text': 'EnglishTeacherChina.com -- Sunday, 12 February 2017, at 1:45 p.m.', 'link': 'http://www.eslcafe.com/jobs/china/index.cgi?read=45426', 'link_text': '❤ ❤ ❤ Teach English In China 12,000-20,000 RMB/month - Adults or Kids - Free Housing & Airfare - Free TEFL TESOL Certification - Where You Want - YOUR NEEDS ARE OUR TOP PRIORITY ❤ ❤ ❤'}
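For a version that runs outside an interactive session, here is a sketch of the same loop as a standalone script. The markup is hypothetical: it is modeled on the output shown above (a link followed by a br and a trailing text node inside each dd), and the real page may be structured differently. The link texts are shortened, and the rows list and the guard for dd tags without a link are additions not present in the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the output above; the real page may differ.
html = """
<dl>
  <dd><a href="http://www.eslcafe.com/jobs/china/index.cgi?read=45391">Teach English in Shenyang, China</a>
      <br>Greenheart Travel -- Thursday, 9 February 2017, at 1:05 p.m.</dd>
  <dd><a href="http://www.eslcafe.com/jobs/china/index.cgi?read=45426">Teach English In China</a>
      <br>EnglishTeacherChina.com -- Sunday, 12 February 2017, at 1:45 p.m.</dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for dd in soup.find_all("dd"):
    if dd.a is None:
        # Skip entries without a link instead of raising AttributeError.
        continue
    # The last non-empty text node holds the poster and the date.
    *_, date_text = dd.stripped_strings
    rows.append({
        "link": dd.a.get("href"),
        "link_text": dd.a.get_text(strip=True),
        "date_text": date_text,
    })

for row in rows:
    print(row)
```

Collecting the dicts in a list rather than printing them one by one keeps the results available for later steps, such as writing them to a CSV file.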
