
Trouble With Scraping
Tag And Datalist With Links In It

Solution 2:

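In [31]: for dd in soup.find_all('dd'):   # soup is the BeautifulSoup object built from the page
    ...:     link = dd.a.get('href')            # URL of the first <a> inside the <dd>
    ...:     link_text = dd.a.text              # visible text of that link
    ...:     *_, dd_text = dd.stripped_strings  # keep only the last text node
    ...:     print(link, link_text, dd_text, sep='\n')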

out:

http://www.eslcafe.com/jobs/china/index.cgi?read=45391
Teach English in Shenyang, China: Great salary, Support, and Structured program
Greenheart Travel -- Thursday, 9 February 2017, at 1:05 p.m.

dd_text is the last text node of the dd tag, so I use *_ to collect (and discard) all the text nodes before it.
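The unpacking idiom is plain Python and works on any iterable, not only on stripped_strings. Here is a minimal sketch of it in isolation; the helper name last_item and the sample strings are made up for illustration. Note that the starred form needs at least one item, so a dd with no text at all would raise ValueError, which the helper turns into None.

```python
def last_item(iterable):
    """Return the final item of any iterable, or None if it is empty."""
    # The name after *_ must receive one item, so an empty
    # iterable raises ValueError during unpacking.
    try:
        *_, last = iterable
    except ValueError:
        return None
    return last


# Stand-in for dd.stripped_strings, which is also a generator of strings.
strings = (s for s in ["Job title", "Some School -- Monday, 1 January 2018"])
print(last_item(strings))   # the last string in the generator
print(last_item(iter([])))  # empty input, so None
```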

EDIT:

In [20]: for dd in soup.find_all('dd'):
    ...:     d = {} # store data in a dict
    ...:     d['link'] = dd.a.get('href')
    ...:     d['link_text'] = dd.a.text
    ...:     *_, dd_text = dd.stripped_strings
    ...:     d['date_text'] = dd_text
    ...:     print(d)

out:

{'date_text': 'EnglishTeacherChina.com -- Sunday, 12 February 2017, at 1:45 p.m.',
 'link': 'http://www.eslcafe.com/jobs/china/index.cgi?read=45426',
 'link_text': '❤ ❤ ❤ Teach English In China 12,000-20,000 RMB/month - Adults or Kids - Free Housing & Airfare - Free TEFL TESOL Certification - Where You Want - YOUR NEEDS ARE OUR TOP PRIORITY ❤ ❤ ❤'}
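If you want to keep the results instead of printing them, the same loop can append each dict to a list. The following is a sketch only: the HTML snippet, the example.com URL, and the parse_dd helper are assumptions made up so the example runs on its own, and the real page may be structured differently. It also skips any dd that has no link, since dd.a would be None there and dd.a.get would fail.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the job listing in the question.
html = """
<dl>
  <dd><a href="http://example.com/jobs/1">First job title</a><br>
      Example School -- Monday, 1 January 2018, at 9:00 a.m.</dd>
  <dd>No link in this entry</dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")


def parse_dd(dd):
    """Return a dict for one <dd>, or None if it contains no link."""
    a = dd.find("a")
    if a is None:
        return None
    # The link exists, so there is at least one text node to unpack.
    *_, date_text = dd.stripped_strings
    return {
        "link": a.get("href"),
        "link_text": a.get_text(strip=True),
        "date_text": date_text,
    }


# Collect one dict per <dd> that actually has a link.
rows = [row for row in map(parse_dd, soup.find_all("dd")) if row is not None]
print(rows)
```

A list of dicts like rows is easy to pass on to csv.DictWriter or json.dump afterwards.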
