Retrieving Essential Data From A Webpage Using Python
The following is part of a webpage I downloaded with urlretrieve (urllib). I want to write only this data from the webpage given below into another text file as: ENGINEERING MATHEMA
Solution 1:
import urllib2
import BeautifulSoup

def main():
    infname = 'htmltable.html'
    outfname = 'courses.txt'

    # Read the saved page into memory
    with open(infname) as inf:
        html = inf.read()

    # Parse it and locate the table with id="content"
    doc = BeautifulSoup.BeautifulSoup(html)
    table = doc.find('table', {'id': 'content'})

    with open(outfname, 'w') as outf:
        for row in table.findAll('tr'):
            # Each row is expected to hold exactly six cells
            id, name, a, b, c, d = [cell.getText().strip() for cell in row.findAll('td')]
            # The course id is parsed but not written out
            outf.write("{name}, {a}, {b}, {c}, {d}\n".format(id=id, name=name, a=a, b=b, c=c, d=d))

if __name__ == "__main__":
    main()
This works quite nicely if you assume the saved page starts like
<html><head><title>Data Table</title></head><body><table id='content'><tr align=left bgcolor='#FFFFFF'><td>EIT402 </td><td>ENGINEERING MATHEMATICS-IV</td><td align=center>4</td><td align=center>36</td><td align=center>40</td><td align=center>F</td></tr>
resulting in
ENGINEERING MATHEMATICS-IV, 4, 36, 40, F
ENVIRONMENTAL STUDIES, 47, 36, 83, P
SYSTEM PROGRAMMING, 40, 36, 76, P
MICROPROCESSOR BASED DESIGN, 3, 35, 38, F
PROGRAMMING PARADIGMS, 42, 36, 78, P
COMMUNICATION SYSTEMS, 9, 35, 44, F
DATA STRUCTURE LAB, 10, 35, 45, F
PROGRAMMING ENVIRONMENTS LAB, 20, 25, 45, F
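The code above targets Python 2 and the old BeautifulSoup 3 module. If neither is available, the same idea can be sketched on Python 3 with only the standard library's html.parser. This is a swapped-in technique, not the answer's method. The names TableRowParser, extract_lines and sample are hypothetical, and the sketch assumes the table has id "content" and six cells per row, as in the sample page above.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect <td> text, grouped by <tr>, inside <table id="content">."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._in_table = False  # True while inside the target table
        self._row = None        # cells of the row being read, or None
        self._cell = None       # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == "content":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag == "td":
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_lines(html):
    """Return one 'name, a, b, c, d' string per six-cell table row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    # Assumption: rows without exactly six cells are skipped; the id is dropped.
    return [", ".join(row[1:]) for row in parser.rows if len(row) == 6]


# Sample input modelled on the first row of the saved page shown above.
sample = (
    "<html><head><title>Data Table</title></head><body>"
    "<table id='content'><tr align=left bgcolor='#FFFFFF'>"
    "<td>EIT402 </td><td>ENGINEERING MATHEMATICS-IV</td>"
    "<td align=center>4</td><td align=center>36</td>"
    "<td align=center>40</td><td align=center>F</td></tr></table></body></html>"
)
print(extract_lines(sample))
```

To produce the text file, the returned strings can be written out one per line, for example with outf.write("\n".join(lines) + "\n"). Skipping rows that do not have six cells avoids the unpacking error the original loop would raise on a header or malformed row.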