Upload Images From Web-page
I want to implement a feature similar to this http://www.tineye.com/parse?url=yahoo.com - allow users to upload images from any web page. The main problem for me is that it takes too much
Solution 1:
I can think of a few optimisations:
- parse the page as you read it from the stream, instead of downloading it completely first
- use a SAX (event-driven) parser, which fits well with the point above
- use HEAD requests to get the sizes of the images without downloading them
- put the image URLs in a queue, then use a few threads to connect and get the file sizes
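The first two points could look roughly like the sketch below. It is written for Python 3 and uses the standard library's event-driven `html.parser.HTMLParser` as a stand-in for a SAX parser. The names `ImgCollector` and `collect_image_urls`, the chunk size, and the assumption that the page is UTF-8 are all mine, not part of the question.

```python
import codecs
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Records an absolute URL for every <img src=...> tag as it is parsed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(urljoin(self.base_url, src))


def collect_image_urls(stream, base_url, chunk_size=4096):
    """Feed a binary file-like object to the parser chunk by chunk.

    Assumes UTF-8 content; undecodable bytes are replaced rather than raised.
    """
    parser = ImgCollector(base_url)
    # The incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        parser.feed(decoder.decode(chunk))
    parser.feed(decoder.decode(b"", final=True))
    parser.close()
    return parser.image_urls


if __name__ == "__main__":
    # Offline demo: a BytesIO stands in for the HTTP response body.
    page = io.BytesIO(b'<html><body><img src="/logo.png"></body></html>')
    print(collect_image_urls(page, "http://example.com/index.html"))
```

In real use the `stream` argument would be the response object returned by `urllib.request.urlopen`, so image URLs become available while the page is still downloading.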
Example of a HEAD request:
$ telnet m.onet.pl 80
Trying 213.180.150.45...
Connected to m.onet.pl.
Escape character is '^]'.
HEAD /_m/33fb7563935e11c0cba62f504d91675f,59,29,134-68-525-303-0.jpg HTTP/1.1
host: m.onet.pl

HTTP/1.0 200 OK
Server: nginx/0.8.53
Date: Sat, 09 Apr 2011 18:32:44 GMT
Content-Type: image/jpeg
Content-Length: 37545
Last-Modified: Sat, 09 Apr 2011 18:29:22 GMT
Expires: Sat, 16 Apr 2011 18:32:44 GMT
Cache-Control: max-age=604800
Accept-Ranges: bytes
Age: 6575
X-Cache: HIT from emka1.m10r2.onet
Via: 1.1 emka1.m10r2.onet:80 (squid)
Connection: close

Connection closed by foreign host.
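The same HEAD request can be issued from Python, and the queue-plus-threads idea maps fairly directly onto a thread pool. The following is only a sketch for Python 3: `head_content_length` and `image_sizes` are names I made up, the timeout and worker count are arbitrary, and `ThreadPoolExecutor` takes the place of a hand-rolled queue with worker threads.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def head_content_length(url, timeout=10):
    """Send a HEAD request and return Content-Length as an int, or None.

    None means the request failed or the server did not send a usable header.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            value = response.headers.get("Content-Length")
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return None
    if value is None or not value.strip().isdigit():
        return None
    return int(value)


def image_sizes(urls, size_of=head_content_length, workers=8):
    """Look up the size of every URL concurrently; returns {url: size or None}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(size_of, urls)))
```

The `size_of` parameter is there so the lookup can be swapped out, for example with a stub while testing, or with a GET-based fallback for servers that reject HEAD.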
Solution 2:
You can use the headers attribute of the file-like object returned by urllib2.urlopen (I don't know about urllib).
Here's a test I wrote for it. As you can see, it is rather fast, though I imagine some websites would block too many repeated requests.
|milo|laurie|¥ cat test.py
import urllib2

uri = "http://download.thinkbroadband.com/1GB.zip"

def get_file_size(uri):
    file = urllib2.urlopen(uri)
    content_header, = [header for header in file.headers.headers if header.startswith("Content-Length")]
    _, str_length = content_header.split(':')
    length = int(str_length.strip())
    return length

if __name__ == "__main__":
    get_file_size(uri)
|milo|laurie|¥ time python2 test.py
python2 test.py  0.06s user 0.01s system 35% cpu 0.196 total
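For anyone on Python 3, where urllib2 became `urllib.request`, a rough equivalent might look like the sketch below. It is not the code that was timed above. The helper name `content_length` is mine, and it returns None instead of raising when the server sends no Content-Length header, which the list-unpacking version would not tolerate.

```python
import urllib.request


def content_length(headers):
    """Extract Content-Length from a headers mapping; None if missing or malformed.

    Works with the case-insensitive headers object exposed by urlopen responses.
    """
    value = headers.get("Content-Length")
    if value is None or not value.strip().isdigit():
        return None
    return int(value.strip())


def get_file_size(uri, timeout=10):
    # urlopen returns once the response headers have arrived; the body is
    # only transferred if read() is called, so large files stay cheap.
    with urllib.request.urlopen(uri, timeout=timeout) as response:
        return content_length(response.headers)
```

Calling `get_file_size("http://download.thinkbroadband.com/1GB.zip")` would mirror the test above, assuming that URL is still served.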