
Upload Images From a Web Page

I want to implement a feature similar to http://www.tineye.com/parse?url=yahoo.com: allow the user to upload images from any web page. The main problem for me is that it takes too much

Solution 1:

I can think of a few optimisations:

  1. Parse the page as you read it from the stream.
  2. Use a SAX parser (which pairs well with the point above).
  3. Use HEAD requests to get the sizes of the images.
  4. Put the image URLs in a queue, then use a few threads to connect and get the file sizes.
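
Points 1 and 2 could look roughly like the sketch below. It assumes Python 3 and uses the standard library's event-driven `html.parser.HTMLParser` as a stand-in for a SAX parser; `ImageCollector` and `collect_images` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects absolute image URLs from <img src=...> tags as HTML is fed in."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag, SAX-style; no document tree is built.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


def collect_images(chunks, base_url):
    """Feed an iterable of decoded text chunks to the parser as they arrive."""
    parser = ImageCollector(base_url)
    for chunk in chunks:
        # Incomplete tags at the end of a chunk are buffered until the next feed.
        parser.feed(chunk)
    parser.close()
    return parser.image_urls
```

The chunks would typically come from reading the HTTP response in fixed-size pieces and decoding them to text, so image URLs become available before the whole page has been downloaded.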

Example of a HEAD request:

$ telnet m.onet.pl 80
Trying 213.180.150.45...
Connected to m.onet.pl.
Escape character is '^]'.
HEAD /_m/33fb7563935e11c0cba62f504d91675f,59,29,134-68-525-303-0.jpg HTTP/1.1
host: m.onet.pl

HTTP/1.0 200 OK
Server: nginx/0.8.53
Date: Sat, 09 Apr 2011 18:32:44 GMT
Content-Type: image/jpeg
Content-Length: 37545
Last-Modified: Sat, 09 Apr 2011 18:29:22 GMT
Expires: Sat, 16 Apr 2011 18:32:44 GMT
Cache-Control: max-age=604800
Accept-Ranges: bytes
Age: 6575
X-Cache: HIT from emka1.m10r2.onet
Via: 1.1 emka1.m10r2.onet:80 (squid)
Connection: close

Connection closed by foreign host.
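
The same HEAD exchange can be done from code. This is a minimal sketch assuming Python 3 and its standard `http.client` module; `head_content_length` is a hypothetical helper name, and the sketch does not follow redirects.

```python
import http.client
from urllib.parse import urlsplit


def head_content_length(url, timeout=10):
    """Return the Content-Length reported for url as an int, or None if absent."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for the headers only; the server sends no body.
        conn.request("HEAD", path)
        response = conn.getresponse()
        value = response.getheader("Content-Length")
    finally:
        conn.close()
    return int(value) if value is not None else None
```

Servers are not required to send `Content-Length`, so callers should be prepared for a `None` result.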

Solution 2:

You can use the headers attribute of the file-like object returned by urllib2.urlopen (I don't know whether urllib offers the same).

Here's a test I wrote for it. As you can see, it is rather fast, though I imagine some websites would block too many repeated requests.

|milo|laurie|¥ cat test.py
import urllib2
uri = "http://download.thinkbroadband.com/1GB.zip"

def get_file_size(uri):
    # urlopen issues a GET, but only the headers are inspected here;
    # the body is never read.
    file = urllib2.urlopen(uri)
    # getheader matches the header name case-insensitively.
    str_length = file.headers.getheader("Content-Length")
    file.close()
    length = int(str_length.strip())
    return length

if __name__ == "__main__":
    get_file_size(uri)
|milo|laurie|¥ time python2 test.py
python2 test.py  0.06s user 0.01s system 35% cpu 0.196 total
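
Since too many repeated requests may get blocked, one option is to check several images in parallel with a small, fixed number of threads. The sketch below assumes Python 3 and its standard `concurrent.futures` module; `sizes_for` is a hypothetical helper, and `get_size` stands for any callable that takes a URL and returns a size.

```python
from concurrent.futures import ThreadPoolExecutor


def sizes_for(urls, get_size, max_workers=4):
    """Map each URL to its size, or to None if get_size raised for it."""
    urls = list(urls)

    def safe_size(url):
        try:
            return get_size(url)
        except Exception:
            # One unreachable image should not abort the whole batch.
            return None

    # max_workers caps how many requests are in flight at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return dict(zip(urls, pool.map(safe_size, urls)))
```

Keeping `max_workers` low trades some speed for a lower chance of being throttled by the remote site.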
