Python - How to crawl and download files from a dynamic URL?

I have my own Python crawler (based on CS101 from udacity.com), and I am trying to download files (installers) from download.cnet.com. While the crawler is crawling, I want it to work like this:

Tell whether a link is a download link:

import urllib2

response = urllib2.urlopen('http://example.com/')
content_type = response.info().get('content-type')
print content_type

If the crawler gets:
```
application/octet-stream 
```
then the crawler should download the installer from that link.
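
A minimal sketch of that check, written for Python 3's `urllib.request` rather than the `urllib2` used above. The helper names `is_download_content_type` and `is_download_link`, and the set of media types treated as downloads, are assumptions for illustration. A HEAD request is used so the file body is not fetched, although some servers may not answer HEAD requests properly.

```python
import urllib.request

# Media types treated as "this is a file to download" (an assumed list).
DOWNLOAD_TYPES = {"application/octet-stream", "application/x-msdownload"}


def is_download_content_type(header_value):
    """Return True if a Content-Type header value names a download type."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in DOWNLOAD_TYPES


def is_download_link(url, timeout=10):
    """Ask the server for headers only and inspect Content-Type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_download_content_type(response.headers.get("Content-Type"))
```

The crawler could call `is_download_link(url)` on each candidate link and fetch the file only when it returns `True`.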

The problem is that download.com doesn't seem to provide a real download link, so the crawler can't find the download link among the dynamic links. For example, when I tried to download Opera from download.com, I got a message like this: "Your download will begin in a moment. If it doesn't, restart the download." When I checked the "restart download" link, I was expecting a real download link (e.g. download.com/blah/opera.exe), but instead I got a weird address that the crawler couldn't understand.

So I have confirmed, based on http://googlewebmastercentral.blogspot.no/2008/09/dynamic-urls-vs-static-urls.html, that download.com is using dynamic links. What should I do so that the crawler can find a link from which it can download the installer on download.com?

As you've said, you're getting JavaScript or AJAX in the page that activates the download in a "real" browser while stymying your efforts to automate it.

Here's a discussion of the same issue: Stack Overflow: Mechanize and JavaScript. As noted there, one option is to use an alternative to Python such as PhantomJS, or a browser automation framework (with optional "remote control") such as Selenium.
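
As a rough sketch of the Selenium route, the function below loads a page in a browser, lets its JavaScript run, and collects the `href` of every anchor afterwards. The function names `collect_rendered_links` and `make_firefox_driver` are hypothetical, and it is only an assumption that the real download address ends up in an `<a>` element; if the site triggers the download through a timed redirect instead, a different hook would be needed.

```python
def collect_rendered_links(driver, page_url):
    """Load page_url in a browser and return the href of every <a> element."""
    driver.get(page_url)  # the browser executes the page's JavaScript
    hrefs = []
    # "tag name" is the locator strategy string behind Selenium's By.TAG_NAME.
    for anchor in driver.find_elements("tag name", "a"):
        href = anchor.get_attribute("href")
        if href:
            hrefs.append(href)
    return hrefs


def make_firefox_driver():
    """Start Firefox through Selenium (needs the selenium package and Firefox)."""
    from selenium import webdriver  # third-party; imported only when needed
    return webdriver.Firefox()
```

A crawler might call `collect_rendered_links(make_firefox_driver(), page_url)` and then check the Content-Type of each returned link, closing the browser with `driver.quit()` when finished.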
