Python Forum
Need a little help with an image scraper.
#1
Hello one and all,
I am back with what I hope is a small problem. I found this image scraper that looks like just what I have been looking for. I have been working on this one error for over a week now. I googled it and can't find anything close to what I have here.
Here is the code. I am using Python 2.7.12 on Windows 7:



import requests
import mechanize
from lxml import html
import sys
import urlparse
import time

def grab_my_pictures():
    respones = requests.get(final_link)
    parsed_body = html.fromstring(response.text)
    images = parsed_body.xpath('//img/@src')
    if not images:
        sys.exit("Found no images")
    images = [urlparse.urljoin(response.url, url)for url in images]
    print 'Found %s images' % len(images)

    for url in images[1:50]:
        r = requests.get(url)
        r = open('pics2/%s' % url.split('/')[-1], 'w')
        f.write(r.content)
        f.close()

website_name = raw_input("Please enter the name of the website ")
list_of_links = []
br = mechanize.browser()
br.set_handle_robot(False)
final_website = "http//" + website_name
print final_website
respons = br.open("http//" + website_name)  
There is still some code that I don't understand; I have been going through it line by line, learning what each line does.
The error is this:


Traceback (most recent call last):
  File "C:/Users/renny and kite/Desktop/play_with_image/test_1.py", line 25, in <module>
    br = mechanize.browser()
AttributeError: 'module' object has no attribute 'browser'



I just don't know what attribute I should put in this call: br = mechanize.browser()

I have been to a few sites, but can't find anything about mechanize.browser()
site one http://wwwsearch.sourceforge.net/mechanize/doc.html
site two  http://stockrt.github.io/p/emulating-a-b...mechanize/
site three http://www.pythonforbeginners.com/mechan...mechanize/. I found this code [br = mechanize.browser()] on this site, but I could not get it to work.

So no luck so far, which is why I am here again. I hope someone can help; I would like to get this piece of code to work.
Thank you
#2
>>> 'browser' == 'Browser'
False
#3
OK, does that replace br = mechanize.browser()? If not, where do I put 'browser' == 'Browser'?
Thank you
#4
I was trying to hint that an uppercase character is not the same as a lowercase one.
>>> import mechanize
>>> br = mechanize.browser()
Traceback (most recent call last):
 File "<interactive input>", line 1, in <module>
AttributeError: 'module' object has no attribute 'browser'

>>> br = mechanize.Browser()
>>> 
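A typo like this can also be caught by asking Python which real names are close to the one you typed. Here is a minimal sketch using difflib from the standard library. suggest_attribute and the FakeMechanize stand-in are names made up for illustration, so it runs without mechanize installed:

```python
import difflib


def suggest_attribute(obj, name, limit=3):
    """Return public attribute names on obj that resemble a mistyped name."""
    candidates = [attr for attr in dir(obj) if not attr.startswith("_")]
    return difflib.get_close_matches(name, candidates, n=limit, cutoff=0.6)


class FakeMechanize(object):
    """Hypothetical stand-in for the mechanize module namespace."""

    class Browser(object):
        pass


# The lowercase name does not exist, but a close match does.
print(hasattr(FakeMechanize, "browser"))
print(suggest_attribute(FakeMechanize, "browser"))
```

With the real module, the equivalent call would be suggest_attribute(mechanize, "browser").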
#5
OK, I should have seen that.
Now I get a lot of errors:
Traceback (most recent call last):
  File "C:\Users\renny and kite\Desktop\play_with_image\test_1.py", line 26, in <module>
    br.set_handle_robot(False)
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 628, in __getattr__
    ".select_form()?)" % (self.__class__, name))
AttributeError: mechanize._mechanize.Browser instance has no attribute set_handle_robot (perhaps you forgot to .select_form()?)

I talked to the guy who said he wrote this code, and he said it will work just as it is. Yep, like most of the stuff I have found on the web, it takes a lot to make it work.
Thank you for your help.
#6
So nobody knows why this code does not work? I would think at least one person might know what is going on inside this code.

br = mechanize.Browser(what attribute should go here) to make this work?
Thank you
#7
As far as I can see, mechanize is used just to retrieve the web page. If the page is not built with JavaScript, you don't need mechanize; use the requests module instead.
Or replace mechanize with Selenium.
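Since the script only needs the page's HTML, pulling out the image URLs does not depend on how the page was fetched. Here is a rough sketch, assuming Python 3 rather than the 2.7 in the original post. It uses the standard library's HTMLParser as a stand-in for lxml so it runs without extra installs. ImageSrcParser and extract_image_urls are made-up names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for key, value in attrs:
                if key == "src" and value:
                    self.sources.append(value)


def extract_image_urls(page_html, base_url):
    """Return absolute image URLs found in page_html, resolved against base_url."""
    parser = ImageSrcParser()
    parser.feed(page_html)
    return [urljoin(base_url, src) for src in parser.sources]


# Sample HTML invented for the example; no network access is needed.
sample = '<html><body><img src="/a.png"><img alt="no source"></body></html>'
print(extract_image_urls(sample, "http://example.com/gallery/"))
```

The page text itself could come from requests.get(url).text, which is what the original function tries to do.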

Also, you never call the grab_my_pictures() function, and I don't see where requests.get(final_link) gets the web address from either.

Are you sure this is the full script?
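For what it's worth, here is roughly how I would expect the missing pieces to look. This is only a sketch, assuming Python 3 and the standard library. The address is passed in as an argument instead of relying on a global final_link, and it also sidesteps three slips in the posted code: "http//" is missing its colon, the file is opened as r but written through f, and it is opened in 'w' rather than binary mode. build_url, fetch_bytes and save_images are made-up names:

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlopen


def build_url(website_name):
    """Prefix a bare host name with a scheme (note the colon in 'http://')."""
    if "://" in website_name:
        return website_name
    return "http://" + website_name


def fetch_bytes(url):
    """Default fetcher: download a URL and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def save_images(image_urls, out_dir, fetch=fetch_bytes):
    """Download each image URL into out_dir and return the saved file paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in image_urls:
        name = os.path.basename(urlsplit(url).path) or "unnamed"
        path = os.path.join(out_dir, name)
        # Binary mode, so the image bytes are written unchanged
        # (text mode on Windows would alter them).
        with open(path, "wb") as f:
            f.write(fetch(url))
        saved.append(path)
    return saved
```

The functions still have to be called at the bottom of the script, for example save_images(["http://example.com/a.png"], "pics2"). The fetch parameter is only there so the download step can be swapped out or tested without a network connection.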
#8
Again a typing error, the same kind as you had on the previous line.
This is a little silly; you should catch this yourself.
Just copy and paste the code, as it's not your code anyway.
>>> import mechanize
>>> br = mechanize.Browser()
>>> br.set_handle_robot(False)
Traceback (most recent call last):
 File "<interactive input>", line 1, in <module>
 File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 628, in __getattr__
   ".select_form()?)" % (self.__class__, name))
AttributeError: mechanize._mechanize.Browser instance has no attribute set_handle_robot (perhaps you forgot to .select_form()?)
>>> # Add s
>>> br.set_handle_robots(False)
>>>
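The "(perhaps you forgot to .select_form()?)" part of that message is a red herring here. Judging by the traceback, the error is raised from Browser.__getattr__, which Python only calls when normal attribute lookup fails, so any misspelled method name ends up there. Below is a simplified sketch of that pattern. FormProxy is a made-up class, not mechanize's actual code, and it assumes the real method forwards unknown names to the currently selected form:

```python
class FormProxy(object):
    """Hypothetical class that forwards unknown attributes to a selected form."""

    def __init__(self):
        self.form = None

    def select_form(self, form):
        self.form = form

    def set_handle_robots(self, handle):
        self.handle_robots = handle

    def __getattr__(self, name):
        # Only reached when normal lookup fails, e.g. for a misspelled method.
        form = self.__dict__.get("form")
        if form is None:
            raise AttributeError(
                "%s instance has no attribute %s (perhaps you forgot to "
                ".select_form()?)" % (self.__class__.__name__, name))
        return getattr(form, name)


proxy = FormProxy()
proxy.set_handle_robots(False)      # correct spelling: found normally
try:
    proxy.set_handle_robot(False)   # typo: falls through to __getattr__
except AttributeError as exc:
    print(exc)
```

So the hint about select_form() shows up for every unknown name, whether or not a form is involved.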
#9
Thanks wavic, I will try that. snippsat, I changed it to br.set_handle_robots a while back; it does not help, I still get the same error.
I will start working on this later today.
Thank you all for all the help.