Python Forum
Web scraping doubt
#1
I am trying to scrape a webpage using Scrapy. I need to extract an email address (as text) from an image; the link is "https://profiles.viictr.org/display/MDANDERSON/guilin-tang". I tried pytesseract, but it needs the path to an image file on my computer, and this image is online. Could someone help me out, please?
#2
You may need to download the image before applying pytesseract to it.
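Downloading does not have to mean saving a file: the image bytes can stay in memory and be handed to pytesseract as a PIL image. The following is only a rough sketch, assuming Pillow, pytesseract and the Tesseract binary are installed; the function names `fetch_image_bytes` and `ocr_image_bytes` are made up for illustration.

```python
import io
import urllib.request


def fetch_image_bytes(url, opener=urllib.request.urlopen, timeout=30):
    """Download the raw bytes of an image without writing them to disk."""
    # `opener` is a hypothetical hook so the function can be tried offline.
    with opener(url, timeout=timeout) as response:
        return response.read()


def ocr_image_bytes(data):
    """Run Tesseract on in-memory image bytes and return the recognised text."""
    # Imported lazily so the download helper works without these packages.
    from PIL import Image
    import pytesseract

    image = Image.open(io.BytesIO(data))
    return pytesseract.image_to_string(image)


# Usage (the image URL must be taken from the profile page yourself):
#   text = ocr_image_bytes(fetch_image_bytes(image_url))
```

If you are already inside a Scrapy callback for the image request, `response.body` is the same kind of bytes object, so it could probably be passed to `ocr_image_bytes` directly.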
#3
I can't download every image; there are nearly 1000 of them. Is there any alternative, please?
link (this is the same link as above; it is just one sample)

Also, link is another site I want to scrape. I inspected it and found some JPG files in the Network tab of Chrome's inspector. If there is a solution for the first link above, it could be applied here too; if there is any other way to do it, please tell me. It does not have to be Python; a solution in JavaScript would be fine as well.
#4
The whole purpose of providing the email address as an image is to keep it from being parsed by bots and scrapers.

If you still want to go ahead,
you have to download or store the image from each profile and apply pytesseract or another OCR solution to extract the text. These OCR tools are not very accurate, especially with small, low-quality images. You will need to experiment and choose the OCR tool that works best for your goal.
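Since accuracy on small images is the weak point, two things that often help are enlarging the image before OCR and cleaning up the recognised text afterwards. This is a minimal sketch, not a tested recipe for this site: `prepare_for_ocr` and `extract_email` are hypothetical helpers, the scale factor of 3 is a guess, and the cleanup assumes the image contains nothing but the address.

```python
import re

# Deliberately loose pattern for something that looks like an email address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def prepare_for_ocr(image, scale=3):
    """Upscale and grayscale a small PIL image before passing it to OCR."""
    from PIL import Image  # lazy import; Pillow is needed only here

    width, height = image.size
    enlarged = image.resize((width * scale, height * scale), Image.LANCZOS)
    return enlarged.convert("L")


def extract_email(ocr_text):
    """Return the first email-like token in OCR output, or None."""
    # OCR tends to insert stray spaces around "@" and "."; strip all
    # whitespace first (assumes the image holds only the address).
    compact = re.sub(r"\s+", "", ocr_text)
    match = EMAIL_RE.search(compact)
    return match.group(0) if match else None
```

Anything for which `extract_email` returns `None` is worth setting aside and checking by hand rather than trusting the OCR output.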