Web scraping doubt

pixel_chick · May-30-2019, 11:23 AM

I am trying to scrape a webpage using Scrapy, and I need to extract an email address (text) from an image. The link is "https://profiles.viictr.org/display/MDANDERSON/guilin-tang". I tried using pytesseract, but it needs the path to an image file on my computer, and this image is online. Could someone please help me out here?

***metulburr*** · May-30-2019, 11:39 AM

You may need to download the image first before applying pytesseract to it.
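A minimal sketch of that idea, for illustration only. Downloading does not have to mean saving a file: `pytesseract.image_to_string` accepts a PIL `Image` object as well as a file path, so the image bytes can stay in memory. The helper names and the email pattern below are my own assumptions, not something from the site, and the image URL for each profile still has to be found first (for example from the `src` of the `img` tag on the page).

```python
import io
import re
import urllib.request

# Deliberately loose pattern; an assumption, not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def fetch_bytes(url, timeout=10):
    """Download the raw bytes at `url` without writing them to disk."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def ocr_bytes(image_bytes):
    """Run Tesseract on image bytes held in memory and return the text."""
    # Imported here so the other helpers work without Pillow/pytesseract.
    from PIL import Image
    import pytesseract

    image = Image.open(io.BytesIO(image_bytes))
    return pytesseract.image_to_string(image)


def find_emails(text):
    """Return every email-like substring found in the OCR output."""
    return EMAIL_RE.findall(text)


def email_from_image_url(image_url):
    """Download one image, OCR it, and return the email-like strings found."""
    return find_emails(ocr_bytes(fetch_bytes(image_url)))
```

Inside a Scrapy spider the download step is probably not needed at all: the callback for the image request already receives the bytes as `response.body`, which could be passed straight to `ocr_bytes`.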

pixel_chick · (This post was last modified: May-31-2019, 04:33 AM by pixel_chick.)

I can't download every image; there are nearly 1000 of them. Is there any possible alternative, please?
The link above is just one sample link!

Also, this (link) is another site to scrape. I inspected it and found some JPG files in the Network tab of Chrome's inspector. If there is a solution for the first link above, the same approach should work here too; if there is any other possible way to do it, please tell me. It does not have to be Python; any solution is welcome, even one using JavaScript.

mahi · Jun-05-2019, 06:43 AM

The whole purpose of providing the email ID as an image is to keep it from being parsed by bots and scrapers.

If you are still stuck on that:
You have to download or store the image from each profile, then apply pytesseract or another OCR solution to extract the text from the image. These OCR tools are not very accurate, especially with small, low-quality images. You will need to experiment and choose whichever OCR tool works best for your goal.
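One thing worth trying for small, low-quality images is enlarging them and converting them to grayscale before OCR. This is only a sketch under my own assumptions: the function names and the scale factor of 3 are made up for illustration, and whether it actually helps depends on the images, so compare the results by eye on a few profiles.

```python
from PIL import Image


def prepare_for_ocr(image, scale=3):
    """Return a grayscale copy of a PIL image, enlarged by `scale`."""
    gray = image.convert("L")
    width, height = gray.size
    return gray.resize((width * scale, height * scale), Image.LANCZOS)


def ocr_single_line(image, scale=3):
    """OCR a small image that is assumed to hold one line of text."""
    # Imported here so prepare_for_ocr works without pytesseract installed.
    import pytesseract

    prepared = prepare_for_ocr(image, scale)
    # --psm 7 tells Tesseract to treat the image as a single text line.
    return pytesseract.image_to_string(prepared, config="--psm 7").strip()
```

If the OCR output still does not look like an email address for some profiles, those are probably best set aside and checked by hand.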
