Unicode letters in crawling page - DMDoniz - Oct-24-2020

Hi all,

I'm currently trying to learn how to crawl web pages in Python, but I'm a bit confused by what I see in the developer tools for the page I selected. There, the value is shown as a question mark instead of a figure/letter, and after crawling, my array contains \ue... codes. I think these are Unicode letters. But how do I change them into "real" figures/letters in my code?

I also tried exporting my list to a file with encoding 'utf-8', but the file content is the same as the output in VS Code. The meta charset of the web page is UTF-8.

    import requests
    from bs4 import BeautifulSoup

    def crawl(url):
        data = []
        # Send a browser-like User-Agent with the request
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'}
        source_code = requests.get(url, headers=headers)
        soup = BeautifulSoup(source_code.content, 'html.parser')
        # prettify is a method, so it has to be called
        print(soup.prettify())

e.g. this I can find in my soup.prettify() output. To explain it: score-left is part of a football result, which is usually presented as a number. column-date should be clear :-)
RE: Unicode letters in crawling page - Larz60+ - Oct-24-2020

This might be a bit easier if you supply the URL and explain what you are trying to extract from the page.

RE: Unicode letters in crawling page - DMDoniz - Oct-25-2020

The URL I would like to crawl is e.g. http://www.fussball.de/spieltag/kreisliga-b4-bezirk-bodensee-kl-kreisliga-b-herren-saison2021-wuerttemberg/-/spieltag/8/staffel/02B9EGC7UG000004VS5489B4VUEJF3HB-G#!/

The relevant part is the score board with the results of the matches.

RE: Unicode letters in crawling page - DMDoniz - Oct-30-2020

No ideas? Is this webpage so special?

RE: Unicode letters in crawling page - Larz60+ - Oct-30-2020

I think selenium will work best on this site. There is a tutorial on this forum that will not take more than a few hours to complete. When finished, you will know what to do.

web scraping part 1
web scraping part 2

RE: Unicode letters in crawling page - buran - Oct-31-2020

(Oct-30-2020, 09:41 PM)DMDoniz Wrote: Is this webpage so special?

Don't know if it makes it special, but they apply data obfuscation in order to make life hard for wanna-be scrapers.
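
A note on the \ue... codes from the first post: code points from U+E000 to U+F8FF belong to Unicode's Private Use Area. They are valid characters without any standard meaning, so re-encoding them as UTF-8 cannot turn them into digits. A likely explanation, in line with the obfuscation mentioned above, is that the page ships its own font to draw them as numbers, but that is an assumption and not something confirmed in this thread. The sketch below only shows how such characters could be detected and, given a mapping, replaced. The sample string and `glyph_map` are made up for illustration; a real mapping would have to be worked out from the site itself and might change between requests.

```python
import unicodedata


def find_private_use(text):
    """Return the distinct private-use characters in text, in order of first appearance."""
    seen = []
    for ch in text:
        # Unicode general category "Co" means "Other, private use"
        if unicodedata.category(ch) == "Co" and ch not in seen:
            seen.append(ch)
    return seen


def translate_private_use(text, glyph_map):
    """Replace characters found in glyph_map; leave everything else unchanged."""
    return "".join(glyph_map.get(ch, ch) for ch in text)


# Made-up sample, NOT taken from the real page
sample = "\ue6f1:\ue6f2"

# Hypothetical mapping, for illustration only
glyph_map = {"\ue6f1": "2", "\ue6f2": "1"}

# List the private-use code points that were found
print([f"U+{ord(ch):04X}" for ch in find_private_use(sample)])

# A UTF-8 round trip leaves the text unchanged, which is why saving
# the list with encoding='utf-8' did not help
print(sample.encode("utf-8").decode("utf-8") == sample)

# Apply the (hypothetical) mapping
print(translate_private_use(sample, glyph_map))
```

Running `find_private_use` on the scraped text of one element (for instance the text of a score cell) is a quick way to check whether a value is obfuscated like this before trying anything heavier such as selenium.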