Hi all,
I'm currently learning how to crawl web pages in Python, but I'm a bit confused by what I see in the developer tools for the page I selected. Where I would expect a digit or letter, the developer tools show a question mark, and after crawling, my list contains \ue... codes instead. I think these are Unicode characters. How can I convert them into the "real" digits/letters in my code? I also tried exporting my list to a file with encoding 'utf-8', but the file content is the same as the output in VS Code.
The page's meta charset is UTF-8.
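To narrow down what the \ue... codes are, a minimal sketch like the following could help. It lists the code point and Unicode category of each character. The sample string is a made-up placeholder (not taken from the real page), and the idea that the codes fall into a Private Use Area (category "Co") is only an assumption to check, not something confirmed for this site.

```python
import unicodedata


def describe_chars(text):
    """Return (code point, Unicode category) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


# Hypothetical sample; replace it with a string taken from the crawled list.
sample = "\ue0a1\ue0b2"

for code_point, category in describe_chars(sample):
    # Category "Co" means "private use": Unicode assigns no glyph or meaning,
    # so only a font supplied by the page could map it to a visible digit.
    print(code_point, category)

# Encoding to UTF-8 and decoding again returns the same characters,
# which would explain why the exported file looks like the console output.
print(sample.encode("utf-8").decode("utf-8") == sample)
```

If the category turns out to be "Co", the characters are already decoded correctly and changing the encoding will not turn them into digits; any mapping would have to come from the page itself.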
    import requests
    from bs4 import BeautifulSoup

    def crawl(url):
        data = []
        # Send a browser-like User-Agent with the request
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'}
        source_code = requests.get(url, headers=headers)
        soup = BeautifulSoup(source_code.content, 'html.parser')
        # prettify is a method, so it has to be called
        print(soup.prettify())

For example, this is what I find in the soup.prettify() output. To explain: score-left is part of a football result and is usually shown as a number. column-date should be clear :-)
Output:
<td class="column-date"><span data-obfuscation="6pw62lmn"></span></td>
<span class="score-left" data-obfuscation="xo2yf7ph"></span>
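To look at these spans in isolation, here is a rough sketch that collects each data-obfuscation key together with the raw text inside the span. It uses the standard library's html.parser instead of BeautifulSoup purely to keep the example self-contained. The class name ObfuscatedSpanCollector and the characters inside the sample span are hypothetical; only the tag and attribute names come from the output above.

```python
from html.parser import HTMLParser


class ObfuscatedSpanCollector(HTMLParser):
    """Collect (data-obfuscation key, text) pairs from <span> elements."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._key = None   # key of the obfuscated span currently open, if any
        self._text = []    # text fragments seen inside that span

    def handle_starttag(self, tag, attrs):
        key = dict(attrs).get("data-obfuscation")
        if tag == "span" and key is not None:
            self._key = key
            self._text = []

    def handle_data(self, data):
        if self._key is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._key is not None:
            self.results.append((self._key, "".join(self._text)))
            self._key = None


# Hypothetical span content; the real characters come from the crawled page.
html = '<td class="column-date"><span data-obfuscation="6pw62lmn">\ue0a1\ue0b2</span></td>'

collector = ObfuscatedSpanCollector()
collector.feed(html)
collector.close()

for key, text in collector.results:
    # Print code points rather than the characters, since they may not render.
    print(key, [f"U+{ord(ch):04X}" for ch in text])
```

Printing the code points next to the key makes it easier to compare several spans and see whether the same code always stands for the same digit.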