Python Forum
Unicode letters in crawling page
#1
Hi all,

I'm currently trying to learn how to crawl web pages in Python, but I'm a bit confused by what I see in the developer tools for my selected page. There, a question mark is shown in place of a digit/letter, and after crawling, my list contains \ue... codes. I think these are Unicode characters. How can I turn them into "real" digits/letters in my code? I also tried exporting my list to a file with encoding 'utf-8', but the file content is the same as the output in VS Code.

The page's meta charset is UTF-8.

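import requests
from bs4 import BeautifulSoup

def crawl(url):
    data = []  # will hold the extracted results
    # Send a browser-like User-Agent with the request
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'}
    source_code = requests.get(url, headers=headers)
    soup = BeautifulSoup(source_code.content, 'html.parser')
    # prettify is a method, so it has to be called to get the formatted HTML
    print(soup.prettify())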
For example, this is what I find in my soup.prettify output. To explain: score-left is part of a football result, which is usually shown as a number. column-date should be self-explanatory :-)

Output:
<td class="column-date"><span data-obfuscation="6pw62lmn"></span></td> <span class="score-left" data-obfuscation="xo2yf7ph"></span>
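
A quick way to check what those \ue... values are: code points from U+E000 to U+F8FF belong to Unicode's Private Use Area, which has no standard glyphs, so writing them out as UTF-8 cannot turn them into digits on its own. The sketch below uses only the standard library; the sample string is a made-up stand-in for scraped text, not data from the page.

```python
import unicodedata


def is_private_use(ch):
    """True if ch is a private-use code point (Unicode category 'Co')."""
    return unicodedata.category(ch) == "Co"


def describe(text):
    """Return (code point, category) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


# Hypothetical sample: one private-use character next to a normal digit.
sample = "\ue600" + "3"

# Printing a list shows the repr of each string, so non-printable
# characters such as private-use ones appear as \u... escapes.
print([sample])
print(describe(sample))
print([is_private_use(ch) for ch in sample])
```

If every character inside the score and date spans reports category Co, the text was decoded correctly and the meaning of each character probably comes from somewhere else, such as a font the page loads.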
#2
This might be a bit easier if you supply the URL, and explain what you are trying to extract from the page.
#3
The URL I would like to crawl is, for example:

http://www.fussball.de/spieltag/kreislig...JF3HB-G#!/

The relevant part is the score board with the result of the matches.
#4
No ideas? Is this webpage so special?
Reply
#5
I think selenium will work best on this site.
There is a tutorial on this forum that will not take more than a few hours to complete.
When finished, you will know what to do.

web scraping part 1
web scraping part 2
#6
(Oct-30-2020, 09:41 PM)DMDoniz Wrote: Is this webpage so special?
Don't know if it makes it special, but they apply data obfuscation in order to make the life of wanna-be-scrapers hard.
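
As a rough illustration of what that means for a scraper: one common way to obfuscate data is to put private-use code points in the HTML and let a custom web font draw them as ordinary digits. How this site ties the data-obfuscation attribute to a font is not shown in this thread, so the mapping below is entirely invented; a real one would have to be worked out from whatever font the page loads. A minimal sketch, assuming such a mapping is already known:

```python
# Hypothetical mapping from private-use code points to the characters
# a custom font would draw for them. These pairs are invented for
# illustration and do not come from the site.
HYPOTHETICAL_GLYPHS = {
    "\ue600": "2",
    "\ue601": ":",
    "\ue602": "1",
}


def deobfuscate(text, glyphs):
    """Replace every mapped character; leave unmapped ones unchanged."""
    return text.translate(str.maketrans(glyphs))


obfuscated = "\ue600\ue601\ue602"  # stand-in for the text of a score span
print(deobfuscate(obfuscated, HYPOTHETICAL_GLYPHS))
```

If the mapping changes per request or per data-obfuscation value, a fixed table like this will not be enough, and it would need to be rebuilt for each page.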