Python Forum
MechanicalSoup Basics
#1
Hello everyone,

I am quite new to Python and especially to web crawling. My goal is to write my own crawler for the website www.dazn.com. I want to use the MechanicalSoup library, but even basic things do not seem to work on that site.
When I go to the login page (https://www.dazn.com/de-DE/account/signin) and click "Inspect Element", the following shows up:
<html>
...
<body class>
    <div class="spinner"></div>
    ....
    <div id="app">
        <div class="Root layout6 ...">
        ....
            <div class="SignInView">
            ...
            </div>
        </div>
    </div>
</body>
</html>
According to the MechanicalSoup documentation, find_all() should return every tag that matches the argument. This works fine on the other sites I tried; for example, find_all('h5') gave me all the content within <h5>...</h5>. On dazn.com, however, I only get the top-level divs (<div class="spinner"> and <div id="app">), no matter which tag I search for. When I search for the input tag (which is in the source code shown by the inspector), I get an empty result. So what am I missing here? The only thing I changed between the other websites and dazn.com was the arguments.

Thanks
Datiswaken
#2
I can't check the site because of geo-restrictions, but according to the GitHub repo, MechanicalSoup does not execute JavaScript. So my guess is that the page is built with JavaScript. If that is the case, you will need a tool like Selenium to work with this page.
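A quick way to test this guess is to look at the HTML exactly as the server sends it, before any script runs, since that is all MechanicalSoup ever gets to parse. The sketch below uses only the standard library and a hard-coded snapshot. `RAW_HTML` is an assumption modelled on the symptoms in the first post (two top-level divs and nothing inside `#app`), and the script path in it is hypothetical; it is not the real DAZN response.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening tag seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name -> number of opening tags."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Assumed snapshot of the raw (unrendered) response; the bundle path is made up.
RAW_HTML = """<html>
<body>
  <div class="spinner"></div>
  <div id="app"></div>
  <script src="/app/bundle.js"></script>
</body>
</html>"""

counts = count_tags(RAW_HTML)
# Two divs, no input, one script: the form only appears after the script runs.
print(counts["div"], counts["input"], counts["script"])
```

If the real response looks like this, every tag visible under "Inspect Element" was created in the browser by JavaScript, so no static parser will find an input there.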
#3
I might have figured it out. The robots.txt file disallows everything below /app, and I guess MechanicalSoup obeys it. That would at least explain the problem.
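This guess can be checked offline with `urllib.robotparser` from the standard library. The rule set below is an assumption reconstructed from the description above (a single `Disallow: /app` for all user agents), not a copy of the real file, and the `/app/some/page` URL is made up for illustration. As far as I know, MechanicalSoup does not read robots.txt on its own, and in any case robots.txt only tells crawlers what they may request; it does not change the HTML the server returns.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, based only on "disallows everything below /app".
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /app
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


signin_allowed = is_allowed(
    ASSUMED_ROBOTS_TXT, "https://www.dazn.com/de-DE/account/signin"
)
app_allowed = is_allowed(
    ASSUMED_ROBOTS_TXT, "https://www.dazn.com/app/some/page"  # hypothetical path
)
print(signin_allowed, app_allowed)  # True False
```

Under that assumed rule the sign-in URL is not below /app at all, so robots.txt would not explain the missing tags.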
#4
Does the site use a JavaScript frontend to render the page? If so, MechanicalSoup would never see the page structure, because it does not exist until the scripts run, and you'd have to use something different, like Selenium.
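For reference, a minimal sketch of the Selenium route: load the page in a real browser, wait until the form has been rendered, then read the input fields. The selector `.SignInView input` is an assumption built from the class name in the first post, Firefox is an arbitrary choice of browser, and `fetch_rendered_inputs` is a hypothetical helper name. I have not run this against the site.

```python
SIGNIN_URL = "https://www.dazn.com/de-DE/account/signin"
INPUT_SELECTOR = ".SignInView input"  # assumption: derived from the first post


def fetch_rendered_inputs(url=SIGNIN_URL, selector=INPUT_SELECTOR, timeout=15):
    """Open url in a browser, wait for selector, return the inputs' name attributes."""
    # Imported here so this file still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        # Block until JavaScript has inserted at least one matching element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        elements = driver.find_elements(By.CSS_SELECTOR, selector)
        return [element.get_attribute("name") for element in elements]
    finally:
        driver.quit()  # always close the browser, even if the wait times out


# Example call (needs Selenium and a browser):
# names = fetch_rendered_inputs()
```

The explicit wait is the important part: asking for the elements right after `driver.get()` can still return nothing if the scripts have not finished building the form.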