Python Forum

Full Version: MechanicalSoup Basics
Hello everyone,

I am quite new to Python and especially to web crawling. My goal is to write my own crawler for a website. I want to use the library MechanicalSoup, and my problem is that even basic things do not seem to work on that website.
When I go to the login page and click "Inspect Element", the following shows up:
<body class>
    <div class="spinner"></div>
    <div id="app">
        <div class="Root layout6 ...">
            <div class="SignInView">
According to the MechanicalSoup documentation, 'find_all()' should return all tags matching the name given as the argument. This works fine on the other sites I tried. For example, find_all('h5') gave me all content within <h5>...</h5>. But for this site I only get the top-level divs (<div class="spinner"> and <div id="app">), no matter what type of tag I search for. When I search for the tag input (which is in the source code shown by the inspector), I just get an empty result. So what am I missing here? I only changed the arguments compared to the other websites I tried it on.

I'm not able to check the site due to geo-restrictions, but based on the info in the GitHub repo, MechanicalSoup does not do JavaScript. So I would guess the page uses JavaScript. If this is the case, you will need a tool like Selenium to work with this page.
I might have figured it out. The robots.txt file disallows everything below /app, and I guess MechanicalSoup obeys this. That would at least explain the problem.
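To check what such a rule actually forbids, here is a minimal sketch using the standard library's urllib.robotparser. The robots.txt text is an assumption reconstructed from the description above, and example.com is only a placeholder host. It shows what the rules say, not whether any particular library consults them.

```python
from urllib.robotparser import RobotFileParser

# Assumption: rules matching the description "disallows everything below /app".
ROBOTS_TXT = """\
User-agent: *
Disallow: /app
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is needed here.
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host, not the site from this thread.
login_allowed = parser.can_fetch("*", "https://example.com/app/login")
home_allowed = parser.can_fetch("*", "https://example.com/")

print("login page allowed:", login_allowed)
print("home page allowed:", home_allowed)
```

Note that a robots.txt rule would normally lead a compliant crawler to skip the page entirely, rather than return a page with part of its markup missing.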
Does the site use a JavaScript frontend to render the page? If that's the case, then MechanicalSoup would never see the page structure, because it does not exist until the scripts run, and you'd have to use something different, like Selenium.
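To illustrate the point, here is a small sketch using only the standard library's html.parser. The STATIC_SHELL string is an assumption about what the server might send before any JavaScript runs, based on the two top-level divs reported above; the script file name is made up. A parser that does not execute scripts can only see the tags that are in that raw HTML.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag seen in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_in(html):
    """Return the start-tag names found in html, in document order."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


# Assumption: roughly what the server sends before JavaScript fills in #app.
# "bundle.js" is a hypothetical file name.
STATIC_SHELL = """
<body class>
    <div class="spinner"></div>
    <div id="app"></div>
    <script src="bundle.js"></script>
</body>
"""

tags = tags_in(STATIC_SHELL)
print(tags)
print("input" in tags)
```

The browser's inspector shows the page after the scripts have run, which is why the input tags are visible there but not in the HTML that was downloaded.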