Python Forum
Scrape for html based on url string and output into csv
Here is the entire table structure, so you can see exactly what I am trying to scrape.

I want to extract the phone, e-mail, website and main activity. For the main activity I only want the text of the li element, without the text of the div nested inside it.

UPDATE: I forgot to mention that I run into an error because on some pages the e-mail or the website is missing (sometimes one, sometimes the other). When that happens the code raises an error and breaks the entire loop. I think there needs to be some error handling.
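A minimal sketch of the error handling asked about in the update, assuming each page has already been parsed into a dict that maps the label text (e.g. "Phone:") to the cell text. The names parse_page, pick_fields and scrape_all are hypothetical. A missing label falls back to an empty string via dict.get, and a try/except around each page logs the failure and moves on instead of ending the loop.

```python
import logging

# Labels copied from the first column of the table in this post.
WANTED = ("Phone:", "E-mail:", "Website:", "Main activity:")


def pick_fields(label_to_value, default=""):
    """Return every wanted field; a label missing on this page becomes `default`."""
    return {label: label_to_value.get(label, default) for label in WANTED}


def scrape_all(pages, parse_page):
    """Parse each (url, html) pair with the hypothetical parse_page callable.

    A page that raises is logged and skipped, so one bad page does not
    stop the whole loop.
    """
    rows = []
    for url, html in pages:
        try:
            rows.append(pick_fields(parse_page(html)))
        except Exception:
            logging.exception("Could not parse %s", url)
    return rows
```

Catching the exception per page (rather than around the whole loop) is what keeps the remaining URLs running; the .get default handles the more common case where the page is fine but a row is simply absent.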

<table class="table-info">
    <tbody>
        <tr>
            <td class="col-1">
                <div class="col-1-text">Business name</div>
            </td>
            <td class="col-2">
                <div class="col-2-text">Company XYZ&nbsp;</div>
            </td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">Register code:</div>
            </td>
            <td class="col-2">
                <div class="col-2-text">112233558</div>
            </td>
        </tr>


        <tr>
            <td class="col-1">
                <div class="col-1-text">Operating address:</div>
            </td>
            <td class="col-2">
                <div class="col-2-text"><a target="googlemaps" href="https://www.google.com/maps/place/Some-location"
                        class="link-location">Some location strt. 233</a></div>
            </td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">Legal address</div>
            </td>
            <td class="col-2">
                <div class="col-2-text">
                    <a class="link-location" href="https://www.google.com/maps/place/Some-location" target="_new">Some
                        location
                    </a>
                </div>
            </td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">VAT No:</div>
            </td>
            <td class="col-2">
                <div class="col-2-text"><a href="javascript:void(0)" onclick="return getVAT(this, '12345678')">Get VAT
                        liability</a></div>
            </td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">Age:</div>
            </td>
            <td class="col-2">
                <div class="col-2-text">1 year&nbsp;3 months</div>
            </td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">Founded:</div>
            </td>
            <td class="col-2">
                <div class="col-2-text">20/09/2019</div>
            </td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">Capital:</div>
            </td>
            <td class="col-2">
                <div class="col-2-text">2000 USD</div>
            </td>
        </tr>
        <tr>
            <td colspan="2" class="sep"></td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">Phone:</div>
            </td>
            <td class="col-2">
                <div class="col-2-text">123456789</div>
            </td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">E-mail:</div>
            </td>
            <td class="col-2">
                <div class="col-2-text"><a href="mailto:[email protected]">[email protected]</a></div>
            </td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">Website:</div>
            </td>
            <td class="col-2">
                <div class="col-2-text"><a href="http://www.somecompany.com" target="_blank">www.somecompany.com</a></div>
            </td>
        </tr>
        <tr>
            <td colspan="2" class="sep"></td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">Representatives:</div>
            </td>
            <td class="col-2">
                <div class="col-2-text">
                    <div class="box-message">
                        <p class="desc">To access information, please</p>
                        <p>
                            <a href="#" onclick="return loginClicked(this, '#');"
                                class="btn btn-small btn-purple link-login">Log in</a>
                        </p>
                    </div>
                </div>
            </td>
        </tr>
        <tr>
            <td colspan="2" class="sep"></td>
        </tr>
        <tr>
            <td class="col-1">
                <div class="col-1-text">
                    Main activity:
                    <span class="tip info" title=""
                        data-original-title="Activities are classified according to EMTAK 2008"></span>
                </div>
            </td>
            <td class="col-2">
                <div class="col-2-text" id="activity_top5ffe2eab23d13">
                    <ul>
                        <li>
                            Computer consultancy activities
                            <div class="main_activities_top_link_wrapper">
                                <a href="https://www.somesite.com/" target="_blank"
                                    onclick="ga('send', 'event', 'check', 'top_btn', 'Anonym');"
                                    class="btn btn-simple btn-open-graph">
                                    <span>Open TOP 20</span> </a>
                            </div>
                        </li>
                    </ul>

                </div>
            </td>
        </tr>


    </tbody>
</table>
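One way to pull those four values out of a table shaped like this, sketched with only the standard library's html.parser (BeautifulSoup would work just as well). It pairs each col-1 label with the col-2 cell that follows it, ignores any text inside the main_activities_top_link_wrapper div so only the li text remains, and fills a missing label with an empty string before writing the CSV. This assumes every page uses the same col-1/col-2 layout shown above; InfoTableParser, extract_row, write_csv and the output path are illustrative names, not part of the site or any library.

```python
import csv
from html.parser import HTMLParser

# Labels from the first column of the table; a missing one becomes "".
WANTED = ("Phone:", "E-mail:", "Website:", "Main activity:")


class InfoTableParser(HTMLParser):
    """Collect label -> value pairs, assuming the col-1/col-2 layout in the post."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._cell = None      # "col-1", "col-2" or None (e.g. the "sep" cells)
        self._text = []
        self._label = None
        self._skip_depth = 0   # > 0 while inside the "Open TOP 20" wrapper div

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td":
            self._cell = next((c for c in classes if c in ("col-1", "col-2")), None)
            self._text = []
        elif tag == "div" and (self._skip_depth or "main_activities_top_link_wrapper" in classes):
            self._skip_depth += 1
        elif tag == "li" and "".join(self._text).strip():
            self._text.append("; ")  # separate several activities in one cell

    def handle_endtag(self, tag):
        if tag == "div" and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "td" and self._cell:
            # Collapse whitespace, including the &nbsp; (\xa0) seen in the table.
            value = " ".join("".join(self._text).split())
            if self._cell == "col-1":
                self._label = value
            elif self._label is not None:
                self.fields[self._label] = value
                self._label = None
            self._cell = None

    def handle_data(self, data):
        if self._cell and not self._skip_depth:
            self._text.append(data)


def extract_row(html):
    """Return the wanted fields for one page; absent rows become ""."""
    parser = InfoTableParser()
    parser.feed(html)
    parser.close()
    return {label: parser.fields.get(label, "") for label in WANTED}


def write_csv(rows, path):
    """Write one CSV line per page, with the labels as the header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=WANTED)
        writer.writeheader()
        writer.writerows(rows)
```

Because the lookup goes by label text rather than by row position, a page with no E-mail or Website row still produces a complete CSV line with that column empty. The e-mail and website values here are the link text; reading the href attribute instead would be a small change in handle_starttag.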


Messages In This Thread
RE: Scrape for html based on url string and output into csv - by dana - Jan-12-2021, 11:48 PM
