Python Forum
How to capture Single Column from Web Html Table?
#1
Hello,

I am trying to get the complete values of three distinct columns from an online HTML table that has 13 columns in total. I have parsed the HTML content using Beautiful Soup and can display the values as text using the code below. However, the output lists everything in the table, and I can't figure out how to extract only the columns that I need. The code I have written so far is below:

import re
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
my_url = 'http://stats.espncricinfo.com/ci/engine/player/348144.html?class=3;template=results;type=batting;view=innings'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
container_main = page_soup.findAll("div", {"id":"ciHomeContent"})
container_main = container_main[0]
container_secondary = container_main.findAll("div", {"id":"ciMainContainer"})
container_secondary = container_secondary[0]
container_tertiary = container_secondary.findAll("div", {"id":"ciHomeContentlhs"})
container_tertiary = container_tertiary[0]
container_sublevel = container_tertiary.findAll("div", {"class":"pnl650M"})
container_sublevel = container_sublevel[0]
container_mainTable = container_sublevel.findAll("table", {"class":"engineTable"})
container_mainTable = container_mainTable[3]

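# Print the table body's text once. The original loop iterated over the
# table's child nodes but printed the same tbody on every pass.
table_body = container_mainTable.tbody
print(table_body.text)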
This code gives me the following output:



15*
13
11
2
0
136.36
3
not out
2

v England
Manchester
7 Sep 2016
T20I # 566

----------------------------------------------New Record Begins (Inserted by User)-------------------------------------------

55*
49
37
6
2
148.64
3
not out
2

v West Indies
Dubai (DSC)
23 Sep 2016
T20I # 568

-----------------------------------------------New Record Begins (Until the final row in the table)---------------------------------------
From the above output, I just want the values (each enclosed in a <td></td> tag) in the first column (15*, 55*, ...), the 3rd column (11, 37, ...), the 4th column (2, 6, ...) and the 5th column (0, 2, ...). After that, I will most probably export them to a CSV file and generate Gnuplot graphs and other charts.
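One way this kind of selection is commonly done is to walk each row, collect its <td> cells into a list, and index into that list. The following is only a minimal sketch under stated assumptions: it runs against a small inline sample (two rows from this post, trimmed to five cells) rather than the live page, and the names `SAMPLE_HTML`, `WANTED_COLUMNS` and `extract_columns` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Two rows taken from the table markup in this post, trimmed to the first five cells.
SAMPLE_HTML = """
<table class="engineTable">
<tbody>
<tr class="data1"><td>15*</td><td>13</td><td>11</td><td>2</td><td>0</td></tr>
<tr class="data1"><td>55*</td><td>49</td><td>37</td><td>6</td><td>2</td></tr>
</tbody>
</table>
"""

# Zero-based positions of the 1st, 3rd, 4th and 5th columns.
WANTED_COLUMNS = (0, 2, 3, 4)


def extract_columns(table, wanted=WANTED_COLUMNS):
    """Return one list per data row holding only the wanted cells' text."""
    rows = []
    for tr in table.find_all("tr", class_="data1"):
        cells = tr.find_all("td")
        # Skip rows that are too short to contain every wanted column.
        if len(cells) <= max(wanted):
            continue
        rows.append([cells[i].get_text(strip=True) for i in wanted])
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
selected = extract_columns(soup.find("table", class_="engineTable"))
print(selected)
```

In the script above, passing `container_mainTable` to `extract_columns` would presumably be the equivalent call, provided the live table's rows carry the same `data1` class as the rows quoted in this post.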

The data that I am scraping is hosted at http://stats.espncricinfo.com/ci/engine/...ew=innings and below are the first few HTML rows that I am trying to extract this data from:

<tbody>
        <tr class="data1">
                <td>15*</td>
                <td>13</td>
                <td>11</td>
                <td>2</td>
                <td>0</td>
                <td>136.36</td>
                <td>3</td>
                <td nowrap="nowrap">not out</td>
                <td>2</td>
                <td></td>
                <td class="left" nowrap="nowrap">v <a href="/ci/content/team/1.html" class="data-link">England</a></td>
                <td class="left" nowrap="nowrap"><a href="/ci/content/ground/57160.html" class="data-link">Manchester</a></td>
                <td nowrap="nowrap"><b>7 Sep 2016</b></td>
                <td style="white-space: nowrap;"><a href="/ci/engine/match/913663.html" title="view the scorecard for this row">T20I # 566</a></td>
        </tr>
        <tr class="data1">
                <td>55*</td>
                <td>49</td>
                <td>37</td>
                <td>6</td>
                <td>2</td>
                <td>148.64</td>
                <td>3</td>
                <td nowrap="nowrap">not out</td>
                <td>2</td>
                <td></td>
                <td class="left" nowrap="nowrap">v <a href="/ci/content/team/4.html" class="data-link">West Indies</a></td>
                <td class="left" nowrap="nowrap"><a href="/ci/content/ground/392627.html" class="data-link">Dubai (DSC)</a></td>
                <td nowrap="nowrap"><b>23 Sep 2016</b></td>
                <td style="white-space: nowrap;"><a href="/ci/engine/match/1050217.html" title="view the scorecard for this row">T20I # 568</a></td>
        </tr>
        <tr class="data1">
                <td class="padAst">19</td>
                <td>28</td>
                <td>18</td>
                <td>2</td>
                <td>0</td>
                <td>105.55</td>
                <td>3</td>
                <td>caught</td>
                <td>1</td>
                <td></td>
                <td class="left" nowrap="nowrap">v <a href="/ci/content/team/4.html" class="data-link">West Indies</a></td>
                <td class="left" nowrap="nowrap"><a href="/ci/content/ground/392627.html" class="data-link">Dubai (DSC)</a></td>
                <td nowrap="nowrap"><b>24 Sep 2016</b></td>
                <td style="white-space: nowrap;"><a href="/ci/engine/match/1050219.html" title="view the scorecard for this row">T20I # 569</a></td>
        </tr>
        <tr class="data1">
                <td>27*</td>
                <td>42</td>
                <td>24</td>
                <td>1</td>
                <td>0</td>
                <td>112.50</td>
                <td>3</td>
                <td nowrap="nowrap">not out</td>
                <td>2</td>
                <td></td>
                <td class="left" nowrap="nowrap">v <a href="/ci/content/team/4.html" class="data-link">West Indies</a></td>
                <td class="left" nowrap="nowrap"><a href="/ci/content/ground/59396.html" class="data-link">Abu Dhabi</a></td>
                <td nowrap="nowrap"><b>27 Sep 2016</b></td>
                <td style="white-space: nowrap;"><a href="/ci/engine/match/1050221.html" title="view the scorecard for this row">T20I # 570</a></td>
        </tr>
Any help on the matter would be highly appreciated.

Thanks
Waqas
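For the CSV export mentioned in the post, the standard library's csv module is likely sufficient. This is a rough sketch, assuming the selected values are already available as lists of strings; the header labels and the `write_csv` helper are hypothetical, since the post does not name the columns.

```python
import csv
import io

# Hypothetical header labels; the post does not name the columns.
HEADER = ["col_1", "col_3", "col_4", "col_5"]

# Values from the 1st, 3rd, 4th and 5th cells of the first two rows in this post.
ROWS = [
    ["15*", "11", "2", "0"],
    ["55*", "37", "6", "2"],
]


def write_csv(rows, fileobj, header=HEADER):
    """Write a header line followed by one CSV line per row."""
    writer = csv.writer(fileobj)
    writer.writerow(header)
    writer.writerows(rows)


# An in-memory buffer keeps the sketch free of side effects. For a real file,
# an object from open(path, "w", newline="") could be passed instead.
buffer = io.StringIO()
write_csv(ROWS, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that the first column keeps the trailing asterisk (as in 15*) as text, so it would probably need to be stripped before the values are plotted as numbers.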


Messages In This Thread
How to capture Single Column from Web Html Table? - by ahmedwaqas92 - Jul-11-2019, 05:12 AM
