Python Forum
How to capture Single Column from Web Html Table?
#6
(Jul-12-2019, 06:26 AM)perfringo Wrote:
(Jul-11-2019, 05:12 AM)ahmedwaqas92 Wrote: I would most probably export them to a CSV file

The link appears to be dead, so here is a generic example of how to write specific columns to a file:

>>> import pandas as pd
>>> d = {'ham': [1, 2, 3], 'spam': ['a', 'b', 'c'], 'bacon': ['1a', '2b', '3c']}
>>> df = pd.DataFrame(d)
>>> df
   ham spam bacon
0    1    a    1a
1    2    b    2b
2    3    c    3c
>>> df.to_csv('out.csv', columns=['ham', 'bacon'], index=False)
The content of out.csv is:

Output:
ham,bacon
1,1a
2,2b
3,3c
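As a small variation on the example above, here is a sketch of two equivalent ways to keep only some columns. It relies on `to_csv` returning the CSV text as a string when no path is given; the variable names `csv_text`, `csv_text_2` and `subset` are made up for illustration.

```python
import pandas as pd

d = {'ham': [1, 2, 3], 'spam': ['a', 'b', 'c'], 'bacon': ['1a', '2b', '3c']}
df = pd.DataFrame(d)

# Option 1: let to_csv pick the columns (no path given, so a string is returned)
csv_text = df.to_csv(columns=['ham', 'bacon'], index=False)

# Option 2: select the columns first, then write the smaller frame
subset = df[['ham', 'bacon']]
csv_text_2 = subset.to_csv(index=False)

print(csv_text)
```

Option 2 is handy when the reduced frame is needed for further work (cleaning, plotting) before anything is written to disk.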

(Jul-12-2019, 10:48 AM)snippsat Wrote:
(Jul-12-2019, 04:34 AM)ahmedwaqas92 Wrote: Do I have to restructure my code from scratch? Is there no way I can use my existing code to get the columns that I might need?
It's just a lot more work, and it's not so easy either, because you have to build the columns yourself since they are not clearly defined in the HTML.
You also need to clean up the data if you are going to plot it or do other things with it.
As shown by @perfringo, Pandas makes this a lot easier.

Here is a Notebook where I show some different things that may be needed, like cleaning up the data and taking out columns.


Apologies for the delayed response. I reviewed the detailed instructions in both of these examples and then read a few docs on using data frames in Python. Based on everything I gathered over the last week or so, I have managed to write a small Python script that captures the data the way I want. It then removes the excess columns, strips special characters and does some calculations as well. The results are finally presented in pie charts. See code below

import pandas as pd
import matplotlib.pyplot as plt
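#read_html returns a list of DataFrames, one per table found on the page
dataMain = pd.read_html('http://stats.espncricinfo.com/ci/engine/player/422108.html?class=3;spanmin1=07+Sep+2016;spanval1=span;template=results;type=batting;view=innings')
dataTabulated = dataMain[3] #Fourth table on the page
#Keep only the columns needed below (Runs, BF, 4s, 6s), selected by position
columns = [0,2,3,4]
dataFinal = dataTabulated[dataTabulated.columns[columns]]
#Drop the innings where the player did not bat (DNB)
dataFinal = dataFinal[~dataFinal.Runs.str.contains("DNB")]
#Strip the not-out marker (*) and convert everything to numbers
dataFinal = dataFinal.replace(r'\*','',regex=True).astype(float)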


totalRuns = dataFinal['Runs'].sum()
ballsFaced = dataFinal['BF'].sum()
fours = dataFinal['4s'].sum()
sixes = dataFinal['6s'].sum()

totalFours = fours * 4
totalSixes = sixes * 6

#Calculate the total runs scored in Boundaries (4s,6s) and in Rotation (1s,2s,3s)
boundaryRuns = totalFours + totalSixes
rotationRuns = totalRuns - boundaryRuns

#Calculate the percentage of runs scored in Rotation (1s,2s,3s) and in Boundaries (4s,6s)
rotationRuns_p = rotationRuns / totalRuns * 100
boundaryRuns_p = boundaryRuns / totalRuns * 100

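#Calculate approximate dot balls and their percentage
#Assumes every non-boundary run was a single taken off its own ball
forBall = ballsFaced-(fours+sixes) #Balls that did not go for a boundary
forRun = totalRuns-(totalFours+totalSixes) #Runs not scored in boundaries
approxDot = forBall-forRun
approxDot_p = approxDot/ballsFaced * 100
score_P = 100 - approxDot_p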

print(round(approxDot_p,2))

#Plotting for Boundaries / Rotation Ratio
labels = 'Rotation', 'Boundaries'
sizes = [rotationRuns_p, boundaryRuns_p]
colors = ['yellowgreen', 'yellow']
explode = [0.1, 0] #Explode 1st Slice

#Plotting the Pie Chart
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.show()

#Plotting for Dot Ball Percentage
labels = 'Approx Dot %', 'Scoring %'
sizes = [approxDot_p, score_P]
colors = ['red', 'blue']
explode = [0.1, 0] #Explode 1st Slice

#Plotting the Pie Chart
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.show()
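For anyone who wants to try the same clean-up and arithmetic without hitting the live page, here is a minimal sketch that runs on a small hand-made table. The function name `batting_summary`, the dictionary keys and the sample numbers are all made up for illustration; only the column names (`Runs`, `BF`, `4s`, `6s`) and the steps come from the script above.

```python
import pandas as pd


def batting_summary(frame):
    """Compute boundary, rotation and approximate dot-ball figures.

    Expects the columns 'Runs', 'BF', '4s' and '6s'.
    """
    # Drop innings where the player did not bat (Runs contains 'DNB')
    played = frame[~frame['Runs'].astype(str).str.contains('DNB')]
    # Remove the not-out marker (*) and convert every column to float
    cleaned = played.replace(r'\*', '', regex=True).astype(float)

    total_runs = cleaned['Runs'].sum()
    balls_faced = cleaned['BF'].sum()
    fours = cleaned['4s'].sum()
    sixes = cleaned['6s'].sum()

    boundary_runs = fours * 4 + sixes * 6
    rotation_runs = total_runs - boundary_runs

    # Assumption: each rotation run is a single taken off its own ball
    non_boundary_balls = balls_faced - (fours + sixes)
    approx_dots = non_boundary_balls - rotation_runs

    return {
        'boundary_runs': boundary_runs,
        'rotation_runs': rotation_runs,
        'boundary_pct': boundary_runs / total_runs * 100,
        'rotation_pct': rotation_runs / total_runs * 100,
        'approx_dot_pct': approx_dots / balls_faced * 100,
    }


# Made-up innings data shaped like a scraped table (all values are strings)
sample = pd.DataFrame({
    'Runs': ['45*', 'DNB', '30', '12'],
    'BF': ['30', '-', '25', '10'],
    '4s': ['4', '-', '2', '1'],
    '6s': ['2', '-', '1', '0'],
})

summary = batting_summary(sample)
for name, value in summary.items():
    print(name, round(value, 2))
```

Because twos and threes are counted as if they were singles, the dot-ball figure tends to undercount the real number of dot balls, so it is best read as a rough lower estimate.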

My problem now seems to be resolved, so you can mark this thread as solved. Your help on the matter, @perfringo & @snippsat, is much appreciated :)


Messages In This Thread
RE: How to capture Single Column from Web Html Table? - by ahmedwaqas92 - Jul-29-2019, 02:17 AM
