Aug-19-2019, 10:12 AM
(Aug-18-2019, 01:35 PM)ndc85430 Wrote: There's no question in your post, nor have you shown us what you've tried, but you can at least use Beautiful Soup to parse the HTML and extract the data you want from it.
Below is the code I've tried so far (sorry, I forgot to attach it in the previous post). However, the result includes additional text that I don't want; I just need the player name from this string => ' to [Player Name], '
import numpy as mp
import matplotlib.pyplot as plt
from matplotlib import rc
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import re

fname = 'PakVEng.html'
HtmlFile = open(fname, 'r', encoding='utf-8')
source_code = HtmlFile.read()
HtmlFile.close()
testVariable = str(source_code)
vanillaData = testVariable.split('window.__INITIAL_STATE__ = {', 1)[0] + 'window.__INITIAL_STATE__ = {'
totalCount = re.search(' to (.*), ', vanillaData)
print(totalCount)

This code gives me the following result:
ahmedwaqas92@ideapad:~/Downloads$ python3 ImportHtml.py
<_sre.SRE_Match object; span=(268692, 268742), match=' to Shadab Khan, <b>FOUR</b> runs, not quite 350,>
ahmedwaqas92@ideapad:~/Downloads$

The HTML file is on my computer for testing purposes only; if you want to take a look, here is the link. All I need from this HTML file is every player name that occurs in the pattern ' to [player name], ' together with the runs scored that follow it. This pattern repeats many times in the HTML file, so I want to find all such instances, count them, and put them in the form of a table.
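To make clearer what I am after, here is a rough sketch of the kind of extraction I have in mind. It is not tested against the real file: the `sample` string, the bowler and second batsman names, and the helper `extract_deliveries` are all made up for illustration, and I am assuming every delivery looks like the one match printed above. The idea is to stop the name at the first comma (instead of the greedy `.*`), take the text up to the next comma as the runs, and use `finditer` to get every occurrence rather than only the first.

```python
import re
from collections import Counter

# Hypothetical sample shaped like the single match shown above;
# the real commentary markup in the file may differ.
sample = (
    "Bowler X to Shadab Khan, <b>FOUR</b> runs, not quite 350, "
    "Bowler X to Shadab Khan, 1 run, pushed square, "
    "Bowler X to Player Two, no run, beaten outside off, "
)

# Name: everything after ' to ' up to the first comma.
# Outcome: everything after that comma up to the next comma.
DELIVERY = re.compile(r" to ([^,<>]+), ([^,]+),")


def extract_deliveries(text):
    """Return a list of (player, outcome) tuples found in text."""
    results = []
    for match in DELIVERY.finditer(text):
        player = match.group(1).strip()
        # Drop markup such as <b>...</b> around the outcome.
        outcome = re.sub(r"<[^>]+>", "", match.group(2)).strip()
        results.append((player, outcome))
    return results


deliveries = extract_deliveries(sample)

# Count how many deliveries each player faced.
counts = Counter(player for player, _ in deliveries)

# Print a simple plain-text table of every delivery.
print(f"{'Player':<15}{'Outcome'}")
for player, outcome in deliveries:
    print(f"{player:<15}{outcome}")
```

One limitation I can already see: any commentary text that itself contains ' to ' followed by a comma would also match, so the pattern probably needs tightening for the real page.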
(Aug-18-2019, 04:49 PM)snippsat Wrote: Can also take a look at Web-Scraping part-1.
I can successfully scrape the data into a string variable; it's just that I am having trouble working out how to extract the substrings I want from the HTML file, which I obtained by scrolling the page with the Selenium Firefox driver and saving the page source.
This question is part of a larger project that I am working on independently. Below is the link to another question I asked on this forum.
https://python-forum.io/Thread-How-to-Ca...1#pid89131