Python Forum
How to Find & Count String Patterns Between two Markers in a HTML file
#4
(Aug-18-2019, 01:35 PM)ndc85430 Wrote: There's no question in your post, nor have you shown us what you've tried, but you can at least use Beautiful Soup to parse the HTML and extract the data you want from it.

Below is the code I've tried so far (sorry, I forgot to attach it in the previous post). However, the match includes extra text that I don't want; I just need the player name from this string => ' to [Player Name], '

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import re

# Read the saved page source into a string
fname = 'PakVEng.html'
with open(fname, 'r', encoding='utf-8') as html_file:
    source_code = html_file.read()

# Keep everything before the embedded JavaScript state, then re-append the marker
marker = 'window.__INITIAL_STATE__ = {'
vanillaData = source_code.split(marker, 1)[0] + marker

# re.search returns only the first match, and (.*) is greedy
totalCount = re.search(' to (.*), ', vanillaData)
print(totalCount)
This code gives me the following result:

ahmedwaqas92@ideapad:~/Downloads$ python3 ImportHtml.py 
<_sre.SRE_Match object; span=(268692, 268742), match=' to Shadab Khan, <b>FOUR</b> runs, not quite 350,>
ahmedwaqas92@ideapad:~/Downloads$
The HTML file is on my computer for testing purposes only; if you want to take a look, here is the link. All I need from this HTML file is every player name that occurs in ' to [player name], ' and the runs scored after it. This pattern repeats many times in the HTML file, so I want to count all such instances and then put them in a table.
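For reference, here is a minimal sketch of the extraction described above, not a tested solution for the real file. It assumes each delivery looks like "bowler to batsman, outcome, comment"; the sample text, the bowler name, and the helper name extract_deliveries are hypothetical. Note that the pattern would also match any other " to X, Y," phrase on a line, so the real data may need a stricter pattern.

```python
import re
from collections import Counter

# Hypothetical sample standing in for the commentary text in PakVEng.html
sample = (
    "Bowler A to Shadab Khan, <b>FOUR</b> runs, not quite 350, \n"
    "Bowler A to Shadab Khan, 1 run, driven into the covers, \n"
    "Bowler A to Hasan Ali, no run, beaten outside off, \n"
)

# Non-greedy name group stops at the first ", "; the second group takes
# the outcome up to the next comma and never crosses a line break.
PATTERN = re.compile(r" to (.+?), ([^,\n]+),")


def extract_deliveries(text):
    """Return a list of (player, outcome) tuples for every match in text."""
    plain = re.sub(r"</?b>", "", text)  # drop the <b>...</b> tags first
    return [(name, outcome.strip()) for name, outcome in PATTERN.findall(plain)]


deliveries = extract_deliveries(sample)

# Count how many times each player name occurs
counts = Counter(name for name, _ in deliveries)

# Print a simple two-column table, most frequent player first
for name, n in counts.most_common():
    print(f"{name:<15}{n}")
```

Unlike re.search, re.findall returns every match, and the non-greedy (.+?) stops at the first ", " instead of the last one on the line. The deliveries list could also be passed to pd.DataFrame if a pandas table is preferred.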

(Aug-18-2019, 04:49 PM)snippsat Wrote: Can also take a look at Web-Scraping part-1.

I can successfully scrape the data into a string variable; it's just that I am having trouble working out how to extract the substrings I want from the HTML, which I obtained by scrolling the page with Selenium's Firefox driver and reading the page source.
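As an illustration of cutting a substring out of the page source, a small sketch using str.partition might look like the following. The helper name between, the sample page string, and the choice of "<body>" as the start marker are assumptions for the example; only the 'window.__INITIAL_STATE__ = {' marker comes from the code above.

```python
def between(text, start, end):
    """Return the text after the first `start` and before the next `end`.

    Returns an empty string if either marker is missing.
    """
    _, found_start, rest = text.partition(start)
    if not found_start:
        return ""
    body, found_end, _ = rest.partition(end)
    return body if found_end else ""


# Hypothetical, heavily shortened page source
page = (
    "<html><body>Bowler A to Shadab Khan, 1 run, </body>"
    "<script>window.__INITIAL_STATE__ = {};</script></html>"
)

section = between(page, "<body>", "window.__INITIAL_STATE__ = {")
print(section)
```

The resulting section can then be searched with a regular expression without picking up matches from the embedded JavaScript state.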

This question is part of a larger project that I am working on independently. Below is the link to another question I asked on this forum.

https://python-forum.io/Thread-How-to-Ca...1#pid89131
Messages In This Thread
RE: How to Find & Count String Patterns Between two Markers in a HTML file - by ahmedwaqas92 - Aug-19-2019, 10:12 AM
