Python Forum
How to Find & Count String Patterns Between Two Markers in an HTML File
Hello,

I have an HTML file from which I am trying to extract strings so that I can count them and, hopefully, export the results of that analysis as a table.
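Once the player names have been extracted, the counting and table-export step could look like the minimal sketch below, which uses only the standard library's `collections.Counter` and `csv` modules. The `names` list is hypothetical sample data, and `io.StringIO` stands in for a real output file.

```python
import csv
import io
from collections import Counter

# Hypothetical input: player names already extracted from the HTML.
names = ["Fakhar Zaman", "Fakhar Zaman", "Imam-ul-Haq", "Fakhar Zaman"]

# Tally how often each name occurs.
counts = Counter(names)

# Write a two-column table (player, count), most frequent first.
# io.StringIO stands in for a real file; for a file on disk use
# open("counts.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["player", "count"])
for player, n in counts.most_common():
    writer.writerow([player, n])

print(buffer.getvalue())
```

The resulting CSV opens directly in a spreadsheet, which covers the "export as a table" part without any extra dependencies.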

My HTML file has many "<div></div>" tags that contain a recurring pattern. The pattern is:

" to [Name of Player], "

The only fixed parts of this pattern are the spaces before and after 'to' and the comma and space at the end; the player name in between varies. I need a way to extract the full player name from each occurrence of this pattern in my HTML file.
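A minimal sketch of the name extraction with Python's `re` module, assuming every commentary line starts with "<Bowler> to <Batsman>, " as in the HTML sample in this post. Anchoring the match at the start of the text matters: a plain `re.findall` for " to <name>, " also picks up phrases such as "cut away to third man, " later in the same sentence. The names `BALL_RE` and `parse_ball` are illustrative, not part of any library.

```python
import re

# Assumption: each description starts "<Bowler> to <Batsman>, ...".
# re.match anchors at the start of the string, so later phrases like
# "cut away to third man, and ..." cannot produce a false hit.
BALL_RE = re.compile(r"(?P<bowler>.+?) to (?P<batsman>[^,]+), ")


def parse_ball(text):
    """Return (bowler, batsman) for one commentary line, or None if it doesn't fit."""
    m = BALL_RE.match(text)
    if m is None:
        return None
    return m.group("bowler"), m.group("batsman")


line = ("Woakes to Imam-ul-Haq, 1 run, similar line and length, "
        "cut away to third man, and Imam is up and running")
print(parse_ball(line))                      # ('Woakes', 'Imam-ul-Haq')
print(re.findall(r" to ([^,]+), ", line))    # ['Imam-ul-Haq', 'third man']
```

`[^,]+` captures everything up to the next comma, so multi-word and hyphenated names such as "Fakhar Zaman" and "Imam-ul-Haq" come through whole.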


The HTML file is as follows:

</div>
                                            <div class="commentary-item  ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">1.6</div>
                                                        <div class="over-circle low-score"><span class="over-score">0</span></div>
                                                    </div>
                                                    <div class="description">Archer to Fakhar Zaman, no run, wide ball prompts a rash swing from Fakhar. Bat nowhere near ball. Better over from England</div>
                                                </div><!-- react-text: 3750 -->
                                                <!-- /react-text -->
                                            </div>
                                            <div class="commentary-item  ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">1.5</div>
                                                        <div class="over-circle low-score"><span class="over-score">0</span></div>
                                                    </div>
                                                    <div class="description">Archer to Fakhar Zaman, no run, beats the inside edge this time. Bit of swing for Archer. This has been a challenging over</div>
                                                </div><!-- react-text: 3758 -->
                                                <!-- /react-text -->
                                            </div>
                                            <div class="commentary-item pre ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">1.4</div>
                                                        <div class="over-circle low-score"><span class="over-score">0</span></div>
                                                    </div>
                                                    <div class="description">Archer to Fakhar Zaman, no run, beautiful bowling. Good length, at pace and swinging away from Imam's outside edge</div>
                                                </div>
                                                <div>
                                                    <p class="comment">"Pakistan could learn a lot from the fearless approach of the Bangladeshi batsmen", Nathan Barnes says. "Will they have the courage to take a risk? "</p>
                                                </div>
                                            </div>
                                            <div class="commentary-item  ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">1.3</div>
                                                        <div class="over-circle low-score"><span class="over-score">0</span></div>
                                                    </div>
                                                    <div class="description">Archer to Fakhar Zaman, no run, shorter and wider, left alone this time</div>
                                                </div><!-- react-text: 3775 -->
                                                <!-- /react-text -->
                                            </div>
                                            <div class="commentary-item  ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">1.2</div>
                                                        <div class="over-circle low-score"><span class="over-score">4lb</span></div>
                                                    </div>
                                                    <div class="description">Archer to Fakhar Zaman, 4 leg byes, too straight, almost legsid-ish, and Fakhar flicks it off the hips to the fine leg boundary. This is a <strong>good start for Pakistan</strong></div>
                                                </div><!-- react-text: 3783 -->
                                                <!-- /react-text -->
                                            </div>
                                            <div class="commentary-item  ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">1.1</div>
                                                        <div class="over-circle low-score"><span class="over-score">1</span></div>
                                                    </div>
                                                    <div class="description">Archer to Imam-ul-Haq, 1 run, immediate threat from Archer. Straighter, squared Imam up, an outside edge trickles through to third man</div>
                                                </div><!-- react-text: 3791 -->
                                                <!-- /react-text -->
                                            </div>
                                            <div class="commentary-item end-of-over">
                                                <h4>
                                                    <!-- react-text: 3794 -->END OF OVER:
                                                    <!-- /react-text -->
                                                    <div>
                                                        <!-- react-text: 3796 -->1 | 9 Runs | PAK: 9/0
                                                        <!-- /react-text -->
                                                        <!-- react-text: 3797 -->| RR: 9.00
                                                        <!-- /react-text -->
                                                    </div>
                                                </h4>
                                                <ul class="two-col-table">
                                                    <li>Fakhar Zaman<span>8 (4b)</span></li>
                                                    <li>Imam-ul-Haq<span>1 (2b)</span></li>
                                                </ul>
                                                <ul class="two-col-table">
                                                    <li>Chris Woakes<span>1-0-9-0</span></li>
                                                </ul>
                                            </div>
                                            <div class="commentary-item  ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">0.6</div>
                                                        <div class="over-circle high-score"><span class="over-score">4</span></div>
                                                    </div>
                                                    <div class="description">Woakes to Fakhar Zaman, <b>FOUR</b> runs, <strong>slashed away</strong> by Zaman. Got just a bit more room, fuller delivery angling away from the left-hander, and a slice gets it just wide of the man at point.</div>
                                                </div><!-- react-text: 3810 -->
                                                <!-- /react-text -->
                                            </div>
                                            <div class="commentary-item  ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">0.5</div>
                                                        <div class="over-circle low-score"><span class="over-score">0</span></div>
                                                    </div>
                                                    <div class="description">Woakes to Fakhar Zaman, no run, driven to mid-off</div>
                                                </div><!-- react-text: 3818 -->
                                                <!-- /react-text -->
                                            </div>
                                            <div class="commentary-item  ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">0.4</div>
                                                        <div class="over-circle high-score"><span class="over-score">4</span></div>
                                                    </div>
                                                    <div class="description">Woakes to Fakhar Zaman, <b>FOUR</b> runs, sliced away to the offside. Looked too close to cut, and there was no timing, but surprising misfield in the covers lets the ball squirt through to the boundary. Morgan the man who misfielded</div>
                                                </div><!-- react-text: 3826 -->
                                                <!-- /react-text -->
                                            </div>
                                            <div class="commentary-item  ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">0.3</div>
                                                        <div class="over-circle low-score"><span class="over-score">0</span></div>
                                                    </div>
                                                    <div class="description">Woakes to Fakhar Zaman, no run, less room this time. Short and straighter, but Fakhar shoulders arms</div>
                                                </div><!-- react-text: 3834 -->
                                                <!-- /react-text -->
                                            </div>
                                            <div class="commentary-item  ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">0.2</div>
                                                        <div class="over-circle low-score"><span class="over-score">1</span></div>
                                                    </div>
                                                    <div class="description">Woakes to Imam-ul-Haq, 1 run, similar line and length, cut away to third man, and Imam is up and running</div>
                                                </div><!-- react-text: 3842 -->
                                                <!-- /react-text -->
                                            </div>
                                            <div class="commentary-item pre ">
                                                <div class="item-wrapper">
                                                    <div class="over">
                                                        <div class="time-stamp">0.1</div>
                                                        <div class="over-circle low-score"><span class="over-score">0</span></div>
                                                    </div>
                                                    <div class="description">Woakes to Imam-ul-Haq, no run, wide outside off stump, Imam lets it go safely by</div>
                                                </div>
                                                <div>
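To apply this to the file itself, one option is the standard library's `html.parser.HTMLParser`: collect the text of each `<div class="description">`, flatten nested tags like `<b>` and `<strong>`, then match the pattern at the start of each description and tally batsmen with `Counter`. This is a sketch that assumes description divs never contain a nested `<div>` (true for the excerpt above); `DescriptionCollector`, `count_batsmen` and `sample` are hypothetical names.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: descriptions start "<Bowler> to <Batsman>, ".
BALL_RE = re.compile(r"(?P<bowler>.+?) to (?P<batsman>[^,]+), ")


class DescriptionCollector(HTMLParser):
    """Collect the plain text of every <div class="description"> element.

    Inline tags such as <b> and <strong> are flattened into the text.
    Assumes a description div contains no nested <div>.
    """

    def __init__(self):
        super().__init__()
        self.descriptions = []
        self._parts = None  # text chunks while inside a description div

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "description" in classes:
            self._parts = []

    def handle_endtag(self, tag):
        # Stray closing tags (like the leading </div> in the excerpt) are ignored.
        if tag == "div" and self._parts is not None:
            self.descriptions.append("".join(self._parts).strip())
            self._parts = None

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)


def count_batsmen(html_text):
    """Count how many deliveries each batsman faced in the commentary."""
    parser = DescriptionCollector()
    parser.feed(html_text)
    parser.close()
    counts = Counter()
    for text in parser.descriptions:
        m = BALL_RE.match(text)
        if m:
            counts[m.group("batsman")] += 1
    return counts


sample = """
<div class="description">Archer to Fakhar Zaman, no run, shorter and wider, left alone this time</div>
<div class="description">Woakes to Fakhar Zaman, <b>FOUR</b> runs, <strong>slashed away</strong> by Zaman.</div>
<div class="description">Woakes to Imam-ul-Haq, 1 run, similar line and length, cut away to third man, and Imam is up and running</div>
"""
print(count_batsmen(sample))  # Counter({'Fakhar Zaman': 2, 'Imam-ul-Haq': 1})
```

For the real file, something like `with open("commentary.html", encoding="utf-8") as f: counts = count_batsmen(f.read())` should work, where `commentary.html` is a placeholder path. HTML comments such as `<!-- react-text: 3750 -->` go to `handle_comment`, which is not overridden here, so they never leak into the descriptions.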
How to Find & Count String Patterns Between two Markers in a HTML file - by ahmedwaqas92 - Aug-18-2019, 01:18 PM
