Python Forum
Python Regular expression, small sample works but not on file
#1
Hi Experts,

Small sample:
<span id="AccordionHeaderText52741" class="clsAccordionHeaderText"><a id="AccordionHeaderTab52741" tabIndex="101" onkeydown="websys_ToggleAccordion('52741',event);">Billing</a></span>

A regular expression that works on https://regex101.com/ is
",event\);\">(.*)<\/a.*"
The same pattern works in a Python editor and pulls out "Billing":
import re
with open('testregpad.txt') as f:
    for line in f:
        match = re.search('event\);\">(.*)<\/a.*', line)
print(match.group(1))
output: Billing

But when I provide the entire page source to pull all the matching strings from the complete file, the code fails.
Here is the error message, which is the same in both Sublime and Jupyter.
Error:
Traceback (most recent call last):
  File "C:\Data\Academic\Python\RegExtractData", line 5, in <module>
    print(match.group(1))
AttributeError: 'NoneType' object has no attribute 'group'
[Finished in 120ms]
Help is appreciated.

Attached file: textsample.txt (29.62 KB)
#2
When re.search() doesn't find a match, the returned object is None instead of a match object.
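A minimal sketch of that behaviour, assuming two made-up lines (they are illustrative, not taken from the attached file): the first contains the target markup and the second does not, so `search()` returns `None` for it and calling `.group()` directly would raise the `AttributeError` shown above.

```python
import re

pattern = re.compile(r'event\);">(.*)</a')

# Hypothetical lines: the first contains the target markup, the second does not.
lines = [
    '<a onkeydown="websys_ToggleAccordion(\'52741\',event);">Billing</a>',
    '<div class="footer">no link here</div>',
]

results = []
for line in lines:
    match = pattern.search(line)
    # search() returns None when nothing matches, so test before calling .group()
    results.append(match.group(1) if match else None)

print(results)  # ['Billing', None]
```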
#3
(Jun-09-2021, 01:43 PM)Gribouillis Wrote: When re.search() doesn't find a match, the returned object is None instead of a match object.

Thank you @Gribouillis
Yes, you are right, but that is exactly where I need help: with a small sample I get the right result, both in Python editors and on the online regex tester at https://regex101.com/.
I tried findall as well, with the same outcome: a small sample gives a result, but when I pass the entire page source to pull out the list, I get nothing back.
I have attached the regular expression result. You can find the same strings in the sample file.

#4
Do a check for when the match fails, then print out the line in question. If you don't understand why the match fails, you can ask about it.

Also, why are you matching every line in the text file, but then only printing out the match that happens on the last line?

import re

with open('testregpad.txt') as f:
    for line in f:
        # Raw string, so the regex backslash is not treated as a Python escape
        match = re.search(r'event\);">(.*)</a.*', line)
        if match:
            print(match.group(1))
        else:
            print(f"Failed to find a match in line: {line}")
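One thing to watch for when a single line holds more than one element: the greedy `(.*)` runs to the last `</a` on the line. The sketch below uses a made-up line with two links (the `f(...)` handler name is a placeholder, not from the real page) to compare the greedy group with a non-greedy `(.*?)` passed to `re.findall`.

```python
import re

# Hypothetical single line holding two links, as in page source without line breaks.
line = ('<a onkeydown="f(\'1\',event);">Billing</a>'
        '<a onkeydown="f(\'2\',event);">Orders</a>')

# Greedy: the group stretches from the first opening to the LAST '</a' on the line.
greedy = re.search(r'event\);">(.*)</a', line).group(1)

# Non-greedy: each group stops at the nearest '</a', and findall returns them all.
lazy = re.findall(r'event\);">(.*?)</a', line)

print(greedy)
print(lazy)
```

With the non-greedy form, the list contains one entry per link rather than one long string spanning both.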
#5
@bowlofred,

Thank you very much for your time and for trying to help me. I think my understanding of the concepts is wrong.
To be honest, I am still learning and am not a programmer as such.

In the actual view-source extract, the data is not split across separate lines, so the line-by-line loop does not work as intended.
I added the lines below to the actual extract and got the right answer. So the regex is fine, but I need help with the real data: I want to add a newline before and after each <span id> element, and then I think I will get what I need. How do we do that?
For each line, I need to check for <span id and insert \n into the string.

<span id="AccordionHeaderText52741" class="clsAccordionHeaderText"><a id="AccordionHeaderTab52741" tabIndex="101" onkeydown="websys_ToggleAccordion('52741',event);">Billing</a></span>
<span id="AccordionHeaderText52741" class="clsAccordionHeaderText"><a id="AccordionHeaderTab52741" tabIndex="101" onkeydown="websys_ToggleAccordion('52741',event);">Anand</a></span>
<span id="AccordionHeaderText52741" class="clsAccordionHeaderText"><a id="AccordionHeaderTab52741" tabIndex="101" onkeydown="websys_ToggleAccordion('52741',event);">Python</a></span>

Here is my requirement:
I need to scrape the page and write the menu items (about 120 of them, based on a search for onclick) into a text file; each item is the word that follows the onclick string.

When I run a test sample with just one match, it extracts what I want.

Example: in the actual file, I have these occurrences of HTML:
onclick="websys_CaptionClickHandler(event);">Code</label>

I need to extract the word "Code" from this and repeat that for every occurrence.
I really do not need the rest of the line or any other text at all. I just need the words wrapped between (event);"> DATA I NEEDED </label>

Please can you help?

Thanks in advance

#6
In the code you posted, you are running that match once on every line in the text file (and ignoring whether each match succeeded or failed). You could check for success each time and exit the loop when you find a match.

Or, if the text file isn't too big and there's only one place it can match, don't process it one line at a time; just run the search on the entire file.

import re

with open('testregpad.txt') as f:
    # Raw string, so the regex backslash is not treated as a Python escape
    match = re.search(r'event\);">(.*)</a.*', f.read())
    if match:
        print(match.group(1))
    else:
        print("Couldn't find a match in the file.")
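If the goal is every occurrence rather than just one, a possible variation is `re.findall` over the whole text. This is only a sketch under some assumptions: `extract_captions` and the sample string are hypothetical, the captions are assumed to contain no nested tags, and the pattern accepts either a closing `</a>` or `</label>` to cover both kinds of markup mentioned in the thread.

```python
import re

# Text between 'event);">' and the next closing </a> or </label> tag.
# [^<]* stops at the first '<', so it cannot run past the closing tag.
PATTERN = re.compile(r'event\);">([^<]*)</(?:a|label)>')

def extract_captions(html):
    """Return every caption found in the page source, in document order."""
    return [text.strip() for text in PATTERN.findall(html)]

# Hypothetical page source with no line breaks between elements.
sample = (
    '<span><a onkeydown="websys_ToggleAccordion(\'52741\',event);">Billing</a></span>'
    '<label onclick="websys_CaptionClickHandler(event);">Code</label>'
)

print(extract_captions(sample))  # ['Billing', 'Code']
```

For the real file, the same function could be given the result of `f.read()` instead of `sample`, so no newlines need to be inserted into the source first.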