@bowlofred,
Thank you very much for your time and for trying to help me. I think my understanding of the concepts is wrong.
I am still learning and, to be honest, not really a programmer.
In the actual view-source extract, the data is not on separate lines, so this statement does not work on it.
When I added the lines below to the actual extract, I got the right answer. So the regex is fine, but I need help with the actual data sample: I want to add a new line before and after each <span id>, and then I think I will get what I need. How do we do that?
For each line, I need to check for <span id and insert \n into the string.
<span id="AccordionHeaderText52741" class="clsAccordionHeaderText"><a id="AccordionHeaderTab52741" tabIndex="101" onkeydown="websys_ToggleAccordion('52741',event);">Billing</a></span>
<span id="AccordionHeaderText52741" class="clsAccordionHeaderText"><a id="AccordionHeaderTab52741" tabIndex="101" onkeydown="websys_ToggleAccordion('52741',event);">Anand</a></span>
<span id="AccordionHeaderText52741" class="clsAccordionHeaderText"><a id="AccordionHeaderTab52741" tabIndex="101" onkeydown="websys_ToggleAccordion('52741',event);">Python</a></span>
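A minimal sketch of that newline idea, assuming the whole page source is already held in one string. The function name split_on_spans and the shortened sample markup are made up for illustration; only str.replace and str.splitlines from the standard library are used.

```python
def split_on_spans(html):
    """Return the markup as a list of lines, one per '<span id' element."""
    # Put a newline in front of every '<span id' so each span starts a new line.
    marked = html.replace('<span id', '\n<span id')
    # Drop the empty lines this creates (for example at the very start).
    return [line for line in marked.splitlines() if line.strip()]


# Hypothetical, shortened sample: two spans on a single line.
sample = '<span id="A1"><a>Billing</a></span><span id="A2"><a>Anand</a></span>'
for line in split_on_spans(sample):
    print(line)
```

Each resulting line can then be passed to a per-line regex search.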
Here is my requirement:
I need to do web scraping and pull out the menu items, about 120 of them, into a text file. I found them by searching for onclick; the keyword I need follows that string.
When I run a test sample with just one match, it matches what I want to extract.
Example: in the actual file, I have occurrences of HTML like this:
onclick="websys_CaptionClickHandler(event);">Code</label>
I need to extract the word "Code" from this and repeat that for every occurrence.
I really do not need the rest of the line or any other text at all. I just need the words wrapped between (event);"> DATA I NEEDED </label>
Can you please help?
Thanks in advance.
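A rough sketch of that extraction, assuming the page source fits in memory as one string. It uses re.findall, so it does not matter whether the labels sit on separate lines. The names extract_labels and save_labels, the output path, and the second sample label ("Description") are hypothetical.

```python
import re

# Text between '(event);">' and '</label>'. [^<]* stops at the next tag,
# so one match cannot run on into a later label.
LABEL_RE = re.compile(r'\(event\);">([^<]*)</label>')


def extract_labels(html):
    """Return every label caption found in the markup, in document order."""
    return [text.strip() for text in LABEL_RE.findall(html)]


def save_labels(labels, path):
    """Write one caption per line to a text file (path is an assumption)."""
    with open(path, 'w', encoding='utf-8') as out:
        out.write('\n'.join(labels) + '\n')


# Made-up sample: two labels on a single line.
sample = ('<label onclick="websys_CaptionClickHandler(event);">Code</label>'
          '<label onclick="websys_CaptionClickHandler(event);">Description</label>')
print(extract_labels(sample))
```

Calling save_labels(extract_labels(page_source), 'labels.txt') would then put the captions into a text file, where page_source stands for the saved view-source text.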
(Jun-09-2021, 03:38 PM)bowlofred Wrote: Do a check for when the match fails, then print out the line in question. If you don't understand why the match fails, you can ask about it.
Also, why are you matching every line in the text file, but then only printing out the match that happens on the last line?
import re

with open('testregpad.txt') as f:
    for line in f:
        match = re.search(r'event\);">(.*)</a.*', line)
        if match:
            print(match.group(1))
        else:
            print(f"Failed to find a match in line: {line}")
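One possible reason the pattern above misbehaves on the real data: when several spans share a single line, the greedy (.*) runs from the first event);"> to the last </a on that line. A small sketch of an alternative, assuming the link text never contains a tag; the name header_texts is made up for illustration.

```python
import re

# [^<]* cannot cross a tag, so each match stays inside one <a>...</a> pair.
HEADER_RE = re.compile(r'event\);">([^<]*)</a>')


def header_texts(html):
    """Return the text of every matching link, even when all share one line."""
    return HEADER_RE.findall(html)


# The three sample spans from the post, joined onto a single line.
span = ('<span id="AccordionHeaderText52741" class="clsAccordionHeaderText">'
        '<a id="AccordionHeaderTab52741" tabIndex="101" '
        'onkeydown="websys_ToggleAccordion(\'52741\',event);">{}</a></span>')
one_line = ''.join(span.format(name) for name in ('Billing', 'Anand', 'Python'))
print(header_texts(one_line))
```

With this approach the file can be read in one go with f.read() instead of being looped over line by line.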