Python: regex fails with re.search (AttributeError: 'NoneType' object has no attribute 'group')

Melcu54 · Jun-28-2023, 06:46 AM

In my HTML file I have these lines:

	<div class="color-black mt-lg-0" id="hidden">, in</div>
    <a href="https://neculaifantanaru.com/en/leadership-pro.html" title="View all articles from Leadership Pro" class="color-green font-weight-600 mx-1" id="hidden">Leadership Pro</a>

I use this regex:

^\s*<a href="(.*?)" title="View

in order to find this link

https://neculaifantanaru.com/en/leadership-pro.html

In Notepad++ the regex search works fine!

The problem is in Python.

FIND: (on line 18)

b_content = re.search('^\s*<a href="(.*?)" title="View', new_file_content).group(1)

REPLACE:

old_file_content = re.sub(', in <a href="(.*?)" title="Vezi', f', in <a href="{b_content}" title="Vezi', old_file_content)

This gives me the following error on line 18:

    Traceback (most recent call last):
      File "<module2>", line 18, in <module>
    AttributeError: 'NoneType' object has no attribute 'group'
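The traceback comes from the fact that re.search() returns None when the pattern finds no match, and None has no .group() method. A minimal sketch of the usual guard, using a made-up `text` variable that lacks the <a> line:

```python
import re

# Hypothetical input: only the <div> line, so the <a> pattern cannot match
text = '<div class="color-black mt-lg-0" id="hidden">, in</div>'

match = re.search(r'<a href="(.*?)" title="View', text)
print(match)  # None

# Calling match.group(1) at this point would raise
# AttributeError: 'NoneType' object has no attribute 'group'
url = match.group(1) if match is not None else None
print(url)  # None
```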

I also tried changing that line to:

b_content = re.match(r'^\s*<a href="(.*?)" title="View', new_file_content).group(1)

but I get the same error.

bowlofred · Jun-28-2023, 07:32 AM

Doing regex on HTML is super annoying. Use an HTML parser instead (like BeautifulSoup).

Your regex is anchored at the front of the string. If your new_file_content contains the entire file, then the match will fail. When I try your command, but with only the second line in that variable, it matches.
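A small sketch of that behaviour, assuming new_file_content holds the two lines from the question (reconstructed here without indentation): without re.MULTILINE, ^ only matches at position 0, where the text starts with "<div", so the search fails; the same pattern on the second line alone succeeds.

```python
import re

# Assumed file contents: the two lines from the question, read as one string
new_file_content = (
    '<div class="color-black mt-lg-0" id="hidden">, in</div>\n'
    '<a href="https://neculaifantanaru.com/en/leadership-pro.html" '
    'title="View all articles from Leadership Pro" '
    'class="color-green font-weight-600 mx-1" id="hidden">Leadership Pro</a>\n'
)

pattern = r'^\s*<a href="(.*?)" title="View'

# ^ is tied to the start of the whole string, which begins with "<div"
whole = re.search(pattern, new_file_content)

# The second line on its own starts with "<a", so ^ can match there
second_line = new_file_content.splitlines()[1]
single = re.search(pattern, second_line)

print(whole)            # None
print(single.group(1))  # https://neculaifantanaru.com/en/leadership-pro.html
```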

Gribouillis · Jun-28-2023, 07:42 AM (last modified 07:43 AM)

(Jun-28-2023, 07:32 AM) bowlofred Wrote: Your regex is anchored at the front of the string.

The regex multiline mode (?m) could do the trick.
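A minimal sketch of that suggestion on the same assumed two-line content: putting (?m) at the very start of the pattern switches on MULTILINE mode inline, so ^ also matches right after each newline and the search finds the link on the second line.

```python
import re

# Assumed file contents: the two lines from the question
new_file_content = (
    '<div class="color-black mt-lg-0" id="hidden">, in</div>\n'
    '<a href="https://neculaifantanaru.com/en/leadership-pro.html" '
    'title="View all articles from Leadership Pro" '
    'class="color-green font-weight-600 mx-1" id="hidden">Leadership Pro</a>\n'
)

# (?m) = inline MULTILINE flag: ^ matches at the start of every line
match = re.search(r'(?m)^\s*<a href="(.*?)" title="View', new_file_content)
print(match.group(1))  # https://neculaifantanaru.com/en/leadership-pro.html
```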

Melcu54 · Jun-28-2023, 07:50 AM

(Jun-28-2023, 07:42 AM) Gribouillis Wrote:
(Jun-28-2023, 07:32 AM) bowlofred Wrote: Your regex is anchored at the front of the string.
The regex multiline mode (?m) could do the trick.

Hello. Can you update my code so I can understand it better?

snippsat · (last modified Jun-28-2023, 08:15 AM)

The old classic read on why not to parse HTML with regex still applies.

from bs4 import BeautifulSoup
import re

html = '''\
<div class="color-black mt-lg-0" id="hidden">, in</div>
<a href="https://neculaifantanaru.com/en/leadership-pro.html" title="View all articles from Leadership Pro" class="color-green font-weight-600 mx-1" id="hidden">Leadership Pro</a>
'''

soup = BeautifulSoup(html, 'html.parser')
link = soup.find('a').get('href')
print(link)

Output:
https://neculaifantanaru.com/en/leadership-pro.html

If you are wondering about a working regex: as the link explains, you should not use regex with HTML/XML.
It can work on a small snippet like this one, but it can and will blow up with errors on larger HTML.

>>> import re
>>> 
>>> b_content = re.search(r"<a href=\"(.*?)\"", html).group(1)
>>> b_content
'https://neculaifantanaru.com/en/leadership-pro.html'
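To illustrate one way the regex breaks on real HTML: if the attributes appear in a different order (equally valid HTML), the pattern finds nothing, while a parser does not care. This sketch swaps in the standard library's html.parser.HTMLParser instead of BeautifulSoup so it runs without third-party packages; the class name LinkCollector and the sample markup are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Same link, but with class= before href= (valid HTML)
html = ('<a class="color-green" '
        'href="https://neculaifantanaru.com/en/leadership-pro.html" '
        'title="View all articles from Leadership Pro">Leadership Pro</a>')

# The regex expects href right after "<a ", so it does not match
print(re.search(r'<a href="(.*?)"', html))  # None


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, whatever the attribute order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


parser = LinkCollector()
parser.feed(html)
print(parser.hrefs)  # ['https://neculaifantanaru.com/en/leadership-pro.html']
```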

Melcu54 · Jun-28-2023, 08:25 AM

import re

# Read the contents of new-file.html (assuming the files are UTF-8)
with open('c:/Folder7/new-file.html', 'r', encoding='utf-8') as file:
    first_code = file.read()

# Read the contents of old-file.html
with open('c:/Folder7/old-file.html', 'r', encoding='utf-8') as file:
    second_code = file.read()

# Extract the URL from first_code (no ^ anchor, so the link may be on any line)
match = re.search(r'<a href="(.*?)" title="View all articles', first_code)
if match is not None:
    url = match.group(1)
    # Replace the URL in second_code
    second_code = re.sub(r', in <a href=".*?" title="Vezi toate', f', in <a href="{url}" title="Vezi toate', second_code)

    # Write the modified content back to old-file.html
    with open('c:/Folder7/old-file.html', 'w', encoding='utf-8') as file:
        file.write(second_code)
else:
    print("No match found")
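One detail about the re.sub call above: when the replacement is a string, backslash escapes in it are processed (\1 and \g<...> become group references, and an unknown escape such as \q raises re.error), so a URL containing a backslash could be mangled or crash the script. Passing a function as the replacement inserts the text literally. A minimal sketch on in-memory strings; the function name transfer_url and the sample strings (including example.com) are hypothetical.

```python
import re


def transfer_url(new_html, old_html):
    """Copy the href of the 'View all articles' link in new_html into the
    ', in <a ... title="Vezi toate' link in old_html.
    Returns old_html unchanged when new_html has no such link."""
    match = re.search(r'<a href="(.*?)" title="View all articles', new_html)
    if match is None:
        return old_html
    url = match.group(1)
    # A callable replacement is inserted as-is, with no backslash processing
    return re.sub(
        r', in <a href=".*?" title="Vezi toate',
        lambda m: f', in <a href="{url}" title="Vezi toate',
        old_html,
    )


# Hypothetical sample strings standing in for the two files
new_html = ('<a href="https://neculaifantanaru.com/en/leadership-pro.html" '
            'title="View all articles from Leadership Pro">Leadership Pro</a>')
old_html = ', in <a href="https://example.com/old.html" title="Vezi toate articolele">Leadership Pro</a>'

result = transfer_url(new_html, old_html)
print(result)
```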

Gribouillis · (last modified Jun-28-2023, 08:44 AM)

(Jun-28-2023, 07:50 AM) Melcu54 Wrote: Can you update my code so I can understand it better?

Add (?m) at the beginning of the regex, as described in the re.MULTILINE documentation. Reading the documentation is very useful.
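The inline (?m) and the re.MULTILINE flag argument do the same thing. The re documentation also notes that re.match() only matches at the beginning of the string, even in MULTILINE mode, which is why the re.match() variant from the question still returns None on the whole file. A sketch on assumed two-line content:

```python
import re

# Assumed file contents: the two lines from the question (shortened)
new_file_content = (
    '<div class="color-black mt-lg-0" id="hidden">, in</div>\n'
    '<a href="https://neculaifantanaru.com/en/leadership-pro.html" '
    'title="View all articles from Leadership Pro">Leadership Pro</a>\n'
)
pattern = r'^\s*<a href="(.*?)" title="View'

# Inline flag and flag argument are equivalent
inline = re.search('(?m)' + pattern, new_file_content)
flag = re.search(pattern, new_file_content, re.MULTILINE)
print(inline.group(1) == flag.group(1))  # True

# re.match() only tries position 0, so MULTILINE does not help it
print(re.match(pattern, new_file_content, re.MULTILINE))  # None
```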

snippsat · Jun-28-2023, 09:47 AM

As advised, no regex 🔨 with HTML/XML.

from bs4 import BeautifulSoup

with open('file.html') as file:
    first_code = file.read()

with open('old-file.html') as file:
    second_code = file.read()

soup = BeautifulSoup(first_code, 'html.parser')
link = soup.find('a')
link['href'] = second_code

with open('old-file.html', 'w') as file:
    file.write(soup.prettify())

Output:
<div class="color-black mt-lg-0" id="hidden">
 , in
</div>
<a class="color-green font-weight-600 mx-1" href="https://python-forum.io" id="hidden" title="View all articles from Leadership Pro">
 Leadership Pro
</a>

Melcu54 · Jun-28-2023, 10:55 AM

Thank you very much!

Melcu54 · Jun-28-2023, 11:13 AM

SOLUTION 1:

FIND:

b_content = re.search(r'^\s*<a href="(.*?)" title="View', new_file_content).group(1)

REPLACE:

old_file_content = re.sub(', in <a href="(.*?)" title="Vezi', f', in <a href="{b_content}" title="Vezi', old_file_content)

SOLUTION 2:

FIND:

b_content = re.match(r'^\s*<a href="(.*?)" title="View', new_file_content).group(1)

REPLACE:

old_file_content = re.sub(', in <a href="(.*?)" title="Vezi', f', in <a href="{b_content}" title="Vezi', old_file_content)

SOLUTION 3:

import re

b_content = re.match(r'^\s*<a href="(.*?)" title="View', new_file_content)
if b_content is not None:
    b_content = b_content.group(1)
else:
    b_content = "No match found"

SOLUTION 4:

import re

match = re.search(r'^\s*<a href="(.*?)" title="View', new_file_content)
if match is not None:
    b_content = match.group(1)
    old_file_content = re.sub(', in <a href="(.*?)" title="Vezi', f', in <a href="{b_content}" title="Vezi', old_file_content)
else:
    print("No match found")

SOLUTION 5 (use re.MULTILINE):

import re

match = re.search(r'^\s*<a href="(.*?)" title="View', new_file_content, re.MULTILINE)
if match is not None:
    b_content = match.group(1)
    old_file_content = re.sub(', in <a href="([^"]*)" title="Vezi', f', in <a href="{b_content}" title="Vezi', old_file_content)
else:
    print("No match found")
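Solution 5 also replaces (.*?) with ([^"]*) in the re.sub pattern. That matters when a line holds more than one link: a lazy .*? may still run past a closing quote until the rest of the pattern matches, whereas [^"]* cannot leave the attribute value. A small sketch with a made-up line of two links, where only the second has the "Vezi" title:

```python
import re

# Hypothetical line with two links; only the second one should change
old_file_content = (
    ', in <a href="a.html" title="Other">A</a>'
    ', in <a href="b.html" title="Vezi toate">B</a>'
)
new_url = "https://neculaifantanaru.com/en/leadership-pro.html"

# .*? starts at the first link and stretches across it to reach "Vezi"
lazy = re.sub(r', in <a href="(.*?)" title="Vezi',
              f', in <a href="{new_url}" title="Vezi', old_file_content)

# [^"]* cannot cross a quote, so only the second link matches
strict = re.sub(r', in <a href="([^"]*)" title="Vezi',
                f', in <a href="{new_url}" title="Vezi', old_file_content)

print(lazy)    # the first link (a.html) has been swallowed
print(strict)  # a.html is kept, b.html is replaced
```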
