Python Forum
Help with generating regx Pattern please - Printable Version


Help with generating regx Pattern please - lastyle - Aug-26-2024

Hi all, I need to extract "names" from a text file.
I would like to solve this with a regex, but I can't come up with the correct pattern. Can somebody please help me out?

I need to extract everything after the second number, up to the end of the line:

˜16524:˜ 1 §š£™£ž£LÁ-SÔÙÌEž£™£š£¥
˜11241:˜ 159 —.M˜A›CžX.

to
§š£™£ž£LÁ-SÔÙÌEž£™£š£¥
—.M˜A›CžX.

but my current output is

♣:˜ 1 ▼§š£™£ž£♣LÁ-SÔÙÌEž£™£š£▼¥
♣:˜ 159 —.M˜A›CžX♣.

A sample file with 21 lines is attached. My current code looks like this:

import re

def parse_username(content):
    print(f"Content after seconds: {content.strip()}")

def process_file(file_path, encoding='utf-8'):
    pattern = re.compile(r'\d+(.*)')
    
    try:
        with open(file_path, 'r', encoding=encoding) as file:
            for line in file:
                match = pattern.search(line)
                if match:
                    content_after_number = match.group(1)
                    parse_username(content_after_number)
    except UnicodeDecodeError as e:
        print(f"Error decoding file: {e}")
    except FileNotFoundError:
        print(f"File not found: {file_path}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

process_file('names.txt')

Thanks in advance
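
A note on why the posted pattern returns too much: re.search returns the leftmost match, so \d+ consumes the first number on the line and (.*) captures everything after it, second number included. The sketch below contrasts that with a pattern that also steps over the second number. The helper names are made up for illustration, and the sample line is taken from the attached data, with the small tilde written as the escape '\u02dc'.

```python
import re

def after_first_number(line):
    # Pattern from the post: \d+ matches the FIRST run of digits on the line.
    match = re.search(r'\d+(.*)', line)
    return match.group(1) if match else None

def after_second_number(line):
    # Step over the first number, the non-digits after it,
    # the second number and one space, then capture the rest.
    match = re.search(r'\d+\D+\d+ (.*)', line)
    return match.group(1) if match else None

line = '\u02dc11245:\u02dc 89 BITSHAKER'
print(repr(after_first_number(line)))   # ':˜ 89 BITSHAKER'
print(repr(after_second_number(line)))  # 'BITSHAKER'
```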


RE: Help with generating regx Pattern please - DeaD_EyE - Aug-27-2024

import re
from io import StringIO


text_file_like = StringIO(
    """
    ˜16524:˜ 1 §š£™££LÁ-SÔÙÌE£™£š£¥
	˜11241:˜ 159 —.M˜A›CX.
	"""
)

# maybe a regex could solve the problem
# but it could make the problem harder to solve
# test it on https://regex101.com/
REGEX = re.compile(r"˜\d{5}:˜ \d+ (.+)")

def get_names(fd):
    for line in fd:
        if match := REGEX.search(line):
            yield match.group(1)


for name in get_names(text_file_like):
    print(name)
Output:
§š£™££LÁ-SÔÙÌE£™£š£¥
—.M˜A›CX.
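
The comment in the code above hints that a regex may not be needed at all. A minimal regex-free sketch follows. It assumes every line starts at column 0 and that single spaces separate the two number fields from the name. The function name is hypothetical.

```python
def name_after_second_number(line):
    # Split on the first two spaces only, so spaces inside a name survive.
    parts = line.rstrip('\n').split(' ', 2)
    # Expect: prefix field, numeric field, name.
    if len(parts) == 3 and parts[1].isdigit():
        return parts[2]
    return None

# '\u02dc' is the small tilde, '\xc2' is the letter Â from the sample data.
print(name_after_second_number('\u02dc11300:\u02dc 156 \xc2LACK \xc2EARD\n'))
```

Lines that do not have the expected three fields return None rather than raising, so they can be skipped or logged by the caller.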



RE: Help with generating regx Pattern please - Pedroski55 - Aug-28-2024

Maybe like this:

import re

s = "˜16524:˜ 1 §š£™£ž£LÁ-SÔÙÌEž£™£š£¥"
t = "˜11241:˜ 159 —.M˜A›CžX."
e = re.compile(r'(?<=\d\s)([\w\W]*)')

res_s = e.search(s)
# <re.Match object; span=(12, 37), match='\x1f§š£™£\x9e£\x05LÁ-SÔÙÌE\x9e£™£š£\x1f¥'>
print(res_s.group())
res_t = e.search(t)
# <re.Match object; span=(14, 25), match='—.M˜A›C\x9eX\x05.'>
print(res_t.group())
Gives:

Output:
§š£™£ž£LÁ-SÔÙÌEž£™£š£¥
—.M˜A›CžX.
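
The match objects above show control characters (\x05, \x1f) inside the names. These are probably the ♣ and ▼ symbols in the original poster's current output, which the desired output does not contain. The match spans also start one character later than the visible prefix would suggest, and the poster's output starts with ♣ before the colon, so there seems to be a \x05 between the first number and the colon as well. A minimal sketch that tolerates this and drops control characters from the name is shown below. It assumes only the code points 0-31 should be removed, and the names SECOND_NUMBER, DROP_C0 and extract_name are made up for illustration.

```python
import re

# Assumed line layout: '˜11241<0x05>:˜ 159 <name>'.
# \D+ steps over whatever sits between the two numbers, hidden \x05 included.
SECOND_NUMBER = re.compile(r'\d+\D+\d+ (.*)')

# Map code points 0-31 to None so str.translate() deletes them.
DROP_C0 = dict.fromkeys(range(32))

def extract_name(line):
    match = SECOND_NUMBER.search(line)
    if match is None:
        return None
    return match.group(1).translate(DROP_C0)

# Second sample line, with the hidden characters written as escapes.
sample = '\u02dc11241\x05:\u02dc 159 \u2014.M\u02dcA\u203aC\u017eX\x05.\n'
print(extract_name(sample))  # —.M˜A›CžX.
```

If the file is decoded as latin-1 instead of cp1252, characters such as ž arrive as code points in the range 128-159. This filter leaves those untouched.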



RE: Help with generating regx Pattern please - snippsat - Aug-28-2024

import re

text = '''\
˜16524:˜ 1 §š£™£ž£LÁ-SÔÙÌEž£™£š£¥
˜11241:˜ 159 —.M˜A›CžX.
˜11243:˜ 90 š™ÄŸIšDI
˜11245:˜ 89 BITSHAKER
˜11247:˜ 20 °À³™È›OŸLY˜ œÍO˜SEŸS›«™ÀÀ®’
˜11248:˜ 11 EAGLEWING
˜11252:˜ 2 ›ÔAPE˜R/ÔÒÉ—ÁÄ
˜11257:˜ 103 ÒEDÁLERT
˜11260:˜ 30 ÓT0RMFR0NT
˜11268:˜ 189 NFODIZ
˜11269:˜ 74 ÓENTINEL/ÅXCESS
˜11270:˜ 90 š™ÄŸIšDI
˜11272:˜ 13 ¤¤¬»PCOLLžI–NS¬»¤¤
˜11276:˜ 75 ×EASEL
˜11278:˜ 82 ÊAZZCAT
˜11286:˜ 105 Ï™š™›™˜šĞŸÔ˜™Ià ™ÆŸÒšÅÅ›šÚŸÅ
˜11290:˜ 172 ROTTEROY
˜11294:˜ 185 ĞITCHER
˜11299:˜ 121 MORPHFROG
˜11300:˜ 156 ÂLACK ÂEARD
˜11317:˜ 80 HEDNING'''

pattern = r'^˜\d+:˜ \d+ (.+)'
for match in re.finditer(pattern, text, re.MULTILINE):
    print(match.group(1))
Output:
§š£™£ž£LÁ-SÔÙÌEž£™£š£¥
—.M˜A›CžX.
š™ÄŸIšDI
BITSHAKER
°À³™È›OŸLY˜ œÍO˜SEŸS›«™ÀÀ®’
EAGLEWING
›ÔAPE˜R/ÔÒÉ—ÁÄ
ÒEDÁLERT
ÓT0RMFR0NT
NFODIZ
ÓENTINEL/ÅXCESS
š™ÄŸIšDI
¤¤¬»PCOLLžI–NS¬»¤¤
×EASEL
ÊAZZCAT
Ï™š™›™˜šĞŸÔ˜™Ià ™ÆŸÒšÅÅ›šÚŸÅ
ROTTEROY
ĞITCHER
MORPHFROG
ÂLACK ÂEARD
HEDNING