Python Forum
Help with generating a regex pattern, please
#1
Hi all, I need to extract "names" from a text file. I'd like to do this with a regex, but I can't come up with the correct pattern. Can somebody please help me out?

I need to extract everything after the second number up to the end of the line. For example, I want to go from

˜16524:˜ 1 §š£™£ž£LÁ-SÔÙÌEž£™£š£¥
˜11241:˜ 159 —.M˜A›CžX.

to
§š£™£ž£LÁ-SÔÙÌEž£™£š£¥
—.M˜A›CžX.

But my current output is:

♣:˜ 1 ▼§š£™£ž£♣LÁ-SÔÙÌEž£™£š£▼¥
♣:˜ 159 —.M˜A›CžX♣.

A sample file with 21 lines is attached, and my current code looks like this:

import re

def parse_username(content):
    print(f"Content after seconds: {content.strip()}")

def process_file(file_path, encoding='utf-8'):
    pattern = re.compile(r'\d+(.*)')
    
    try:
        with open(file_path, 'r', encoding=encoding) as file:
            for line in file:
                match = pattern.search(line)
                if match:
                    content_after_number = match.group(1)
                    parse_username(content_after_number)
    except UnicodeDecodeError as e:
        print(f"Error decoding file: {e}")
    except FileNotFoundError:
        print(f"File not found: {file_path}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

process_file('names.txt')
Thanks in advance!

Attached file: names.txt (766 bytes)
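The reason for the current output: `pattern.search()` returns the leftmost match, and the leftmost run of digits on each line is the ID (`16524`), so `group(1)` begins right after it. The ♣ and ▼ are most likely the control characters `\x05` and `\x1f`, which a console using code page 437 draws with those glyphs. A minimal sketch, assuming lines shaped like the ones below (the control characters are inferred from the thread, not read from the attachment), compares the original pattern with one that is anchored at the start of the line and skips both numbers:

```python
import re

# Hypothetical sample lines. The \x05, \x1f and \x9e characters are inferred
# from the ♣/▼ in the output above and the reprs in reply #3; they are not
# copied from names.txt.
LINES = [
    "˜16524\x05:˜ 1 \x1f§š£™£\x9e£\x05LÁ-SÔÙÌE\x9e£™£š£\x1f¥\n",
    "˜11241\x05:˜ 159 —.M˜A›C\x9eX\x05.\n",
]

# Original pattern: search() returns the leftmost match, so \d+ grabs the ID
# and group(1) is everything after it, including the ":˜ 1 " part.
OLD = re.compile(r"\d+(.*)")

# Anchored alternative: leading non-digits, the ID, the separator (any
# non-digits), the second number and one space; group(1) is the rest of the
# line ("." does not match the trailing newline).
NEW = re.compile(r"^\D*\d+\D+\d+ (.*)$")

for line in LINES:
    print("old:", repr(OLD.search(line).group(1)))
    print("new:", repr(NEW.search(line).group(1)))
```

Because the separator is matched with `\D+`, it does not matter which non-digit characters sit between the ID and the second number.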
#2
import re
from io import StringIO


text_file_like = StringIO(
    """
    ˜16524:˜ 1 §š£™££LÁ-SÔÙÌE£™£š£¥
	˜11241:˜ 159 —.M˜A›CX.
	"""
)

# maybe a regex could solve the problem
# but it could make the problem harder to solve
# test it on https://regex101.com/
REGEX = re.compile(r"˜\d{5}:˜ \d+ (.+)")

def get_names(fd):
    for line in fd:
        if match := REGEX.search(line):
            yield match.group(1)


for name in get_names(text_file_like):
    print(name)
Output:
§š£™££LÁ-SÔÙÌE£™£š£¥
—.M˜A›CX.
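As the comment in that reply hints, a regex is not strictly necessary here. A minimal sketch, assuming the ID, the number and the name are always separated by single ASCII spaces, splits each line at most twice with `str.split(" ", 2)`, so names that contain spaces (such as `ÂLACK ÂEARD` in the sample file) stay intact. The function name `get_names_split` is hypothetical.

```python
from io import StringIO


def get_names_split(fd):
    """Yield everything after the second space-separated field of each line.

    Assumes single ASCII spaces between fields; maxsplit=2 leaves any
    further spaces inside the name untouched.
    """
    for line in fd:
        parts = line.rstrip("\n").split(" ", 2)
        # Keep only lines shaped like "<id> <number> <name>".
        if len(parts) == 3 and parts[1].isdigit():
            yield parts[2]


sample = StringIO("˜11300:˜ 156 ÂLACK ÂEARD\n˜11317:˜ 80 HEDNING\n")
print(list(get_names_split(sample)))  # ['ÂLACK ÂEARD', 'HEDNING']
```

Passing `" "` explicitly matters: the default `split()` treats any Unicode whitespace as a separator, and in Python that includes control characters such as `\x1f`, which appear to be part of some names in this file.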
#3
Maybe like this:

import re

s = "˜16524:˜ 1 §š£™£ž£LÁ-SÔÙÌEž£™£š£¥"
t = "˜11241:˜ 159 —.M˜A›CžX."
e = re.compile(r'(?<=\d\s)([\w\W]*)')

res_s = e.search(s)
# REPL repr: <re.Match object; span=(12, 37), match='\x1f§š£™£\x9e£\x05LÁ-SÔÙÌE\x9e£™£š£\x1f¥'>
print(res_s.group())
res_t = e.search(t)
# REPL repr: <re.Match object; span=(14, 25), match='—.M˜A›C\x9eX\x05.'>
print(res_t.group())
Gives:

Output:
§š£™£ž£LÁ-SÔÙÌEž£™£š£¥
—.M˜A›CžX.
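The `repr` values above show that the captured names still contain control characters: C0 codes such as `\x1f` and `\x05` (which a console using code page 437 draws as ▼ and ♣, as in the original post) and the C1 code `\x9e`, which the forum displays as ž. If the C0 codes are formatting noise rather than part of the name (an assumption; nothing in the thread confirms it), a minimal sketch drops them with `str.translate` and leaves `\x9e` alone, because the desired output in the question keeps the ž. The helper name `strip_c0` is hypothetical.

```python
# Translation table mapping every C0 control code point (U+0000..U+001F,
# which also covers tab and newline) to None; str.translate deletes
# characters that map to None.
C0_CONTROLS = dict.fromkeys(range(0x20))


def strip_c0(name: str) -> str:
    """Return name with all C0 control characters removed."""
    return name.translate(C0_CONTROLS)


raw = "\x1f§š£™£\x9e£\x05LÁ-SÔÙÌE\x9e£™£š£\x1f¥"
print(repr(strip_c0(raw)))
```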
#4
import re

text = '''\
˜16524:˜ 1 §š£™£ž£LÁ-SÔÙÌEž£™£š£¥
˜11241:˜ 159 —.M˜A›CžX.
˜11243:˜ 90 š™ÄŸIšDI
˜11245:˜ 89 BITSHAKER
˜11247:˜ 20 °À³™È›OŸLY˜ œÍO˜SEŸS›«™ÀÀ®’
˜11248:˜ 11 EAGLEWING
˜11252:˜ 2 ›ÔAPE˜R/ÔÒÉ—ÁÄ
˜11257:˜ 103 ÒEDÁLERT
˜11260:˜ 30 ÓT0RMFR0NT
˜11268:˜ 189 NFODIZ
˜11269:˜ 74 ÓENTINEL/ÅXCESS
˜11270:˜ 90 š™ÄŸIšDI
˜11272:˜ 13 ¤¤¬»PCOLLžI–NS¬»¤¤
˜11276:˜ 75 ×EASEL
˜11278:˜ 82 ÊAZZCAT
˜11286:˜ 105 Ï™š™›™˜šĞŸÔ˜™Ià ™ÆŸÒšÅÅ›šÚŸÅ
˜11290:˜ 172 ROTTEROY
˜11294:˜ 185 ĞITCHER
˜11299:˜ 121 MORPHFROG
˜11300:˜ 156 ÂLACK ÂEARD
˜11317:˜ 80 HEDNING'''

# ^ anchors each line under re.MULTILINE; group 1 is everything after the second number
pattern = r'^˜\d+:˜ \d+ (.+)'
for match in re.finditer(pattern, text, re.MULTILINE):
    print(match.group(1))
Output:
§š£™£ž£LÁ-SÔÙÌEž£™£š£¥
—.M˜A›CžX.
š™ÄŸIšDI
BITSHAKER
°À³™È›OŸLY˜ œÍO˜SEŸS›«™ÀÀ®’
EAGLEWING
›ÔAPE˜R/ÔÒÉ—ÁÄ
ÒEDÁLERT
ÓT0RMFR0NT
NFODIZ
ÓENTINEL/ÅXCESS
š™ÄŸIšDI
¤¤¬»PCOLLžI–NS¬»¤¤
×EASEL
ÊAZZCAT
Ï™š™›™˜šĞŸÔ˜™Ià ™ÆŸÒšÅÅ›šÚŸÅ
ROTTEROY
ĞITCHER
MORPHFROG
ÂLACK ÂEARD
HEDNING
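If the ID and the second number are useful too, the same `finditer` approach can capture all three fields with named groups. A minimal sketch with the hypothetical helper `parse_records` follows. It replaces the literal `:˜ ` after the ID with `[^\d\n]+` (any run of characters that are neither digits nor newlines). This rests on an assumption suggested by the ♣ in the question and the match spans in reply #3: the real file may have an invisible control character between the ID and the colon, which a hard-coded `:˜ ` would not match.

```python
import re

# ^ and $ match at every line under re.MULTILINE; [^\d\n] keeps a match from
# running across line breaks.
RECORD_RE = re.compile(
    r"^[^\d\n]*(?P<id>\d+)[^\d\n]+(?P<count>\d+) (?P<name>.+)$",
    re.MULTILINE,
)


def parse_records(text: str) -> list[tuple[int, int, str]]:
    """Return (id, second number, name) for every well-formed line."""
    return [
        (int(m["id"]), int(m["count"]), m["name"])
        for m in RECORD_RE.finditer(text)
    ]


# The \x05 on the second line is a hypothetical stand-in for the suspected
# control character.
sample = "˜11299:˜ 121 MORPHFROG\n˜11300\x05:˜ 156 ÂLACK ÂEARD\n˜11317:˜ 80 HEDNING"
for record in parse_records(sample):
    print(record)
```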