Python Forum
Query regarding regex
#1
Hi,

I am a Python beginner learning with Jupyter Notebook.
I have a text file that contains close to 100k lines.
One sample line is shown below. I need to create 3 columns named cellId, rbStart, and ipn.
The values for these columns are part of each line. Not every line carries the required data; some lines contain different data instead.

Output:
Line 125: [2021-08-22 22:09:32.868188] 0x84e51a08=(bfn:2126, sfn:78, sf:5.40, bf:160) duId:1 EMCA4/BbiUniqueTrace 2 UPCUEULNR.207 upcueulnrce_ueactor_estimateulgain.c:242: <!UPCUEULNR.207!> TRACE3 cellId=2, bbUeRef=0x00009ea0 puschSfn=78 puschSlot=9 : Estimated UL gain from PUSCH power report. postEqSinr=61.0 rbStart=150 rbLength=10 ipn=-109.05000305175781 pcMaxC=25.5 backoffPsd=0.0 pcMaxInUse=25.5 ulPsdTxPhr=42.21849060058594 slotCounter=116236385
I tried the code below in Jupyter Notebook, but it produces only one line of output instead of one per input line, and the values still include their labels:
Output:
['cellId', 'rbStart', 'rssi']
cellId=2 rbStart=150 ipn=-109.05000305175781
df = pd.read_table("C:\\Users\\newFile.txt")
for line in df:
    cellId = re.findall("cellId.\d+",line)
    rbStart = re.findall("rbStart.\d+",line)
    rssi = re.findall("ipn.[-]\d{3}.\d+",line)
    Headers = ['cellId', 'rbStart', 'rssi']
    print(Headers)
    print (cellId[0],rbStart[0],rssi[0])
The expected output is:
Output:
cellId rbStart rssi
2 150 -109.05000305175781
Can you please help me correct my code?
Thanks in advance.
Yoriz wrote Jul-09-2022, 08:25 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
#2
Why are you using pandas.read_table()? Your example data does not look like it is in CSV (delimited) format. Can you change the file format?
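
A note on the likely cause, offered as a sketch rather than a confirmed diagnosis. Iterating over a pandas DataFrame with "for line in df" yields its column labels, not its rows, and read_table() by default treats the first line of the file as the header. With no tab characters in the file, that leaves a single column label, so the loop body runs only once. Separately, re.findall() returns the whole match (cellId=2) unless the pattern contains a capture group. The snippet below uses only the re module on a shortened copy of the sample line; the variable names and the looser ipn pattern are my own assumptions.

```python
import re

# Shortened copy of the sample line from the first post.
line = (
    "TRACE3 cellId=2, bbUeRef=0x00009ea0 puschSfn=78 postEqSinr=61.0 "
    "rbStart=150 rbLength=10 ipn=-109.05000305175781 pcMaxC=25.5"
)

# Parentheses form a capture group, so findall() returns only the value.
cell_id = re.findall(r"cellId=(\d+)", line)
rb_start = re.findall(r"rbStart=(\d+)", line)
# Assumed number format: optional sign, digits, optional fractional part.
ipn = re.findall(r"ipn=(-?\d+(?:\.\d+)?)", line)

print(cell_id, rb_start, ipn)
```

Each result is a list, which is empty when the label is missing from the line, so check it before indexing with [0].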

Using your file format, which I found quite difficult to work with, I came up with this:
import re
import pandas as pd

data = (
    "TRACE3 cellId=2, postEqSinr=61.0 rbStart=150 rbLength=10 ipn=-109.05000305175781 slotCounter=116236385",
    "TRACE4 cellId=3, postEqSinr=61.0 rbStart=175 rbLength=10 ipn=-42 slotCounter=116236385",
    "TRACE5 cellId=6, postEqSinr=61.0 rbLength=10 ipn=3.14 slotCounter=116236385",
)

class Pattern:
    """Special re.findall() for finding label=value patterns"""
    def __init__(self, label):
        delimiters = "[:=]"
        self.pattern = re.compile(f"{label}{delimiters}\\S+")
        self.offset = len(label) + 1

    def find(self, line):
        """Search for label=value pattern. Return value if found, else None"""
        match = self.pattern.findall(line)
        if match:
            return match[0][self.offset :].rstrip(".,")
        return None


labels = ("cellId", "rbStart", "ipn")
patterns = {label: Pattern(label) for label in labels}
columns = {label: [] for label in labels}

for line in data:
    for key, value in columns.items():
        value.append(patterns[key].find(line))

df = pd.DataFrame(columns)
print(df)
Output:
  cellId rbStart                  ipn
0      2     150  -109.05000305175781
1      3     175                  -42
2      6    None                 3.14
I made some dummy data so I had multiple lines instead of one. I cut down the length a lot because the full length data is not needed for this purpose. The last line does not have a "rbStart" on purpose.
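
The values collected this way are strings (or None where a label is missing). If numbers are wanted, one option is to convert each value before appending it. A minimal sketch, assuming a hypothetical helper named to_number that is not part of the code above:

```python
def to_number(text):
    """Convert text to int or float where possible; pass None through."""
    if text is None:
        return None
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text  # leave non-numeric text unchanged


values = ["2", "150", "-109.05000305175781", None, "3.14"]
converted = [to_number(v) for v in values]
print(converted)
```

In the loop above, the append would then become value.append(to_number(patterns[key].find(line))).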

To make this work with a file instead of a tuple of strings, replace this:
data = (
    "TRACE3 cellId=2, postEqSinr=61.0 rbStart=150 rbLength=10 ipn=-109.05000305175781 slotCounter=116236385",
    "TRACE4 cellId=3, postEqSinr=61.0 rbStart=175 rbLength=10 ipn=-42 slotCounter=116236385",
    "TRACE5 cellId=6, postEqSinr=61.0 rbLength=10 ipn=3.14 slotCounter=116236385",
)

for line in data:
With this (the loop body then needs one more level of indentation):
with open(r"C:\Users\newFile.txt", "r") as data:
    for line in data:
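
Put together, the file version might look like the sketch below. It uses only the standard library so it can be tried without pandas. The function name extract_columns and the demo file contents are assumptions of mine, the regular expression uses a capture group instead of the offset arithmetic in the Pattern class, and the \b word boundary is an extra guard I added.

```python
import os
import re
import tempfile


def extract_columns(path, labels):
    """Return {label: [value or None, ...]} with one entry per line of the file."""
    # One compiled pattern per label: label, then ':' or '=', then the value.
    patterns = {
        label: re.compile(rf"\b{re.escape(label)}[:=](\S+)") for label in labels
    }
    columns = {label: [] for label in labels}
    with open(path, "r") as data:
        for line in data:
            for label, pattern in patterns.items():
                match = pattern.search(line)
                # Strip trailing punctuation such as the comma in "cellId=2,".
                value = match.group(1).rstrip(".,") if match else None
                columns[label].append(value)
    return columns


# Demo with a temporary file standing in for the real log file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("TRACE3 cellId=2, rbStart=150 rbLength=10 ipn=-109.05 slotCounter=1\n")
    tmp.write("TRACE5 cellId=6, rbLength=10 ipn=3.14 slotCounter=2\n")
    demo_path = tmp.name

columns = extract_columns(demo_path, ("cellId", "rbStart", "ipn"))
os.remove(demo_path)
print(columns)
```

The returned dictionary can be passed to pd.DataFrame(columns) in the same way as in the code above.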