Python Forum

Python problem reading file
Hello,

I am reading a file of more than 200,000 lines containing user information.

When the script finds a line containing "|", it splits the line into fields: a "name" field that holds the first name and surname, followed by IP, MAC, etc.
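
A minimal sketch of that split, using a made-up sample line (the field layout and values are assumptions, not taken from the real file):

```python
# Hypothetical sample line; the real file's layout may differ.
linea = "| 1234 | JUAN ARMENTEROS | 00:11:22:33:44:55 | yes |"

# split("|") cuts the line at every pipe; strip() removes the padding.
campos = [campo.strip() for campo in linea.split("|")]

# Index 0 is the empty text before the first pipe, so data starts at 1.
extension, empleado, mac_address = campos[1], campos[2], campos[3]
print(empleado)
```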

If a field contains the string ENTER inside a word, a line break is wrongly inserted at that point. Each record should be saved on a single line with name, IP, MAC, etc.:

JUAN ARMENTEROS IP MAC
PEDRO REMENTER IP MAC

but because of the line break, the Python code generates two lines in the CSV:

JUAN ARMENTER
OS IP MAC
PEDRO REMENTER
IP MAC

As a result, the affected records are stored in the database with their fields out of order, and they end up duplicated. I guess Python interprets ENTER as a reserved word and executes it.
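
For reference, ENTER is not a Python keyword, and Python does not execute text that it reads from a file, so the break probably has another cause (for example, a line-break character already present in the data). A small sketch with made-up records, assuming each one sits on a single line:

```python
import io

# Simulated file content: two records, one per line (made-up data).
contenido = "1|JUAN ARMENTEROS|IP|MAC\n2|PEDRO REMENTER|IP|MAC\n"

# Iterating a text stream yields one string per newline-terminated line.
lineas = [linea.rstrip("\n") for linea in io.StringIO(contenido)]

# The letters E-N-T-E-R are ordinary text, so each name stays whole.
nombres = [linea.split("|")[1] for linea in lineas]
print(nombres)
```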

How can I tell Python to ignore this word and not execute it?
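
One way to check whether the break is already stored in the file is to read it in binary mode and look at the raw bytes of the lines that contain ENTER. This is only a sketch: `find_raw_lines` is a hypothetical helper, and the demo runs on a throwaway file that imitates the broken output rather than on the real extraction file.

```python
import os
import tempfile

def find_raw_lines(ruta, palabra=b"ENTER"):
    """Return the raw bytes of every line that contains `palabra`."""
    encontradas = []
    # Binary mode: no decoding and no newline translation, so "\r", "\n"
    # and other control bytes appear exactly as stored on disk.
    with open(ruta, "rb") as archivo:
        for cruda in archivo:
            if palabra in cruda:
                encontradas.append(cruda)
    return encontradas

# Demo on a temporary file whose content imitates the broken output.
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as tmp:
    tmp.write(b"1|JUAN ARMENTER\nOS|IP|MAC\n2|PEDRO LOPEZ|IP|MAC\n")
    ruta_demo = tmp.name

resultado = find_raw_lines(ruta_demo)
for cruda in resultado:
    print(repr(cruda))  # repr() makes line-break bytes visible
os.remove(ruta_demo)
```

If the output shows a line-break byte immediately after ENTER, the break is in the source file itself and not something the reading code introduces.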

Thank you.



import pandas as pd

columnas = ['extension', 'empleado', 'mac_address', 'validado',
            'neqt', 'ipv4_address', 'dominio', 'nodo']

# Rows are collected in a list and turned into a DataFrame once at the end;
# DataFrame.append was removed in pandas 2.0 and was slow inside a loop.
filas = []

# Defaults in case a data line appears before the first header lines.
nodo = ''
dominio = ''

with open('/shared/scripts/01_extraccion.txt', 'r', encoding="utf8", errors='ignore') as file:
    for linea in file:
        # Header line that names the node.
        if "Welcome to cpuanode" in linea:
            nodo = linea[19:21]

        # Header line that names the domain.
        if "------------------------------------------------IP couplers defined in domain " in linea:
            dominio = linea[78:81].replace(" ", "")

        campos = linea.split("|")

        # Data rows have more than 10 fields; rows whose first field
        # contains QMCDU are skipped.
        if len(campos) > 10 and "QMCDU" not in campos[1]:
            fila = {'extension': campos[1].strip(),
                    'empleado': campos[2].strip(),
                    'mac_address': campos[3].strip(),
                    'validado': campos[4].strip(),
                    'neqt': campos[5].strip(),
                    'ipv4_address': campos[7].strip(),
                    'dominio': dominio.strip(),
                    'nodo': nodo}
            print(*fila.values())
            filas.append(fila)

# The with block closes the file, so no explicit file.close() is needed.
df = pd.DataFrame(filas, columns=columnas)
print(df)