In need of insight regarding Python file reading mechanisms.

EnfantNicolas · Sep-17-2021, 09:22 AM

I am a novice trying to do a simple thing:

read a huge (80 mb of text, around 500000 lines) database file -> make a copy of it while only changing a select portion of it

For this I have the following code, it's not complete yet because it has already run into a problem:

with open("ROTOR_EMR4_BY_R15.cdb", "r") as old_database:    # opening files with context manager is good practice as it
    with open("old_output.txt", "r+") as old_copy:          # closes the file automatically at the end of the operations

        old_content = old_database.readlines()

        for x in old_content:
            line = str(x)
            print(line)

            if line == "(3i9,6e21.13e3)":
                old_copy.write("EUREKA")
            else:
                old_copy.write(line)

The code "works" as the old_output.txt is created and is identical to the original CDB database.
However, the IF statement is never triggered and EUREKA is nowhere to be found.

My suspicion is that Python changes the format of the database before reading it, then changes it back to the original before writing. So I added a print command to see what Python "sees" and, sure enough, it doesn't look anything like the original.
I can't possibly paste the whole thing here so I will use screenshots.

[Image: unknown.png]

This (3i9,6e21.13e3) is the bit of information I want Python to find and you can clearly see that it's there. It also happens to be the Fortran format code of the database.

[Image: unknown.png?width=960&height=483]

[Image: unknown.png?width=960&height=483]

As you can see, (3i9,6e21.13e3) is nowhere to be found, so of course the IF statement won't trigger!

Would anybody know what formatting Python is using and how to change it?

**Larz60+** · Sep-17-2021, 11:51 AM

Some questions:

I am not familiar with CDB database, is that Oracle?
And can it be read as a flat file as you are trying to do?
You are searching for the text "(3i9,6e21.13e3)", parentheses and all; is that your goal?
Do you know for a fact that that is contained in the file?

EnfantNicolas · Sep-17-2021, 12:10 PM

(Sep-17-2021, 11:51 AM)Larz60+ Wrote: Some questions:
I am not familiar with CDB database, is that Oracle?

And can it be read as a flat file as you are trying to do?

You are searching for the text "(3i9,6e21.13e3)", parentheses and all; is that your goal?

Do you know for a fact that that is contained in the file?

* I don't know what Oracle is. This is a file that was made using ANSYS 15.0.7, a simulation program.
In fact the important information of this database is located after the (3i9,6e21.13e3) and represents X, Y and Z coordinates for a 3D model of a rotor. (I have the cdb file opened in PyCharm)

* I imagine it can, because the txt file produced by my program is identical in every way to the cdb.
* For now it is, because ultimately I want to be able to change values, but only the ones after the (3i9,6e21.13e3).
* I know for a fact that this is in the file: the first screenshot is the original cdb file and you can see it underlined in blue.

ibreeden · Sep-17-2021, 05:48 PM

So the file contains a line with "(3i9,6e21.13e3)", as we can see in the image you showed us. But are there perhaps (invisible) spaces after this text? In that case you had better test with:

if line.startswith("(3i9,6e21.13e3)"):

Another thing you did not ask about: you emphasize that the file is huge, yet your code uses "old_database.readlines()", which loads the complete file into RAM. (Nowadays 80 MB is not huge anymore, but with a really huge file you will run into trouble.) My advice would be to read one line at a time instead, either with "readline()" or by iterating over the file object, so that only one line is held in RAM at a time.
Like this (untested):

with open("ROTOR_EMR4_BY_R15.cdb", "r") as old_database:    # opening files with a context manager is good practice as it
    with open("old_output.txt", "r+") as old_copy:          # closes the file automatically at the end of the operations
        # old_content = old_database.readlines()            # this may need a lot of memory
        for line in old_database:                           # iterating over the file reads one line at a time
            # line = str(x)                                 # it is already a string
            print(line)
            if line.startswith("(3i9,6e21.13e3)"):
                old_copy.write("EUREKA")
            else:
                old_copy.write(line)

**deanhystad** · Sep-17-2021, 05:50 PM

I think this is your problem:

if line == "(3i9,6e21.13e3)":

Is there any line in the file that is only "(3i9,6e21.13e3)"? I mean a line without any starting whitespace or a trailing newline? I would replace with:

if "(3i9,6e21.13e3)" in line:

This will be True if the text appears anywhere in line.
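To see why the equality test fails, it helps to look at what a line really contains. Here is a minimal sketch that uses an in-memory `io.StringIO` in place of the real .cdb file; the sample contents are made up for illustration and are not taken from the actual database:

```python
import io

# Made-up stand-in for the real file; only the format line matters here.
sample = io.StringIO("NBLOCK,6,SOLID\n(3i9,6e21.13e3)\n        1        0\n")

target = "(3i9,6e21.13e3)"
lines = sample.readlines()

# Every line keeps its trailing newline, so an exact comparison fails.
format_line = lines[1]
print(repr(format_line))  # repr() makes the hidden "\n" visible

exact_match = format_line == target
contains_match = target in format_line
print(exact_match, contains_match)
```

Printing with `repr()` instead of a plain `print(line)` is a quick way to spot trailing newlines or spaces that are otherwise invisible.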

A few other comments.

with open("ROTOR_EMR4_BY_R15.cdb", "r") as old_database:    # opening files with a context manager is good practice as it
    # with open("old_output.txt", "r+") as old_copy:        # closes the file automatically at the end of the operation
    with open("old_output.txt", "w") as old_copy:           # r+ requires the file to exist; why impose that?

        # old_content = old_database.readlines()
        # for x in old_content:
        #     line = str(x)  # already text
        for line in old_database:
            print(line)

            if "(3i9,6e21.13e3)" in line:
                old_copy.write("EUREKA")
            else:
                old_copy.write(line)
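The difference between "r+" and "w" can be checked with a small experiment. This is only a sketch: it works on a scratch file in a temporary directory (the path is hypothetical and has nothing to do with the real output file), and the text written is made up:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "old_output.txt")  # hypothetical scratch file

# "r+" refuses to create a missing file.
try:
    open(path, "r+").close()
    missing_file_error = False
except FileNotFoundError:
    missing_file_error = True

# "w" creates the file (or empties an existing one) before writing.
with open(path, "w") as f:
    f.write("a long first version\n")

# "r+" starts writing at the beginning but does not truncate,
# so whatever was longer than the new text is left behind.
with open(path, "r+") as f:
    f.write("short")

with open(path) as f:
    leftover = f.read()

print(missing_file_error, repr(leftover))
```

So with "r+", an output file left over from an earlier run can keep part of its old contents if the new output is shorter, which is one more reason to prefer "w" for a fresh copy.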

SamHobbs · Sep-17-2021, 05:50 PM

I see CDB files described many times in Post-Processing APDL Models Inside Ansys Workbench V15.0 - SimuTech Group but I do not see much that seems to help here. I suggest asking questions about Ansys somewhere that specialists can help you. Then when you need help specific to Python you will have questions that Python specialists can help with. Ansys Learning Forum seems to be a good place to ask about Ansys and its files.

One possible explanation is that the newline characters are different. For example, in Windows a newline is CRLF (carriage return and line feed), in Unix/Linux and modern macOS it is just an LF, and in classic Mac OS it was just a CR. Details like that might be overlooked. You could write an analysis program that just reads the lines and counts how many there are. If the count matches what you expect, you can use an editor to browse the file and determine the record number you need. Then modify the analysis program to print just the lines (5 or 10 or so) before and after the line you need.
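Such an analysis program could look roughly like the sketch below. The function name `find_with_context` and the demo file are made up for illustration. It opens the file with `newline=""` so that Python leaves the original line endings untouched, and it reports each line through `repr()` so that "\r", "\n" and trailing spaces become visible. Note that it uses `readlines()`, which is acceptable for an 80 MB file but loads everything into memory:

```python
import os
import tempfile


def find_with_context(path, needle, context=5):
    """Return (line_count, hits); each hit is a list of (line_no, repr(line))."""
    # newline="" keeps the original line endings, so "\r\n" stays visible.
    with open(path, "r", newline="") as handle:
        lines = handle.readlines()

    hits = []
    for index, line in enumerate(lines):
        if needle in line:
            start = max(0, index - context)
            stop = min(len(lines), index + context + 1)
            hits.append([(n + 1, repr(lines[n])) for n in range(start, stop)])
    return len(lines), hits


# Demo on a made-up file with Windows line endings and trailing spaces.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.cdb")
with open(demo_path, "wb") as handle:
    handle.write(b"header\r\n(3i9,6e21.13e3)  \r\n        1        0\r\n")

total, hits = find_with_context(demo_path, "(3i9,6e21.13e3)", context=1)
print(total)
for hit in hits:
    for line_no, text in hit:
        print(line_no, text)
```

For the real database, the call would be something like `find_with_context("ROTOR_EMR4_BY_R15.cdb", "(3i9,6e21.13e3)")`, and the line count can be compared with what the editor reports.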

**Larz60+** · Sep-17-2021, 05:52 PM

Open the files like so, instead of nesting them:

with open("ROTOR_EMR4_BY_R15.cdb", "r") as old_database, \
        open("old_output.txt", "r+") as old_copy:

Then, assuming that the "(3i9,6e21.13e3)" is alone by itself on a line:

        for x in old_content:
            line = str(x)
            line = line.strip()
 
            if line == "(3i9,6e21.13e3)":
                ...

You're probably including the line terminator in your comparison, so it won't match. That is the reason for the strip.
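Putting the pieces together, a strip-based copy might be sketched as follows. The helper name `copy_replacing_line` and the three-line sample file are assumptions made for the example, and the output is opened with "w" rather than "r+" so that it does not have to exist beforehand:

```python
import os
import tempfile


def copy_replacing_line(src_path, dst_path, marker, replacement):
    """Copy src to dst, swapping every line that equals marker (ignoring
    surrounding whitespace) for replacement. Returns the number swapped."""
    swapped = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:  # one line at a time, newline included
            if line.strip() == marker:
                dst.write(replacement + "\n")
                swapped += 1
            else:
                dst.write(line)
    return swapped


# Demo on a made-up three-line file in a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "sample.cdb")
dst_path = os.path.join(workdir, "sample_out.txt")
with open(src_path, "w") as handle:
    handle.write("NBLOCK,6,SOLID\n(3i9,6e21.13e3)\n        1        0\n")

count = copy_replacing_line(src_path, dst_path, "(3i9,6e21.13e3)", "EUREKA")
with open(dst_path) as handle:
    result = handle.read()
print(count, repr(result))
```

The strip is applied only for the comparison; unmatched lines are written back unchanged, so the rest of the copy stays identical to the source.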

ndc85430 · Sep-18-2021, 10:39 AM

Are there new line characters in the line?

Also, you don't need line 7 of your code (line = str(x)), as x is already a string (really, what else could it be if you're reading text?). x is, however, a poor name.
