Python Forum
In need of insight regarding Python file reading mechanisms.
#1
Question 
I am a novice trying to do a simple thing:

read a huge (80 MB of text, around 500,000 lines) database file -> make a copy of it in which only a selected portion is changed

For this I have the following code. It's not complete yet, because it has already run into a problem:

with open("ROTOR_EMR4_BY_R15.cdb", "r") as old_database:    # opening files with a context manager is good practice as it
    with open("old_output.txt", "r+") as old_copy:          # closes the file automatically at the end of the operations

        old_content = old_database.readlines()

        for x in old_content:
            line = str(x)
            print(line)

            if line == "(3i9,6e21.13e3)":
                old_copy.write("EUREKA")
            else:
                old_copy.write(line)
The code "works" in the sense that old_output.txt is created and is identical to the original CDB database.
However, the IF statement is never triggered and EUREKA is nowhere to be found.

My suspicion is that, before reading, Python changes the format of the database, reads it, and then changes it back to the original before writing. So I added a print command to see what Python "sees" and, fair enough, it doesn't look anything like the original.
I can't possibly paste the whole thing here so I will use screenshots.

[Image: unknown.png]

This (3i9,6e20.13) is the bit of information I want Python to find, and you can clearly see that it's there. It also happens to be the Fortran format code of the database.

[Image: unknown.png]
As you can see, (3i9,6e21.13e3) is nowhere to be found, so of course the IF statement won't trigger!

Would anybody know what formatting Python is using and how to change it?
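Python most likely does not reformat the text at all. A line read in text mode still ends with its line terminator ("\n"), and print() renders that invisibly. Printing repr(line) instead makes such characters visible. The sketch below uses a made-up three-line stand-in for the CDB file (the content and file name are assumptions, not the real database):

```python
import os
import tempfile

# Hypothetical stand-in for the real CDB file: a few made-up lines
sample = "NBLOCK,6,SOLID\n(3i9,6e21.13e3)\n        1        0        0 1.0\n"
path = os.path.join(tempfile.mkdtemp(), "sample.cdb")
with open(path, "w") as f:
    f.write(sample)

with open(path, "r") as f:
    lines = list(f)          # each item is one line, terminator included

for line in lines:
    print(repr(line))        # repr() shows hidden characters such as "\n"

print(lines[1] == "(3i9,6e21.13e3)")    # False: the "\n" is part of the line
print(lines[1] == "(3i9,6e21.13e3)\n")  # True
```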
#2
Some questions:
  • I am not familiar with CDB databases; is that Oracle?
  • And can it be read as a flat file, as you are trying to do?
  • You are searching for the text "(3i9,6e21.13e3)", parentheses and all; is that your goal?
  • Do you know for a fact that it is contained in the file?
#3
(Sep-17-2021, 11:51 AM)Larz60+ Wrote: Some questions:
  • I am not familiar with CDB databases; is that Oracle?
  • And can it be read as a flat file, as you are trying to do?
  • You are searching for the text "(3i9,6e21.13e3)", parentheses and all; is that your goal?
  • Do you know for a fact that it is contained in the file?

* I don't know what Oracle is. This file was made with ANSYS 15.0.7, a simulation program.
In fact, the important information in this database is located after the (3i9,6e21.13e3) and represents the X, Y and Z coordinates of a 3D model of a rotor. (I have the CDB file open in PyCharm.)

* I imagine it can, because the .txt file produced by my program is identical in every way to the CDB file.
* For now it is, because ultimately I want to be able to change values, but only the ones after the (3i9,6e21.13e3).
* I know for a fact that it is in the file: the first screenshot shows the original CDB file, and you can see it underlined in blue.
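"Identical in every way" can be checked byte for byte with the standard library's filecmp.cmp and shallow=False. This is stricter than comparing the files in an editor: in text mode Python translates line endings (on Windows, "\n" is written out as "\r\n"), and an editor may hide such differences. The helper name same_bytes and the demo files are hypothetical:

```python
import filecmp
import os
import tempfile

def same_bytes(path_a, path_b):
    """True if both files contain exactly the same bytes."""
    # shallow=False compares the contents instead of trusting os.stat() signatures
    return filecmp.cmp(path_a, path_b, shallow=False)

# Demo with two throwaway files (made-up names and content)
folder = tempfile.mkdtemp()
a = os.path.join(folder, "original.cdb")
b = os.path.join(folder, "copy.txt")
for p in (a, b):
    with open(p, "w") as f:
        f.write("(3i9,6e21.13e3)\n")
print(same_bytes(a, b))   # True

# With the thread's files it would be:
# same_bytes("ROTOR_EMR4_BY_R15.cdb", "old_output.txt")
```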
#4
So the file contains a line with "(3i9,6e21.13e3)", as we can see in the image you showed us. But are there perhaps (invisible) spaces after the text? In that case you had better test with:
if line.startswith("(3i9,6e21.13e3)"):
Another thing, which you are not asking about: you emphasize that the file is huge, but your code uses "old_database.readlines()". Be aware that this loads the complete file into RAM. (Nowadays 80 MB is not huge anymore, but when you encounter a really huge file you will run into trouble.) My advice would be to read one line at a time instead of using "readlines()", either by calling "readline()" repeatedly or, more simply, by looping over the file object itself, so that only one line at a time is held in RAM.
Like this (untested):
with open("ROTOR_EMR4_BY_R15.cdb", "r") as old_database:    # opening files with a context manager is good practice as it
    with open("old_output.txt", "r+") as old_copy:          # closes the file automatically at the end of the operations
        # old_content = old_database.readlines()            # this may need a lot of memory
        for line in old_database:                           # the file object yields one line at a time
            # line = str(x)                                 # it is already a string
            print(line)
            if line.startswith("(3i9,6e21.13e3)"):
                old_copy.write("EUREKA")
            else:
                old_copy.write(line)
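A common pitfall with readline(): it returns a single string, so a loop such as `for line in f.readline():` walks over the characters of the first line, not over the lines of the file. The sketch below contrasts that with looping over the file object and with calling readline() repeatedly until it returns "" at end of file. io.StringIO stands in for an open file, and its content is made up:

```python
import io

def fake_file():
    """io.StringIO stands in for an open text file (made-up two-line content)."""
    return io.StringIO("first line\n(3i9,6e21.13e3)\n")

# Pitfall: readline() returns ONE string, and looping over a string yields characters
chars = [item for item in fake_file().readline()]
print(chars[:3])   # ['f', 'i', 'r']

# Looping over the file object yields one whole line per iteration; lines are read
# as the loop advances instead of all at once like readlines()
lines = [line for line in fake_file()]
print(lines)       # ['first line\n', '(3i9,6e21.13e3)\n']

# readline() is meant to be called repeatedly; it returns "" at end of file
f = fake_file()
count = 0
while True:
    line = f.readline()
    if not line:
        break
    count += 1
print(count)       # 2
```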
#5
I think this is your problem:
if line == "(3i9,6e21.13e3)":
Is there any line in the file that is only "(3i9,6e21.13e3)"? I mean a line without any leading whitespace and without a trailing newline? I would replace it with:
if "(3i9,6e21.13e3)" in line:
This will be True if the text appears anywhere in line.
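To see how the three tests discussed in this thread behave, here they are applied to a few made-up variants of the marker line as it might come back from the file. Note that `in` is the most forgiving but would also match the marker embedded in some longer, unrelated line:

```python
marker = "(3i9,6e21.13e3)"

# Made-up variants of how the marker line might look when read from a file
samples = [
    "(3i9,6e21.13e3)\n",        # marker plus the line terminator
    "(3i9,6e21.13e3)   \n",     # trailing spaces as well
    "  (3i9,6e21.13e3)\n",      # leading spaces
]

# (==, startswith, in) for each sample
results = [(s == marker, s.startswith(marker), marker in s) for s in samples]
for s, r in zip(samples, results):
    print(repr(s), r)
# '(3i9,6e21.13e3)\n' (False, True, True)
# '(3i9,6e21.13e3)   \n' (False, True, True)
# '  (3i9,6e21.13e3)\n' (False, False, True)
```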

A few other comments.
with open("ROTOR_EMR4_BY_R15.cdb", "r") as old_database:    # opening files with a context manager is good practice as it
    # with open("old_output.txt", "r+") as old_copy:          # closes the file automatically at the end of the operation
    with open("old_output.txt", "w") as old_copy:          # "r+" requires the file to exist already; "w" creates it. Why force an unnecessary requirement?
 
        # old_content = old_database.readlines()
        for line in old_database:
 
        # for x in old_content:
            # line = str(x)  # already text
            print(line)
 
            if "(3i9,6e21.13e3)" in line:
                old_copy.write("EUREKA")
            else:
                old_copy.write(line)
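Since the end goal is to change only the values that come after the marker, one way to structure the copy is a flag that switches on once the marker line has been seen. This is a minimal sketch, not a CDB-aware implementation: modify_line and copy_with_changes are hypothetical names, the "change" is just upper-casing as a visible placeholder, and it assumes every line after the marker should be changed. A real CDB file probably continues with other sections after the coordinate block, so detecting where that block ends would still be needed:

```python
import io

MARKER = "(3i9,6e21.13e3)"

def modify_line(line):
    """Hypothetical placeholder for the real edit: upper-cases the line so the change
    is visible. A real version would parse and rewrite the coordinate values."""
    return line.upper()

def copy_with_changes(src, dst, marker=MARKER, transform=modify_line):
    """Copy src to dst line by line; every line after the marker line goes through transform().
    Assumption: all lines after the marker belong to the section being edited."""
    after_marker = False
    for line in src:                 # streams one line at a time
        if after_marker:
            dst.write(transform(line))
        else:
            dst.write(line)          # unchanged, line terminator included
        if marker in line:
            after_marker = True      # switch on for the lines that follow

# Demo with in-memory files (made-up content)
src = io.StringIO("header\n(3i9,6e21.13e3)\n1 0 0 1.5e+00\n")
dst = io.StringIO()
copy_with_changes(src, dst)
print(repr(dst.getvalue()))   # 'header\n(3i9,6e21.13e3)\n1 0 0 1.5E+00\n'

# With real files it would look like this ("new_output.txt" is a hypothetical name):
# with open("ROTOR_EMR4_BY_R15.cdb", "r") as src, open("new_output.txt", "w") as dst:
#     copy_with_changes(src, dst)
```

Passing the transformation in as a function keeps the streaming loop separate from whatever the real value edits turn out to be.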
#6
I see CDB files described many times in Post-Processing APDL Models Inside Ansys Workbench V15.0 - SimuTech Group, but I do not see much there that helps here. I suggest asking questions about Ansys somewhere specialists can help you. Then, when you need help specific to Python, you will have questions that Python specialists can answer. The Ansys Learning Forum seems to be a good place to ask about Ansys and its files.

One possible explanation is that the newline characters are different. For example, on Windows a newline is CRLF (carriage return followed by line feed), on Unix/Linux and modern macOS it is just LF, and on classic Mac OS it was just CR. Details like that are easily overlooked. You could write an analysis program that just reads the lines and counts how many there are. If the count matches what you expect, you can browse the file in an editor to determine the line (record) number you need. Then modify the analysis program to print just the 5 or 10 or so lines before and after that line.
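The analysis program described above could be sketched like this (all function names are hypothetical). One detail worth knowing: by default Python opens text files with universal newlines, so "\r\n", "\n" and "\r" all come back as "\n"; opening with newline="" instead returns the terminators untranslated, which lets a script report which style the file actually uses. show_context prints with repr() so trailing spaces and terminators are visible:

```python
import os
import tempfile

def count_lines(path):
    """Count lines without loading the whole file into memory."""
    with open(path, "r") as f:
        return sum(1 for _ in f)

def show_context(path, target, before=5, after=5):
    """Print lines target-before .. target+after (1-based numbers) using repr()."""
    with open(path, "r") as f:
        for number, line in enumerate(f, start=1):
            if number > target + after:
                break                          # stop once past the window
            if number >= target - before:
                print(number, repr(line))

def newline_style(path):
    """Count which line terminators the file uses."""
    counts = {"CRLF": 0, "LF": 0, "CR": 0, "none": 0}
    with open(path, "r", newline="") as f:    # newline="" keeps "\r\n" and "\r" untranslated
        for line in f:
            if line.endswith("\r\n"):
                counts["CRLF"] += 1
            elif line.endswith("\n"):
                counts["LF"] += 1
            elif line.endswith("\r"):
                counts["CR"] += 1
            else:
                counts["none"] += 1            # last line without a terminator
    return counts

# Demo on a throwaway file with mixed terminators (made-up content)
demo = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(demo, "wb") as f:
    f.write(b"a\r\nb\nc")
print(count_lines(demo))      # 3
print(newline_style(demo))    # {'CRLF': 1, 'LF': 1, 'CR': 0, 'none': 1}
show_context(demo, target=2, before=1, after=1)
```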
#7
Open the files like so, in a single with statement, instead of nesting them:
with open("ROTOR_EMR4_BY_R15.cdb", "r") as old_database, \
     open("old_output.txt", "r+") as old_copy:
Then, assuming that "(3i9,6e21.13e3)" is alone on its line:
        for x in old_content:
            line = str(x)
            line = line.strip()
 
            if line == "(3i9,6e21.13e3)":
                ...
You're probably including the line terminator in your comparison, so it won't match. That is the reason for the strip().
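One caution with `line = line.strip()`: if the stripped value is also what gets written to the copy, the newlines disappear from old_output.txt. A safer pattern is to compare on a stripped copy and write the original line. strip() also removes surrounding spaces, whereas rstrip("\r\n") would remove only the terminator. A minimal sketch with made-up in-memory content:

```python
import io

MARKER = "(3i9,6e21.13e3)"

src = io.StringIO("header\n  (3i9,6e21.13e3)  \nvalues\n")   # made-up content
dst = io.StringIO()

for line in src:
    if line.strip() == MARKER:     # compare on a stripped copy...
        dst.write("EUREKA\n")      # ...and supply the newline yourself for replacements
    else:
        dst.write(line)            # ...but write the original line, newline intact

print(repr(dst.getvalue()))   # 'header\nEUREKA\nvalues\n'
```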
#8
Are there newline characters in the line?

Also, you don't need line 7, as x is already a string (really, what else could it be if you're reading text?). x is, however, a poor name.
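A quick way to confirm this point: iterating over a file opened in text mode ("r") yields str objects, while binary mode ("rb") yields bytes, so str(x) adds nothing in the text-mode case. The throwaway file below is made up:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.cdb")   # throwaway file, made-up content
with open(path, "w") as f:
    f.write("(3i9,6e21.13e3)\n")

with open(path, "r") as f:          # text mode: each line is already a str
    text_types = [type(line) for line in f]

with open(path, "rb") as f:         # binary mode: each line is bytes instead
    binary_types = [type(line) for line in f]

print(text_types, binary_types)     # [<class 'str'>] [<class 'bytes'>]
```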