#!/usr/bin/python
# Creating an output file in writing mode
output_file = open("newfile.txt", "w")
# write 3 header records
output_file.write('<?xml version="1.0" encoding="utf-8"?>\n')
output_file.write("<!DOCTYPE KMYMONEY-FILE>\n")
output_file.write("<KMYMONEY-FILE>\n")
write_flag = 0
# Open the file in read mode
with open('Australian-2024-11-30.xml', 'r') as file:
    # Read each line in the file
    for line in file:
        string = line
        sub_str1 = "<TRANSACTIONS"
        sub_str2 = " <SCHEDULES count"
        if sub_str1 in string:
            print("YES")
            write_flag = 1 #commence writing to newfile.txt
        elif sub_str2 in string:
            write_flag = 0 #stop writing when this string found
            print("schedules found")
        if write_flag:
            output_file.write(file.read())
# Close the output file
output_file.close()
The output has the "<TRANSACTIONS" tag and all of its children, BUT it also has the "<SCHEDULES" tag, plus all the data after it. The variable "write_flag" is not being turned off, even though the "schedules" tag is present. Why?
In the data there is only one occurrence each of "sub_str1" and "sub_str2", so the writes to the output should get turned ON at sub_str1 and then turned OFF at sub_str2. But once that flag is on, it stays on, which suggests that the
elif sub_str2 in string:
is either not being tested, or is being tested but returns false.
Your code does not look for "<SCHEDULES"; it looks for " <SCHEDULES count", with a leading blank. Maybe remove the leading blank and "count" from sub_str2.
But the real problem is using read(). output_file.write(file.read()) is the last statement executed in the loop. It reads the remainder of the file and writes that to the output file. It also moves the file pointer to the end of the file, which ends the loop, so the elif is never reached on a later line. I think you might want to do this:
with open("input.txt", "r") as file, open("output.txt", "w") as output_file:
    writing = False
    for line in file:
        if "<TRANSACTIONS" in line:
            writing = True
        elif "<SCHEDULES" in line:
            writing = False
        elif writing:
            output_file.write(line)
When I run it using this as the input.txt file:
A
<TRANSACTIONS
C
D
<SCHEDULES
F
I get this in the output.txt file
Output:
C
D
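To make the read() behaviour concrete, here is a minimal sketch that reproduces the original loop against an in-memory file. io.StringIO stands in for the real XML file, and the sample text and the name buggy_copy are made up for illustration only.

```python
import io

# Same toy input as above, held in memory instead of on disk.
SAMPLE = "A\n<TRANSACTIONS\nC\nD\n<SCHEDULES\nF\n"


def buggy_copy(text):
    """Mimic the original loop: call read() on the file while iterating over it."""
    source = io.StringIO(text)
    lines_seen = []   # lines the for loop actually visited
    written = []      # what would have gone to the output file
    write_flag = False
    for line in source:
        lines_seen.append(line.rstrip("\n"))
        if "<TRANSACTIONS" in line:
            write_flag = True
        elif "<SCHEDULES" in line:
            write_flag = False
        if write_flag:
            # read() returns everything after the current line and leaves the
            # file position at the end, so the loop has nothing left to visit.
            written.append(source.read())
    return lines_seen, "".join(written)


lines_seen, written = buggy_copy(SAMPLE)
print(lines_seen)            # ['A', '<TRANSACTIONS']
print(written.splitlines())  # ['C', 'D', '<SCHEDULES', 'F']
```

The loop only ever sees two lines, so the "<SCHEDULES" test never gets a chance to run.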
Thanks @deanhystad, that code works just fine. I only needed a few extra lines to meet the XML requirements of BeautifulSoup. I have used the output file as input to other Python code, and the accounts now balance, which they didn't before, because the 'transactions' within schedules were altering the totals.
#!/usr/bin/python
# Re-write the XML file - issues with BeautifulSoup finding "TRANSACTIONS" within schedules
with open("Australian-2024-11-30.xml", "r") as file, open("output.txt", "w") as output_file:
    # write 3 header records, otherwise BeautifulSoup doesn't recognise the output file as XML
    output_file.write('<?xml version="1.0" encoding="utf-8"?>\n')
    output_file.write("<!DOCTYPE KMYMONEY-FILE>\n")
    output_file.write("<KMYMONEY-FILE>\n")
    writing = False
    for line in file:
        if "<TRANSACTIONS" in line: #required
            writing = True
        elif "<SCHEDULES" in line: #not required
            writing = False
        elif writing:
            output_file.write(line)
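One thing to be aware of: the rewrite above drops the opening "<TRANSACTIONS" line and never closes the root element, so the result relies on a lenient parser. Below is a hedged variant that keeps the output well formed. It assumes the TRANSACTIONS section is opened on its own line and fully closed before the "<SCHEDULES" line; the function name extract_transactions and the sample data are hypothetical, not taken from the real file.

```python
import io
import xml.etree.ElementTree as ET

HEADER = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<!DOCTYPE KMYMONEY-FILE>\n"
    "<KMYMONEY-FILE>\n"
)
FOOTER = "</KMYMONEY-FILE>\n"

# Made-up sample; the element layout is an assumption for illustration.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE KMYMONEY-FILE>
<KMYMONEY-FILE>
 <ACCOUNTS count="0"/>
 <TRANSACTIONS count="1">
  <TRANSACTION id="T1"/>
 </TRANSACTIONS>
 <SCHEDULES count="1">
  <SCHEDULED_TX id="S1">
   <TRANSACTION id=""/>
  </SCHEDULED_TX>
 </SCHEDULES>
</KMYMONEY-FILE>
"""


def extract_transactions(source, destination):
    """Copy the lines from <TRANSACTIONS up to (not including) <SCHEDULES."""
    destination.write(HEADER)
    writing = False
    for line in source:
        if "<SCHEDULES" in line:
            break                      # nothing after this point is wanted
        if "<TRANSACTIONS" in line:
            writing = True             # keep the opening tag as well
        if writing:
            destination.write(line)
    destination.write(FOOTER)          # close the root element


result = io.StringIO()
extract_transactions(io.StringIO(SAMPLE), result)
# A strict parser accepts the result, which shows it is well formed.
root = ET.fromstring(result.getvalue().encode("utf-8"))
print([child.tag for child in root])   # ['TRANSACTIONS']
```

With real files, the two io.StringIO objects would be replaced by the file objects from the with statement.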
Quote: 'transactions' within schedules was altering totals
I think an xml parser would be a better choice for filtering out scheduled transactions.
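As a rough sketch of the parser idea, the standard library's xml.etree.ElementTree can select only the TRANSACTIONS element that sits directly under the root, which leaves out any TRANSACTION nested inside SCHEDULES. The sample document, its attribute names, and the function name ledger_transactions are assumptions for illustration; the real file may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Made-up sample; element nesting and attributes are assumptions.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE KMYMONEY-FILE>
<KMYMONEY-FILE>
 <TRANSACTIONS count="2">
  <TRANSACTION id="T1" postdate="2024-11-01"/>
  <TRANSACTION id="T2" postdate="2024-11-15"/>
 </TRANSACTIONS>
 <SCHEDULES count="1">
  <SCHEDULED_TX id="SCH1">
   <TRANSACTION id="" postdate="2024-12-01"/>
  </SCHEDULED_TX>
 </SCHEDULES>
</KMYMONEY-FILE>
"""


def ledger_transactions(xml_bytes):
    """Return TRANSACTION elements from the top-level TRANSACTIONS section only."""
    root = ET.fromstring(xml_bytes)
    # find() with a plain tag name matches direct children of root only,
    # so the TRANSACTION nested under SCHEDULES is never visited.
    section = root.find("TRANSACTIONS")
    if section is None:
        return []
    return section.findall("TRANSACTION")


ids = [t.get("id") for t in ledger_transactions(SAMPLE)]
print(ids)  # ['T1', 'T2']
```

This avoids rewriting the file, at the cost of depending on the assumed nesting being correct.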
(Dec-03-2024, 03:55 PM) deanhystad wrote: I think an xml parser would be a better choice for filtering out scheduled transactions.
Using a parser for this part of the project was the reason why I needed to rewrite the file. The limitation was that, to 'filter' effectively, I needed to 'chase' the parents, but the parent level in the two sets of data was very different. The KIS method was to first rewrite the file as per the code above, and then use BeautifulSoup on the second parse.