Posts: 7
Threads: 2
Joined: Oct 2016
Oct-21-2016, 01:20 PM
(This post was last modified: Oct-25-2016, 12:11 PM by metulburr.)
Folks,
I am having difficulty manipulating files.
Objective: I'm developing a script that creates folders and copies files from a source. The script asks the user how many months to create and replicates the files once per month. If a month falls in 2017, a string in a .txt file inside that directory needs to change.
Original string within the file: ANO INICIO DO ESTUDO 2016
String it should be changed to: ANO INICIO DO ESTUDO 2017
PS: The file to be changed is not the original but a copy.
Is it possible to change a line in a txt file?
What I need:
Read the source file into an array (a list of lines)
Identify a portion of a string within the array
Modify the string if that portion is found
Delete the source file and write another with the same name, or simply change a string (a word in a line) within the source file.
FILENAME_NEWAVE = Path of the source file
STRING_DGER = String to be searched
FILE_DATE = Year
This is not working; it is writing to the source file:
def find_word_in_file_dger(FILENAME_NEWAVE, STRING_DGER, FILE_DATE):
    f = open(FILENAME_NEWAVE, "r+")
    file_array = f.readlines()
    for i in file_array:
        if i.find(STRING_DGER.encode('utf-8')):
            f.write(i)
        else:
            print ("TO LENDO O ARRAY")
            if FILE_DATE == "2016":
                continue
            else:
                i.replace(STRING_DGER, "ANO INICIO DO ESTUDO " + FILE_DATE)
                f.write(i)
                print("TO ESCREVENDO A LINHA CORRETAMENTE MLK!! ")
                return i
    f.close()
    return False
Correct script
Edit:
I have fixed the formatting. Next time, select all the code and push the "Remove Formatting" button,
then the "Insert Python tag" button.
Posts: 687
Threads: 37
Joined: Sep 2016
1) It's easier to understand what you want to do if you use English names for your variables.
2) It's less painful on the eyes if you use lowercase.
3) I don't expect a function called find_word_in_file() to write to the file.
4) On the whole, you should never read and write the same file. There is only one read-write pointer, so once you have done the readlines(), that pointer is at the end of the file, and the lines you write are appended at the end of the file. You could use seek() calls to force the pointer where you want it, but this would work only in this very specific case where you replace a string with a string of the same length. The usual way is to open your source in read mode, open a second file in write mode (use a temporary name created with tempfile.mkstemp), copy the data over (with possible modifications), close that temporary file, then erase the source file and rename the temporary file(*).
(*) Even safer: close temp, rename source to some backup name, rename temp to the source name, erase the renamed source.
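The temp-file procedure from point 4 and its footnote can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name replace_in_copy and the ".bak" backup suffix are made up, and the file is handled in binary mode on the assumption that the text being searched for is plain ASCII, so the file's encoding does not matter here.

```python
import os
import tempfile


def replace_in_copy(path, old, new):
    """Replace the bytes `old` with `new` in the file at `path`, via a temp file.

    Hypothetical helper: `old` and `new` are byte strings.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the source so the renames below
    # stay on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")

    # Source is only read, the temporary file is only written.
    with open(path, "rb") as src, os.fdopen(fd, "wb") as dst:
        for line in src:
            dst.write(line.replace(old, new))

    # The "even safer" order: keep the old source until the new file
    # carries its name. Assumes no file named path + ".bak" exists yet.
    backup_path = path + ".bak"
    os.rename(path, backup_path)
    os.rename(tmp_path, path)
    os.remove(backup_path)
```

A call would look like replace_in_copy("DGER.dat", b"ANO INICIO DO ESTUDO 2016", b"ANO INICIO DO ESTUDO 2017"). If the process dies halfway through, either the original or the backup is still on disk.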
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net
Posts: 7
Threads: 2
Joined: Oct 2016
Oct-25-2016, 11:53 AM
(This post was last modified: Oct-25-2016, 12:11 PM by metulburr.)
Thanks for the tips. I had the same idea as you; my difficulty is with Python itself. So I wrote a program, and it works. Thanks for answering me!
import os
import re

def dger_ano(FILE_DATE, ORIGEM, DESTINO):
    cache = None
    string = "ANO INICIO DO ESTUDO 2016" #str(int(FILE_DATE) - 1)
    with open(ORIGEM , "r") as f: cache = f.read()
    new_file = re.sub(string, "ANO INICIO DO ESTUDO {}".format(FILE_DATE), cache)
    if new_file:
        with open(DESTINO + "/temporario.txt", "w") as f: f.write(new_file)
        os.remove(DESTINO + "/DGER.dat")
        os.rename(DESTINO + '/temporario.txt',DESTINO + '/DGER.dat')
In Python 2: if the file contains an accented character, this error message appears: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xba in position 1355: ordinal not in range(128)"
Python 3: Works!
BUT I need this to work on Python 2.
Posts: 687
Threads: 37
Joined: Sep 2016
Oct-25-2016, 01:02 PM
(This post was last modified: Oct-25-2016, 01:04 PM by Ofnuts.)
Fairly well explained here. In Python 2, file.read() returns bytes as single-byte characters that may have no real meaning on their own. If you use characters that aren't in the ASCII set (ASCII codes go up to 127, which excludes accented characters), you have to use the 'unicode' type, which behaves like a string but can contain non-ASCII characters. To go from the string of single bytes to unicode, you decode it:
# read in the file contents
iso=open('iso-8859-15.txt').read()
utf=open('utf-8.txt').read()
# this is how they look, one <str> character for each byte in the source file
print 'ISO:', repr(iso)
print 'UTF:', repr(utf)
# transform them to unicode, specifying the appropriate encoding
unicodeISO=unicode(iso,encoding='iso-8859-15')
unicodeUTF=unicode(utf,encoding='UTF-8')
# Now, as unicode strings, they are identical
print repr(unicodeISO),unicodeISO
print repr(unicodeUTF),unicodeUTF
(the two data files are attached)
You may wonder why the Unicode string looks like the ISO one. It's an optical illusion. Of course the people who defined Unicode didn't completely reinvent the wheel, and integrated as many existing encodings as feasible. So the numbers that encode characters in ISO-8859 and Unicode can be the same. However, the first is a single byte 0xe9 and the second is really the Unicode code point U+00E9.
Needless to say, this means that you have to know in advance the encoding used to encode the files... On the other hand, there aren't that many encodings for Romance languages, so it will likely be either UTF-8 or some variant of ISO-8859.
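To make the bytes-versus-unicode distinction concrete without needing the attached files, here is a small sketch that builds the two byte strings in memory. The sample text and variable names are made up for illustration; the code is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# "Déjà vu" written with escapes so this source file stays pure ASCII.
text = u"D\xe9j\xe0 vu"

# The same text stored under two different encodings.
iso_bytes = text.encode("iso-8859-15")  # one byte per character
utf_bytes = text.encode("utf-8")        # accented letters take two bytes each

print(repr(iso_bytes))
print(repr(utf_bytes))

# Decoding each with its own encoding gives back identical unicode strings.
decoded_iso = iso_bytes.decode("iso-8859-15")
decoded_utf = utf_bytes.decode("utf-8")
print(decoded_iso == decoded_utf)
```

The byte strings differ in length and content, yet both decode to the same text, which is why the encoding has to be known before the bytes can be interpreted.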
Attached Files
dejavu.zip (Size: 369 bytes / Downloads: 502)
Posts: 7
Threads: 2
Joined: Oct 2016
Oct-31-2016, 06:37 PM
(This post was last modified: Oct-31-2016, 06:46 PM by snippsat.)
Hi, I can't make this work. I used your code to read my file, and that part works:
# read in the file contents
iso = open('E:/ENEL/Modelos/NW201610/DGER.dat').read()
utf = open('E:/ENEL/Modelos/NW201610/DGER.dat').read()
# this is how they look, one <str> character for each byte in the source file
print 'ISO:', repr(iso)
print 'UTF:', repr(utf)
# transform them to unicode, specifying the appropriate encoding
unicodeISO = unicode(iso, encoding='iso-8859-15')
#unicodeUTF = unicode(utf, encoding='UTF-8')
# Now, as unicode strings, they are identical
print repr(unicodeISO), unicodeISO
#print repr(unicodeUTF), unicodeUTF
But my original function does not work:
def dger_ano(FILE_ORIGEM, FILE_DATE, ORIGEM, DESTINO): #ORIGEM: ../file.txt | DESTINO: ../../
    cache = None
    string = "YEAR " + FILE_ORIGEM
    with open(ORIGEM , "r") as f: cache = f.read()
    unicodeISO = unicode(cache, encoding='iso-8859-15')
    print ('ISO:', repr(unicodeISO))
    new_file = re.sub(string, "ANO INICIO DO ESTUDO {}".format(FILE_DATE), cache)
    if new_file:
        with open(DESTINO + "/temporario.txt", "w") as unicodeISO: unicodeISO.write(new_file)
        os.remove(DESTINO + "/DGER.dat")
        os.rename(DESTINO + '/temporario.txt',DESTINO + '/DGER.dat')
Edit admin:
Select all code and push the "Remove Formatting" button next time.
Posts: 7,312
Threads: 123
Joined: Sep 2016
Oct-31-2016, 07:16 PM
(This post was last modified: Oct-31-2016, 07:16 PM by snippsat.)
The rule for Unicode is: the same encoding all the way in and out.
So for Python 2.x use codecs or the newer io module; Python 3.x has this built in.
Set utf-8 on the first line, because Python 2.x uses ASCII as the default source encoding.
Then it looks like this.
Test input iso.txt: Déjà vu peut-être...
# -*- coding: utf-8 -*-
import codecs

with codecs.open("iso.txt", encoding='utf-8') as f:
    uni = f.read()
with codecs.open("iso_out.txt", 'w', encoding='utf-8') as f_out:
    f_out.write(uni)
iso_out.txt:
Output: Déjà vu peut-être...
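Applied to the year replacement discussed in this thread, the "same encoding in and out" rule might look like the sketch below. It uses io.open, which exists in both Python 2.6+ and Python 3. The function name replace_year, its parameters, and the iso-8859-15 default are assumptions for illustration, not the poster's actual code.

```python
import io


def replace_year(src_path, dst_path, old_year, new_year, encoding="iso-8859-15"):
    """Copy src_path to dst_path, swapping the study start year (hypothetical helper)."""
    old = u"ANO INICIO DO ESTUDO {}".format(old_year)
    new = u"ANO INICIO DO ESTUDO {}".format(new_year)

    # Decode on the way in: `text` is unicode, not raw bytes.
    # newline="" keeps the file's original line endings untouched.
    with io.open(src_path, "r", encoding=encoding, newline="") as f:
        text = f.read()

    # Encode on the way out with the very same encoding.
    with io.open(dst_path, "w", encoding=encoding, newline="") as f_out:
        f_out.write(text.replace(old, new))
```

Plain str.replace is used instead of re.sub because the search text is a fixed string, so no regular-expression escaping is needed.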
Posts: 7
Threads: 2
Joined: Oct 2016
Snippsat, thanks for responding.
I tried this:
import codecs

with codecs.open("E:/ENEL/Modelos/NW201610/DGER.dat", encoding='utf-8') as f:
    uni = f.read()
with codecs.open("iso_out.txt", 'w', encoding='utf-8') as f_out:
    f_out.write(uni)
Error:
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1388: invalid start byte
I have attached my file.
So I just changed encoding='utf-8' to encoding='iso-8859-15' and it works!
I don't know why, but it works.
Thanks snippsat, Ofnuts!
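A likely explanation for why the switch helped: the file was not saved as UTF-8. In UTF-8, a byte such as 0xba can only appear as the continuation of a multi-byte character, never as the first byte of one, hence "invalid start byte". In ISO-8859-15 the same byte is a complete character on its own, the ordinal sign "º". The short sketch below reproduces this with made-up sample bytes (the real DGER.dat content is not shown in the thread).

```python
# Made-up sample: an "N", the single byte 0xba, then " 1".
raw = b"N\xba 1"

# Decoding as UTF-8 fails, because 0xba cannot start a character.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print(exc)

# Decoding as ISO-8859-15 succeeds: 0xba maps to U+00BA, the ordinal sign.
decoded = raw.decode("iso-8859-15")
print(repr(decoded))
```

So the encoding passed to codecs.open has to be the one the file was actually written with; guessing UTF-8 for a Latin-9 file fails as soon as the first non-ASCII byte is reached.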