Python Forum
Manipulating files Python 2.7
#1
Folks,

I'm having trouble manipulating files.

Objective: I'm developing a script that creates folders and copies files from a source. The script asks the user for the number of months to create and replicates the files once for each month. If the year is 2017, it needs to change a string in a .txt file in the directory.

Original string in the file: ANO INICIO DO ESTUDO 2016
String it should be changed to: ANO INICIO DO ESTUDO 2017

PS: The file to be changed is not the original but a copy.

Is it possible to change a line in a txt file?

What I need:

Read the source file into an array
Identify a portion of a string within the array
Modify the string if that portion is found
Delete the source file and write another with the same name, or simply change a string (a word in a line) within the source file.
FILENAME_NEWAVE = Path of the source file
STRING_DGER = String to be searched
FILE_DATE = Year

This is not working; it writes to the source file:
def find_word_in_file_dger(FILENAME_NEWAVE, STRING_DGER, FILE_DATE):
   f = open(FILENAME_NEWAVE, "r+")
   file_array = f.readlines()
   for i in file_array:
       if i.find(STRING_DGER.encode('utf-8')):
           f.write(i)
       else:
           print ("TO LENDO O ARRAY")
           if FILE_DATE == "2016":
               continue
           else:
               i.replace(STRING_DGER, "ANO INICIO DO ESTUDO " + FILE_DATE)
               f.write(i)
               print("TO ESCREVENDO A LINHA CORRETAMENTE MLK!! ")
           return i
   f.close()
   return False

Correct script
Edit:
I have fixed the formatting. Next time, select all the code and press the "Remove Formatting" button,
then the "Insert Python tag" button.
#2
1) it's easier to understand what you want to do if you use English names for your variables
2) it's less painful on the eyes if you use lowercase
3) I don't expect a function called find_word_in_file() to write the file.
4) On the whole, you should never read and write the same file. There is only one read-write pointer, so once you have done the readlines(), that pointer is at the end of the file, and the lines you write are appended at the end of the file. You could use seek() calls to force the pointer where you want it, but this would work only in this very specific case where you replace a string with a string of the same length. The usual way is to open your source in read mode, and open a second file in write mode (use a temporary name created with tempfile.mkstemp), copy the data over (with possible modifications), close that temporary file, then erase the source file and rename the temporary file(*).

(*) Even safer: close the temporary file, rename the source to some temporary name, rename the temporary file to the source name, then erase the renamed source.
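
The procedure described above can be sketched roughly as follows. This is only an illustration, not the original poster's code: the function name replace_in_file, the ".bak" backup suffix and the default encoding are assumptions, and io.open is used so the same code runs on Python 2.7 and Python 3 (on Python 2, pass unicode strings for old and new).

```python
import io
import os
import tempfile


def replace_in_file(path, old, new, encoding="iso-8859-15"):
    """Rewrite `path` with every occurrence of `old` replaced by `new`.

    Hypothetical helper: the name, the backup suffix and the default
    encoding are assumptions made for this sketch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the source so the rename stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        # newline="" keeps the original line endings untouched.
        with io.open(path, "r", encoding=encoding, newline="") as src:
            with io.open(tmp_path, "w", encoding=encoding, newline="") as dst:
                for line in src:
                    dst.write(line.replace(old, new))
    except Exception:
        os.remove(tmp_path)
        raise
    # The "even safer" order: move the source aside, put the new file
    # in its place, and only then delete the old copy.
    backup = path + ".bak"  # assumed not to exist already
    os.rename(path, backup)
    os.rename(tmp_path, path)
    os.remove(backup)
```

A call such as replace_in_file("DGER.dat", u"ANO INICIO DO ESTUDO 2016", u"ANO INICIO DO ESTUDO 2017") would then leave the source untouched until the new copy has been written completely.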
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net
#3
Thanks for the tips. I had the same idea as you; my difficulty is with Python itself. So I wrote a program and it works. Thanks for answering me!

import os
import re

def dger_ano(FILE_DATE, ORIGEM, DESTINO):
    cache = None
    string = "ANO INICIO DO ESTUDO 2016"  #str(int(FILE_DATE) - 1)
    with open(ORIGEM , "r") as f: cache = f.read()
    new_file = re.sub(string, "ANO INICIO DO ESTUDO {}".format(FILE_DATE), cache)
    if new_file:
        with open(DESTINO + "/temporario.txt", "w") as f: f.write(new_file)
    os.remove(DESTINO + "/DGER.dat")
    os.rename(DESTINO + '/temporario.txt',DESTINO + '/DGER.dat')
In Python 2: when the file contains an accented character, this error message appears: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xba in position 1355: ordinal not in range(128)"
Python 3: Works!

BUT I need it to work on Python 2.
#4
Fairly well explained here. In Python 2, file.read() reads bytes as single-byte characters that may have no real meaning. If you use characters that aren't in the ASCII set (ASCII codes up to 127, which excludes accented characters) you have to use the 'unicode' type, which behaves like a string but can contain non-ASCII characters. To go from the string of single bytes to unicode you decode it:
# read in the file contents
iso=open('iso-8859-15.txt').read()
utf=open('utf-8.txt').read()

# this is how they look, one <str> character for each byte in the source file
print 'ISO:', repr(iso)
print 'UTF:', repr(utf)

# transform them to unicode, specifying the appropriate encoding
unicodeISO=unicode(iso,encoding='iso-8859-15')
unicodeUTF=unicode(utf,encoding='UTF-8')

# Now, as unicode strings, they are identical
print repr(unicodeISO),unicodeISO
print repr(unicodeUTF),unicodeUTF
(the two data files attached)

You may wonder why the Unicode string looks like the ISO one. It's an optical illusion. Of course the people who defined Unicode didn't completely reinvent the wheel, and integrated as many existing encodings as feasible. So the numbers that encode characters in ISO-8859 and Unicode can be the same. However, the first is the single byte 0xe9 and the second is really the Unicode code point U+00E9.
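
To make the byte versus code point distinction concrete, here is a small sketch (not part of the attached files) that decodes the same character, e-acute, from its ISO-8859-15 bytes and from its UTF-8 bytes. The variable names are made up for the illustration; the code runs on Python 2.7 and Python 3.

```python
# The same character stored in two different encodings.
iso_bytes = b"\xe9"        # e-acute as one byte in ISO-8859-15
utf_bytes = b"\xc3\xa9"    # e-acute as two bytes in UTF-8

# Decoding turns the bytes into a unicode string.
from_iso = iso_bytes.decode("iso-8859-15")
from_utf = utf_bytes.decode("utf-8")

# Different bytes on disk, same code point (U+00E9) once decoded.
print(from_iso == from_utf)
print(hex(ord(from_iso)))
```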

Needless to say, this means that you have to know in advance the encoding used to encode the files... On the other hand, there aren't that many encodings for Romance languages, so it will likely be either UTF-8 or some variant of ISO-8859.
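
When the encoding really is unknown, one possible approach (an assumption on my part, not something the posts above prescribe) is to try UTF-8 first and fall back to an ISO-8859 variant. The helper name decode_with_fallback and the candidate list are hypothetical.

```python
def decode_with_fallback(raw, encodings=("utf-8", "iso-8859-15")):
    """Return (text, encoding) for the first candidate that decodes `raw`.

    Hypothetical helper. UTF-8 goes first because it rejects most
    byte sequences that are not UTF-8, whereas ISO-8859-15 accepts
    any byte, so it only makes sense as the last resort.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")
```

A successful decode does not prove the guess is right, so it is still worth checking that the accented characters look correct in the result.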

Attached Files

.zip   dejavu.zip (Size: 369 bytes / Downloads: 289)
#5
Hi, I can't make this work. When I use your code to read my file, it works:

# read in the file contents
iso = open('E:/ENEL/Modelos/NW201610/DGER.dat').read()
utf = open('E:/ENEL/Modelos/NW201610/DGER.dat').read()

# this is how they look, one <str> character for each byte in the source file
print 'ISO:', repr(iso)
print 'UTF:', repr(utf)

# transform them to unicode, specifying the appropriate encoding
unicodeISO = unicode(iso, encoding='iso-8859-15')
#unicodeUTF = unicode(utf, encoding='UTF-8')

# Now, as unicode strings, they are identical
print repr(unicodeISO), unicodeISO
#print repr(unicodeUTF), unicodeUTF
But my original function does not work:

def dger_ano(FILE_ORIGEM, FILE_DATE, ORIGEM, DESTINO): #ORIGEM: ../file.txt | DESTINO: ../../
   cache = None
   string = "YEAR " + FILE_ORIGEM  
   with open(ORIGEM , "r") as f: cache = f.read()
   unicodeISO = unicode(cache, encoding='iso-8859-15')

   print ('ISO:', repr(unicodeISO))
   new_file = re.sub(string, "ANO INICIO DO ESTUDO {}".format(FILE_DATE), cache)
   if new_file:
       with open(DESTINO + "/temporario.txt", "w") as unicodeISO: unicodeISO.write(new_file)
   os.remove(DESTINO + "/DGER.dat")
   os.rename(DESTINO + '/temporario.txt',DESTINO + '/DGER.dat')
Edit admin:
Next time, select all the code and press the "Remove Formatting" button.
#6
The rule for Unicode is: use the same encoding all the way in and out.
So for Python 2.x use codecs or the newer io module; Python 3.x has this built in.
Set utf-8 in the first line, because Python 2.x uses ASCII as the default source encoding.
Then it looks like this.
Test input iso.txt: Déjà vu peut-être...
# -*- coding: utf-8 -*-
import codecs

with codecs.open("iso.txt", encoding='utf-8') as f:
   uni = f.read()

with codecs.open("iso_out.txt", 'w', encoding='utf-8') as f_out:
   f_out.write(uni)
iso_out.txt:
Output:
Déjà vu peut-être...
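
The post above mentions the newer io module as an alternative to codecs. A minimal sketch of the same round trip with io.open could look like this; the temporary directory and the variable names are assumptions made so the example is self-contained, and it runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Write the test input to a throwaway location (assumed for this sketch).
path = os.path.join(tempfile.mkdtemp(), "iso.txt")
text = u"D\xe9j\xe0 vu peut-\xeatre..."

# Same encoding on the way out...
with io.open(path, "w", encoding="utf-8") as f_out:
    f_out.write(text)

# ...and on the way in; read() returns a unicode string.
with io.open(path, encoding="utf-8") as f:
    uni = f.read()
```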
#7
Snippsat, thanks for responding.

I tried that:

import codecs

with codecs.open("E:/ENEL/Modelos/NW201610/DGER.dat", encoding='utf-8') as f:
   uni = f.read()

with codecs.open("iso_out.txt", 'w', encoding='utf-8') as f_out:
   f_out.write(uni)
Error:
newchars, decodedbytes = self.decode(data, self.errors) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1388: invalid start byte
I have attached my file.


So I just changed encoding='utf-8' to encoding='iso-8859-15' and it works!!!!

I don't know why, but it works.

Thanks, snippsat and Ofnuts!
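
As for why the change works, the byte named in the error message gives a hint. The sketch below is an illustration only: the sample bytes are made up and are not taken from the attached DGER.dat. It shows that 0xba cannot start a UTF-8 sequence, while in ISO-8859-15 it is a valid one-byte character (the masculine ordinal indicator, as in "Nº").

```python
# Made-up sample containing the byte from the error message.
raw = b"N\xba 1"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xba is a continuation byte in UTF-8, so it cannot begin a character.
    utf8_ok = False

# In ISO-8859-15 every byte is one character; 0xba maps to U+00BA.
text = raw.decode("iso-8859-15")
print(utf8_ok)
print(text == u"N\xba 1")
```

So the file is most likely stored in an ISO-8859 variant rather than UTF-8, which is consistent with the "same encoding all the way in and out" rule.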