Manipulating files Python 2.7

hugobaur · (This post was last modified: Oct-25-2016, 12:11 PM by metulburr.)

Folks,

I'm having difficulty manipulating files.

Objective: I'm developing a script that creates folders and copies files into them from a source. The script asks the user how many months to create and replicates the files once for each month. For months in 2017, it needs to change a string in a .txt file inside that directory.

Original string in the file: ANO INICIO DO ESTUDO 2016 (Portuguese for "study start year")
String it should become: ANO INICIO DO ESTUDO 2017

PS: The file to be changed is not the original but a copy.

Is it possible to change a line in a .txt file?

What I need:

Read the source file into an array (a list of lines)
find a given substring within the array
modify the line if it contains that substring
delete the source file and write a new one with the same name, or simply change the string (a word in a line) inside the source file.
FILENAME_NEWAVE = path of the source file
STRING_DGER = string to search for
FILE_DATE = year

This is not working; it writes into the source file:

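def find_word_in_file_dger(FILENAME_NEWAVE, STRING_DGER, FILE_DATE):
    f = open(FILENAME_NEWAVE, "r+")
    file_array = f.readlines()  # the file pointer is now at the end of the file
    for i in file_array:
        # str.find() returns -1 (truthy) when the text is absent and 0 (falsy)
        # when it starts the line, so this test does not mean "contains"
        if i.find(STRING_DGER.encode('utf-8')):
            f.write(i)  # written after the data just read, i.e. at the end of the file
        else:
            print("I'M READING THE ARRAY")
            if FILE_DATE == "2016":
                continue
            else:
                # str.replace() returns a new string; the result is discarded here
                i.replace(STRING_DGER, "ANO INICIO DO ESTUDO " + FILE_DATE)
                f.write(i)
                print("I'M WRITING THE LINE CORRECTLY, DUDE!! ")
            return i  # returns on the first match without closing f
    f.close()
    return False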


***Ofnuts*** · Oct-21-2016, 04:53 PM

1) it's easier to understand what you want to do if you use English names for your variables
2) it's less painful on the eyes if you use lowercase
3) I don't expect a function called find_word_in_file() to write the file.
4) In general, you should never read and write the same file. There is only one read/write pointer, so once you have done the readlines(), that pointer is at the end of the file, and the lines you write are appended at the end of the file. You could use seek() to force the pointer where you want it, but this would work only in this very specific case where you replace a string with a string of the same length. The usual way is to open your source in read mode and open a second file in write mode (using a temporary name created with tempfile.mkstemp), copy the data over (with any modifications), close that temporary file, then erase the source file and rename the temporary file to the source's name(*).

(*) Even safer: close the temp file, rename the source to some temporary name, rename the temp file to the source's name, then erase the renamed source.
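The temporary-file procedure above, including the safer rename sequence from the footnote, might look like this as a minimal sketch. It assumes the file can be processed line by line and defaults to ISO-8859-15 (an assumption; pass whatever encoding the file really uses). `io.open` exists on Python 2.6+ and 3, so it runs on both; the `replace_in_file` name and the `.bak` backup suffix are hypothetical.

```python
import io
import os
import shutil
import tempfile


def replace_in_file(path, old, new, encoding="iso-8859-15"):
    """Replace every occurrence of `old` with `new` in the file at `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the temp file and returns an OS file descriptor and its name
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # reopened by name below with io.open
    try:
        # newline="" keeps the original line endings (e.g. \r\n) untouched
        with io.open(path, "r", encoding=encoding, newline="") as src, \
                io.open(tmp_path, "w", encoding=encoding, newline="") as dst:
            for line in src:
                dst.write(line.replace(old, new))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
    except Exception:
        os.remove(tmp_path)  # do not leave a half-written temp file behind
        raise
    backup_path = path + ".bak"       # hypothetical backup name
    os.rename(path, backup_path)      # 1) move the original out of the way
    os.rename(tmp_path, path)         # 2) give the temp file the original name
    os.remove(backup_path)            # 3) only now erase the original
```

Usage would be along the lines of `replace_in_file("DGER.dat", u"ANO INICIO DO ESTUDO 2016", u"ANO INICIO DO ESTUDO 2017")`. Renaming the source away before renaming the temp file in also sidesteps the fact that `os.rename` on Windows refuses to overwrite an existing file.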

hugobaur · (This post was last modified: Oct-25-2016, 12:11 PM by metulburr.)

Thanks for the tips. I was thinking the same way as you; my difficulty is with Python itself. So I created a program, and it works. Thanks for answering me!

import os
import re

def dger_ano(FILE_DATE, ORIGEM, DESTINO):
    # ORIGEM: source DGER.dat | DESTINO: directory holding the copy to update
    string = "ANO INICIO DO ESTUDO 2016"  # str(int(FILE_DATE) - 1)
    with open(ORIGEM, "r") as f:
        cache = f.read()
    new_file = re.sub(string, "ANO INICIO DO ESTUDO {}".format(FILE_DATE), cache)
    if new_file:
        temp_path = os.path.join(DESTINO, "temporario.txt")
        target_path = os.path.join(DESTINO, "DGER.dat")
        with open(temp_path, "w") as f:
            f.write(new_file)
        # only replace DGER.dat once the new content has been written
        os.remove(target_path)
        os.rename(temp_path, target_path)

In Python 2: the file contains accented (non-ASCII) characters, and this error message appears: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xba in position 1355: ordinal not in range(128)"
Python 3: works!

BUT I need it to work on Python 2.

***Ofnuts*** · (This post was last modified: Oct-25-2016, 01:04 PM by Ofnuts.)

Fairly well explained here. In Python 2, file.read() reads bytes as single-byte characters that may have no real meaning. If you use characters that aren't in the ASCII set (ASCII codes go up to 127, which excludes accented characters), you have to use the 'unicode' type, which behaves like a string but can contain non-ASCII characters. To go from a string of single bytes to unicode, you decode it:

# read in the file contents
iso=open('iso-8859-15.txt').read()
utf=open('utf-8.txt').read()

# this is how they look, one <str> character for each byte in the source file
print 'ISO:', repr(iso)
print 'UTF:', repr(utf)

# transform them to unicode, specifying the appropriate encoding
unicodeISO=unicode(iso,encoding='iso-8859-15')
unicodeUTF=unicode(utf,encoding='UTF-8')

# Now, as unicode strings, they are identical
print repr(unicodeISO),unicodeISO
print repr(unicodeUTF),unicodeUTF

(the two data files attached)

You may wonder why the Unicode string looks like the ISO one. It's an optical illusion. Of course the people who defined Unicode didn't completely reinvent the wheel, and integrated as many existing encodings as feasible. So the numbers that encode characters in ISO-8859 and Unicode can be the same. However, the first is the single byte 0xe9, while the second is really the Unicode code point U+00E9.
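A quick way to see the one-byte versus two-byte difference without the attached files is to encode the same character both ways and decode it back. This sketch uses é (U+00E9) as the example character and runs on Python 2.7 or 3 thanks to the `__future__` imports.

```python
from __future__ import print_function, unicode_literals

# One accented character: U+00E9, LATIN SMALL LETTER E WITH ACUTE
text = "\u00e9"

# The same character stored with two different encodings
iso_bytes = text.encode("iso-8859-15")
utf_bytes = text.encode("utf-8")

# bytearray yields integer byte values on both Python 2 and Python 3
iso_values = list(bytearray(iso_bytes))  # a single byte, 0xE9
utf_values = list(bytearray(utf_bytes))  # two bytes, 0xC3 0xA9

# Decoding each with its own encoding gives back the same unicode string
from_iso = iso_bytes.decode("iso-8859-15")
from_utf = utf_bytes.decode("utf-8")

print("ISO-8859-15 bytes:", [hex(b) for b in iso_values])
print("UTF-8 bytes:", [hex(b) for b in utf_values])
print("Same text after decoding:", from_iso == from_utf)
print("Code point: U+%04X" % ord(from_iso))
```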

Needless to say, this means that you have to know in advance the encoding used to encode the files... On the other hand, there aren't that many encodings for Romance languages, so it will likely be either UTF-8 or some variant of ISO-8859.
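When the encoding isn't known for sure, one common heuristic (a sketch, not something proposed in this thread) is to try strict UTF-8 first and fall back to ISO-8859-15. UTF-8 goes first because it rejects most byte sequences that aren't valid UTF-8, whereas ISO-8859-15 maps every byte value and so never fails. The `read_text` name and the two-encoding list are assumptions.

```python
from __future__ import unicode_literals

import io


def read_text(path, encodings=("utf-8", "iso-8859-15")):
    """Return (text, encoding) for the first encoding that decodes the file."""
    with io.open(path, "rb") as f:
        raw = f.read()  # raw bytes on both Python 2 and Python 3
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError("none of %r could decode %s" % (encodings, path))
```

A file that happens to be valid UTF-8 but was really written in ISO-8859-15 would still be misread, which is why knowing the encoding in advance remains the better option.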

hugobaur · (This post was last modified: Oct-31-2016, 06:46 PM by snippsat.)

Hi, I can't make this work. I used your code to read my file, and that works:

# read in the file contents
iso = open('E:/ENEL/Modelos/NW201610/DGER.dat').read()
utf = open('E:/ENEL/Modelos/NW201610/DGER.dat').read()

# this is how they look, one <str> character for each byte in the source file
print 'ISO:', repr(iso)
print 'UTF:', repr(utf)

# transform them to unicode, specifying the appropriate encoding
unicodeISO = unicode(iso, encoding='iso-8859-15')
#unicodeUTF = unicode(utf, encoding='UTF-8')

# Now, as unicode strings, they are identical
print repr(unicodeISO), unicodeISO
#print repr(unicodeUTF), unicodeUTF

But my original function does not work:

import os
import re

def dger_ano(FILE_ORIGEM, FILE_DATE, ORIGEM, DESTINO): #ORIGEM: ../file.txt | DESTINO: ../../
    cache = None
    string = "YEAR " + FILE_ORIGEM
    with open(ORIGEM, "r") as f: cache = f.read()
    unicodeISO = unicode(cache, encoding='iso-8859-15')

    print ('ISO:', repr(unicodeISO))  # Python 2's print statement shows this as a tuple
    # re.sub() runs on the raw bytes in cache, not on the decoded unicodeISO
    new_file = re.sub(string, "ANO INICIO DO ESTUDO {}".format(FILE_DATE), cache)
    if new_file:
        # "as unicodeISO" rebinds that name to the output file object
        with open(DESTINO + "/temporario.txt", "w") as unicodeISO: unicodeISO.write(new_file)
    os.remove(DESTINO + "/DGER.dat")
    os.rename(DESTINO + '/temporario.txt', DESTINO + '/DGER.dat')
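For comparison, here is a minimal sketch of the same idea that decodes on the way in, substitutes on the decoded text, and encodes with the same codec on the way out. It assumes the file is ISO-8859-15 (the encoding that turns out to work later in this thread) and that the year always follows `ANO INICIO DO ESTUDO` as four digits; the `set_study_year` name and the regex are illustrative, not the poster's code.

```python
from __future__ import unicode_literals

import io
import os
import re

ENCODING = "iso-8859-15"  # assumption: the encoding DGER.dat actually uses
YEAR_LINE = re.compile(r"ANO INICIO DO ESTUDO \d{4}")


def set_study_year(source_path, target_dir, new_year, encoding=ENCODING):
    """Write target_dir/DGER.dat as a copy of source_path with the year replaced."""
    with io.open(source_path, "r", encoding=encoding, newline="") as f:
        text = f.read()  # decoded unicode text
    # Substitute on the decoded text, not on the raw bytes
    new_text = YEAR_LINE.sub("ANO INICIO DO ESTUDO %s" % new_year, text)
    temp_path = os.path.join(target_dir, "temporario.txt")
    target_path = os.path.join(target_dir, "DGER.dat")
    with io.open(temp_path, "w", encoding=encoding, newline="") as f:
        f.write(new_text)  # encoded back with the same codec
    if os.path.exists(target_path):
        os.remove(target_path)  # os.rename cannot overwrite a file on Windows
    os.rename(temp_path, target_path)
    return new_text
```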


***snippsat*** · (This post was last modified: Oct-31-2016, 07:16 PM by snippsat.)

The rule for Unicode is: the same encoding all the way in and out.
So for Python 2.x use codecs or the newer io module; Python 3.x has this built in.
Declare utf-8 in the first line, since Python 2.x uses ASCII as its default source encoding.
Then it looks like this:
Test input iso.txt: Déjà vu peut-être...

# -*- coding: utf-8 -*-
import codecs

with codecs.open("iso.txt", encoding='utf-8') as f:
    uni = f.read()

with codecs.open("iso_out.txt", 'w', encoding='utf-8') as f_out:
    f_out.write(uni)
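The same round trip with the newer `io` module, as a sketch: `io.open` is available on Python 2.6+ and is the built-in `open` on Python 3, so one version of the code serves both. The `copy_text` helper name is hypothetical.

```python
import io


def copy_text(src, dst, encoding="utf-8"):
    """Decode src with `encoding` and write the same text to dst."""
    with io.open(src, "r", encoding=encoding) as f:
        uni = f.read()  # unicode text on both Python 2 and Python 3
    with io.open(dst, "w", encoding=encoding) as f_out:
        f_out.write(uni)  # text mode expects unicode, not a byte str
    return uni
```

Calling `copy_text("iso.txt", "iso_out.txt")` mirrors the codecs example; changing `encoding` is the only thing needed for a file that isn't UTF-8.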

iso_out.txt:

Output:
Déjà vu peut-être...

hugobaur · Nov-01-2016, 12:28 PM

snippsat, thanks for responding.

I tried this:

import codecs

with codecs.open("E:/ENEL/Modelos/NW201610/DGER.dat", encoding='utf-8') as f:
    uni = f.read()

with codecs.open("iso_out.txt", 'w', encoding='utf-8') as f_out:
    f_out.write(uni)

Error:
newchars, decodedbytes = self.decode(data, self.errors)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1388: invalid start byte

I attached my file.

So I just changed encoding='utf-8' to encoding='iso-8859-15', and it works!!!!

I don't know why, but it works.
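The reason shows up in the error message itself. In UTF-8, bytes 0x80 to 0xBF can only continue a multi-byte character, so 0xba cannot start one; in ISO-8859-15 every byte decodes to exactly one code point, and 0xba is º (U+00BA). That suggests the file was written in a single-byte Latin encoding rather than UTF-8, though without the attachment it isn't possible to confirm which one. A small sketch using a hypothetical two-byte sample built around that byte:

```python
from __future__ import print_function, unicode_literals

# Hypothetical sample: 'N' followed by the byte 0xBA reported in the traceback
raw = b"N\xba"

# 0xBA has the bit pattern 10xxxxxx, which UTF-8 reserves for continuation
# bytes; it cannot start a character, so strict UTF-8 decoding fails
utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

# ISO-8859-15 maps every byte to one character; 0xBA becomes U+00BA
text = raw.decode("iso-8859-15")

print("utf-8:", utf8_error)
print("iso-8859-15 second character: U+%04X" % ord(text[1]))
```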

Thanks snippsat, Ofnuts!
