Python Forum
Regarding file editing
#1
Hello! New user here, and a beginner Python student, studying with the book "Learn Python the Hard Way".
I reached an exercise where my script opens a txt file, reads the contents, and copies them into a new file.
The problem is this: the new_file.txt my script creates includes strange characters at the end of the text, which do not exist in the original txt file.
Please bear in mind this is my first post on the forum. If I did something wrong, I apologize!

My code:

from sys import argv
from os.path import exists

script, from_file, to_file = argv

print(f"Copying from {from_file} to {to_file}")

# we could do these two on one line, how?
#in_file = open(from_file) ; indata = in_file.read()
indata = open(from_file).read()

print(f"The input file is {len(indata)} bytes long")

print(f"Does the output file exist? {exists(to_file)}")
print("Ready, hit RETURN to continue, CTRL-C to abort.")
input()

out_file = open(to_file, 'w')
out_file.write(indata)

print("Alright, all done")

out_file.close()
#in_file.close()
The strange characters, as can be seen in VS Code, PowerShell and notepad:

[Image: sePqcFv]

(previewing the post, it seems the image doesn't load, so I attach it as a file too just in case)

So, I'm trying to understand what causes these characters to appear in the second file, and what did I do wrong, if anything...
Thank you :)

Reply
#2
I do not get those characters when I run the code. Are they in the original file (from_file)?
Reply
#3
They are not, and only a few moments ago I had the idea of trying to create this again, so I have more information!
These characters only appear when my script copies text from a txt file created through PowerShell (as instructed by the book I follow) using the command: echo "This is a test file." > test.txt
If I create a .txt file through Notepad and run the script using that one as the source, the new file created does not end with these characters.
Any idea what could cause this? Could this be related to my system also having another language other than English even though I did not type anything in that language?
Reply
#4
What you are saying is "Yes, that is the problem. I added those characters using the command: echo "This is a test file." > test.txt". If from_file looks like to_file, the program works, and you should just move on to the next exercise and not worry about why PowerShell added something to the file that you did not intend.
Reply
#5
Your reply confuses me a bit, I'll try to clarify.
I create a txt file through PowerShell and print it; it has no strange characters at the end.
Only the files that my script creates by copying from txt files created in PowerShell contain the characters at the end of the line.
Reply
#6
You are confused. Understandably so.

The problem is as I said. PowerShell is putting a bunch of extra stuff in your from_file. If you made your file using Notepad or even the Windows command shell, the extra stuff would not get put in the from_file, and your program would produce the expected results. You could also modify your program to understand the extra stuff that PowerShell adds to the file.

So what is the extra stuff? It is multi-byte character encoding. When PowerShell writes your text to a file, it writes it as Unicode characters. The default encoding (from what I can see) is UTF-16. When you open the file in Python without specifying an encoding, it assumes the platform's default encoding, which on Windows is typically a single-byte code page rather than UTF-16. Even though your test string appears to be 20 characters long, when generated using PowerShell the text takes 44 bytes: (20 printable characters + carriage return + linefeed) * 2. A UTF-16 file usually also starts with a 2-byte byte order mark. What your program sees is 20 visible characters, each followed by an empty character (0x00), and a confusing mess at the end: when reading the file, Python replaces the carriage return and the linefeed with linefeeds, and when writing it converts every linefeed back to a carriage return and linefeed, which no longer lines up with the two-byte characters.
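The byte arithmetic can be checked directly in Python. This is only a sketch under assumptions: it assumes the file was written as UTF-16 little-endian with a byte order mark (BOM) in front of the 44 bytes of text, and that it is then misread with the single-byte code page cp1252, which is just an example of a Windows default. All variable names are illustrative.

```python
import io

text = "This is a test file.\r\n"

# UTF-16 little-endian with a 2-byte byte order mark (BOM) in front,
# built explicitly so the result does not depend on the machine.
raw = b"\xff\xfe" + text.encode("utf-16-le")
print(len(text), len(raw))  # 22 characters, 46 bytes (2 BOM + 44 text)

# Misread the bytes with a single-byte code page (assumed: cp1252).
# newline=None turns each lone "\r" and each "\n" into "\n".
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="cp1252", newline=None)
misread = reader.read()

# Write it back the way a Windows text-mode file would:
# newline="\r\n" turns every "\n" into "\r\n".
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding="cp1252", newline="\r\n")
writer.write(misread)
writer.flush()
copied = sink.getvalue()

print(raw[-4:])     # b'\r\x00\n\x00'
print(copied[-6:])  # b'\r\n\x00\r\n\x00'
```

The copy ends up two bytes longer than the original, and its last bytes no longer come in pairs, so an editor that decodes the copy as UTF-16 shows unexpected characters at the end.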

One way to fix this is to specify the file encoding when you open the from_file.
indata = open(from_file, encoding="utf-16").read()
Of course now it will mess up reading files that use utf-8 encoding.
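If you want the program to cope with both kinds of file, one possible approach is to look at the first bytes for a byte order mark before choosing the encoding. This is a minimal sketch, not the only way to do it: detect_encoding and read_text are hypothetical helper names, and a file without a byte order mark is simply assumed to be UTF-8.

```python
import codecs


def detect_encoding(path, default="utf-8"):
    """Guess a text file's encoding from its byte order mark (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with a BOM; this codec strips it
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # the codec reads the BOM to pick the byte order
    return default          # assumption: no BOM means UTF-8


def read_text(path):
    """Read a whole text file using the guessed encoding."""
    with open(path, encoding=detect_encoding(path)) as f:
        return f.read()
```

In the copy script this would replace the read line with indata = read_text(from_file). UTF-32 files are not handled here.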

Your choice. Modify the file encoding to match your program or change your program encoding to match the file.
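For the first choice, the file itself can be re-encoded with a few lines of Python. This is a sketch only: convert_to_utf8 is a hypothetical helper name, and it assumes the source file really is UTF-16.

```python
def convert_to_utf8(src, dst, src_encoding="utf-16"):
    """Re-encode a text file as UTF-8 (hypothetical helper)."""
    # newline="" on both sides keeps the line endings exactly as they are.
    with open(src, encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

For example, convert_to_utf8("test.txt", "test_utf8.txt") writes a UTF-8 copy that the original script can read without an encoding argument, since the test sentence contains only ASCII characters.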

I bet your book was written for Python 2.
Reply
#7
Thank you very much! I almost understand it all hehe.
But, how do I change the file encoding? I mean, it's what PowerShell creates, with a simple "echo" command... is there a way to tell it to create that text file using a different encoding?

The book is for Python 3.
Reply
#8
That is definitely a Python 2 program "updated" for Python 3. Forget about using PowerShell, or better yet forget about this particular exercise, which is kinda dumb anyway.
Reply
#9
Seeing as I am a complete beginner in programming, these dumb exercises teach me a lot, at least I'd like to think so...
What is another way to test out my .py files if not through PowerShell?
Reply
#10
Quote:What is another way to test out my .py files if not through PowerShell?

Use an IDE:
google: 'Python IDE'

After trying several, I settled on VS Code (I have been using it for several years now)
see: VS Code from start

Try a few and choose one that suits you.
Reply

