Jun-09-2022, 08:23 PM
You are confused. Understandably so.
The problem is as I said. PowerShell is putting a bunch of extra stuff in your from_file. If you made your file using Notepad or even the Windows command shell, the extra stuff would not get put in the from_file, and your program would produce the expected results. You could also modify your program to understand the extra stuff that PowerShell adds to the file.
So what is the extra stuff? It is a multi-byte character encoding. When PowerShell writes your text to a file, it writes it as Unicode characters. The default encoding (from what I can see) is UTF-16. When you open the file in Python without specifying an encoding, Python uses the platform default (UTF-8 on most systems, usually cp1252 on Windows), and neither of those is UTF-16. Even though your test string appears to be 20 characters long, when generated using PowerShell it is actually 44 bytes long: (20 printable characters + carriage return + linefeed) * 2, not counting the 2-byte byte order mark that UTF-16 files usually start with. What your program sees is 20 visible characters, each followed by an empty character (0x00), and a confusing mess at the end, where Python tries to replace the carriage return and linefeed pair with a single linefeed when reading the file and converts it back to a carriage return and linefeed when writing.
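If you want to check the arithmetic yourself, here is a small sketch. The test string is only my guess at a 20-character line (substitute whatever you actually echoed), and I am assuming the little-endian flavor of UTF-16 with a byte order mark in front.

```python
import codecs

# Hypothetical 20-character test line plus the CR LF that ends it.
line = "This is a test file." + "\r\n"

# UTF-16 (little-endian) uses 2 bytes for each of these characters.
payload = line.encode("utf-16-le")

# Assumption: the file also starts with a 2-byte byte order mark (BOM).
with_bom = codecs.BOM_UTF16_LE + payload

print(len(line))       # number of characters
print(len(payload))    # number of bytes without the BOM
print(len(with_bom))   # number of bytes with the BOM
print(payload[:8])     # every visible character is followed by a 0x00 byte
```

The second number is the 44 bytes from the calculation above, and the last line shows the 0x00 bytes that your program is tripping over.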
One way to fix this is to specify the file encoding when you open the from_file.
indata = open(from_file, encoding="utf-16").read()
Of course, now it will mess up reading files that use UTF-8 encoding.
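If you want one program that copes with both kinds of file, one option is to look at the first bytes for a byte order mark and pick the codec from that. This is only a sketch: read_text and its fallback parameter are names I made up, and it assumes a file without a BOM is UTF-8, which will not always be true.

```python
import codecs

def read_text(path, fallback="utf-8"):
    """Read a whole text file, picking the codec from a leading BOM if present."""
    # Read raw bytes so nothing is decoded or newline-translated yet.
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to get the byte order, then drops it.
        return raw.decode("utf-16")
    if raw.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the UTF-8 signature bytes.
        return raw.decode("utf-8-sig")
    # Assumption: anything without a BOM is in the fallback encoding.
    return raw.decode(fallback)
```

You would then write indata = read_text(from_file). Because this decodes the bytes directly, carriage returns are left in the text as-is.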
Your choice. Modify the file encoding to match your program or change your program encoding to match the file.
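If you go the first route and change the file instead of the program, you can re-save it from Python. A minimal sketch, assuming the source really is UTF-16 and you want UTF-8; reencode and its parameter names are my own invention.

```python
def reencode(src, dst, src_encoding="utf-16", dst_encoding="utf-8"):
    """Copy a text file from one encoding to another, keeping line endings."""
    # newline="" turns off newline translation, so CR LF pairs pass through untouched.
    with open(src, encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=dst_encoding, newline="") as fout:
        fout.write(text)
```

After running this on your test file, the original program should be able to read the copy without any encoding argument.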
I bet your book was written for Python 2.