Python Forum
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 34: character
#1
Hello. I don't know what to do with this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 34: character maps to <undefined>
This is the Python Code:

import fileinput
import glob
import os
import re
with open('c:\\Folder6\\merged.txt', 'w', encoding='UTF-8') as f:

        for line in fileinput.input(sorted(glob.glob('c:\\Folder6\\*.txt'))):
            f.write(line)
        fileinput.close()
        print(f)
And this is the ERROR:

Traceback (most recent call last):
  File "E:\Carte\BB\17 - Site Leadership\alte\Ionel Balauta\Aryeht\Task 1 - Traduce tot site-ul\Doar Google Web\Andreea\Meditatii\Sedinta 31 august 2022\merge txt - versiune 3 .py", line 8, in <module>
    for line in fileinput.input(sorted(glob.glob('c:\\Folder6\\*.txt'))):
  File "C:\Program Files\Python39\lib\fileinput.py", line 256, in __next__
    line = self._readline()
  File "C:\Program Files\Python39\lib\fileinput.py", line 389, in _readline
    return self._readline()
  File "C:\Program Files\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 34: character maps to <undefined>
This is a print screen:

[Image: McUwGS.jpg]

What can I do so that this error does not appear again? Can anyone help me?
Reply
#2
You should set the encoding when you read the files (fileinput). Windows must think they are something other than UTF-8; the traceback shows it is decoding them as cp1252.
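To see why the read fails, here is a minimal sketch. It assumes the input files are really UTF-8, which the thread does not confirm. The byte 0x81 has no character assigned in cp1252, the codec named in the traceback, but it is a valid continuation byte in UTF-8. The sample bytes are illustrative and are not taken from the poster's files; try_decode is a hypothetical helper.

```python
# Illustrative sample: U+0141 encoded as UTF-8 is the two bytes C5 81.
data = "\u0141".encode("utf-8")

def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

as_cp1252 = try_decode(data, "cp1252")  # None: 0x81 is undefined in cp1252
as_utf8 = try_decode(data, "utf-8")     # decodes to the single character U+0141

print(as_cp1252 is None, as_utf8 == "\u0141")  # prints: True True
```

The same bytes decode cleanly once the reader is told the right encoding, which is what setting the encoding on fileinput does.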
Reply
#3
Hello, sir. Thank you for your answer.

Could you modify my script so that it works with your solution? I don't know Python very well; I am a beginner.

I don't know if what I have now is right. It doesn't do anything, but it gives no error.

import fileinput
import glob
import os
import re
 
def read_text_from_file(file_path):
 
    with open(file_path, encoding='utf8') as f:
        text = f.read()
        return text
 
def write_to_file(text, file_path):
 
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))
     
    with open('c:\\Folder6\\translated\\merged.txt', 'w', encoding='UTF-8') as f:
  
        for file_name in sorted(glob.glob('c:\\Folder6\\translated\\*.txt')):
            contents = read_text_from_file(file_name)
            f.write(line)
        fileinput.close()
        print(f)
OR, SECOND VERSION:

import fileinput
import glob
import os
import re

with open('c:\\Folder6\\translated\\merged.txt', 'w', encoding='UTF-8') as f:
         current_content = f.read()
         modified = new_content != current_content
if modified and args.diff:
 
        for line in fileinput.input(sorted(glob.glob('c:\\Folder6\\translated\\*.txt'))) :
            f.write(line)
        fileinput.close()
        print(f)
OR, THIRD VERSION:

import fileinput
import glob
import os
import re
 

     
read_files = sorted(glob.glob("c:\\Folder6\\translated\\merged.txt\\*.txt"))

with open("c:\\Folder6\\translated\\merged.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
        fileinput.close()
        print(f)
None of them works. Each one creates the file but does not write anything to it.
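For reference, a likely reason the first version does nothing: the merge loop is indented inside write_to_file, which is never called, and it writes `line` where it had read `contents`. The second version opens the output in 'w' mode and then tries f.read(), and the third globs inside merged.txt as if it were a folder, so the list of files is empty. Below is a minimal corrected sketch of the first version. The name merge_text_files is hypothetical, and the sketch assumes the inputs are UTF-8.

```python
import glob
import os

def merge_text_files(pattern, out_path, encoding="utf-8"):
    """Concatenate every file matching pattern into out_path (hypothetical helper)."""
    out_key = os.path.normcase(os.path.abspath(out_path))
    # Build the list first and skip the output file, which may match the pattern too.
    sources = [
        p for p in sorted(glob.glob(pattern))
        if os.path.normcase(os.path.abspath(p)) != out_key
    ]
    with open(out_path, "w", encoding=encoding) as out:
        for path in sources:
            # Read each file with an explicit encoding instead of the platform default.
            with open(path, encoding=encoding) as src:
                out.write(src.read())
    return sources

# Example call with the paths from the post:
# merge_text_files("c:\\Folder6\\translated\\*.txt", "c:\\Folder6\\translated\\merged.txt")
```

Skipping the output file matters here because merged.txt lives in the same folder as the inputs and would otherwise be picked up on the next run.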
Reply
#4
I found a solution:

import glob

# Collect the input files in a stable order (the list is built before merged.txt is opened).
read_files = sorted(glob.glob("c:\\Folder6\\translated\\*.txt"))

# Binary mode copies the bytes unchanged, so nothing is decoded and no UnicodeDecodeError can occur.
with open("c:\\Folder6\\translated\\merged.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
            outfile.write(b"\n\n")  # blank separator after each file
        print(f)
Reply
#5
In your first code it would be like this.
import fileinput
import glob
import os

with open('c:\\Folder6\\merged.txt', 'w', encoding='UTF-8') as f:
    for line in fileinput.input(sorted(glob.glob('c:\\Folder6\\*.txt')), encoding="utf-8"):
        print(line)
        f.write(line)
    fileinput.close()
This needs Python 3.10 to work, as noted in the fileinput docs:
Quote:Changed in version 3.10: The keyword-only parameter encoding and errors are added.
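The traceback in the first post shows Python 3.9, where the encoding keyword is not yet available. A minimal sketch for older versions, assuming the files are UTF-8, passes fileinput.hook_encoded as the openhook instead. The name merge_lines is hypothetical.

```python
import fileinput

def merge_lines(paths, out_path):
    """Copy every line of every input file into out_path (hypothetical helper)."""
    # hook_encoded makes fileinput open each file with the given encoding
    # instead of the platform default (cp1252 in the traceback).
    hook = fileinput.hook_encoded("utf-8")
    with open(out_path, "w", encoding="utf-8") as out:
        with fileinput.input(paths, openhook=hook) as lines:
            for line in lines:
                out.write(line)

# Example call with the paths from the first post:
# import glob
# merge_lines(sorted(glob.glob("c:\\Folder6\\*.txt")), "c:\\Folder6\\merged.txt")
```

Using the FileInput object as a context manager closes it automatically, so no separate fileinput.close() call is needed.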
Reply
#6
(Sep-26-2022, 09:09 AM)snippsat Wrote: In your first code it would be like this.

OK, thanks. But if I want to add an f.write("\n\n") to get a dividing line between the files, where should I put it?
Reply
#7
(Sep-26-2022, 09:25 AM)Melcu54 Wrote: But if I want to put an f.write("\n\n") in order to have a dividing line between the files
Change line 8:
f.write(f'{line}\n\n')
Reply
#8
(Sep-26-2022, 09:38 AM)snippsat Wrote: Change line 8:
f.write(f'{line}\n\n')

I tried this as well. But in this case it doubles all my lines from all the text files in the merged file.

See the duplicated lines after using your code. (It is better with f.write('\n'), except that this puts a new empty line between all paragraphs.)

[Image: zCgDSZ.jpg]
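One way to get the separator only between files, rather than after every line, is to write it when fileinput reports the first line of a new file. This is a minimal sketch, assuming UTF-8 inputs. The name merge_with_separator is hypothetical; isfirstline() and lineno() are standard fileinput methods.

```python
import fileinput

def merge_with_separator(paths, out_path, separator="\n\n"):
    """Merge files line by line, writing separator only between files (hypothetical helper)."""
    hook = fileinput.hook_encoded("utf-8")  # works on Python versions before 3.10 too
    with open(out_path, "w", encoding="utf-8") as out:
        with fileinput.input(paths, openhook=hook) as lines:
            for line in lines:
                # isfirstline() is True on the first line of each file;
                # lineno() is cumulative, so it is 1 only for the very first line read.
                if lines.isfirstline() and lines.lineno() > 1:
                    out.write(separator)
                out.write(line)
```

Because the separator is written before the first line of every file except the first, the lines inside each file keep their original spacing.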
Reply

