Python Forum

import bs4 as bs
import urllib.request
import re
import os
from colorama import Fore, Back, Style, init
init()
 
 
def highlight(word):
    if word in keywords:
        return Fore.RED + str(word) + Fore.RESET
    else:
        return str(word)
 
 
for newurl in newurls:
    url = urllib.request.urlopen(newurl)
    soup1 = bs.BeautifulSoup(url, 'lxml')
    paragraphs = soup1.find_all('p')
    print(Fore.GREEN + soup1.h2.text + Fore.RESET)
    print('')
    for paragraph in paragraphs:
        if paragraph is not None:
            textpara = paragraph.text.strip().split(' ')
            colored_words = list(map(highlight, textpara))
            print(" ".join(colored_words).encode("utf-8")) #encode("utf-8")
        else:
            pass

I will have a list of keywords and URLs to go through. After running a few keywords against a URL, I get output like this:

Output:
b'\x1b[31mthe desired \x1b[31mmystery corners \x1b[31mthe differential . 
\x1b[31mthe back \x1b[31mpretends to be \x1b[31mthe'

When I remove encode("utf-8"), I get an encoding error instead:

Output:
Traceback (most recent call last):
  File "C:\Users\resea\Desktop\Python Projects\Try 3.py", line 52, in <module>
    print(" ".join(colored_words)) #encode("utf-8")
  File "C:\Python34\lib\site-packages\colorama\ansitowin32.py", line 41, in write
    self.__convertor.write(text)
  File "C:\Python34\lib\site-packages\colorama\ansitowin32.py", line 162, in write
    self.write_and_convert(text)
  File "C:\Python34\lib\site-packages\colorama\ansitowin32.py", line 190, in write_and_convert
    self.write_plain_text(text, cursor, len(text))
  File "C:\Python34\lib\site-packages\colorama\ansitowin32.py", line 195, in write_plain_text
    self.wrapped.write(text[start:end])
  File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 23: character maps to <undefined>

Can you help me see where I am going wrong, please? Do I need to use a different encoding? Thanks.

You show:

print(" ".join(colored_words).encode("utf-8")) #encode("utf-8")

Yet the error shows:

print(" ".join(colored_words)) #encode("utf-8")

That is not the same code. Also, try encoding = None, although that is a long shot.
See https://wiki.python.org/moin/PrintFails regarding the character U+2019.
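
For what it's worth, here is a rough sketch of what seems to be happening in both cases, not a definitive fix. Printing the result of encode("utf-8") prints a bytes object, so you see its repr (the b'...' with \x1b escapes) instead of coloured text. Printing the str directly fails because the traceback shows the console using cp850, which has no mapping for '\u2019' (the right single quotation mark). The helper name printable below is made up for illustration, and the sample sentence is invented; cp850 is taken from the traceback.

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target encoding cannot
    represent replaced by '?', so that print() will not raise."""
    # Assumption: use the console's reported encoding, else UTF-8.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # Encode with errors="replace", then decode back so the result is a str.
    return text.encode(encoding, errors="replace").decode(encoding)


# Invented sample text containing U+2019, the character from the traceback.
line = "the car\u2019s differential"

# Encoding yields bytes; print() shows bytes as their repr, hence b'...'.
print(repr(line.encode("utf-8")))

# Round-tripping through cp850 keeps a str and swaps the quote for '?'.
print(printable(line, "cp850"))
```

In the loop from the original post, that would mean something like print(printable(" ".join(colored_words))) in place of the encode call. The colour escape sequences are plain ASCII, so they pass through unchanged; only characters the console cannot show are replaced.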

kapilan15

Larz60+