Python Forum
comparing 2 lists and highlighting key elements - Printable Version

+--- Thread: comparing 2 lists and highlighting key elements (/thread-15379.html)

comparing 2 lists and highlighting key elements - kapilan15 - Jan-15-2019

from colorama import Fore, init


key_words = ['mystery', 'the', 'charge', 'pretends']
paragraph_split = ['the', 'desired', 'mystery', 'corners', 'the', 'differential', '.', 'the', 'back', 'pretends', 'to', 'be', 'the']

def highlight(word):
    if word in key_words:
        return Fore.RED + str(word)
    return str(word)

colored_paragraph = list(map(highlight, paragraph_split))
print (" ".join(colored_paragraph))
Essentially, I want the code to go through each word in the paragraph_split list and check whether that word exists in the key_words list.

If it exists, I want to replace the word with a red-font version and move on to the next word.

Then I want to join the words in the colored_paragraph list and print them out as a sentence.

The output should have the key words colored in red and the others in the normal color.

I ran my code in cmd and it gives all of the output in red font rather than just the words in key_words:
the desired mystery corners the differential . the back pretends to be the

Then I added encode("utf-8") in the print statement so it becomes
print (" ".join(colored_paragraph).encode("utf-8"))
This gave me
b'\x1b[31mthe desired \x1b[31mmystery corners \x1b[31mthe differential . \x1b[31mthe back \x1b[31mpretends to be \x1b[31mthe'
The words were not printed in color. Can someone help me with this, please? Thanks.
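
A minimal sketch of why the encoded version shows b'...' instead of colors, assuming only standard Python 3 behavior: encode() returns a bytes object, and print() writes the repr of a bytes object, so the escape character is shown as the literal text \x1b rather than being sent to the terminal. The RED constant is the sequence seen in the output above, and the short line is made up for illustration.

```python
RED = "\x1b[31m"  # the escape sequence that appears in the output above

line = RED + "the" + " desired"   # an ordinary str containing a real ESC character
encoded = line.encode("utf-8")    # a bytes object, no longer a str

# print(encoded) writes the repr of the bytes object, which is this text:
shown = repr(encoded)
print(shown)

# The ESC character has been turned into the four visible characters \x1b,
# so the terminal never receives a colour code.
print("\x1b" in line, "\x1b" in shown)
```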

RE: comparing 2 lists and highlighting key elements - buran - Jan-15-2019

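You need to reset back to the normal color after each highlighted word, e.g.
print('\033[39m')  # this resets the foreground to the default color
return Fore.RED + word + Fore.RESET
Alternatively, supply the autoreset=True argument when you call init (the default is False).

In that case you don't need Fore.RESET at the end of each print call, because colorama resets the style after every print. Words highlighted in the middle of a single joined line still need their own Fore.RESET.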
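
A minimal sketch of the per-word reset applied to the lists from the first post. The fallback constants are an assumption for machines without colorama; they are the raw ANSI sequences that Fore.RED and Fore.RESET stand for. Using a set for key_words is a small convenience, not something the original code requires.

```python
try:
    from colorama import Fore, init
    init()  # on Windows cmd this lets the ANSI colour codes take effect
    RED, RESET = Fore.RED, Fore.RESET
except ImportError:
    # Fallback (assumption): the raw ANSI sequences behind Fore.RED / Fore.RESET
    RED, RESET = "\x1b[31m", "\x1b[39m"

key_words = {"mystery", "the", "charge", "pretends"}  # a set gives fast lookups


def highlight(word):
    """Return the word wrapped in red if it is a key word, otherwise unchanged."""
    if word in key_words:
        return RED + word + RESET  # reset immediately after the word
    return word


paragraph_split = ["the", "desired", "mystery", "corners"]
colored = " ".join(highlight(w) for w in paragraph_split)
print(colored)
```

Because each key word carries its own reset, the words that follow it in the joined line stay in the default color.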

RE: comparing 2 lists and highlighting key elements - kapilan15 - Jan-15-2019

import bs4 as bs
import urllib.request
import re
import os
from colorama import Fore, Back, Style, init

def highlight(word):
    if word in keywords:
        return Fore.RED + str(word) + Fore.RESET
    return str(word)

for newurl in newurls:
    url = urllib.request.urlopen(newurl)
    soup1 = bs.BeautifulSoup(url, 'lxml')
    paragraphs = soup1.findAll('p')
    print (Fore.GREEN + soup1.h2.text + Fore.RESET)
    for paragraph in paragraphs:
        if paragraph is not None:
            textpara = paragraph.text.strip().split(' ')
            colored_words = list(map(highlight, textpara))
            print(" ".join(colored_words).encode("utf-8")) #encode("utf-8")
Thanks, it works on the previous code. This is slightly different code with similar content: I will have a list of key words and a list of URLs to go through.
After adding your suggestion, I ran a few key words against a URL, and it gives me output similar to the previous code:
b'\x1b[31mthe desired \x1b[31mmystery corners \x1b[31mthe differential . \x1b[31mthe back \x1b[31mpretends to be \x1b[31mthe'
If I remove encode("utf-8"), cmd gives me an encoding error:
Traceback (most recent call last):
  File "C:\Users\resea\Desktop\Python Projects\Try", line 52, in <module>
    print(" ".join(colored_words)) #encode("utf-8")
  File "C:\Python34\lib\site-packages\colorama\", line 41, in write
    self.__convertor.write(text)
  File "C:\Python34\lib\site-packages\colorama\", line 162, in write
    self.write_and_convert(text)
  File "C:\Python34\lib\site-packages\colorama\", line 190, in write_and_convert
    self.write_plain_text(text, cursor, len(text))
  File "C:\Python34\lib\site-packages\colorama\", line 195, in write_plain_text
    self.wrapped.write(text[start:end])
  File "C:\Python34\lib\encodings\", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 23: character maps to <undefined>
Can you help me see where I am going wrong, please?
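
One possible way around the error, sketched under assumptions: the traceback suggests the console's code page has no mapping for '\u2019' (a curly apostrophe), so the text can be round-tripped through the console encoding with errors="replace" and printed as a normal str. The helper name console_safe is hypothetical, and cp437 is used only as an example code page; the one actually in use is not visible in the traceback.

```python
import sys


def console_safe(text, encoding=None):
    """Replace characters the given (or console) encoding cannot represent with '?'."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # Encode and decode again: the result is still a str, so ANSI colour
    # codes stay real escape characters instead of turning into b'...' text.
    return text.encode(encoding, errors="replace").decode(encoding)


# Illustrative line with a red word and a curly apostrophe (U+2019)
line = "\x1b[31mthe\x1b[39m writer\u2019s desk"

# cp437 (an assumed example code page) has no curly apostrophe,
# so it is replaced while the colour codes are left alone.
print(console_safe(line, "cp437"))
```

In the loop above this would be used as print(console_safe(" ".join(colored_words))), without the trailing encode("utf-8").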