Python Forum
Reading HTML and adding a checkbox to HTML
#1
I have a question about reading an HTML file and editing it by adding a checkbox before the <a> tag of every link.

I have a test.html that looks like this:
Quote:
<a href="https://www.google.com">Link </a><br><a href="https://www.youtube.com">Link </a><br><a href="https://www.instagram.com">Link </a><br>


I would like my output to look like this, with a checkbox added before each link:
<input type="checkbox"> <a href="https://www.google.com">Link </a><br>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a>
Does anyone have any idea? I tried this, but it does not seem to work:
lines = []
#open file
with open(r'test.html', mode='r') as f:
    for line in f.readlines(): # iterate thru the lines    
        if '<br>' in line:
            text = '<input type="checkbox">'
            lines.append(text)    
            lines.append(line)
        

#write to a new file
with open('test.html', mode='w') as new_f:
    new_f.writelines(lines)
I think I need to add a newline after each <br>, otherwise it won't work.
Does anyone know how to solve this?

Steps:
Read test.html
Edit test.html, adding a checkbox (<input type="checkbox">) before each <a> tag
#2
I would do it like this, and also remove the <br> tags so the markup is clean; it's better to use CSS for the line breaks.
You could also do this with a parser, e.g. BeautifulSoup, but here is a simple version without one.
with open(r'test.html') as f, open('out.html', 'w') as f_out:
    for line in f:
        line = line.replace('<br>', '')
        #print(f'<input type="checkbox"> {line}')
        f_out.write(f'<input type="checkbox"> {line}')
Output:
<input type="checkbox"> <a href="https://www.google.com">Link </a>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a>
Example with CSS CodePen
Maybe also add a <ul> Tag for better CSS.
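
For comparison, here is a variant that does not depend on how the file is split into lines. Since the checkbox only has to go in front of each opening <a tag, a regular-expression substitution can do it in one pass. This is a minimal sketch, not code from the posts in this thread; the function name add_checkboxes is made up for illustration, and a real parser such as BeautifulSoup is likely to be more robust on messy HTML.

```python
import re

def add_checkboxes(html: str) -> str:
    """Insert a checkbox in front of every opening <a ...> tag."""
    # The group captures "<a" plus the whitespace after it, and \1 puts it
    # back unchanged right after the inserted checkbox.
    return re.sub(r'(<a\s)', r'<input type="checkbox"> \1', html)

html = ('<a href="https://www.google.com">Link </a><br>'
        '<a href="https://www.youtube.com">Link </a><br>'
        '<a href="https://www.instagram.com">Link </a><br>')
result = add_checkboxes(html)

# Add a newline after each <br> only to make the printed result readable.
print(result.replace('<br>', '<br>\n'))
```

Because the substitution works on the whole string, it behaves the same whether the links sit on one line or on several.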
#3
(Jun-30-2021, 04:51 PM)snippsat Wrote: I would do it like this, and also remove the <br> tags so the markup is clean; it's better to use CSS for the line breaks.

When I run it, I get this error:
UnicodeDecodeError: 'cp950' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

I tried adding the encoding:
with open(r'test.html',encoding="utf-8") as f, open('out123.html', 'w',encoding="utf-8") as f_out:
    for line in f:
        print(line)
        line = line.replace('<br>', '')
        print(f'<input type="checkbox"> {line}')
        f_out.write(f'<input type="checkbox"> {line}')
but the output looks like this; it still does not add a checkbox in front of every link, only the first one:
<input type="checkbox"> <a href="https://www.google.com">Link </a><a href="https://www.youtube.com">Link </a><a href="https://www.instagram.com">Link </a>
Thanks
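
A note on the error above: byte 0xbf at position 2 is consistent with the file starting with a UTF-8 byte order mark (the bytes EF BB BF), which some Windows editors add when saving. That is an assumption about this particular test.html, not something confirmed in the thread. If it holds, encoding="utf-8" avoids the error but leaves an invisible U+FEFF character at the start of the first line, while encoding="utf-8-sig" strips it. A small self-contained sketch (the file name bom_demo.html is made up for the demo):

```python
import os
import tempfile

# Build a throwaway file that starts with a UTF-8 BOM (bytes EF BB BF).
path = os.path.join(tempfile.mkdtemp(), 'bom_demo.html')
with open(path, 'wb') as f:
    f.write(b'\xef\xbb\xbf<a href="https://www.google.com">Link </a><br>')

# Plain utf-8 decodes without error but keeps the BOM as U+FEFF.
with open(path, encoding='utf-8') as f:
    with_bom = f.read()

# utf-8-sig decodes the same bytes and drops the leading BOM.
with open(path, encoding='utf-8-sig') as f:
    without_bom = f.read()

print(repr(with_bom[:3]))
print(repr(without_bom[:3]))
```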
#4
(Jul-01-2021, 04:11 AM)jacklee26 Wrote: it still does not add a checkbox in front of every link, only the first one.
OK, I thought each link in your input was on its own line; I see now that the input is one big line.
If you create this file yourself, you could probably fix that when saving it, and also save it as UTF-8.

Then it would look like this.
with open(r'test.html', encoding='utf-8') as f, open('out.html', 'w', encoding='utf-8') as f_out:
    content = f.read().split('<br>')
    for line in content[:-1]:
        #print(f'<input type="checkbox"> {line}')
        f_out.write(f'<input type="checkbox"> {line}\n')
Output:
<input type="checkbox"> <a href="https://www.google.com">Link </a>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a>
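
The content[:-1] slice is there because the input ends with <br>, so split('<br>') returns one extra trailing piece (an empty string, or just a newline if the file ends with one). A quick check on the sample string from the first post, as a sketch:

```python
html = ('<a href="https://www.google.com">Link </a><br>'
        '<a href="https://www.youtube.com">Link </a><br>'
        '<a href="https://www.instagram.com">Link </a><br>')

# Splitting on the separator leaves an empty piece after the final <br>.
parts = html.split('<br>')
print(len(parts))
print(repr(parts[-1]))

# Dropping the last piece leaves exactly the three links.
links = parts[:-1]
print(len(links))
```

Without the slice, the loop would write a fourth checkbox with no link after it.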
With a <ul> tag, as I mentioned earlier:
with open(r'test.html', encoding='utf-8') as f, open('out.html', 'w', encoding='utf-8') as f_out:
    content = f.read().split('<br>')
    # Plain strings are enough here; no f-string placeholders are needed.
    f_out.write('<ul id="links">\n')
    for line in content[:-1]:
        f_out.write(f'<input type="checkbox"> {line}\n')
    f_out.write('</ul>\n')
CSS CodePen
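
One caveat with the <ul> version: in HTML, the children of a <ul> are expected to be <li> elements, so browsers will render the snippet above but a validator will complain. Below is a sketch that wraps each checkbox and link in an <li>. The helper name build_list is invented for this example; the id="links" is taken from the code above.

```python
def build_list(html: str) -> str:
    """Turn '<a ...>...</a><br>' runs into a <ul> of checkbox + link items."""
    # Split on <br> and skip empty pieces such as the one after the last <br>.
    links = [part.strip() for part in html.split('<br>') if part.strip()]
    lines = ['<ul id="links">']
    for link in links:
        lines.append(f'  <li><input type="checkbox"> {link}</li>')
    lines.append('</ul>')
    return '\n'.join(lines) + '\n'

sample = ('<a href="https://www.google.com">Link </a><br>'
          '<a href="https://www.youtube.com">Link </a><br>')
output = build_list(sample)
print(output)
```

Filtering out empty pieces replaces the content[:-1] slice, so the function also copes with input that does not end in <br>.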
#5
(Jul-01-2021, 08:43 AM)snippsat Wrote: OK, I thought each link in your input was on its own line; I see now that the input is one big line.

Thanks, that works.
But I have a question: why doesn't my code below work? Only the first line gets a checkbox; the rest don't. I just don't know why.
with open('test.html', 'r',encoding="utf-8") as file:
    # read a list of lines into data
    data = file.readlines()
    for i in data:
        line = i.replace('<br>', '<br> \n')
        with open('test_out.html', 'w',encoding="utf-8") as file:
            file.write(f'<input type="checkbox"> {line}')
            #file.write(line)
Output:
<input type="checkbox"> <a href="https://www.google.com">Link </a><br>
<a href="https://www.youtube.com">Link </a><br>
<a href="https://www.instagram.com">Link </a><br>
#6
(Jul-01-2021, 09:50 AM)jacklee26 Wrote: but I have a question: why doesn't my code below work? Only the first line gets a checkbox; the rest don't. I just don't know why
Because the input is still one line when you do it like this; you have to split on the newlines you added and then loop over the pieces.
.readlines() is almost never needed; as I did in the previous posts, you can loop directly over the file object.
with open('test.html', 'r' ,encoding="utf-8") as file, open('test_out.html', 'w', encoding="utf-8") as f_out:
    for line in file:
        line = line.replace('<br>', '<br>\n').split('\n')
        for item in line[:-1]:
            f_out.write(f'<input type="checkbox"> {item}\n')
Output:
<input type="checkbox"> <a href="https://www.google.com">Link </a><br>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a><br>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a><br>
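
To see why the version in post #5 wrote only one checkbox: iterating over a file (or calling readlines()) yields one item per newline-terminated line, and the sample input has no newlines at all, so the loop body runs once. Replacing <br> with <br> plus a newline inside that single iteration adds line breaks to the text but does not create more loop iterations. A small sketch using io.StringIO as a stand-in for the open file:

```python
import io

html = ('<a href="https://www.google.com">Link </a><br>'
        '<a href="https://www.youtube.com">Link </a><br>'
        '<a href="https://www.instagram.com">Link </a><br>')

# StringIO behaves like an open text file for iteration purposes.
lines = list(io.StringIO(html))
print(len(lines))  # the whole document arrives as a single "line"

# One iteration means one prefix, no matter how many \n are added afterwards.
single = lines[0].replace('<br>', '<br>\n')
written = f'<input type="checkbox"> {single}'
print(written.count('<input type="checkbox">'))
```

Separately, opening the output file with mode 'w' inside the loop truncates it on every pass, so with a multi-line input only the last line would survive. Opening it once outside the loop, as in the reply above, avoids that.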