Python Forum
Reading HTML and adding checkboxes to it
#1
I have a question about reading an HTML file and editing it by adding a checkbox before the <a> tag of every link.

I have a test.html that looks like this:
Quote:
<a href="https://www.google.com">Link </a><br><a href="https://www.youtube.com">Link </a><br><a href="https://www.instagram.com">Link </a><br>


I would like the output to look like this, with a checkbox before each link:
<input type="checkbox"> <a href="https://www.google.com">Link </a><br>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a>
Does anyone have an idea? I tried this, but it does not seem to work:
lines = []
#open file
with open(r'test.html', mode='r') as f:
    for line in f.readlines(): # iterate thru the lines    
        if '<br>' in line:
            text = '<input type="checkbox">'
            lines.append(text)    
            lines.append(line)
        

#write to a new file
with open('test.html', mode='w') as new_f:
    new_f.writelines(lines)
I think I need to add a new line after each <br>, otherwise it won't work.
Does anyone know how to solve this?

Steps:
Read test.html
Edit test.html, adding a checkbox (<input type="checkbox">) before each <a> tag
#2
I would do it like this, and also remove the <br> tags so the markup is clean; it is better to handle line breaks with CSS.
You could also do this with a parser, e.g. BeautifulSoup, but I am keeping it simple without one.
with open(r'test.html') as f, open('out.html', 'w') as f_out:
    for line in f:
        line = line.replace('<br>', '')
        #print(f'<input type="checkbox"> {line}')
        f_out.write(f'<input type="checkbox"> {line}')
Output:
<input type="checkbox"> <a href="https://www.google.com">Link </a>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a>
Example with CSS CodePen
Maybe also add a <ul> Tag for better CSS.
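
Short of a full parser, a regular expression can place the checkbox in front of every opening <a> tag regardless of how the file is broken into lines. This is only a sketch: the helper name add_checkboxes is made up for illustration, and a regex like this will also match <a inside comments or scripts, which a real parser would not.

```python
import re

def add_checkboxes(html):
    """Hypothetical helper: put a checkbox before each opening <a ...> tag."""
    # <a\b matches "<a " or "<a>" but not "</a>" or "<abbr".
    return re.sub(r'(<a\b)', r'<input type="checkbox"> \1', html,
                  flags=re.IGNORECASE)

# The sample from the first post, all on one line.
sample = ('<a href="https://www.google.com">Link </a><br>'
          '<a href="https://www.youtube.com">Link </a><br>'
          '<a href="https://www.instagram.com">Link </a><br>')

result = add_checkboxes(sample)
print(result.replace('<br>', '<br>\n'))
```

Unlike the loop above, this keeps the <br> tags; strip them with a separate replace() call if you want CSS to handle the line breaks.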
#3
(Jun-30-2021, 04:51 PM)snippsat Wrote: I would do like this also remove <br> tag so it's clean and it's better to write CSS for new line.

When I run it, I get this error:
UnicodeDecodeError: 'cp950' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

I tried adding the encoding:
with open(r'test.html',encoding="utf-8") as f, open('out123.html', 'w',encoding="utf-8") as f_out:
    for line in f:
        print(line)
        line = line.replace('<br>', '')
        print(f'<input type="checkbox"> {line}')
        f_out.write(f'<input type="checkbox"> {line}')
but the output looks like this.
It still does not add a checkbox in front of every link, only the first one:
<input type="checkbox"> <a href="https://www.google.com">Link </a><a href="https://www.youtube.com">Link </a><a href="https://www.instagram.com">Link </a>
Thanks
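
A note on the error above: open() without an encoding argument uses the system's locale encoding (cp950 here), so passing encoding explicitly is the right fix. Byte 0xbf at position 2 is also what a UTF-8 byte order mark (EF BB BF) at the start of the file would produce. That is an assumption, since the thread does not show the raw bytes. If it holds, 'utf-8-sig' reads the file and drops the mark, while plain 'utf-8' keeps it as an invisible first character. A small sketch with a made-up sample file:

```python
import os
import tempfile

# Assumed sample: the first link from the thread, saved with a UTF-8 BOM.
raw = b'\xef\xbb\xbf<a href="https://www.google.com">Link </a><br>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.html')
    with open(path, 'wb') as f:
        f.write(raw)

    # Plain utf-8 decodes the file but keeps the BOM as '\ufeff'.
    with open(path, encoding='utf-8') as f:
        with_bom = f.read()

    # utf-8-sig decodes the same bytes and drops a leading BOM if present.
    with open(path, encoding='utf-8-sig') as f:
        without_bom = f.read()

print(with_bom == '\ufeff' + without_bom)  # True
```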
#4
(Jul-01-2021, 04:11 AM)jacklee26 Wrote: it still not add checkbox in front of the link, only the first one.
OK, I thought each link in your input was on its own line; I see now that the input is one long line.
If you create this file yourself, you could probably fix that when saving it, and also save it as UTF-8.

Then it would look like this.
with open(r'test.html', encoding='utf-8') as f, open('out.html', 'w', encoding='utf-8') as f_out:
    content = f.read().split('<br>')
    for line in content[:-1]:
        #print(f'<input type="checkbox"> {line}')
        f_out.write(f'<input type="checkbox"> {line}\n')
Output:
<input type="checkbox"> <a href="https://www.google.com">Link </a>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a>
With the <ul> tag, as I mentioned:
with open(r'test.html', encoding='utf-8') as f, open('out.html', 'w', encoding='utf-8') as f_out:
    content = f.read().split('<br>')
    f_out.write(f'<ul id="links">\n')
    for line in content[:-1]:
        f_out.write(f'<input type="checkbox"> {line}\n')
    f_out.write(f'</ul>\n')
CSS CodePen
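
A short illustration of why the loops above use content[:-1]: the text ends with <br>, so split('<br>') returns one extra, empty piece at the end. The sketch below uses the sample from the first post as an in-memory string; filtering out blank pieces is an alternative I am suggesting, not something from the posts, and it also works when the last link has no trailing <br>.

```python
html = ('<a href="https://www.google.com">Link </a><br>'
        '<a href="https://www.youtube.com">Link </a><br>'
        '<a href="https://www.instagram.com">Link </a><br>')

pieces = html.split('<br>')
print(len(pieces))        # 4: three links plus one empty string
print(repr(pieces[-1]))   # ''

# Alternative to pieces[:-1]: keep every piece that is not blank.
links = [p.strip() for p in pieces if p.strip()]

out = ''.join(f'<input type="checkbox"> {link}\n' for link in links)
print(out)
```

With content[:-1], a file whose last link is not followed by <br> would silently lose that link; the filter avoids that.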
#5
(Jul-01-2021, 08:43 AM)snippsat Wrote:

Thanks, that works.
But I have a question: why does my code below not work? Only the first line gets a checkbox; the rest do not, and I don't know why.
with open('test.html', 'r',encoding="utf-8") as file:
    # read a list of lines into data
    data = file.readlines()
    for i in data:
        line = i.replace('<br>', '<br> \n')
        with open('test_out.html', 'w',encoding="utf-8") as file:
            file.write(f'<input type="checkbox"> {line}')
            #file.write(line)
<input type="checkbox"> <a href="https://www.google.com">Link </a><br>
<a href="https://www.youtube.com">Link </a><br>
<a href="https://www.instagram.com">Link </a><br>
#6
(Jul-01-2021, 09:50 AM)jacklee26 Wrote: but i have a question why my code does work for this one, only the first line have check box, the rest don't have. Just doesn't know why
Because the input is still a single line when you do it like this; you have to split it on the new line breaks and then loop over the pieces.
.readlines() is almost never needed; as in my previous posts, you can loop directly over the file object.
with open('test.html', 'r' ,encoding="utf-8") as file, open('test_out.html', 'w', encoding="utf-8") as f_out:
    for line in file:
        line = line.replace('<br>', '<br>\n').split('\n')
        for item in line[:-1]:
            f_out.write(f'<input type="checkbox"> {item}\n')
Output:
<input type="checkbox"> <a href="https://www.google.com">Link </a><br>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a><br>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a><br>
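
To make the explanation above concrete, here is a small sketch that uses io.StringIO as a stand-in for the opened test.html (an assumption for illustration, so nothing is written to disk). It shows that a file with no newlines yields a single item when looped over, so the checkbox prefix is added once no matter how many newlines replace() inserts afterwards.

```python
import io

one_line = ('<a href="https://www.google.com">Link </a><br>'
            '<a href="https://www.youtube.com">Link </a><br>'
            '<a href="https://www.instagram.com">Link </a><br>')

# Looping over a file object yields one item per line; here there is one.
lines = list(io.StringIO(one_line))
print(len(lines))  # 1

# First attempt: one loop pass, so one prefix for the whole text.
first_attempt = ''
for line in lines:
    line = line.replace('<br>', '<br> \n')
    first_attempt += f'<input type="checkbox"> {line}'

# Fix: split on the inserted newlines and prefix every piece.
fixed = ''
for line in lines:
    for item in line.replace('<br>', '<br>\n').split('\n')[:-1]:
        fixed += f'<input type="checkbox"> {item}\n'

print(first_attempt.count('<input'), fixed.count('<input'))  # 1 3
```

Separately, opening test_out.html with mode 'w' inside the loop truncates the file on every pass, so with multi-line input only the last line would survive. Opening the output file once, outside the loop, avoids that.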