Python Forum

Reading HTML and adding checkboxes to it
I have a question about reading HTML files and editing them by adding a checkbox before the <a> tag of every link.

I have a test.html that looks like this:
<a href="https://www.google.com">Link </a><br><a href="https://www.youtube.com">Link </a><br><a href="https://www.instagram.com">Link </a><br>


I would like the output to have a checkbox before each link, like this:
<input type="checkbox"> <a href="https://www.google.com">Link </a><br>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a>
Does anyone have an idea? I tried this, but it doesn't seem to work:
lines = []
# open the file
with open(r'test.html', mode='r') as f:
    for line in f.readlines():  # iterate through the lines
        if '<br>' in line:
            text = '<input type="checkbox">'
            lines.append(text)
            lines.append(line)

# write the result back (note: this overwrites test.html)
with open('test.html', mode='w') as new_f:
    new_f.writelines(lines)
I think I need to add a newline after <br>, otherwise it won't work.
Does anyone know how to solve it?

Steps:
Read test.html
Edit test.html, adding a checkbox before each <a> tag
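Because the sample test.html is one single line, a line-by-line loop only ever sees one line. A minimal sketch that works regardless of line layout is a regular-expression substitution over the whole text. It assumes simple markup where "<a" never appears inside comments, scripts or attribute values; the names add_checkboxes and convert_file are hypothetical.

```python
import re

# Start of an opening <a> tag: "<a" followed by whitespace or ">".
# Assumption: simple markup, no "<a" inside comments, scripts or attribute values.
A_TAG = re.compile(r"<a(?=[\s>])", re.IGNORECASE)
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def add_checkboxes(html: str) -> str:
    """Put a checkbox in front of every <a> tag, one link per output line."""
    html = BR_TAG.sub("", html)  # drop <br>; CSS can handle line breaks
    # \g<0> re-inserts the matched "<a" unchanged, after a newline and checkbox.
    html = A_TAG.sub(r'\n<input type="checkbox"> \g<0>', html)
    return html.strip() + "\n"


def convert_file(src: str, dst: str) -> None:
    # "utf-8-sig" also accepts UTF-8 files that start with a byte-order mark.
    with open(src, encoding="utf-8-sig") as f:
        html = f.read()
    with open(dst, "w", encoding="utf-8") as f_out:
        f_out.write(add_checkboxes(html))
```

Calling convert_file('test.html', 'out.html') writes the result to a new file instead of overwriting the input.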
I would do it like this, and also remove the <br> tags so the markup is clean; it's better to use CSS for line breaks.
You could also do this with a parser, e.g. BeautifulSoup, but here is a simple way without one.
with open(r'test.html') as f, open('out.html', 'w') as f_out:
    for line in f:
        line = line.replace('<br>', '')
        #print(f'<input type="checkbox"> {line}')
        f_out.write(f'<input type="checkbox"> {line}')
Output:
<input type="checkbox"> <a href="https://www.google.com">Link </a>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a>
Example with CSS: CodePen
Maybe also add a <ul> tag, which makes styling with CSS easier.
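The parser route mentioned above can also be sketched without installing anything. This uses html.parser.HTMLParser from the standard library, not BeautifulSoup, and simply re-emits the parsed tags, adding a checkbox before each <a> and dropping <br>. The class and function names are hypothetical, and the sketch only re-emits the node types it handles (tags, text, references, comments, declarations).

```python
from html.parser import HTMLParser


class CheckboxInserter(HTMLParser):
    """Re-emit parsed HTML, adding a checkbox before each <a> tag and
    dropping <br> tags. Hypothetical helper built on the standard library."""

    def __init__(self) -> None:
        # Keep entity/char references unconverted so they are re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            return  # drop <br>; leave line breaks to CSS
        if tag == "a":
            self.parts.append('<input type="checkbox"> ')
        # get_starttag_text() gives the tag exactly as written in the source.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> or <img ... />.
        if tag != "br":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "br":
            return
        # Put each link on its own line in the output.
        self.parts.append(f"</{tag}>\n" if tag == "a" else f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def add_checkboxes_parsed(html: str) -> str:
    parser = CheckboxInserter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

Unlike plain string replacement, this does not care whether the links are on one line or many, since it works on tags rather than lines.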
(Jun-30-2021, 04:51 PM) snippsat wrote: I would do it like this, and also remove the <br> tags so the markup is clean.

Why do I get this error when I run it?
UnicodeDecodeError: 'cp950' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

I tried adding encoding="utf-8":
with open(r'test.html',encoding="utf-8") as f, open('out123.html', 'w',encoding="utf-8") as f_out:
    for line in f:
        print(line)
        line = line.replace('<br>', '')
        print(f'<input type="checkbox"> {line}')
        f_out.write(f'<input type="checkbox"> {line}')
but the output looks like this; it still doesn't add a checkbox in front of each link, only the first one:
<input type="checkbox"> <a href="https://www.google.com">Link </a><a href="https://www.youtube.com">Link </a><a href="https://www.instagram.com">Link </a>
Thanks
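About the UnicodeDecodeError: without an encoding argument, open() uses the locale's preferred encoding by default, which on a Traditional Chinese Windows system is cp950. The failing byte 0xbf at position 2 matches the third byte of the UTF-8 byte-order mark (EF BB BF), so test.html was most likely saved as "UTF-8 with BOM"; that is an inference from the error message, not something confirmed in the thread. Reading with "utf-8" then keeps the BOM as a U+FEFF character at the start of the text, while "utf-8-sig" strips it. A small sketch with a hypothetical temporary file:

```python
import os
import tempfile

# Hypothetical sample: the first link from test.html, saved as UTF-8 *with* a BOM.
data = '<a href="https://www.google.com">Link </a><br>'.encode("utf-8-sig")
assert data[:3] == b"\xef\xbb\xbf"  # the UTF-8 byte-order mark

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.html")
    with open(path, "wb") as f:
        f.write(data)

    # "utf-8" decodes the BOM into the character U+FEFF at the start.
    with open(path, encoding="utf-8") as f:
        plain = f.read()

    # "utf-8-sig" removes a leading BOM if present (and reads plain UTF-8 too).
    with open(path, encoding="utf-8-sig") as f:
        sig = f.read()

print(repr(plain[:3]))  # '\ufeff<a'
print(repr(sig[:2]))    # '<a'
```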
(Jul-01-2021, 04:11 AM) jacklee26 wrote: it still doesn't add a checkbox in front of each link, only the first one.
OK, I thought your input was on separate lines; now I see that it is one big line.
If you create this file yourself, you could probably fix that when you save it, and also save it as UTF-8.

Then it can be done like this:
with open(r'test.html', encoding='utf-8') as f, open('out.html', 'w', encoding='utf-8') as f_out:
    content = f.read().split('<br>')
    for line in content[:-1]:
        #print(f'<input type="checkbox"> {line}')
        f_out.write(f'<input type="checkbox"> {line}\n')
Output:
<input type="checkbox"> <a href="https://www.google.com">Link </a>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a>
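The content[:-1] slice relies on the text ending with <br> (possibly followed by a newline), so that the piece after the final <br> holds no link. If the last link is not followed by <br>, the slice silently drops it. A variant that instead skips empty or whitespace-only pieces could look like this; checkbox_lines is a hypothetical name.

```python
def checkbox_lines(html: str) -> list[str]:
    """Split on <br> and prefix every non-empty piece with a checkbox."""
    lines = []
    for piece in html.split("<br>"):
        piece = piece.strip()  # drop stray newlines/spaces around the link
        if piece:  # skip empty pieces instead of assuming the last one is empty
            lines.append(f'<input type="checkbox"> {piece}')
    return lines
```

In the loop above it would be used as f_out.write("\n".join(checkbox_lines(f.read())) + "\n").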
With the <ul> tag, as I mentioned:
with open(r'test.html', encoding='utf-8') as f, open('out.html', 'w', encoding='utf-8') as f_out:
    content = f.read().split('<br>')
    f_out.write(f'<ul id="links">\n')
    for line in content[:-1]:
        f_out.write(f'<input type="checkbox"> {line}\n')
    f_out.write(f'</ul>\n')
CSS CodePen
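Inside a <ul>, the HTML content model only allows <li> children, so checkboxes and links placed directly in the list are, strictly speaking, invalid list content that browsers render anyway. A sketch that wraps each entry in <li> and keeps the "links" id; the function name is hypothetical, and the CSS from the CodePen example is not reproduced here and may need adjusting.

```python
def links_to_list(html: str, list_id: str = "links") -> str:
    """Build a <ul> with one <li> (checkbox + link) per <br>-separated piece."""
    items = [piece.strip() for piece in html.split("<br>")]
    rows = [
        f'  <li><input type="checkbox"> {item}</li>'
        for item in items
        if item  # skip the empty piece after the last <br>
    ]
    return "\n".join([f'<ul id="{list_id}">', *rows, "</ul>"]) + "\n"
```

If the list bullets are unwanted, CSS such as #links { list-style: none; } hides them.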
(Jul-01-2021, 08:43 AM) snippsat wrote: OK, I thought your input was on separate lines; now I see that it is one big line.

Thanks, it works.
But I have a question: why doesn't my code below work? Only the first line gets a checkbox, the rest don't, and I don't know why.
with open('test.html', 'r',encoding="utf-8") as file:
    # read a list of lines into data
    data = file.readlines()
    for i in data:
        line = i.replace('<br>', '<br> \n')
        with open('test_out.html', 'w',encoding="utf-8") as file:
            file.write(f'<input type="checkbox"> {line}')
            #file.write(line)
<input type="checkbox"> <a href="https://www.google.com">Link </a><br>
<a href="https://www.youtube.com">Link </a><br>
<a href="https://www.instagram.com">Link </a><br>
(Jul-01-2021, 09:50 AM) jacklee26 wrote: But I have a question: why doesn't my code below work? Only the first line gets a checkbox, the rest don't.
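Because it is still one line when you do it like this: replace() puts newlines into that one string, but the f-string prefixes the whole string only once. You have to split on the newline and then loop over the pieces.
.readlines() is pretty much never needed; as I did in previous posts, loop directly over the file object.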
with open('test.html', 'r' ,encoding="utf-8") as file, open('test_out.html', 'w', encoding="utf-8") as f_out:
    for line in file:
        line = line.replace('<br>', '<br>\n').split('\n')
        for item in line[:-1]:
            f_out.write(f'<input type="checkbox"> {item}\n')
Output:
<input type="checkbox"> <a href="https://www.google.com">Link </a><br>
<input type="checkbox"> <a href="https://www.youtube.com">Link </a><br>
<input type="checkbox"> <a href="https://www.instagram.com">Link </a><br>
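To see the difference in isolation: str.replace returns one string that now contains newlines, and an f-string adds the prefix to that whole string once, whereas splitting first gives one piece per link. A minimal comparison on the sample line; prefix_once and prefix_each are hypothetical names.

```python
SAMPLE = ('<a href="https://www.google.com">Link </a><br>'
          '<a href="https://www.youtube.com">Link </a><br>'
          '<a href="https://www.instagram.com">Link </a><br>')


def prefix_once(line: str) -> str:
    # What the earlier loop did: one replace, one prefix for the whole string.
    line = line.replace("<br>", "<br> \n")
    return f'<input type="checkbox"> {line}'


def prefix_each(line: str) -> str:
    # Split into one piece per link, then prefix every non-empty piece.
    pieces = line.replace("<br>", "<br>\n").split("\n")
    return "".join(f'<input type="checkbox"> {p}\n' for p in pieces if p)


print(prefix_once(SAMPLE).count("checkbox"))  # 1
print(prefix_each(SAMPLE).count("checkbox"))  # 3
```

Separately, the earlier code opened test_out.html with mode 'w' inside the loop, which truncates the file on every iteration; with more than one input line only the last one would survive. Opening the output file once, next to the input as in the version above, avoids that.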