Python Forum
HTML file crashes program
#11
Riding my bicycle to work this beautiful but cold morning, I thought of another way to do this.

Just out of interest, no modules needed, if you have the html.

def myApp():
    # get the html somehow first, then open it

    path2text = '/home/pedro/temp/lovely_louise.html'
    with open(path2text) as f:
        lines = f.readlines()

    print('lines is', len(lines), 'long')

    # get lines with '<img src="' because they contain pictures
    # put these lines in data
    data = []
    for line in lines:
        if '<img src="' in line:
            data.append(line)

    print('data is', len(data), 'long')

    # have a look at the data
    for d in data:
        print(d)

    jpg_names = []
    
    # split each line on img src=
    # you get the list splitline
    # the second element of the list splitline, splitline[1], contains the name of the picture file   

    for line in data:
        print(line)
        splitline = line.split('img src=')
        pic_data = splitline[1]
        pic_datalist = pic_data.split()
        name = pic_datalist[0]
        # maybe the picture file name is enclosed in ' ' otherwise by " ", get rid of them
        # maybe there is some leading or trailing space in the html
        # before or after the file name
        filename = name.replace('"', '').replace('\'', '').replace(' ', '')
        jpg_names.append(filename)

    print('jpg_names is', len(jpg_names), 'long')

    for j in jpg_names:
        print(j)

    savename = '/home/pedro/temp/photo_names.txt'

    with open(savename, 'w') as f:
        text = '\n'.join(jpg_names)
        f.write(text)

    print('All done!')


if __name__ == '__main__':
    myApp()
#12
(Dec-30-2021, 04:29 AM)Pedroski55 Wrote: Just out of interest, no modules needed, if you have the html.
Good effort, but it misses 10 image links.
If you run my code, you will see it download 62 images.
You run into a problem similar to using regex, which cannot manage all the HTML rules.
To see all the image links from a local file, this is a little simpler than your code, and it finds all 62 image links.
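from bs4 import BeautifulSoup

# Parse the saved page; the 'lxml' parser requires the lxml package
with open('Louise.html', encoding='ISO-8859-1') as f:
    soup = BeautifulSoup(f, 'lxml')

# Print the src attribute of every <img> tag
for im in soup.select('img'):
    print(im.get('src'))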
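The thread does not say which 10 tags were missed. As a rough illustration of how line-by-line substring matching can undercount, here is a small sketch on a made-up HTML fragment (the `sample` string and the helper names are hypothetical, not taken from the page discussed here). It assumes the usual culprits: two images on one line, an attribute placed before `src`, and upper-case or single-quoted markup.

```python
# Hypothetical fragment for illustration only; not the page from this thread.
sample = '''<p><img src="a.jpg"> <img src="b.jpg"></p>
<img class="thumb" src="c.jpg">
<IMG SRC='d.jpg'>
'''


def count_by_substring(html):
    """Count lines containing the literal text '<img src="' (the test used in post #11)."""
    return sum(1 for line in html.splitlines() if '<img src="' in line)


def count_img_tags(html):
    """Count every '<img' occurrence, ignoring case, like an editor search would."""
    return html.lower().count('<img')


print('lines matched by substring test:', count_by_substring(sample))
print('img tags actually present:', count_img_tags(sample))
```

On this fragment the substring test matches a single line, although four image tags are present.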
#13
@snippsat You are right!

I just opened the HTML file in gedit, and a search for <img gives 62 matches.

I changed my soupless code and now get the correct number of image files!

That was interesting!
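The changed code is not shown in the thread. For anyone who wants a version without BeautifulSoup, here is a minimal sketch that uses `html.parser` from the standard library, so nothing has to be installed. The names `ImgSrcCollector`, `image_sources` and `image_sources_from_file` are made up for this example, and the ISO-8859-1 default is only an assumption carried over from post #12.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <IMG SRC=...> also matches.
        # Self-closing tags (<img ... />) reach this method via handle_startendtag.
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def image_sources(html_text):
    """Return the src values of all <img> tags in html_text, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


def image_sources_from_file(path, encoding='ISO-8859-1'):
    """Read a saved page and return its image links (encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return image_sources(f.read())


# Small self-check on a made-up fragment
demo = '<p><img src="a.jpg"><IMG SRC=\'b.jpg\' /></p>\n<img class="t" src="c.jpg">'
print(image_sources(demo))
```

Because the parser works on tags and not on lines, several images on one line, or attributes placed before `src`, are all picked up. Calling `image_sources_from_file('Louise.html')` would then be the soupless counterpart of the BeautifulSoup loop.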