HTML file crashes program

Pedroski55 · Dec-30-2021, 04:29 AM

Riding my bicycle to work this beautiful but cold morning, I thought of another way to do this.

Just out of interest, no modules needed, if you have the html.

def myApp():
    # get the html somehow first, then open it

    path2text = '/home/pedro/temp/lovely_louise.html'
    with open(path2text) as f:
        lines = f.readlines()

    print('lines is', len(lines), 'long')

    # get lines with '<img src="' because they contain pictures
    # put these lines in data
    data = []
    for line in lines:
        if '<img src="' in line:
            data.append(line)

    print('data is', len(data), 'long')

    # have a look at the data
    for d in data:
        print(d)

    jpg_names = []
    
    # split each line on img src=
    # you get the list splitline
    # the second element of the list splitline, splitline[1], contains the name of the picture file   

    for line in data:
        print(line)
        splitline = line.split('img src=')
        pic_data = splitline[1]
        pic_datalist = pic_data.split()
        name = pic_datalist[0]
        # maybe the picture file name is enclosed in ' ' otherwise by " ", get rid of them
        # maybe there is some leading or trailing space in the html
        # before or after the file name
        filename = name.replace('"', '').replace('\'', '').replace(' ', '')
        jpg_names.append(filename)

    print('jpg_names is', len(jpg_names), 'long')

    for j in jpg_names:
        print(j)

    savename = '/home/pedro/temp/photo_names.txt'

    with open(savename, 'w') as f:
        text = '\n'.join(jpg_names)
        f.write(text)

    print('All done!')

snippsat · (This post was last modified: Dec-30-2021, 04:26 PM by snippsat.)

(Dec-30-2021, 04:29 AM)Pedroski55 Wrote: Just out of interest, no modules needed, if you have the html.

Good effort, but it misses 10 image links.

If you run my code, you will see that it downloads 62 images.
You run into a similar problem as with regex, which cannot handle all the HTML rules.
If you want to see all the image links using a local file, this is a little simpler than your code, and it finds all 62 image links.

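from bs4 import BeautifulSoup

# Parse the saved page; the 'lxml' parser needs the lxml package installed
with open('Louise.html', encoding='ISO-8859-1') as f:
    soup = BeautifulSoup(f, 'lxml')

# Print the src attribute of every <img> tag
for im in soup.select('img'):
    print(im.get('src'))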
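
To illustrate why a line-based check such as `'<img src="' in line` can miss links, here is a minimal sketch. The sample lines are made up for the demonstration and are not taken from Louise.html, and `names_by_substring` is a hypothetical helper that only mimics the approach. It probably fails on the real page for similar reasons: an attribute placed before `src`, upper-case tags, or several images on one line.

```python
# Hypothetical sample lines, NOT taken from Louise.html (5 images in total).
sample_lines = [
    '<img src="a.jpg" alt="one">',                        # found
    '<img class="thumb" src="b.jpg">',                    # missed: attribute before src
    '<IMG SRC="c.jpg">',                                  # missed: upper case
    '<img src="d.jpg" alt="d"><img src="e.jpg" alt="e">', # only the first is found
]


def names_by_substring(lines):
    """Mimic the line-based approach: keep lines containing '<img src="'
    and take the first whitespace-separated token after 'img src='."""
    names = []
    for line in lines:
        if '<img src="' in line:
            token = line.split('img src=')[1].split()[0]
            names.append(token.replace('"', '').replace("'", ''))
    return names


found = names_by_substring(sample_lines)
print(len(found), 'of 5 images found:', found)
```

A real HTML parser looks at tags and attributes rather than raw text, so none of these variations matter to it.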

Pedroski55 · Dec-31-2021, 03:57 AM

@snippsat You are right!

I just opened the HTML file in gedit, and a search for <img gives 62 matches.

I changed my soupless code and now get the correct number of image files!
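
The changed code was not posted in the thread. As a rough sketch only, one way to stay "soupless" is `html.parser` from the standard library, so there is nothing to install. The names `ImgSrcCollector` and `collect_img_sources` and the sample HTML are assumptions made for illustration, not Pedroski55's actual solution.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case,
        # so <IMG SRC=...> is handled the same as <img src=...>.
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value:
                    self.sources.append(value)


def collect_img_sources(html_text):
    """Return the src values of all <img> tags in html_text, in order."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


# Made-up sample, not taken from the real page.
sample = (
    '<p><img src="a.jpg" alt="one"><img class="thumb" src="b.jpg"></p>\n'
    "<IMG SRC='c.jpg' />\n"
    '<img alt="no source">'
)
print(collect_img_sources(sample))
```

For a saved page, the text could be read with `open(path, encoding='ISO-8859-1').read()` and passed to `collect_img_sources`. The encoding is taken from snippsat's post and may differ for other files.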

That was interesting!
