Python Forum
HTML file crashes program
@snippsat Just trying!

I'm still not too sure how to jump ahead from the start_pos in the line start_pos = pattern1.search(line).span()[1], but I get the result I wanted!
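
One possible way to jump ahead, as a minimal sketch: the search() method of a compiled pattern accepts an optional start position, so a second search can begin at start_pos. The sample line and the names opening and closing below are made up for illustration and are not from the real HTML file.

```python
import re

# Hypothetical sample line, not taken from the original HTML file
line = '<p><img src="pics/cat.jpg" alt="a cat"></p>'

opening = re.compile(r'<img src="')
closing = re.compile(r'"')

# end of the '<img src="' match, i.e. where the file name starts
start_pos = opening.search(line).span()[1]

# search() on a compiled pattern takes an optional start position,
# so this finds the first double quote after start_pos
end_match = closing.search(line, start_pos)

img_name = line[start_pos:end_match.start()]
print(img_name)
```

This assumes the src value is closed by a double quote on the same line.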

I believe bowlfred is very good with re, maybe he can straighten out my code!

The code below gets me the desired result. The problem was that the HTML is not uniform: some img tags have no alt=", so I needed another pattern to cater for that.

#! /usr/bin/python3
import re
  
path2text = '/home/pedro/temp/lovely_louise.html'
with open(path2text) as f:
    lines = f.readlines()

print('lines is', len(lines), 'long')

# get lines with '<img src="'
data = []
for line in lines:
    if '<img src="' in line:
        data.append(line)

print('data is', len(data), 'long')

for d in data:
    print(d)

pattern1 = re.compile('<img src="')
#pattern2 = re.compile('img src=".jpg"')
pattern2 = re.compile('jpg" alt="')
pattern3 = re.compile('gif" alt="')
pattern4 = re.compile('jpg" border="')

def get_Image_name(line):
    """Return the image name from one line, or None if no pattern fits."""
    start_match = pattern1.search(line)
    if start_match is None:
        return None
    # position just after '<img src="'
    start_pos = start_match.span()[1]
    # maybe not a jpg, so try each ending pattern in turn
    for pattern in (pattern2, pattern3, pattern4):
        end_match = pattern.search(line)
        if end_match is not None:
            # + 3 keeps the three-letter extension (jpg or gif)
            end_pos = end_match.span()[0] + 3
            return line[start_pos:end_pos]
    # add more patterns for other images
    return None

# a list to take the names
jpg_names = []

# some names are not picked up: report those lines instead of crashing
for line in data:
    print(line)
    name = get_Image_name(line)
    if name is None:
        print('no image name found in:', line)
    else:
        jpg_names.append(name)

print('jpg_names is', len(jpg_names), 'long')
savename = '/home/pedro/temp/photo_names.txt'

for jpeg in jpg_names:
    print('picture is', jpeg)

with open(savename, 'w') as f:
    text = '\n'.join(jpg_names)
    f.write(text)

print('All done!')   
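
A possible simplification, offered only as a sketch: since the HTML is not uniform, a single pattern with a capture group can take everything between src=" and the next double quote, whatever attribute follows. The names img_src and get_image_names and the sample text are hypothetical. Like pattern1 above, this assumes src is the first attribute of the img tag.

```python
import re

# Capture everything between src=" and the next double quote,
# so it does not matter whether alt=, border= or nothing follows
img_src = re.compile(r'<img\s+src="([^"]+)"', re.IGNORECASE)

def get_image_names(html_text):
    """Return every captured src value found in html_text."""
    return img_src.findall(html_text)

# Hypothetical sample, not taken from the original HTML file
sample = (
    '<img src="a.jpg" alt="x">\n'
    '<img src="b.gif" alt="y"><img src="c.jpg" border="0">\n'
    '<img src="d.png">\n'
)

names = get_image_names(sample)
for name in names:
    print('picture is', name)
```

Because findall() works on the whole text, this also picks up two images on one line, which the line-by-line version above does not.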


Messages In This Thread
HTML file crashes program - by mikefirth - Dec-27-2021, 07:01 PM
RE: HTML file crashes program - by snippsat - Dec-27-2021, 09:21 PM
RE: HTML file crashes program - by mikefirth - Dec-28-2021, 04:29 AM
RE: HTML file crashes program - by ibreeden - Dec-28-2021, 10:01 AM
RE: HTML file crashes program - by snippsat - Dec-28-2021, 10:26 AM
RE: HTML file crashes program - by Pedroski55 - Dec-29-2021, 12:06 AM
RE: HTML file crashes program - by snippsat - Dec-29-2021, 01:00 AM
RE: HTML file crashes program - by Pedroski55 - Dec-29-2021, 04:45 AM
RE: HTML file crashes program - by snippsat - Dec-29-2021, 12:10 PM
RE: HTML file crashes program - by Pedroski55 - Dec-29-2021, 10:18 PM
RE: HTML file crashes program - by Pedroski55 - Dec-30-2021, 04:29 AM
RE: HTML file crashes program - by snippsat - Dec-30-2021, 04:26 PM
RE: HTML file crashes program - by Pedroski55 - Dec-31-2021, 03:57 AM
