Dec-29-2021, 04:45 AM
(This post was last modified: Dec-29-2021, 04:46 AM by Pedroski55.)
@snippsatt Just trying!
I'm still not too sure how to jump ahead from start_pos, which I get from start_pos = pattern1.search(line).span()[1], but I got the result I wanted!
I believe bowlfred is very good with re, maybe he can straighten out my code!
This gets me the desired result. The problem was that the HTML is not uniform, so I needed another pattern to cater for img tags with no alt="
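On jumping ahead from start_pos: as far as I can tell, the search() method of a compiled pattern takes an optional second argument, pos, which is the index where the search begins. Here is a small sketch of that. The sample line and the names start_pattern and end_pattern are made up for illustration; they are not from the real HTML file.

```python
import re

# Hypothetical sample line, not taken from the real HTML file.
# The first 'jpg" alt="' sits in the <a> tag, before the <img> tag.
line = '<a href="x.jpg" alt="link"><img src="pics/louise01.jpg" alt="Louise">'

start_pattern = re.compile('<img src="')
end_pattern = re.compile('jpg" alt="')

# .end() is the same value as .span()[1]
start_pos = start_pattern.search(line).end()

# The second argument tells search() where to begin looking,
# so the earlier 'jpg" alt="' in the <a> tag is skipped.
end_match = end_pattern.search(line, start_pos)

# Match positions still refer to the whole line; + 3 keeps 'jpg'.
img_name = line[start_pos:end_match.start() + 3]
print(img_name)
```

Without the start_pos argument, end_pattern would match inside the <a> tag first and the slice would come out empty.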
#! /usr/bin/python3
import re

path2text = '/home/pedro/temp/lovely_louise.html'

with open(path2text) as f:
    lines = f.readlines()

print('lines is', len(lines), 'long')

# get lines with '<img src="'
data = []
for line in lines:
    if '<img src="' in line:
        data.append(line)

print('data is', len(data), 'long')
for d in data:
    print(d)

pattern1 = re.compile('<img src="')
#pattern2 = re.compile('img src=".jpg"')
pattern2 = re.compile('jpg" alt="')
pattern3 = re.compile('gif" alt="')
pattern4 = re.compile('jpg" border="')

def get_Image_name(line):
    # the name starts right after '<img src="'
    start_pos = pattern1.search(line).span()[1]
    # maybe not a jpg, so try each end pattern in turn
    # add more patterns to the tuple for other images
    for pattern in (pattern2, pattern3, pattern4):
        # start searching at start_pos, not at the beginning of the line
        match = pattern.search(line, start_pos)
        if match is not None:
            # + 3 keeps the extension (jpg or gif)
            end_pos = match.span()[0] + 3
            return line[start_pos:end_pos]
    # no end pattern matched this line
    return None

# a list to take the names
jpg_names = []
# some names are not picked up (no end pattern matched), need to look at that
for line in data:
    print(line)
    name = get_Image_name(line)
    if name is not None:
        jpg_names.append(name)

print('jpg_names is', len(jpg_names), 'long')

savename = '/home/pedro/temp/photo_names.txt'
for jpeg in jpg_names:
    print('picture is', jpeg)

with open(savename, 'w') as f:
    text = '\n'.join(jpg_names)
    f.write(text)

print('All done!')
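A possible simplification, only as a sketch: since the name always ends at the closing double quote of src, one pattern with a capture group might replace the separate jpg/gif/alt/border patterns. The pattern name img_src and the sample lines are invented for illustration, and like the code above it assumes src is the first attribute in the img tag.

```python
import re

# Capture everything between '<img src="' and the next double quote,
# whatever attribute (alt, border, ...) comes after it.
img_src = re.compile(r'<img\s+src="([^"]+)"', re.IGNORECASE)

# Hypothetical sample lines, not from the real HTML file
sample = [
    '<p><img src="pics/a.jpg" alt="A"></p>',
    '<img src="b.gif" alt="B"><img src="c.jpg" border="0">',
    '<p>no image here</p>',
]

names = []
for line in sample:
    # with one group, findall() returns the captured strings,
    # including several per line if there are several img tags
    names.extend(img_src.findall(line))

print(names)
```

This also picks up more than one image per line, which the search()-based version does not. For really messy HTML, a proper parser such as html.parser from the standard library is probably the safer route.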