.txt return specific lines or strings

***snippsat*** · (This post was last modified: Feb-08-2019, 11:49 AM by snippsat.)

(Feb-08-2019, 02:34 AM) s_o_what wrote: No idea what this does.......
urls = re.findall('(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', read_file)

It extracts the URL addresses from the links in the HTML source code.
When dealing with HTML/XML, a better approach is always to use a parser.
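To see what that pattern actually picks up, here is a small sketch that runs it with `re.findall` on three sample links. The HTML is inlined as a string (an assumption, so the snippet runs on its own), and the pattern is written as a raw string so Python does not warn about the `\/` and `\w` escapes. Because the `http(s)://` / `ftp://` prefix is optional, the pattern also matches any `word.word` token, such as a plain file name, which is one reason a parser is the safer choice.

```python
import re

# The pattern quoted above, written as a raw string so the backslashes
# reach the regex engine without Python's invalid-escape warnings.
URL_RE = r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+'

html = '''<a href="https://python-forum.io/" target="_blank">Visit Python Forum</a>
<li class="tier-2" role="treeitem"><a href="http://docs.python.org/devguide/">Developer Guide</a></li>
<a href="ftp://theftpserver.com/files/acounts.pdf">Download file</a>'''

# Pulls the three href values out of the raw text
print(re.findall(URL_RE, html))

# The scheme is optional, so a bare file name is reported as a "URL" too
print(re.findall(URL_RE, 'Saved as urls.txt'))
```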
Example input file, urls.txt:

<a href="https://python-forum.io/" target="_blank">Visit Python Forum</a>
<li class="tier-2" role="treeitem"><a href="http://docs.python.org/devguide/">Developer Guide</a></li>
<a href="ftp://theftpserver.com/files/acounts.pdf">Download file</a>

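from bs4 import BeautifulSoup  # pip install beautifulsoup4 lxml

# Open the file in a with-block so it is closed after parsing
with open('urls.txt', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'lxml')

# href=True keeps only <a> tags that actually have an href attribute
links = soup.find_all('a', href=True)
for link in links:
    print(link.get('href'))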

Output:
https://python-forum.io/
http://docs.python.org/devguide/
ftp://theftpserver.com/files/acounts.pdf
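If installing bs4 and lxml is not an option, the standard library's `html.parser` module can do the same job. Below is a minimal sketch; the `LinkCollector` class name is made up for illustration, and the HTML is inlined instead of read from urls.txt so the snippet runs on its own.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == 'a':
            for name, value in attrs:
                # skip a bare "href" written without a value (value is None)
                if name == 'href' and value is not None:
                    self.hrefs.append(value)


html = '''<a href="https://python-forum.io/" target="_blank">Visit Python Forum</a>
<li class="tier-2" role="treeitem"><a href="http://docs.python.org/devguide/">Developer Guide</a></li>
<a href="ftp://theftpserver.com/files/acounts.pdf">Download file</a>'''

collector = LinkCollector()
collector.feed(html)
collector.close()
for href in collector.hrefs:
    print(href)
```

To parse the file itself, feed it the contents instead: `with open('urls.txt', encoding='utf-8') as f: collector.feed(f.read())`.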

Web-Scraping part-1
