Python Forum
.txt return specific lines or strings
#9
(Feb-08-2019, 02:34 AM)s_o_what Wrote: No idea what this does.......
urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', read_file)
It extracts the URL addresses from the links in HTML source code.
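To see what the pattern does, here is a small sketch run on a made-up string. The sample text and the name URL_PATTERN are my own assumptions, not something from the original script.

```python
import re

# Pattern quoted above, written as a raw string so the backslashes
# reach the re module unchanged.
URL_PATTERN = r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+'

# Hypothetical sample: one real link plus a plain file name.
sample = '<a href="https://python-forum.io/">Forum</a> saved as notes.txt'

matches = re.findall(URL_PATTERN, sample)
print(matches)
```

This prints ['https://python-forum.io/', 'notes.txt']. The scheme part of the pattern is optional, so anything shaped like name.ext is matched too, not only real URLs.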
When dealing with HTML/XML, a better way is always to use a parser.
Example:
urls.txt:
<a href="https://python-forum.io/" target="_blank">Visit Python Forum</a>
<li class="tier-2" role="treeitem"><a href="http://docs.python.org/devguide/">Developer Guide</a></li>
<a href="ftp://theftpserver.com/files/acounts.pdf">Download file</a>

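from bs4 import BeautifulSoup

# Parse the file with the lxml parser (needs both bs4 and lxml installed)
with open('urls.txt', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'lxml')

# href=True keeps only <a> tags that actually have an href attribute
links = soup.find_all('a', href=True)
for url in links:
    print(url.get('href'))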
Output:
https://python-forum.io/
http://docs.python.org/devguide/
ftp://theftpserver.com/files/acounts.pdf
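If installing bs4 and lxml is not an option, roughly the same result can be sketched with the standard library's html.parser. This is an alternative to the BeautifulSoup code above, not a replacement for it; the names LinkCollector and extract_links are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all href values found in <a> tags, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Same content as urls.txt above, inlined so the sketch runs on its own.
sample = '''<a href="https://python-forum.io/" target="_blank">Visit Python Forum</a>
<li class="tier-2" role="treeitem"><a href="http://docs.python.org/devguide/">Developer Guide</a></li>
<a href="ftp://theftpserver.com/files/acounts.pdf">Download file</a>'''

for link in extract_links(sample):
    print(link)
```

It prints the same three addresses as the BeautifulSoup version. BeautifulSoup is still more forgiving with badly broken markup, so it stays the more comfortable choice for real pages.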
Web-Scraping part-1


Messages In This Thread
.txt return specific lines or strings - by s_o_what - Feb-06-2019, 02:53 AM
RE: .txt return specific lines or strings - by snippsat - Feb-08-2019, 11:49 AM
