Python Forum
Read input file and print hyperlinks
#1
Hello everybody, sorry about my last post; it did not show the picture.
Edit admin:
No problem, just use the "Insert Python tag" button.

I am new to Python and I am trying to write a program that prompts for an input file, reads it, and prints all the lines
that contain hyperlinks, together with the text that follows each hyperlink. For example, if the file contains the link:

"<a href="http://python-forum.io/search.php?action=unreads">Unread Posts</a>" 
The printed output should be:
Output:
http://python-forum.io/search.php?action=unreads     Unread Posts
#2
You can take a look at my tutorial, Web-Scraping part-1.
#3
Thank you for the reply.

I just have difficulty making it work for files that are stored on my computer.
#4
What have you tried so far? Please post the code you've written. I would suggest starting with a small file, perhaps 3 or 4 lines. To make it easy, make sure the file is in the same location as your script. Your script should start off simple as well: open the file, read a line, write it to the screen, go back and read the next line, write it to the screen, and so on. Once that runs without errors, start refining your script.
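The steps suggested above might look like the following minimal sketch. It assumes Python 3, and the names `read_lines`, `print_lines` and `sample.html` are placeholders chosen for illustration, not something from this thread.

```python
def read_lines(path):
    """Return the lines of a text file with the trailing newline removed."""
    with open(path, encoding="utf-8") as f:
        # Iterating over the file object yields one line at a time.
        return [line.rstrip("\n") for line in f]


def print_lines(path):
    """Write every line of the file to the screen, one after another."""
    for line in read_lines(path):
        print(line)
```

With a small file called `sample.html` (a hypothetical name) saved next to the script, calling `print_lines('sample.html')` should echo it line by line. Once that works, the link extraction can be added on top.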
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
#5
Here is an example using the line you posted.
from bs4 import BeautifulSoup

# Read the whole file from disk into one string
with open('html_from_disk.txt') as f:
    html = f.read()

soup = BeautifulSoup(html, 'html.parser')
link = soup.find('a')  # first <a> tag in the document
text = link.text
print(text) #--> Unread Posts
print(link.get('href')) #--> http://python-forum.io/search.php?action=unreads
#6
Hello, and thank you for the valuable help.

With this code I managed to print all the hyperlinks on separate lines, but I still can't work out how to also print the text that follows every hyperlink.
Could I add a prompt to the above code so the user can give me the input file?

I tried to add this:
test=raw_input('Enter a filename: ')

with open('test') as f: 
but it does not work.
#7
You cannot put quotes around test;
with quotes it is just the string 'test', not the variable.

Here it is with a better variable name:
file_name = raw_input('Enter a filename: ')

with open(file_name) as f:
    html = f.read()
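To make the difference between the variable and the quoted string concrete, here is a small sketch, assuming Python 3 (where `input()` replaces `raw_input()`). The helper names `read_file` and `demo` and the file `links.html` are made up for this illustration. It writes a temporary file, opens it through the variable, and then shows that a file literally named `file_name` is not found.

```python
import os
import tempfile


def read_file(file_name):
    """Open the path stored in the variable file_name and return its text."""
    with open(file_name, encoding="utf-8") as f:
        return f.read()


def demo():
    """Compare opening via the variable with opening a quoted literal name."""
    with tempfile.TemporaryDirectory() as folder:
        # Hypothetical file created only for this demonstration.
        file_name = os.path.join(folder, "links.html")
        with open(file_name, "w", encoding="utf-8") as f:
            f.write('<a href="http://example.com">Example</a>')

        # No quotes: Python uses the path held by the variable.
        content = read_file(file_name)

        # Quotes: Python looks for a file that is literally called 'file_name'.
        try:
            read_file(os.path.join(folder, "file_name"))
            literal_found = True
        except FileNotFoundError:
            literal_found = False

        return content, literal_found
```

In a real script the path would come from the user, for example `file_name = input('Enter a filename: ')` in Python 3, and would then be passed to `read_file(file_name)` without quotes.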
#8
I managed to make it work with this code

from bs4 import BeautifulSoup
file_name = raw_input('Type file path: ')
with open(file_name) as f:
    html = f.read()
soup = BeautifulSoup(html, 'html.parser')
for link in soup.find_all('a'):
    print(link.get('href'))
    print(link.get_text())
but I still get links that I do not want, like the links from img tags. Is there any way to exclude them from the output?
Quote:<img src="http://www.ekdd.gr/ekdda/custom/seminars/bullet_green.png"><a>test1</a></div>
<img src="http://www.ekdd.gr/ekdda/custom/seminars/bullet_red.png"><a>test2</a></div>
From the above I get:
None test1
None test2
#9
You must learn not to use the quote tag for code;
I have fixed all your posts.
In the editor there is an "Insert python tag" button to the right of the "Insert quote" button.

These two lines print the same thing; link.text is just shorthand for link.get_text():
print(link.get_text())
# same as
print(link.text)
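Quote:but I still get links that I do not want, like the links from img tags, is there any way to exclude them from print?
The None values come from the <a> tags that have no href attribute, not from the img tags. Ask find_all() only for <a> tags that have an href:
for link in soup.find_all('a', href=True):
    print(link.get('href'))
    print(link.text)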
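For comparison, here is a sketch of the same filtering done with only the standard library's `html.parser` module instead of BeautifulSoup, assuming Python 3. This is a swapped-in alternative, not the approach used in this thread, and the names `LinkCollector`, `extract_links` and `format_links` are invented for the example. It keeps only `<a>` tags that carry an href and puts the link and its text on one line, as asked in the first post.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for <a> tags that have an href attribute."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:        # anchors without an href are skipped
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def format_links(links):
    """Put each link and its text on one line, separated by spaces."""
    return ["{}     {}".format(href, text) for href, text in links]
```

After reading the file into a string `html`, something like `for line in format_links(extract_links(html)): print(line)` would print one link per line followed by its text, and anchors such as `<a>test1</a>` would be left out.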