Python Forum
Problem with this script
#1
Hi,

I have this script:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://en.wikipedia.org/wiki/Python')
bs = BeautifulSoup(html, "html.parser")
titles = bs.find_all(['title', 'h1', 'h2','h3','h4','h5','h6','p','img','alt'])
print('List all the header tags :', *titles, sep='\n\n')
This script opens one URL, but I want to open 2, 3 or 4 URLs,

or open all the URLs from the 100 results of a Google request.

How do I change it?

Thanks
Max
Reply
#2
urls = ['url1', 'url2', 'url3', 'etc']
for url in urls:
    html = urlopen(url)
    #your code ...
Reply
#3
I wrote:
from urllib.request import urlopen
from bs4 import BeautifulSoup
urls = ['https://en.wikipedia.org/wiki/Python', 'https://en.wikipedia.org/wiki/JavaScript']
for url in urls:
    html = urlopen(url)
bs = BeautifulSoup(html, "html.parser")
titles = bs.find_all(['title', 'h1', 'h2','h3','h4','h5','h6','p','img','alt'])
print('List all the header tags :', *titles, sep='\n\n')
But this error is raised:

TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.

How should I write the code?

Thanks for your help
Reply
#4
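Like this. The parsing and printing lines have to be indented inside the for loop, otherwise they run only once, after the loop has finished, on the last page opened.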
from urllib.request import urlopen
from bs4 import BeautifulSoup

urls = ['https://en.wikipedia.org/wiki/Python', 'https://en.wikipedia.org/wiki/JavaScript']
for url in urls:
    bs = BeautifulSoup(urlopen(url), "html.parser")
    titles = bs.find_all(['title', 'h1', 'h2','h3','h4','h5','h6','p','img','alt'])
    print('List all the header tags :', *titles, sep='\n\n')
The output can get messy quickly, if it is not already, so maybe you have a plan for what to do with it.

One piece of advice: use Requests to fetch the pages and lxml (faster) as the parser.
import requests
from bs4 import BeautifulSoup

urls = ['https://en.wikipedia.org/wiki/Python', 'https://en.wikipedia.org/wiki/JavaScript']
for url in urls:
    bs = BeautifulSoup(requests.get(url).content, "lxml")
    titles = bs.find_all(['title', 'h1', 'h2','h3','h4','h5','h6','p','img','alt'])
    print('List all the header tags :', *titles, sep='\n\n')
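One thing the loops above do not handle: if a single URL fails to open, the exception stops the whole run. Here is a minimal sketch of one way around that, which also collects the pages in a dict instead of printing everything at once. The function name `fetch_all` and its `fetch` parameter are my own, not part of urllib, Requests or BeautifulSoup; the default fetcher uses plain `urlopen` as in the first script.

```python
from urllib.request import urlopen
from urllib.error import URLError


def fetch_all(urls, fetch=None, timeout=10):
    """Return (pages, errors): raw HTML per URL, and an error message per failed URL.

    `fetch` is a hypothetical hook: any callable that takes a URL and returns
    the page content. It defaults to a small urlopen wrapper.
    """
    if fetch is None:
        def fetch(url):
            # Close the connection as soon as the body has been read.
            with urlopen(url, timeout=timeout) as response:
                return response.read()

    pages, errors = {}, {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except (URLError, ValueError, OSError) as exc:
            # One bad URL should not stop the rest of the loop.
            errors[url] = str(exc)
    return pages, errors
```

Each value in `pages` can then be passed to `BeautifulSoup(page, "html.parser")` exactly as before, and `errors` tells you which URLs to retry or drop.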
Reply
#5
Thanks for your help.

I will share the plan with you later or through another topic as I have to leave. Thank you very much for your help
Reply
#6
Hi,
Excuse me, but if I may, I’ll come back to the subject.
For the script to work, I need to add the URLs one by one.
(If I have 50 URLs, I must add all 50.)
For example:
urls = ['https://fr.wikipedia.org/wiki/JavaScript', 'https://fr.wikipedia.org/wiki/Python_(langage)']
Could the script be made to work with a Google request instead, for example:
www.google.fr/search?q=wikipedia+javascript+python
Thank you for your help
Maxwell
Reply
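The thread has no answer to this last question, so what follows is only a sketch of the general idea: instead of typing the list by hand, build `urls` from the links found in a page you have already downloaded. It assumes you have the HTML of a results page as a string; whether Google serves that page to a script is a separate matter and is not covered here. The names `LinkCollector`, `extract_links` and `host_contains` are made up for this example. It uses only the standard library so it runs without extra installs; with BeautifulSoup the same links can be found with `bs.find_all('a', href=True)`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links such as /wiki/... into absolute URLs.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, host_contains=""):
    """Return unique http(s) links from html, in order of first appearance."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    seen, urls = set(), []
    for link in collector.links:
        parts = urlparse(link)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        if host_contains not in parts.netloc:
            continue  # keep only the hosts we are interested in
        if link not in seen:
            seen.add(link)
            urls.append(link)
    return urls


# Small hand-written page standing in for a downloaded results page.
sample = """
<a href="https://fr.wikipedia.org/wiki/JavaScript">JavaScript</a>
<a href="/wiki/Python_(langage)">Python</a>
<a href="mailto:someone@example.com">mail</a>
<a href="https://fr.wikipedia.org/wiki/JavaScript">duplicate</a>
"""
urls = extract_links(sample, "https://fr.wikipedia.org/", host_contains="wikipedia.org")
print(urls)
```

The resulting `urls` list can be fed straight into the `for url in urls:` loop from post #4.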

