Posts: 4
Threads: 1
Joined: Jun 2019
Hi,
I have this script:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://en.wikipedia.org/wiki/Python')
bs = BeautifulSoup(html, "html.parser")
titles = bs.find_all(['title', 'h1', 'h2','h3','h4','h5','h6','p','img','alt'])
print('List all the header tags :', *titles, sep='\n\n')
This script opens one URL, but I want to open 2, 3, or 4 URLs,
or open all the URLs from the first 100 results of a Google search.
How do I make these changes?
Thanks
Max
Posts: 360
Threads: 5
Joined: Jun 2019
urls = ['url1', 'url2', 'url3', 'etc']
for url in urls:
    html = urlopen(url)
    # your code ...
Posts: 4
Threads: 1
Joined: Jun 2019
Jun-20-2019, 12:52 PM
(This post was last modified: Jun-20-2019, 12:52 PM by MaxwellCosta.)
I wrote:
from urllib.request import urlopen
from bs4 import BeautifulSoup
urls = ['https://en.wikipedia.org/wiki/Python', 'https://en.wikipedia.org/wiki/JavaScript']
for url in urls:
    html = urlopen(url)
    bs = BeautifulSoup(html, "html.parser")
    titles = bs.find_all(['title', 'h1', 'h2','h3','h4','h5','h6','p','img','alt'])
    print('List all the header tags :', *titles, sep='\n\n')
But this error is generated:
TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.
How do I write the code?
Thanks for your help
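For context: urllib raises this exact TypeError when the data argument passed to urlopen is a str instead of bytes, so the script that actually produced the traceback most likely passed a string as POST data somewhere; the loop as posted above should not raise it. A minimal sketch that reproduces the error and the fix, with httpbin.org used only as a stand-in endpoint:
from urllib.request import urlopen
from urllib.parse import urlencode

# Passing a str as POST data raises:
#   TypeError: POST data should be bytes, an iterable of bytes,
#   or a file object. It cannot be of type str.
# urlopen('https://httpbin.org/post', data='key=value')

# Encoding the payload to bytes avoids the error:
data = urlencode({'key': 'value'}).encode('utf-8')
with urlopen('https://httpbin.org/post', data=data) as resp:
    print(resp.status)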
Posts: 7,315
Threads: 123
Joined: Sep 2016
Like this.
from urllib.request import urlopen
from bs4 import BeautifulSoup
urls = ['https://en.wikipedia.org/wiki/Python', 'https://en.wikipedia.org/wiki/JavaScript']
for url in urls:
    bs = BeautifulSoup(urlopen(url), "html.parser")
    titles = bs.find_all(['title', 'h1', 'h2','h3','h4','h5','h6','p','img','alt'])
    print('List all the header tags :', *titles, sep='\n\n')
The output can get messy quickly with this many tags; maybe you have a plan for handling it.
My advice is to use Requests, with lxml (which is faster) as the parser:
import requests
from bs4 import BeautifulSoup
urls = ['https://en.wikipedia.org/wiki/Python', 'https://en.wikipedia.org/wiki/JavaScript']
for url in urls:
    bs = BeautifulSoup(requests.get(url).content, "lxml")
    titles = bs.find_all(['title', 'h1', 'h2','h3','h4','h5','h6','p','img','alt'])
    print('List all the header tags :', *titles, sep='\n\n')
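If the goal is a readable per-page summary rather than a raw dump of tag markup, one option (a sketch, not from the thread; the tag list is trimmed to headings purely for illustration) is to print each tag's name and text, grouped per URL. As an aside, 'alt' is an HTML attribute, not a tag, so it never matches anything in find_all.
import requests
from bs4 import BeautifulSoup

urls = ['https://en.wikipedia.org/wiki/Python', 'https://en.wikipedia.org/wiki/JavaScript']
for url in urls:
    bs = BeautifulSoup(requests.get(url).content, "lxml")
    print(f'--- {url} ---')
    # Print "tag: text" for each heading instead of dumping raw tag markup
    for tag in bs.find_all(['title', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6']):
        print(f'{tag.name}: {tag.get_text(strip=True)}')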
Posts: 4
Threads: 1
Joined: Jun 2019
Jun-20-2019, 01:44 PM
(This post was last modified: Jun-20-2019, 01:54 PM by MaxwellCosta.)
Thanks for your help.
I will share the plan with you later, or in another topic, as I have to leave. Thank you very much for your help.
Posts: 4
Threads: 1
Joined: Jun 2019
Hi,
Excuse me, but if I may, I'd like to come back to the subject.
For the script to work, I need to add the URLs one by one.
(If I have 50 URLs, I must add all 50 by hand.)
For example:
urls = ['https://fr.wikipedia.org/wiki/JavaScript', 'https://fr.wikipedia.org/wiki/Python_(langage)']
Could we make the script work from a Google search request instead, such as:
www.google.fr/search?q=wikipedia+javascript+python
Thank you for your help.
Maxwell
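A closing note on the Google part of the question: Google actively blocks automated scraping of its result pages, so fetching www.google.fr/search?q=... with urlopen or requests will usually return a consent page or CAPTCHA rather than real results; the official Custom Search API or a dedicated search package is the usual route. The underlying idea of building the urls list programmatically instead of typing 50 entries by hand can still be sketched with an ordinary page as the link source (the seed URL and the 5-page cap below are chosen purely for illustration):
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Seed page used as a link source; any page with links works.
seed = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
bs = BeautifulSoup(requests.get(seed).content, 'lxml')

# Build the urls list programmatically instead of typing entries by hand.
urls = []
for a in bs.find_all('a', href=True):
    href = urljoin(seed, a['href'])  # resolve relative links against the seed
    if href.startswith('http') and href not in urls:
        urls.append(href)

# Same loop as before, capped at 5 pages for the demo.
for url in urls[:5]:
    bs = BeautifulSoup(requests.get(url).content, 'lxml')
    title = bs.find('title')
    print(url, '->', title.get_text(strip=True) if title else 'no title')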