Python Forum
The "FindAll" Error
#5
(Apr-11-2020, 12:36 AM) BadWhite Wrote: but have you tried to run the code?
Yes.
import requests
from bs4 import BeautifulSoup
#from Data import row

# Collect and parse first page
headers = {'User-agent': 'Mozilla/5.0'}
page = requests.get('https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ1.htm', headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

# Find the first element with class BodyText (returns None if there is no match)
artist_name_list = soup.find(class_='BodyText')

# Pull text from all instances of <a> tag within BodyText div
artist_name_list_items = artist_name_list.find_all('a')

# Create for loop to print out all artists' names
for artist_name in artist_name_list_items:
    print(artist_name.text)
Output:
Zabaglia, Niccola
Zaccone, Fabian
Zadkine, Ossip
Zaech, Bernhard
Zagar, Jacob
Zagroba, Idalia
Zaidenberg, A.
Zaidenberg, Arthur
Zaisinger, Matthäus
Zajac, Jack
Zak, Eugène
Zakharov, Gurii Fillipovich
Zakowortny, Igor
Zalce, Alfredo
Zalopany, Michele
Zammiello, Craig
Zammitt, Norman
Zampieri, Domenico
Zampieri, called Domenichino, Domenico
Zanartú, Enrique Antunez
Zanchi, Antonio
Zanetti, Anton Maria
Zanetti Borzino, Leopoldina
Zanetti I, Antonio Maria, conte
Zanguidi, Jacopo
Zanini, Giuseppe
Zanini-Viola, Giuseppe
Zanotti, Giampietro
Zao Wou-Ki
Zas-Zie
Zie-Zor
nextpage
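One common cause of a "find_all" error in code like this is that `soup.find(class_='BodyText')` returns `None` when the page you received has no such element (for example, when the server sent back an error page instead of the collection page), and calling `find_all` on `None` then raises `AttributeError: 'NoneType' object has no attribute 'find_all'`. A minimal sketch of a guard, assuming you want a clear error instead; `artist_names` is a hypothetical helper name and the HTML strings in the test are made-up stand-ins, not the real page:

```python
from bs4 import BeautifulSoup


def artist_names(html):
    """Return the text of every <a> inside the first element with class BodyText.

    Hypothetical helper for illustration. soup.find() returns None when nothing
    matches, so we check for that before calling find_all().
    """
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find(class_="BodyText")
    if body is None:
        # Most likely the response was not the page we expected (e.g. a rejected request).
        raise ValueError("no element with class 'BodyText' found; check the response status and content")
    return [a.get_text() for a in body.find_all("a")]
```

With `page.content` from the request in the code above, you would call `artist_names(page.content)` and print each name.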
BadWhite Wrote: why you have added "headers" variable?
That is what I explained first: without a user agent the site returns status 445 and the request is rejected.
import requests
from bs4 import BeautifulSoup
#from Data import row

# Collect and parse first page
page = requests.get('https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ1.htm')
print(page.status_code)
Output:
445
Once you get this status, no further scraping is possible. By sending a User-Agent header we identify ourselves as a browser (here the generic 'Mozilla/5.0' string that Firefox and other browsers use).
Then we get 200 OK and can continue scraping.
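To avoid repeating the header on every call and to stop early when the site rejects the request, one option is a `requests.Session` that always sends the User-Agent, plus `raise_for_status()` so a 4xx status such as the 445 above raises an exception instead of being parsed as if it were the real page. This is a sketch under assumptions: `BROWSER_HEADERS`, `make_session` and `get_page` are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
import requests

# Assumption: a generic browser-like string is enough; 'Mozilla/5.0' is the value used in this thread.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def make_session():
    """Return a Session that sends the browser-like User-Agent on every request (hypothetical helper)."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)  # replaces the default python-requests User-Agent
    return session


def get_page(session, url, timeout=10):
    """Fetch url and fail loudly on a 4xx/5xx status (hypothetical helper)."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError, e.g. for the 445 seen above
    return response
```

Usage would be `page = get_page(make_session(), url)` followed by `BeautifulSoup(page.content, 'html.parser')` as in the original code.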

The problem must be something on your side. Here is a run in another environment, Colab.
As you can see, it works fine there too.


Messages In This Thread
The "FindAll" Error - by BadWhite - Apr-10-2020, 09:40 PM
RE: The "FindAll" Error - by stullis - Apr-10-2020, 10:18 PM
RE: The "FindAll" Error - by snippsat - Apr-10-2020, 11:40 PM
RE: The "FindAll" Error - by BadWhite - Apr-11-2020, 12:36 AM
RE: The "FindAll" Error - by snippsat - Apr-11-2020, 08:07 AM
RE: The "FindAll" Error - by BadWhite - Apr-11-2020, 05:09 PM
RE: The "FindAll" Error - by snippsat - Apr-11-2020, 05:59 PM
