May-22-2021, 08:14 AM
Dear Python experts,
First of all, I hope you are all doing well and that everything is fine at your site.
I am currently attempting to gather some data on the FDIC Failed Bank List, which includes banks that have failed since October 1, 2000. I am getting it from the website below:
https://www.fdic.gov/resources/resolutio...bank-list/
With this approach I extract the links from the base website into a nice little array. Besides that, I want to open all the links and gather a little piece of information from each (subsequently linked) sub-page.
My approach: to repeatedly extract links out of the target page, I use the function below.
Setup: I run Anaconda on Windows 10 with Python 3.8.5 and BeautifulSoup4 (version 4.8.2).
from bs4 import BeautifulSoup
import requests
import re

def getLinks(url):
    # Fetch the page that is passed in (originally the URL was
    # hard-coded here and the parameter was ignored).
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    links = []
    # Scrape all <a> tags and append each href attribute to the list.
    # Note: the original pattern "^http://" misses https:// links,
    # so both schemes are matched here.
    for link in soup.find_all('a', attrs={'href': re.compile("^https?://")}):
        links.append(link.get('href'))
    return links

print(getLinks("https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/"))
The dataset:

html_page = "https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/"

This page contains the rows below, which hold the information about each failed bank and the town it is located in:
Quote:
Bank Name | City | State | Cert | Acquiring Institution | Closing Date | Fund
Almena State Bank | Almena | KS | 15426 | Equity Bank | October 23, 2020 | 10538
First City Bank of Florida | Fort Walton Beach | FL | 16748 | United Fidelity Bank, fsb | October 16, 2020 | 10537
The First State Bank | Barboursville | WV | 14361 | MVB Bank, Inc. | April 3, 2020 | 10536
Ericson State Bank | Ericson | NE | 18265 | Farmers and Merchants Bank | February 14, 2020 | 10535
City National Bank of New Jersey | Newark | NJ | 21111 | Industrial Bank | November 1, 2019 | 10534
Resolute Bank | Maumee | OH | 58317 | Buckeye State Bank | October 25, 2019 | 10533
Louisa Community Bank | Louisa | KY | 58112 | Kentucky Farmers Bank Corporation | October 25, 2019 | 10532
The Enloe State Bank | Cooper | TX | 10716 | Legend Bank, N. A. | May 31, 2019 | 10531
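Since that list is rendered as a regular HTML table, one possible shortcut I am considering (just a sketch, assuming pandas and lxml are installed and the page really serves the list as a plain <table>) would be to let pandas parse it directly:

import pandas as pd
import requests

url = "https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/"
r = requests.get(url)

# read_html() returns one DataFrame per <table> found in the HTML;
# the assumption here is that the failed-bank list is the first one.
tables = pd.read_html(r.text)
banklist = tables[0]
print(banklist.head())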
Note: the aim is to gather the data out of the sub-pages. Therefore I need a parser that loops through the sub-pages (see the sketch after the list), e.g. like the following:
https://www.fdic.gov/resources/resolutio...state.html
https://www.fdic.gov/resources/resolutio...ybank.html
https://www.fdic.gov/resources/resolutio...sb-wv.html
and so forth.
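The loop I have in mind would look roughly like this (a sketch that reuses getLinks() from above; what exactly gets pulled out of each page is sketched further down, so here every page is only fetched and parsed):

import time
import requests
from bs4 import BeautifulSoup

# Loop over the sub-page links collected by getLinks() above.
for url in getLinks("https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/"):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    print(url, soup.title.string if soup.title else "(no title)")
    time.sleep(1)  # be polite to the server between requests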
By the way, currently I am getting back this error:
ModuleNotFoundError: No module named 'BeautifulSoup'
although I run BeautifulSoup4 version 4.8.2.
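If I see it correctly, that error usually appears when the old BeautifulSoup 3 import style is still used somewhere; with beautifulsoup4 the module lives in the bs4 package, so the import has to look like this:

# BeautifulSoup 3 style -- fails with beautifulsoup4 installed:
# import BeautifulSoup

# beautifulsoup4 style -- the package is named bs4:
from bs4 import BeautifulSoup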
After fixing this issue I want to get all the info out of the Failed Bank List sub-pages, cf.:
https://www.fdic.gov/resources/resolutio...ybank.html
Quote:
Failed Bank Information for First City Bank of Florida, Fort Walton Beach, FL
On Friday, October 16, 2020, First City Bank of Florida was closed by the Florida Office of Financial Regulation. The FDIC was named Receiver. No advance notice is given to the public when a financial institution is closed. United Fidelity Bank, fsb, Evansville, IN acquired all deposit accounts and substantially all the assets. All shares of stock were owned by the holding company, which was not involved in this transaction.
This is the tag:

<div class="usa-layout desktop:grid-col-12">
  <p class="fbankcategory">Failed Bank List</p> <!-- don't touch -->
  <!--Failed Bank Title-->
  <h1 class="fbanktitle">Failed Bank Information for First City Bank of Florida, Fort Walton Beach, FL</h1> <!-- update -->
  <div class="fbankgrayborder"></div> <!-- don't touch -->
  <p class="fbankdescription"><!-- update -->
    On Friday, October 16, 2020, First City Bank of Florida was closed by the Florida Office of Financial Regulation. The FDIC was named Receiver. No advance notice is given to the public when a financial institution is closed. United Fidelity Bank, fsb, Evansville, IN acquired all deposit accounts and substantially all the assets. All shares of stock were owned by the holding company, which was not involved in this transaction.
  </p>
</div>
At the moment, though, I first need to fix the error mentioned above: ModuleNotFoundError: No module named 'BeautifulSoup', although I run BeautifulSoup4 version 4.8.2.
After having fixed this I will have a closer look at how to combine
a. gathering the links on the first page, and
b. collecting the piece of data that sits on each second page ...
A rough sketch of that combination follows below.
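Putting (a) and (b) together, I imagine the combination roughly like this sketch (reusing the getLinks() and get_bank_info() functions from above; the assumption is that only the failed-bank sub-pages carry an fbanktitle heading, so unrelated links get filtered out):

# Combine step (a) and step (b): collect the links, then visit each one.
list_url = "https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/"
data = []
for link in getLinks(list_url):
    title, description = get_bank_info(link)
    if title:  # assumption: pages without fbanktitle are not bank sub-pages
        data.append({"url": link, "title": title, "description": description})

for row in data:
    print(row["title"])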