Python Forum
webscrapping links from pandas dataframe - Printable Version




webscrapping links from pandas dataframe - Wolverin - Jun-19-2019

I have a pandas DataFrame in which one of the columns contains web links. I need to open each link, read the text from the page, and calculate some scores from it. There are almost 200 links.

How should I proceed?


RE: webscrapping links from pandas dataframe - Larz60+ - Jun-19-2019

I suggest a basic pandas tutorial as this will be covered in the first few pages.


RE: webscrapping links from pandas dataframe - Gaurav_Kumar - Aug-28-2023

You can follow these steps:

1.> Import Libraries:
Import the necessary libraries, including pandas for DataFrame manipulation, requests for making HTTP requests, and libraries for text processing and analysis.

2.> Load DataFrame:
Load your pandas DataFrame containing the web links.

3.> Iterate Through Links:
Iterate through the rows of the DataFrame, extracting each web link.

4.> HTTP Requests and Text Extraction:
For each web link, make an HTTP request to retrieve the web page content. You can use the requests library for this. Once you have the HTML content, you can use libraries like BeautifulSoup to parse the HTML and extract the text content.

5.> Text Analysis and Scoring:
Analyze the extracted text to calculate the required scores. You might use natural language processing libraries like NLTK or spaCy for text processing and analysis.

6.> Store Results:
Store the calculated scores back in the DataFrame or in a separate data structure for further analysis or visualization.
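
As a rough illustration, the six steps above could be sketched as follows. This is only a minimal sketch under some assumptions: the column name "url", the helper names (fetch_html, extract_text, score_text, score_links) and the 10-second timeout are all made up for the example, and score_text is a placeholder that just counts words, since the original question does not say which scores are needed.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
        return response.text
    except requests.RequestException:
        return None


def extract_text(html):
    """Parse the HTML and return its text with scripts and styles removed."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)


def score_text(text):
    """Hypothetical placeholder score: number of whitespace-separated words.

    Replace this with the real scoring logic (for example with NLTK or spaCy).
    """
    return len(text.split())


def score_links(df, url_col="url", fetch=fetch_html):
    """Return a copy of df with 'text' and 'score' columns added.

    url_col is an assumed column name; fetch can be swapped out,
    for example to read pages from a local cache instead of the network.
    """
    texts, scores = [], []
    for url in df[url_col]:
        html = fetch(url)
        if html is None:
            # Keep the row, but mark text and score as missing.
            texts.append(None)
            scores.append(None)
            continue
        text = extract_text(html)
        texts.append(text)
        scores.append(score_text(text))

    result = df.copy()
    result["text"] = texts
    result["score"] = scores
    return result
```

With a DataFrame df that has the links in a column named "url", calling score_links(df) should return a new DataFrame with the extracted text and the score next to each link. Links that could not be downloaded keep their row with missing values, so one bad URL out of 200 does not stop the whole run.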