Python Forum
Unable to print data while looping through list in csv for webscraping - Python
#1
I have a CSV file containing a list of URLs that need to be scraped. The website I am scraping is http://www.rera-rajasthan.in/ProjectSearch, a real estate site that lists each property name with a link to the property details. I was able to scrape those links into a CSV; now I need to loop through all the extracted links for further web scraping.

This website requires a POST request to search for a project. I applied the same method to the extracted links too.

But when I run this code, it prints nothing:
import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen
import csv
import json

#links = []

links = []
reranumber = []
table_attr = {"class":"table table-bordered"}

with open("RajLinks.csv", newline= '') as f:
    reader = csv.reader(f)
    for row in reader:
        reranumber = row[0]
        link = row[1]
        links.append(link)

def getData(url):
    url = "http://www.rera-rajasthan.in/Home/GetProjectsList"
    user_agent = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0"}
    payload = {'certificateNo': '', 'PageSize': '50', 'District': '', 'v': '', 'projectName': '', 'promoterName': '', 'page': '1', 'tehsil': ''}
    r = requests.post(url, headers=user_agent, params=payload)
    data = r.text
    return data

#getdata

for sublist in links:
    htmldata = getData(link)
    soup = BeautifulSoup(htmldata, "html.parser")
    tables = soup.find_all("table", table_attr)
    for table in tables:
        txt = table.text
    if txt.find("Contact Address"):
        trs = table.find_all("tr")
        for data in trs:
            name = data[1].text
            print(name)
It should print the first tr in the contact address table that it finds. I am reading the URLs from the links column of the CSV.

I am attaching the CSV. Can someone please guide me?

Attached Files

.csv   RajLinks.csv (Size: 9.17 KB / Downloads: 729)
#2
The page strangely lacks classes and ids, so you cannot target a specific element directly. What you can do is find the table you want to scrape by using the h3 tag above it:
table = soup.find('h3', text='CONTRACTOR').find_next_sibling('table')
Note the find_next_sibling method.
Then you can get all the tr tags and, from the second one, get the desired td. You have to use indices because, as I said, there are no classes or ids to point to.

address = table.find_all('tr')[1].find_all('td')[2].text
Finally, you get 'S-33/34, JDA Shopping Center, Amrapali Circel, Vaishali Nagar, Jaipur' from the first URL in the csv.
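
As for why the posted loop prints nothing, a few things stand out. getData overwrites its url argument with the project list URL, so the links from the CSV are never requested. The loop passes link (the last value left over from reading the CSV) instead of sublist. txt.find("Contact Address") returns -1 when the text is missing, which is still truthy. And data[1] on a tr tag looks up an attribute rather than the second cell. Here is a rough sketch of how the pieces could fit together. It is untested against the live site and rests on some assumptions: the detail links respond to a plain GET, the second CSV column holds the URL, and the heading text is exactly CONTRACTOR. SAMPLE_HTML and the function names are made up for illustration, and I used urllib from the standard library where requests would do just as well.

```python
import csv
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Made-up stand-in for one project page; the real markup may differ.
SAMPLE_HTML = """
<h3>PROMOTER</h3>
<table><tr><th>Name</th></tr><tr><td>Example Promoter</td></tr></table>
<h3>CONTRACTOR</h3>
<table>
  <tr><th>S.No</th><th>Name</th><th>Contact Address</th></tr>
  <tr><td>1</td><td>Example Builders</td><td>12 Example Road, Jaipur</td></tr>
</table>
"""


def contractor_address(html):
    """Return the third cell of the second row under the CONTRACTOR h3, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # string= is the current spelling of the older text= argument.
    heading = soup.find("h3", string="CONTRACTOR")
    if heading is None:
        return None
    table = heading.find_next_sibling("table")
    if table is None:
        return None
    rows = table.find_all("tr")
    if len(rows) < 2:
        return None
    cells = rows[1].find_all("td")
    if len(cells) < 3:
        return None
    return cells[2].get_text(strip=True)


def read_links(csv_path):
    """Collect the second column of the CSV, keeping only values that look like URLs."""
    with open(csv_path, newline="") as f:
        return [row[1] for row in csv.reader(f)
                if len(row) > 1 and row[1].startswith("http")]


def scrape_all(csv_path):
    """Fetch every link and print its address. Assumes a plain GET is enough."""
    for link in read_links(csv_path):
        request = Request(link, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(request) as response:
            html = response.read().decode("utf-8", errors="replace")
        print(link, contractor_address(html))


# Offline check against the sample markup.
print(contractor_address(SAMPLE_HTML))
```

To run it against the attached file, call scrape_all("RajLinks.csv"). If a page has no CONTRACTOR heading, the function returns None instead of raising, so one odd page does not stop the whole loop.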