Unable to print data while looping through list in csv for webscraping - Python

Prince_Bhatia · Oct-04-2017, 10:25 AM

I have a CSV file containing a list of URLs that need to be scraped. The website I am scraping is http://www.rera-rajasthan.in/ProjectSearch, a real estate website that lists property names, each with a link to the property details. I was able to scrape those links into a CSV; now I need to loop through all the extracted links for further web scraping.

This website requires the POST method to search for a project. I applied the same method to the extracted links too.

But when I run this code, it prints nothing:

import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen
import csv
import json

#links = []

links = []
reranumber = []
table_attr = {"class":"table table-bordered"}

with open("RajLinks.csv", newline= '') as f:
    reader = csv.reader(f)
    for row in reader:
        reranumber = row[0]
        link = row[1]
        links.append(link)

def getData(url):
    url = "http://www.rera-rajasthan.in/Home/GetProjectsList"
    user_agent = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0"}
    payload = {'certificateNo': '', 'PageSize': '50', 'District': '', 'v': '', 'projectName': '', 'promoterName': '', 'page': '1', 'tehsil': ''}
    r = requests.post(url, headers=user_agent, params=payload)
    data = r.text
    return data

#getdata

for sublist in links:
    htmldata = getData(link)
    soup = BeautifulSoup(htmldata, "html.parser")
    tables = soup.find_all("table", table_attr)
    for table in tables:
        txt = table.text
    if txt.find("Contact Address"):
        trs = table.find_all("tr")
        for data in trs:
            name = data[1].text
            print(name)

It should print the first tr in the Contact Address table that it finds. I am extracting the links column.

I am attaching the CSV. Can someone please guide me?

wavic · Oct-04-2017, 11:18 AM

The page strangely lacks classes and ids, so you cannot target a specific element directly. What you could do is find the table you want to scrape by using the h3 tag above it:

table = soup.find('h3', text='CONTRACTOR').find_next_sibling('table')

Note the find_next_sibling method.
Then you can get all the tr tags and, from the second one, get the desired td. You have to use indices because, as I said, there are no classes or ids to point to.

address = table.find_all('tr')[1].find_all('td')[2].text

Finally, you get 'S-33/34, JDA Shopping Center, Amrapali Circel, Vaishali Nagar, Jaipur' from the first URL in the CSV.
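Putting the two selector steps together with the CSV loop might look roughly like the sketch below. It is only an illustration under assumptions: SAMPLE_HTML, SAMPLE_CSV, the registration number format, and the helper names read_links, contractor_address and fetch are all made up for the example, and the real page markup may differ. Two things in the original loop are worth noting: it iterates with `sublist` but passes `link` (the last row read), and getData overwrites its `url` argument, so every iteration requests the same search endpoint rather than a project page. The sketch passes each row's own link instead, and uses `string=` in place of the older `text=` argument.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for one project page; the real markup may differ.
SAMPLE_HTML = """
<h3>CONTRACTOR</h3>
<table>
  <tr><th>Sr. No.</th><th>Name</th><th>Address</th></tr>
  <tr><td>1</td><td>Example Builders</td><td>12 Example Road, Jaipur</td></tr>
</table>
"""

# Hypothetical stand-in for RajLinks.csv: registration number, then link.
SAMPLE_CSV = "RAJ/P/2017/001,http://example.com/project/1\n"


def read_links(csv_file):
    """Return (rera_number, link) pairs, one per CSV row with two columns."""
    return [(row[0], row[1]) for row in csv.reader(csv_file) if len(row) >= 2]


def contractor_address(html):
    """Return the third cell of the first data row under the CONTRACTOR h3.

    Returns None when the heading, table, row, or cell is missing, so a
    page with a different layout does not raise.
    """
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h3", string="CONTRACTOR")
    if heading is None:
        return None
    table = heading.find_next_sibling("table")
    if table is None:
        return None
    rows = table.find_all("tr")
    if len(rows) < 2:
        return None
    cells = rows[1].find_all("td")
    if len(cells) < 3:
        return None
    return cells[2].get_text(strip=True)


def fetch(url):
    """Placeholder: returns the sample page instead of making a request.

    In a real run this would download `url` itself (for example with
    requests) and return the response text.
    """
    return SAMPLE_HTML


results = []
for rera_number, link in read_links(io.StringIO(SAMPLE_CSV)):
    # Use the loop variable, so each row's own link is fetched.
    address = contractor_address(fetch(link))
    results.append((rera_number, address))
    print(rera_number, address)
```

To try it against the real site, replace fetch with an actual request for each link and open RajLinks.csv in place of the io.StringIO sample. The None checks are there because the indices [1] and [2] depend entirely on the page layout.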
