Python Forum

Full Version: Unable to print data while looping through list in csv for webscraping - Python
I have a CSV file with a list of URLs that need to be scraped. The website I am scraping is http://www.rera-rajasthan.in/ProjectSearch, a real-estate site that lists each property name along with a link to that property's details. I was able to scrape those links into a CSV; now I need to loop through all the extracted links for further web scraping.

This website requires a POST request to search for projects, so I applied the same method to the extracted links too.

But when I run this code it prints nothing:
import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen
import csv
import json


links = []
reranumber = []
table_attr = {"class":"table table-bordered"}

with open("RajLinks.csv", newline= '') as f:
    reader = csv.reader(f)
    for row in reader:
        reranumber = row[0]
        link = row[1]
        links.append(link)

def getData(url):
    url = "http://www.rera-rajasthan.in/Home/GetProjectsList"
    user_agent = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0"}
    payload = {'certificateNo': '', 'PageSize': '50', 'District': '', 'v': '', 'projectName': '', 'promoterName': '', 'page': '1', 'tehsil': ''}
    r = requests.post(url, headers=user_agent, params=payload)
    data = r.text
    return data

#getdata

for sublist in links:
    htmldata = getData(link)
    soup = BeautifulSoup(htmldata, "html.parser")
    tables = soup.find_all("table", table_attr)
    for table in tables:
        txt = table.text
    if txt.find("Contact Address"):
        trs = table.find_all("tr")
        for data in trs:
            name = data[1].text
            print(name)
It should print the first tr of the Contact Address table that it finds. I am extracting the links column.

I am attaching the CSV. Can someone please guide?
The page strangely lacks classes and ids, so you can't target a specific element directly. What you can do is find the table you want to scrape by using the h3 tag just above it:
table = soup.find('h3', text='CONTRACTOR').find_next_sibling('table')
Note the find_next_sibling method.
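To illustrate find_next_sibling in isolation, here is a runnable toy example. The markup is made up to mimic the page layout (a heading followed by a bare table with no class or id), not copied from the actual site:

```python
from bs4 import BeautifulSoup

# Made-up markup mimicking the page: an <h3> followed by an unlabelled table.
html = """
<h3>CONTRACTOR</h3>
<table>
  <tr><td>S.No</td><td>Name</td><td>Address</td></tr>
  <tr><td>1</td><td>Acme Builders</td><td>Some Street, Jaipur</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# Jump from the heading to the <table> that follows it at the same level;
# intervening whitespace text nodes are skipped automatically.
table = soup.find("h3", string="CONTRACTOR").find_next_sibling("table")
print(table.find_all("tr")[1].find_all("td")[2].text)  # Some Street, Jaipur
```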
Then you can get all the tr tags and, from the second one, pull the desired td. You have to use indices because, as I said, there are no classes or ids to point to.

address = table.find_all('tr')[1].find_all('td')[2].text
Finally, you get 'S-33/34, JDA Shopping Center, Amrapali Circel, Vaishali Nagar, Jaipur' from the first url in the csv.
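For completeness, here is a sketch of how the whole loop might be corrected. Two bugs in the original are worth pointing out: getData() overwrites its url argument with the search endpoint, and the loop calls getData(link) with the leftover variable from the CSV-reading loop instead of the loop variable sublist, so every iteration fetched the same page. The helper names (load_links, get_data, contractor_address) are mine, and the payload is copied verbatim from the original post; whether the detail pages actually need it is untested:

```python
import csv

import requests
from bs4 import BeautifulSoup

USER_AGENT = {"User-Agent": ("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) "
                             "Gecko/20100101 Firefox/55.0")}
# Payload copied from the original post; the detail pages may not require it.
PAYLOAD = {'certificateNo': '', 'PageSize': '50', 'District': '', 'v': '',
           'projectName': '', 'promoterName': '', 'page': '1', 'tehsil': ''}


def load_links(path):
    """Read (reranumber, link) pairs from the CSV (column order as in the original post)."""
    with open(path, newline='') as f:
        return [(row[0], row[1]) for row in csv.reader(f)]


def get_data(url):
    """POST to the given detail url instead of overwriting it with the search endpoint."""
    r = requests.post(url, headers=USER_AGENT, params=PAYLOAD)
    return r.text


def contractor_address(html):
    """Find the table after the CONTRACTOR heading and return the address cell, or None."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h3", string="CONTRACTOR")
    if heading is None:
        return None
    table = heading.find_next_sibling("table")
    if table is None:
        return None
    rows = table.find_all("tr")
    if len(rows) < 2:
        return None
    cells = rows[1].find_all("td")
    return cells[2].get_text(strip=True) if len(cells) > 2 else None


# Usage (hits the live site, so commented out here):
# for reranumber, link in load_links("RajLinks.csv"):
#     print(reranumber, contractor_address(get_data(link)))
```

Returning None when a page lacks the heading or table keeps one malformed page from crashing the whole loop.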