Sep-24-2018, 06:02 PM
hi,
I have some html links and i want to find some particular text and it's next text also. I am using regex but receiving lost of empty lists.
These are links:
https://www.99acres.com/mailers/mmm_html...7-558.html https://www.99acres.com/mailers/mmm_html...-2016.html https://www.99acres.com/mailers/mmm_html...7-553.html
text i am finding Area Range: Next Text also Possession: next text also for example possession 2019 Price: next text also
below are my codes:
I have some html links and i want to find some particular text and it's next text also. I am using regex but receiving lost of empty lists.
These are links:
https://www.99acres.com/mailers/mmm_html...7-558.html https://www.99acres.com/mailers/mmm_html...-2016.html https://www.99acres.com/mailers/mmm_html...7-553.html
text i am finding Area Range: Next Text also Possession: next text also for example possession 2019 Price: next text also
below are my codes:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
import requests from bs4 import BeautifulSoup import csv import json import itertools import re file = {} final_data = [] final = [] textdata = [] def readfile(alldata, filename): with open ( "./" + filename, "w" ) as csvfile: csvfile = csv.writer(csvfile, delimiter = "," ) for i in range ( 0 , len (alldata)): csvfile.writerow(alldata[i]) def parsedata(url, values): r = requests.get(url, values) data = r.text return data def getresults(): global final_data, file with open ( "Mailers.csv" , "r" ) as f: reader = csv.reader(f) next (reader) for row in reader: ids = row[ 0 ] link = row[ 1 ] html = parsedata(link, {}) soup = BeautifulSoup(html, "html.parser" ) titles = soup.title.text td = soup.find_all( "td" ) for i in td: sublist = [] data = i.text pattern = r '(Possession:)(.)(.+)' x1 = re.findall(pattern, data) sublist.append(x1) sublist.append(link) final_data.append(sublist) print (final_data) return final_data def main(): getresults() readfile(final_data, "Data.csv" ) main() |