I want to be able to extract data from multiple pages, but my code only handles one URL. How would I get it to keep going through all the page numbers in the URLs? The pages are in the following format:
https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=1&sort_order=price_asc
https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=2&sort_order=price_asc
https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=3&sort_order=price_asc
In these links, the only thing that changes in the URL is the number following page=.
I have created code so far that exports the results into a CSV file. However, this only works for one URL:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# page to scrape; assumed here to be the first results page listed above
my_url = "https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=1&sort_order=price_asc"

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parser
page_soup = soup(page_html, "html.parser")

# grabs each property
listings = page_soup.findAll("div", {"class": "tmp-search-card-list-view__card-content"})

filename = "trademe.csv"
f = open(filename, "w")

headers = "title, price, area\n"
f.write(headers)

for listing in listings:
    title_listing = listing.findAll("div", {"class": "tmp-search-card-list-view__title"})
    price_listing = listing.findAll("div", {"class": "tmp-search-card-list-view__price"})
    area_listing = listing.findAll("div", {"class": "tmp-search-card-list-view__subtitle"})

    title = title_listing[0].text.strip()
    price = price_listing[0].text.strip()
    area = area_listing[0].text.strip()

    print("title: " + title)
    print("price: " + price)
    print("area: " + area)

    # commas inside a field are replaced so they do not break the CSV columns
    f.write(title.replace(",", "^") + "," + price.replace(",", " ") + "," + area.replace(",", "^") + "\n")

f.close()
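Since only the number after page= changes, one possible approach is to build each URL from a template instead of hard-coding a single one. The following is a minimal sketch, not a tested solution for this site: `URL_TEMPLATE`, `page_urls` and `last_page` are hypothetical names, and the number of result pages is an assumption that has to be supplied by hand.

```python
# The search URL from the question, with the page number replaced by a
# placeholder. The name URL_TEMPLATE is made up for this sketch.
URL_TEMPLATE = (
    "https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx"
    "?cid=5748&search=1&134=9&135=2&rptpath=350-5748-"
    "&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1"
    "&key=1227701521&page={page}&sort_order=price_asc"
)


def page_urls(last_page):
    """Return the URLs for pages 1..last_page (last_page is an assumed input)."""
    return [URL_TEMPLATE.format(page=n) for n in range(1, last_page + 1)]


# Each URL could then be passed to the existing fetching and parsing code.
for url in page_urls(3):
    print(url)
```

With this, the existing scraping steps would sit inside a `for url in page_urls(...)` loop, with the CSV file opened once before the loop and closed once after it.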
I could create a text file with the possible links, but I'm still not sure what to do to get this to work.
I'm new to Python.
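For the text-file idea, a rough sketch could look like the following. It assumes a file with one URL per line, and every name in it (`read_urls`, `scrape_all`, `scrape_page`) is hypothetical: `scrape_page` stands in for the existing per-page code, reworked to return a list of (title, price, area) rows for one URL.

```python
import csv


def read_urls(path):
    """Read one URL per line from a text file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def scrape_all(urls, scrape_page, out_path="trademe.csv"):
    """Call scrape_page(url) for every URL and write all rows to one CSV file.

    scrape_page is a placeholder for the per-page scraping code; it is assumed
    to return an iterable of (title, price, area) tuples.
    Returns the number of data rows written.
    """
    total = 0
    # Open the output file once so later pages do not overwrite earlier ones.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "price", "area"])
        for url in urls:
            for row in scrape_page(url):
                writer.writerow(row)
                total += 1
    return total
```

It could be called as `scrape_all(read_urls("urls.txt"), scrape_page)`, where `urls.txt` is an assumed file name. Because `csv.writer` quotes fields that contain commas, the manual `replace(",", "^")` step would not be needed with this approach.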