Python Forum
Dealing with duplicates to an Excel sheet
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Dealing with duplicates to an Excel sheet
#1
Alright, so, first time ever taking a programming course and I am just struggling with wrapping my head around Python. The following is Python 2.7.

import urllib
import win32com.client

def parse(searchstring,leader, trailer): # function to parse information between a leading string and a trailing string
        try:
            end_of_leader = searchstring.index(leader) + len(leader)
            start_of_trailer = searchstring.index(trailer, end_of_leader)
            return searchstring[end_of_leader:start_of_trailer]
        except ValueError:          
            return ""

def returnpages(s): # function to return the number of pages 
        try:
            returnpages = parse(s,'yp_searchTotalPages" : "','",')
            return int(returnpages)
        except:
            return 0



# Download the business entries 
page = 1 
fileobj = urllib.urlopen("http://canada411.yellowpages.ca/search/si/1/Cineplex/Canada") 

s = fileobj.read()              # Read the first page - storing the page's contents in a string called 's'.
resultlist = list()             # List that will be used to store our results

pages = returnpages(s)

for page in xrange(1,6):

    fileobj = urllib.urlopen("https://www.yellowpages.ca/search/si/"  + str(page) + "/Cineplex/Canada") 
    s = fileobj.read()
    
    addresssplit = s.split('<div class="listing__address address mainLocal noNum">') # split the page up into a list based on this delimiter
    
    for item in addresssplit:
        address = parse(item,'<span class="jsMapBubbleAddress"  itemprop="streetAddress" >','</span>') # extract address
        city = parse(item,'<span class="jsMapBubbleAddress"  itemprop="addressLocality" >','</span>')
        province = parse(item,'<span class="jsMapBubbleAddress"  itemprop="addressRegion" >','</span>')
        postal = parse(item,'<span class="jsMapBubbleAddress"  itemprop="postalCode" >','</span>')
        phone = parse(item,'data-phone="','">')
        
        if len(address) > 0:
            resultlist.append({'address':address,'city':city, 'province':province,'postal':postal})
            print address,city,province,postal,phone # take this out when you're finished your script - this is just for testing

    # Switch to pages 2,3,4,5 and so on based on variable page
    fileobj = urllib.urlopen("https://www.yellowpages.ca/search/si/" + str(page) + "/Cineplex/Canada")
    s = fileobj.read()              # Read the next page - storing the page's contents in a string called 's'.


row = 1



excelobj = win32com.client.Dispatch("Excel.Application")
excelobj.Visible = 1                                                            # make excel visible
excelobj.Workbooks.Add()                                                        # add a workbook
sh = excelobj.ActiveSheet                                                       # sh variable holds the active sheet
sh.Cells(1,1).Value = 'Address'                                           # excel headers
sh.Cells(1,2).Value = 'City'                                                 # excel headers
sh.Cells(1,3).Value = 'Province'                                                   # excel headers

for business in resultlist:                                                     # go through each dictionary item in the list resultlist
    sh.Cells(row,1).Value = business["address"]                            # put address from the dictionary in the 1st column of excel
    sh.Cells(row,2).Value = business["city"]                                 # put city from the dictionary in the 2nd column of excel    
    sh.Cells(row,3).Value = business["province"]                                   # put phone number from the dictionary in the 3rd column of excel   
    row +=1                                                                     # after all the information is input, switch to the next row

print str(len(resultlist)) + ' rows written to excel.'
When I run the program, it looks up all the addresses and puts them where they need to be in the Excel file, but I don't know how to deal with duplicates. I tried googling around and using different codes people suggested to others, but nothing works, so clearly I just do not know where to even begin.

So, for example, when the program takes the address and puts them into an Excel sheet two of the addresses are "Main-1303 Yonge St". I'm trying to figure out how to get the code to remove a duplication like this.

I also do not know how to replace accented characters with their closest equivalent, so that French does not appear weird in the list. Bit of a frustrated student here that isn't understanding how to even grasp the basics.
Reply


Messages In This Thread
Dealing with duplicates to an Excel sheet - by DistraughtMuffin - Oct-19-2020, 07:40 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  help with adding duplicates elements together in a list 2ECC3O 5 2,136 Sep-10-2022, 07:11 AM
Last Post: 2ECC3O
  Dealing with list coja56 2 2,886 Sep-18-2020, 09:32 PM
Last Post: perfringo
  App for excel sheet analysis awookado 6 2,900 Jan-27-2020, 09:02 AM
Last Post: awookado
  Help me with yahoo finance excel sheet Learner4life 2 2,316 Nov-27-2019, 12:55 PM
Last Post: Learner4life
  duplicates nsx200 3 2,500 Nov-12-2019, 08:55 AM
Last Post: nsx200
  Exercise list remove duplicates RavCOder 9 5,353 Oct-23-2019, 04:16 PM
Last Post: jefsummers
  Dealing with errors in input souprqtpie 7 3,532 Apr-10-2019, 11:04 AM
Last Post: souprqtpie
  how to take out duplicates from a list? helpme 2 2,550 Mar-08-2019, 02:27 PM
Last Post: ichabod801
  How to keep duplicates and remove all other items in list? student8 1 4,997 Oct-28-2017, 05:52 AM
Last Post: heiner55

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020