Python Forum
Obtaining values from columns
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Obtaining values from columns
#1
I want to obtain the top 3 cities and items based on their sales, but the only thing I can do now is return the all cities and items with their respective sales. Without using dict, how do I obtain my desired output? Or if I use dict, how do I obtain the desired output?

purchases.txt
    2012-01-01	09:00	San Jose	Men's Clothing	214.05	Amex
    2012-01-01	09:00	Fort Worth	Women's Clothing	153.57	Visa
    2012-01-01	09:00	San Diego	Music	66.08	Cash
    2012-01-01	09:00	Pittsburgh	Pet Supplies	493.51	Discover
    2012-01-01	09:00	Omaha	Children's Clothing	235.63	MasterCard
    2012-01-01	09:00	Stockton	Men's Clothing	247.18	MasterCard
    2012-01-01	09:00	Austin	Cameras	379.6	Visa
    2012-01-01	09:00	New York	Consumer Electronics	296.8	Cash
    2012-01-01	09:00	Corpus Christi	Toys	25.38	Discover
    2012-01-01	09:00	Fort Worth	Toys	213.88	Visa
test.py
        f = open ("purchases.txt")
            
        def separator():
            str = ("="*48)
            print (str)
            return;
        
        city_seen = set()
        item_seen = set()
        
        citysaleslist = []
        itemsaleslist= []
        
        for line in open(sys.argv[1]):
        
            sales=float(line.split()[-2]) 
              
            strsales=line.split()[-2]
        
            city=line.split('\t')[2] 
            item=line.split('\t')[3]
        
            if city not in city_seen: # if city is not a duplicate, add to city_seen set
        
                city_seen.add(city)
    
            #Pressing tab for the bottom 2 lines will remove duplicate but combining the sales for the duplicates is impossible here.
            citysales="{0:<29}{1:>18}".format(city,strsales)
            citysaleslist.append(citysales)
        
       
            if item not in item_seen: # if item is not a duplicate, add to item_seen set
        
                 item_seen.add(item)
     
            #Pressing tab for the bottom 2 lines will remove duplicate but combining the sales for the duplicates is impossible here.
            itemsales = "{0:<29}{1:>18}".format(item,strsales)
            itemsaleslist.append(itemsales)
    
    
         print("Top Three Cities \n")
         separator()
              
         for i in citysaleslist:
              
             print(i)
        
        
         separator()
        
        
         print("Bottom Three Cities \n")
         separator()
              
        
         separator()
        
              
         print("Top Three Item Categories")
         separator()
           
         for i in itemsaleslist:
              
         print(i)
        
         separator()
        
        
         print("\nBottom Three Item Categories")
         separator()
           
        
         separator()      
 
My output:
    Top Three Cities 
    
    ================================================
    San Jose                                 214.05
    Fort Worth                               153.57
    San Diego                                 66.08
    Pittsburgh                               493.51
    Omaha                                    235.63
    Stockton                                 247.18
    Austin                                    379.6
    New York                                  296.8
    Corpus Christi                            25.38
    Fort Worth                               213.88
    ================================================
    Bottom Three Cities 
    
    ================================================
    ================================================

   
    Top Three Item Categories
    ================================================
    Men's Clothing                           214.05
    Women's Clothing                         153.57
    Music                                     66.08
    Pet Supplies                             493.51
    Children's Clothing                      235.63
    Men's Clothing                           247.18
    Cameras                                   379.6
    Consumer Electronics                      296.8
    Toys                                      25.38
    Toys                                     213.88
    ================================================
    
    Bottom Three Item Categories
    ================================================
    ================================================

    
Desired output:
    Top Three Cities 
    
    ================================================
    Pittsburgh                               493.51
    Austin                                   379.60
    Fort Worth                               367.45
    ================================================
    Bottom Three Cities 
    
    ================================================
    Omaha                                     235.63
    San Jose                                  214.05
    San Diego                                  66.08
    ================================================


    Top Three Item Categories
    ================================================
    Pet Supplies                             493.51
    Men's Clothing                           461.23
    Cameras                                   379.6
    ================================================
    
    Bottom Three Item Categories
    ================================================
    Toys                                      239.26
    Children's Clothing                       235.63
    Women's Clothing                          153.57
    ================================================
    
Reply
#2
city_seen and item_seen variables are sets, so they keep track of duplicates by default, and you don't need the if statments (e.g. if item not in item_seen).
You open the file (f = open...) but not close it when you are done with using it. I recommend using a context manager, which will close the file automatically when you are done using it. Here is a simple example. You open it again in for loop, I assume (open(sys.argv[1])).

It is probably beyond your current course curriculum, but you could put cities and values in a list of tuples, and use a lambda function to sort them by sales:

sorted([('city1', 95.5),('city2', 26.7),('city3', 248.1), ('city4', 30.9)], key=lambda x: x[1], reverse=True)
I provided the list of tuples here directly, but it will not be hard to write code that will do it for you.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Need help getting unique values across two columns of a dataframe a_real_phoenix 1 1,811 Jun-30-2019, 01:46 AM
Last Post: scidam

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020