Simple List manipulation - bowled
Python Forum (https://python-forum.io), Python Coding > Web Scraping & Web Development, thread https://python-forum.io/thread-4363.html
Simple List manipulation - bowled - radioactive9 - Aug-11-2017

Hello, I have a simple list like the one below, and I need to reduce it to a list with one entry per technology. For example, VMware_Technology appears with both yellow and red. Only red and yellow are possible, and red has higher priority than yellow.

```python
>>> list = ['Cert_Technology:red', 'Cluster_Technology:red', 'VMware_Technology:red', 'Lync_Technology:red', 'VMware_Technology:yellow', 'AD_DNS_Technology:yellow']
```

The only catch is that any technology can appear with both colors. There can be any number (n) of technologies, and only two colors are possible: yellow or red. Expected output:

```python
>>> list
['Cert_Technology:red', 'Cluster_Technology:red', 'VMware_Technology:red', 'Lync_Technology:red', 'AD_DNS_Technology:yellow']
```

I know it is not complex, but I don't know how to go about it.

RE: Simple List manipulation - bowled - ichabod801 - Aug-11-2017

I would make it a list of lists, splitting each item at the colon. Then feed that into a dictionary, which will eliminate duplicate technologies. Then use dict.items to get the tuples back, and join them with a colon. If order is important, use OrderedDict. If priorities are important, sort it in reverse priority before feeding it into the dict.

RE: Simple List manipulation - bowled - radioactive9 - Aug-11-2017

Well, we need to keep in mind the priority of the colors. If a technology has both yellow and red, we need to drop technology:yellow and keep technology:red.

OK, here is the full scenario. I have the following HTML file:

Quote: <html><body>

From here I need to find the unique technologies and their colors.
Red gets priority over yellow. Here is what I have so far:

```python
from bs4 import BeautifulSoup
import os

appPath = 'D:\\Backup\\Drive_D\\W0rk\\Script\\Python\\HTMLTOCSV_Python\\Try\\'
cols = []
newcols = []

soup = BeautifulSoup(open(appPath + 'Cust\\consolidated_report_201708100600.html', 'rb'), 'lxml')
h1 = soup.find_all("h1")
for col in h1:
    cols.append(col.get_text())

for item in cols:
    technology = item.split(':')[0]
    color = item.split(':')[-1]
    if color in ["red", "yellow"]:
        newcols.append(technology + ":" + color)

newcols = list(set(newcols))
print(newcols)
```

Now I have the unique technology:color combinations, but I don't know how to make red take priority over yellow when both X_technology:yellow and X_technology:red are present.

RE: Simple List manipulation - bowled - ichabod801 - Aug-11-2017

Like I said, feed it into a dict. If you do:

```python
tech = dict([('spam', 'yellow'), ('spam', 'red'), ('eggs', 'red')])
```

the 'spam' key in the tech dict will be 'red'. That's because the later 'spam': 'red' overwrites the earlier 'spam': 'yellow'. So you just need to make sure your list is sorted correctly, which here just means reverse order. To get the list to feed into the dict, append (technology, color) to newcols instead of technology+":"+color. Combine each pair back into a string when you read it out of dict.items().

RE: Simple List manipulation - bowled - radioactive9 - Aug-11-2017

Got it, thanks. Do you think the code can be improved a little?
I do not want a list comprehension, as it is confusing :)

```python
from bs4 import BeautifulSoup
import os

appPath = 'D:\\Backup\\Drive_D\\W0rk\\Script\\Python\\HTMLTOCSV_Python\\Try\\'
cols = []
newcols = []

soup = BeautifulSoup(open(appPath + 'Cust\\consolidated_report_201708100600Copy.html', 'rb'), 'lxml')
h1 = soup.find_all("h1")
for col in h1:
    cols.append(col.get_text())

for item in cols:
    technology = item.split(':')[0]
    color = item.split(':')[-1]
    if color in ["red", "yellow"]:
        newcols.append((technology, color))
    newcols.sort(reverse=True)

newcols = dict(newcols)
print(newcols)
```

RE: Simple List manipulation - bowled - ichabod801 - Aug-11-2017

You could use tuple assignment to clean up the split:

```python
technology, color = item.split(':')
```

Also, you only need to sort newcols once, after the loop, so give it the same indentation as the `newcols = dict(newcols)` line.
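Pulling the suggestions in this thread together (split with tuple assignment, sort once after the loop, feed the pairs into a dict, then join each pair back with a colon via dict.items()), a minimal sketch on the sample list from the first post could look like this. It leaves out the BeautifulSoup step and assumes the strings are already extracted; `dedupe_by_priority` is a hypothetical name, and tuple assignment assumes every item contains exactly one colon. It sticks to plain loops, with no list comprehension.

```python
def dedupe_by_priority(items):
    """Keep one entry per technology; 'red' wins over 'yellow'."""
    pairs = []
    for item in items:
        # Tuple assignment: raises ValueError unless there is exactly one colon.
        technology, color = item.split(':')
        if color in ("red", "yellow"):
            pairs.append((technology, color))

    # Sort once, after the loop. In reverse order, ('X', 'yellow') comes
    # before ('X', 'red') because 'yellow' > 'red' as strings, so the later
    # 'red' pair overwrites 'yellow' when the pairs are fed into the dict.
    pairs.sort(reverse=True)
    tech = dict(pairs)

    # Join each (technology, color) pair back into a "technology:color" string.
    result = []
    for technology, color in tech.items():
        result.append(technology + ":" + color)
    return result


items = ['Cert_Technology:red', 'Cluster_Technology:red', 'VMware_Technology:red',
         'Lync_Technology:red', 'VMware_Technology:yellow', 'AD_DNS_Technology:yellow']
print(dedupe_by_priority(items))
```

The result comes out in reverse-sorted order rather than the original order: the reverse sort is what places each technology's yellow entry ahead of its red one, so red is written last and wins.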
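The reverse sort gives red priority only because 'yellow' happens to sort after 'red' as a string. If the original order of the technologies matters (the OrderedDict remark above), or if colors are ever added whose priority doesn't match alphabetical order, an explicit ranking avoids depending on sort order. A sketch, assuming a hypothetical `PRIORITY` table and a hypothetical `dedupe_keep_order` function; on Python 3.7 and later a plain dict keeps insertion order, so OrderedDict is not needed there.

```python
# Hypothetical ranking: a higher number wins. Colors not listed are ignored.
PRIORITY = {"yellow": 1, "red": 2}


def dedupe_keep_order(items):
    """Keep one entry per technology, choosing the highest-priority color.

    Technologies stay in the order they first appear (plain dicts keep
    insertion order on Python 3.7+; overwriting a key keeps its position).
    """
    best = {}  # technology -> highest-priority color seen so far
    for item in items:
        # Assumes exactly one colon per item, e.g. 'VMware_Technology:red'.
        technology, color = item.split(':')
        if color not in PRIORITY:
            continue
        current = best.get(technology)
        if current is None or PRIORITY[color] > PRIORITY[current]:
            best[technology] = color

    result = []
    for technology, color in best.items():
        result.append(technology + ":" + color)
    return result


items = ['Cert_Technology:red', 'Cluster_Technology:red', 'VMware_Technology:red',
         'Lync_Technology:red', 'VMware_Technology:yellow', 'AD_DNS_Technology:yellow']
print(dedupe_keep_order(items))
```

On the sample list this returns the expected output from the first post, in the same order, and no sorting step is needed.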