Posts: 29
Threads: 3
Joined: Aug 2017
Hello
I have a simple list like the one below, and I need to reduce it to one entry per technology. For example, VMware_Technology appears with both yellow and red. Only red and yellow are possible, and red has priority over yellow.
>>> list = ['Cert_Technology:red', 'Cluster_Technology:red', 'VMware_Technology:red', 'Lync_Technology:red', 'VMware_Technology:yellow', 'AD_DNS_Technology:yellow']
The only catch is that any technology can appear with both colors. There can be any number of technologies, each with only the two possible colors, yellow and red. Expected output:
>>> list
['Cert_Technology:red', 'Cluster_Technology:red', 'VMware_Technology:red', 'Lync_Technology:red', 'AD_DNS_Technology:yellow']
I know it is not complex, but I don't know how to go about it.
Posts: 4,220
Threads: 97
Joined: Sep 2016
I would make it a list of lists, splitting each item at the colon. Then feed that into a dictionary, which will eliminate duplicate technologies. Then use dict.items to get the tuples back, and join them with a colon. If order is important use OrderedDict. If priorities are important, sort it in reverse priority before feeding it into the dict.
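A minimal sketch of those steps, written with plain loops. The names `items` and `PRIORITY` are assumptions made for illustration (the original post names its list `list`, which shadows the built-in), and the explicit priority table is one possible way to sort "in reverse priority", not necessarily the only one.

```python
from collections import OrderedDict

# Sample data from the first post, stored under a hypothetical name.
items = ['Cert_Technology:red', 'Cluster_Technology:red', 'VMware_Technology:red',
         'Lync_Technology:red', 'VMware_Technology:yellow', 'AD_DNS_Technology:yellow']

# Hypothetical priority table: a higher number wins.
PRIORITY = {'yellow': 0, 'red': 1}

# Split each item at the colon into [technology, color].
pairs = []
for item in items:
    pairs.append(item.split(':'))

# Sort lowest priority first, so the highest-priority color is written last.
pairs.sort(key=lambda pair: PRIORITY[pair[1]])

# Later entries overwrite earlier ones for the same technology.
tech = OrderedDict(pairs)

# Join the (technology, color) tuples back into strings.
result = []
for technology, color in tech.items():
    result.append(technology + ':' + color)

print(result)
```

Note that the sort changes the order of the technologies, so the result holds the expected entries but not necessarily in the order of the original list.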
Posts: 29
Threads: 3
Joined: Aug 2017
Aug-11-2017, 02:47 AM
(This post was last modified: Aug-11-2017, 02:47 AM by radioactive9.)
Well, we need to keep the priority of the colors in mind.
If a technology has both yellow and red, we need to drop technology:yellow and keep technology:red.
OK, here is the full scenario. I have the following HTML file:
<html><body>
<h1>Cluster_Technology:Cluster Group:red</h1>,
<h1>Cluster_Technology:red</h1>,
<h1>Cluster_Technology:red</h1>,
<h1>Cluster_Technology:<font color="red">Change in cluster state!</font></h1>,
<h1>Cluster_Technology:Cluster Group:red</h1>,
<h1>Cluster_Technology:Cluster Group:yellow</h1>,
<h1>Cert_Technology:ClusterReport:red</h1>,
<h1>VMware_Technology: Snapshots (Over 3 Days Old) : 9:red</h1>,
<h1>VMware_Technology: VMs Removed (Last 5 Day(s)) : 1:red</h1>,
<h1>VMware_Technology: VM Tools Issues: 2:red</h1>,
<h1>VMware_Technology: VM(s) Alarm(s): 1:red</h1>,
<h1>VMware_Technology: BusSharingMode - Physical and Virtual: 2:red</h1>,
<h1>VMware_Technology: s/vMotion Information (Over 5 Days Old) : 3:red</h1>,
<h1>VMware_Technology: Hardware status warnings/errors:red</h1>,
<h1>VMware_Technology: Datastores (Less than 20% Free) : 1:red</h1>,
<h1>VMware_Technology: VM Tools Issues: 3:red</h1>,
<h1>VMware_Technology: BusSharingMode - Physical and Virtual: 19:red</h1>,
<h1>VMware_Technology: Hardware status warnings/errors:red</h1>,
<h1>VMware_Technology: Snapshots (Over 3 Days Old) : 3:red</h1>,
<h1>VMware_Technology: VM Tools Issues: 3:red</h1>,
<h1>VMware_Technology: VM(s) Alarm(s): 1:red</h1>,
<h1>VMware_Technology: VMs needing snapshot consolidation 1:red</h1>,
<h1>VMware_Technology: BusSharingMode - Physical and Virtual: 4:red</h1>,
<h1>VMware_Technology: s/vMotion Information (Over 5 Days Old) : 7:yellow</h1>,
<h1>VMware_Technology: Hardware status warnings/errors:red</h1>,
<h1>VMware_Technology: Datastores (Less than 20% Free) : 2:red</h1>,
<h1>VMware_Technology: Snapshots (Over 3 Days Old) : 9:red</h1>,
<h1>VMware_Technology: VMs Ballooning or Swapping : 10:red</h1>,
<h1>VMware_Technology: VM Tools Issues: 1:red</h1>,
<h1>VMware_Technology: VMs needing snapshot consolidation 1:red</h1>,
<h1>VMware_Technology: BusSharingMode - Physical and Virtual: 25:yellow</h1>
</body></html>
From this I need to find the unique technology and color pairs, with red taking priority over yellow.
This is what I have so far:
appPath = 'D:\\Backup\\Drive_D\\W0rk\\Script\\Python\\HTMLTOCSV_Python\\Try\\'
cols=[]
newcols=[]
from bs4 import BeautifulSoup
import os
soup = BeautifulSoup(open(appPath+'Cust\\consolidated_report_201708100600.html','rb'), 'lxml')
h1 = soup.find_all("h1")
# Collect the text of every <h1> heading.
for col in h1:
    cols.append(col.get_text())
# Keep the first field (technology) and the last field (color).
for item in cols:
    technology = item.split(':')[0]
    color = item.split(':')[-1]
    if color in ["red","yellow"]:
        newcols.append(technology+":"+color)
# Drop exact duplicates.
newcols = list(set(newcols))
print(newcols)
Now I have the unique combinations of technology:color. I don't know how to give red priority over yellow when both X_technology:yellow and X_technology:red are present.
Posts: 4,220
Threads: 97
Joined: Sep 2016
Like I said, feed it into a dict. If you do:
tech = dict([('spam', 'yellow'), ('spam', 'red'), ('eggs', 'red')])
the 'spam' key in the tech dict will be 'red'. That's because the later ('spam', 'red') pair overwrites the earlier ('spam', 'yellow') pair. So you just need to make sure your list is sorted so that the color you want to keep comes last. Here that is just reverse order, because 'yellow' sorts after 'red' alphabetically. To get the list to feed into the dict, just append (technology, color) to newcols instead of technology+":"+color. Combine it back into a string when you read it back out of dict.items().
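A small sketch of that behavior, assuming Python 3.7 or later (where a dict keeps insertion order). The pairs in `newcols` are made-up sample data, and `unique` and `combined` are hypothetical names.

```python
# Later pairs overwrite earlier ones that share the same key.
tech = dict([('spam', 'yellow'), ('spam', 'red'), ('eggs', 'red')])
print(tech)  # {'spam': 'red', 'eggs': 'red'}

# Made-up sample pairs, with red listed before yellow for one technology.
newcols = [('VMware_Technology', 'red'), ('VMware_Technology', 'yellow'),
           ('AD_DNS_Technology', 'yellow')]

# Tuples compare element by element, so a reverse sort puts
# ('VMware_Technology', 'yellow') before ('VMware_Technology', 'red').
newcols.sort(reverse=True)
unique = dict(newcols)

# Combine technology and color back into strings.
combined = []
for technology, color in unique.items():
    combined.append(technology + ':' + color)
print(combined)
```

The reverse sort only gives the right answer because of how the two color names happen to sort. If more colors were added, an explicit priority lookup would probably be safer.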
Posts: 29
Threads: 3
Joined: Aug 2017
Got you, thanks. Do you think we can improve the code a little? I do not want a list comprehension, as it is confusing :)
appPath = 'D:\\Backup\\Drive_D\\W0rk\\Script\\Python\\HTMLTOCSV_Python\\Try\\'
cols=[]
newcols=[]
from bs4 import BeautifulSoup
import os
soup = BeautifulSoup(open(appPath+'Cust\\consolidated_report_201708100600Copy.html','rb'), 'lxml')
h1 = soup.find_all("h1")
for col in h1:
    cols.append(col.get_text())
for item in cols:
    technology = item.split(':')[0]
    color = item.split(':')[-1]
    if color in ["red","yellow"]:
        newcols.append((technology,color))
    newcols.sort(reverse=True)
newcols = dict(newcols)
print(newcols)
Posts: 4,220
Threads: 97
Joined: Sep 2016
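You could use tuple assignment to clean up the split:
technology, color = item.split(':')
Note that this only works when an item contains exactly one colon. With more fields, as in your HTML headings, the assignment raises a ValueError. Also, you only need to sort newcols once, after the loop, so give it the same indentation as the dict(newcols) line.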
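A hedged sketch of how the cleaned-up loop might look. Since several headings in the HTML contain more than one colon, it uses a starred assignment target (a substitute for the plain two-name assignment) to take the first and last fields. The `cols` list is sample text copied from a few of the quoted `<h1>` lines, standing in for what `col.get_text()` would presumably return; `details` and `unique` are hypothetical names.

```python
# Sample heading texts taken from the HTML quoted earlier in the thread.
cols = [
    'Cluster_Technology:Cluster Group:yellow',
    'Cluster_Technology:red',
    'Cluster_Technology:Change in cluster state!',
    'Cert_Technology:ClusterReport:red',
    'VMware_Technology: s/vMotion Information (Over 5 Days Old) : 7:yellow',
]

newcols = []
for item in cols:
    # Starred assignment: first field, last field, anything between is ignored.
    technology, *details, color = item.split(':')
    if color in ['red', 'yellow']:
        newcols.append((technology, color))

# Sort once, after the loop. 'yellow' sorts after 'red', so the reverse sort
# puts yellow first and the later red entry overwrites it in the dict.
newcols.sort(reverse=True)
unique = dict(newcols)
print(unique)
```

The heading without a trailing color ("Change in cluster state!") is skipped by the `if` check, and Cluster_Technology ends up red even though a yellow entry came first in the input.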