Python Forum
Simple List manipulation - bowled
#1
Hello

I have a simple list like the one below, and I need to reduce it to one entry per technology. For example, VMware_Technology appears with both yellow and red. Only red and yellow are possible, and red has priority over yellow.

>>> list = ['Cert_Technology:red', 'Cluster_Technology:red', 'VMware_Technology:red', 'Lync_Technology:red', 'VMware_Technology:yellow', 'AD_DNS_Technology:yellow']
The expected output is below. The only catch is that any technology can appear with both colors. There can be any number of technologies, each with only the two possible colors, yellow and red.

>>> list
['Cert_Technology:red', 'Cluster_Technology:red', 'VMware_Technology:red', 'Lync_Technology:red', 'AD_DNS_Technology:yellow']
>>>
I know it is not complex, but I don't know how to go about it.
#2
I would make it a list of lists, splitting each item at the colon. Then feed that into a dictionary, which will eliminate duplicate technologies. Then use dict.items to get the tuples back, and join them with a colon. If order is important use OrderedDict. If priorities are important, sort it in reverse priority before feeding it into the dict.
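A minimal sketch of the steps described above, run on the sample list from the first post. The names `items`, `pairs`, and `PRIORITY` are illustrative and not from the thread, and it assumes Python 3.7+, where a plain dict keeps insertion order.

```python
# Sample data from the first post.
items = ['Cert_Technology:red', 'Cluster_Technology:red', 'VMware_Technology:red',
         'Lync_Technology:red', 'VMware_Technology:yellow', 'AD_DNS_Technology:yellow']

# Hypothetical priority table: a higher number means a higher priority.
PRIORITY = {'yellow': 0, 'red': 1}

# Step 1: split each item at the colon into [technology, color].
pairs = []
for item in items:
    pairs.append(item.split(':'))

# Step 2: sort so that higher-priority colors come last.
pairs.sort(key=lambda pair: PRIORITY[pair[1]])

# Step 3: feed the pairs into a dict; a later pair overwrites
# an earlier one that has the same technology.
tech = dict(pairs)

# Step 4: join each technology and color back together with a colon.
result = []
for technology, color in tech.items():
    result.append(technology + ':' + color)

print(result)
```

The result has one entry per technology, but the entries are not in the same order as the input, because the sort moves the yellow entries to the front.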
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#3
Well, we need to keep the priority of the colors in mind.

If a technology has both yellow and red, we need to drop technology:yellow and keep technology:red.

OK, here is the full scenario. I have the following HTML file:

Quote:
<html><body>
<h1>Cluster_Technology:Cluster Group:red</h1>,
<h1>Cluster_Technology:red</h1>,
<h1>Cluster_Technology:red</h1>,
<h1>Cluster_Technology:<font color="red">Change in cluster state!</font></h1>,
<h1>Cluster_Technology:Cluster Group:red</h1>,
<h1>Cluster_Technology:Cluster Group:yellow</h1>,
<h1>Cert_Technology:ClusterReport:red</h1>,
<h1>VMware_Technology: Snapshots (Over 3 Days Old) : 9:red</h1>,
<h1>VMware_Technology: VMs Removed (Last 5 Day(s)) : 1:red</h1>,
<h1>VMware_Technology: VM Tools Issues: 2:red</h1>,
<h1>VMware_Technology: VM(s) Alarm(s): 1:red</h1>,
<h1>VMware_Technology: BusSharingMode - Physical and Virtual: 2:red</h1>,
<h1>VMware_Technology: s/vMotion Information (Over 5 Days Old) : 3:red</h1>,
<h1>VMware_Technology: Hardware status warnings/errors:red</h1>,
<h1>VMware_Technology: Datastores (Less than 20% Free) : 1:red</h1>,
<h1>VMware_Technology: VM Tools Issues: 3:red</h1>,
<h1>VMware_Technology: BusSharingMode - Physical and Virtual: 19:red</h1>,
<h1>VMware_Technology: Hardware status warnings/errors:red</h1>,
<h1>VMware_Technology: Snapshots (Over 3 Days Old) : 3:red</h1>,
<h1>VMware_Technology: VM Tools Issues: 3:red</h1>,
<h1>VMware_Technology: VM(s) Alarm(s): 1:red</h1>,
<h1>VMware_Technology: VMs needing snapshot consolidation 1:red</h1>,
<h1>VMware_Technology: BusSharingMode - Physical and Virtual: 4:red</h1>,
<h1>VMware_Technology: s/vMotion Information (Over 5 Days Old) : 7:yellow</h1>,
<h1>VMware_Technology: Hardware status warnings/errors:red</h1>,
<h1>VMware_Technology: Datastores (Less than 20% Free) : 2:red</h1>,
<h1>VMware_Technology: Snapshots (Over 3 Days Old) : 9:red</h1>,
<h1>VMware_Technology: VMs Ballooning or Swapping : 10:red</h1>,
<h1>VMware_Technology: VM Tools Issues: 1:red</h1>,
<h1>VMware_Technology: VMs needing snapshot consolidation 1:red</h1>,
<h1>VMware_Technology: BusSharingMode - Physical and Virtual: 25:yellow</h1>
</body></html>

From this I need to find the unique technology and color pairs. Red takes priority over yellow.

What I have so far is below:

from bs4 import BeautifulSoup

appPath = 'D:\\Backup\\Drive_D\\W0rk\\Script\\Python\\HTMLTOCSV_Python\\Try\\'

cols = []
newcols = []

# Parse the report and collect the text of every <h1> heading.
with open(appPath+'Cust\\consolidated_report_201708100600.html','rb') as report:
    soup = BeautifulSoup(report, 'lxml')
h1 = soup.find_all("h1")
for col in h1:
    cols.append(col.get_text())

# The technology is the first field and the color is the last one.
for item in cols:
    technology = item.split(':')[0]
    color = item.split(':')[-1]
    if color in ["red","yellow"]:
        newcols.append(technology+":"+color)

# Drop exact duplicates (a set does not preserve order).
newcols = list(set(newcols))
print(newcols)
Now I have the unique combinations of technology:color. I don't know how to give red priority over yellow when both X_technology:yellow and X_technology:red are present.
#4
Like I said, feed it into a dict. If you do:

tech = dict([('spam', 'yellow'), ('spam', 'red'), ('eggs', 'red')])
The 'spam' key in the tech dict will be 'red'. That's because the later 'spam': 'red' overwrites the earlier 'spam': 'yellow'. So you just need to make sure the list is sorted so that red comes after yellow for each technology, and a reverse sort does exactly that. To get the list to feed into the dict, just append (technology, color) to newcols instead of technology+":"+color. Combine it back into a string when you read it back out of dict.items().
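To see why the reverse sort works here: tuples compare element by element, and 'yellow' sorts after 'red' alphabetically, so a reverse sort puts each technology's yellow entry before its red one. A small sketch, with sample data made up for illustration:

```python
# Illustrative data; the order is deliberately "wrong" to start with.
pairs = [('spam', 'red'), ('eggs', 'red'), ('spam', 'yellow')]

# Reverse alphabetical order: for the same technology,
# 'yellow' now comes before 'red'.
pairs.sort(reverse=True)
print(pairs)

# dict() keeps the last value seen for each key, so 'red' wins for 'spam'.
tech = dict(pairs)
print(tech)
```

Note that this relies on the color names happening to sort in priority order. With other color names, an explicit priority table would be the safer choice.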
#5
Got you, thanks. Do you think the code can be improved any further? I would rather avoid list comprehensions, as I find them confusing :)
from bs4 import BeautifulSoup

appPath = 'D:\\Backup\\Drive_D\\W0rk\\Script\\Python\\HTMLTOCSV_Python\\Try\\'

cols=[]
newcols=[]

with open(appPath+'Cust\\consolidated_report_201708100600Copy.html','rb') as report:
    soup = BeautifulSoup(report, 'lxml')
h1 = soup.find_all("h1")
for col in h1:
    cols.append(col.get_text())
for item in cols:
    technology = item.split(':')[0]
    color = item.split(':')[-1]
    if color in ["red","yellow"]:
        newcols.append((technology,color))
        newcols.sort(reverse=True)
newcols = dict(newcols)

print(newcols)
#6
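You could use tuple assignment to clean up the split. Since some of your headings have more than two fields, a starred name is needed to absorb the middle ones (Python 3 only):

technology, *_, color = item.split(':')
Also, you only need to sort newcols once, after the loop, so give the sort the same indentation as the dict(newcols) line.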
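Putting the suggestions in this thread together, here is one possible sketch with the sort moved out of the loop. The function name `unique_technologies` is hypothetical, and it assumes the heading strings have already been extracted from the HTML (for example with get_text()); the sample headings are taken from the report quoted earlier.

```python
def unique_technologies(headings):
    """Return {technology: color}, preferring red over yellow."""
    pairs = []
    for item in headings:
        fields = item.split(':')
        technology = fields[0]   # first field is the technology
        color = fields[-1]       # last field is the color, if there is one
        if color in ('red', 'yellow'):
            pairs.append((technology, color))
    # Sort once, after the loop: reverse order puts yellow before red,
    # so red is the value that survives in the dict.
    pairs.sort(reverse=True)
    return dict(pairs)


headings = [
    'Cluster_Technology:Cluster Group:red',
    'Cluster_Technology:Cluster Group:yellow',
    'Cluster_Technology:Change in cluster state!',
    'Cert_Technology:ClusterReport:red',
    'VMware_Technology: VM Tools Issues: 2:red',
    'VMware_Technology: BusSharingMode - Physical and Virtual: 25:yellow',
]

result = unique_technologies(headings)
for technology, color in result.items():
    print(technology + ':' + color)
```

Headings whose last field is not a color, such as the "Change in cluster state!" line, are skipped by the `if` check.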