3Pinter
Jun-13-2018, 05:36 AM
(This post was last modified: Jun-13-2018, 05:36 AM by 3Pinter.)
Guys,
I'm fairly new to Python, so I could certainly use your expertise!
I have two lists, and I want to check whether the first list has values which (partially) match values in list2.
In the end I want:
- a list with the matching character(s) after "cookie."
- a list with the value that follows them
- to ignore the " randomothertext" part (yes, the space in front of random is always there)
output1, output2=[],[]
preset="cookie."
list1=['cookie.A001', 'cookie.H004', 'cookie.H004 andsomeothertext', 'cookie.ABC031', 'cookie.FAIL002']
list2=['A', 'H', 'ABC']
list2_total=[preset+x for x in list2]
# let's say i = 'cookie.H004' to illustrate the expected values
for i in list1:
    if any(n in i for n in list2_total):
        # kt should have the value len('cookie.') = 7
        kt = len(preset)
        # km should have the value 7 + len('H') = 8
        km = kt + len(n)
        output1.append(i[kt:km])
        output2.append(i[km:].split(' ')[0])
# output1 after the full loop would be ["A", "H", "H", "ABC"]
# output2 ["001", "004", "004", "031"]
# so the output is the SAME whether i is 'cookie.H004 andsomeothertext' or 'cookie.H004'
Questions:
1. n is unknown when I use it outside the any(...) expression. How do I make it available as a reusable variable?
2. Is this the way to go, or would you suggest another (better) way?
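A minimal sketch of one possible answer to question 1 (not from the thread): next() over the same generator returns the matching prefix itself instead of only True/False, so n survives the test. It assumes the longest matching prefix should win, so 'cookie.ABC031' is read as ABC rather than A; the name candidates is illustrative.

```python
preset = "cookie."
list1 = ['cookie.A001', 'cookie.H004', 'cookie.H004 andsomeothertext',
         'cookie.ABC031', 'cookie.FAIL002']
list2 = ['A', 'H', 'ABC']

# Assumption: try longer prefixes first so 'cookie.ABC' beats 'cookie.A'.
candidates = sorted((preset + x for x in list2), key=len, reverse=True)

output1, output2 = [], []
for i in list1:
    # next() gives the first prefix that i starts with, or None if none does
    n = next((p for p in candidates if i.startswith(p)), None)
    if n is None:
        continue  # no match, e.g. 'cookie.FAIL002'
    kt = len(preset)  # 7
    km = len(n)       # e.g. 8 for 'cookie.H'
    output1.append(i[kt:km])
    # rest of the string, cut at the first space
    output2.append(i[km:].split(' ')[0])

print(output1)  # ['A', 'H', 'H', 'ABC']
print(output2)  # ['001', '004', '004', '031']
```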
buran
Jun-13-2018, 06:42 AM
(This post was last modified: Jun-13-2018, 06:43 AM by buran.)
Something like this:
def lookup(lookup_item, reference, prefix):
    for ref_item in reference:
        full_ref = ''.join((prefix, ref_item))
        if lookup_item.startswith(full_ref):
            # everything after the matched prefix, up to the first space
            return (ref_item, lookup_item[len(full_ref):].split(' ')[0])
    # falls through and returns None when nothing matches
PREFIX = "cookie."
REFERENCE = ['A', 'H', 'ABC']
lookup_list = ['cookie.A001', 'cookie.H004', 'cookie.H004 andsomeothertext', 'cookie.ABC031', 'cookie.FAIL002']
result = [lookup(item, REFERENCE, PREFIX) for item in lookup_list]
output1, output2 = zip(*[item for item in result if item])
print(output1)
print(output2)
Output:
('A', 'H', 'H', 'A')
('001', '004', '004', 'BC031')
Hope this puts you on the right path. As you can see, there is a problem with the A/ABC match: you need to explain further how such cases should be handled.
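One possible rule for the A/ABC case, assumed here rather than taken from the thread: accept a reference only when everything after it (up to the first space) is digits. This sketch keeps the lookup() signature from the post above but validates the remainder with str.isdigit().

```python
PREFIX = "cookie."
REFERENCE = ['A', 'H', 'ABC']
lookup_list = ['cookie.A001', 'cookie.H004', 'cookie.H004 andsomeothertext',
               'cookie.ABC031', 'cookie.FAIL002']

def lookup(lookup_item, reference, prefix):
    """Return (ref_item, digits), or None if no reference fits."""
    token = lookup_item.split(' ')[0]  # drop ' andsomeothertext'
    for ref_item in reference:
        full_ref = prefix + ref_item
        rest = token[len(full_ref):]
        # assumption: a valid match is prefix + reference letters + digits only
        if token.startswith(full_ref) and rest.isdigit():
            return ref_item, rest
    return None

result = [lookup(item, REFERENCE, PREFIX) for item in lookup_list]
output1, output2 = zip(*[r for r in result if r])
print(output1)  # ('A', 'H', 'H', 'ABC')
print(output2)  # ('001', '004', '004', '031')
```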
scidam
Another way is to use regular expressions:
import re
#output1, output2=[],[]
preset=r"cookie\."
list1=['cookie.A001', 'cookie.H004', 'cookie.H004 andsomeothertext', 'cookie.ABC031', 'cookie.FAIL002']
list2=['A', 'H', 'ABC']
#list2_total=[preset+x for x in list2]
list2_pats = map(lambda x: re.compile(preset + r'(%s)([0-9]+)' % x), list2)
result = sum([pat.findall(item) for pat in list2_pats for item in list1], [])
output1, output2 = zip(*result)
print(output1)
print(output2)
There is no problem with the A/ABC matching as long as the xxxx in cookie.Axxxx (cookie.ABCxxxx) are digits.
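A variation on the regex idea, sketched under the same assumption that the value is all digits: build one pattern with an alternation instead of one pattern per reference. The references go through re.escape() in case they ever contain regex metacharacters, and longer alternatives are listed first. Because this walks list1 once, the output follows list1's order rather than list2's.

```python
import re

list1 = ['cookie.A001', 'cookie.H004', 'cookie.H004 andsomeothertext',
         'cookie.ABC031', 'cookie.FAIL002']
list2 = ['A', 'H', 'ABC']

# one alternation, longest first: cookie\.(ABC|A|H)(\d+)
alternatives = '|'.join(re.escape(x) for x in sorted(list2, key=len, reverse=True))
pattern = re.compile(r'cookie\.(' + alternatives + r')(\d+)')

# match() is anchored at the start; non-matching items give None and are skipped
result = [m.groups() for m in map(pattern.match, list1) if m]
output1, output2 = zip(*result)
print(output1)  # ('A', 'H', 'H', 'ABC')
print(output2)  # ('001', '004', '004', '031')
```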
3Pinter
@scidam,
Ahh, so re.compile works with regular expressions (which I'm familiar with)! That's interesting; I'll have a look at that.
Thanks for that suggestion!
And for clean coding: is there no need to declare output1 as an empty list?
So I can declare new variables on the fly?
buran
re is Python's standard library module for regular expressions:
https://docs.python.org/3/library/re.html
re.compile returns a compiled regular expression object:
https://docs.python.org/3/library/re.html#re.compile
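A short illustration of what the compiled object offers, using a hypothetical combined pattern for the cookie strings: match() only looks at the start of the string and returns a Match object or None, while findall() returns every (group1, group2) tuple it finds, or an empty list.

```python
import re

pat = re.compile(r'cookie\.(ABC|A|H)([0-9]+)')

m = pat.match('cookie.H004 andsomeothertext')  # Match object, anchored at start
print(m.groups())                               # ('H', '004')
print(pat.match('cookie.FAIL002'))              # None
print(pat.findall('cookie.A001 cookie.ABC031')) # [('A', '001'), ('ABC', '031')]
print(pat.findall('cookie.FAIL002'))            # []
```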
(Jun-13-2018, 07:25 AM)3Pinter Wrote: And for clean coding: is there no need to declare output1 as an empty list?
So I can declare new variables on the fly?
It depends on how you use it.
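To make "it depends" concrete (names are illustrative): append() changes a list that must already exist, while an assignment, including tuple unpacking from zip(), creates the names itself.

```python
pairs = [('A', '001'), ('H', '004'), ('ABC', '031')]

# append() mutates an existing list, so the list has to be created first
letters = []
for letter, _digits in pairs:
    letters.append(letter)

# assignment creates new names; no empty-list "declaration" needed
letters2, digits2 = zip(*pairs)

# append() on a name that was never assigned fails
try:
    undeclared.append('A')  # intentionally never assigned
except NameError as exc:
    print('NameError:', exc)
```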
3Pinter
Hi Buran,
Thanks for your follow-up; it's certainly awesome that Python has re(gex)! I'll read more about it.
Regarding declaring:
Like in my first post, I declared output1 and output2 as empty lists, because if I hadn't, I wouldn't have been able to append values to them.
But if I want to turn a number of results into a list, I can just define it on the spot...
Right?
buran
Jun-13-2018, 08:20 AM
(This post was last modified: Jun-13-2018, 08:20 AM by buran.)
(Jun-13-2018, 07:48 AM)3Pinter Wrote: Like in my first post, I declared output1 and output2 as empty lists, because if I hadn't, I wouldn't have been able to append values to them.
But if I want to turn a number of results into a list, I can just define it on the spot...
Right?
Yes, that's correct.
Note that result is initially a list of lists, so you unpack the outer list into the corresponding number of variables:
result = [['a', 'b', 'c'], [1, 2, 3]]
o1, o2 = result
print(o1)
print(o2)
Output:
['a', 'b', 'c']
[1, 2, 3]
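A small sketch of why the number of names matters when unpacking zip(*...): zip(*pairs) regroups the first elements together and the second elements together, so two-element tuples always give exactly two groups. Asking for three names fails, and so does unpacking an empty list (nothing matched), since zip(*[]) yields no groups at all.

```python
pairs = [('A', '001'), ('H', '004'), ('ABC', '031')]

# zip(*pairs) is zip(('A', '001'), ('H', '004'), ('ABC', '031'))
letters, digits = zip(*pairs)
print(letters)  # ('A', 'H', 'ABC')
print(digits)   # ('001', '004', '031')

errors = []
for bad in (pairs, []):
    try:
        a, b, c = zip(*bad)  # 3 names, but 2 groups (or 0 for an empty list)
    except ValueError as exc:
        errors.append(exc)
        print('ValueError:', exc)
```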
3Pinter
How do I catch the 'not matched ones'?
import re
preset=r"cookie\."
list1=['cookie.A001', 'cookie.H004', 'cookie.H004 andsomeothertext', 'cookie.ABC031', 'cookie.FAIL002']
list2=['A', 'H', 'ABC']
list2_pats = map(lambda x: re.compile(preset + r'(%s)([0-9]+)' % x), list2)
#result = sum([pat.findall(item) for pat in list2_pats for item in list1], [])
result =[]
for item in list1:
    for pat in list2_pats:
        if item in pat:
            result.append(pat.findall(item))
        else:
            result.append("eeew")
output1, output2, output3 = zip(*result)
Ideally I get another output (output3) which contains all the non-matching inputs, like "cookie.FAIL002".
Untangling the shorthand version is a bit of a struggle atm (still learning :) )
What am I doing wrong?
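For untangling the shorthand, here is the one-liner from the regex answer written out as plain loops, as a sketch: the first for in the comprehension becomes the outer loop, and sum(..., []) simply concatenates the per-item lists. The patterns are built as a list here (an assumption of this sketch) so they could be looped over more than once.

```python
import re

preset = r"cookie\."
list1 = ['cookie.A001', 'cookie.H004', 'cookie.H004 andsomeothertext',
         'cookie.ABC031', 'cookie.FAIL002']
list2 = ['A', 'H', 'ABC']
list2_pats = [re.compile(preset + r'(%s)([0-9]+)' % x) for x in list2]

# same as: sum([pat.findall(item) for pat in list2_pats for item in list1], [])
result = []
for pat in list2_pats:         # outer loop: first 'for' in the comprehension
    for item in list1:         # inner loop: second 'for'
        found = pat.findall(item)  # list of (letters, digits) tuples, often []
        result.extend(found)       # what sum(..., []) does: concatenate lists
print(result)
# [('A', '001'), ('H', '004'), ('H', '004'), ('ABC', '031')]
```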
buran
(Jun-14-2018, 08:55 AM)3Pinter Wrote: What am I doing wrong?
You misunderstood scidam's approach with regex. Here it is in slow motion:
import re
preset=r"cookie\."
list1=['cookie.A001', 'cookie.H004', 'cookie.H004 andsomeothertext', 'cookie.ABC031', 'cookie.FAIL002']
list2=['A', 'H', 'ABC']
list2_pats = map(lambda x: re.compile(preset + r'(%s)([0-9]+)' % x), list2)
result1 = [pat.findall(item) for pat in list2_pats for item in list1]
print('result1: {}\n'.format(result1))
result2 = sum(result1, [])
print('result2: {}\n'.format(result2))
result3 = list(zip(*result2))  # list() so that print shows the tuples in Python 3
print('result3: {}\n'.format(result3))
output1, output2 = result3
print('output1: {}\n'.format(output1))
print('output2: {}\n'.format(output2))
Output:
result1: [[('A', '001')], [], [], [], [], [], [('H', '004')], [('H', '004')], [], [], [], [], [], [('ABC', '031')], []]
result2: [('A', '001'), ('H', '004'), ('H', '004'), ('ABC', '031')]
result3: [('A', 'H', 'H', 'ABC'), ('001', '004', '004', '031')]
output1: ('A', 'H', 'H', 'ABC')
output2: ('001', '004', '004', '031')
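Your line 10, if item in pat:, will never be True: a compiled pattern object does not support the in operator, so in Python 3 that line actually raises a TypeError.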
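A minimal sketch of one way to also collect the non-matching inputs, keeping the one-pattern-per-reference idea. It assumes each item yields at most one match and that the value must follow "cookie." directly, so it uses pat.match() (anchored at the start) rather than findall(). The patterns are built as a list because in Python 3 map() returns a one-shot iterator that the inner loop would use up on the first item. The for/else clause collects items for which no pattern matched.

```python
import re

preset = r"cookie\."
list1 = ['cookie.A001', 'cookie.H004', 'cookie.H004 andsomeothertext',
         'cookie.ABC031', 'cookie.FAIL002']
list2 = ['A', 'H', 'ABC']

# a real list, so it can be looped over again for every item
list2_pats = [re.compile(preset + r'(%s)([0-9]+)' % re.escape(x)) for x in list2]

matches, output3 = [], []
for item in list1:
    for pat in list2_pats:
        m = pat.match(item)  # Match object or None
        if m:
            matches.append(m.groups())
            break  # first matching pattern wins
    else:
        # runs only when the inner loop finished without break
        output3.append(item)

# guard: unpacking zip(*[]) into two names would raise ValueError
output1, output2 = zip(*matches) if matches else ((), ())
print(output1)  # ('A', 'H', 'H', 'ABC')
print(output2)  # ('001', '004', '004', '031')
print(output3)  # ['cookie.FAIL002']
```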