Sep-18-2024, 04:14 PM
I have a pandas dataframe where one of the columns holds a list of values in each row. I need to identify whether any item in a row's list also occurs in the list in other rows.
In other words, on the first row, take the first value in the list in the "items" column and check the "items" column in all the other rows to see if that value occurs. If it does, I want to know the "name" from the first row, the "name" from the row where the match occurs, and the value we were looking for. Repeat for each value in the "items" list in the first row. Repeat for all the rows.
import pandas as pd

df = pd.DataFrame([
    {"name": "abc", "items": ["1234", "5678", "9012"]},
    {"name": "def", "items": ["3456", "7890"]},
    {"name": "ghi", "items": ["9876", "1234"]},
    {"name": "jkl", "items": ["5678", "7890", "2468"]},
])

I want:

[["abc", "ghi", "1234"],
 ["abc", "jkl", "5678"],
 ["def", "jkl", "7890"]]

I could iterate through the lists, iterating over all the rows, but I feel that pandas should be able to handle at least some of the work here.
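For reference, the plain-iteration approach mentioned above could be sketched roughly as follows. This is only an illustration of the idea, not a recommended solution; the names `records` and `matches` are made up for the example, and it assumes each pair of rows should be reported once, with the earlier row first.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame([
    {"name": "abc", "items": ["1234", "5678", "9012"]},
    {"name": "def", "items": ["3456", "7890"]},
    {"name": "ghi", "items": ["9876", "1234"]},
    {"name": "jkl", "items": ["5678", "7890", "2468"]},
])

# Pair each name with its list so the rows can be compared two at a time.
records = list(zip(df["name"], df["items"]))

matches = []
# combinations() yields every unordered pair of rows once, earlier row first.
for (name_a, items_a), (name_b, items_b) in combinations(records, 2):
    later_items = set(items_b)  # set gives fast membership checks
    for value in items_a:
        if value in later_items:
            matches.append([name_a, name_b, value])

print(matches)
```

This compares every pair of rows, so the work grows with the square of the number of rows, which is the part I would hope pandas could take over.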
What should I be doing to achieve the result I'm looking for?
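One direction that might work, offered as a sketch under assumptions rather than a definitive answer: turn each list element into its own row with `DataFrame.explode`, then join the result to itself on the item value with `DataFrame.merge`. The helper column `row`, the suffixes `_a` and `_b`, and the variable names `exploded` and `pairs` are assumptions introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"name": "abc", "items": ["1234", "5678", "9012"]},
    {"name": "def", "items": ["3456", "7890"]},
    {"name": "ghi", "items": ["9876", "1234"]},
    {"name": "jkl", "items": ["5678", "7890", "2468"]},
])

# One row per (name, item) pair; the original row position is kept in "row".
exploded = df.explode("items").rename_axis("row").reset_index()

# Self-join on the item value: every two rows sharing an item get paired up.
pairs = exploded.merge(exploded, on="items", suffixes=("_a", "_b"))

# Drop self-matches and mirrored duplicates by keeping the earlier row first.
pairs = pairs[pairs["row_a"] < pairs["row_b"]]

# Sort so the output order follows the original row order.
pairs = pairs.sort_values(["row_a", "row_b"])

result = pairs[["name_a", "name_b", "items"]].values.tolist()
print(result)
```

The `row_a < row_b` filter reports each match once; replacing it with `row_a != row_b` would list every match in both directions instead.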