Python Forum
Find a string from a column of one table in another table - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Find a string from a column of one table in another table (/thread-40674.html)



Find a string from a column of one table in another table - visedwings049 - Sep-05-2023

I am using Python 3.9

I have created two pandas data frames from csv files
product and supplier


1. I have created a product table that splits a description out into multiple columns.

2. I have created a supplier table that has the supplier product. Many times a supplier product code is in the description of a product.

3. I want to populate the product.supplier code column with any string that is contained in the supplier.product column. In this example we would have found the code widget in column 4 in the supplier table and returned the word widget in the supplier code table.

4. So i want to do this on a loop for column 0 then move on to column 1 and so on.

There will never be two examples of a supplier code in the same string, so i am not worried about overwriting a first instance with a second.

I have tried the str.contains function but this just returns true or false.

[Image: parse-example.png]


RE: Find a string from a column of one table in another table - deanhystad - Sep-05-2023

What is a table? Is it a spreadsheet? Is it a table in a PDF? Is it a CSV file? Is it a pandas DataFrame, Is it a table in a database?


RE: Find a string from a column of one table in another table - visedwings049 - Sep-05-2023

(Sep-05-2023, 03:03 PM)visedwings049 Wrote: I am using Python 3.9

1. I have created a product table that splits a description out into multiple columns.

2. I have created a supplier table that has the supplier product. Many times a supplier product code is in the description of a product.

3. I want to populate the product.supplier code column with any string that is contained in the supplier.product column. In this example we would have found the code widget in column 4 in the supplier table and returned the word widget in the supplier code table.

4. So i want to do this on a loop for column 0 then move on to column 1 and so on.

There will never be two examples of a supplier code in the same string, so i am not worried about overwriting a first instance with a second.

I have tried the str.contains function but this just returns true or false.

[Image: parse-example.png]

I apologize as this is my first post but they are both being read as pandas dataframes from CSV.


RE: Find a string from a column of one table in another table - deanhystad - Sep-05-2023

Something like this maybe?
import pandas as pd
from string import ascii_letters as letters
from random import choice, choices, randint


def find_supplier(description):
    """Return word if word in description matches a supplier code, else None."""
    intersection = set(description.split()) & suppliers
    return list(intersection)[0] if intersection else None


# Make some random table thing that we can use to search for words in the description
# that match a supplier code.
product_table = pd.DataFrame(
    [
        {
            "Product": i,
            "Supplier Code": choice("ABCDE"),
            "Description": " ".join(choices(letters, k=randint(5, 10))),
        }
        for i in range(100, 120)
    ]
)

# Get set of suppliers.
suppliers = set(product_table["Supplier Code"].values)

# Make supplier table.  Supplier table contains rows from product_table
# where one of the words in the description matches a supplier code.
supplier_table = product_table[["Description"]]
supplier_table["Product"] = supplier_table["Description"].map(find_supplier)
supplier_table = supplier_table[~supplier_table["Product"].isna()][
    ["Product", "Description"]
]
print(supplier_table)
Output:
Product Description 1 E T u V x Z E a k s 2 A K H a K P G z m l A 3 E Q L H E q J 5 B N X x b i B q D F M 8 D d U q K Y W I D 10 C U V H C f F n N z 14 C C o X u J 15 E D F e E Q u 18 B o f B P x O
This is easy to break up into individual supplier tables.
for supplier in suppliers:
    print(
        supplier,
        supplier_table[supplier_table["Product"] == supplier].reset_index(drop=True),
        sep="\n",
        end="\n\n",
    )
Output:
A Product Description 0 A d j A c S o F U 1 A o A w I W 2 A z s e j A 3 A c P R w Z M A V b D Product Description 0 D t u P r p R v G D O 1 D j P w D h v o m w C Product Description 0 C n j r C r R T B Product Description 0 B H O P B A c r C n 1 B B g Z r z 2 B r o y g u l B A E Product Description 0 E P E m t Z 1 E S E Y m F a K h Z T



RE: Find a string from a column of one table in another table - visedwings049 - Sep-06-2023

Wow that is an amazing way to do it. This is very helpful i am going to play with this code and see if i can duplicate it with my data set. Thank you so much for this example i had not done anything with defining a function prior to this and that is extremely cool. Cool


RE: Find a string from a column of one table in another table - visedwings049 - Sep-06-2023

(Sep-05-2023, 04:21 PM)visedwings049 Wrote:
(Sep-05-2023, 03:03 PM)visedwings049 Wrote: I am using Python 3.9

1. I have created a product table that splits a description out into multiple columns.

2. I have created a supplier table that has the supplier product. Many times a supplier product code is in the description of a product.

3. I want to populate the product.supplier code column with any string that is contained in the supplier.product column. In this example we would have found the code widget in column 4 in the supplier table and returned the word widget in the supplier code table.

4. So i want to do this on a loop for column 0 then move on to column 1 and so on.

There will never be two examples of a supplier code in the same string, so i am not worried about overwriting a first instance with a second.

I have tried the str.contains function but this just returns true or false.

[Image: parse-example.png]

I apologize as this is my first post but they are both being read as pandas dataframes from CSV.

Worked beautifully and its way faster than sending everything to a different column. Had no idea you could do this in python. I was able to compare thousands of rows of data in less than 30 seconds. Thanks Again!!


RE: Find a string from a column of one table in another table - visedwings049 - Sep-06-2023

(Sep-05-2023, 06:57 PM)deanhystad Wrote: Something like this maybe?
import pandas as pd
from string import ascii_letters as letters
from random import choice, choices, randint


def find_supplier(description):
    """Return word if word in description matches a supplier code, else None."""
    intersection = set(description.split()) & suppliers
    return list(intersection)[0] if intersection else None


# Make some random table thing that we can use to search for words in the description
# that match a supplier code.
product_table = pd.DataFrame(
    [
        {
            "Product": i,
            "Supplier Code": choice("ABCDE"),
            "Description": " ".join(choices(letters, k=randint(5, 10))),
        }
        for i in range(100, 120)
    ]
)

# Get set of suppliers.
suppliers = set(product_table["Supplier Code"].values)

# Make supplier table.  Supplier table contains rows from product_table
# where one of the words in the description matches a supplier code.
supplier_table = product_table[["Description"]]
supplier_table["Product"] = supplier_table["Description"].map(find_supplier)
supplier_table = supplier_table[~supplier_table["Product"].isna()][
    ["Product", "Description"]
]
print(supplier_table)
Output:
Product Description 1 E T u V x Z E a k s 2 A K H a K P G z m l A 3 E Q L H E q J 5 B N X x b i B q D F M 8 D d U q K Y W I D 10 C U V H C f F n N z 14 C C o X u J 15 E D F e E Q u 18 B o f B P x O
This is easy to break up into individual supplier tables.
for supplier in suppliers:
    print(
        supplier,
        supplier_table[supplier_table["Product"] == supplier].reset_index(drop=True),
        sep="\n",
        end="\n\n",
    )
Output:
A Product Description 0 A d j A c S o F U 1 A o A w I W 2 A z s e j A 3 A c P R w Z M A V b D Product Description 0 D t u P r p R v G D O 1 D j P w D h v o m w C Product Description 0 C n j r C r R T B Product Description 0 B H O P B A c r C n 1 B B g Z r z 2 B r o y g u l B A E Product Description 0 E P E m t Z 1 E S E Y m F a K h Z T
Worked beautifully and its way faster than sending everything to a different column. Had no idea you could do this in python. I was able to compare thousands of rows of data in less than 30 seconds. Thanks Again!!


RE: Find a string from a column of one table in another table - deanhystad - Sep-06-2023

Thousands of rows in 30 seconds is really slow. I modified my code to process 100,000 products and it did that in 0.1 seconds. So I doubled the number of suppliers, and that only increased the time about about 10% (0.11 seconds). Next I tripled the length of the description, and that doubled the time (0.22 seconds).

What could be making your program run so slow?


RE: Find a string from a column of one table in another table - deanhystad - Sep-07-2023

I was getting a warning that I ignored in my code. It doesn't cause any problems in my example, but it has potential for causing very confusing behaviors.

In my example I did this to make a new dataframe for suppliers:
supplier_table = product_table[["Description"]]
This does not create a new dataframe. It creates a slice of the product_table dataframe. What I should have done is make a copy of that slice so that supplier_table and product_table are independent.
supplier_table = product_table[["Description"]].copy()