Find a string from a column of one table in another table

visedwings049 · (This post was last modified: Sep-05-2023, 04:30 PM by visedwings049.)

I am using Python 3.9

I have created two pandas DataFrames from CSV files: product and supplier.

1. I have created a product table that splits a description out into multiple columns.

2. I have created a supplier table that lists the supplier products. Often, a supplier product code appears in the description of a product.

3. I want to populate the product.supplier code column with any string that is also contained in the supplier.product column. In this example, we would have found the code "widget" in column 4 of the supplier table and returned the word "widget" in the supplier code column.

4. So I want to do this in a loop: column 0 first, then column 1, and so on.

There will never be two supplier codes in the same string, so I am not worried about a second match overwriting the first.
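Steps 1, 3 and 4 can be sketched in pandas roughly as follows. This is a minimal sketch, not the poster's code: it assumes the description is split on whitespace with str.split(expand=True), that the supplier codes live in a column named "Product", and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (made-up rows).
product = pd.DataFrame({"Description": ["blue widget large", "red gadget", "plain box"]})
supplier = pd.DataFrame({"Product": ["widget", "gadget"]})

# Step 1: split the description into columns named 0, 1, 2, ...
words = product["Description"].str.split(expand=True)

# Steps 3 and 4: walk the columns left to right and copy the first cell
# that matches a known supplier code into "Supplier Code".
codes = set(supplier["Product"])
product["Supplier Code"] = None
for col in words.columns:
    # Only fill rows that match in this column and have not matched earlier.
    hit = words[col].isin(codes) & product["Supplier Code"].isna()
    product.loc[hit, "Supplier Code"] = words.loc[hit, col]

print(product)
```

Because of the isna() check, the first match wins, which is harmless here since a description never holds two codes.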

I have tried the str.contains function, but it only returns True or False.
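Where str.contains only answers True/False, pandas' Series.str.extract returns the text that the regex captured. A minimal sketch, assuming the codes appear as whole words in the description; the sample rows are made up:

```python
import re

import pandas as pd

# Hypothetical sample data; the real tables come from CSV files.
product = pd.DataFrame({"Description": ["blue widget large", "red gadget", "plain box"]})
supplier = pd.DataFrame({"Product": ["widget", "gadget"]})

# One regex that matches any supplier code as a whole word.
# re.escape keeps codes containing ".", "+", etc. from being read as regex syntax.
pattern = r"\b(" + "|".join(map(re.escape, supplier["Product"])) + r")\b"

# str.extract returns the captured text (NaN where nothing matches),
# whereas str.contains only returns True/False.
product["Supplier Code"] = product["Description"].str.extract(pattern, expand=False)
print(product)
```

This searches the whole description in one pass, so it does not need the split columns at all.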

**deanhystad** · Sep-05-2023, 03:55 PM

What is a table? Is it a spreadsheet? Is it a table in a PDF? Is it a CSV file? Is it a pandas DataFrame? Is it a table in a database?

visedwings049 · Sep-05-2023, 04:21 PM

(Sep-05-2023, 03:03 PM) visedwings049 Wrote: I am using Python 3.9

I apologize, as this is my first post. They are both pandas DataFrames read from CSV files.

**deanhystad** · (This post was last modified: Sep-05-2023, 06:57 PM by deanhystad.)

Something like this maybe?

```python
import pandas as pd
from string import ascii_letters as letters
from random import choice, choices, randint


def find_supplier(description):
    """Return word if word in description matches a supplier code, else None."""
    intersection = set(description.split()) & suppliers
    return list(intersection)[0] if intersection else None


# Make some random table thing that we can use to search for words in the description
# that match a supplier code.
product_table = pd.DataFrame(
    [
        {
            "Product": i,
            "Supplier Code": choice("ABCDE"),
            "Description": " ".join(choices(letters, k=randint(5, 10))),
        }
        for i in range(100, 120)
    ]
)

# Get set of suppliers.
suppliers = set(product_table["Supplier Code"].values)

# Make supplier table.  Supplier table contains rows from product_table
# where one of the words in the description matches a supplier code.
supplier_table = product_table[["Description"]]
supplier_table["Product"] = supplier_table["Description"].map(find_supplier)
supplier_table = supplier_table[~supplier_table["Product"].isna()][
    ["Product", "Description"]
]
print(supplier_table)
```

Output:
```
   Product          Description
1        E    T u V x Z E a k s
2        A  K H a K P G z m l A
3        E          Q L H E q J
5        B  N X x b i B q D F M
8        D      d U q K Y W I D
10       C    U V H C f F n N z
14       C            C o X u J
15       E          D F e E Q u
18       B          o f B P x O
```

This is easy to break up into individual supplier tables.

```python
for supplier in suppliers:
    print(
        supplier,
        supplier_table[supplier_table["Product"] == supplier].reset_index(drop=True),
        sep="\n",
        end="\n\n",
    )
```

Output:
```
A
  Product        Description
0       A    d j A c S o F U
1       A          o A w I W
2       A          z s e j A
3       A  c P R w Z M A V b

D
  Product          Description
0       D  t u P r p R v G D O
1       D    j P w D h v o m w

C
  Product    Description
0       C  n j r C r R T

B
  Product        Description
0       B  H O P B A c r C n
1       B          B g Z r z
2       B    r o y g u l B A

E
  Product          Description
0       E            P E m t Z
1       E  S E Y m F a K h Z T
```
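A possible alternative to filtering once per supplier is DataFrame.groupby, which splits the rows by supplier code in a single pass. This is a sketch, not the approach used in the thread; the small supplier_table below is a made-up stand-in with the same columns.

```python
import pandas as pd

# Made-up stand-in with the same columns as supplier_table above.
supplier_table = pd.DataFrame(
    {"Product": ["A", "B", "A"], "Description": ["x A y", "B z", "A q"]}
)

# Iterating a groupby yields (key, sub-DataFrame) pairs, sorted by key.
per_supplier = {
    code: group.reset_index(drop=True)
    for code, group in supplier_table.groupby("Product")
}

for code, table in per_supplier.items():
    print(code, table, sep="\n", end="\n\n")
```

Keeping the pieces in a dict also lets you look up one supplier's table later, e.g. per_supplier["A"].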

visedwings049 · Sep-06-2023, 04:23 PM

Wow, that is an amazing way to do it. This is very helpful; I am going to play with this code and see if I can duplicate it with my data set. Thank you so much for this example. I had not done anything with defining a function before this, and that is extremely cool.

visedwings049 · Sep-06-2023, 06:51 PM


It worked beautifully, and it's way faster than sending everything to a different column. I had no idea you could do this in Python. I was able to compare thousands of rows of data in less than 30 seconds. Thanks again!

visedwings049 · Sep-06-2023, 06:53 PM


It worked beautifully, and it's way faster than sending everything to a different column. I had no idea you could do this in Python. I was able to compare thousands of rows of data in less than 30 seconds. Thanks again!

**deanhystad** · Sep-06-2023, 07:09 PM

Thousands of rows in 30 seconds is really slow. I modified my code to process 100,000 products, and it ran in 0.1 seconds. Then I doubled the number of suppliers, and that only increased the time by about 10% (0.11 seconds). Next I tripled the length of the description, and that doubled the time (0.22 seconds).
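A minimal timing harness in the spirit of that experiment, assuming the same kind of random single-letter data as the example above. N_PRODUCTS is an illustrative parameter, and the measured time will vary by machine, so no figure is claimed here.

```python
import time
from random import choices, randint
from string import ascii_letters as letters

import pandas as pd

# Illustrative size; change it (or the description length) to see how time scales.
N_PRODUCTS = 100_000
suppliers = set("ABCDE")

# Random descriptions of 5-10 single letters, as in the example above.
descriptions = pd.Series(
    [" ".join(choices(letters, k=randint(5, 10))) for _ in range(N_PRODUCTS)]
)


def find_supplier(description):
    """Return word if word in description matches a supplier code, else None."""
    intersection = set(description.split()) & suppliers
    return list(intersection)[0] if intersection else None


# Time only the matching step, not the data generation.
start = time.perf_counter()
matches = descriptions.map(find_supplier)
elapsed = time.perf_counter() - start
print(f"{N_PRODUCTS:,} descriptions in {elapsed:.3f} s")
```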

What could be making your program run so slow?
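The thread never shows the slow program, so this is only a guess at one common cause: checking every supplier code against every description one at a time. The sketch below uses made-up three-letter codes to contrast that pattern with the single set intersection that find_supplier relies on; both return the same matches.

```python
import time
from random import choices, randint
from string import ascii_uppercase

# Hypothetical data: up to 500 three-letter codes and 2,000 descriptions.
codes = sorted({"".join(choices(ascii_uppercase, k=3)) for _ in range(500)})
code_set = set(codes)
descriptions = [
    " ".join("".join(choices(ascii_uppercase, k=3)) for _ in range(randint(5, 10)))
    for _ in range(2_000)
]


def matches_nested(description):
    """Slow: re-split the description and scan it once per supplier code."""
    return {code for code in codes if code in description.split()}


def matches_set(description):
    """Fast: split once, then one set intersection (hash lookups)."""
    return set(description.split()) & code_set


for func in (matches_nested, matches_set):
    start = time.perf_counter()
    results = [func(d) for d in descriptions]
    print(f"{func.__name__}: {time.perf_counter() - start:.3f} s")
```

The nested version does work proportional to (number of codes) x (description length) per row, while the set version does work proportional to the description length only.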

**deanhystad** · Sep-07-2023, 03:22 PM

My code produced a warning (pandas' SettingWithCopyWarning) that I ignored. It doesn't cause any problems in my example, but it has the potential to cause very confusing behavior.

In my example I did this to make a new dataframe for suppliers:

```python
supplier_table = product_table[["Description"]]
```

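This does not create an independent dataframe. pandas treats it as a slice of the product_table dataframe, so when I then assign a new column to supplier_table it cannot tell whether product_table might be affected, and it warns. What I should have done is make an explicit copy of that slice so that supplier_table and product_table are independent.

```python
supplier_table = product_table[["Description"]].copy()
```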
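A small check of that fix, using a made-up two-row product_table: after .copy(), adding a column and editing a cell in supplier_table leaves product_table unchanged, and no warning is raised (the filter below turns any warning into an error so it would fail loudly).

```python
import warnings

import pandas as pd

# Made-up stand-in for product_table.
product_table = pd.DataFrame({"Product": [100, 101], "Description": ["x B y", "d e"]})

with warnings.catch_warnings():
    # Escalate warnings to errors so a SettingWithCopyWarning would stop the script.
    warnings.simplefilter("error")
    supplier_table = product_table[["Description"]].copy()
    supplier_table["Product"] = ["B", None]
    supplier_table.loc[0, "Description"] = "changed"

# The copy is independent: product_table keeps its original columns and values.
print(product_table)
print(supplier_table)
```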
