Python Forum
Cleaning my code to make it more efficient
#1
Hi, as I'm new to Python, my code is neither clean nor efficient, and I'm looking for faster performance when my app loads.
I'm using Streamlit and pandas, and I have a filter like this repeated 12 times, once for each filter option:
# Create for foo
regionas = st.sidebar.multiselect("Pick your Regionas", options=df.sort_values(by="Regionas").Regionas.unique())
if not regionas:
    df2 = df.copy()
else:
    df2 = df[df["Regionas"].isin(regionas)]

That block is repeated for each available option. Then:
if not regionas and not valstija and not marketas and not zmogus and not daiktas and not dienos and not menesiai and not seima and not augintinis and not miestas and not siurprizas:
    filtered_df = df
elif not valstija and not marketas and not zmogus and not daiktas and not dienos and not menesiai and not seima and not augintinis and not miestas and not siurprizas:
    filtered_df = df[df["Regionas"].isin(regionas)]

Then, to step up the filtering, I add a second filter option so I can filter by 2 options instead of 1:
elif valstija and regionas:
    filtered_df = df12[df12["State"].isin(valstija) & df12["Regionas"].isin(regionas)]

After that I add a third option, then a fourth, a fifth, and so on, until I can choose all filters at once and my data is filtered by every one of them. [Example: A or B or C or D or E or F or G or H or I or J or K or L alone; then AB or AC or AD or AE or EF, EG or AG ...; then KBC, ADB, ABCD, DEFGH, ABGHJI, GHIABCD, etc.; and finally ABCDEFGHIJKL if I choose all.]
Currently I have 8 filters working, but only in a specific order [I have to choose A if I want C, but if I choose C it won't show A; if I choose filter H it does not filter my filter G, but if I choose filter G I can then choose filter H]. It takes 300 lines of code, and it takes ages to load when I start the Streamlit app (not counting uploading the file and filtering).
Where should I start in order to clean up and shrink my code and make it more efficient, while keeping my 12 filter options and leaving room to add more in the future?
I'm not really sure, but I think that with 12 filters, filtering by any selection means I have 12x12x11 options... give or take?
Thank you.
#2
You can clean up the "and not" mess like this:
if not any([regionas, valstija, marketas, zmogus, daiktas, dienos, menesiai, seima, augintinis, miestas, siurprizas]):
    filtered_df = df
else:
    filtered_df = df[df["Regionas"].isin(regionas)]
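For anyone unsure why the condition can be shortened this way, here is a minimal sketch. It assumes each filter variable is the list returned by st.sidebar.multiselect (empty when nothing is picked); the function names and the sample value "West" are made up for illustration.

```python
from itertools import product


def long_form(regionas, valstija, marketas):
    """The chained test: true only when every selection is empty."""
    return not regionas and not valstija and not marketas


def short_form(regionas, valstija, marketas):
    """The any() version: any() is True if at least one list is non-empty."""
    return not any([regionas, valstija, marketas])


# Try every empty/non-empty combination of three selections.
samples = [[], ["West"]]
results = [
    (long_form(r, v, m), short_form(r, v, m))
    for r, v, m in product(samples, repeat=3)
]
all_agree = all(a == b for a, b in results)
print(all_agree)
```

This only checks the first condition. The else branch above filters on Regionas whenever anything at all is selected, so it is not a drop-in replacement for the original elif, which required every other selection to be empty.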
#3
(Sep-26-2023, 11:52 AM)snippsat Wrote: Can clean up the and not mess to this.
if not any([regionas, valstija, marketas, zmogus, daiktas, dienos, menesiai, seima, augintinis, miestas, siurprizas]):
    filtered_df = df
else:
    filtered_df = df[df["Regionas"].isin(regionas)]

This is simple and elegant, and far less code.
Thank You.
#4
I think your approach is all wrong. Instead of testing each column for a filter, you should make a list of filters to apply. Filtering can be done with a compact loop that loops through all the selected filters.

The example below uses a dictionary that maps each column name to the values to keep in that column.
import pandas as pd
from random import choices


# Make a dataframe for demonstrating filter.
letters = list("ABCDEIOU")
numbers = list(range(1, 10))
mixed = [f"{letter}{number}" for letter in letters for number in numbers]


df = pd.DataFrame(
    {
        "Letters": choices(letters, k=20),
        "Numbers": choices(numbers, k=20),
        "Mixed": choices(mixed, k=20),
    }
)


def filter(frame, filters):
    """Apply filters to dataframe.  filters is a dictionary of column: values pairs."""
    df = frame.copy()
    for key, value in filters.items():
        df = df[df[key].isin(value)]
    return df


print("Numbers = 7", filter(df, {"Numbers": [7]}), sep="\n")
odd_abcs = {"Numbers": [1, 3, 5, 7, 9], "Letters": ["A", "B", "C"]}
print("", "Odd ABC's", filter(df, odd_abcs), sep="\n")
even_vowels = {}
even_vowels["Numbers"] = [2, 4, 6, 8]
even_vowels["Letters"] = ["A", "E", "I", "O", "U"]
print("", "Even Vowels", filter(df, even_vowels), sep="\n")
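On the question of how many combinations there are: each filter is either active or not, so n filters give 2 ** n combinations (4096 for 12), and one loop covers all of them. The sketch below is only an illustration under assumptions: apply_filters is a hypothetical name, the data is made up, and an empty selection is treated as "no filter", which is how the multiselect boxes in the first post behave.

```python
from itertools import combinations

import pandas as pd


def apply_filters(frame, filters):
    """Keep rows whose value in each column is among the selected values.

    Columns with an empty selection are skipped, i.e. treated as "no filter".
    """
    result = frame
    for column, values in filters.items():
        if values:
            result = result[result[column].isin(values)]
    return result


# Made-up data, not the poster's CSV.
df = pd.DataFrame(
    {
        "Region": ["West", "West", "South", "East"],
        "State": ["CA", "WA", "TX", "NY"],
        "Market": ["Los Angeles", "Seattle", "Houston", "New York"],
    }
)
selections = {"Region": ["West"], "State": ["CA", "TX"], "Market": ["Los Angeles"]}

# Every subset of active filters goes through the same loop.
row_counts = {}
for size in range(len(selections) + 1):
    for active in combinations(selections, size):
        chosen = {column: selections[column] for column in active}
        row_counts[active] = len(apply_filters(df, chosen))

print(len(row_counts))  # number of combinations for 3 filters
print(2 ** 12)          # number of combinations for 12 filters
```

No branch per combination is needed because each filter only narrows the rows left by the previous ones, so the order in which they are applied does not change the result.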
#5
A quick and dirty (oh so dirty) example of using the filter idea above in a tkinter window that displays a dataframe.
import tkinter as tk
import pandas as pd
from random import choices


class Window(tk.Tk):
    def __init__(self, dataframe):
        super().__init__()
        self.df = dataframe
        self.columns = {}
        row = tk.Frame(self)
        row.pack(side=tk.TOP)
        for column in dataframe:
            col = tk.Frame(row)
            col.pack(side=tk.LEFT, padx=5, pady=5)
            tk.Label(col, text=column).pack()
            values = sorted(set(dataframe[column].values))
            var = tk.Variable(self, values)
            selector = tk.Listbox(
                col,
                width=10,
                height=10,
                listvariable=var,
                selectmode=tk.MULTIPLE,
                exportselection=False,
            )
            selector.pack()
            selector.bind("<<ListboxSelect>>", self.filter)
            selector.var = var
            self.columns[column] = selector
        self.table = tk.Label(self)
        self.table.pack(expand=True, fill=tk.BOTH)
        self.filter()

    def filter(self, *args):
        df = self.df.copy()
        for column, lbox in self.columns.items():
            if selection := lbox.curselection():
                choices = [lbox.get(index) for index in selection]
                df = df[df[column].isin(choices)]
        self.table["text"] = str(df)


# Make a dataframe for demonstrating filter.
letters = "ABCDEIOU"
numbers = [str(x) for x in range(1, 10)]
mixed = [f"{letter}{number}" for letter in letters for number in numbers]
df = pd.DataFrame(
    {
        "Letters": choices(letters, k=20),
        "Numbers": choices(numbers, k=20),
        "Mixed": choices(mixed, k=20),
    }
)

Window(df).mainloop()
#6
(Sep-26-2023, 05:58 PM)deanhystad Wrote: I think your approach is all wrong. Instead of testing each column for a filter, you should make a list of filters to apply. Filtering can be done with a compact loop that loops through all the selected filters.
import pandas as pd
from random import choices


# Make a dataframe for demonstrating filter.
letters = list("ABCDEIOU")
numbers = list(range(1, 10))
mixed = [f"{letter}{number}" for letter in letters for number in numbers]


df = pd.DataFrame(
    {
        "Letters": choices(letters, k=20),
        "Numbers": choices(numbers, k=20),
        "Mixed": choices(mixed, k=20),
    }
)


def filter(frame, filters):
    """Apply filters to dataframe.  filters is a dictionary of column: values pairs."""
    df = frame.copy()
    for key, value in filters.items():
        df = df[df[key].isin(value)]
    return df


print("Numbers = 7", filter(df, {"Numbers": [7]}), sep="\n")
odd_abcs = {"Numbers": [1, 3, 5, 7, 9], "Letters": ["A", "B", "C"]}
print("", "Odd ABC's", filter(df, odd_abcs), sep="\n")
even_vowels = {}
even_vowels["Numbers"] = [2, 4, 6, 8]
even_vowels["Letters"] = ["A", "E", "I", "O", "U"]
print("", "Even Vowels", filter(df, even_vowels), sep="\n")

Yes, my approach is primitive, monkey-see-monkey-do, as I'm brand new to the Python world and I'm using examples from the internet to find my way.
I think I understand your idea, but I don't understand your code :)
Maybe I can mix fixed filters with column filtering? Creating a filter for each individual value won't work for my needs, because I don't want to write a new filter every time something new is added. Or am I misunderstanding your idea of filters?
So let's say I have 10 people: do I create filters for the 10 people, then for regions, states, etc., the way you are using pandas.DataFrame.filter? Or am I completely wrong?
#7
How about something like this:
for column in df:
    choices = set(df[column].values)
    if len(choices) < 2:
        continue
    choices = st.sidebar.multiselect(f"Pick your {column}", sorted(choices))
    if choices:
        df = df[df[column].isin(choices)]
This loops through every column. If there is more than one value to choose from in the column, it does the sidebar thing to get the user selections. If the user makes a selection, the selection is applied to the dataframe, removing the non-matching rows.
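To try the same loop without running Streamlit, here is a rough sketch. The pick argument is a hypothetical stand-in for st.sidebar.multiselect (it takes a label and the options and returns the chosen list); filter_by_columns, scripted_pick, and the data are all made up for illustration.

```python
import pandas as pd


def filter_by_columns(df, pick):
    """Offer each column's remaining values to `pick` and apply the choice.

    `pick(label, options)` is a hypothetical stand-in for
    st.sidebar.multiselect: it returns a list of chosen options.
    """
    for column in list(df.columns):
        options = sorted(df[column].unique())
        if len(options) < 2:
            continue  # nothing to choose between
        chosen = pick(f"Pick your {column}", options)
        if chosen:
            df = df[df[column].isin(chosen)]
    return df


# Made-up data and a scripted "user" so the sketch runs without Streamlit.
data = pd.DataFrame(
    {
        "Region": ["West", "West", "South", "East"],
        "State": ["CA", "WA", "TX", "NY"],
    }
)
offered = {}


def scripted_pick(label, options):
    offered[label] = options  # record what each widget would have shown
    return ["West"] if label == "Pick your Region" else []


result = filter_by_columns(data, scripted_pick)
print(result)
print(offered["Pick your State"])
```

Because each column's options are read from the already-filtered frame, picking a Region narrows the States that are offered next. In the app, passing st.sidebar.multiselect itself as pick should work, since it accepts the label and options as its first two arguments.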

It is difficult to know if this is a good fit for what you are trying to accomplish, because I really don't understand your code. You provide such a narrow view, focusing on something that I don't think you even need to do. Could you explain what your program is supposed to do? Why would somebody use your code? What steps would they follow to produce the results they want, and what are those results?
#8
(Sep-26-2023, 11:53 PM)deanhystad Wrote: How about something like this:
for column in df:
    choices = set(df[column].values)
    if len(choices) < 2:
        continue
    choices = st.sidebar.multiselect(f"Pick your {column}", sorted(choices))
    if choices:
        df = df[df[column].isin(choices)]
This loops through every column. If there is more than one value to choose from in the column, it does the sidebar thing to get the user selections. If the user makes a selection, the selection is applied to the dataframe, removing the non-matching rows.

It is difficult to know if this is a good fit for what you are trying to accomplish, because I really don't understand your code. You provide such a narrow view, focusing on something that I don't think you even need to do. Could you explain what your program is supposed to do? Why would somebody use your code? What steps would they follow to produce the results they want, and what are those results?

Sorry, I thought I had explained it understandably, but it looks like I had not. Let me try to do a better job.
I have a CSV file containing various data in many columns, but I chose 12 columns to filter on.
I also have some pie charts, line charts, histograms, and bar charts; that is how I analyze my data.

I create a select box for each column (Region would be West, East, Midwest, South; Market is Atlanta, Los Angeles, Houston, etc.).
Here is an example with 3 columns instead of 12:
# Filters
st.sidebar.header("Choose your filter")
# Create for Region
regionas = st.sidebar.multiselect("Pick your Region", options=df.sort_values(by="Region").Region.unique())
if not regionas:
    df2 = df.copy()
else:
    df2 = df[df["Region"].isin(regionas)]
# Create for Market
marketas = st.sidebar.multiselect("Pick the Market Area", options=df2.sort_values(by="Market").Market.unique())
if not marketas:
    df3 = df2.copy()
else:
    df3 = df2[df2["Market"].isin(marketas)]
# Create for State
valstija = st.sidebar.multiselect("Pick the State",options=df3.sort_values(by="State").State.unique())
if not valstija:
    df4 = df3.copy()
else:
    df4 = df3[df3["State"].isin(valstija)]
Next step :
# Filter the data based on Region, State, Market

if not any([regionas, valstija, marketas]):
    filtered_df = df
elif not any([valstija, regionas]):
    filtered_df = df[df["Market"].isin(marketas)]
elif not any([valstija, marketas]):
    filtered_df = df[df["Region"].isin(regionas)]
elif not any([regionas, marketas]):
    filtered_df = df[df["State"].isin(valstija)]
elif valstija and marketas and not regionas:
    filtered_df = df12[df12["State"].isin(valstija) & df12["Market"].isin(marketas)]
elif valstija and regionas and not marketas:
    filtered_df = df12[df12["State"].isin(valstija) & df12["Region"].isin(regionas)]
elif regionas and marketas and not valstija:
    filtered_df = df12[df12["Region"].isin(regionas) & df12["Market"].isin(marketas)]
else:
    filtered_df = df12[df12["Market"].isin(marketas) & df12["Region"].isin(regionas) & df12["State"].isin(valstija)]
filtered_df = filtered_df.reset_index(drop=True)
As you can see, the 3-column code is fairly short and quick, but if I use the same logic for 12 columns, there is a lot of code for pandas/Python/Streamlit (whatever reads this code) to go through.
So I thought maybe I could remove a lot of code and keep the same functionality; that's why I mentioned mixing your logic with the example I provided. I could use your filter for a few columns and, for the rest, the logic I showed you.
Why im saying like this: i have Market column - market column contains 152 unique areas ( Atlanta, Houston, Los Angeles etc .. some major us cities ) so to write for each - i dont think its worth it as it would be 152 extra lines of code .... and to use my logic would be way much less but longer lines as i choose only Column MARKET instead of each from column MARKET or im completely wrong ?
P.S. Ignore df4 / df12; I did not change them for this example :)
#9
I'm not asking for a description of the code. I want to know what the program does. What is it going to be used for? I am more interested in why you need to filter the dataframe than how you filter the dataframe.

This part of your description is useful:
Quote:I have a CSV file containing various data in many columns, but I chose 12 columns to filter on.
I also have some pie charts, line charts, histograms, and bar charts; that is how I analyze my data.
What is this filtering meant to accomplish? How do the filters relate to the charts? Is that related to this?
Quote:Why im saying like this: i have Market column - market column contains 152 unique areas ( Atlanta, Houston, Los Angeles etc .. some major us cities ) so to write for each - i dont think its worth it as it would be 152 extra lines of code .... and to use my logic would be way much less but longer lines as i choose only Column MARKET instead of each from column MARKET or im completely wrong
I am still stymied by this:
Quote:and to use my logic would be way much less but longer lines as i choose only Column MARKET instead of each from column MARKET or im completely wrong ?
Read that last quoted line. Does it make sense to you? What am I missing that prevents it from making sense to me? Before pressing the "post" button, read the post as if you had no other knowledge of the problem except the contents of the post. Your audience is vast in Python knowledge and completely ignorant about your program.

These are equivalent. Your approach:
if not any([regionas, valstija, marketas]):
    filtered_df = df
elif not any([valstija, regionas]):
    filtered_df = df[df["Market"].isin(marketas)]
elif not any([valstija, marketas]):
    filtered_df = df[df["Region"].isin(regionas)]
elif not any([regionas, marketas]):
    filtered_df = df[df["State"].isin(valstija)]
elif valstija and marketas and not regionas:
    filtered_df = df12[df12["State"].isin(valstija) & df12["Market"].isin(marketas)]
elif valstija and regionas and not marketas:
    filtered_df = df12[df12["State"].isin(valstija) & df12["Region"].isin(regionas)]
elif regionas and marketas and not valstija:
    filtered_df = df12[df12["Region"].isin(regionas) & df12["Market"].isin(marketas)]
else:
    filtered_df = df12[df12["Market"].isin(marketas) & df12["Region"].isin(regionas) & df12["State"].isin(valstija)]
Using if statements:
filtered_df = df.copy()
if marketas:
    filtered_df = filtered_df[filtered_df["Market"].isin(marketas)]
if regionas:
    filtered_df = filtered_df[filtered_df["Region"].isin(regionas)]
if valstija:
    filtered_df = filtered_df[filtered_df["State"].isin(valstija)]
Using a dictionary and a for loop.
filters = {"Market": marketas, "Region": regionas, "State": valstija}
filtered_df = df.copy()
for column, values in filters.items():
    if values:
        filtered_df = filtered_df[filtered_df[column].isin(values)]
Just using a for loop.
filtered_df = df.copy()
for column, values in zip(("Market", "Region", "State"), (marketas, regionas, valstija)):
    if values:
        filtered_df = filtered_df[filtered_df[column].isin(values)]
I like the dictionary approach myself as it makes it really easy to add new filters to your code. Referring back to your initial post:
filters = {}

def add_filter(df, column):
    """Add filter to filters dictionary.  Return dataframe with all filters applied."""
    options = sorted(set(df[column].values))
    filters[column] = st.sidebar.multiselect(f"Pick your {column}", options)

    filtered_df = df.copy()
    for name, values in filters.items():
        if values:  # An empty selection means "no filter on this column".
            filtered_df = filtered_df[filtered_df[name].isin(values)]
    return filtered_df

df2 = add_filter(df, "Regionas")
That may look like a lot of extra code, but to add Marketas all you add is:
df2 = add_filter(df, "Marketas")
So with 12 filters, all you need is one function and 12 lines of code that call that function.

If you are worried that updating filtered_df over and over is inefficient, you can do this.
def add_filter(df, column):
    """Add filter to filters dictionary.  Return dataframe with all filters applied."""

    options = sorted(set(df[column].values))
    filters[column] = st.sidebar.multiselect(f"Pick your {column}", options)

    # Build one boolean mask that shares df's index, then index once.
    selected = pd.Series(True, index=df.index)
    for name, values in filters.items():
        if values:
            selected &= df[name].isin(values)
    return df[selected].copy()
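Two details are worth checking in any version of this mask loop: isin([]) is False for every row, so an untouched multiselect has to be skipped rather than applied, and the mask should share the dataframe's index so the rows line up. A minimal sketch with made-up data; select_rows is a hypothetical helper, not part of pandas or Streamlit.

```python
import pandas as pd


def select_rows(df, filters):
    """Combine all active filters into one boolean mask, then index once."""
    selected = pd.Series(True, index=df.index)
    for column, values in filters.items():
        if values:  # an empty selection means "no filter on this column"
            selected &= df[column].isin(values)
    return df[selected]


# Made-up data with a non-default index to show the mask still lines up.
df = pd.DataFrame(
    {"Region": ["West", "West", "South"], "State": ["CA", "WA", "TX"]},
    index=[10, 20, 30],
)

# isin([]) is False for every row, which is why empty selections are skipped.
empty_matches = int(df["Region"].isin([]).sum())

filtered = select_rows(df, {"Region": ["West"], "State": []})
print(empty_matches)
print(list(filtered.index))
```

Without the "if values" check, the empty State selection here would have removed every row.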
#10
Yeah, I do have trouble explaining things, especially since English is not my main language.
My charts, pies, etc. change based on the filter selection.
In the earlier post where snippsat cleaned up my code:
if not any([regionas, valstija, marketas, zmogus, daiktas, dienos, menesiai, seima, augintinis, miestas, siurprizas]):
    filtered_df = df

regionas, valstija, marketas, zmogus, daiktas, dienos, menesiai, seima, augintinis, miestas, siurprizas - these are my columns in the CSV file.
I chose these columns as my filters, and I want to mix and match them; each column has its own select field:
regionas = st.sidebar.multiselect("Pick your Region", options=df.sort_values(by="PuRegion").PuRegion.unique())
This is my select box/field for choosing among the values in my Region column; I have 12 columns, so 12 fields to select from and filter by.
This link shows the logic, function, and idea behind it, as I was copying it and extending it to my needs: https://youtu.be/7yAw1nPareM?si=1se8zb_-YN6-BV-y&t=1071 (this is the exact time where the filtering I have is explained)
I think I should have posted this video right away instead of trying to explain my code and the logic and functions behind it.
Apologies!