Python Forum
Noob question
#1
Hi, I'm new to Pandas, Python and all this awesomeness!

I'm learning the hard way...

There are a few things I'm confused about.

I'm trying to sort my data and visualize it with graphs, separated by year.

df_mc = pd.DataFrame(df.groupby(['PMarket', 'PuYear']).size().reset_index())
df_mc.columns = ['PMarket', 'PuYear', 'Count']

fig1 = px.bar(df_mc, x="PMarket", y="Count", color='PMarket', range_y=[0,100])
But it shows me the total of all years combined, and the selector won't work. Is that because of my groupby?

The Count column is created by the grouping. My range would start from 2000 with no end date, and I want to do this for each month, but I can't find how to select an exact month as well.

So in the end it should be: the January tab shows all data from January, filtered by year with the selector; the next tab is February, same as January but for February, and so on. (I have the tabs already.)

I would also like to rename the fields inside the graph and be able to choose a year. I have a selector but it does nothing; my graph does not change, but if you look at THIS IMAGE it separates my bar with a dotted line for each year.

Also, would it be safe to combine week 52 with week 53? If yes, how? As far as I understand, week 52 is part of week 53, or week 53 is part of week 52 and week 1?
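
On the week 52/53 question: `%V` is the ISO 8601 week number, and in that scheme week 53 is a full seven-day week of its own in some years, not a piece of week 52 or week 1. What does happen is that days around New Year can belong to a week of the neighbouring ISO year, so the week number is best paired with the ISO year (`%G`) rather than the calendar year. A small standard-library sketch of this, where the helper name `iso_year_week` is made up for illustration:

```python
from datetime import date


def iso_year_week(d: date) -> tuple[int, int]:
    """Return (ISO year, ISO week number) for a calendar date."""
    iso = d.isocalendar()
    return iso[0], iso[1]


# 1 January 2021 was a Friday; it belongs to week 53 of ISO year 2020.
print(iso_year_week(date(2021, 1, 1)))    # (2020, 53)
# 30 December 2019 was a Monday; it already belongs to week 1 of ISO year 2020.
print(iso_year_week(date(2019, 12, 30)))  # (2020, 1)
# 4 January 2021 is the Monday that starts week 1 of ISO year 2021.
print(iso_year_week(date(2021, 1, 4)))    # (2021, 1)
```

Whether merging week 53 into week 52 is safe depends on what the chart is for; the merged bucket would cover 14 days in 53-week years. In pandas the same year/week pair should be available from `df['PDate'].dt.isocalendar()`, which returns `year`, `week` and `day` columns.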

I'm also looking for info on how to rename things inside charts, as px.bar shows the column name.
I'd also like to select a market and show it on the graph with all years. Let's say it shows all available years from the earliest date in my data frame, and I select the area I want to look at to see the difference between each year.
And add the percentage difference from the earliest year possible. Say I got a game CD on 2000-1-1, the next one in the franchise came out in 20010, and I got a CD again but at a different price. I want to use my first CD as 100%, and the second CD's price would show me the difference in percent: was it cheaper or more expensive?
I do all of this inside Streamlit.
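
For the percentage question above, the arithmetic is: take the price of the earliest purchase as the baseline (100%), then express every later price as `price / baseline * 100`. A minimal standard-library sketch, assuming each purchase is a (date, price) pair; the function name and the prices are made up for illustration:

```python
from datetime import date


def percent_of_first(
    purchases: list[tuple[date, float]],
) -> list[tuple[date, float, float]]:
    """For each (date, price), return (date, percent of baseline, percent change).

    The baseline is the price of the earliest purchase, which counts as 100%.
    """
    ordered = sorted(purchases, key=lambda item: item[0])
    baseline = ordered[0][1]
    result = []
    for day, price in ordered:
        pct = price / baseline * 100.0
        result.append((day, round(pct, 1), round(pct - 100.0, 1)))
    return result


# Made-up prices, purely for illustration.
rows = percent_of_first([
    (date(2010, 6, 1), 45.0),
    (date(2000, 1, 1), 60.0),
])
for row in rows:
    print(row)
```

Here the second purchase comes out at 75% of the first, so a change of -25%, meaning it was cheaper. With a DataFrame the same idea would be to divide the price column by the price in the row with the earliest date.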

Thank You.
#2
You should post a sample of a working DataFrame.
You do a lot of explaining for 4 lines of code, but it's hard to make any sense of it without an example.
Same problem in your other thread.
#3
(Sep-07-2023, 09:55 AM)snippsat Wrote: You should post a sample of a working DataFrame.
You do a lot of explaining for 4 lines of code, but it's hard to make any sense of it without an example.
Same problem in your other thread.

Understood. As I'm a beginner and barely use forums, I thought this would be enough.

My code:
Reading data from uploaded file
import pandas as pd
import plotly.express as px
import streamlit as st

@st.cache_data
def load_data(path: str):
    # Read PZip and PWeek as strings so leading zeros are kept
    df = pd.read_csv(path, converters={'PZip': str, 'PWeek': str})
    df = df.drop_duplicates()
    return df
df = load_data(uploaded_file)
Date column separation to Day, Week, Month, Year
df['PDate'] = pd.to_datetime(df['PDate'])     # skip if your Date column already in datetime format
df.insert(7, "PDay", "PDate")
df['PDay'] = df['PDate'].dt.day_name()
df.insert(7, "PMonth", "PDate")
df['PMonth'] = df['PDate'].dt.month_name()
df.insert(7, "PYear", "PDate")
df['PYear'] = df['PDate'].dt.year
df['PWeek'] = df['PDate'].dt.strftime('%V')   # ISO 8601 week number (01-53); some years have 53 weeks, not 52
Section with chart with Year selection
with features:

    df.column = ['Shop', 'PMarket', 'Receiver', 'Buyer', 'PDay', 'PYear', 'PMonth', 'PState', 'PWeek', 'Status']    # Buyer: me, mom, sister, etc.; Receiver: who received the acquired purchase
    
    st.write(df)

    January, February, March, April, May, June, July, August, September, October, November, December = st.tabs(["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"])

    with January:

        market_options = df['PMarket'].unique().tolist()
        market_date_options = df['PDate'].unique().tolist()
        market_date = st.selectbox('Choose Year', market_date_options, 3)   # the third argument is the index of the option selected by default
        market_list = st.multiselect('Choose market area', market_options, ['Atlanta'])

        df = df[df['PMarket'].isin(market_list)]

        df_mc = pd.DataFrame(df.groupby(['PMarket', 'PDate']).size().reset_index())
        df_mc.columns = ['PMarket', 'PDate', 'Count']

        fig1 = px.bar(df_mc, x="PMarket", y="Count", color='PMarket', range_y=[0,30], text_auto=True)

        fig1.update_layout(width=1000)

        st.write(fig1)
I know I'm missing the month tabs; I just don't know where to put them so that it counts per month. My main goal is to choose a year and see the results from the selected year by month in each tab. At the moment it counts all years, and the year selector does not do anything.
I posted only the January tab because the other tabs are empty. I think the rest of the months should use the same code, just changed for the specific month?

I hope this helps to understand my code and to solve the problem.
Thank You.
#4
I sorted out how to choose by year; I had missed one filter.
df = df[df['PMarket'].isin(market_list)]
df = df[df['PYear']==market_date]
df = df[df['PYear']==market_date] <--- this one allows me to choose the year from the select list.

df_mc.columns = ['PMarket', 'PDate', 'Count'] was changed to
df_mc.columns = ['PMarket', 'Count'] <--- to show on my chart.
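
The year filter above extends naturally to the per-month tabs: filter by market and year once, then count rows per market and month, and let each tab pick out its own month. A rough sketch of that idea, assuming the `PMarket` and `PDate` columns from the earlier post and that the selector offers years rather than full dates; the function name `monthly_counts` and the sample data are invented for illustration:

```python
import pandas as pd


def monthly_counts(df: pd.DataFrame, year: int, markets: list[str]) -> pd.DataFrame:
    """Count rows per market and month for one year, without modifying df."""
    dates = pd.to_datetime(df["PDate"])
    mask = df["PMarket"].isin(markets) & (dates.dt.year == year)
    # Add the month name only to the filtered copy
    subset = df.loc[mask].assign(PMonth=dates[mask].dt.month_name())
    return (
        subset.groupby(["PMarket", "PMonth"])
        .size()
        .reset_index(name="Count")
    )


# Tiny made-up frame, only to show the shape of the result.
sample = pd.DataFrame({
    "PMarket": ["Atlanta", "Atlanta", "Atlanta", "Boston", "Atlanta"],
    "PDate": ["2021-01-05", "2021-01-20", "2021-02-03", "2021-01-09", "2020-01-15"],
})
counts = monthly_counts(sample, 2021, ["Atlanta"])
print(counts)
```

Inside a tab, something like `counts[counts["PMonth"] == "January"]` could then be passed to `px.bar`. Keeping the filtered data in a separate variable instead of reassigning `df` means each tab and each rerun of the selector starts from the full data.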

But now I've realised all of this could be done using the date picker in Streamlit, as I'm having a hard time converting month, day and week to datetime.

with features:

        market_options = df['PMarket'].unique().tolist()
        min_date = pd.to_datetime(df['PuDate'], errors='coerce')                     # PDate renamed to PuDate [ purchase date ]
        max_date = pd.to_datetime(df['PuDate'], errors='coerce') 
        value=(min(df['PuDate']), max(df['PuDate'])),

        
        market_date = st.date_input(
            "Date picker",
            min_value=min(df['PuDate']),
            max_value=max(df['PuDate']),
            value=(min(df['PuDate']), max(df['PuDate'])),
            format="YYYY/MM/DD"
        )
        market_list = st.multiselect('Choose market area', market_options, ['Atlanta'])

        df = df[df['PMarket'].isin(market_list)]
        df = df[df['PuDate']==market_date]

        df_mc = df.groupby(df['PMarket'])['PuDate'].count().reset_index()
        df_mc.columns = ['PMarket', 'Count']

        fig1 = px.bar(df_mc, x="PMarket", y="Count", color='PMarket', range_y=[0,30], text_auto=True)

        fig1.update_layout(width=1000)

        st.write(fig1)
    
But I'm getting an error.
Error:
ValueError: Lengths must match
Traceback:
File "/home/evo/koala/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
File "/home/evo/koala/koala.py", line 183, in <module>
    df = df[df['PuDate']==market_date]
            ^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/evo/koala/lib/python3.11/site-packages/pandas/core/ops/common.py", line 81, in new_method
    return method(self, other)
           ^^^^^^^^^^^^^^^^^^^
File "/home/evo/koala/lib/python3.11/site-packages/pandas/core/arraylike.py", line 40, in __eq__
    return self._cmp_method(other, operator.eq)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/evo/koala/lib/python3.11/site-packages/pandas/core/series.py", line 6096, in _cmp_method
    res_values = ops.comparison_op(lvalues, rvalues, op)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/evo/koala/lib/python3.11/site-packages/pandas/core/ops/array_ops.py", line 279, in comparison_op
    res_values = op(lvalues, rvalues)
                 ^^^^^^^^^^^^^^^^^^^^
File "/home/evo/koala/lib/python3.11/site-packages/pandas/core/ops/common.py", line 81, in new_method
    return method(self, other)
           ^^^^^^^^^^^^^^^^^^^
File "/home/evo/koala/lib/python3.11/site-packages/pandas/core/arraylike.py", line 40, in __eq__
    return self._cmp_method(other, operator.eq)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/evo/koala/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py", line 935, in _cmp_method
    other = self._validate_comparison_value(other)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/evo/koala/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py", line 574, in _validate_comparison_value
    raise ValueError("Lengths must match")
I'm trying to reuse my previous code.
#5
From the documentation: https://docs.streamlit.io/library/api-re...date_input

value (datetime.date or datetime.datetime or list/tuple of datetime.date or datetime.datetime or None)

The value of this widget when it first renders. If a list/tuple with 0 to 2 date/datetime values is provided, the datepicker will allow users to provide a range. Defaults to today as a single-date picker.

st.date_input returns (datetime.date or a tuple with 0-2 dates)

I think the format of the return value is set by the format of the value argument. You provided a tuple as the value argument, you should expect a tuple as the return type. I think the return type might also be an empty tuple, so you need to account for that possibility.
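
To make that concrete: the `==` comparison fails because the whole column is being compared against a 2-item tuple, so pandas complains that the lengths differ. One way to handle it is to first turn whatever the picker returned into a (start, end) pair, or nothing if the range is not complete yet. A minimal sketch using only the standard library; `selection_to_range` is a made-up helper name, not part of Streamlit:

```python
from datetime import date
from typing import Optional, Union

Selection = Union[date, tuple, list]


def selection_to_range(selection: Selection) -> Optional[tuple[date, date]]:
    """Turn a date-picker result into (start, end), or None if incomplete.

    Handles a single date, and tuples/lists holding 0, 1 or 2 dates.
    """
    if isinstance(selection, date):
        return selection, selection
    if len(selection) == 2:
        start, end = selection
        return (start, end) if start <= end else (end, start)
    # Empty, or only the first end of the range has been picked so far.
    return None


print(selection_to_range((date(2021, 1, 1), date(2021, 3, 31))))
print(selection_to_range((date(2021, 1, 1),)))
```

In the app this might be used as `bounds = selection_to_range(market_date)`, and when `bounds` is not `None`, the filter could become `df[df['PuDate'].between(pd.Timestamp(bounds[0]), pd.Timestamp(bounds[1]))]`, assuming `PuDate` has already been converted with `pd.to_datetime`.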