dataframe logic issues - Printable Version

dataframe logic issues - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: dataframe logic issues (/thread-40723.html)

dataframe logic issues - mbrown009 - Sep-13-2023

It is setting all values for box4 to the combination of the two values so of the else. What am I doing wrong? Do I have to convert those fields to numeric and then do the math like I am?

I am having questions on this section of code

if (cleandata['owneroccupancycode'].to_string() in ("Yes")):
    cleandata['box4'] = results_df['base_cycle_data_circuit_breaker_value_sdat_field_157']
else:
    cleandata['box4'] = results_df['base_cycle_data_land_value_sdat_field_154'] + results_df['base_cycle_data_improvements_value_sdat_field_155']

The full code is below:

#!/usr/bin/env python

# make sure to install these packages before running:
# pip install pandas
# pip install sodapy

import datetime as dt

from auth import applicationtoken
from functions import owneroccupancycondition, yearcondition, homesteadqualiticationcondition, taxcalculation, \
    semiannualpayments, semiannualpayments1
from limits import statelimit, annearundelcountylimit
from rates import statetaxrate, annearundeltaxrate
import pandas as pd
from sodapy import Socrata
import platform
from icecream import ic

os = (platform.system())

# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("opendata.maryland.gov", applicationtoken)

# Example authenticated client (needed for non-public datasets):
# client = Socrata(opendata.maryland.gov,
#                  MyAppToken,
#                  username="[email protected]",
#                  password="AFakePassword")

# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("q4mw-f34p", limit=3000)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
results_df.to_csv('output/results_df.csv')

# creating dataframe + adding first field from the originating data frame
cleandata = results_df[['legal_description_line_2_mdp_field_legal2_sdat_field_18']].copy()

# changing column name in data frame
cleandata.rename(columns={'legal_description_line_2_mdp_field_legal2_sdat_field_18': 'address'}, inplace=True)

# copying over columns from originating data frame
cleandata['county'] = results_df['jurisdiction_code_mdp_field_jurscode']
cleandata['accountnumber'] = results_df['record_key_account_number_sdat_field_3']
cleandata['opendataupdate'] = results_df['date_of_most_recent_open_data_portal_record_update']
cleandata['key'] = cleandata['accountnumber']
cleandata['housetype'] = results_df['mdp_street_address_type_code_mdp_field_resityp']
cleandata['landuse'] = results_df['land_use_code_mdp_field_lu_desclu_sdat_field_50']
cleandata['exemptclass'] = results_df['exempt_class_mdp_field_exclass_descexcl_sdat_field_49']
cleandata['assessmentyear'] = results_df['assessment_cycle_year_sdat_field_399']
cleandata['currentyeartotalassessment'] = results_df['current_assessment_year_total_phase_in_value_sdat_field_171']
cleandata['owneroccupancycode'] = results_df['record_key_owner_occupancy_code_mdp_field_ooi_sdat_field_6']
cleandata['homesteadcreditqualificationcode'] = results_df[
    'homestead_qualification_code_mdp_field_homqlcod_sdat_field_259']
cleandata['homesteadqualificationdate'] = results_df['homestead_qualification_date_mdp_field_homqldat_sdat_field_260']
cleandata['yearbuilt'] = results_df['c_a_m_a_system_data_year_built_yyyy_mdp_field_yearblt_sdat_field_235']
cleandata['datepurchased'] = results_df['sales_segment_1_transfer_date_yyyy_mm_dd_mdp_field_tradate_sdat_field_89']
cleandata['zoning'] = results_df['zoning_code_mdp_field_zoning_sdat_field_45']

cleandata['box2'] = results_df['prior_assessment_year_total_assessment_sdat_field_161']

# converting conditions
cleandata['owneroccupancycode'] = cleandata['owneroccupancycode'].apply(owneroccupancycondition)
cleandata['homesteadcreditqualificationcode'] = cleandata['homesteadcreditqualificationcode'].apply(
    homesteadqualiticationcondition)

if (cleandata['owneroccupancycode'].to_string() in ("Yes")):
    cleandata['box4'] = results_df['base_cycle_data_circuit_breaker_value_sdat_field_157']
else:
    cleandata['box4'] = results_df['base_cycle_data_land_value_sdat_field_154'] + results_df['base_cycle_data_improvements_value_sdat_field_155']


cleandata['box5'] = (
    results_df)['current_cycle_data_land_value_mdp_field_names_nfmlndvl_curlndvl_and_sallndvl_sdat_field_164']
cleandata['box6'] = (
    results_df)['current_cycle_data_improvements_value_mdp_field_names_nfmimpvl_curimpvl_and_salimpvl_sdat_field_165']
cleandata['box7'] = results_df['current_cycle_data_circuit_breaker_value_sdat_field_167']
cleandata['box8'] = cleandata['box2']
cleandata['box9'] = results_df['current_assessment_year_circuit_breaker_assessment_sdat_field_176']
cleandata['box10'] = cleandata['box7']

# Converting all to numeric
cleandata['box2'] = pd.to_numeric(cleandata['box2'])
cleandata['box4'] = pd.to_numeric(cleandata['box4'])
cleandata['box5'] = pd.to_numeric(cleandata['box5'])
cleandata['box6'] = pd.to_numeric(cleandata['box6'])
cleandata['box7'] = pd.to_numeric(cleandata['box7'])
cleandata['box8'] = pd.to_numeric(cleandata['box8'])
cleandata['box9'] = pd.to_numeric(cleandata['box9'])
cleandata['box10'] = pd.to_numeric(cleandata['box10'])

# DEBUGGING
# print columns & types
# print(cleandata.dtypes)

year1 = cleandata['assessmentyear']
# year2=year1+1
# year3=year2+1

cleandata['totalchange'] = cleandata['box7'] - cleandata['box4']

# converting
cleandata['assessmentyear'] = [dt.datetime.strptime(x, '%Y').year for x in cleandata['assessmentyear']]
cleandata['accountnumber'] = pd.to_numeric(cleandata['accountnumber'])
cleandata['key'] = pd.to_numeric(cleandata['key'])

# setting index
cleandata = cleandata.set_index('key')

# Removing bad data points
# These are common areas or open areas that are not owned by a resident but by the Association
cleandata = cleandata.drop(60590243392)
cleandata = cleandata.drop(60590243393)
cleandata = cleandata.drop(60590243394)
cleandata = cleandata.drop(79790223423)
cleandata = cleandata.drop(79790223432)
cleandata = cleandata.drop(79790223433)
cleandata = cleandata.drop(79790231070)
cleandata = cleandata.drop(79790231447)
cleandata = cleandata.drop(79790234449)
cleandata = cleandata.drop(79790235116)
cleandata = cleandata.drop(79790240734)
cleandata = cleandata.drop(79790240735)
cleandata = cleandata.drop(79790240744)
cleandata = cleandata.drop(79790245914)

# Modifying field value to be "TH"
# There is one townhome that is listed as SF and the rest are blank
cleandata.loc[cleandata["accountnumber"] == 60590243390, "housetype"] = "TH"
cleandata.loc[cleandata["accountnumber"] == 60590254221, "housetype"] = "TH"
cleandata.loc[cleandata["accountnumber"] == 60590254222, "housetype"] = "TH"
cleandata.loc[cleandata["accountnumber"] == 60590254223, "housetype"] = "TH"
cleandata.loc[cleandata["accountnumber"] == 60590254224, "housetype"] = "TH"

# Updating year built
cleandata.loc[cleandata["accountnumber"] == 79790230860, "yearbuilt"] = "2011"
cleandata.loc[cleandata["accountnumber"] == 79790230900, "yearbuilt"] = "2010"
cleandata.loc[cleandata["accountnumber"] == 79790234258, "yearbuilt"] = "2013"

cleandata['yearbuilt'] = cleandata['yearbuilt'].apply(yearcondition)

# Debugging to see number of unique housetypes and how many
ic(cleandata['housetype'].value_counts())

# Debugging to see owner occupancy status of all homes
ic(cleandata['owneroccupancycode'].value_counts())

# Debugging to see when homes were built
ic(cleandata['yearbuilt'].value_counts())

# year 1 calculation
cleandata['year1difference'] = cleandata['box8'] - cleandata['box4']
cleandata['year1countylimit'] = cleandata['box4'] + (cleandata['box4'] * annearundelcountylimit)
cleandata['year1statelimit'] = cleandata['box4'] + (cleandata['box4'] * statelimit)
cleandata['year1countydifference'] = cleandata['box8'] - cleandata['year1countylimit']
cleandata['year1statedifference'] = cleandata['box8'] - cleandata['year1statelimit']

# year1 county credit calculation
cleandata.loc[cleandata['year1countydifference'] < 0, 'year1countycredit'] = 0
cleandata.loc[cleandata['year1countydifference'] > 0, 'year1countycredit'] = (
        (cleandata['year1countydifference'] * annearundeltaxrate) / 100)

# year 1 state credit calculation
cleandata.loc[cleandata['year1statedifference'] < 0, 'year1statecredit'] = 0
cleandata.loc[cleandata['year1statedifference'] > 0, 'year1statecredit'] = (
        (cleandata['year1statedifference'] * statetaxrate) / 100)

cleandata['year1totalcredit'] = cleandata['year1countycredit'] + cleandata['year1statecredit']
# year 1 straight real estate tax payment without exempt class
cleandata['year1countyrealestate'] = (cleandata['box8'] * annearundeltaxrate) / 100
cleandata['year1staterealestate'] = (cleandata['box8'] * statetaxrate) / 100
cleandata["year1total"] = cleandata.apply(
    lambda x: taxcalculation(x["owneroccupancycode"], x["homesteadcreditqualificationcode"], x["exemptclass"],
                             x["year1countyrealestate"], x["year1staterealestate"], x["year1countycredit"],
                             x["year1statecredit"]), axis=1)

# TODO: Remove this call as this is only being used to verify the new function that was created
cleandata["year1paytest"] = cleandata.apply(
    lambda x: semiannualpayments1(x["owneroccupancycode"], x["year1total"], x["county"]),
    axis=1)

# Get the columns you want to pass to the function
year3 = cleandata[["owneroccupancycode", "year1total", "county"]]
# Apply the function
cleandata[["year1pay1", "year1pay2"]] = year3.apply(semiannualpayments, axis=1, result_type='expand')

# year 2 calculation
cleandata['year2countylimit'] = cleandata['year1countylimit'] + (cleandata['year1countylimit'] * annearundelcountylimit)
cleandata['year2statelmit'] = cleandata['box2'] + (cleandata['box2'] * statelimit)
cleandata['year2countydifference'] = cleandata['box9'] - cleandata['year2countylimit']
cleandata['year2statedifference'] = cleandata['box8'] - cleandata['year2statelmit']

# year 2 county credit calculation
cleandata.loc[cleandata['year2countydifference'] < 0, 'year2countycredit'] = 0
cleandata.loc[cleandata['year2countydifference'] > 0, 'year2countycredit'] = (
        (cleandata['year2countydifference'] * annearundeltaxrate) / 100)

# year 2 state credit calculation
cleandata.loc[cleandata['year2statedifference'] < 0, 'year2statecredit'] = 0
cleandata.loc[cleandata['year2statedifference'] > 0, 'year2statecredit'] = (
        (cleandata['year2statedifference'] * statetaxrate) / 100)

cleandata['year2totalcredit'] = cleandata['year2countycredit'] + cleandata['year2statecredit']


# year 2 straight real estate tax payment without exempt class
cleandata['year2countyrealestate'] = (cleandata['box9'] * annearundeltaxrate) / 100
cleandata['year2staterealestate'] = (cleandata['box9'] * statetaxrate) / 100
cleandata["year2total"] = cleandata.apply(
    lambda x: taxcalculation(x["owneroccupancycode"], x["homesteadcreditqualificationcode"], x["exemptclass"],
                             x["year2countyrealestate"], x["year2staterealestate"], x["year2countycredit"],
                             x["year2statecredit"]), axis=1)

# year 3 calculation
cleandata['year3countylimit'] = cleandata['year2countylimit'] + (cleandata['year2countylimit'] * annearundelcountylimit)
cleandata['year3statelimit'] = cleandata['box9'] + (cleandata['box9'] * statelimit)
cleandata['year3countydifference'] = cleandata['box10'] - cleandata['year3countylimit']
cleandata['year3statedifference'] = cleandata['box9'] - cleandata['year3statelimit']

# year 3 county credit calculation
cleandata.loc[cleandata['year3countydifference'] < 0, 'year3countycredit'] = 0
cleandata.loc[cleandata['year3countydifference'] > 0, 'year3countycredit'] = (
        (cleandata['year3countydifference'] * annearundeltaxrate) / 100)

# year 3 state credit calculation
cleandata.loc[cleandata['year3statedifference'] < 0, 'year3statecredit'] = 0
cleandata.loc[cleandata['year3statedifference'] > 0, 'year3statecredit'] = (
        (cleandata['year3statedifference'] * statetaxrate) / 100)

cleandata['year3totalcredit'] = cleandata['year3countycredit'] + cleandata['year3statecredit']


# year3 straight real estate tax payment without exempt class
cleandata['year3countyrealestate'] = (cleandata['box10'] * annearundeltaxrate) / 100
cleandata['year3staterealestate'] = (cleandata['box10'] * statetaxrate) / 100

# test = test(cleandata['owneroccupancycode'], cleandata['homesteadcreditqualificationcode'], cleandata['exemptclass'],
#            cleandata['year3countyrealestate'], cleandata['year3staterealestate'], cleandata['year3countycredit'],
#            cleandata['year3statecredit'])

cleandata["year3total"] = cleandata.apply(
    lambda x: taxcalculation(x["owneroccupancycode"], x["homesteadcreditqualificationcode"], x["exemptclass"],
                             x["year3countyrealestate"], x["year3staterealestate"], x["year3countycredit"],
                             x["year3statecredit"]), axis=1)

# TODO: Remove this call as this is only being used to verify the new function that was created
cleandata["year3paytest"] = cleandata.apply(
    lambda x: semiannualpayments1(x["owneroccupancycode"], x["year3total"], x["county"]),
    axis=1)

year3 = cleandata[["owneroccupancycode", "year3total", "county"]]
cleandata[["year3pay1", "year3pay2"]] = year3.apply(semiannualpayments, axis=1, result_type='expand')


townhomes = cleandata.copy()
townhomes.drop(townhomes[townhomes['housetype'] == "SF"].index, inplace=True)

singlefamily = cleandata.copy()
singlefamily.drop(singlefamily[singlefamily['housetype'] == "TH"].index, inplace=True)

# Overall
cleandata["totalchange"] = pd.to_numeric(cleandata["totalchange"])
overallaveragechange = cleandata["totalchange"].mean()
overallaveragechange = "${:,.2f}".format(overallaveragechange)
overallminimumchange = cleandata['totalchange'].min()
overallmaximumchange = cleandata['totalchange'].max()
overallminimumchange = "${:,.2f}".format(overallminimumchange)
overallmaximumchange = "${:,.2f}".format(overallmaximumchange)
print('')
print('The average home assessment change in Tanyard Springs is ', overallaveragechange, ' and ranges from ',
      overallminimumchange, ' and ', overallmaximumchange)

# Townhomes
townhomes["totalchange"] = pd.to_numeric(townhomes["totalchange"])
townhomesaveragechange = townhomes["totalchange"].mean()
townhomesaveragechange = "${:,.2f}".format(townhomesaveragechange)
townhouseminimumchange = townhomes['totalchange'].min()
townhousemaximumchange = townhomes['totalchange'].max()
townhouseminimumchange = "${:,.2f}".format(townhouseminimumchange)
townhousemaximumchange = "${:,.2f}".format(townhousemaximumchange)
print('')
print('The average TownHome assessment change in Tanyard Springs is ', townhomesaveragechange, ' and ranges from ',
      townhouseminimumchange, ' and ', townhousemaximumchange)

# SingleFamily
singlefamily["totalchange"] = pd.to_numeric(singlefamily["totalchange"])
singlefamilyaveragechange = singlefamily["totalchange"].mean()
singlefamilyaveragechange = "${:,.2f}".format(singlefamilyaveragechange)
singlefamilyminimumchange = singlefamily['totalchange'].min()
singlefamilymaximumchange = singlefamily['totalchange'].max()
singlefamilyminimumchange = "${:,.2f}".format(singlefamilyminimumchange)
singlefamilymaximumchange = "${:,.2f}".format(singlefamilymaximumchange)
print('')
print('The average SFH assessment change in Tanyard Springs is ', singlefamilyaveragechange, ' and ranges from ',
      singlefamilyminimumchange, ' and ', singlefamilymaximumchange)

townhomeyear1 = townhomes['year1total'].sum()
townhomeyear2 = townhomes['year2total'].sum()
townhomeyear3 = townhomes['year3total'].sum()
singlefamilyyear1 = singlefamily['year1total'].sum()
singlefamilyyear2 = singlefamily['year2total'].sum()
singlefamilyyear3 = singlefamily['year3total'].sum()

print('')
print('')
print('The townhome total year 1 is ', townhomeyear1)
print('The townhome total year 2 is ', townhomeyear2)
print('The townhome total year 3 is ', townhomeyear3)
print('')
print('The single family total year 1 is ', singlefamilyyear1)
print('The single family total year 2 is ', singlefamilyyear2)
print('The single family total year 3 is ', singlefamilyyear3)

# Saving to CSV
cleandata.to_csv('output/cleandata.csv')
townhomes.to_csv('output/townhomes.csv')
singlefamily.to_csv('output/singlefamily.csv'

RE: dataframe logic issues - deanhystad - Sep-13-2023

This is a series:

cleandata['owneroccupancycode']

It is never going to be in "Yes", It might be a series full of "Yes", but it is never going to be Y, e, s, Ye, es or Yes.

Are you wondering why I wrote it that way (Y, e, s, Ye, es or Yes)? I wrote it that way because in("Yes") does not mean what you think it does:

a in ("Yes") # Does not check if a == "Yes"

# Is interpreted as:
a in "Yes"  # Which works if a = "Y" or "e" or "s" or "Ye" or "es" or "Yes"

# Did you mean:
a in ("Yes",)  # Checking if a matches one of the strings in the tuple.

Which acts like this:
a in ["Yes"]

When using () to make a tuple, you must have at least 1 comma, or the parenthesis are used for grouping instead of making a tuple.

But even if that part was correct, this part is wrong:

    cleandata['box4'] = results_df['base_cycle_data_circuit_breaker_value_sdat_field_157']
else:
    cleandata['box4'] = results_df['base_cycle_data_land_value_sdat_field_154'] + results_df['base_cycle_data_improvements_value_sdat_field_155']

This might be a good place to use numpy. In particular numpy.where()

import pandas as pd
import numpy as np

# Make a dataframe
df = pd.DataFrame({"A": (1, 3, 5, 7), "B": (2, 4, 6, 8), "C": list("YNYN")})

# Add column that uses a condition to set the values
df["D"] = np.where(df["C"] == "Y", df["A"] + df["B"], df["A"])
print(df)

Output:   A  B  C   D
0  1  2  Y   3
1  3  4  N   3
2  5  6  Y  11
3  7  8  N   7

Or you could DataFrame.loc

import pandas as pd


def func(row):

df = pd.DataFrame({"A": (1, 3, 5, 7), "B": (2, 4, 6, 8), "C": list("YNYN")})
df["D"] = df["A"]  # Valuse if df["C"] != "Y"
df.loc[df["C"] == "Y", "D"] = df["A"] + df["B"]  # Only change rows where df["C"] == "Y"
print(df)

Or you could use DataFrame.apply()

import pandas as pd


def func(a, b, c):
    return a + b if c == "Y" else a


df = pd.DataFrame({"A": (1, 3, 5, 7), "B": (2, 4, 6, 8), "C": list("YNYN")})
df["D"] = df.apply(lambda x: func(x["A"], x["B"], x["C"]), axis=1)
print(df)

I advise against using apply for large dataframes as it has to call back into your python function for each row in the dataframe. The previous approaches run mostly inside pandas (or numpy).

RE: dataframe logic issues - mbrown009 - Sep-14-2023

Thank you. I have tried this but the same thing is happening for box4.Thank you. I have tried this but the same thing is happening for box4.

#!/usr/bin/env python

# make sure to install these packages before running:
# pip install pandas
# pip install sodapy

import datetime as dt

from auth import applicationtoken
from functions import owneroccupancycondition, yearcondition, homesteadqualiticationcondition, taxcalculation, \
    semiannualpayments, semiannualpayments1
from limits import statelimit, annearundelcountylimit
from rates import statetaxrate, annearundeltaxrate
import pandas as pd
from sodapy import Socrata
import platform
from icecream import ic
import numpy as np

os = (platform.system())

# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("opendata.maryland.gov", applicationtoken)

# Example authenticated client (needed for non-public datasets):
# client = Socrata(opendata.maryland.gov,
#                  MyAppToken,
#                  username="[email protected]",
#                  password="AFakePassword")

# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("q4mw-f34p", limit=3000)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
results_df.to_csv('output/results_df.csv')

# creating dataframe + adding first field from the originating data frame
cleandata = results_df[['legal_description_line_2_mdp_field_legal2_sdat_field_18']].copy()

# changing column name in data frame
cleandata.rename(columns={'legal_description_line_2_mdp_field_legal2_sdat_field_18': 'address'}, inplace=True)

# copying over columns from originating data frame
cleandata['county'] = results_df['jurisdiction_code_mdp_field_jurscode']
cleandata['accountnumber'] = results_df['record_key_account_number_sdat_field_3']
cleandata['opendataupdate'] = results_df['date_of_most_recent_open_data_portal_record_update']
cleandata['key'] = cleandata['accountnumber']
cleandata['housetype'] = results_df['mdp_street_address_type_code_mdp_field_resityp']
cleandata['landuse'] = results_df['land_use_code_mdp_field_lu_desclu_sdat_field_50']
cleandata['exemptclass'] = results_df['exempt_class_mdp_field_exclass_descexcl_sdat_field_49']
cleandata['assessmentyear'] = results_df['assessment_cycle_year_sdat_field_399']
cleandata['currentyeartotalassessment'] = results_df['current_assessment_year_total_phase_in_value_sdat_field_171']
cleandata['owneroccupancycode'] = results_df['record_key_owner_occupancy_code_mdp_field_ooi_sdat_field_6']
cleandata['homesteadcreditqualificationcode'] = results_df[
    'homestead_qualification_code_mdp_field_homqlcod_sdat_field_259']
cleandata['homesteadqualificationdate'] = results_df['homestead_qualification_date_mdp_field_homqldat_sdat_field_260']
cleandata['yearbuilt'] = results_df['c_a_m_a_system_data_year_built_yyyy_mdp_field_yearblt_sdat_field_235']
cleandata['datepurchased'] = results_df['sales_segment_1_transfer_date_yyyy_mm_dd_mdp_field_tradate_sdat_field_89']
cleandata['zoning'] = results_df['zoning_code_mdp_field_zoning_sdat_field_45']

cleandata['box2'] = results_df['prior_assessment_year_total_assessment_sdat_field_161']

# converting conditions
cleandata['owneroccupancycode'] = cleandata['owneroccupancycode'].apply(owneroccupancycondition)
cleandata['homesteadcreditqualificationcode'] = cleandata['homesteadcreditqualificationcode'].apply(
    homesteadqualiticationcondition)

cleandata['box4'] = np.where(cleandata['owneroccupancycode'] == "No",results_df['base_cycle_data_land_value_sdat_field_154'] + results_df['base_cycle_data_improvements_value_sdat_field_155'], results_df['base_cycle_data_circuit_breaker_value_sdat_field_157'] )

cleandata['box5'] = (
    results_df)['current_cycle_data_land_value_mdp_field_names_nfmlndvl_curlndvl_and_sallndvl_sdat_field_164']
cleandata['box6'] = (
    results_df)['current_cycle_data_improvements_value_mdp_field_names_nfmimpvl_curimpvl_and_salimpvl_sdat_field_165']
cleandata['box7'] = results_df['current_cycle_data_circuit_breaker_value_sdat_field_167']
cleandata['box8'] = cleandata['box2']
cleandata['box9'] = results_df['current_assessment_year_circuit_breaker_assessment_sdat_field_176']
cleandata['box10'] = cleandata['box7']

# Converting all to numeric
cleandata['box2'] = pd.to_numeric(cleandata['box2'])
cleandata['box4'] = pd.to_numeric(cleandata['box4'])
cleandata['box5'] = pd.to_numeric(cleandata['box5'])
cleandata['box6'] = pd.to_numeric(cleandata['box6'])
cleandata['box7'] = pd.to_numeric(cleandata['box7'])
cleandata['box8'] = pd.to_numeric(cleandata['box8'])
cleandata['box9'] = pd.to_numeric(cleandata['box9'])
cleandata['box10'] = pd.to_numeric(cleandata['box10'])

# DEBUGGING
# print columns & types
# print(cleandata.dtypes)

year1 = cleandata['assessmentyear']
# year2=year1+1
# year3=year2+1

cleandata['totalchange'] = cleandata['box7'] - cleandata['box4']

# converting
cleandata['assessmentyear'] = [dt.datetime.strptime(x, '%Y').year for x in cleandata['assessmentyear']]
cleandata['accountnumber'] = pd.to_numeric(cleandata['accountnumber'])
cleandata['key'] = pd.to_numeric(cleandata['key'])

# setting index
cleandata = cleandata.set_index('key')

# Removing bad data points
# These are common areas or open areas that are not owned by a resident but by the Association
cleandata = cleandata.drop(60590243392)
cleandata = cleandata.drop(60590243393)
cleandata = cleandata.drop(60590243394)
cleandata = cleandata.drop(79790223423)
cleandata = cleandata.drop(79790223432)
cleandata = cleandata.drop(79790223433)
cleandata = cleandata.drop(79790231070)
cleandata = cleandata.drop(79790231447)
cleandata = cleandata.drop(79790234449)
cleandata = cleandata.drop(79790235116)
cleandata = cleandata.drop(79790240734)
cleandata = cleandata.drop(79790240735)
cleandata = cleandata.drop(79790240744)
cleandata = cleandata.drop(79790245914)

# Modifying field value to be "TH"
# There is one townhome that is listed as SF and the rest are blank
cleandata.loc[cleandata["accountnumber"] == 60590243390, "housetype"] = "TH"
cleandata.loc[cleandata["accountnumber"] == 60590254221, "housetype"] = "TH"
cleandata.loc[cleandata["accountnumber"] == 60590254222, "housetype"] = "TH"
cleandata.loc[cleandata["accountnumber"] == 60590254223, "housetype"] = "TH"
cleandata.loc[cleandata["accountnumber"] == 60590254224, "housetype"] = "TH"

# Updating year built
cleandata.loc[cleandata["accountnumber"] == 79790230860, "yearbuilt"] = "2011"
cleandata.loc[cleandata["accountnumber"] == 79790230900, "yearbuilt"] = "2010"
cleandata.loc[cleandata["accountnumber"] == 79790234258, "yearbuilt"] = "2013"

cleandata['yearbuilt'] = cleandata['yearbuilt'].apply(yearcondition)

# Debugging to see number of unique housetypes and how many
ic(cleandata['housetype'].value_counts())

# Debugging to see owner occupancy status of all homes
ic(cleandata['owneroccupancycode'].value_counts())

# Debugging to see when homes were built
ic(cleandata['yearbuilt'].value_counts())

# year 1 calculation
cleandata['year1difference'] = cleandata['box8'] - cleandata['box4']
cleandata['year1countylimit'] = cleandata['box4'] + (cleandata['box4'] * annearundelcountylimit)
cleandata['year1statelimit'] = cleandata['box4'] + (cleandata['box4'] * statelimit)
cleandata['year1countydifference'] = cleandata['box8'] - cleandata['year1countylimit']
cleandata['year1statedifference'] = cleandata['box8'] - cleandata['year1statelimit']

# year1 county credit calculation
cleandata.loc[cleandata['year1countydifference'] < 0, 'year1countycredit'] = 0
cleandata.loc[cleandata['year1countydifference'] > 0, 'year1countycredit'] = (
        (cleandata['year1countydifference'] * annearundeltaxrate) / 100)

# year 1 state credit calculation
cleandata.loc[cleandata['year1statedifference'] < 0, 'year1statecredit'] = 0
cleandata.loc[cleandata['year1statedifference'] > 0, 'year1statecredit'] = (
        (cleandata['year1statedifference'] * statetaxrate) / 100)

cleandata['year1totalcredit'] = cleandata['year1countycredit'] + cleandata['year1statecredit']
# year 1 straight real estate tax payment without exempt class
cleandata['year1countyrealestate'] = (cleandata['box8'] * annearundeltaxrate) / 100
cleandata['year1staterealestate'] = (cleandata['box8'] * statetaxrate) / 100
cleandata["year1total"] = cleandata.apply(
    lambda x: taxcalculation(x["owneroccupancycode"], x["homesteadcreditqualificationcode"], x["exemptclass"],
                             x["year1countyrealestate"], x["year1staterealestate"], x["year1countycredit"],
                             x["year1statecredit"]), axis=1)

# TODO: Remove this call as this is only being used to verify the new function that was created
cleandata["year1paytest"] = cleandata.apply(
    lambda x: semiannualpayments1(x["owneroccupancycode"], x["year1total"], x["county"]),
    axis=1)

# Get the columns you want to pass to the function
year3 = cleandata[["owneroccupancycode", "year1total", "county"]]
# Apply the function
cleandata[["year1pay1", "year1pay2"]] = year3.apply(semiannualpayments, axis=1, result_type='expand')

# year 2 calculation
cleandata['year2countylimit'] = cleandata['year1countylimit'] + (cleandata['year1countylimit'] * annearundelcountylimit)
cleandata['year2statelmit'] = cleandata['box2'] + (cleandata['box2'] * statelimit)
cleandata['year2countydifference'] = cleandata['box9'] - cleandata['year2countylimit']
cleandata['year2statedifference'] = cleandata['box8'] - cleandata['year2statelmit']

# year 2 county credit calculation
cleandata.loc[cleandata['year2countydifference'] < 0, 'year2countycredit'] = 0
cleandata.loc[cleandata['year2countydifference'] > 0, 'year2countycredit'] = (
        (cleandata['year2countydifference'] * annearundeltaxrate) / 100)

# year 2 state credit calculation
cleandata.loc[cleandata['year2statedifference'] < 0, 'year2statecredit'] = 0
cleandata.loc[cleandata['year2statedifference'] > 0, 'year2statecredit'] = (
        (cleandata['year2statedifference'] * statetaxrate) / 100)

cleandata['year2totalcredit'] = cleandata['year2countycredit'] + cleandata['year2statecredit']


# year 2 straight real estate tax payment without exempt class
cleandata['year2countyrealestate'] = (cleandata['box9'] * annearundeltaxrate) / 100
cleandata['year2staterealestate'] = (cleandata['box9'] * statetaxrate) / 100
cleandata["year2total"] = cleandata.apply(
    lambda x: taxcalculation(x["owneroccupancycode"], x["homesteadcreditqualificationcode"], x["exemptclass"],
                             x["year2countyrealestate"], x["year2staterealestate"], x["year2countycredit"],
                             x["year2statecredit"]), axis=1)

# year 3 calculation
cleandata['year3countylimit'] = cleandata['year2countylimit'] + (cleandata['year2countylimit'] * annearundelcountylimit)
cleandata['year3statelimit'] = cleandata['box9'] + (cleandata['box9'] * statelimit)
cleandata['year3countydifference'] = cleandata['box10'] - cleandata['year3countylimit']
cleandata['year3statedifference'] = cleandata['box9'] - cleandata['year3statelimit']

# year 3 county credit calculation
cleandata.loc[cleandata['year3countydifference'] < 0, 'year3countycredit'] = 0
cleandata.loc[cleandata['year3countydifference'] > 0, 'year3countycredit'] = (
        (cleandata['year3countydifference'] * annearundeltaxrate) / 100)

# year 3 state credit calculation
cleandata.loc[cleandata['year3statedifference'] < 0, 'year3statecredit'] = 0
cleandata.loc[cleandata['year3statedifference'] > 0, 'year3statecredit'] = (
        (cleandata['year3statedifference'] * statetaxrate) / 100)

cleandata['year3totalcredit'] = cleandata['year3countycredit'] + cleandata['year3statecredit']


# year3 straight real estate tax payment without exempt class
cleandata['year3countyrealestate'] = (cleandata['box10'] * annearundeltaxrate) / 100
cleandata['year3staterealestate'] = (cleandata['box10'] * statetaxrate) / 100

# test = test(cleandata['owneroccupancycode'], cleandata['homesteadcreditqualificationcode'], cleandata['exemptclass'],
#            cleandata['year3countyrealestate'], cleandata['year3staterealestate'], cleandata['year3countycredit'],
#            cleandata['year3statecredit'])

cleandata["year3total"] = cleandata.apply(
    lambda x: taxcalculation(x["owneroccupancycode"], x["homesteadcreditqualificationcode"], x["exemptclass"],
                             x["year3countyrealestate"], x["year3staterealestate"], x["year3countycredit"],
                             x["year3statecredit"]), axis=1)

# TODO: Remove this call as this is only being used to verify the new function that was created
cleandata["year3paytest"] = cleandata.apply(
    lambda x: semiannualpayments1(x["owneroccupancycode"], x["year3total"], x["county"]),
    axis=1)

year3 = cleandata[["owneroccupancycode", "year3total", "county"]]
cleandata[["year3pay1", "year3pay2"]] = year3.apply(semiannualpayments, axis=1, result_type='expand')


townhomes = cleandata.copy()
townhomes.drop(townhomes[townhomes['housetype'] == "SF"].index, inplace=True)

singlefamily = cleandata.copy()
singlefamily.drop(singlefamily[singlefamily['housetype'] == "TH"].index, inplace=True)

# Overall
cleandata["totalchange"] = pd.to_numeric(cleandata["totalchange"])
overallaveragechange = cleandata["totalchange"].mean()
overallaveragechange = "${:,.2f}".format(overallaveragechange)
overallminimumchange = cleandata['totalchange'].min()
overallmaximumchange = cleandata['totalchange'].max()
overallminimumchange = "${:,.2f}".format(overallminimumchange)
overallmaximumchange = "${:,.2f}".format(overallmaximumchange)
print('')
print('The average home assessment change in Tanyard Springs is ', overallaveragechange, ' and ranges from ',
      overallminimumchange, ' and ', overallmaximumchange)

# Townhomes
townhomes["totalchange"] = pd.to_numeric(townhomes["totalchange"])
townhomesaveragechange = townhomes["totalchange"].mean()
townhomesaveragechange = "${:,.2f}".format(townhomesaveragechange)
townhouseminimumchange = townhomes['totalchange'].min()
townhousemaximumchange = townhomes['totalchange'].max()
townhouseminimumchange = "${:,.2f}".format(townhouseminimumchange)
townhousemaximumchange = "${:,.2f}".format(townhousemaximumchange)
print('')
print('The average TownHome assessment change in Tanyard Springs is ', townhomesaveragechange, ' and ranges from ',
      townhouseminimumchange, ' and ', townhousemaximumchange)

# SingleFamily
singlefamily["totalchange"] = pd.to_numeric(singlefamily["totalchange"])
singlefamilyaveragechange = singlefamily["totalchange"].mean()
singlefamilyaveragechange = "${:,.2f}".format(singlefamilyaveragechange)
singlefamilyminimumchange = singlefamily['totalchange'].min()
singlefamilymaximumchange = singlefamily['totalchange'].max()
singlefamilyminimumchange = "${:,.2f}".format(singlefamilyminimumchange)
singlefamilymaximumchange = "${:,.2f}".format(singlefamilymaximumchange)
print('')
print('The average SFH assessment change in Tanyard Springs is ', singlefamilyaveragechange, ' and ranges from ',
      singlefamilyminimumchange, ' and ', singlefamilymaximumchange)

townhomeyear1 = townhomes['year1total'].sum()
townhomeyear2 = townhomes['year2total'].sum()
townhomeyear3 = townhomes['year3total'].sum()
singlefamilyyear1 = singlefamily['year1total'].sum()
singlefamilyyear2 = singlefamily['year2total'].sum()
singlefamilyyear3 = singlefamily['year3total'].sum()

print('')
print('')
print('The townhome total year 1 is ', townhomeyear1)
print('The townhome total year 2 is ', townhomeyear2)
print('The townhome total year 3 is ', townhomeyear3)
print('')
print('The single family total year 1 is ', singlefamilyyear1)
print('The single family total year 2 is ', singlefamilyyear2)
print('The single family total year 3 is ', singlefamilyyear3)

# Saving to CSV
cleandata.to_csv('output/cleandata.csv')
townhomes.to_csv('output/townhomes.csv')
singlefamily.to_csv('output/singlefamily.csv')

RE: dataframe logic issues - deanhystad - Sep-14-2023

And?? Is the screen shot supposed to show something is not working or that it is working?

RE: dataframe logic issues - mbrown009 - Sep-14-2023

I found the issue out. The issue was that the type was object vs int so the issue was it was concatinating two objects together vs adding them up. i had to convert it float and it worked so I should be good now.

RE: dataframe logic issues - deanhystad - Sep-14-2023

You should try DataFrame.inver_objects()

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.infer_objects.html

Quote:Attempt to infer better dtypes for object columns.

Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.