Python Forum

Pages: 1 2

The code shown below produces the following error on line 168.

Error:
  File "C:\Users\Newport_j\Downloads\Formatting NFL data for doing data science with Python1.py", line 168, in formation
    form = row['offenseFormation'].strip()
AttributeError: 'float' object has no attribute 'strip'

            elif p[2:4] == 'DB':

                db_count = int(p[0])


    return pd.Series([
        rb_count,
        te_count,
        wr_count,
        ol_count,
        dl_count,
        db_count,
    ])


# In[ ]:


df[[
    'rb_count',
    'te_count',
    'wr_count',
    'ol_count',
    'dl_count',
    'db_count',
    ]] = df.apply(transform_off_personnel, axis=1)

df['offenseFormation'] = df['offenseFormation'].map(lambda f: ('EMPTY' if pd.isna(False) else f))


def formation(row):

    form = row['offenseFormation'].strip()

    if form == 'SHOTGUN':

        return 0
    elif form == 'SINGLEBACK':

        return 1
    elif form == 'EMPTY':

Please note that line 168 of my program is line 33 in this post.

Any help appreciated. Thanks in advance.

Respectfully,

ErnestTBass

It means row['offenseFormation'] is a float, not a string. You can quickly verify this by embedding some print statements in your code.

form = row['offenseFormation']
print(type(form), form)
form = form.strip()
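Here is a minimal sketch of what that print is likely to reveal. It assumes the cause is blank cells in plays.csv. The three-row CSV below is made up; only the column name offenseFormation comes from the thread. pandas reads an empty CSV cell as NaN, and NaN is a Python float, so calling .strip() on it fails with exactly this error.

```python
import io

import pandas as pd

# Made-up three-play CSV; play 2 has an empty offenseFormation cell.
csv_text = "playId,offenseFormation\n1,SHOTGUN\n2,\n3,I_FORM\n"
df = pd.read_csv(io.StringIO(csv_text))

# Same check as suggested above: print the type and value of each cell.
for _, row in df.iterrows():
    form = row['offenseFormation']
    print(type(form), form)

# The empty cell became NaN, which is a float, so it has no .strip() method.
missing_value = df.loc[1, 'offenseFormation']
try:
    missing_value.strip()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```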

Okay, I will try that. How did row become a float? You cannot have row 2.3; only integers can represent rows.
It is simple logic to any engineer.

Respectfully,

ErnestTBass

row is not a float; row['offenseFormation'] is a float. How did it get to be a float? How am I supposed to know? You didn't include the code that calls formation(row). As far as I know, row[] could be anything. Python says it is a float, so I am going to accept that.

You would have more luck if you stopped expecting things to be a certain way and accepted what actually happens. If Python says row[something] is a float, it is probably a float. A bug is when the program does not perform as expected. When debugging your code you need to be open to the idea that you screwed up and the program is not doing what it is supposed to do. You blind yourself with your stubbornness. The sooner you accept that computers are not impressed by your programming skills and enjoy making you feel like an idiot, the sooner you will find your errors and correct them. It is a tough lesson.

I know that Python variables are dynamically typed, not statically typed. So somewhere in the code leading up to this runtime error, Python came to treat that value as a float.

That is not my intent. Could you point me to a website that explains this type of error and how to fix it?

I do not want this error. I have seen ones like it before.

I am not sure how to get rid of it.

I am just a product of the times. I have programmed a lot in other languages, but never in an interpreted language that dynamically types variables.

Any help appreciated.

Respectfully,

ErnestTBass

There is no "way to fix it" for "this type of error". The way to fix it is to understand the error and correct the code that caused it. The way to understand it is to trace the error back to its source. The error message only indicates where Python could no longer do what it was instructed to do. The actual error likely occurred elsewhere.

If row['offenseFormation'] is not what you expect, why is that? Is row what you expect it to be? If not, find out where row got messed up. Trace row backwards to where something you did does not produce the results you expect. If row is correct, then maybe row['offenseFormation'] doesn't do what you expect. Read up on what type "row" is and what the index operator does for that type.
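Here is a minimal sketch to make "what type row is" concrete. It assumes rows reach formation() through df.apply(..., axis=1), as in the full program posted below. In that case each row is a pandas Series whose index is the column names. So row['offenseFormation'] is a label lookup that returns one cell, and that cell can be a float even though row itself is not. The integer row number is available separately as row.name. The two-play DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical two-play DataFrame; the column names mirror the thread's code.
df = pd.DataFrame({
    'down': [1, 3],
    'offenseFormation': ['SHOTGUN', 'I_FORM'],
})


def describe_row(row):
    # df.apply(..., axis=1) passes each row as a pandas Series:
    #   row.name                -> the row's index label (an integer here)
    #   row['offenseFormation'] -> the value in that column for this row
    return f"{type(row).__name__} name={row.name} value={row['offenseFormation']!r}"


summary = df.apply(describe_row, axis=1)
print(summary.tolist())
```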

The code that I showed in my first post is incomplete; it was only a fragment of the program. What is shown below is the whole program.

#!/usr/bin/env python
# coding: utf-8

# In[ ]:


# !/usr/bin/python
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import xgboost as xgb

from sklearn import metrics

df = pd.read_csv('plays.csv')

print(len(df))
print(df.head())


# In[ ]:


# drop st plays

df = df[~df['isSTPlay']]
print(len(df))


# In[ ]:


# drop kneels

df = df[~df['playDescription'].str.contains('kneels')]
print(len(df))


# In[ ]:


# drop overtime

df = df[~(df['quarter'] == 5)]
print(len(df))


# In[ ]:


# convert time/quarters
def translate_game_clock(row):
    raw_game_clock = row['GameClock']
    quarter = row['quarter']
    minutes, seconds_raw = raw_game_clock.partition(':')[::2]

    seconds = seconds_raw.partition(':')[0]

    total_seconds_left_in_quarter = int(seconds) + (int(minutes) * 60)

    if quarter == 3 or quarter == 1:
        return total_seconds_left_in_quarter + 900
    elif quarter == 4 or quarter == 2:
        return total_seconds_left_in_quarter


if 'GameClock' in list(df.columns):
    df["secondsLeftInHalf"] = df.apply(translate_game_clock, axis=1)


if 'quarter' in list(df.columns):
    df["half"] = df['quarter'].map(lambda q: 2 if q > 2 else 1)


# In[ ]:


def yards_to_endzone(row):
    if row['possessionTeam'] == row['yardlineSide']:

        return 100 - row['yardlineNumber']

    else:

        return row['yardlineNumber']


df['yardsToEndzone'] = df.apply(yards_to_endzone, axis=1)


# In[ ]:


def transform_off_personnel(row):

    rb_count = 0

    te_count = 0

    wr_count = 0

    ol_count = 0

    dl_count = 0

    db_count = 0

    if not pd.isna(row['personnel.offense']):
        personnel = row['personnel.offense'].split(',')

        for p in personnel:

            if p[2:4] == 'RB':

                rb_count = int(p[0])

            elif p[2:4] == 'TE':

                te_count = int(p[0])

            elif p[2:4] == 'WR':

                wr_count = int(p[0])

            elif p[2:4] == 'OL':

                ol_count = int(p[0])

            elif p[2:4] == 'DL':

                dl_count = int(p[0])

            elif p[2:4] == 'DB':

                db_count = int(p[0])

    return pd.Series([
        rb_count,
        te_count,
        wr_count,
        ol_count,
        dl_count,
        db_count,
    ])


# In[ ]:


df[[
    'rb_count',
    'te_count',
    'wr_count',
    'ol_count',
    'dl_count',
    'db_count',
    ]] = df.apply(transform_off_personnel, axis=1)

df['offenseFormation'] = df['offenseFormation'].map(lambda f: ('EMPTY' if pd.isna(False) else f))


def formation(row):

    form = row['offenseFormation'].strip()

    if form == 'SHOTGUN':

        return 0
    elif form == 'SINGLEBACK':

        return 1
    elif form == 'EMPTY':

        return 2
    elif form == 'I_FORM':

        return 3
    elif form == 'PISTOL':

        return 4
    elif form == 'JUMBO':

        return 5
    elif form == 'WILDCAT':

        return 6
    elif form == 'ACE':

        return 7
    else:

        return -1


df['numericFormation'] = df.apply(formation, axis=1)

print(df.yardlineNumber.unique())


# In[ ]:


def play_type(row):
    if row['PassResult'] == 'I' or row['PassResult'] == 'C' or row['PassResult'] == 'S':

        return 'Passing'

    else:

        return 'Rushing'


df['play_type'] = df.apply(play_type, axis=1)
df['numericPlayType'] = df['play_type'].map(lambda p: 1 if p == 'Passing' else 0)


# In[ ]:


df_final = df[['down', 'yardsToGo', 'yardsToEndzone', 'rb_count', 'te_count', 'wr_count', 'ol_count', 'db_count', 'secondsLeftInHalf',
               'half', 'numericPlayType', 'numericFormation', 'play_type']]


# In[ ]:


print(df_final.describe(include='all'))


# In[ ]:


print(df.yardlineNumber.unique())


# In[ ]:


df['yardlineNumber'] = df['yardlineNumber'].fillna(50)


# In[ ]:


sns.catplot(x='play_type', kind='count', data=df_final, orient='h')

plt.show()


# In[ ]:


sns.catplot(x="down", kind="count", hue='play_type', data=df_final)

plt.show()


# In[ ]:


sns.lmplot(x="yardsToGo", y="numericPlayType", data=df_final, y_jitter=0.03, logistic=True, aspect=2)

plt.show()


# In[ ]:


train_df, validation_df, test_df = np.split(df_final.sample(frac=1), [int(0.7 * len(df)), int(0.9 * len(df))])

print("Training size is %d, validation size is %d, test_size is %d" % (len(train_df), len(validation_df), len(test_df)))


# In[ ]:


train_clean_df = train_df.drop(columns=['numericPlayType'])

d_train = xgb.DMatrix(train_clean_df, label=train_df['numericPlayType'], feature_names=list(train_clean_df))


# In[ ]:


val_clean_df = validation_df.drop(columns=['numericPlayType'])

d_val = xgb.DMatrix(val_clean_df, label=validation_df['numericPlayType'], feature_names=list(val_clean_df))

eval_list = [(d_train, 'train'), (d_val, 'eval')]

results = {}


# In[ ]:


param = {

    'objective': 'binary:logistic',

    'eval_metric': 'auc',

    'max_depth': 5,

    'eta': 0.2,

    'rate_drop': 0.2,

    'min_child_weight': 6,

    'gamma': 4,

    'subsample': 0.8,

    'alpha': 0.1

}


# In[ ]:


num_round = 250
xgb_model = xgb.train(param, d_train, num_round, eval_list, early_stopping_rounds=8)


# In[ ]:


test_clean_df = test_df.drop(columns=['numericPlayType'])
d_test = xgb.DMatrix(test_clean_df, label=test_df['numericPlayType'], feature_names=list(test_clean_df))


# In[ ]:


actual = test_df['numericPlayType']
predictions = xgb_model.predict(d_test)
print(predictions[:5])


# In[ ]:


rounded_predictions = np.round(predictions)
accuracy = metrics.accuracy_score(actual, rounded_predictions)
print("Metrics:\nAccuracy: % 4f" % (accuracy))


# In[ ]:

Something is causing the program to do what it is doing.

You may find this hard to believe, but at university I manned the help desk in the computer center. When someone came over with a printout of a failing program and asked why the mainframe was doing this, my usual reply was: because you told it to.

I do know that it errors out at runtime; that is obvious. But how to alter the code so that this does not happen again is something I am not able to work out yet. Hence this post. I do not know why Python is dynamically typing row['offenseFormation'] as a float. The cause is obviously in the code leading up to the runtime error.

Call it what you will, but the error happens at runtime. So what can I change so that the program no longer raises this error?

Any help appreciated.

Respectfully,

ErnestTBass
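One likely culprit is visible in the full program above. The line meant to replace missing formations, df['offenseFormation'].map(lambda f: ('EMPTY' if pd.isna(False) else f)), calls pd.isna(False). That tests the constant False rather than the value f. pd.isna(False) is always False, so every value, NaN included, passes through unchanged and later reaches .strip(). Here is a minimal sketch of the difference on a made-up sample; the real plays.csv was not posted.

```python
import io

import pandas as pd

# Made-up sample; the real plays.csv was not posted in full.
csv_text = "playId,offenseFormation\n1,SHOTGUN\n2,\n3,I_FORM\n"
df = pd.read_csv(io.StringIO(csv_text))

# As posted: pd.isna(False) checks the constant False, never f, so it is
# always False and every value (NaN included) is returned unchanged.
unchanged = df['offenseFormation'].map(lambda f: 'EMPTY' if pd.isna(False) else f)

# Checking the value itself replaces the missing entry.
fixed = df['offenseFormation'].map(lambda f: 'EMPTY' if pd.isna(f) else f)

# The same result with the built-in pandas method for filling gaps.
filled = df['offenseFormation'].fillna('EMPTY')

print(unchanged.isna().sum(), fixed.tolist(), filled.tolist())
```

If this is the cause, changing False to f in that one line (or using fillna('EMPTY')) should mean formation() only ever receives strings. Plays without a formation would then take the 'EMPTY' branch instead of crashing.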

You bring up an interesting point in your last post: the one about tracing the program execution.

That would be my advice to anyone who brought me a program with an error in it that they could not solve.
However, there is a point of confusion here. That is the charm of Python.

As you said, trace what happens to row and what happens to row['offenseFormation']. What confuses me is how row and row['offenseFormation'] are related. As I said in my initial post, row must be an integer; it simply cannot be a float. But you said no, row['offenseFormation'] is the float.

This is a point of confusion for me, as it was in my previous post. Just how are row and row['offenseFormation'] related?

Again any help appreciated.

I am using Spyder to trace through this program, but row and row['offenseFormation'] never appear in the upper right-hand window. They just are not in that window.

Respectfully,

ErnestTBass

A couple of print statements might help: print(df['offenseFormation']) at line 162 and print(df) at line 196. What do you see?

In order to provide meaningful help, the content (or structure) of the file should be supplied as well.

However, it is safe to assume that an automatic conversion was made by pandas while importing the CSV file (df = pd.read_csv('plays.csv')).

You can verify datatypes with df.dtypes
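Here is a minimal sketch of that check on a made-up sample; the column names follow the thread and the values are invented. df.dtypes describes whole columns. A text column with gaps is typically reported as object (newer pandas releases may show a dedicated string dtype), even though its missing cells are float NaN. Counting missing values and listing the non-string rows pinpoints where the floats come from. A numeric column with a blank, like yardlineNumber here, is promoted to float64.

```python
import io

import pandas as pd

# Made-up sample; the real plays.csv was too large to post.
csv_text = (
    "playId,offenseFormation,yardlineNumber\n"
    "1,SHOTGUN,25\n"
    "2,,\n"
    "3,I_FORM,40\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Column-level types chosen by read_csv.
print(df.dtypes)

# How many cells are missing in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# Rows that would make formation() fail: any formation that is not a string.
mask = [not isinstance(v, str) for v in df['offenseFormation']]
not_text = df[mask]
print(not_text)
```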

I am adding the two files from

line 162 print(df['offenseFormation'])

and

line 196 print(df)

I am attaching the original Python file. I tried to 7-Zip it, but this forum would not accept it in that form, so I am sending the uncompressed Python file. I cannot attach the plays.csv file, since it is very large and I cannot 7-Zip it and attach it.

I am not sure how to get the plays.csv file to you. Your forum will not take a 7-zip file.

So what can I do?

Any help appreciated. Thanks in advance.

Respectfully,

ErnestTBass
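On sharing the data: a small sample file is usually enough, as long as it still reproduces the problem. Here is a minimal sketch, assuming the goal is a short CSV that keeps the rows with an empty offenseFormation. The stand-in DataFrame, the 20-row cutoff and the file name plays_sample.csv are all hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for the real plays.csv (made-up values); with the real file you
# would start from df = pd.read_csv('plays.csv') instead.
df = pd.DataFrame({
    'playId': range(1, 1001),
    'offenseFormation': ['SHOTGUN'] * 1000,
})
df.loc[[500, 750], 'offenseFormation'] = np.nan  # two plays with no formation

# Keep the first 20 rows plus every row with a missing formation,
# so the small file still triggers the same AttributeError.
missing = df[df['offenseFormation'].isna()]
sample = pd.concat([df.head(20), missing])
sample = sample[~sample.index.duplicated()]

# Hypothetical output name; written to a temporary folder for this sketch.
sample_path = os.path.join(tempfile.mkdtemp(), 'plays_sample.csv')
sample.to_csv(sample_path, index=False)

print(len(sample), os.path.getsize(sample_path), 'bytes')
```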

Here is an abbreviated version of plays.csv. It is the best that I can do.

Respectfully,

ErnestTBass


ErnestTBass

deanhystad

ErnestTBass

deanhystad

ErnestTBass

deanhystad

ErnestTBass

jefsummers

perfringo

ErnestTBass