What does this error mean?

***snippsat*** · (This post was last modified: Aug-26-2020, 05:37 PM by snippsat.)

Why not link to the site where you got the code from?

The code may have worked at one time, but now it doesn't.
That is how it can be with many of these data science blogs, thrown together at one time to test some stuff.
You should look at the method used, then write your own code to test it out.
I can fix it so this mess works, but there is really not much point in that; write your own code, looking at the method used.

#!/usr/bin/env python
# coding: utf-8

# In[ ]:


# !/usr/bin/python
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
#import xgboost as xgb

#from sklearn import metrics

df = pd.read_csv('plays.csv')

print(len(df))
print(df.head())


# In[ ]:


# drop st plays

df = df[~df['isSTPlay']]
print(len(df))


# In[ ]:


# drop kneels

df = df[~df['playDescription'].str.contains('kneels')]
print(len(df))


# In[ ]:


# drop overtime

df = df[~(df['quarter'] == 5)]
print(len(df))


# In[ ]:


# convert time/quarters
def translate_game_clock(row):
    raw_game_clock = row['GameClock']
    quarter = row['quarter']
    minutes, seconds_raw = raw_game_clock.partition(':')[::2]

    seconds = seconds_raw.partition(':')[0]

    total_seconds_left_in_quarter = int(seconds) + (int(minutes) * 60)

    if quarter == 3 or quarter == 1:
        return total_seconds_left_in_quarter + 900
    elif quarter == 4 or quarter == 2:
        return total_seconds_left_in_quarter


if 'GameClock' in list(df.columns):
    df["secondsLeftInHalf"] = df.apply(translate_game_clock, axis=1)


if 'quarter' in list(df.columns):
    df["half"] = df['quarter'].map(lambda q: 2 if q > 2 else 1)


# In[ ]:


def yards_to_endzone(row):
    if row['possessionTeam'] == row['yardlineSide']:

        return 100 - row['yardlineNumber']

    else:

        return row['yardlineNumber']


df['yardsToEndzone'] = df.apply(yards_to_endzone, axis=1)


# In[ ]:


def transform_off_personnel(row):

    rb_count = 0

    te_count = 0

    wr_count = 0

    ol_count = 0

    dl_count = 0

    db_count = 0

    if not pd.isna(row['personnel.offense']):
        # strip each piece so a space after a comma does not shift the p[2:4] slice
        personnel = [p.strip() for p in row['personnel.offense'].split(',')]

        for p in personnel:

            if p[2:4] == 'RB':

                rb_count = int(p[0])

            elif p[2:4] == 'TE':

                te_count = int(p[0])

            elif p[2:4] == 'WR':

                wr_count = int(p[0])

            elif p[2:4] == 'OL':

                ol_count = int(p[0])

            elif p[2:4] == 'DL':

                dl_count = int(p[0])

            elif p[2:4] == 'DB':

                db_count = int(p[0])

    return pd.Series([
        rb_count,
        te_count,
        wr_count,
        ol_count,
        dl_count,
        db_count,
    ])


# In[ ]:


df[[
    'rb_count',
    'te_count',
    'wr_count',
    'ol_count',
    'dl_count',
    'db_count',
    ]] = df.apply(transform_off_personnel, axis=1)

# test the value itself; pd.isna(False) is always False, so nothing was ever replaced
df['offenseFormation'] = df['offenseFormation'].map(lambda f: ('EMPTY' if pd.isna(f) else f))


def formation(row):
    try:
        form = row['offenseFormation'].strip()
    except AttributeError:
        form = row['offenseFormation']
    if form == 'SHOTGUN':
        return 0
    elif form == 'SINGLEBACK':

        return 1
    elif form == 'EMPTY':

        return 2
    elif form == 'I_FORM':

        return 3
    elif form == 'PISTOL':

        return 4
    elif form == 'JUMBO':

        return 5
    elif form == 'WILDCAT':

        return 6
    elif form == 'ACE':

        return 7
    else:

        return -1


df['numericFormation'] = df.apply(formation, axis=1)

print(df.yardlineNumber.unique())


# In[ ]:


def play_type(row):
    if row['PassResult'] == 'I' or row['PassResult'] == 'C' or row['PassResult'] == 'S':

        return 'Passing'

    else:

        return 'Rushing'


df['play_type'] = df.apply(play_type, axis=1)
df['numeric_PlayType'] = df['play_type'].map(lambda p: 1 if p == 'Passing' else 0)


# In[ ]:


df_final = df[['down', 'yardsToGo', 'rb_count', 'te_count', 'wr_count', 'ol_count', 'db_count', 'secondsLeftInHalf',
               'half', 'numericFormation', 'play_type']]


# In[ ]:


#print(df_final.describe(include='all'))


# In[ ]:


print(df.yardlineNumber.unique())


# In[ ]:


df['yardlineNumber'] = df['yardlineNumber'].fillna(50)


# In[ ]:


sns.catplot(x='play_type', kind='count', data=df_final, orient='h')

plt.show()


# In[ ]:


sns.catplot(x="down", kind="count", hue='play_type', data=df_final)

plt.show()


# In[ ]:


#sns.lmplot(x="yrdsToGo", y="numericPlayType", data=df_final, y_jitter=0.03, logistic=True, aspect=2)

#plt.show()


# In[ ]:


train_df, validation_df, test_df = np.split(df_final.sample(frac=1), [int(0.7 * len(df)), int(0.9 * len(df))])

print("Training size is %d, validation size is %d, test_size is %d" % (len(train_df), len(validation_df), len(test_df)))


# In[ ]:


#train_clean_df = train_df.drop(columns=['numericPlayType'])

#d_train = xgb.DMatrix(train_clean_df, label=train_df['numericPlayType'], feature_names=list(train_clean_df))


# In[ ]:


#val_clean_df = train_df.drop(columns=['numericPlayType'])

#d_val = xgb.DMatrix(val_clean_df, label=validation_df['numericPlayType'], feature_names=list(val_clean_df))

#eval_list = [(d_train, 'train'), (d_val, 'eval')]

#results = {}


# In[ ]:


param = {

    'objective': 'binary:logistic',

    'eval_metric': 'auc',

    'max_depth': 5,

    'eta': 0.2,

    'rate_drop': 0.2,

    'min_child_weight': 6,

    'gamma': 4,

    'subsample': 0.8,

    'alpha': 0.1

}


# In[ ]:


num_round = 250
#xgb_model = xgb.train(param, d_train, num_round, eval_list, early_stopping_rounds=8)


# In[ ]:


#test_clean_df = test_df.drop(columns=['numericPlayType'])
#d_test = xgb.DMatrix(test_clean_df, label=test_df['numericPlayType'], feature_names=list(test_clean_df))


# In[ ]:


#actual = test_df['numericPlayType']
#predictions = xgb_model.predict(d_test)
#print(predictions[:5])


# In[ ]:


#rounded_predictions = np.round(predictions)
#accuracy = metrics.accuracy_score(actual, rounded_predictions)
#print("Metrics:\nAccuracy: % 4f" % (accuracy))


# In[ ]:
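A quick way to follow the "look at the method, then write your own code" advice is to test one step on a tiny hand-made frame before touching plays.csv. The sketch below re-implements only the clock conversion from the script above; the function name `seconds_left_in_half` and the three sample rows are made up for illustration, and the "MM:SS:00" clock format is an assumption about the data.

```python
import pandas as pd


def seconds_left_in_half(game_clock: str, quarter: int) -> int:
    """Seconds remaining in the half, given a 'MM:SS' or 'MM:SS:00' clock string."""
    minutes, seconds, *_ = game_clock.split(":")
    left_in_quarter = int(minutes) * 60 + int(seconds)
    # Quarters 1 and 3 still have a full 15-minute quarter after them.
    return left_in_quarter + 900 if quarter in (1, 3) else left_in_quarter


# Hypothetical sample rows, not taken from plays.csv.
sample = pd.DataFrame({
    "GameClock": ["15:00:00", "07:30:00", "00:05:00"],
    "quarter": [1, 2, 3],
})
sample["secondsLeftInHalf"] = [
    seconds_left_in_half(clock, quarter)
    for clock, quarter in zip(sample["GameClock"], sample["quarter"])
]
print(sample)
```

Checking a handful of rows by hand like this makes it much easier to see which step breaks when the real file is loaded.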

ErnestTBass · Aug-26-2020, 07:31 PM

I made an error and included two screen captures of

Line 162 print(df['offenseFormation'])

and none of

Line 196 print(df)

I will correct that now. I will attach a screen capture that includes both images in one single capture.

I hope that fixes the problem. Now this screenshot contains both outputs.

Any help appreciated. Thanks in advance.

Respectfully,

ErnestTBass

jefsummers · Aug-26-2020, 08:19 PM

So offenseFormation is a column of objects (dtype=object). A generic object does not have a "split" method.
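A small sketch of what that can look like in practice. The three values are made up (not taken from plays.csv), and it assumes the trouble comes from a non-string value such as NaN sitting in a text column, which this thread does not confirm.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real column; dtype=object is forced so the
# behaviour is the same across pandas versions.
formations = pd.Series(["SHOTGUN", "I_FORM", np.nan], dtype=object,
                       name="offenseFormation")

print(formations.dtype)       # the dtype reported at the bottom of a printout
print(formations.map(type))   # the Python type of every single element

# A missing value is stored as a float, and a float has no split() method.
missing_has_split = hasattr(formations.iloc[2], "split")

# The .str accessor applies split() element by element and leaves
# missing values as missing instead of raising an error.
pieces = formations.str.split("_")
print(pieces)
```

Printing `map(type)` is a handy way to see exactly which rows hold something other than a string.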

ErnestTBass · Aug-27-2020, 05:37 PM

How did you know that offenseFormation is a column of objects?

Respectfully,

ErnestTBass

jefsummers · Aug-27-2020, 07:05 PM

dtype: Object

In the middle of your picture

ErnestTBass · Aug-27-2020, 07:18 PM

Sorry if my last post seemed dumb, as I said before I am just learning. I think I know what the problem is (in general), but not how to "fix" it.

Python infers what type of alpha/numerics it is reading when it reads in a dataset.

Somehow/someway it infers incorrectly the offenseFormation column when it reads it.

Now when it starts handling the data, the program is expecting one type of data and gets another.

This is not the first time that has happened, I am sure.

Now what the .split has to do with it, I am not sure. Please give a reference that explains what this coding phenomenon is all about.

That part I just do not get. I am using the medium site and other sites for articles because that is where my interest lies.

I have tried to find a book about this subject (classification) in sports betting and have yet to find one.

I really think, as was told to me, that the error occurs when the contents of plays.csv are used in calls or when the file is read. That is where the error is.

I just do not see how you can tell by inspection that offenseFormation is what it is.

It just seems to be not visible to me when I examine the code.

Respectfully,

ErnestTBass

Hence, I am at the mercy of what I can find online.

jefsummers · Aug-28-2020, 11:06 AM

split() is used to split a string on a character. Typically used to take a sentence and convert to a list of words. For this and other string methods see https://docs.python.org/3/library/stdtyp...e-type-str
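For example, a minimal sketch of split() on plain strings. The personnel string is a made-up value in the "count position" style that the script seems to expect; the real file may be formatted differently.

```python
sentence = "Typically used to take a sentence"
words = sentence.split()   # no argument: split on any whitespace
print(words)

# Hypothetical personnel string; split on commas, then on the space
# between the count and the position name.
personnel = "1 RB, 1 TE, 3 WR"
counts = {}
for part in personnel.split(","):
    number, position = part.strip().split()
    counts[position] = int(number)
print(counts)
```

Note the strip() call: splitting on "," alone leaves a leading space on every piece after the first.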

From a high altitude - Everything in Python is an object. Functions, strings, floats, classes, everything. Many libraries, like Pandas, will try to be more specific (This column has floats, or this column has strings) but if it detects multiple types it will default to Object.
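Here is a rough sketch of that type inference, using a tiny in-memory CSV instead of plays.csv. The column names are borrowed from the script above, but the rows are invented. The label pandas prints for the text column depends on the version: releases from the time of this thread show object, while newer ones may show a dedicated string dtype.

```python
import io

import pandas as pd

# Invented rows; the last one has an empty offenseFormation field.
csv_text = (
    "quarter,yardsToGo,offenseFormation\n"
    "1,10,SHOTGUN\n"
    "2,3.5,I_FORM\n"
    "4,7,\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# One dtype per column, inferred from the values that were read.
print(df.dtypes)

# The empty field was read as a missing value, so the text column
# holds strings plus one float NaN.
print(df["offenseFormation"].map(type).unique())
```

So nothing is inferred "incorrectly" here; a text column with a hole in it simply cannot be described by one narrow type.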

You can tell it that the column is strings by using the str(my_object) function. So

my_string = str(my_object)

will allow you to do all the string sorts of things on my_string that you could not do on my_object.
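A short sketch of the difference between a single value and a whole column (the sample values are made up). For a pandas column, calling str() on the Series itself just returns its printout as one big string, so astype(str) is the usual way to convert element by element.

```python
import pandas as pd

# A single object: str() gives a string you can call string methods on.
my_object = 42
my_string = str(my_object)
padded = my_string.zfill(5)

# A whole column with mixed contents: convert every element to text,
# then use the .str accessor for string methods.
mixed = pd.Series([1, "SHOTGUN", 2.5], dtype=object)
as_text = mixed.astype(str)
upper = as_text.str.upper()
print(upper.tolist())
```

Be careful with missing values: depending on the pandas version, astype(str) may turn NaN into the literal text "nan", so it is often better to fill or drop them first.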

A way to think of it is that Object is like Animal. Animals have certain characteristics. Bird is a type of Animal, taking all the Animal characteristics and adding Bird characteristics. Robin is a type of Bird, inheriting all of Animal and Bird and adding the specific Robin characteristics. It's not called inheritance for nothin'
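The same analogy as a toy sketch; the class and method names are only for illustration.

```python
class Animal:
    def breathe(self):
        return "breathing"


class Bird(Animal):          # a Bird is an Animal with extras
    def fly(self):
        return "flying"


class Robin(Bird):           # a Robin is a Bird with extras
    def sing(self):
        return "singing"


robin = Robin()
print(robin.breathe(), robin.fly(), robin.sing())

# Every class, including the built-in str, ultimately inherits from object.
print(isinstance(robin, Animal), issubclass(str, object))

# But a plain object has none of the str extras such as split().
print(hasattr(object(), "split"), hasattr("text", "split"))
```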

I do not know why your code uses split() in that location or whether that is necessary. I'm just trying to help you debug the code

ErnestTBass · Aug-28-2020, 02:02 PM

I appreciate your help. Please do not think that I do not. It is odd that for most people the problem in machine learning is statistics. I have a PhD in Operations Research and I have taken about every graduate-level stats course a major university can offer. Statistics is not a problem. Learning Python is.

Again, can you recommend a book that gives good machine learning examples for classification and ranking? I have looked on Amazon and trust me, there are not any. There are a lot of books on Machine Learning/Data Science, but damn few on practical applications. I want to see a complete real-world problem worked from soup to nuts.

Those books are hard to find and that is why I use medium.

Any recommendations?

Thanks in advance.

Respectfully,

ErnestTBass

jefsummers · Aug-28-2020, 07:58 PM

Got it. First, how I got where I am: I taught myself Python as my 20th-plus language, used Head First Python (book) to get started, then videos (anything by Ned Batchelder) and hanging around here, posting when I knew the answer or watching those more experienced when I did not.

When I got comfortable with Python as a language, I then took a series of Coursera courses that got me comfortable with classification models (and other AI topics). I did the IBM Data analysis courses, a set of around 7 courses that included projects of various difficulty and from the ground up. For one of the capstones I wrote a program that looked at restaurants in Toronto and organized them into clusters, then found the distance from the centers to the nearest Ethiopian restaurants, finally recommending 3 sites for new Ethiopian restaurants in restaurant districts but without competition.

OK, so recommendations: Python for Data Analysis: Master the Basics really emphasizes the basics of working with NumPy, pandas and matplotlib. It does not get into classification or any analysis beyond std dev. Python Machine Learning: The Ultimate Beginners Guide... by Ryan Turner has chapters on K-nearest neighbor and K-Means Clustering, and then gets into neural nets and the like. It walks you through examples. I also got Python Data Science by Steve Blair but did not like it. I am currently reading Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron, which is packed with info, but it sounds like that is not the direction you are wanting.

All books listed above I purchased.

I hope some of that is helpful.

Oh, and stay out of Mayberry. Barney is looking for you.
