Python Forum
pandas read_csv can't handle missing data
I am trying to download data from football-data.co.uk:

import os

import pandas as pd

GAMES = ['E0', 'E1', 'E2', 'E3']


def download_statistics():
    for year in range(2003, 2020):
        year_format = str(year)[-2:] + str(year+1)[-2:]
        for game in GAMES:
            previous_data = None
            file_name = f'{game}.csv'
            if os.path.isfile(file_name):
                previous_data = pd.read_csv(file_name)

            url_data = pd.read_csv(f'http://football-data.co.uk/mmz4281/{year_format}/{game}.csv')

            if previous_data is not None:
                combined_data = pd.concat([previous_data, url_data])
                combined_data.to_csv(file_name)
            else:
                url_data.to_csv(file_name)


if __name__ == '__main__':
    download_statistics()
I am aware that some cells are missing data, but somehow pandas can't handle them and returns an error.

Error:
Traceback (most recent call last):
  File "F:\Programowanie\GitHub Repositories\football_predict\venv\lib\site-packages\pandas\io\parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "F:\Programowanie\GitHub Repositories\football_predict\venv\lib\site-packages\pandas\io\parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "F:\Programowanie\GitHub Repositories\football_predict\venv\lib\site-packages\pandas\io\parsers.py", line 2037, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 57 fields in line 305, saw 72
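A note on the message itself: "Expected 57 fields in line 305, saw 72" means the parser had settled on 57 columns and then met a line with 72 fields. That points to rows that are longer than the header, which is a different problem from empty cells. One possible workaround, sketched here on made-up sample text rather than the real files, is to parse the rows with the standard `csv` module and pad everything to the widest row. The function name `read_ragged_csv` and the `extra_N` column names are hypothetical, chosen only for illustration.

```python
import csv
import io

import pandas as pd

# Made-up sample: the header has 3 fields, but the last row has 5.
RAGGED_CSV = "Div,Date,HomeTeam\nE0,16/08/03,Arsenal\nE0,16/08/03,Leeds,1,2\n"


def read_ragged_csv(text):
    """Parse CSV text whose rows may be longer than the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    width = max(len(row) for row in rows)
    # Invent placeholder names for the columns the header does not cover.
    names = header + [f"extra_{i}" for i in range(width - len(header))]
    # Pad short rows with None so every row has the same length.
    padded = [row + [None] * (width - len(row)) for row in body]
    return pd.DataFrame(padded, columns=names)


frame = read_ragged_csv(RAGGED_CSV)
```

Every value comes back as a string, so numeric columns would still need converting afterwards, for example with `pd.to_numeric`.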
The code below doesn't raise an error, but it returns a DataFrame with shape (380, 1), and when I try to split the data on commas I get another error:

import os
import io

import pandas as pd
import requests

GAMES = ['E0', 'E1', 'E2', 'E3']


def download_statistics():
    for year in range(2003, 2020):
        year_format = str(year)[-2:] + str(year+1)[-2:]
        for game in GAMES:
            previous_data = None
            file_name = f'{game}.csv'
            if os.path.isfile(file_name):
                previous_data = pd.read_csv(file_name)

            response = requests.get(f'http://football-data.co.uk/mmz4281/{year_format}/{game}.csv')
            url_data = pd.read_csv(io.StringIO(response.text), sep='delimiter')
            url_data = url_data[0].str.split(',', expand=True)

            if previous_data is not None:
                combined_data = pd.concat([previous_data, url_data])
                combined_data.to_csv(file_name)
            else:
                url_data.to_csv(file_name)


if __name__ == '__main__':
    download_statistics()
Error:
F:/Programowanie/GitHub Repositories/football_predict/data.py:23: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  url_data = pd.read_csv(io.StringIO(response.text), sep='delimiter')
Traceback (most recent call last):
  File "F:\Programowanie\GitHub Repositories\football_predict\venv\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Program Files\PyCharm 2019.3.4\plugins\python\helpers\pydev\pydevd.py", line 1434, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:\Program Files\PyCharm 2019.3.4\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "F:/Programowanie/GitHub Repositories/football_predict/data.py", line 34, in <module>
    download_statistics()
  File "F:/Programowanie/GitHub Repositories/football_predict/data.py", line 24, in download_statistics
    url_data = url_data[0].str.split(',', expand=True)
  File "F:\Programowanie\GitHub Repositories\football_predict\venv\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "F:\Programowanie\GitHub Repositories\football_predict\venv\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
Am I missing something here?
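A likely reading of the `KeyError: 0`: with `sep='delimiter'` every line is kept as a single field, and because `read_csv` treats the first line as the header by default, the only column is labelled with the whole header line rather than `0`. Passing `header=None` gives that column the integer label `0`, so `url_data[0]` resolves. The sketch below uses made-up sample text in place of the downloaded file, and promoting the first row to column names at the end is an assumption about what is wanted.

```python
import io

import pandas as pd

# Made-up sample: the header has 3 fields, but the last row has 5.
SAMPLE = "Div,Date,HomeTeam\nE0,16/08/03,Arsenal\nE0,16/08/03,Leeds,1,2\n"

# 'delimiter' never occurs in the text, so each line stays one field.
# header=None labels that single column 0 instead of naming it after
# the first line; engine='python' avoids the ParserWarning.
raw = pd.read_csv(io.StringIO(SAMPLE), sep="delimiter", header=None, engine="python")

# Split every line on commas; shorter rows are padded with missing values.
split = raw[0].str.split(",", expand=True)

# Optionally promote the first row to column names. Positions the header
# does not cover end up with missing-value labels and need naming by hand.
data = split.iloc[1:].reset_index(drop=True)
data.columns = split.iloc[0]
```

A plain string split like this ignores CSV quoting, so it is only safe if no field contains a comma inside quotes.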