Python Forum
Confused by the different ways of extracting data in DataFrame
#1
Suppose we have the following DataFrame recording the results of 5 students.
import pandas as pd

dic0 = {'Java':[87,65,26,89,67],
        'C++':[63,98,66,89,80],
        'Python':[78,25,76,43,69]}
d = pd.DataFrame(dic0)
d.index = ['Tom', 'Bob', 'Tim', 'Wien', 'Lily']
d[1:] should give the rows from index 1 to the end. The number inside square brackets is referring to the rows.
Output:
      Java  C++  Python
Bob     65   98      25
Tim     26   66      76
Wien    89   89      43
Lily    67   80      69
But if I write d['Java'], the system treats the thing inside the square brackets as a column index. The output is:
Output:
Tom     87
Bob     65
Tim     26
Wien    89
Lily    67
Name: Java, dtype: int64
I am confused here. In both examples I put just one 'item' in the square brackets, but the system treats the first one as a row index and the second one as a column index. What is the rule behind this? How can I know how the system will interpret my input?

I then tried to get Tom's Java result. I thought I should give the row index first and write d['Tom']['Java'], but it turns out that I should write d['Java']['Tom']. In other words, I should extract the 'Java' column first, then look for 'Tom'.
Output:
87
My next job is to get Tim's results in all three subjects, and I tried to use d.loc[]. This time I need to give the row index first and then the column index. Hence, I should write d.loc['Tim',:].
Output:
Java      26
C++       66
Python    76
Name: Tim, dtype: int64
But now I am confused. When we extract elements from a DataFrame, sometimes we give the row index first and sometimes it's the opposite. Is there a general rule behind this? Or do I just have to memorize the different requirements of each way of extracting elements?
#2
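[1] is a single index. [1:] is a slice, a range of positions starting at 1 and going to the end. Slices are part of Python, not something specific to pandas. With plain square brackets, pandas treats a slice as a selection of rows and a single label as a column key.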
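Here is a minimal sketch of how the two bracket forms from the question behave. It rebuilds the same DataFrame so it runs on its own; the names rows and col are just mine for illustration.

```python
import pandas as pd

# Same data as in the question, with the index set in the constructor.
d = pd.DataFrame(
    {'Java': [87, 65, 26, 89, 67],
     'C++': [63, 98, 66, 89, 80],
     'Python': [78, 25, 76, 43, 69]},
    index=['Tom', 'Bob', 'Tim', 'Wien', 'Lily'],
)

rows = d[1:]     # a slice inside [] selects rows by position
col = d['Java']  # a single label inside [] selects a column

print(type(rows).__name__)  # DataFrame
print(type(col).__name__)   # Series
print(list(rows.index))     # ['Bob', 'Tim', 'Wien', 'Lily']
```

So the type of the thing in the brackets (slice or label) is what decides whether you get rows or a column.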

Indexing is easier to understand if you do things in stages.

When I call d['Tom'] I get an error
Error:
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'Tom'
This tells me that "Tom" is not a column index in d. I can verify.
print(d.columns)
Output:
Index(['Java', 'C++', 'Python'], dtype='object')
However, if I call d["Java"] I should get a column, since "Java" appears in the list of columns.
Tom     87
Bob     65
Tim     26
Wien    89
Lily    67
Name: Java, dtype: int64
d["Java"] returns a Series, a single column of d. I can index the Series just like I indexed the dataframe. I can print a list of its index keys like this.
print(d["Java"].index)
Output:
Index(['Tom', 'Bob', 'Tim', 'Wien', 'Lily'], dtype='object')
"Tom" appears in the list of index keys, so I can call d["Java"]["Tom"]

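The staged lookup can be sketched like this, assuming the same DataFrame as in the question (rebuilt here so the snippet is self-contained; java and score are just illustrative names). Checking membership first shows why one order works and the other raises KeyError.

```python
import pandas as pd

d = pd.DataFrame(
    {'Java': [87, 65, 26, 89, 67],
     'C++': [63, 98, 66, 89, 80],
     'Python': [78, 25, 76, 43, 69]},
    index=['Tom', 'Bob', 'Tim', 'Wien', 'Lily'],
)

# 'Tom' is a row label, not a column label, so d['Tom'] fails.
print('Tom' in d.columns)  # False
print('Tom' in d.index)    # True

java = d['Java']      # stage 1: pick the column, which gives a Series
score = java['Tom']   # stage 2: look up the row label in that Series
print(score)          # 87
```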
If you really wanted to call d["Tom"]["Java"] you could, with a slight change. You use loc or iloc to select a row from the dataframe.
print(d.loc["Tom"])
print(d.loc["Tom"]["Java"])
Output:
Java      87
C++       63
Python    78
Name: Tom, dtype: int64
87
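As a side note, loc and iloc also accept the row and the column together in one call, always in row-then-column order. A minimal sketch, assuming the same DataFrame as in the question (the score_* names are mine):

```python
import pandas as pd

d = pd.DataFrame(
    {'Java': [87, 65, 26, 89, 67],
     'C++': [63, 98, 66, 89, 80],
     'Python': [78, 25, 76, 43, 69]},
    index=['Tom', 'Bob', 'Tim', 'Wien', 'Lily'],
)

score_loc = d.loc['Tom', 'Java']  # row label first, then column label
score_iloc = d.iloc[0, 0]         # row position first, then column position
score_at = d.at['Tom', 'Java']    # at: label-based access to a single value

print(score_loc, score_iloc, score_at)  # 87 87 87
```

This does the lookup in one step instead of building an intermediate Series, and it keeps the order consistent: with loc, iloc and at it is always row first, column second.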
So dataframes are indexed by column unless you use loc/iloc, which let you index by row (a slice such as d[1:] being the exception). This makes sense, as most operations in pandas are performed on columns of data rather than rows. I almost never use loc/iloc to get the data for a row: first, because when I perform an operation in pandas I usually want to apply it to all rows, and second, because pandas is really fast at working with columns and relatively glacial at operating on individual rows.
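To illustrate what "apply the operation to all rows" looks like, here is a small sketch that totals each student's marks in one call and then does the same thing row by row for comparison. It assumes the DataFrame from the question; total and loop_total are names I made up, and I have not timed anything here.

```python
import pandas as pd

d = pd.DataFrame(
    {'Java': [87, 65, 26, 89, 67],
     'C++': [63, 98, 66, 89, 80],
     'Python': [78, 25, 76, 43, 69]},
    index=['Tom', 'Bob', 'Tim', 'Wien', 'Lily'],
)

# One call that works on whole columns at once: sum across the columns for every row.
total = d.sum(axis=1)

# The same result built one row at a time with loc, shown only for comparison.
loop_total = {name: int(d.loc[name].sum()) for name in d.index}

print(total['Tom'])        # 228
print(loop_total['Wien'])  # 221
```

On five rows the difference does not matter, but the first form is the one that scales.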