Python Forum

Full Version: Remove some columns
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hi there,

I want to remove some columns keep only Age Sex BMI Region Charges and then display the new table.
Please help me because I cannot display the new table as I expected.

Age Sex BMI Smoker Region Children Charges
21 male 25.75 no northeast 2 3279.87
37 female 25.74 yes southeast 3 21454.49
18 male 30.03 no southeast 1 1720.35
37 male 30.68 no northeast 3 6801.44
58 male 32.01 no southeast 1 11946.63
46 male 26.62 no southeast 1 7742.11
25 male 31.19 no northeast 4 21736.33
Do you have any code?
(Dec-13-2023, 12:48 AM)menator01 Wrote: [ -> ]Do you have any code?

=======================================================

import pandas as pd

# Specified list of names
specified_names = ['Age','Sex','BMI','Region','Charges']

# Read the CSV file into a pandas DataFrame
df = pd.read_csv('medical_insurance.csv')

# Get the column names from the first row of the DataFrame
columns = df.columns.tolist()

# Create a list of columns to keep (columns whose first row entries match the specified names)
columns_to_keep = [col for col in columns if df[col][0] in specified_names]

# Create a new DataFrame with only the columns to keep
filtered_df = df[columns_to_keep]

# Save the filtered DataFrame to a new CSV file
filtered_df.to_csv('filtered_medical_insurance', index=False)

#FL: Why these 2 lines don't work
#new_df = pd.read_csv('filtered_medical_insurance.csv')
#new_df.head()

filtered_df.head()
Can use the usecols

import pandas as pd

df = pd.read_csv('test.csv', usecols=[0,1,2,4,6])

print(df)
Output
Output:
Age Sex BMI Region Charges 0 21 male 25.75 northeast 3279.87 1 37 female 25.74 southeast 21454.49 2 18 male 30.03 southeast 1720.35 3 37 male 30.68 northeast 6801.44 4 58 male 32.01 southeast 11946.63 5 46 male 26.62 southeast 7742.11 6 25 male 31.19 northeast 21736.33

Using tabulate for display
# Do the imports
import pandas as pd
from tabulate import tabulate 

# Read the csv file with selected columns
df = pd.read_csv('test.csv', usecols=[0,1,2,4,6])

# Create new csv file
df.to_csv('newcsv.csv', index=False)

# Read new csv file
new_df = pd.read_csv('newcsv.csv')

# Using tabulate to format display output
new_df = tabulate(new_df, headers=list(new_df), showindex=False, tablefmt='pretty')

# Print new_df
print(new_df)
Output
Output:
+-----+--------+-------+-----------+----------+ | Age | Sex | BMI | Region | Charges | +-----+--------+-------+-----------+----------+ | 21 | male | 25.75 | northeast | 3279.87 | | 37 | female | 25.74 | southeast | 21454.49 | | 18 | male | 30.03 | southeast | 1720.35 | | 37 | male | 30.68 | northeast | 6801.44 | | 58 | male | 32.01 | southeast | 11946.63 | | 46 | male | 26.62 | southeast | 7742.11 | | 25 | male | 31.19 | northeast | 21736.33 | +-----+--------+-------+-----------+----------+
Thank you Menator01