Python Forum

Hi there,

I want to remove some columns keep only Age Sex BMI Region Charges and then display the new table.
Please help me because I cannot display the new table as I expected.

Age Sex BMI Smoker Region Children Charges
21 male 25.75 no northeast 2 3279.87
37 female 25.74 yes southeast 3 21454.49
18 male 30.03 no southeast 1 1720.35
37 male 30.68 no northeast 3 6801.44
58 male 32.01 no southeast 1 11946.63
46 male 26.62 no southeast 1 7742.11
25 male 31.19 no northeast 4 21736.33

Do you have any code?

(Dec-13-2023, 12:48 AM)menator01 Wrote: [ -> ]Do you have any code?

=======================================================

import pandas as pd

# Specified list of names
specified_names = ['Age','Sex','BMI','Region','Charges']

# Read the CSV file into a pandas DataFrame
df = pd.read_csv('medical_insurance.csv')

# Get the column names from the first row of the DataFrame
columns = df.columns.tolist()

# Create a list of columns to keep (columns whose first row entries match the specified names)
columns_to_keep = [col for col in columns if df[col][0] in specified_names]

# Create a new DataFrame with only the columns to keep
filtered_df = df[columns_to_keep]

# Save the filtered DataFrame to a new CSV file
filtered_df.to_csv('filtered_medical_insurance', index=False)

#FL: Why these 2 lines don't work
#new_df = pd.read_csv('filtered_medical_insurance.csv')
#new_df.head()

filtered_df.head()

Can use the usecols

import pandas as pd

df = pd.read_csv('test.csv', usecols=[0,1,2,4,6])

print(df)

Output

Output:  Age      Sex    BMI      Region   Charges
0   21     male  25.75   northeast   3279.87
1   37   female  25.74   southeast  21454.49
2   18     male  30.03   southeast   1720.35
3   37     male  30.68   northeast   6801.44
4   58     male  32.01   southeast  11946.63
5   46     male  26.62   southeast   7742.11
6   25     male  31.19   northeast  21736.33

Using tabulate for display

# Do the imports
import pandas as pd
from tabulate import tabulate 

# Read the csv file with selected columns
df = pd.read_csv('test.csv', usecols=[0,1,2,4,6])

# Create new csv file
df.to_csv('newcsv.csv', index=False)

# Read new csv file
new_df = pd.read_csv('newcsv.csv')

# Using tabulate to format display output
new_df = tabulate(new_df, headers=list(new_df), showindex=False, tablefmt='pretty')

# Print new_df
print(new_df)

Output

Output:+-----+--------+-------+-----------+----------+
| Age |  Sex   |  BMI  |  Region   | Charges  |
+-----+--------+-------+-----------+----------+
| 21  |  male  | 25.75 | northeast | 3279.87  |
| 37  | female | 25.74 | southeast | 21454.49 |
| 18  |  male  | 30.03 | southeast | 1720.35  |
| 37  |  male  | 30.68 | northeast | 6801.44  |
| 58  |  male  | 32.01 | southeast | 11946.63 |
| 46  |  male  | 26.62 | southeast | 7742.11  |
| 25  |  male  | 31.19 | northeast | 21736.33 |
+-----+--------+-------+-----------+----------+

Thank you Menator01

James_S

menator01

James_S

menator01

James_S