Hi there,
I want to remove some columns keep only Age Sex BMI Region Charges and then display the new table.
Please help me because I cannot display the new table as I expected.
Age Sex BMI Smoker Region Children Charges
21 male 25.75 no northeast 2 3279.87
37 female 25.74 yes southeast 3 21454.49
18 male 30.03 no southeast 1 1720.35
37 male 30.68 no northeast 3 6801.44
58 male 32.01 no southeast 1 11946.63
46 male 26.62 no southeast 1 7742.11
25 male 31.19 no northeast 4 21736.33
(Dec-13-2023, 12:48 AM)menator01 Wrote: [ -> ]Do you have any code?
=======================================================
import pandas as pd
# Specified list of names
specified_names = ['Age','Sex','BMI','Region','Charges']
# Read the CSV file into a pandas DataFrame
df = pd.read_csv('medical_insurance.csv')
# Get the column names from the first row of the DataFrame
columns = df.columns.tolist()
# Create a list of columns to keep (columns whose first row entries match the specified names)
columns_to_keep = [col for col in columns if df[col][0] in specified_names]
# Create a new DataFrame with only the columns to keep
filtered_df = df[columns_to_keep]
# Save the filtered DataFrame to a new CSV file
filtered_df.to_csv('filtered_medical_insurance', index=False)
#FL: Why these 2 lines don't work
#new_df = pd.read_csv('filtered_medical_insurance.csv')
#new_df.head()
filtered_df.head()
Can use the usecols
import pandas as pd
df = pd.read_csv('test.csv', usecols=[0,1,2,4,6])
print(df)
Output
Output:
Age Sex BMI Region Charges
0 21 male 25.75 northeast 3279.87
1 37 female 25.74 southeast 21454.49
2 18 male 30.03 southeast 1720.35
3 37 male 30.68 northeast 6801.44
4 58 male 32.01 southeast 11946.63
5 46 male 26.62 southeast 7742.11
6 25 male 31.19 northeast 21736.33
Using tabulate for display
# Do the imports
import pandas as pd
from tabulate import tabulate
# Read the csv file with selected columns
df = pd.read_csv('test.csv', usecols=[0,1,2,4,6])
# Create new csv file
df.to_csv('newcsv.csv', index=False)
# Read new csv file
new_df = pd.read_csv('newcsv.csv')
# Using tabulate to format display output
new_df = tabulate(new_df, headers=list(new_df), showindex=False, tablefmt='pretty')
# Print new_df
print(new_df)
Output
Output:
+-----+--------+-------+-----------+----------+
| Age | Sex | BMI | Region | Charges |
+-----+--------+-------+-----------+----------+
| 21 | male | 25.75 | northeast | 3279.87 |
| 37 | female | 25.74 | southeast | 21454.49 |
| 18 | male | 30.03 | southeast | 1720.35 |
| 37 | male | 30.68 | northeast | 6801.44 |
| 58 | male | 32.01 | southeast | 11946.63 |
| 46 | male | 26.62 | southeast | 7742.11 |
| 25 | male | 31.19 | northeast | 21736.33 |
+-----+--------+-------+-----------+----------+