Aug-12-2019, 07:40 PM
Hi again
You mention variables, but isn't df_filtered a dataframe? Which variable do you mean, the dataframe itself?
What I did so far was only:
Import some of the libraries needed
[python]
import pandas as pd
import numpy as np
import re
from IPython.display import display
[/python]
Import the file
[python]
xlsx = pd.ExcelFile("excelfile.xlsx")
[/python]
Read each of the tabs in the file
[python]
df1 = pd.read_excel(xlsx, "Tab1")
df2 = pd.read_excel(xlsx, "Tab2")
df3 = pd.read_excel(xlsx, "Tab3")
[/python]
Concatenate the three tabs from the excel file and create a dataframe with the data
[python]
dataframe = [df1, df2, df3]
df = pd.concat(dataframe, ignore_index=True)
df.head()
[/python]
Then I'm filtering the data contained in the dataframe
[python]
df_filtered = df[
    (df['Destination'].str.contains("website.com", regex=True) == True)
    & (df['Source'].str.contains("website.com", regex=True) == True)
    & (df['Type'] == "AHREF")
]
df_filtered.head(2)
[/python]
Then I'm trying to create the categories in the filtered dataframe
[python]
def categories(Source):
    if '/string1' in Source:
        return 'Category 1'
    elif '/string2' in Source:
        return 'Category 2'
    else:
        return 'other'

df_filtered.loc[:, 'Category'] = df_filtered.Source.apply(categories)
[/python]
Which course could give me some understanding of all those rules? I'm reading "Python for Data Analysis" but it is hard to remember all of this. I guess I need to keep practising.
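For anyone who wants to reproduce this without the Excel file, the filtering and category steps above can be sketched with a small made-up dataframe. The sample rows, the '/string2' marker, regex=False and the .copy() call are assumptions for illustration and are not taken from the real data; .copy() is one common way to avoid the SettingWithCopyWarning that some pandas versions raise when a column is added to a filtered dataframe.

```python
import pandas as pd

# Made-up rows standing in for the concatenated Excel tabs (assumption).
df = pd.DataFrame(
    {
        "Source": [
            "https://website.com/string1/page",
            "https://website.com/string2/page",
            "https://website.com/about",
            "https://elsewhere.org/string1/page",
        ],
        "Destination": [
            "https://website.com/a",
            "https://website.com/b",
            "https://website.com/c",
            "https://website.com/d",
        ],
        "Type": ["AHREF", "AHREF", "AHREF", "AHREF"],
    }
)

# Boolean mask: regex=False treats "website.com" as literal text,
# so the dot is not read as "any character".
mask = (
    df["Destination"].str.contains("website.com", regex=False)
    & df["Source"].str.contains("website.com", regex=False)
    & (df["Type"] == "AHREF")
)

# .copy() makes df_filtered an independent dataframe, so adding a
# column to it does not touch (or warn about) the original df.
df_filtered = df[mask].copy()


def categories(source):
    """Map a Source URL to a category label (hypothetical markers)."""
    if "/string1" in source:
        return "Category 1"
    elif "/string2" in source:
        return "Category 2"
    return "other"


df_filtered["Category"] = df_filtered["Source"].apply(categories)
print(df_filtered[["Source", "Category"]])
```

Here df and df_filtered are both variables; each one is just a name that refers to a dataframe object.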
Many thanks