Oct-11-2022, 12:25 PM
Hello,
I have some basic questions about using Pandas:
-- Structured and Unstructured Data: my understanding is that Pandas deals only with tabular 2D data (DataFrames) or 1D data (Series) and can only convert data that is structured and tabular (CSV tables, Excel files, etc.) into one of these two data structures. I guess a JSON file is an example of semi-structured data and can be converted into a DataFrame too, correct? Pandas cannot handle unstructured data at all...
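To make the JSON part of the question concrete, here is a minimal sketch of what I mean by converting semi-structured data. The records are made up for illustration, and I am assuming pd.json_normalize is the right tool for nested input:

```python
import pandas as pd

# Hypothetical nested records, shaped like what json.load() might return
records = [
    {"id": 1, "user": {"name": "Ann", "city": "Oslo"}},
    {"id": 2, "user": {"name": "Bob", "city": "Rome"}},
]

# json_normalize flattens the nested "user" dicts into dotted column names
df = pd.json_normalize(records)

print(df.columns.tolist())
print(df)
```

The nested "user" object ends up as separate flat columns, so the result is an ordinary 2D table.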
-- Pandas and Numpy: As I understand it, Pandas does not have the mathematical functionality of the Numpy package, in the sense that Numpy is more "mathematical". However, Numpy requires all of its elements to be numbers, while Pandas DataFrames can be more heterogeneous (except for requiring the same data type within a column).
If both pandas and numpy are imported, is it possible to apply pandas methods directly to numpy arrays and numpy methods to pandas dataframes? Or do we need to first convert pandas dataframes into numpy arrays, perform mathematical calculations, and then reconvert to pandas dataframes?
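To make the question concrete, here is a minimal sketch of the two routes I have in mind, using a small made-up DataFrame. The names df, roots, arr and back are only for illustration:

```python
import numpy as np
import pandas as pd

# Small made-up numeric DataFrame
df = pd.DataFrame({"a": [1.0, 4.0, 9.0], "b": [16.0, 25.0, 36.0]})

# Route 1: call a NumPy function directly on the DataFrame.
# The result keeps the DataFrame's index and column labels.
roots = np.sqrt(df)

# Route 2: explicit round trip through a NumPy array
arr = df.to_numpy()                      # DataFrame -> ndarray
back = pd.DataFrame(np.sqrt(arr),        # ndarray -> DataFrame
                    columns=df.columns, index=df.index)

print(type(roots))
print(roots.equals(back))
```

In this small case both routes appear to give the same DataFrame, which is what prompted the question of whether the round trip is ever required.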
-- Categorical Data: In Pandas, must all categorical data always be converted into a numerical form, even if that numerical form is not a real number but just a code, using approaches like one-hot encoding? Raw categorical data can only be used in visualizations (box plots, where the x-axis shows the categorical labels), in which case we can keep the categorical data as string data types... Is that correct?
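For reference, a minimal sketch of the two options I have come across: the "category" dtype in Pandas and one-hot encoding via pd.get_dummies. The column name and values are made up for illustration:

```python
import pandas as pd

# Made-up categorical column stored as plain strings
df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

# Option 1: the pandas "category" dtype. The labels stay as strings,
# and pandas keeps an integer code per label behind the scenes.
df["color"] = df["color"].astype("category")
codes = df["color"].cat.codes

# Option 2: one-hot encoding, one indicator column per label
dummies = pd.get_dummies(df["color"])

print(df["color"].cat.categories.tolist())
print(codes.tolist())
print(dummies)
```

With the category dtype the column still displays its string labels, so I am unsure when the explicit numeric conversion is actually needed.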
Thank you!
bytecrunch