Jan-08-2021, 04:29 PM
Hi all,
I am new to Python.
I am reading a data file in which the timestamp values are stored as strings.
I want to get the distinct years from this dataframe and keep them in an array.
My attempt below doesn't work.
Could you help me with how to do this?
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('pyspark-by-examples').getOrCreate()
from pyspark.sql.types import StructType,StructField, StringType, IntegerType,ArrayType
from pyspark.sql.functions import split, explode
import pyspark.sql.types
import calendar
import datetime
import pandas as pd
from pyspark.sql import functions as F
from pyspark.sql import types as T
import datetime as dt

arrayData = spark.read.format("delta").load("/mnt/datalake/....something")

#arraySchema = StructType([ \
#    StructField("repair_year",StringType(),True), \
#])

arrayData['repair_year']= arrayData.select('repair_date').withColumn("repair_date", F.col("repair_date").cast(T.TimestampType()))

#df = arraySchema
#df.printSchema()
#df.show()
arraySchema.show()