Python Forum

Full Version: get year information from a timestamp data frame
Hi all,
I am new to Python.
I am reading a data file that contains timestamp values stored as strings.
I want to get the distinct years from this dataframe and keep them in an array.
My attempt below doesn't work.
Could you help me with how to do it?

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('pyspark-by-examples').getOrCreate()
from pyspark.sql.types import StructType,StructField, StringType, IntegerType,ArrayType
from pyspark.sql.functions import split, explode
import pyspark.sql.types 

import calendar
import datetime
import pandas as pd
from pyspark.sql import functions as F
from pyspark.sql import types as T
import datetime as dt 

arrayData = spark.read.format("delta").load("/mnt/datalake/....something")
#arraySchema = StructType([ \
   # StructField("repair_year",StringType(),True), \
  #])

arrayData['repair_year']= arrayData.select('repair_date').withColumn("repair_date", F.col("repair_date").cast(T.TimestampType()))



#df = arraySchema
#df.printSchema()
#df.show()

arraySchema.show()
import datetime
import time

# create a timestamp -- you won't have to do this as you already have timestamp
timestamp = time.time()
print(f"\n\ntimestamp: {timestamp}")

year = datetime.date.fromtimestamp(timestamp).year
print(f"Year: {year}")
Output:
timestamp: 1610140280.269422
Year: 2021
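
Since the original question has the timestamps stored as strings and asks for the distinct years, here is a minimal sketch of that step using only the standard library. It assumes the strings are in an ISO-like format such as "2021-01-08 21:11:20"; the sample values and the helper name distinct_years are made up for illustration and are not from the original data.

```python
from datetime import datetime


def distinct_years(timestamps):
    """Return the sorted distinct years found in ISO-formatted timestamp strings."""
    # A set comprehension drops duplicate years automatically
    years = {datetime.fromisoformat(ts).year for ts in timestamps}
    return sorted(years)


# Hypothetical sample values standing in for the repair_date column
sample = ["2019-03-14 08:30:00", "2021-01-08 21:11:20", "2019-11-02 17:45:10"]
years = distinct_years(sample)
print(years)
```

On a Spark DataFrame the same idea would probably be to cast the column to a timestamp, apply F.year to it, then call distinct() and collect() to bring the values back as a Python list. That part is untested against the original data, so treat it as a starting point only.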