Python Forum
get year information from a timestamp data frame - Printable Version



get year information from a timestamp data frame - asli - Jan-08-2021

Hi all,
I am new to Python.
I am reading a data file that contains timestamp values stored as strings.
I want to get the distinct years from this DataFrame and keep them in an array.
My attempt is below, but it doesn't work.
Could you help me with how to do this?

import calendar
import datetime
import datetime as dt

import pandas as pd
import pyspark
import pyspark.sql.types
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import types as T
from pyspark.sql.functions import split, explode
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType

spark = SparkSession.builder.appName('pyspark-by-examples').getOrCreate()

arrayData = spark.read.format("delta").load("/mnt/datalake/....something")
#arraySchema = StructType([ \
   # StructField("repair_year",StringType(),True), \
  #])

arrayData['repair_year']= arrayData.select('repair_date').withColumn("repair_date", F.col("repair_date").cast(T.TimestampType()))



#df = arraySchema
#df.printSchema()
#df.show()

arraySchema.show()



RE: get year information from a timestamp data frame - Larz60+ - Jan-08-2021

import datetime
import time

# create a timestamp -- you won't have to do this as you already have timestamp
timestamp = time.time()
print(f"\n\ntimestamp: {timestamp}")

year = datetime.date.fromtimestamp(timestamp).year
print(f"Year: {year}")
Output:
timestamp: 1610140280.269422
Year: 2021
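
The example above starts from an epoch float, while the question has timestamps stored as strings and asks for the distinct years. Here is a minimal sketch of the same idea for that case, in plain Python rather than PySpark. It assumes the strings look like "2021-01-08 14:30:00" (the real format of repair_date isn't shown in the thread), and the names repair_dates and distinct_years are made up for illustration.

```python
from datetime import datetime

# Hypothetical sample values; the real repair_date format is not shown in the thread.
repair_dates = [
    "2019-03-15 08:00:00",
    "2021-01-08 14:30:00",
    "2019-11-02 17:45:10",
    "2020-06-30 00:00:00",
]


def distinct_years(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the sorted list of distinct years found in timestamp strings."""
    # The set comprehension drops duplicate years; sorted() returns an ordered list.
    return sorted({datetime.strptime(ts, fmt).year for ts in timestamps})


years = distinct_years(repair_dates)
print(years)  # [2019, 2020, 2021]
```

If your strings use a different layout, pass a matching fmt to distinct_years. On the Spark side, pyspark.sql.functions.year extracts the year from a date or timestamp column, so applying it to the cast repair_date column and then calling distinct() should be the DataFrame counterpart of this, though I haven't run that against your data.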