Python Forum (https://python-forum.io), Forum: General Coding Help (https://python-forum.io/forum-8.html)
Thread: How to add multiple tables to pyspark sql (/thread-11864.html)
How to add multiple tables to pyspark sql - cpatte7372 - Jul-29-2018

Hello community,

Can someone let me know how to add multiple tables to my query?

As you can see from the code below, I have two tables: i) Person_Person and ii) appl_stock. The problem is that the code won't work with the two tables. It will only work with a single table.

I have tried the following, but it didn't work:

    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/"Person_Person.csv", "appl_stock.csv"',inferSchema=True,header=True)

    #%%
    import findspark
    findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
    import pyspark
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName('ops').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv, appl_stock.csv',inferSchema=True,header=True)
    df.createOrReplaceTempView('Person_Person, appl_stock')
    results = spark.sql("SELECT \
        appl_stock.Open\
        , appl_stock.Close\
        FROM appl_stock\
        WHERE appl_stock.Close < 500")
    carl = spark.sql("SELECT * FROM Person_Person")
    results.show()

Any help will be greatly appreciated.

Cheers
Carlton

RE: How to add multiple tables to pyspark sql - Larz60+ - Jul-29-2018

In normal SQL you would use an inner join. You can find a Spark example here: http://bailiwick.io/2015/07/12/joining-data-frames-in-spark-sql/

RE: How to add multiple tables to pyspark sql - cpatte7372 - Jul-30-2018

Larz60+

Thanks for reaching out. I will check out the link you provided.

In the meantime, I'm happy for this question to be closed.

Cheers
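
Note for readers of this thread: each call to spark.read.csv returns one DataFrame, and createOrReplaceTempView registers that DataFrame under exactly one view name, so a comma-separated string such as 'Person_Person, appl_stock' does not create two tables. One possible approach is to read each CSV separately and register each under its own name. The sketch below is only an illustration under assumptions: the helper name register_csv_views and its parameters are hypothetical (not part of PySpark), and it assumes each file is named after its table and sits in the same directory.

```python
def register_csv_views(spark, base_dir, table_names):
    """Read one CSV per table name and register each as its own temp view.

    Hypothetical helper (not a PySpark API). `spark` is expected to be a
    SparkSession; files are assumed to live at <base_dir>/<name>.csv.
    """
    frames = {}
    for name in table_names:
        path = f"{base_dir}/{name}.csv"
        # One read per file gives one DataFrame per table.
        df = spark.read.csv(path, inferSchema=True, header=True)
        # One view name per DataFrame, so SQL can refer to each table by name.
        df.createOrReplaceTempView(name)
        frames[name] = df
    return frames


# Usage with a real session (assumes pyspark is installed and the paths
# from the first post exist):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName('ops').getOrCreate()
#   register_csv_views(spark, '/home/packt/Downloads/Spark_DataFrames',
#                      ['Person_Person', 'appl_stock'])
#   spark.sql("SELECT Open, Close FROM appl_stock WHERE Close < 500").show()
#   spark.sql("SELECT * FROM Person_Person").show()
```

Once both views are registered, either table can be queried in spark.sql, and the two can be combined with a JOIN as suggested above, provided they share a column to join on (the thread does not say whether they do).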