Python Forum
How to add multiple tables to pyspark sql - Printable Version


How to add multiple tables to pyspark sql - cpatte7372 - Jul-29-2018

Hello community,

Can someone let me know how to add multiple tables to my query?

As you can see from the code below, I have two tables: i) Person_Person and ii) appl_stock. The problem is that the code won't work with the two tables; it only works with a single table. I have tried the following, but it didn't work.

df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/"Person_Person.csv", "appl_stock.csv"',inferSchema=True,header=True)
#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv, appl_stock.csv',inferSchema=True,header=True)
df.createOrReplaceTempView('Person_Person, appl_stock')
results = spark.sql("SELECT \
appl_stock.Open\
, appl_stock.Close\
 FROM appl_stock\
 WHERE appl_stock.Close < 500")
carl = spark.sql("SELECT * FROM Person_Person")
results.show()
Any help will be greatly appreciated.

Cheers

Carlton


RE: How to add multiple tables to pyspark sql - Larz60+ - Jul-29-2018

In normal SQL you would use an INNER JOIN to combine the two tables.
You can find a spark example here: http://bailiwick.io/2015/07/12/joining-data-frames-in-spark-sql/
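
Separately from the join, it looks like the posted code passes both file names in one path string and registers one view named 'Person_Person, appl_stock', so neither table name exists when the SQL runs. A minimal sketch of one way around that, assuming each CSV sits in the same directory and should become its own temp view: read each file into its own DataFrame and call createOrReplaceTempView once per table. The helper name register_csv_views and the main() wrapper are made up for illustration; the paths, view names, and query come from the original post.

```python
import os


def register_csv_views(spark, base_dir, table_names):
    """Read <base_dir>/<name>.csv for each name and register it as a temp view.

    Hypothetical helper (not part of PySpark). `spark` is an existing
    SparkSession; returns a dict mapping each table name to its DataFrame.
    """
    frames = {}
    for name in table_names:
        path = os.path.join(base_dir, name + ".csv")
        # One read per file, so each table keeps its own schema.
        df = spark.read.csv(path, inferSchema=True, header=True)
        # One view per DataFrame, named after the table.
        df.createOrReplaceTempView(name)
        frames[name] = df
    return frames


def main():
    # Imported here so the helper above can also be used with an existing session.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName('ops').getOrCreate()
    register_csv_views(
        spark,
        '/home/packt/Downloads/Spark_DataFrames',
        ['Person_Person', 'appl_stock'],
    )
    # Both views can now be queried by name.
    results = spark.sql(
        "SELECT appl_stock.Open, appl_stock.Close "
        "FROM appl_stock WHERE appl_stock.Close < 500"
    )
    carl = spark.sql("SELECT * FROM Person_Person")
    results.show()
    carl.show()
```

Calling main() should then show both result sets. A JOIN between the two views would only be needed if they share a key column, which the post does not show.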


RE: How to add multiple tables to pyspark sql - cpatte7372 - Jul-30-2018

Larz60+

Thanks for reaching out. I will check out the link you provided.

In the meantime, I'm happy for this question to be closed.

Cheers