Python Forum
pyspark parallel write operation not working
#1
I want my PySpark code to use parallel connections to the database when inserting into a table, but it does not.

I have tried splitting the DataFrame and have also set the numPartitions attribute in the write call, but nothing helps.

The following code works and writes to the table, but it uses only a single database connection.


import os
import io
import findspark
import pandas as pd
import boto3
import awswrangler as wr
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master('local[*]') \
    .config("spark.driver.memory", "25g") \
    .appName('my-cool-app') \
    .getOrCreate()
myDF=spark.read.format('jdbc').options(
   url='jdbc:redshift://hostname.com:5439/dev',
   driver='com.amazon.redshift.jdbc42.Driver',
   dbtable='schema1.table1',
   user='awsuser',
   password='securepassword').load()
myDF.count()
myDF_part = myDF.repartition(16)
myDF_part.write.format('jdbc').options(
   url='jdbc:oracle:thin:@oraclehost:1521/iINST1',
   driver='oracle.jdbc.driver.OracleDriver',
   dbtable='test',
   batchsize=10000,
   numPartitions=16,
   user='someuser',
   password='somepassword').mode('append').save()
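
One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: a JDBC read without partitioning options loads the whole table through one connection into a single partition, so the source side is serial before repartition(16) ever runs. Spark's JDBC reader accepts partitionColumn, lowerBound, upperBound and numPartitions (all four must be given together) to split the read. The helper names below, the column name "id" and the bounds are assumptions for illustration and are not from the original post.

```python
def partitioned_jdbc_read_options(base_options, partition_column,
                                  lower_bound, upper_bound, num_partitions):
    """Return a copy of base_options extended with Spark's JDBC
    partitioning options. Spark expects all four to be set together."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    opts = dict(base_options)  # leave the caller's dict untouched
    opts.update({
        "partitionColumn": partition_column,  # numeric, date or timestamp column
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    })
    return opts


def load_partitioned(spark, options):
    """Load a DataFrame over JDBC and report how many partitions it has."""
    df = spark.read.format("jdbc").options(**options).load()
    print("partitions after read:", df.rdd.getNumPartitions())
    return df


# Connection details copied from the post; "id" and the bounds are hypothetical.
base = {
    "url": "jdbc:redshift://hostname.com:5439/dev",
    "driver": "com.amazon.redshift.jdbc42.Driver",
    "dbtable": "schema1.table1",
    "user": "awsuser",
    "password": "securepassword",
}
read_opts = partitioned_jdbc_read_options(base, "id", 1, 1_000_000, 16)
```

With an existing session, load_partitioned(spark, read_opts) would then replace the plain read. On the write side, each partition is written by its own task over its own connection, and numPartitions acts as an upper limit on those connections. With master('local[*]') the number of tasks running at once is also bounded by the machine's core count, so printing myDF_part.rdd.getNumPartitions() just before the write is a cheap way to see what Spark will actually attempt.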
#2
There must be many people writing to a database from Python. Has no one ever wanted to use more than one session to do this?