SQL Error Message with PySpark
#1
Hello Community,

I'm extremely green when it comes to PySpark. I've written the following command in SQL, because I don't know PySpark or Python but I do understand SQL, and I know that Spark lets you run SQL queries directly.

I am running the command in a Jupyter Notebook. As you can see, it is written in SQL:

results7 = spark.sql("SELECT\
appl_stock.[Open]\
,appl_stock.[Close]\
FROM dbo.appl_stock\
WHERE appl_stock.[Close] < 500")

The problem is that when I hit Shift+Enter, I get the following error:
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:

~/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318                     "An error occurred while calling {0}{1}{2}.\n".
--> 319                     format(target_id, ".", name), value)
    320             else:

Py4JJavaError: An error occurred while calling o19.sql.
: org.apache.spark.sql.catalyst.parser.ParseException: 
no viable alternative at input 'appl_stock.['(line 1, pos 19)

== SQL ==
SELECT  appl_stock.[Open] ,appl_stock.[Close]FROM dbo.appl_stockWHERE appl_stock.[Close] < 500
-------------------^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
	at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
	at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)


During handling of the above exception, another exception occurred:

ParseException                            Traceback (most recent call last)
<ipython-input-52-345afe195a8f> in <module>()
----> 1 results7 = spark.sql("SELECT  appl_stock.[Open] ,appl_stock.[Close]FROM dbo.appl_stockWHERE appl_stock.[Close] < 500")

~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/session.py in sql(self, sqlQuery)
    539         [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
    540         """
--> 541         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
    542 
    543     @since(2.0)

~/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
     71                 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
     72             if s.startswith('org.apache.spark.sql.catalyst.parser.ParseException: '):
---> 73                 raise ParseException(s.split(': ', 1)[1], stackTrace)
     74             if s.startswith('org.apache.spark.sql.streaming.StreamingQueryException: '):
     75                 raise StreamingQueryException(s.split(': ', 1)[1], stackTrace)

ParseException: "\nno viable alternative at input 'appl_stock.['(line 1, pos 19)\n\n== SQL ==\nSELECT  appl_stock.[Open] ,appl_stock.[Close]FROM dbo.appl_stockWHERE appl_stock.[Close] < 500\n-------------------^^^\n"
I apologise if I'm not providing enough detail for you to help me with this problem.

Regards

Carlton
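
P.S. From searching on the error, it looks as though Spark SQL quotes identifiers with backticks rather than T-SQL's square brackets, and it won't know about the dbo schema either. If that's right, the query would presumably look something like this (untested on my side, and assuming appl_stock is registered as a table or temp view):

results7 = spark.sql("""
SELECT appl_stock.`Open`,
       appl_stock.`Close`
FROM appl_stock
WHERE appl_stock.`Close` < 500
""")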
#2
Any help with this will be greatly appreciated.
#3
Hello community,

I've nearly figured this out for myself.

I realised the square brackets were the problem: Spark SQL doesn't accept T-SQL's [column] quoting, so I dropped the brackets and the dbo. prefix and changed the code as follows (reconstructed here to match the traceback below):

results5 = spark.sql("SELECT \
appl_stock.Open \
appl_stock.Close\
FROM appl_stock\
WHERE appl_stock.Close < 500")
However, that still produced the following error:

Py4JJavaError                             Traceback (most recent call last)
~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:

~/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318                     "An error occurred while calling {0}{1}{2}.\n".
--> 319                     format(target_id, ".", name), value)
    320             else:

Py4JJavaError: An error occurred while calling o19.sql.
: org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input '.' expecting {<EOF>, ',', 'FROM', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 35)

== SQL ==
SELECT  appl_stock.Open  appl_stock.CloseFROM appl_stockWHERE appl_stock.Close < 500
-----------------------------------^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
	at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)


During handling of the above exception, another exception occurred:

ParseException                            Traceback (most recent call last)
<ipython-input-37-4920f7e68c0e> in <module>()
----> 1 results5 = spark.sql("SELECT  appl_stock.Open  appl_stock.CloseFROM appl_stockWHERE appl_stock.Close < 500")

~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/session.py in sql(self, sqlQuery)
    539         [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
    540         """
--> 541         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
    542 
    543     @since(2.0)

~/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
     71                 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
     72             if s.startswith('org.apache.spark.sql.catalyst.parser.ParseException: '):
---> 73                 raise ParseException(s.split(': ', 1)[1], stackTrace)
     74             if s.startswith('org.apache.spark.sql.streaming.StreamingQueryException: '):
     75                 raise StreamingQueryException(s.split(': ', 1)[1], stackTrace)

ParseException: "\nmismatched input '.' expecting {<EOF>, ',', 'FROM', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 35)\n\n== SQL ==\nSELECT  appl_stock.Open  appl_stock.CloseFROM appl_stockWHERE appl_stock.Close < 500\n-----------------------------------^^^\n"
However, when I executed the following code (the same query, all on one line), it succeeded:

results7 = spark.sql("SELECT appl_stock.Open ,appl_stock.Close FROM appl_stock WHERE appl_stock.Close < 500")
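Comparing the == SQL == line in the traceback with my source, the tokens from adjacent lines appear fused together (appl_stock.CloseFROM, appl_stockWHERE), and I see I had also dropped the comma between the two columns, which would explain the mismatched '.' complaint. A quick check seems to confirm that a backslash continuation inside a string literal adds no whitespace of its own:

# Minimal check: a backslash at the end of a line inside a string
# literal continues it with no space or newline inserted.
q = "SELECT appl_stock.Close\
FROM appl_stock"
print(q)  # prints: SELECT appl_stock.CloseFROM appl_stock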
Can someone confirm that this is really why the prior code produces the error?

Cheers

Carlton
#4
This is just an FYI, as I am not familiar with PySpark.

You need to be patient. First, a responder has to know about PySpark, which limits the possibilities.
Second, when you respond to your own thread, the reply count increments; most moderators (and you have to understand this, as there are so many posts in a single day) will look at that number and service threads with no replies first. A higher count suggests the thread has already been answered.

Just look at the stats so far today: 112 users active in the past 5 minutes (6 members, 0 of whom are invisible, and 103 guests).
#5
Hi community,

I figured it out myself. The following worked fine.

results5 = spark.sql("SELECT \
appl_stock.Open \
,appl_stock.Close \
FROM appl_stock \
WHERE appl_stock.Close < 500")