SQL Error Message with PySpark
#1
Hello Community,

I'm extremely green when it comes to PySpark. I've written the following command in SQL, because I don't know PySpark or Python but I do understand SQL, and I know that Spark lets you run SQL queries directly.

I am running the command in a Jupyter Notebook. As you can see, it is written in SQL:

results7 = spark.sql("SELECT\
appl_stock.[Open]\
,appl_stock.[Close]\
FROM dbo.appl_stock\
WHERE appl_stock.[Close] < 500")

The problem is that when I hit Shift+Enter, I get the following error:
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:

~/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318                     "An error occurred while calling {0}{1}{2}.\n".
--> 319                     format(target_id, ".", name), value)
    320             else:

Py4JJavaError: An error occurred while calling o19.sql.
: org.apache.spark.sql.catalyst.parser.ParseException: 
no viable alternative at input 'appl_stock.['(line 1, pos 19)

== SQL ==
SELECT  appl_stock.[Open] ,appl_stock.[Close]FROM dbo.appl_stockWHERE appl_stock.[Close] < 500
-------------------^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
	at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
	at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)


During handling of the above exception, another exception occurred:

ParseException                            Traceback (most recent call last)
<ipython-input-52-345afe195a8f> in <module>()
----> 1 results7 = spark.sql("SELECT  appl_stock.[Open] ,appl_stock.[Close]FROM dbo.appl_stockWHERE appl_stock.[Close] < 500")

~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/session.py in sql(self, sqlQuery)
    539         [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
    540         """
--> 541         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
    542 
    543     @since(2.0)

~/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
     71                 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
     72             if s.startswith('org.apache.spark.sql.catalyst.parser.ParseException: '):
---> 73                 raise ParseException(s.split(': ', 1)[1], stackTrace)
     74             if s.startswith('org.apache.spark.sql.streaming.StreamingQueryException: '):
     75                 raise StreamingQueryException(s.split(': ', 1)[1], stackTrace)

ParseException: "\nno viable alternative at input 'appl_stock.['(line 1, pos 19)\n\n== SQL ==\nSELECT  appl_stock.[Open] ,appl_stock.[Close]FROM dbo.appl_stockWHERE appl_stock.[Close] < 500\n-------------------^^^\n"
I apologise if I'm not providing enough detail for you to help me with this problem.

Regards

Carlton
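
P.S. From searching on the error, it looks as though Spark SQL quotes identifiers with backticks rather than T-SQL's square brackets, and it won't know about the dbo schema either. If that's right, the query would presumably look something like this (untested on my side, and assuming appl_stock is registered as a table or temp view):

results7 = spark.sql("""
SELECT appl_stock.`Open`,
       appl_stock.`Close`
FROM appl_stock
WHERE appl_stock.`Close` < 500
""")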
#2
Any help with this will be greatly appreciated.
#3
Hello community,

I've nearly figured this out for myself.

I realised the square brackets were the problem: Spark SQL doesn't accept T-SQL's [column] quoting, so I dropped the brackets and the dbo. prefix and changed the code as follows (reconstructed here to match the traceback below):

results5 = spark.sql("SELECT \
appl_stock.Open \
appl_stock.Close\
FROM appl_stock\
WHERE appl_stock.Close < 500")
However, that still produced the following error:

Py4JJavaError                             Traceback (most recent call last)
~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:

~/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318                     "An error occurred while calling {0}{1}{2}.\n".
--> 319                     format(target_id, ".", name), value)
    320             else:

Py4JJavaError: An error occurred while calling o19.sql.
: org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input '.' expecting {<EOF>, ',', 'FROM', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 35)

== SQL ==
SELECT  appl_stock.Open  appl_stock.CloseFROM appl_stockWHERE appl_stock.Close < 500
-----------------------------------^^^

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:99)
	at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:45)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)


During handling of the above exception, another exception occurred:

ParseException                            Traceback (most recent call last)
<ipython-input-37-4920f7e68c0e> in <module>()
----> 1 results5 = spark.sql("SELECT  appl_stock.Open  appl_stock.CloseFROM appl_stockWHERE appl_stock.Close < 500")

~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/session.py in sql(self, sqlQuery)
    539         [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
    540         """
--> 541         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
    542 
    543     @since(2.0)

~/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

~/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
     71                 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
     72             if s.startswith('org.apache.spark.sql.catalyst.parser.ParseException: '):
---> 73                 raise ParseException(s.split(': ', 1)[1], stackTrace)
     74             if s.startswith('org.apache.spark.sql.streaming.StreamingQueryException: '):
     75                 raise StreamingQueryException(s.split(': ', 1)[1], stackTrace)

ParseException: "\nmismatched input '.' expecting {<EOF>, ',', 'FROM', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 35)\n\n== SQL ==\nSELECT  appl_stock.Open  appl_stock.CloseFROM appl_stockWHERE appl_stock.Close < 500\n-----------------------------------^^^\n"
However, when I executed the following code (the same query, all on one line), it succeeded:

results7 = spark.sql("SELECT appl_stock.Open ,appl_stock.Close FROM appl_stock WHERE appl_stock.Close < 500")
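Comparing the == SQL == line in the traceback with my source, the tokens from adjacent lines appear fused together (appl_stock.CloseFROM, appl_stockWHERE), and I see I had also dropped the comma between the two columns, which would explain the mismatched '.' complaint. A quick check seems to confirm that a backslash continuation inside a string literal adds no whitespace of its own:

# Minimal check: a backslash at the end of a line inside a string
# literal continues it with no space or newline inserted.
q = "SELECT appl_stock.Close\
FROM appl_stock"
print(q)  # prints: SELECT appl_stock.CloseFROM appl_stock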
Can someone confirm that this is really why the prior code produces the error?

Cheers

Carlton
#4
This is just an FYI, as I am not familiar with PySpark.

You need to be patient. First, a responder has to know about PySpark, which limits the possibilities.
Second, when you respond to your own thread, the reply count increments; most moderators (and you have to understand this, as there are so many posts in a single day) will look at that number and service threads with no replies first. A higher count suggests the thread has already been answered.

Just look at the stats so far today: 112 users active in the past 5 minutes (6 members, 0 of whom are invisible, and 103 guests).
#5
Hi community,

I figured it out myself. The following worked fine.

results5 = spark.sql("SELECT \
appl_stock.Open \
,appl_stock.Close \
FROM appl_stock \
WHERE appl_stock.Close < 500")