This guide should help you set up PyCharm CE to work with Python 3 and Apache Spark (tested with version 2.1).
First, create a new Pure Python project in PyCharm.
Now copy the content of https://github.com/apache/spark/blob/master/examples/src/main/python/wordcount.py into your project. Your IDE should complain about the following line:
from pyspark.sql import SparkSession
because it doesn’t know where to find pyspark.sql, which is part of the Python Spark library.
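If you want to confirm that the problem is the search path and not the code itself, a small check like the one below can help. It is a minimal sketch using only the standard library, and the function name pyspark_importable is made up for this example. Note that it tests the interpreter the script runs under, whereas the warning in the editor comes from PyCharm's own project index, so the two can disagree.

```python
import importlib.util


def pyspark_importable() -> bool:
    """Return True if the running interpreter can locate the pyspark package."""
    # find_spec returns None (rather than raising) for a missing top-level package.
    return importlib.util.find_spec("pyspark") is not None


if __name__ == "__main__":
    if pyspark_importable():
        print("pyspark found: 'from pyspark.sql import SparkSession' should resolve")
    else:
        print("pyspark not found on sys.path for this interpreter")
```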
To tell PyCharm where the Python Spark libraries are, go to Preferences -> Project -> Project Structure and add the zip files under $SPARK_HOME/python/lib as content roots (in a Spark 2.1 download these are pyspark.zip and a py4j-<version>-src.zip). $SPARK_HOME is the location of your Apache Spark directory. If you haven’t downloaded Apache Spark yet, you can get it from http://spark.apache.org/downloads.html.
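To see exactly which zip files you are about to add, you could list them with a short script. This is a sketch that assumes SPARK_HOME is set in your environment; spark_python_zips is a hypothetical helper name, not part of Spark.

```python
import glob
import os
import sys
from typing import List


def spark_python_zips(spark_home: str) -> List[str]:
    """List the .zip archives under <spark_home>/python/lib, sorted by path."""
    lib_dir = os.path.join(spark_home, "python", "lib")
    return sorted(glob.glob(os.path.join(lib_dir, "*.zip")))


if __name__ == "__main__":
    # Assumption: SPARK_HOME points at your unpacked Spark directory.
    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home:
        sys.exit("SPARK_HOME is not set")
    for path in spark_python_zips(spark_home):
        print(path)
```

Each printed path is a candidate content root for the Project Structure dialog.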
Next, go to Run -> Edit Configurations, create a new configuration from the default Python template, and add the following environment variables:
PYSPARK_PYTHON=python3
SPARK_HOME=<your spark home dir>
PYTHONPATH=<your spark home dir>/python
Then set the script to your main .py file (wordcount.py) and pass the path of the text file whose words you want to count as the script parameter.
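Outside PyCharm, the same run configuration can be approximated with a small launcher script. This is a sketch under a few assumptions: spark_env and the launcher's argument order are invented for this example. PyCharm's "Add content roots to PYTHONPATH" run-configuration option normally puts the lib/*.zip content roots on the path for you, so the sketch appends them to PYTHONPATH explicitly instead.

```python
import glob
import os
import subprocess
import sys
from typing import Dict, List


def spark_env(spark_home: str, base_env: Dict[str, str]) -> Dict[str, str]:
    """Copy base_env and add the variables from the PyCharm run configuration."""
    env = dict(base_env)
    env["PYSPARK_PYTHON"] = "python3"
    env["SPARK_HOME"] = spark_home
    # <spark_home>/python first, then the lib/*.zip archives that PyCharm
    # would otherwise contribute as content roots (assumption of this sketch).
    entries: List[str] = [os.path.join(spark_home, "python")]
    entries += sorted(glob.glob(os.path.join(spark_home, "python", "lib", "*.zip")))
    env["PYTHONPATH"] = os.pathsep.join(entries)
    return env


if __name__ == "__main__":
    # Hypothetical usage: python run_wordcount.py <spark_home> <wordcount.py> <text file>
    spark_home, script, text_file = sys.argv[1:4]
    env = spark_env(spark_home, dict(os.environ))
    # wordcount.py takes the text file path as its single command-line argument.
    subprocess.run([sys.executable, script, text_file], env=env, check=True)
```

Building a fresh dictionary rather than mutating os.environ keeps the launcher's own environment unchanged.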
Finally, run the new configuration. It should execute a word count job on Apache Spark and print each word together with its count.
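To cross-check Spark's output on a small file, the same count can be computed in plain Python. This sketch assumes the example splits each line on single spaces (x.split(' ')), as the wordcount.py linked above does at the time of writing; word_count is a hypothetical name. Spark does not guarantee the order of its output, so compare the counts rather than the line order.

```python
import sys
from collections import Counter
from typing import Dict, Iterable


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Count tokens produced by line.split(' ') for every line.

    Like a plain split(' '), repeated spaces yield empty-string tokens, which are
    counted too; no case folding or punctuation stripping is done.
    """
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split(" "))
    return dict(counts)


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        # Drop line terminators so they do not end up inside the last word.
        lines = (line.rstrip("\r\n") for line in f)
        for word, count in word_count(lines).items():
            print("%s: %i" % (word, count))
```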