Kafka Python client

Gave this a go recently and it seems quite decent. Feel free to check it out on GitHub: https://github.com/shenghuahe/tweets-on-kafka. It tries out the Python client with Kafka by ingesting and processing tweets from Twitter.
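To give a flavour of what producing tweets to Kafka from Python can look like, here is a minimal sketch. It assumes the kafka-python package (the excerpt does not say which client the repository uses), a broker at localhost:9092, and a topic called tweets; the helper names are hypothetical and not taken from the repository.

```python
import json


def encode_tweet(tweet):
    """Serialize a tweet dict to UTF-8 JSON bytes, suitable as a Kafka message value."""
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def decode_tweet(raw):
    """Inverse of encode_tweet, as a consumer would apply it to a message value."""
    return json.loads(raw.decode("utf-8"))


def publish_tweets(tweets, topic="tweets", servers="localhost:9092"):
    """Send each tweet dict to a Kafka topic (topic and broker address are assumptions)."""
    # Imported lazily so the helpers above work without the client installed.
    from kafka import KafkaProducer  # assumption: the kafka-python package

    producer = KafkaProducer(bootstrap_servers=servers,
                             value_serializer=encode_tweet)
    for tweet in tweets:
        producer.send(topic, tweet)
    producer.flush()  # block until buffered messages have been sent
    producer.close()
```

Keeping serialization in plain functions means the same pair can be reused on the consumer side and checked without a running broker.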


Configure PyCharm CE to work with Apache Spark

This guide should help you set up PyCharm CE to work with Python 3 and Apache Spark (tested with version 2.1). First, create a new Pure Python PyCharm project. Now copy the content of https://github.com/apache/spark/blob/master/examples/src/main/python/wordcount.py to your project. Your IDE should complain at the following line: from pyspark.sql import SparkSession, because it doesn’t know where pyspark.sql is, which […]
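The unresolved import comes down to pyspark not being on the interpreter's search path. As a rough illustration, and not necessarily the fix the full post describes, the sketch below adds Spark's bundled Python sources to sys.path at runtime. It assumes Spark is unpacked in a directory named by a SPARK_HOME environment variable; the function names are hypothetical.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the paths that need to be importable for `import pyspark` to work."""
    python_dir = os.path.join(spark_home, "python")
    # Spark ships py4j as a zip under python/lib; the version in the name varies.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home):
    """Prepend the Spark Python paths to sys.path if they are not already there."""
    for path in spark_python_paths(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)


# Usage (assumes SPARK_HOME is set and points at a Spark distribution):
#   add_spark_to_sys_path(os.environ["SPARK_HOME"])
#   from pyspark.sql import SparkSession
```

This only helps when the script runs. For the editor's own inspections, the same two locations would still need to be made known to PyCharm through the project settings.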


Boilerplate – Apache Spark with Spring profile

This is a boilerplate you can try out to get started quickly with Apache Spark (version 2.10) with a Spring profile: https://github.com/shenghuahe/sparkwithspringprofile. It should let you configure environment-specific properties (e.g. the path of an input file) really easily.


Boilerplate – Groovy and Spock

This is a boilerplate you can try out to get started with Groovy and Spock quickly: https://github.com/shenghuahe/groovywithspock. I created it because it can be tricky to find all the right dependencies and plugins for Spock. This should get you started in no time.


MySQL in Docker without losing data after rebuild

When it comes to running database services, or anything else that holds state, in Docker containers, the first question is often “what happens to my data after the container is destroyed or rebuilt?” The simple answer is that you can use Docker data volumes. After reading a few articles as well as trying it out […]
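As a rough illustration of the data-volume idea, the sketch below builds a docker run command that mounts a named volume over the MySQL data directory, so the data outlives the container. The volume name, container name, image tag, and password are placeholders and do not come from the full post; /var/lib/mysql is the data directory used by the official MySQL image.

```python
def mysql_run_command(volume="mysql-data", container="mysql-db",
                      image="mysql:5.7", root_password="change-me"):
    """Build the argument list for running MySQL with a named data volume.

    All default values are illustrative placeholders.
    """
    return [
        "docker", "run", "-d",
        "--name", container,
        "-e", "MYSQL_ROOT_PASSWORD=" + root_password,
        # Named volume mounted over the data directory: removing or rebuilding
        # the container leaves the volume, and therefore the data, in place.
        "-v", volume + ":/var/lib/mysql",
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(mysql_run_command()))
```

Starting a new container with the same volume name should pick up the existing databases, since only the container is replaced and the volume is not.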


Run Docker and Docker Compose in a Vagrant box

I recently created https://github.com/richardhe-awin/vagrant-docker, which provisions a Vagrant VM (Ubuntu Trusty) with everything needed to run Docker and Docker Compose. The main reason I created it is that it gives you an isolated environment to run things without going through the hassle of installing Docker and Docker Compose, which can be quite annoying if you are […]


How to: Install a Virtual Apache Hadoop Cluster with Vagrant and Cloudera Manager on a Mac

Feel free to skip some of the steps if you already have certain packages installed.

Get Cask:
    brew install caskroom/cask/brew-cask

Get Vagrant & Vagrant plugins:
    brew cask install virtualbox
    brew cask install vagrant
    brew cask install vagrant-manager
    vagrant plugin install vagrant-hostmanager

Install Hadoop:
    git clone [email protected]:richardhe-awin/vagrant-hadoop-cluster.git
    cd vagrant-hadoop-cluster
    vagrant up

Configure Cloudera Manager (mostly referenced from http://blog.cloudera.com/blog/2014/06/how-to-install-a-virtual-apache-hadoop-cluster-with-vagrant-and-cloudera-manager/) […]
