Useful Links – Distributed Computing

Distributed Computing (Data, Hadoop, Elasticsearch and more…)


My talk on building scalable reporting solutions using Elasticsearch:

Interesting article about running Elasticsearch on AWS:

Good read about Scaling Elasticsearch Writes:

Apache Spark

Spark Architecture (very important to understand):

Spark SQL (we use this a lot to analyse data on HDFS):

Spark Streaming (worth trying for near-real-time stream processing where medium latency is acceptable):

How to submit Spark jobs from a remote host:


HBase Schema Design by Lars George: