Just enough configurations for Hadoop applications

DISCLAIMER: As of May 25, 2017, I have moved the instructions for creating a 3-node CDH Hadoop cluster to this wiki. Please follow this link if you want to set one up. Below, I re-purpose this blog to illustrate the arena-dev-cdh-hadoop examples with minimal configurations.

This blog is for developers who write application (client) code that accesses a CDH Hadoop system.

Making a client application talk to a CDH Hadoop system can be a small or a big effort, depending on what you want to achieve. If you just want your client to connect to Hadoop, a simple approach is to deploy it on a Hadoop gateway node, where you get all Hadoop services’ client configurations, along with refreshed copies whenever they change, and then include all the configuration files in your classpath (or rely on the system environment variables). You do not have to understand how your client is able to talk to HBase; it just does, because all Hadoop services’ client configurations are available to it.
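To see what "all configurations are available" means in practice, here is a small sketch that uses only the JDK to report which client configuration files a program can actually see. The file names are the ones Hadoop, HDFS and HBase clients conventionally load from the classpath. The class name, the helper names and the /etc/hadoop/conf fallback (the usual location on a CDH gateway node) are my assumptions for illustration, not something this blog's examples depend on.

```java
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

public class ClientConfigCheck {
    // Site files that Hadoop, HDFS and HBase clients conventionally read from the classpath.
    static final String[] SITE_FILES = {"core-site.xml", "hdfs-site.xml", "hbase-site.xml"};

    // Map each site file to its classpath location, or to null when it is not visible.
    static Map<String, URL> locateOnClasspath(ClassLoader loader) {
        Map<String, URL> found = new LinkedHashMap<>();
        for (String name : SITE_FILES) {
            found.put(name, loader.getResource(name));
        }
        return found;
    }

    // Resolve a configuration directory from an environment map, with a fallback path.
    // The fallback is an assumption (typical CDH gateway layout), not a guarantee.
    static String confDir(Map<String, String> env, String variable, String fallback) {
        String value = env.get(variable);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        locateOnClasspath(ClientConfigCheck.class.getClassLoader())
            .forEach((name, url) ->
                System.out.println(name + " -> " + (url == null ? "not on classpath" : url)));
        System.out.println("HADOOP_CONF_DIR -> "
            + confDir(System.getenv(), "HADOOP_CONF_DIR", "/etc/hadoop/conf"));
    }
}
```

Run it with the same classpath as your client application. On a gateway node with the configuration directories on the classpath, each file should resolve to a location; anywhere else, expect "not on classpath".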

But many developers are not satisfied merely that the client can talk to Hadoop; they also want to know exactly which configurations make that happen.

That is the purpose of this blog. We will walk through a series of examples, each with one specific purpose (for example, creating an HBase table, then reading and writing it, with simple authentication). For each example, we demonstrate and explain the set of configurations that is just enough to make it work.
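To give a feel for what a "just enough" set looks like: Hadoop client configuration files are XML lists of name/value properties, so a minimal configuration is simply a short list. The sketch below parses such a list with the JDK's XML parser. The two property names are standard HBase client settings; the host names are placeholders, and whether these two properties alone are sufficient for your cluster is an assumption to verify, not a claim.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MinimalSiteConfig {
    // Hypothetical minimal hbase-site.xml content; host names are placeholders.
    static final String SAMPLE =
        "<configuration>"
      + "<property><name>hbase.zookeeper.quorum</name>"
      + "<value>node1.example.com,node2.example.com,node3.example.com</value></property>"
      + "<property><name>hbase.zookeeper.property.clientPort</name>"
      + "<value>2181</value></property>"
      + "</configuration>";

    // Parse the <property><name/><value/></property> entries into an ordered map.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // Print each property so the whole "just enough" set is visible at a glance.
        parse(SAMPLE).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Listing the properties this way makes it easy to compare a trimmed-down file against the full one that a gateway node provides.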

Of course, we can’t do this without a CDH Hadoop cluster. So that you and I are working from the same setup, please follow the wiki to set up a 3-node CDH Hadoop cluster first.

Pig Example (Kerberos)

[TODO]

Spark Example

[TODO]

Spark Example (Kerberos)

[TODO]

Spark Remote Client Example

[TODO]

Spark Remote Client Example (Kerberos)

[TODO]
