Tagged: hadoop


Look up YARN ACL capacity scheduler queue users from /etc/passwd

The following is an awk script that I use in TextWrangler as a Text Filter. It generates the awk and grep commands required to look up users in the /etc/passwd file. #!/bin/sh # gawk '{match($0,"([a-zA-Z]+).acl_submit_applications=(.*)",a); if(a[1] != "") print a[1] "\t" a[2] }' #...
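The idea behind the excerpt (pull each queue's submit ACL out of the scheduler configuration, then check the listed users against /etc/passwd) can be sketched in Java as follows. This is a minimal illustration and not the author's script, which is truncated above. It assumes the configuration is available as key=value lines, that ACL values follow the usual YARN "users groups" form (comma-separated users, a space, then comma-separated groups), and that passwd entries start with "name:". The class name, method names, and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueueAclLookup {
    // Same idea as the gawk pattern in the excerpt: capture the queue name
    // (letters only) and the ACL value. The dot is escaped here.
    private static final Pattern ACL =
        Pattern.compile("([a-zA-Z]+)\\.acl_submit_applications=(.*)");

    /** Maps each queue name to the users named in its submit ACL. */
    static Map<String, List<String>> parseAcls(List<String> configLines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : configLines) {
            Matcher m = ACL.matcher(line);
            if (!m.find()) {
                continue; // not a submit-ACL line
            }
            // Assumption: the text before the first space is the user list,
            // anything after it is the group list.
            String users = m.group(2).split(" ", 2)[0];
            List<String> names = new ArrayList<>();
            for (String user : users.split(",")) {
                if (!user.isEmpty()) {
                    names.add(user);
                }
            }
            result.put(m.group(1), names);
        }
        return result;
    }

    /** Returns the passwd line for the given user, or null if there is none. */
    static String findPasswdEntry(String user, List<String> passwdLines) {
        for (String line : passwdLines) {
            if (line.startsWith(user + ":")) {
                return line;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; in practice these lines would be read from
        // the scheduler configuration and from /etc/passwd.
        List<String> config = Arrays.asList(
            "yarn.scheduler.capacity.root.default.acl_submit_applications=alice,bob hadoop",
            "yarn.scheduler.capacity.root.default.capacity=50");
        List<String> passwd = Arrays.asList(
            "root:x:0:0:root:/root:/bin/bash",
            "alice:x:1001:1001::/home/alice:/bin/bash");

        for (Map.Entry<String, List<String>> e : parseAcls(config).entrySet()) {
            for (String user : e.getValue()) {
                String entry = findPasswdEntry(user, passwd);
                System.out.println(e.getKey() + "\t" + user + "\t"
                    + (entry != null ? entry : "(not in passwd)"));
            }
        }
    }
}
```

Passing the lines in as lists keeps the parsing separate from file access, so the same methods work on a configuration exported from any source.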


A Secure HDFS Client Example

Source: http://henning.kropponline.de/2016/02/14/a-secure-hdfs-client-example/ It takes about 3 lines of Java code to write a simple HDFS client that can further be used to upload, read or list files. Here is an example: Configuration conf = new Configuration(); conf.set("fs.defaultFS", "hdfs://one.hdp:8020"); FileSystem fs = FileSystem.get(conf);...


Clean Uninstall Hortonworks HDP 2.2

Source: https://pulsatingtaurus.wordpress.com/2015/02/15/clean-uninstall-hortonworks-hdp-2-2/ I love Hadoop, and Hortonworks is one of my favorite Hadoop distributions. However, while experimenting with Hadoop installations, I often needed to start afresh on a set of physical as well as virtual Hadoop clusters. Hortonworks provide...


How to configure Hue for your Hadoop cluster

Source: http://gethue.com/how-to-configure-hue-in-your-hadoop-cluster/ Hue is a lightweight Web server that lets you use Hadoop directly from your browser. Hue is just a 'view on top of any Hadoop distribution' and can be installed on any machine. There are multiple ways...


How-to: Install Hue on a Mac

Source: http://blog.cloudera.com/blog/2015/04/how-to-install-hue-on-a-mac/ Learn how to set up Hue, the open source GUI that makes Apache Hadoop easier to use, on your Mac. You might already have all the prerequisites installed, but we are going to show how to start from...


Uninstall Hortonworks HDP 2.2

Source: https://pulsatingtaurus.wordpress.com/2015/02/15/clean-uninstall-hortonworks-hdp-2-2/ I love Hadoop, and Hortonworks is one of my favorite Hadoop distributions. However, while experimenting with Hadoop installations, I often needed to start afresh on a set of physical as well as...


Hadoop ecosystem

Source: https://hadoopecosystemtable.github.io/ Distributed Filesystem: Apache HDFS. The Hadoop Distributed File System (HDFS) offers a way to store large files across multiple machines. Hadoop and HDFS were derived from the Google File System (GFS) paper. Prior to Hadoop 2.0.0, the NameNode was...


YARN – Overview

Source: http://pivotalhd.cfapps.io/introduction/yarn.html YARN Overview: Apache Hadoop has two main components, distributed storage and distributed computation. The distributed storage is provided by HDFS, and MapReduce provides the distributed computation. About YARN: YARN (Yet-Another-Resource-Negotiator) is the next-generation Hadoop data-processing framework. YARN...


Remotely debug Hadoop

Source: http://www.gluster.org/2013/07/deep-dive-into-hadoop-with-bigtop-and-eclipse-remote-debuggers/ Deep dive into Hadoop with Bigtop and Eclipse Remote Debuggers. Thanks to a little hack session with Bradley Childs over at Red Hat this week, I learned a new trick: remote debugging of JVM (Hadoop + MR2) apps in...


mapred vs. mapreduce

Resources:
http://stackoverflow.com/questions/7598422/is-it-better-to-use-the-mapred-or-the-mapreduce-package-to-create-a-hadoop-job
http://stackoverflow.com/questions/10986633/hadoop-configuration-mapred-vs-mapreduce
http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api