Category: Bigdata

Hadoop ecosystem

Sources:
http://www.revelytix.com/?q=content/hadoop-ecosystem
https://hadoopecosystemtable.github.io/
Cloudstory.com: 3-part series on the Hadoop ecosystem (Part 1, Part 2, Part 3)
http://thinkbig.teradata.com/leading_big_data_technologies/hadoop/

Hadoop ecosystem

Source: https://hadoopecosystemtable.github.io/

Distributed Filesystem: Apache HDFS

The Hadoop Distributed File System (HDFS) offers a way to store large files across multiple machines. Hadoop and HDFS were derived from the Google File System (GFS) paper. Prior to Hadoop 2.0.0, the NameNode was...
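
As a rough illustration of how HDFS spreads one large file over many machines, the Python sketch below cuts a file into fixed-size blocks and assigns each block to several DataNodes. It is a simplified model, not HDFS code: the 128 MB block size and replication factor of 3 are assumed to match the Hadoop 2.x defaults for dfs.blocksize and dfs.replication, the DataNode names are hypothetical, and the round-robin placement stands in for HDFS's real rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # assumed: Hadoop 2.x default dfs.blocksize (128 MB)
REPLICATION = 3                 # assumed: default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Round-robin placement (a simplification, NOT HDFS's rack-aware policy).

    Each block gets `replication` distinct nodes, capped by the cluster size.
    """
    copies = min(replication, len(datanodes))
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [datanodes[(block_id + i) % len(datanodes)]
                               for i in range(copies)]
    return placement


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]          # hypothetical DataNode names
    sizes = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    print([s // (1024 * 1024) for s in sizes])    # block sizes in MB
    for block_id, replicas in place_replicas(len(sizes), nodes).items():
        print(block_id, replicas)
```

Under these assumptions a 300 MB file becomes three blocks of 128 MB, 128 MB and 44 MB, and each block lives on three different DataNodes, so losing any single node still leaves at least two copies of every block.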

Open Source Tools

Source: http://www.bigdata-startups.com/open-source-tools/
Big data white papers: http://www.bigdata-startups.com/big-data-white-papers/

Cascading

Source: http://www.cascading.org

Cascading is a proven application development platform for building data applications on Apache Hadoop. Whether solving simple or complex data problems, Cascading balances an optimal level of abstraction with the necessary degrees of freedom through a computation engine, systems...

YARN – Overview

Source: http://pivotalhd.cfapps.io/introduction/yarn.html

YARN Overview

Apache Hadoop has two main components: distributed storage and distributed computation. Distributed storage is provided by HDFS, and distributed computation by MapReduce.

About YARN

YARN (Yet Another Resource Negotiator) is the next-generation Hadoop data-processing framework. YARN...
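
Since MapReduce is the computation half of that split, a tiny single-process Python sketch of its programming model may help: the classic word count, written as a map step, a shuffle (group by key) step and a reduce step. This only mimics the data flow; it does not use Hadoop's Java API, and on a cluster the map and reduce tasks would run in parallel on many machines, reading their input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce over all input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["HDFS stores data", "MapReduce processes data"]
    print(word_count(docs))
```

The shuffle is the step the framework performs for you between mappers and reducers; in a real job you write only the map and reduce functions.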

Remotely debug hadoop

Source: http://www.gluster.org/2013/07/deep-dive-into-hadoop-with-bigtop-and-eclipse-remote-debuggers/

Deep dive into Hadoop with Bigtop and Eclipse Remote Debuggers

Thanks to a little hack session with Bradley Childs over at Red Hat this week, I learned a new trick: remote debugging of JVM (Hadoop + MR2) apps in...
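
Remote debugging of a JVM relies on the JVM's standard JDWP agent: the JVM to be debugged is started with an extra -agentlib:jdwp option, and a debugger such as Eclipse attaches to the given port. The Python helper below only assembles that option string. The default port 8000 is an arbitrary choice, and on JDK 9 and later a bare port binds to localhost only, so host="*" is needed to accept a debugger from another machine.

```python
def jdwp_agent_option(port: int = 8000, suspend: bool = True, host: str = "") -> str:
    """Build the JDWP agent flag that makes a JVM listen for a remote debugger.

    port:    TCP port the JVM listens on (8000 is an arbitrary choice).
    suspend: if True, the JVM pauses at startup until a debugger attaches.
    host:    optional bind address; on JDK 9+ use "*" to accept remote
             connections, because a bare port binds to localhost only.
    """
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    address = f"{host}:{port}" if host else str(port)
    return ("-agentlib:jdwp=transport=dt_socket,server=y,"
            f"suspend={'y' if suspend else 'n'},address={address}")


if __name__ == "__main__":
    # Option for a JVM that should accept a debugger from another machine.
    print(jdwp_agent_option(port=8000, host="*"))
```

How the string reaches a Hadoop JVM depends on your version and on which JVM you want to debug; exporting it through an environment variable such as HADOOP_OPTS, or passing it to task JVMs via a job property such as mapreduce.map.java.opts, are possible routes but are assumptions here, not necessarily what the original post did. With suspend=y the JVM waits until Eclipse's "Remote Java Application" debug configuration attaches to that host and port.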

mapred Vs. mapreduce

Resources:
http://stackoverflow.com/questions/7598422/is-it-better-to-use-the-mapred-or-the-mapreduce-package-to-create-a-hadoop-job
http://stackoverflow.com/questions/10986633/hadoop-configuration-mapred-vs-mapreduce
http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api

Sample test data generators

Resources:
http://www.webresourcesdepot.com/test-sample-data-generators/
http://databene.org/databene-benerator/similar-products.html

Tools

GenerateData

GenerateData is a free, open-source script written in JavaScript, PHP and MySQL that lets you quickly generate large volumes of custom data in a variety of formats for use in testing software, populating databases...
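
GenerateData itself is a PHP/JavaScript web tool. To show the underlying idea on a small scale, the Python sketch below produces random-but-plausible records and writes them as CSV, for example to load into HDFS or a database for testing. The field names and value lists are hypothetical; a fixed random seed makes every run produce the same data, which keeps tests repeatable.

```python
import csv
import io
import random
from typing import List

# Hypothetical sample values; replace with whatever your tests need.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Eve"]
CITIES = ["Berlin", "Austin", "Pune", "Lyon"]
FIELDNAMES = ["id", "name", "city", "age"]


def generate_rows(n: int, seed: int = 42) -> List[dict]:
    """Generate n fake customer records; a fixed seed makes runs reproducible."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [
        {
            "id": i + 1,
            "name": rng.choice(FIRST_NAMES),
            "city": rng.choice(CITIES),
            "age": rng.randint(18, 90),  # inclusive on both ends
        }
        for i in range(n)
    ]


def to_csv(rows: List[dict]) -> str:
    """Serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(generate_rows(5)))
```

To generate a large test file, raise n and write the CSV text to disk instead of printing it.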

Name node is in safe mode

Are you seeing something similar to this?

mkdir: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/robin/sort-job-in. Name node is in safe mode.

Here is what I did: I issued the following command to take the NameNode out of safe mode.

>hadoop dfsadmin -safemode leave

Safe mode...
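
Before forcing the NameNode out of safe mode, it can be worth checking its state first. The Python sketch below runs "hdfs dfsadmin -safemode get" (on Hadoop 2.x the "hadoop dfsadmin" form is deprecated in favor of "hdfs dfsadmin") and parses the reply. It assumes the reply contains lines starting with "Safe mode is ON" or "Safe mode is OFF", and that a configured Hadoop client is on PATH; only the parsing function works without a cluster.

```python
import subprocess
from typing import Optional


def parse_safemode_status(output: str) -> Optional[bool]:
    """Return True if any line reports ON, False if all report OFF, None if neither.

    Assumes the message format of `hdfs dfsadmin -safemode get`; with an HA
    setup one line per NameNode may be printed, so any ON line counts as ON.
    """
    states = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Safe mode is ON"):
            states.append(True)
        elif line.startswith("Safe mode is OFF"):
            states.append(False)
    if not states:
        return None
    return any(states)


def namenode_in_safemode() -> Optional[bool]:
    """Query the NameNode via the CLI (requires a configured Hadoop client)."""
    result = subprocess.run(["hdfs", "dfsadmin", "-safemode", "get"],
                            capture_output=True, text=True, check=True)
    return parse_safemode_status(result.stdout)
```

If the NameNode has just started, "hdfs dfsadmin -safemode wait" blocks until it leaves safe mode on its own, which is usually safer than "-safemode leave": at startup the NameNode stays in safe mode while it collects block reports from the DataNodes.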