Tagged: hadoop

Hive export to CSV

Bash function to export Hive table data to a local CSV file. Usage: hive_export_csv <db.table> <output.csv> [queue]. Recommendation: add it to your .bash_profile. hive_export_csv () { if [ -z "$2" ]; then echo "Bad arguments. Usage: ${FUNCNAME[0]} <db.table> <output.csv> [queue]" else uuid=$(uuidgen)...
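For reference, a minimal sketch of such a function, assuming a plain hive -e pipeline rather than the post's uuid-based staging; the queue handling and tab-to-comma conversion are illustrative assumptions:

    hive_export_csv () {
      if [ -z "$2" ]; then
        echo "Bad arguments. Usage: ${FUNCNAME[0]} <db.table> <output.csv> [queue]"
        return 1
      fi
      # Optional third argument picks the YARN queue (assumption: "default" otherwise)
      local queue="${3:-default}"
      # hive -e emits tab-separated rows; this naive tab-to-comma swap means
      # fields containing tabs or commas would need real CSV quoting
      hive --hiveconf mapreduce.job.queuename="$queue" \
           --hiveconf hive.cli.print.header=true \
           -e "SELECT * FROM $1" | sed 's/\t/,/g' > "$2"
    }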

HiveAccessControlException: Permission denied: user [user] does not have [WRITE] privilege on …

Source: https://community.hortonworks.com/questions/112754/insert-overwrite-directory-beeline.html Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [user] does not have [WRITE] privilege on [/tmp/*] (state=42000,code=40000) The above error appears even though you have set up Ranger policies and HDFS policies. You've checked everything and...
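One thing worth ruling out first (not necessarily the post's resolution) is the HDFS-level permission on the target directory, since for INSERT OVERWRITE DIRECTORY the write check on the URI typically falls through to HDFS permissions rather than Ranger's Hive policies. A hypothetical check, with /tmp/export standing in for the real path:

    # Inspect POSIX bits and ACLs on the target directory
    hdfs dfs -ls -d /tmp/export
    hdfs dfs -getfacl /tmp/export
    # If the querying user really lacks write access, open it up
    # (or add a Ranger HDFS policy instead)
    sudo -u hdfs hdfs dfs -chmod 1777 /tmp/export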

How a newline can ruin your Hive

Source: http://marcel.is/how-newline-can-ruin-your-hive/ If you do not fully understand how Hive/Impala stores your data, it might cost you badly. Symptom #1: Weird values in an ingested Hive table. You double-checked with select distinct(gender) from customers that the gender column in your source RDBMS really contains only the values male, female, and NULL....
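The underlying failure mode: text-format Hive tables treat \n as the row terminator, so a free-text column with embedded newlines silently splits one record into several. One common guard when the data arrives via Sqoop (connect string, database, and table names below are placeholders) is to strip or replace Hive's delimiter characters at import time:

    sqoop import \
      --connect jdbc:mysql://dbhost/shop --table customers \
      --hive-import --hive-table staging.customers \
      --hive-drop-import-delims   # drops \n, \r and \01 from string fields
    # ...or, to keep a placeholder instead of dropping:
    #   --hive-delims-replacement ' '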

Hadoop HDFS tips and tricks

Finding the active namenode in a cluster # lookup active nn nn_list=`hdfs getconf -namenodes` echo Namenodes found: $nn_list active_node='' #for nn in $( hdfs getconf -namenodes ); do for nn in $nn_list ; do echo...
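A sketch of one way the loop can finish, probing each namenode's JMX status bean; it assumes the default web UI port (50070 on Hadoop 2.x, 9870 on 3.x), so adjust for your cluster:

    nn_list=$(hdfs getconf -namenodes)
    echo "Namenodes found: $nn_list"
    active_node=''
    for nn in $nn_list; do
      # the NameNodeStatus JMX bean reports "State" : "active" or "standby"
      if curl -s "http://$nn:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" \
          | grep -q '"State" *: *"active"'; then
        active_node=$nn
        break
      fi
    done
    echo "Active namenode: $active_node"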

Fastest way of compressing file(s) in Hadoop

Compressing files in Hadoop. Okay, well… it may or may not be the fastest; email me if you find a better alternative 😉 Short background: the technique uses a simple Pig script. Make Pig use the Tez engine (set the queue name...
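The shape of the trick, as a minimal sketch (input/output paths, codec, and queue name are placeholders): a Pig script that simply loads the files and re-stores them with output compression enabled, run on the Tez engine:

    cat > compress.pig <<'EOF'
    -- route the job to a specific YARN queue
    set tez.queue.name default;
    -- turn on output compression and pick a codec
    set output.compression.enabled true;
    set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;
    raw = LOAD '/data/uncompressed' USING PigStorage();
    STORE raw INTO '/data/compressed' USING PigStorage();
    EOF
    pig -x tez compress.pig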

Ambari REST API

Ambari configuration over REST. Delete a user from Ambari. Ambari configuration over the REST API: you need to log in to Ambari, then access the URL http://ambari-host:8080/api/v1/services/AMBARI/components/AMBARI_SERVER. Delete user from Ambari...
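For instance (the admin credentials and the user name someuser are placeholders; Ambari requires the X-Requested-By header on modifying calls):

    # Inspect the Ambari server component mentioned above
    curl -u admin:admin \
      http://ambari-host:8080/api/v1/services/AMBARI/components/AMBARI_SERVER
    # Delete a user
    curl -u admin:admin -H 'X-Requested-By: ambari' -X DELETE \
      http://ambari-host:8080/api/v1/users/someuser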

Computing memory parameters for Namenode

Source: https://discuss.pivotal.io/hc/en-us/articles/203272527-Namenode-failed-while-loading-fsimage-with-GC-overhead-limit-exceeded Namenode failed while loading fsimage with GC overhead limit exceeded. Problem: during startup, the namenode failed to load the fsimage into memory. 2014-05-14 17:36:56,806 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /data/hadoop/nn/dfs/name/current/fsimage_0000000000252211550 using no compression 2014-05-14 17:36:56,806 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files =...
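The usual remedy is to raise the namenode heap in hadoop-env.sh. A commonly cited rule of thumb is roughly 1 GB of heap per million filesystem objects (files plus blocks), so size -Xmx from the object counts the fsimage log lines report; the 8g below is only a placeholder:

    # hadoop-env.sh -- give the namenode a fixed, larger heap
    export HADOOP_NAMENODE_OPTS="-Xms8g -Xmx8g ${HADOOP_NAMENODE_OPTS}"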

Public datasets

Source: https://github.com/caesar0301/awesome-public-datasets Awesome Public Datasets This list of public data sources is collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free; however, some are not. Other amazingly awesome lists can be found...

Parsing Sqoop logs for stats analysis

The Python code below will help you extract statistics from a set of Sqoop log files for transfer analysis: #!/usr/bin/env python import fnmatch import os import datetime def find_files(directory, pattern): for root, dirs, files in os.walk(directory): for basename in files: if...
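If you only need the headline numbers, a rough shell equivalent (the log directory is a placeholder) is to grep the summary lines Sqoop normally prints at the end of a transfer:

    grep -hE 'Transferred .* in .* seconds|Retrieved [0-9]+ records' /var/log/sqoop/*.log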