Tagged: hadoop


Hive ORC files – Pro Tips

Extract text from ORC files (source) Hive (0.11 and up) ships with an ORC file dump utility. The dump can be invoked with the following command: $ hive --orcfiledump <location-of-orc-file> Create a Hive table definition using ORC files on HDFS: $ hive --orcfiledump hdfs:///data/location/of/the/ORC/file.orc...
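A quick sketch of the first use case; the warehouse path below is a placeholder, not a file from the post.

```bash
# Dump an ORC file's metadata: schema, compression, stripe statistics.
# The "Type:" line in the output (e.g. struct<id:int,name:string>) maps
# directly onto the column list of a CREATE TABLE statement.
hive --orcfiledump hdfs:///apps/hive/warehouse/demo.db/events/000000_0
```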


Bash alias and functions for hadoop users

Functions Beeline Usage: Beeline username [queuename] export beeline_jdbc="jdbc:hive2://servername.fqdn:10000" Beeline(){ if [ -z "$beeline_jdbc" ]; then echo "Error: beeline_jdbc var not available" fi if [ -z "$1" ]; then echo -e "No user specified.\nUsage: Beeline <user> [<queue>]" return 1 fi queue="default"...
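The excerpt cuts off before the final call, so here is a minimal runnable sketch of where the function appears to be heading; the JDBC URL and the Tez queue property are assumptions to adjust for your cluster.

```bash
# Wrapper so "Beeline <user> [<queue>]" opens a beeline session as that
# user on a chosen YARN queue. URL and queue property are assumptions.
export beeline_jdbc="jdbc:hive2://servername.fqdn:10000"

Beeline() {
    if [ -z "$beeline_jdbc" ]; then
        echo "Error: beeline_jdbc var not available"
        return 1
    fi
    if [ -z "$1" ]; then
        echo -e "No user specified.\nUsage: Beeline <user> [<queue>]"
        return 1
    fi
    queue="${2:-default}"   # optional second argument, else default queue
    beeline -u "$beeline_jdbc" -n "$1" --hiveconf tez.queue.name="$queue"
}
```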


Deleting users from Ranger database (mysql)

Once you sync users into Apache Ranger they stay in the database, even if you later sync users from a different source. All those stale users clutter up the Ranger user interface. The following two scripts will help in deleting...
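The post is truncated, but for orientation, here is a hypothetical sketch of the kind of query involved. The x_portal_user and x_user names come from the Ranger MySQL schema, though related tables (roles, group memberships, module permissions) vary by version, so verify against your install before running any DELETE.

```bash
# HYPOTHETICAL sketch, not the post's actual scripts. Back up the Ranger
# database first; other tables hold foreign keys into these rows.
mysql -u rangeradmin -p ranger <<'SQL'
-- Locate the stale synced users first.
SELECT id, login_id FROM x_portal_user WHERE login_id LIKE 'stale_%';
-- Then remove them (kept commented out on purpose):
-- DELETE FROM x_user        WHERE user_name LIKE 'stale_%';
-- DELETE FROM x_portal_user WHERE login_id  LIKE 'stale_%';
SQL
```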


Compiling Hue on CentOS

Tested on a CentOS 6.8 Minimal ISO install with Hue 3.11 Downloads Hue 3.11 Steps Download the Hue tarball Install dependencies yum install python-devel libffi-devel gcc openldap-devel openssl-devel libxml2-devel libxslt-devel mysql-devel gmp-devel sqlite-devel gcc-c++ rsync Compile You can either compile...
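Put together as a single run, the steps look roughly like this; the tarball name and Hue's standard make targets (make apps for an in-tree build, PREFIX=... make install to relocate) are the assumptions here.

```bash
# Install build dependencies (CentOS 6.8), then compile Hue 3.11.
yum install -y python-devel libffi-devel gcc gcc-c++ openldap-devel \
    openssl-devel libxml2-devel libxslt-devel mysql-devel gmp-devel \
    sqlite-devel rsync

tar -xzf hue-3.11.0.tgz && cd hue-3.11.0
make apps                       # build in place; run ./build/env/bin/hue
# or relocate the install instead:
# PREFIX=/usr/local make install
```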


Performance of Hive tables with Parquet & ORC

Source: http://stackoverflow.com/questions/32373460/parquet-vs-orc-vs-orc-with-snappy Datasets: Table A – text file format – 2.5 GB; Table B – ORC – 652 MB; Table C – ORC with Snappy – 802 MB; Table D – Parquet – 1.9 GB. Parquet was worst as far as compression for my table...
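For reference, table declarations along these lines produce the formats being compared; the table and column names below are placeholders, not the benchmarked tables.

```bash
# ORC with Snappy vs. plain ORC (ZLIB by default) vs. Parquet,
# all chosen at CREATE TABLE time.
hive -e "
CREATE TABLE demo_orc        (id INT, msg STRING) STORED AS ORC;
CREATE TABLE demo_orc_snappy (id INT, msg STRING) STORED AS ORC
  TBLPROPERTIES ('orc.compress'='SNAPPY');
CREATE TABLE demo_parquet    (id INT, msg STRING) STORED AS PARQUET;
"
```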


Hadoop security practices

References http://hortonworks.com/hadoop-tutorial/manage-security-policy-hive-hbase-knox-ranger/ http://hortonworks.com/blog/best-practices-in-hdfs-authorization-with-apache-ranger/ http://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/ http://hortonworks.com/blog/author/balajiganesan03/ http://www.slideshare.net/hortonworks/ops-workshop-asrunon20150112


Connecting SQuirrel SQL to Hive

Prerequisites In order to connect the SQuirrel SQL client we need the following: the client itself (http://squirrel-sql.sourceforge.net/); the Hive connection JARs, found in the lib directories: the Hive JDBC JAR, hive-jdbc-1.2.1-standalone.jar, and the Hadoop common JAR, hadoop-common-2.7.2.jar; and a running HiveServer2 instance. For connections use the following...
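Since the excerpt cuts off at the connection details: the driver class SQuirrel needs registered is org.apache.hive.jdbc.HiveDriver, and it can save time to verify HiveServer2 from the shell first. Hostname, port and user here are placeholders.

```bash
# Smoke-test the HiveServer2 endpoint before wiring up SQuirrel SQL.
beeline -u "jdbc:hive2://hiveserver.fqdn:10000/default" -n myuser \
    -e "SHOW DATABASES;"
```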


Creating Hive tables on compressed files

Stuck with creating Hive tables on compressed files? Well, the documentation on apache.org suggests that Hive natively supports compressed files – https://cwiki.apache.org/confluence/display/Hive/CompressedStorage Let's try that out. Store a snappy compressed file on HDFS. … thinking, I do not have such a file… Wait!...
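While the post works through the Snappy case, a gzip variant shows the native-support claim in its simplest form, since Hive's text input format decompresses .gz files transparently; paths and names below are placeholders.

```bash
# Gzip a local file, push it to HDFS, and query it through an external
# table: no compression settings needed on the Hive side.
gzip -c data.csv > data.csv.gz
hdfs dfs -mkdir -p /data/compressed
hdfs dfs -put data.csv.gz /data/compressed/

hive -e "
CREATE EXTERNAL TABLE compressed_demo (col1 STRING, col2 INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/data/compressed/';
SELECT * FROM compressed_demo LIMIT 5;
"
```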


Query escaped JSON string in Hive

There are times when we want to parse a string that is actually JSON. Usually that can be done with built-in Hive functions such as get_json_object(). Though get_json_object cannot parse a JSON array, in my experience. These array...
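The excerpt stops before the fix; one common workaround (not necessarily the one this post lands on) is to strip the array brackets, split, and explode. This sketch assumes an array of scalars, and the demo_json table and payload column are placeholders.

```bash
# Turn a JSON array like {"items":["a","b","c"]} into one row per element.
hive -e '
SELECT item
FROM demo_json
LATERAL VIEW explode(
  split(regexp_replace(get_json_object(payload, "$.items"),
                       "\\[|\\]", ""), ",")
) t AS item;
'
```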


HDFS disk consumption – Find what is taking hdfs space

Source: https://community.hortonworks.com/articles/16846/how-to-identify-what-is-consuming-space-in-hdfs.html Script #!/usr/bin/env bash max_depth=5 largest_root_dirs=$(hdfs dfs -du -s '/*' | sort -nr | perl -ane 'print "$F[1] "') printf "%15s %s\n" "bytes" "directory" for ld in $largest_root_dirs; do printf "%15.0f %s\n" $(hdfs dfs -du -s $ld | cut -d' '...
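If you only need the headline numbers rather than the recursive walk the full script performs, the core of the idea fits in one line:

```bash
# Largest top-level HDFS consumers, biggest first (bytes in column one).
hdfs dfs -du -s '/*' | sort -nr | head -10
```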