Tagged: hadoop


Hive ORC files – Pro Tips

Extract text from ORC files (source) Hive (0.11 and up) ships with an ORC file dump utility. The dump can be invoked with the following command: $ hive --orcfiledump <location-of-orc-file> Create a Hive table definition using ORC files on HDFS: $ hive --orcfiledump hdfs:///data/location/of/the/ORC/file.orc...
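A quick sketch of the first use case; the warehouse path below is a placeholder, not a file from the post.

```bash
# Dump an ORC file's metadata: schema, compression, stripe statistics.
# The "Type:" line in the output (e.g. struct<id:int,name:string>) maps
# directly onto the column list of a CREATE TABLE statement.
hive --orcfiledump hdfs:///apps/hive/warehouse/demo.db/events/000000_0
```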


Bash alias and functions for hadoop users

Functions Beeline Usage: Beeline username [queuename] export beeline_jdbc="jdbc:hive2://servername.fqdn:10000" Beeline(){ if [ -z "$beeline_jdbc" ]; then echo "Error: beeline_jdbc var not available" fi if [ -z "$1" ]; then echo -e "No user specified.\nUsage: Beeline <user> [<queue>]" return 1 fi queue="default"...
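The excerpt cuts off before the final call, so here is a minimal runnable sketch of where the function appears to be heading; the JDBC URL and the Tez queue property are assumptions to adjust for your cluster.

```bash
# Wrapper so "Beeline <user> [<queue>]" opens a beeline session as that
# user on a chosen YARN queue. URL and queue property are assumptions.
export beeline_jdbc="jdbc:hive2://servername.fqdn:10000"

Beeline() {
    if [ -z "$beeline_jdbc" ]; then
        echo "Error: beeline_jdbc var not available"
        return 1
    fi
    if [ -z "$1" ]; then
        echo -e "No user specified.\nUsage: Beeline <user> [<queue>]"
        return 1
    fi
    queue="${2:-default}"   # optional second argument, else default queue
    beeline -u "$beeline_jdbc" -n "$1" --hiveconf tez.queue.name="$queue"
}
```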


Deleting users from Ranger database (mysql)

Once you sync users into Apache Ranger they stay in the database, even if you later sync users from a different source. All those stale users clutter up the Ranger user interface. The following two scripts will help in deleting...
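The post is truncated, but for orientation, here is a hypothetical sketch of the kind of query involved. The x_portal_user and x_user names come from the Ranger MySQL schema, though related tables (roles, group memberships, module permissions) vary by version, so verify against your install before running any DELETE.

```bash
# HYPOTHETICAL sketch, not the post's actual scripts. Back up the Ranger
# database first; other tables hold foreign keys into these rows.
mysql -u rangeradmin -p ranger <<'SQL'
-- Locate the stale synced users first.
SELECT id, login_id FROM x_portal_user WHERE login_id LIKE 'stale_%';
-- Then remove them (kept commented out on purpose):
-- DELETE FROM x_user        WHERE user_name LIKE 'stale_%';
-- DELETE FROM x_portal_user WHERE login_id  LIKE 'stale_%';
SQL
```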


Compiling Hue on CentOS

Tested on a CentOS 6.8 Minimal ISO install with Hue 3.11 Downloads Hue 3.11 Steps Download the Hue tarball Install dependencies yum install python-devel libffi-devel gcc openldap-devel openssl-devel libxml2-devel libxslt-devel mysql-devel gmp-devel sqlite-devel gcc-c++ rsync Compile You can either compile...
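Put together as a single run, the steps look roughly like this; the tarball name and Hue's standard make targets (make apps for an in-tree build, PREFIX=... make install to relocate) are the assumptions here.

```bash
# Install build dependencies (CentOS 6.8), then compile Hue 3.11.
yum install -y python-devel libffi-devel gcc gcc-c++ openldap-devel \
    openssl-devel libxml2-devel libxslt-devel mysql-devel gmp-devel \
    sqlite-devel rsync

tar -xzf hue-3.11.0.tgz && cd hue-3.11.0
make apps                       # build in place; run ./build/env/bin/hue
# or relocate the install instead:
# PREFIX=/usr/local make install
```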


Performance of Hive tables with Parquet & ORC

Source: http://stackoverflow.com/questions/32373460/parquet-vs-orc-vs-orc-with-snappy Datasets: Table A – text file format – 2.5 GB; Table B – ORC – 652 MB; Table C – ORC with Snappy – 802 MB; Table D – Parquet – 1.9 GB. Parquet was worst as far as compression for my table...
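For reference, table declarations along these lines produce the formats being compared; the table and column names below are placeholders, not the benchmarked tables.

```bash
# ORC with Snappy vs. plain ORC (ZLIB by default) vs. Parquet,
# all chosen at CREATE TABLE time.
hive -e "
CREATE TABLE demo_orc        (id INT, msg STRING) STORED AS ORC;
CREATE TABLE demo_orc_snappy (id INT, msg STRING) STORED AS ORC
  TBLPROPERTIES ('orc.compress'='SNAPPY');
CREATE TABLE demo_parquet    (id INT, msg STRING) STORED AS PARQUET;
"
```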


Hadoop security practices

References http://hortonworks.com/hadoop-tutorial/manage-security-policy-hive-hbase-knox-ranger/ http://hortonworks.com/blog/best-practices-in-hdfs-authorization-with-apache-ranger/ http://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/ http://hortonworks.com/blog/author/balajiganesan03/ http://www.slideshare.net/hortonworks/ops-workshop-asrunon20150112


Connecting SQuirrel SQL to Hive

Prerequisites In order to connect the SQuirrel SQL client we need the following: the client itself (http://squirrel-sql.sourceforge.net/); the Hive connection JARs, found in the lib directories: the Hive JDBC JAR, hive-jdbc-1.2.1-standalone.jar, and the Hadoop common JAR, hadoop-common-2.7.2.jar; and a running HiveServer2 instance. For connections use the following...
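Since the excerpt cuts off at the connection details: the driver class SQuirrel needs registered is org.apache.hive.jdbc.HiveDriver, and it can save time to verify HiveServer2 from the shell first. Hostname, port and user here are placeholders.

```bash
# Smoke-test the HiveServer2 endpoint before wiring up SQuirrel SQL.
beeline -u "jdbc:hive2://hiveserver.fqdn:10000/default" -n myuser \
    -e "SHOW DATABASES;"
```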


Creating Hive tables on compressed files

Stuck with creating Hive tables on compressed files? Well, the documentation on apache.org suggests that Hive natively supports compressed files – https://cwiki.apache.org/confluence/display/Hive/CompressedStorage Let's try that out. Store a snappy compressed file on HDFS. … thinking, I do not have such a file… Wait!...
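While the post works through the Snappy case, a gzip variant shows the native-support claim in its simplest form, since Hive's text input format decompresses .gz files transparently; paths and names below are placeholders.

```bash
# Gzip a local file, push it to HDFS, and query it through an external
# table: no compression settings needed on the Hive side.
gzip -c data.csv > data.csv.gz
hdfs dfs -mkdir -p /data/compressed
hdfs dfs -put data.csv.gz /data/compressed/

hive -e "
CREATE EXTERNAL TABLE compressed_demo (col1 STRING, col2 INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/data/compressed/';
SELECT * FROM compressed_demo LIMIT 5;
"
```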


Query escaped JSON string in Hive

There are times when we want to parse a string that is actually JSON. Usually that can be done with built-in Hive functions such as get_json_object(). Though get_json_object cannot parse a JSON array, in my experience. These array...
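The excerpt stops before the fix; one common workaround (not necessarily the one this post lands on) is to strip the array brackets, split, and explode. This sketch assumes an array of scalars, and the demo_json table and payload column are placeholders.

```bash
# Turn a JSON array like {"items":["a","b","c"]} into one row per element.
hive -e '
SELECT item
FROM demo_json
LATERAL VIEW explode(
  split(regexp_replace(get_json_object(payload, "$.items"),
                       "\\[|\\]", ""), ",")
) t AS item;
'
```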


HDFS disk consumption – Find what is taking hdfs space

Source: https://community.hortonworks.com/articles/16846/how-to-identify-what-is-consuming-space-in-hdfs.html Script #!/usr/bin/env bash max_depth=5 largest_root_dirs=$(hdfs dfs -du -s '/*' | sort -nr | perl -ane 'print "$F[1] "') printf "%15s %s\n" "bytes" "directory" for ld in $largest_root_dirs; do printf "%15.0f %s\n" $(hdfs dfs -du -s $ld | cut -d' '...
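If you only need the headline numbers rather than the recursive walk the full script performs, the core of the idea fits in one line:

```bash
# Largest top-level HDFS consumers, biggest first (bytes in column one).
hdfs dfs -du -s '/*' | sort -nr | head -10
```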