Category: Bigdata

Can’t connect Excel to Hive using ODBC driver on MAC

So you've done everything right and still can’t connect Excel to Hive using the ODBC driver on macOS? Let’s see what is going on. Are you running El Capitan or Sierra? Well, I was running Sierra and tried connecting before while...

Connecting SQuirrel SQL to Hive

Pre-requisites
In order to connect the SQuirrel SQL client, we need the following prerequisites:
Client – http://squirrel-sql.sourceforge.net/
Hive connection JARs (found in lib directories)
  Hive JDBC JAR – hive-jdbc-1.2.1-standalone.jar
  Hadoop common JAR (for ) – hadoop-common-2.7.2.jar
Running HiveServer2 instance
For connections use the following...

Creating Hive tables on compressed files

Stuck with creating Hive tables on compressed files? Well, the documentation on apache.org suggests that Hive natively supports compressed files – https://cwiki.apache.org/confluence/display/Hive/CompressedStorage Let's try that out. Store a Snappy-compressed file on HDFS. … thinking, I do not have such a file… Wait!...

Query escaped JSON string in Hive

There are times when we want to parse a string that is actually JSON. Usually that can be done with Hive's built-in functions such as get_json_object(). In my experience, though, get_json_object cannot parse a JSON array. These array...
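To make the idea of an "escaped JSON string" concrete, here is a minimal sketch in Python rather than HiveQL. It is not the approach from the post: the function name `parse_escaped_json` and the sample data are hypothetical, and the sketch only illustrates that such a value needs two decoding passes before the array inside it can be read.

```python
import json


def parse_escaped_json(raw):
    """Decode JSON that may itself be stored as an escaped JSON string."""
    value = json.loads(raw)
    # If the first pass yields a string, the payload was escaped: decode again.
    if isinstance(value, str):
        value = json.loads(value)
    return value


# Hypothetical sample: an array of objects stored as one escaped string.
sample = '"[{\\"id\\": 1, \\"name\\": \\"a\\"}, {\\"id\\": 2, \\"name\\": \\"b\\"}]"'

items = parse_escaped_json(sample)
names = [item["name"] for item in items]
print(names)
```

Input that is already plain JSON passes through the first decode unchanged, so the same helper works for both escaped and unescaped values.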

Using JSON SerDe in Hive

Using JsonSerDe in Hive
Download JSON SerDe – https://github.com/rcongiu/Hive-JSON-Serde
Compile command for Hive 1.2.1 – "mvn -Pcdh5 -Dcdh5.hive.version=1.2.1 clean package" (change the Hive version per the environment)
Copy json-serde/target/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar (or similar) to hive/lib
Restart Hive
Sample JSON with test HiveQLs...

HDFS disk consumption – Find what is taking hdfs space

Source: https://community.hortonworks.com/articles/16846/how-to-identify-what-is-consuming-space-in-hdfs.html
Script
#!/usr/bin/env bash
max_depth=5
largest_root_dirs=$(hdfs dfs -du -s '/*' | sort -nr | perl -ane 'print "$F[1] "')
printf "%15s %s\n" "bytes" "directory"
for ld in $largest_root_dirs; do
printf "%15.0f %s\n" $(hdfs dfs -du -s $ld| cut -d' '...
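The sorting and reporting step of the script can also be sketched offline, without a cluster. The Python below is an illustration under stated assumptions, not part of the original script: the helper names `largest_dirs` and `format_report` are hypothetical, and it assumes each line of `hdfs dfs -du` output starts with the size in bytes and ends with the path (paths containing spaces are not handled).

```python
def largest_dirs(du_output, top=5):
    """Return the `top` largest (size, path) pairs from du-style text."""
    rows = []
    for line in du_output.splitlines():
        fields = line.split()
        # Skip blank or malformed lines; the first field must be a byte count.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        # The path is taken from the last field, so an extra middle column
        # (as printed by some Hadoop versions) is ignored.
        rows.append((int(fields[0]), fields[-1]))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]


def format_report(rows):
    """Format rows as a two-column report similar to the script's printf."""
    lines = ["%15s %s" % ("bytes", "directory")]
    lines += ["%15d %s" % (size, path) for size, path in rows]
    return "\n".join(lines)


# Hypothetical captured output of `hdfs dfs -du /`.
sample = "100 /tmp\n5000 /user\n20 /apps\n"
print(format_report(largest_dirs(sample, top=2)))
```

Keeping the parsing separate from the `hdfs` call makes it possible to check the logic against saved output before running it on a real cluster.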

Hive statistics using beeline and expect script

The following expect script uses the beeline interface to fetch statistics for the tables within a database. Replace username and queuename with the values for your environment.
#!/usr/bin/expect -f
# hive_statistics, v0.1, 2016-05, [email protected]
# Usage: ./hive_statistics [database_name]
set _database [lindex $argv 0]
if {...

Fetching Hive schema definitions using WebHCat

The following shell script gets the schema information from Hive using the WebHCat server.
#!/bin/sh
# fetch_webhcat.sh, v0.1, 2016-04-00, [email protected]
# Pre-requisites: jq, curl, python (json.tool)
_WEBHCAT_SERVER="server:50111"
_USER_NAME="JohnDoe"
# POSIX test: [[ ... > ... ]] is not available in plain sh and compares strings
while [ "$#" -gt 1 ]
do
key="$1"
case $key in
-u|--user)...

Clean UNinstall Hortonworks HDP 2.2

Source: https://pulsatingtaurus.wordpress.com/2015/02/15/clean-uninstall-hortonworks-hdp-2-2/
I love Hadoop, and Hortonworks is one of my favorite Hadoop distributions. However, while experimenting with Hadoop installations, I often needed to start afresh on a set of physical as well as virtual Hadoop clusters. Hortonworks provide...

Uninstall Hortonworks HDP 2.2

Source: https://pulsatingtaurus.wordpress.com/2015/02/15/clean-uninstall-hortonworks-hdp-2-2/
I love Hadoop, and Hortonworks is one of my favorite Hadoop distributions. However, while experimenting with Hadoop installations, I often needed to start afresh on a set of physical as well as...