Hive query tips

by robin · Published September 20, 2016 · Updated December 25, 2019

Date operations
Data operations
Headers in Beeline
Unlock hive tables
Check partitions used in hive query

Debugging Hive

Long (query length) queries submitted to Hive
Occurrence of thread printing in hiveserver2 log file
Capture classes used in hiveserver2 log files
Fetch Hive queries executed from hiveserver2.log file
Hive query commands by count, by user, by type
Errors and warning counts by type

Date operations

When date is stored is YYYY-MM-DD it makes date operations very easy. For example the date comparisons can use string comparison!.

Date comparison

where to_date(your_date_field) >= '2016-07-01' AND to_date(your_date_field) <= '2016-08-31'

Last X days

Use something similar like below in where clause

where to_date(your_date_field) >= date_sub(current_date, 7)

Data operations

Changing delimiter upon export

INSERT OVERWRITE DIRECTORY '/user/hadoop/output'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY "sq|"
SELECT * FROM graph_edges;

Headers in Beeline

To enable headers > !set showHeader false

To set header width > !set headerInterval 50

Unlock hive table

set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager; 
unlock table tablename;

Check partitions used in hive query

> explain dependency <query that runs on a table with partitions>

Long queries submitted to Hive

Search in hiveserver2.log files in bash using awk

$ awk -F": " '/Starting command\(/  {print length ($3) "\t" $2}' hiveserver2.log | sort -nr | head -5

Occurence of thread printing in hiveserver2.log file

$ awk 'match($0,"(Thread-[0-9]+)",a) {print a[1]}' hiveserver2.log | sort | uniq -c | sort -n | less

Capture classes used in hiveserver2 log files

$ awk '/^20/ && $4 ~ /^org/ {print $1,$2,$3,$4 }'

Fetch Hive queries executed from hiveserver2.log file

$ awk '/Starting command\(/{print}' hiveserver2.log

Hive query commands by count, by user, by type

$ awk '/org.apache.hadoop.hive.metastore.HiveMetaStore.audit/ &&  match($0,/^([^ ]+) ([0-9][0-9]).* ugi=([a-z0-9]+).*cmd=([^:]+).*$/,a) { print a[1],a[2], a[3], a[4] }' hiveserver2.log  | head -n 1000 | sort | uniq -c

Errors and warning counts by type

$ awk '/ERROR|WARN/ &&  match($0,/^([^ ]+) ([0-9][0-9])[^ ]+ ([A-Z]+)/,a) {print a[3],a[1], a[2]}' hiveserver2.log | sort | uniq -c

More tricks

http://www.wmanalytics.io/blog/four-useful-tricks-working-hive
Set number of reducers – http://stackoverflow.com/questions/8762064/hive-unable-to-manually-set-number-of-reducers
Performance tuning Hive on Tez – https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html

Tags: hadoop hive

Hive query tips

Date operations

Data operations

Headers in Beeline

Unlock hive table

Check partitions used in hive query

Long queries submitted to Hive

Occurence of thread printing in hiveserver2.log file

Capture classes used in hiveserver2 log files

Fetch Hive queries executed from hiveserver2.log file

Hive query commands by count, by user, by type

Errors and warning counts by type

More tricks

You may also like...

Leave a Reply Cancel reply

Archives

Hive query tips

Date operations

Data operations

Headers in Beeline

Unlock hive table

Check partitions used in hive query

Long queries submitted to Hive

Occurence of thread printing in hiveserver2.log file

Capture classes used in hiveserver2 log files

Fetch Hive queries executed from hiveserver2.log file

Hive query commands by count, by user, by type

Errors and warning counts by type

More tricks

Related posts:

You may also like...

mapred Vs. mapreduce

Hadoop security practices

Hadoop Hive UDTF Tutorial – Extending Apache Hive with Table Functions

Leave a Reply Cancel reply

Tags

Archives