Hive query tips
- Date operations
- Data operations
- Headers in Beeline
- Unlock hive tables
- Check partitions used in hive query
Debugging Hive
- Long (query length) queries submitted to Hive
- Occurrence of thread printing in hiveserver2 log file
- Capture classes used in hiveserver2 log files
- Fetch Hive queries executed from hiveserver2.log file
- Hive query commands by count, by user, by type
- Errors and warning counts by type
Date operations
When date is stored is YYYY-MM-DD it makes date operations very easy. For example the date comparisons can use string comparison!.
Date comparison
where to_date(your_date_field) >= '2016-07-01' AND to_date(your_date_field) <= '2016-08-31'
Last X days
Use something similar like below in where clause
where to_date(your_date_field) >= date_sub(current_date, 7)
Data operations
Changing delimiter upon export
INSERT OVERWRITE DIRECTORY '/user/hadoop/output' ROW FORMAT DELIMITED FIELDS TERMINATED BY "sq|" SELECT * FROM graph_edges;
Headers in Beeline
To enable headers > !set showHeader false
> !set headerInterval 50
Unlock hive table
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager; unlock table tablename;
Check partitions used in hive query
> explain dependency <query that runs on a table with partitions>
Long queries submitted to Hive
Search in hiveserver2.log files in bash using awk
$ awk -F": " '/Starting command\(/ {print length ($3) "\t" $2}' hiveserver2.log | sort -nr | head -5
Occurence of thread printing in hiveserver2.log file
$ awk 'match($0,"(Thread-[0-9]+)",a) {print a[1]}' hiveserver2.log | sort | uniq -c | sort -n | less
Capture classes used in hiveserver2 log files
$ awk '/^20/ && $4 ~ /^org/ {print $1,$2,$3,$4 }'
Fetch Hive queries executed from hiveserver2.log file
$ awk '/Starting command\(/{print}' hiveserver2.log
Hive query commands by count, by user, by type
$ awk '/org.apache.hadoop.hive.metastore.HiveMetaStore.audit/ && match($0,/^([^ ]+) ([0-9][0-9]).* ugi=([a-z0-9]+).*cmd=([^:]+).*$/,a) { print a[1],a[2], a[3], a[4] }' hiveserver2.log | head -n 1000 | sort | uniq -c
Errors and warning counts by type
$ awk '/ERROR|WARN/ && match($0,/^([^ ]+) ([0-9][0-9])[^ ]+ ([A-Z]+)/,a) {print a[3],a[1], a[2]}' hiveserver2.log | sort | uniq -c
More tricks
- http://www.wmanalytics.io/blog/four-useful-tricks-working-hive
- Set number of reducers – http://stackoverflow.com/questions/8762064/hive-unable-to-manually-set-number-of-reducers
- Performance tuning Hive on Tez – https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html