Tagged: hive

Hadoop Hive UDTF Tutorial – Extending Apache Hive with Table Functions

Source: http://beekeeperdata.com/posts/hadoop/2015/07/26/Hive-UDTF-Tutorial.html Author: Matthew Rathbone, Co-author: Elena Akhmatova. Working with both primitive types and embedded data structures was discussed in part one, but the UDF interfaces are limited to...
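As a quick reminder of what a table function buys you, here is a minimal HiveQL sketch using the built-in explode() UDTF; a custom UDTF written by following the tutorial is invoked the same way (the people table and its hobbies array column are made-up examples).

-- A UDF emits one value per input row; a UDTF can emit zero or more rows per input row.
SELECT p.name, hobby
FROM people p
LATERAL VIEW explode(p.hobbies) h AS hobby;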

How to create a Hive UDF in Scala

Source: https://community.hortonworks.com/articles/42695/how-to-create-a-hive-udf-in-scala.html This article focuses on creating a custom Hive UDF in the Scala programming language. IntelliJ IDEA 2016 was used to create the project and artifacts. Creation and testing of the UDF was performed on the Hortonworks...
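Once the Scala UDF is packaged into a jar, wiring it into Hive follows the usual pattern sketched below; the jar path, class name, and table are placeholders rather than the article's actual artifacts.

ADD JAR /tmp/scala-udf-example.jar;                               -- placeholder jar path
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.MyUpperUDF';   -- placeholder Scala class
SELECT my_upper(name) FROM employees LIMIT 10;                    -- placeholder table/column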

Hive on Tez Performance Tuning – Determining Reducer Counts

Source: https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html Short description: some practical steps in Hive on Tez tuning. How does Tez determine the number of reducers, and how can I control this for performance? In this article, I will attempt to answer this while executing and tuning...
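As a quick reference, the settings below are the ones usually involved in that question; the numeric values here are only illustrative, so see the article for how to pick them.

set hive.exec.reducers.bytes.per.reducer=268435456;   -- target data per reducer (illustrative: 256 MB)
set hive.exec.reducers.max=1009;                       -- hard cap on the reducer count
set hive.tez.auto.reducer.parallelism=true;            -- let Tez shrink the reducer count at runtime
set mapreduce.job.reduces=32;                          -- or pin an explicit count (illustrative)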

Hive query tips

Date operations
Data operations
Headers in Beeline
Unlock Hive tables
Check partitions used in a Hive query
Debugging Hive: long (query length) queries submitted to Hive, occurrence of thread printing in the hiveserver2 log file, capture classes used in the hiveserver2 log...
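A few of these tips boil down to one-liners; the statements below are generic illustrations rather than excerpts from the post (the sales table and its ds partition column are made up).

set hive.cli.print.header=true;                        -- column headers in the Hive CLI
-- In Beeline the equivalent is: !set showHeader true
SELECT date_add(current_date, 7), date_sub(current_date, 7);   -- simple date operations
SHOW LOCKS sales;                                      -- inspect locks held on a table
UNLOCK TABLE sales;                                    -- release a stuck lock
EXPLAIN DEPENDENCY SELECT count(*) FROM sales WHERE ds = '2016-01-01';  -- lists the partitions the query will read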

Good looking .hiverc file

Following is the .hiverc from one of the Hadoop environments I work on:

-- additional .jar includes like the one below
add jar hdfs://ualprod/tmp/json-serde-1.3.7-jar-with-dependencies.jar;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.auto.convert.join.noconditionaltask=true;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.dynamic.partitions.pernode=10000;
-- large mem??
set hive.tez.container.size=10240;
...

Hive ORC files – Pro Tips

Extract text from ORC files (source). Hive (0.11 and up) ships with an ORC file dump utility, which can be invoked with the following command:
$ hive --orcfiledump <location-of-orc-file>
Create a Hive table definition using ORC files on HDFS:
$ hive --orcfiledump...
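The second tip usually ends in a DDL statement built from the dumped schema; a minimal sketch over made-up columns and an assumed HDFS path:

-- External table over ORC files already sitting on HDFS (schema and location are illustrative)
CREATE EXTERNAL TABLE web_logs (
  ip  string,
  ts  timestamp,
  url string
)
STORED AS ORC
LOCATION '/data/warehouse/web_logs';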

Performance of Hive tables with Parquet & ORC

Source: http://stackoverflow.com/questions/32373460/parquet-vs-orc-vs-orc-with-snappy
Datasets:
Table A - Text file format - 2.5 GB
Table B - ORC - 652 MB
Table C - ORC with Snappy - 802 MB
Table D - Parquet - 1.9 GB
Parquet was worst as far as compression for my table...
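For context, table definitions along these lines produce the storage formats being compared; the table names and the source text table are placeholders, not the ones from the question.

CREATE TABLE logs_orc STORED AS ORC AS SELECT * FROM logs_text;            -- ORC, default ZLIB compression
CREATE TABLE logs_orc_snappy STORED AS ORC
  TBLPROPERTIES ("orc.compress"="SNAPPY") AS SELECT * FROM logs_text;      -- ORC with Snappy
CREATE TABLE logs_parquet STORED AS PARQUET AS SELECT * FROM logs_text;    -- Parquet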

Can’t connect Excel to Hive using ODBC driver on MAC

So you've done everything right and still can't connect Excel to Hive using the ODBC driver on macOS? Let's see what is going on. Are you running El Capitan or Sierra? Well, I was running Sierra and tried connecting before while...

Connecting SQuirrel SQL to Hive

Prerequisites: in order to connect the SQuirrel SQL client we need the following:
Client: http://squirrel-sql.sourceforge.net/
Hive connection JARs (found in the lib directories): the Hive JDBC JAR (hive-jdbc-1.2.1-standalone.jar) and the Hadoop common JAR (hadoop-common-2.7.2.jar)
A running HiveServer2 instance
For connections use the following...

Creating Hive tables on compressed files

Stuck with creating Hive tables on compressed files? Well, the documentation on apache.org suggests that Hive natively supports compressed files: https://cwiki.apache.org/confluence/display/Hive/CompressedStorage. Let's try that out. Store a Snappy-compressed file on HDFS... thinking, I do not have such a file... Wait!...
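The upshot of the exercise is that a plain text table definition works unchanged over compressed files, since Hive picks the codec from the file extension; a minimal sketch with made-up columns and path:

-- Same DDL as for uncompressed delimited text; the LOCATION may hold e.g. *.csv.gz files
CREATE EXTERNAL TABLE events_raw (
  event_id string,
  payload  string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/events';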