Category: Bigdata

HiveAccessControlException Permission denied user [user] does not have [WRITE] privilege on …

Source: https://community.hortonworks.com/questions/112754/insert-overwrite-directory-beeline.html
Error
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [user] does not have [WRITE] privilege on [/tmp/*] (state=42000,code=40000)
The above error appears even though you’ve set up both the Ranger policies and the HDFS policies. You’ve checked everything and...

How a newline can ruin your Hive

Source: http://marcel.is/how-newline-can-ruin-your-hive/
If you do not fully understand how Hive/Impala stores your data, it might cost you badly.
Symptom #1: Weird values in ingested Hive table
You double-checked with select distinct(gender) from customers that the gender column in your source RDBMS really contains only values male, female and NULL....

Elasticsearch tips and tricks

Find record having max value for a field
Get latest record from Elasticsearch
Latest record with ES _timestamp value in results
Get record count from last x mins
Max value
GET http://elasticsearch-server:9200/my_index_name_*/_search?size=0
{ "aggs" : { "max_timestamp" : { "max"...
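
The request body in the excerpt above is cut off. As a rough sketch of the general shape of an Elasticsearch max aggregation, the body can be built in Python as shown here. The helper name max_agg_body and the field name "timestamp" are placeholders for illustration, since the field used in the original post is not visible in the excerpt.

```python
import json


def max_agg_body(field, agg_name="max_timestamp"):
    """Build the body of a max aggregation over `field`.

    `agg_name` is only the label under which the result is returned.
    """
    return {"aggs": {agg_name: {"max": {"field": field}}}}


# "timestamp" is a hypothetical field name, not taken from the original post.
body = max_agg_body("timestamp")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of the _search request; size=0 in the URL suppresses the hits so that only the aggregation result comes back.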

SmartSense SSL Troubleshooting

Source: https://docs.hortonworks.com/HDPDocuments/SS1/SmartSense-1.4.0/bk_installation/content/SSL_troubleshooting.html
SmartSense SSL Troubleshooting
SmartSense components use SSL for protecting communications between the HST server and agents, and between the HST server and SmartSense Gateway. If installation issues arise, you can reset these SSL certificates.
HST Server
To reset the...

Computing memory parameters for Namenode

Source: https://discuss.pivotal.io/hc/en-us/articles/203272527-Namenode-failed-while-loading-fsimage-with-GC-overhead-limit-exceeded
Namenode failed while loading fsimage with GC overhead limit exceeded
Problem
During startup, the NameNode failed to load the fsimage into memory:
2014-05-14 17:36:56,806 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /data/hadoop/nn/dfs/name/current/fsimage_0000000000252211550 using no compression
2014-05-14 17:36:56,806 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files =...

Public datasets

Source: https://github.com/caesar0301/awesome-public-datasets
Awesome Public Datasets
This list of public data sources is collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free; however, some are not. Other amazingly awesome lists can be found...

Parsing sqoop logs for stats analysis

The Python code below extracts statistics from a set of Sqoop log files for transfer analysis:
#!/usr/bin/env python
import fnmatch
import os
import datetime
def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if...
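
The excerpt above cuts off inside find_files. Here is a minimal sketch of how such a helper is commonly completed, assuming it yields the path of every file whose name matches a shell-style pattern. The rest of the original script is not shown here, so this is not necessarily the author's code, and the "*.log" pattern in the usage lines is a placeholder.

```python
import fnmatch
import os


def find_files(directory, pattern):
    """Yield paths under `directory` whose base name matches `pattern`."""
    for root, dirs, files in os.walk(directory):
        for basename in files:
            # Assumption: the truncated `if` filters on a glob-style pattern.
            if fnmatch.fnmatch(basename, pattern):
                yield os.path.join(root, basename)


if __name__ == "__main__":
    # The current directory and "*.log" are placeholder arguments.
    for path in find_files(".", "*.log"):
        print(path)
```

Writing the helper as a generator means the log files can be processed one at a time while the directory tree is still being walked.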

Hive datatype mappings

Hive type | Meaning | Teradata type | Meaning | .NET BCL type
TINYINT | 1-byte signed integer, from -128 to 127 | ByteInt | Represents an 8-bit (1-byte) signed integer. Range: -128 to 127 | System.Int16
SMALLINT | 2-byte signed integer, from -32,768 to 32,767 | SmallInt | Represents a 16-bit (2-byte) signed integer. Range: -32,768 to 32,767...
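
The ranges quoted in the mapping follow from the width of a two's-complement integer: an n-byte signed value spans -2^(8n-1) to 2^(8n-1)-1. The short sketch below checks the two rows visible in the excerpt; the helper name signed_range is made up for illustration.

```python
def signed_range(num_bytes):
    """Return (min, max) of a two's-complement signed integer of this width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Widths taken from the two rows shown in the excerpt.
for name, width in (("TINYINT", 1), ("SMALLINT", 2)):
    low, high = signed_range(width)
    print(f"{name}: {low:,} to {high:,}")
```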

Troubleshooting Hadoop services

Hive
Look up what killed the Hive server
$ grep --color=always -nr -B 1 -E 'Exception|Service:HiveServer2 is started|java.lang.OutOfMemoryError' /var/log/hive/hiveserver2.log | less -N
The command above searches the log file for exceptions and for HiveServer2 startups, and prints one line of context before each match....
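
For readers who prefer to post-process the log in a script, the same idea (find the matching lines and keep the line just before each one) can be sketched in Python. This is only an approximation of grep's -B 1 behaviour: it does not merge overlapping context or print group separators, and the function name matches_with_previous is invented for this example.

```python
import re

# Same three alternatives as the grep pattern; dots are escaped here so they
# match literally, which is slightly stricter than the shell command.
PATTERN = re.compile(
    r"Exception|Service:HiveServer2 is started|java\.lang\.OutOfMemoryError"
)


def matches_with_previous(lines):
    """Return (line_number, previous_line, line) for every matching line."""
    results = []
    previous = None
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            results.append((number, previous, line))
        previous = line
    return results
```

Passing an open file object for /var/log/hive/hiveserver2.log as lines would give the matches together with their 1-based line numbers, similar to what -n reports.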

Nested collections in Hive

1, 2 & 3 .. Let's go!
1. SHELL
echo "1345653,110909316904:1341894546|221065796761:1341887508" > /tmp/20170317_array_inputfile.txt
hdfs dfs -mkdir -p /tmp/20170317/array_test/input
hdfs dfs -put /tmp/20170317_array_inputfile.txt /tmp/20170317/array_test/input
rm /tmp/20170317_array_inputfile.txt
2. HIVE
drop table SAMPLE;
CREATE external TABLE SAMPLE( id BIGINT, record array<struct<col1:string,col2:string>> )row format...
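
To see how the sample input line lines up with the column type array<struct<col1:string,col2:string>>, here is a sketch that splits it in plain Python. The delimiters are an assumption read off the sample data (',' between fields, '|' between array items, ':' between struct members); the table's actual row format clause is cut off in the excerpt, and parse_line is a hypothetical helper name.

```python
def parse_line(line):
    """Split one input line into an id and a list of two-member structs."""
    # Assumed delimiters: ',' fields, '|' array items, ':' struct members.
    id_text, record_text = line.strip().split(",", 1)
    record = []
    for item in record_text.split("|"):
        col1, col2 = item.split(":", 1)
        record.append({"col1": col1, "col2": col2})
    return {"id": int(id_text), "record": record}


# The sample line written to the input file in the shell step.
sample = "1345653,110909316904:1341894546|221065796761:1341887508"
row = parse_line(sample)
print(row["id"], len(row["record"]), row["record"][0]["col1"])
```

Under these assumptions the line holds one id and an array of two structs, which is the nesting the Hive column definition describes.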