Tagged: bigdata

Computing memory parameters for Namenode

Source: https://discuss.pivotal.io/hc/en-us/articles/203272527-Namenode-failed-while-loading-fsimage-with-GC-overhead-limit-exceeded
Namenode failed while loading fsimage with GC overhead limit exceeded
Problem: During startup, the namenode failed to load the fsimage into memory.
2014-05-14 17:36:56,806 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /data/hadoop/nn/dfs/name/current/fsimage_0000000000252211550 using no compression
2014-05-14 17:36:56,806 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files =...

Public datasets

Source: https://github.com/caesar0301/awesome-public-datasets
Awesome Public Datasets
This list of public data sources is collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free; however, some are not. Other amazingly awesome lists can be found...

Parsing sqoop logs for stats analysis

The Python code below extracts statistics from a set of Sqoop log files for transfer analysis:
#!/usr/bin/env python
import fnmatch
import os
import datetime

def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if...
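The excerpt above is cut off, so what follows is only a minimal sketch of the same idea, not the original script. It assumes the log files are found by a glob pattern on the file name, and that each log contains a summary line of the form "Transferred <size> <unit> in <seconds> seconds". The regular expression and the parse_transfer helper are hypothetical and should be adjusted to match your actual logs.

```python
import fnmatch
import os
import re

# Assumed shape of the Sqoop summary line; adjust to match your logs.
TRANSFER_RE = re.compile(
    r"Transferred (?P<size>[\d.]+) (?P<unit>\w+) in (?P<seconds>[\d.]+) seconds"
)


def find_files(directory, pattern):
    """Yield paths under directory whose file name matches the glob pattern."""
    for root, _dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                yield os.path.join(root, basename)


def parse_transfer(line):
    """Return size, unit and duration from a summary line, or None if absent."""
    match = TRANSFER_RE.search(line)
    if match is None:
        return None
    return {
        "size": float(match.group("size")),
        "unit": match.group("unit"),
        "seconds": float(match.group("seconds")),
    }
```

A caller would loop over find_files(log_dir, "*.log"), read each file line by line, and collect the non-None results of parse_transfer for analysis.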

Hive datatype mappings

Hive | Meaning | Teradata | Meaning | .NET BCL Type
TINYINT | 1-byte signed integer, from -128 to 127 | ByteInt | Represents an 8-bit (1-byte) signed integer. Range: -128 to 127 | System.Int16
SMALLINT | 2-byte signed integer, from -32,768 to 32,767 | SmallInt | Represents a 16-bit (2-byte) signed integer. Range: -32,768 to 32,767...
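The ranges quoted above follow from the storage width: an n-byte signed (two's complement) integer spans -2^(8n-1) to 2^(8n-1)-1. The short sketch below reproduces the two ranges shown; the helper name signed_range is hypothetical.

```python
def signed_range(num_bytes):
    """Return (min, max) for a two's complement signed integer of this width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, width in [("TINYINT", 1), ("SMALLINT", 2)]:
    low, high = signed_range(width)
    print(f"{name}: {low:,} to {high:,}")
```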

Nested collections in Hive

1, 2 & 3... Let's go!
1. SHELL
echo "1345653,110909316904:1341894546|221065796761:1341887508" > /tmp/20170317_array_inputfile.txt
hdfs dfs -mkdir -p /tmp/20170317/array_test/input
hdfs dfs -put /tmp/20170317_array_inputfile.txt /tmp/20170317/array_test/input
rm /tmp/20170317_array_inputfile.txt
2. HIVE
drop table SAMPLE;
CREATE external TABLE SAMPLE( id BIGINT, record array<struct<col1:string,col2:string>> )row format...
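To see how the sample input line maps onto the id and array<struct<col1,col2>> columns, here is a small Python sketch. The row format clause is cut off in the excerpt, so the delimiters are an assumption read off the sample data: "," between fields, "|" between array items, and ":" between struct members. The function parse_record is hypothetical and only mirrors what Hive would do with those delimiters.

```python
def parse_record(line, field_sep=",", item_sep="|", struct_sep=":"):
    """Split one input line into (id, list of {col1, col2} structs)."""
    raw_id, raw_items = line.rstrip("\n").split(field_sep, 1)
    record = []
    for item in raw_items.split(item_sep):
        col1, col2 = item.split(struct_sep, 1)
        record.append({"col1": col1, "col2": col2})
    return int(raw_id), record


sample = "1345653,110909316904:1341894546|221065796761:1341887508"
row_id, record = parse_record(sample)
print(row_id, record)
```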

Connecting to Apache Phoenix

The syntax for connecting to HBase using sqlline through Phoenix is as follows:
./sqlline.py 10.10.20.60:2181:/hbase-unsecure
10.10.20.60 – my ZooKeeper host
2181 – ZooKeeper client port
hbase-unsecure – value configured as zookeeper.znode.parent in hbase-site.xml
Sample query:
/usr/hdp/current/phoenix-client/bin/psql.py zookeeperserver:2181:/hbase-unsecure /usr/hdp/current/phoenix-client/doc/examples/WEB_STAT.sql...
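The connection argument is three parts joined by colons: ZooKeeper host, client port, and znode parent. As an illustration only, the sketch below assembles and splits such a string; both helper names are hypothetical, and the defaults simply reuse the values from the example above.

```python
def build_phoenix_target(zk_host, zk_port=2181, znode_parent="/hbase-unsecure"):
    """Join host, port and znode parent into the host:port:/znode form."""
    if not znode_parent.startswith("/"):
        znode_parent = "/" + znode_parent
    return f"{zk_host}:{zk_port}:{znode_parent}"


def split_phoenix_target(target):
    """Split host:port:/znode back into its three parts."""
    host, port, znode = target.split(":", 2)
    return host, int(port), znode


print(build_phoenix_target("10.10.20.60"))
```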

Hive Vertex failure

Vertex failure while running Hive queries? Let's see what can be done.
If you are not sure of the cause, change hive.fetch.task.conversion=more; to hive.fetch.task.conversion=none;
Was the data on HDFS stored in ORC files, and is the error similar to the one below?
Vertex failed, vertexName= at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java: )
Try changing...

Fixing large mysql ibdata1 resulting from ranger audits

Table Partitioning in MySQL (version 5.1.6 or above)
Note: Before starting the backup/restore, please stop all running applications that use the XA_ACCESS_AUDIT table. This helps keep a snapshot of XA_ACCESS_AUDIT for a particular timestamp.
Table Partitioning in MySQL: Partitioned tables created...

Developing A Custom Apache Nifi Processor (JSON)

Source: http://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/
Developing A Custom Apache Nifi Processor (JSON)
Feb 7, 2015 • Phillip Grenier
The list of available Apache Nifi processors is extensive, as documented in this post. There is still a need to develop your own; to pull...

Hadoop Hive UDTF Tutorial – Extending Apache Hive with Table Functions

Source: http://beekeeperdata.com/posts/hadoop/2015/07/26/Hive-UDTF-Tutorial.html
Author: Matthew Rathbone
Co-author: Elena Akhmatova
Article: Hadoop Hive UDTF Tutorial – Extending Apache Hive with Table Functions
While working with both Primitive types and Embedded Data Structures was discussed in part one, the UDF interfaces are limited to...