Hadoop streaming with Python
Want to write a Hadoop program in less than 5 minutes? Get in here for a quick check on how it’s done. We use Python and Hadoop streaming to complete the task.
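As a minimal sketch of what such a program can look like, here is a word count written for Hadoop streaming. The word-count task, the file name `wordcount.py`, and the `map`/`reduce` command-line argument are assumptions for illustration; the excerpt does not say which job the post builds. Streaming feeds records to each stage on stdin and reads tab-separated key/value lines from stdout, and the reducer relies on its input arriving sorted by key.

```python
#!/usr/bin/env python3
"""Word count for Hadoop streaming (illustrative sketch, not the post's own code)."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated (word, 1) record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Run a stage only when one is named on the command line: "map" or "reduce".
if __name__ == "__main__" and len(sys.argv) == 2:
    stage = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for record in stage(sys.stdin):
        print(record)
```

A job built on this would typically be submitted with the streaming jar shipped with your distribution (its path varies), passing `-files wordcount.py -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce"` along with `-input` and `-output` paths.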
Knowledge is Power
An attempt to build better-performing Redshift tables… Queries that have a lot of wait, DDL recommendations, etc.
So lately I got stumped by not being able to extract the DDL/table definition for a table in Redshift. Quick searches on the internet resulted in… the query below: SELECT * FROM pg_table_def WHERE tablename = 'table_name' AND schemaname =...
Querying Hive metastore tables can provide more in-depth details on the tables sitting in Hive. This article is a collection of queries that probe a Hive metastore configured with MySQL to get details like the list of transactional tables, etc. More...
Got into a situation where you were asked to extract Hive queries and the time they took to execute? Steps: on the log files, run the 2 extracts below. awk 'match($0, "^([^ ]+).*Completed executing command\\(queryId=([0-9a-z_-]+)\\); Time taken: (.*)", a) {print "COMPLETE\t" a[1]...
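For readers who prefer Python to awk, the visible pattern can be applied the same way. This is a sketch only: the regular expression is copied from the excerpt above, while the function name `extract_completed` and the sample log line are assumptions, and the excerpt's own output columns are truncated, so the function simply returns the three captured groups.

```python
import re

# Same pattern as the awk extract: first field, query id, and time taken.
COMPLETED = re.compile(
    r"^([^ ]+).*Completed executing command\(queryId=([0-9a-z_-]+)\); Time taken: (.*)"
)


def extract_completed(lines):
    """Yield (first_field, query_id, time_taken) for each matching log line."""
    for line in lines:
        match = COMPLETED.match(line.rstrip("\n"))
        if match:
            yield match.groups()


# Hypothetical log lines; the real layout depends on your logging configuration.
sample = [
    "2019-01-01T10:00:00,123 INFO ql.Driver: Completed executing "
    "command(queryId=hive_20190101100000_abc-123); Time taken: 1.5 seconds\n",
    "2019-01-01T10:00:01,000 INFO ql.Driver: some unrelated message\n",
]
for first_field, query_id, time_taken in extract_completed(sample):
    print("COMPLETE", first_field, query_id, time_taken, sep="\t")
```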
Below is a data storage estimator based on message size and throughput. Input the HDFS replication factor and the amount of data end users will generate over time using raw data. Hope this helps you. Data calculator Message size in bytes Message per...
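The arithmetic behind such an estimator can be sketched as follows. The calculator's actual inputs are cut off in the excerpt, so the per-second throughput, the retention period in days, and the function name `estimate_storage_bytes` are all assumptions; compression and filesystem overhead are ignored.

```python
def estimate_storage_bytes(message_size_bytes, messages_per_second,
                           retention_days, replication=3):
    """Return the estimated on-disk bytes for raw data kept for retention_days.

    Assumes a constant message rate and size, and multiplies the raw
    volume by the HDFS replication factor.
    """
    seconds = retention_days * 24 * 60 * 60
    raw_bytes = message_size_bytes * messages_per_second * seconds
    return raw_bytes * replication


# Example: 1,000-byte messages at 100 messages/second, kept for one day,
# with a replication factor of 3.
total = estimate_storage_bytes(1000, 100, retention_days=1, replication=3)
print(f"{total / 1e9:.2f} GB")
```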
Source: https://snakebite.readthedocs.io/en/latest/client.html Example: >>> from snakebite.client import Client >>> client = Client("localhost", 8020, use_trash=False) >>> for x in client.ls(['/']): ... print x Warning Many methods return generators, which means they need to be consumed to execute! Documentation will explicitly...
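The warning about generators is easy to miss, so here is a small sketch of the behaviour in plain Python. The `delete` function below is a hypothetical stand-in written for this example, not snakebite's implementation; it only illustrates that a generator-returning call does no work until something iterates over it.

```python
performed = []


def delete(paths):
    """Hypothetical stand-in for a client method that returns a generator."""
    for path in paths:
        performed.append(path)  # the side effect happens only during iteration
        yield {"path": path, "result": True}


pending = delete(["/tmp/a", "/tmp/b"])  # nothing has run yet
before = list(performed)                # still empty at this point
results = list(pending)                 # consuming the generator does the work

print(before)
print(performed)
```

Wrapping the call in `list(...)` or looping over it, as the `for x in client.ls(...)` example does, is what triggers execution.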
Okay, so msck repair is not working and you saw something like the below: 0: jdbc:hive2://hive_server:10000> msck repair table mytable; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) It seems to appear because of higher...
Source: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_administration/content/distcp_between_ha_clusters.html DistCp Between HA Clusters To copy data between HA clusters, use the dfs.internal.nameservices property in the hdfs-site.xml file to explicitly specify the name services belonging to the local cluster, while continuing to use the dfs.nameservices property to specify all of the name services in...
Bash function to export Hive table data to local CSV file Usage: hive_export_csv <db.table> <output.csv> [queue] Recommendation: Add to .bash_profile hive_export_csv () { if [ -z "$2" ]; then echo "Bad arguments. Usage: ${FUNCNAME[0]} <db.table> <output.csv> [queue]" else uuid=$(uuidgen)...