Tuesday, September 15, 2015

Writing Hive UDFs


There are two ways to write a UDF for Hive:

1) Simple UDF
2) GenericUDF

Use a simple UDF when your function accepts and returns simple primitive types of arguments like Text, IntWritable, LongWritable, DoubleWritable, etc.

If you want a UDF that accepts complex types such as array, List, Set, or Map, use a GenericUDF.

1) Simple UDF

A simple UDF extends the UDF class.

For a simple UDF we need the import statement:

import org.apache.hadoop.hive.ql.exec.UDF;

We also need to add hive-exec-*.jar from $HIVE_HOME/lib to the classpath, along with the JARs required for running a Hadoop program, from $HADOOP_HOME/share/hadoop/common and $HADOOP_HOME/share/hadoop/mapreduce.
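For example, the SampleUDF class shown below can be compiled and packaged like this (a sketch only; adjust the paths and JAR name to your setup):

$ javac -cp "$HIVE_HOME/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/mapreduce/*" com/hive/udfs/common/SampleUDF.java
$ jar cf TestUDF.jar com/hive/udfs/common/*.class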


The following is a sample UDF which accepts a string value from Hive and returns a string value to Hive:




package com.hive.udfs.common;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;


public class SampleUDF extends UDF {

    // evaluate() holds the UDF's logic; Hive calls it once per row.
    public Text evaluate(Text input) {
        if (input == null) return null;
        return new Text("Hello " + input.toString());
    }
}

As you can see, the evaluate() function contains the actual logic; every simple UDF must define it. Hive resolves evaluate() through reflection, so it can even be overloaded for different argument types, as in the sketch below.
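A minimal sketch of an overloaded simple UDF (the OverloadedUDF class and its logic are illustrative, not part of the example above):

package com.hive.udfs.common;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class OverloadedUDF extends UDF {

    // Picked for one string argument: select overloaded_udf(name) from t;
    public Text evaluate(Text input) {
        if (input == null) return null;
        return new Text(input.toString().toUpperCase());
    }

    // Picked for two int arguments: select overloaded_udf(a, b) from t;
    public IntWritable evaluate(IntWritable a, IntWritable b) {
        if (a == null || b == null) return null;
        return new IntWritable(a.get() + b.get());
    }
}

After registering it, a call on a string column dispatches to the first overload and a call on two int columns to the second.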

To use the SampleUDF in Hive, we first need to add its JAR to the classpath as below:

add jar /home/user/Data/PROJECTS/Assignment/udf/TestUDF.jar;

This is the path where the JAR file is present.

After adding it to the classpath, we need to create a temporary function pointing to the class in which we have written our UDF; in our case it is SampleUDF.

create temporary function test_udf as 'com.hive.udfs.common.SampleUDF';

Now the last step is to call this UDF from Hive:

hive> select test_udf(empname) from Employee;

And We are Done!!



2) Generic UDF
----------------------------------------------
A generic UDF extends the GenericUDF class.

For a generic UDF we need the import statement:

import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;


We also need to add hive-serde-*.jar from $HIVE_HOME/lib to the classpath, along with the JARs required for running a Hadoop program, from $HADOOP_HOME/share/hadoop/common and $HADOOP_HOME/share/hadoop/mapreduce.
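Unlike a simple UDF, a generic UDF works with ObjectInspectors rather than concrete Java types: initialize() validates the argument types and declares the return type, evaluate() computes the result from DeferredObjects, and getDisplayString() supplies the text shown in EXPLAIN output. Below is a minimal sketch of a generic UDF that returns the number of elements in an array column (the class name, function name, and column are illustrative):

package com.hive.udfs.common;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class ArraySizeUDF extends GenericUDF {

    private ListObjectInspector listOI;

    // Called once per query: validate the arguments and declare the return type.
    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("array_size() takes exactly one argument");
        }
        if (!(arguments[0] instanceof ListObjectInspector)) {
            throw new UDFArgumentException("array_size() expects an array argument");
        }
        listOI = (ListObjectInspector) arguments[0];
        return PrimitiveObjectInspectorFactory.javaIntObjectInspector;
    }

    // Called once per row: read the array through its ObjectInspector.
    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object list = arguments[0].get();
        if (list == null) return null;
        return listOI.getListLength(list);
    }

    // Text used for this call in EXPLAIN output.
    @Override
    public String getDisplayString(String[] children) {
        return "array_size(" + children[0] + ")";
    }
}

It is added and registered exactly like a simple UDF; assuming Employee has an array column phone_numbers:

add jar /home/user/Data/PROJECTS/Assignment/udf/TestUDF.jar;
create temporary function array_size as 'com.hive.udfs.common.ArraySizeUDF';
select array_size(phone_numbers) from Employee;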

Monday, May 25, 2015

How to upgrade R and RStudio


Type the commands below on the command line:

sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libxml2-dev
sudo add-apt-repository ppa:marutter/rrutter
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install r-base r-base-dev

Now restart your RStudio; all packages should work.

Note:
If you face any problem with a package, then type:

install.packages("package name here", dependencies=TRUE)

Friday, May 22, 2015

RHive Setup and Sample Code

Configuration:

install.packages("Rserve")
install.packages("rJava")
install.packages("RHive")
library(Rserve)
library(rJava)
library(RHive)
Sys.setenv(HADOOP_HOME="/home/ravi/apache/hadoop-1.0.4")
Sys.setenv(HIVE_HOME="/home/ravi/apache/hive-0.12.0/")
rhive.env()
rhive.init()
rhive.connect(hiveServer2=FALSE)


If rJava doesn't get installed properly, try the commands below:

sudo apt-get install r-cran-rjava
 
sudo updatedb && locate libjvm.so
 
and then try:

install.packages("rJava")

If after that the error is:

Error: java.io.IOException: Mkdirs failed to create file:/home/rhive/lib/2.0-0.2


Sys.setenv("RHIVE_FS_HOME"="your RHive installation directory here e.g. /home/rhive")
This needs to be a local directory on the node where Hive is installed; create one if it doesn't exist. The user you created (rstudio) must have ownership (chown -R) of this local directory.
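For example, a minimal setup matching the directory and user above (adjust both to your machine):

$ sudo mkdir -p /home/rhive
$ sudo chown -R rstudio /home/rhive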

For other errors, visit the link below:
http://dailyitsolutions.blogspot.in/2014_12_01_archive.html

Problem 2:
RHive works well with simple queries but does not work with complex queries or queries with aggregate SQL functions.
We get the following error on the R console (RStudio):
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask 
OR
the following exception on the HiveServer2 console:
org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode="/tmp/hadoop-yarn/staging/anonymous/.staging":trendwise:supergroup:drwx------

Solution:
To resolve the above problem, we need to change the permissions of the /tmp/hadoop-yarn folder. To do so, type:

$ hadoop fs -chmod -R 777 /tmp/hadoop-yarn

 Sample Code:


library(Rserve)
library(RHive)
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
Sys.setenv(HIVE_HOME="/usr/local/hive")
Sys.setenv("RHIVE_FS_HOME"="/home/trendwise/R/x86_64-pc-linux-gnu-library/3.2/RHive")
rhive.env()
rhive.init()
rhive.connect(hiveServer2=TRUE)
rhive.query("use database_name")
rhive.query("select * from table");





Wednesday, May 13, 2015

Installation of Pig

Download Pig from the Apache Pig website.
Go to the directory where it was downloaded:

$ cd Downloads


Untar the downloaded tarball with the command below, or right-click and choose Extract Here:

$ tar -xvzf pig-0.14.0.tar.gz



Create the directory where you want to install Pig and move the extracted folder into it. In my case it is /usr/local/pig.

$ sudo mkdir -p /usr/local/pig
$ sudo mv pig-0.14.0 /usr/local/pig



Set the PIG_HOME path in the .bashrc file:

 $ vim.tiny ~/.bashrc



Append the lines below at the end of the file:


export PIG_HOME=/usr/local/pig/pig-0.14.0
export PATH=$PATH:$PIG_HOME/bin 


Restart the terminal and run:

$ pig

And We are Done!!



By default, Pig creates logs in the current directory. If we want them created in a specific directory, we create it and set the corresponding property in $PIG_HOME/conf/pig.properties.

Create a directory for the logs wherever you wish; I created mine in /usr/local/pig:

$ mkdir /usr/local/pig/logs

Go to the conf directory in $PIG_HOME:


$ cd $PIG_HOME/conf

Open pig.properties and set the path

$ sudo vim.tiny pig.properties 

Uncomment the pig.logfile property and set it to the new directory:

pig.logfile=/usr/local/pig/logs





Tuesday, May 12, 2015

Installation of Hive

Download Hive from the Apache Hive website.


Go to the directory where it was downloaded.

Untar the downloaded Hive with the command below, or right-click and extract:

$ tar -xzvf apache-hive-1.2.0-bin.tar.gz
Move the folder from Downloads to /usr/local, or to your preferred folder.
I prefer /usr/local because my Hadoop is also installed under /usr/local.
 
$ sudo mv apache-hive-1.2.0-bin /usr/local/hive
 
Set the environment variable HIVE_HOME
 
 $ vim.tiny ~/.bashrc

Add the following lines at the bottom of the file:
# Set HIVE_HOME
export HIVE_HOME="/usr/local/hive"
export PATH=$PATH:$HIVE_HOME/bin

Close and restart the terminal, then run:


$ hive
We are Done!!