There are two ways we can write a UDF for Hive:
1) Simple UDF
2) GenericUDF
We can use a simple UDF when it accepts and returns simple primitive types of arguments like Text, IntWritable, LongWritable, DoubleWritable, etc. If you want a UDF that accepts complex types like array, List, Map, etc., you need a GenericUDF.
1) Simple UDF

A simple UDF extends the UDF class. For a simple UDF we need the import statement:

import org.apache.hadoop.hive.ql.exec.UDF;

We also need to add the hive-exec-*.jar from $HIVE_HOME/lib, along with the JARs required for running a Hadoop program from $HADOOP_HOME/share/hadoop/common and $HADOOP_HOME/share/hadoop/mapreduce.
Following is a sample UDF which accepts a String value from Hive and returns a String value to Hive:
package com.hive.udfs.common;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class SampleUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text("Hello " + input.toString());
    }
}
As you can see, the evaluate function contains the actual logic. This function is mandatory in a UDF.
To use this UDF in Hive we first need to add the JAR to our classpath as below:

add jar /home/user/Data/PROJECTS/Assignment/udf/TestUDF.jar;

This is the path where the JAR file is present.
After adding it to the classpath we need to create a temporary function pointing to the class in which we have written our UDF. In our case it is SampleUDF.

create temporary function test_udf as 'com.hive.udfs.common.SampleUDF';
Now the last step is to call this UDF from Hive:

hive> select test_udf(empname) from Employee;

And we are done!
2) Generic UDF
----------------------------------------------

A generic UDF extends the GenericUDF class. For a generic UDF we need the import statement:

import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;

We also need to add the hive-serde-*.jar from $HIVE_HOME/lib, along with the JARs required for running a Hadoop program from $HADOOP_HOME/share/hadoop/common and $HADOOP_HOME/share/hadoop/mapreduce.
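A GenericUDF must implement three methods: initialize (checks argument types and declares the return type via ObjectInspectors), evaluate (the actual logic), and getDisplayString (used in EXPLAIN output). As a minimal sketch, here is a hypothetical GenericUDF that returns the number of elements in an array column; the class name SampleGenericUDF and the array-length logic are illustrative assumptions, not part of the original article:

```java
package com.hive.udfs.common;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// Illustrative GenericUDF: returns the length of an array column.
public class SampleGenericUDF extends GenericUDF {

    // Saved in initialize() and reused for every row in evaluate().
    private ListObjectInspector listOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments)
            throws UDFArgumentException {
        // Validate the argument count and type once, before any rows are processed.
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("array_len takes exactly one argument");
        }
        if (!(arguments[0] instanceof ListObjectInspector)) {
            throw new UDFArgumentException("array_len expects an array argument");
        }
        listOI = (ListObjectInspector) arguments[0];
        // Declare that this UDF returns an int.
        return PrimitiveObjectInspectorFactory.javaIntObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object list = arguments[0].get();
        if (list == null) {
            return null;
        }
        return listOI.getListLength(list);
    }

    @Override
    public String getDisplayString(String[] children) {
        return "array_len(" + children[0] + ")";
    }
}
```

Registering and calling it works the same way as for a simple UDF: add the JAR, create a temporary function pointing to the class, and call it in a query.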