Hive UDF Development Examples

Basic concepts
UDF: the simplest kind of custom function, one-to-one: one row of input produces one row of output.
UDAF: a user-defined aggregate function, many-to-one: multiple rows of input produce one row of output (a minimal sketch follows below).
UDTF: one row of input produces multiple rows of output; not covered this time.
MaxCompute UDFs include three kinds of functions: UDF, UDAF, and UDTF. These three are usually referred to collectively as UDFs.
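
Since only UDFs are demonstrated in this article, here is a minimal UDAF sketch, assuming the older org.apache.hadoop.hive.ql.exec.UDAF / UDAFEvaluator API that ships with hive-exec 1.2.0 (deprecated in newer Hive releases); the MaxInt class and its logic are just an example:

    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

    // Illustrative UDAF: aggregates many rows into one value (the maximum).
    public final class MaxInt extends UDAF {

        public static class MaxIntEvaluator implements UDAFEvaluator {
            private Integer result;

            public MaxIntEvaluator() { init(); }

            // Reset the aggregation state.
            public void init() { result = null; }

            // Called once per input row.
            public boolean iterate(Integer value) {
                if (value != null) {
                    result = (result == null) ? value : Math.max(result, value);
                }
                return true;
            }

            // Partial result sent from map tasks to reduce tasks.
            public Integer terminatePartial() { return result; }

            // Merge a partial result produced by another task.
            public boolean merge(Integer other) { return iterate(other); }

            // Final result of the aggregation.
            public Integer terminate() { return result; }
        }
    }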

1. Main workflow for UDF development:

Write and test the code in your local Java environment and build a jar,
then add the jar as a resource through the console or the big data development suite,
and finally register it as a function.

2. Detailed steps

01. Declare the dependencies in pom.xml.
02. Write a UDF class that extends UDF and implements evaluate.
03. Package the project as a .jar and upload it to the cluster server.
04. Add the jar to Hive:
    add jar /test/program/hiveudf/function.udf-stringLen.jar;
05. Create the custom function:
    create temporary function urldecode as '******';
06. Use the custom function:
    select distinct id, strlen(detail) from testinfo;
07. When finished, drop the custom function:
    drop temporary function strlen;

Putting it all together:

    add jar /test/program/hiveudf/function.udf-stringLen.jar;
    create temporary function strlen as 'com.test.hive.StringLen';
    select distinct id, strlen(detail) from testinfo;
    drop temporary function strlen;
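
The commands above register the class com.test.hive.StringLen as strlen, but that class itself is not shown; a minimal sketch of what it could look like (hypothetical implementation, assumed here to simply return the string length):

    package com.test.hive;

    import org.apache.hadoop.hive.ql.exec.UDF;

    // Hypothetical sketch of the StringLen class registered above as strlen:
    // returns the character length of the input, or NULL for NULL input.
    public final class StringLen extends UDF {
        public Integer evaluate(final String s) {
            if (s == null) { return null; }
            return s.length();
        }
    }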

3. Hive UDF development walkthrough

  Dependencies (pom.xml):
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.0</version>
    </dependency>
  Code:
    import org.apache.hadoop.hive.ql.exec.UDF;

    import java.net.URLDecoder;

    public final class UrlDecode extends UDF {

        // Hive calls evaluate() once per input row; NULL input yields NULL output.
        public String evaluate(final String s) {
            if (s == null) { return null; }
            return getString(s);
        }

        // Decode a URL-encoded string as UTF-8; return "" if decoding fails.
        public static String getString(String s) {
            String a;
            try {
                a = URLDecoder.decode(s, "utf-8");
            } catch (Exception e) {
                a = "";
            }
            return a;
        }

        // Simple local test of the decoding logic.
        public static void main(String[] args) {
            String t = "%e5%a4%a9%e5%a4%a9%e6%9c%89%e5%a5%bd%e8%bf%90";
            System.out.println(getString(t));
        }
    }

4. MaxCompute UDF

Dependencies (pom.xml):
    <dependency>
        <groupId>com.aliyun.odps</groupId>
        <artifactId>odps-sdk-udf</artifactId>
        <version>0.20.7</version>
    </dependency>
Code:
    package <package name>;

    import com.aliyun.odps.udf.UDF;

    public final class Lower extends UDF {
        // Called once per input value; NULL input yields NULL output.
        public String evaluate(String s) {
            if (s == null) { return null; }
            return s.toLowerCase();
        }
    }
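
A MaxCompute Java UDF can also declare several evaluate methods with different signatures, and the engine picks the overload that matches the argument types. A minimal sketch (the Concat class name and logic are just an example):

    import com.aliyun.odps.udf.UDF;

    // Illustrative sketch: one UDF class with two evaluate overloads.
    public final class Concat extends UDF {

        // Chosen when the SQL arguments are strings.
        public String evaluate(String a, String b) {
            if (a == null || b == null) { return null; }
            return a + b;
        }

        // Chosen when the SQL arguments are bigints.
        public Long evaluate(Long a, Long b) {
            if (a == null || b == null) { return null; }
            return a + b;
        }
    }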

References:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
http://bigdatums.net/2016/11/13/how-to-decode-urls-in-hive/
https://sematext.com/opensee/m/Hive/h4wBF18pS9m1aZX541?subj=Re:+slow+performance+when+using+udf
Developing and debugging UDFs:
https://help.aliyun.com/document_detail/50902.html?spm=a2c4g.11186623.2.22.19ed75c4kTIYst
Java UDF development:
https://help.aliyun.com/document_detail/27811.html?spm=a2c4g.11186623.4.2.35fb6982UC817p
