Published: 2018-08-12 20:01:00
By ytwan
In Big Data .
tags: things
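Overview
UDF: the simplest kind of user-defined function. It is one-to-one: one input row produces one output row.
UDAF: a user-defined aggregate function. It is many-to-one: multiple input rows produce one output row.
UDTF: a user-defined table-valued function. It turns one input row into multiple output rows; it is not covered this time.
MaxCompute's UDFs come in three kinds: UDF, UDAF, and UDTF. The three are usually referred to collectively as UDFs.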
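To make the three row shapes concrete, here is a minimal sketch in plain Java. It does not use the Hive or MaxCompute APIs; the class RowShapes and its method names are hypothetical and only illustrate the input/output cardinality of each kind of function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: plain Java, not the Hive or MaxCompute UDF API.
final class RowShapes {
    // UDF shape: one input row -> one output row.
    static Integer udfLength(String row) {
        return row == null ? null : row.length();
    }

    // UDAF shape: many input rows -> one output row.
    static int udafTotalLength(List<String> rows) {
        int total = 0;
        for (String row : rows) {
            if (row != null) {
                total += row.length();
            }
        }
        return total;
    }

    // UDTF shape: one input row -> many output rows.
    static List<String> udtfSplit(String row) {
        if (row == null) {
            return new ArrayList<>();
        }
        return Arrays.asList(row.split(","));
    }

    public static void main(String[] args) {
        System.out.println(udfLength("abc"));
        System.out.println(udafTotalLength(Arrays.asList("ab", "cde")));
        System.out.println(udtfSplit("a,b,c"));
    }
}
```

In a real engine the rows arrive from a table rather than a List, but the cardinality of each shape is the same.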
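1. Main UDF development workflow:
Write and test the code in your local Java environment, then build a jar,
add the jar as a resource through the console or the big data development suite,
and finally register it as a function.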
2. Detailed steps
01. Declare the dependencies in pom.xml.
02. Write a UDF class that extends UDF and implements evaluate.
03. Package the project as a .jar file and upload it to the cluster server.
04. Add the jar to Hive:
add jar /test/program/hiveudf/function.udf-stringLen.jar;
05. Register the custom function:
create temporary function strlen as '******';
06. Use the custom function:
select distinct id, strlen(detail) from testinfo;
07. Once you are done, you can drop the custom function:
drop temporary function strlen;
Put together, the whole session looks like this:
add jar /test/program/hiveudf/function.udf-stringLen.jar;
create temporary function strlen as 'com.test.hive.StringLen';
select distinct id, strlen(detail) from testinfo;
drop temporary function strlen;
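The statements above register com.test.hive.StringLen, but the class itself is not shown. A minimal sketch of what its evaluate logic might look like follows, written as plain Java so it runs without hive-exec on the classpath. The class name StringLenSketch is hypothetical, and counting Unicode code points rather than UTF-16 chars is an assumption; in a real Hive UDF the class would extend org.apache.hadoop.hive.ql.exec.UDF and live in the package named in the create function statement.

```java
// Hypothetical sketch of the logic behind strlen(detail); not the author's class.
final class StringLenSketch {
    // Mirrors the evaluate(...) convention: NULL in, NULL out.
    static Integer evaluate(String s) {
        if (s == null) {
            return null;
        }
        // Count code points so that a character outside the BMP counts once.
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(evaluate("hello"));
        System.out.println(evaluate(null));
    }
}
```

Returning Integer instead of int lets the method pass a NULL column value through as NULL.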
3. Developing a Hive UDF
Dependencies
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.0</version>
</dependency>
Code:
import org.apache.hadoop.hive.ql.exec.UDF;
import java.net.URLDecoder;

// Hive UDF that URL-decodes a string column.
public final class UrlDecode extends UDF {
    // Hive calls evaluate once per input row; a NULL input yields NULL.
    public String evaluate(final String s) {
        if (s == null) { return null; }
        return getString(s);
    }

    // Decodes s as UTF-8; returns an empty string if the input cannot be decoded.
    public static String getString(String s) {
        String a;
        try {
            a = URLDecoder.decode(s, "utf-8");
        } catch (Exception e) {
            a = "";
        }
        return a;
    }

    // Quick local check outside Hive.
    public static void main(String args[]) {
        String t = "%e5%a4%a9%e5%a4%a9%e6%9c%89%e5%a5%bd%e8%bf%90";
        System.out.println(getString(t));
    }
}
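The decoding logic can be checked locally before the jar is packaged. The sketch below copies the fallback behavior into a hypothetical class UrlDecodeCheck (no Hive dependency) and exercises three cases: a UTF-8 percent-encoded string, a plus sign, and malformed input, which java.net.URLDecoder rejects with an IllegalArgumentException.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical local check; same fallback behavior as the UDF, without hive-exec.
final class UrlDecodeCheck {
    static String decode(String s) {
        if (s == null) {
            return null;
        }
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            // Malformed escapes such as "%zz" end up here.
            return "";
        }
    }

    public static void main(String[] args) {
        // Percent-encoded UTF-8 bytes of a five-character Chinese string.
        System.out.println(decode("%e5%a4%a9%e5%a4%a9%e6%9c%89%e5%a5%bd%e8%bf%90"));
        // '+' is decoded to a space.
        System.out.println(decode("a+b"));
        // Malformed input falls back to the empty string.
        System.out.println(decode("%zz").isEmpty());
    }
}
```

One design point to weigh: mapping every failure to an empty string makes bad input indistinguishable from a genuinely empty value, so returning null for undecodable rows may be easier to filter on later.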
4. MaxCompute UDF
Dependencies
<dependency>
  <groupId>com.aliyun.odps</groupId>
  <artifactId>odps-sdk-udf</artifactId>
  <version>0.20.7</version>
</dependency>
Code
package <package name>;
import com.aliyun.odps.udf.UDF;

// MaxCompute UDF that lowercases a string; a NULL input yields NULL.
public final class Lower extends UDF {
    public String evaluate(String s) {
        if (s == null) { return null; }
        return s.toLowerCase();
    }
}
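One detail worth knowing about the Lower example: String.toLowerCase() without arguments uses the JVM's default locale, so the result can vary with the machine's settings. The sketch below (plain Java, hypothetical class LowerSketch, no MaxCompute dependency) pins the locale to Locale.ROOT; whether that is the right choice for your data is an assumption to check.

```java
import java.util.Locale;

// Hypothetical sketch; in MaxCompute the class would extend com.aliyun.odps.udf.UDF.
final class LowerSketch {
    static String evaluate(String s) {
        if (s == null) {
            return null;
        }
        // Locale.ROOT gives locale-independent lowercasing.
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("TITLE"));
        // Under a Turkish locale, 'I' lowercases to the dotless i (U+0131).
        System.out.println("TITLE".toLowerCase(Locale.forLanguageTag("tr")));
    }
}
```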
References:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
http://bigdatums.net/2016/11/13/how-to-decode-urls-in-hive/
https://sematext.com/opensee/m/Hive/h4wBF18pS9m1aZX541?subj=Re:+slow+performance+when+using+udf
Developing and debugging UDFs
https://help.aliyun.com/document_detail/50902.html?spm=a2c4g.11186623.2.22.19ed75c4kTIYst
Java UDF development
https://help.aliyun.com/document_detail/27811.html?spm=a2c4g.11186623.4.2.35fb6982UC817p