数据分析中常用的SQL语句

主要目的

汇总一些分析型SQL操作,为数据分析师们做一些基本的命令总结。包括关系型以及非关系型中的一些数据库,MySQL-Oracle--Hbase-Hive-Spark SQL

数据分析中常用的SQL语句

关系型数据库SQL--MySQL和Orcale
基本查看--查看数据库。表名和表结构-表数据和总数
    show status
    show databases;
    use   your_database;
    show tables;
    DESC your_table;

查询单列或者多列
    select user_birthday from user
查看行以及行数
    select * from your_table limit 3;
        计数:select count(*) from your_table  
排序
    1.只排一列
    select user_birthday from user order by user_birthday;
    select user_birthday from user order by user_birthday DESC;  
    2.多列排序,升序ASC是默认的
    select * from user order by user_id,user_birthday;
    select * from user order by user_id DESC,user_birthday;
    select * from user order by user_id ,user_birthday DESC;

筛选以及过滤
    过滤行以及查找  where  in  not  like
    select user_birthday from user where id>3;
    select user_birthday from user where id like '3%';
去重
     select distinct  user_birthday from user

字符串和数值操作
    拼接:select   name||id   as tiele from tableA ; 
    求和:select sum(field1) as sumvalue from tableA
    平均:select avg(field1) as avgvalue from tableA
    最大:select max(field1) as maxvalue from tableA
    最小:select min(field1) as minvalue from tableA

分组和汇总
    select   count(*)       from tableA  group by  sex;
    select   id,count(*)    from tableA  group by  sex;
    过滤分组
    select   id,count(*)  from tableA  group by  sex having count(*)>2;

嵌套查询
    select   name,sex,id  from tableA where id  in (select   id  from tableB where id >3)

联结查询--以列为单位对表进行联结
    select   name,sex,id  from tableA ,tableB  where tableA.id = tableB.id;
    select   name,sex,id  from tableA  inner join    tableB  on tableA.id = tableB.id;
    select   name,sex,id  from tableA ,tableB,tableC  where tableA.id = tableB.id and  tableC.hot = tableB.hot;

组合查询--以行位单位对表进行操作
    select   name,sex,id  from tableA   union       select   name,sex,id  from tableB  where tableB.id>3;
    select   name,sex,id  from tableA   union all   select   name,sex,id  from tableB  where tableB.id>3;

使用索引以及存储过程
说明
    join 是两张表做交连后里面条件相同的部分记录产生一个记录集, 
    union是产生的两个记录集(字段要一样的)并在一起,成为一个新的记录集

帮助系统
    help
    help index
    ?index
退出
    exit
注意:
    MySQL和Orcale有不一样的地方
    除了多表关联查询,子查询以及表集合查询外,还有层次查询

Hive中基本的数据分析型语句

HQL---进入Hive中类 SQL 查询语言-关系型分析查询作业 
        进入
            su  hdfs
            hbase shell
查看已存在的表
    show databases;
    show tables;
    SHOW TABLES 'page.*';
    查看表中有多少分区:
        show partitions logs;
    查看invites的表的结构
        DESCRIBE invites;
    查看部分行--查询的结果是随机选择的
        SELECT foo FROM invites  limit 3;
      查看分区部分行
        SELECT a.foo FROM invites a limit 3;
查看行以及行数
            SELECT * FROM invites  limit 3;
            计数 select count(*) from invites;
排序
    在分布式中的排序有所不同---全局排序和reduced有关
            order by
            sort by
            distribute by
            cluster By
筛选以及过滤
    SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';

去重查询
    Hive上一个典型表内除重的写法
    select ad ,sum(plus),count(distinct name,id) from  invites;

分组和汇总
    SELECT a.bar, count(*) WHERE a.foo > 0 GROUP BY a.bar;
    SELECT year(ymd), avg(price_close) FROM stocks WHERE exchange = 'NASDAQ' AND symbol = 'AAPL'  GROUP BY year(ymd)  HAVING avg(price_close) > 50.0;
联结
    SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
    SELECT sales.*, things.* FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);
帮助系统
    显示所有函数:
        show functions;
    查看函数用法:
        describe function substr;
退出
    exit;

Spark SQL

进入
    spark-sql
支持Hive的查询命令
        SELECT
        GROUP BY
                ORDER BY
                CLUSTER BY
                SORT BY
        All Hive operators, including:
            Relational operators (=, ⇔, ==, <>, <, >, >=, <=, etc)
            Arithmetic operators (+, -, *, /, %, etc)
            Logical operators (AND, &&, OR, ||, etc)
            Complex type constructors
            Mathematical functions (sign, ln, cos, etc)
            String functions (instr, length, printf, etc)
        User defined functions (UDF)
        User defined aggregation functions (UDAF)
        User defined serialization formats (SerDes)
        Window functions
        Joins
                JOIN
                {LEFT|RIGHT|FULL} OUTER JOIN
                LEFT SEMI JOIN
                CROSS JOIN
                例子
                        SELECT people.name FROM people join parquetFile where people.name=parquetFile.name
        Unions
        Sub-queries
            SELECT col FROM ( SELECT a + b AS col from t1) t2

Hbase

查看基本信息
    version  
    status  
    whoami
查看有哪些表  
    list  
    describe 'table' 
查数据内容:  
    scan 'my_test',{LIMIT => 5}
    get  'my_test','rowkey001'
查看数据量--在数据量相对较小的时候用
     count 'tablename', {INTERVAL => 100, CACHE => 500}
退出
    exit

blogroll

social