Big Data Analytics with Hadoop 3

SELECT statement syntax

Here's the syntax of Hive's SELECT statement:

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]
[LIMIT number]
;

SELECT is the projection operator in HiveQL. Its main clauses work as follows:

  • SELECT scans the table specified by the FROM clause
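  • WHERE gives the condition used to filter rows
  • GROUP BY gives a list of columns that specifies how to aggregate the records
  • ORDER BY sorts the complete result set
  • DISTRIBUTE BY specifies how rows are distributed across reducers, SORT BY specifies the sort order within each reducer, and CLUSTER BY is shorthand for applying both to the same columns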
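  • LIMIT specifies how many records to retrieve

For example, the following query counts the records for each Description and returns the five most frequent ones:

SELECT Description, count(*) AS c FROM OnlineRetail GROUP BY Description ORDER BY c DESC LIMIT 5;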

[Screenshot: Hive console showing the query execution]
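
The clauses described above can also be combined in one query. The following is a minimal sketch rather than a query from the original text: it reuses the Description and Quantity columns of OnlineRetail that appear in this section's other queries, and the thresholds 0 and 100 are arbitrary values chosen only for illustration.

```sql
-- WHERE filters individual rows before aggregation;
-- HAVING filters the aggregated groups afterwards.
SELECT Description, count(*) AS c
FROM OnlineRetail
WHERE Quantity > 0          -- illustrative threshold: keep rows with a positive quantity
GROUP BY Description
HAVING count(*) > 100       -- illustrative threshold: keep descriptions seen more than 100 times
ORDER BY c DESC
LIMIT 5;
```

The order of evaluation is the point here: rows rejected by WHERE never reach the count, while HAVING can only see the per-group result.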

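Without an ORDER BY clause, LIMIT returns an arbitrary set of rows. The following query returns all columns for five such rows:

SELECT * FROM OnlineRetail LIMIT 5;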

[Screenshot: Hive console showing the query execution]

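Built-in functions can be applied to columns in the select list. The following query returns each description in lowercase together with its quantity:

SELECT lower(description), quantity FROM OnlineRetail LIMIT 5;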

[Screenshot: Hive console showing the query execution]
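
The CLUSTER BY, DISTRIBUTE BY, and SORT BY clauses from the syntax above are easiest to compare side by side. The following is a sketch, again assuming the Description and Quantity columns of OnlineRetail used in the earlier queries. DISTRIBUTE BY decides which reducer receives each row, SORT BY orders the rows within each reducer only, and CLUSTER BY on a column is shorthand for DISTRIBUTE BY and SORT BY on that same column.

```sql
-- Rows with the same Description are sent to the same reducer,
-- and each reducer sorts its own rows by Quantity, highest first.
SELECT Description, Quantity
FROM OnlineRetail
DISTRIBUTE BY Description
SORT BY Quantity DESC;

-- Equivalent to: DISTRIBUTE BY Description SORT BY Description
SELECT Description, Quantity
FROM OnlineRetail
CLUSTER BY Description;
```

Unlike ORDER BY, neither form guarantees a total order across the whole result when more than one reducer runs; each reducer's output is sorted independently.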