Apache Phoenix is an open source, massively parallel, relational database engine that supports OLTP workloads on Hadoop, using Apache HBase as its backing store. It is a SQL abstraction layer for interacting with HBase: Phoenix translates SQL into native HBase API calls and provides JDBC, ODBC and Python drivers.
Need for Apache Phoenix
Hive was added to the Hadoop ecosystem to maintain and manage structured data in Hadoop, and it provides an SQL-like dialect, HiveQL, to query tables in the Hive data warehouse. But Hive does not provide record-level operations such as insert, update and delete.
To overcome this drawback of Hive, HBase was brought into the Hadoop ecosystem to support record-level operations. But the HBase shell supports only HBase commands such as 'scan', 'get', 'put' and 'list' to scan an entire table, fetch a row, write a row and list tables, respectively. HBase thus introduced its own set of commands and does not support the familiar SQL interface.
Use of a well-understood language like SQL makes it easier for people to use HBase. Rather than learning another proprietary API, they can simply use the language they already know to read and write their data.
Apache Phoenix Features
Phoenix is delivered as an embedded JDBC driver for HBase data and follows ANSI SQL standards wherever possible. Its main features include:
- Allows columns to be modeled as a multi-part row key or key/value cells.
- Full query support with predicate push down and optimal scan key formation.
- Versioned schema repository. Table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema.
- DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns.
- DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows.
- Compiles SQL queries into a series of HBase scans and runs those scans in parallel to produce regular JDBC result sets (see the EXPLAIN example after this list).
- It can seamlessly integrate with HBase, Pig, Flume and Sqoop.
- Joins are not completely supported. FULL OUTER JOIN and CROSS JOIN are not supported.
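For example, EXPLAIN lets you see the scan plan Phoenix compiles for a query. A minimal sketch against the "emp" table created later in this post; the plan text in the comments is only indicative and will vary with your Phoenix version and the table's regions:
0: jdbc:phoenix:dn04> EXPLAIN SELECT count(*) FROM "emp";
-- typically reports a plan along the lines of:
--   CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER emp
--       SERVER AGGREGATE INTO SINGLE ROW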
Enable Phoenix on HDP
On the Hortonworks Data Platform, no separate installation is required for Phoenix; you just have to enable it from the Ambari admin console.
1. Go to Ambari and select Services tab > HBase > Configs tab.
2. Scroll down to the Phoenix SQL settings.
3. Click the Enable Phoenix slider button.
A pop-up will appear asking you to describe the change you are making. Type "Enabled Phoenix" in it and click Save.
To connect to Phoenix, you need to specify the ZooKeeper quorum of your Hortonworks Data Platform cluster; you can find it under the Hosts link in the Ambari admin console. To launch the Phoenix shell (sqlline), execute the following commands.
[hbase@dn04 root]$ cd /usr/hdp/current/phoenix-client/bin/
[hbase@dn04 bin]$ ./sqlline.py dn04
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:dn04 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:dn04
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/phoenix/phoenix-4.7.0.2.6.1.0-129-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
17/11/06 12:45:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/06 12:45:22 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Connected to: Phoenix (version 4.7)
Driver: PhoenixEmbeddedDriver (version 4.7)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
84/84 (100%) Done
Done
sqlline version 1.1.8
0: jdbc:phoenix:dn04>
Create Phoenix table on existing HBase table
You can create a Phoenix table or view on a pre-existing HBase table; there is no need to move the data to Phoenix or convert it. Apache Phoenix supports table creation and versioned, incremental alterations through DDL commands, and the table metadata is stored in an HBase table and versioned. You can create either a read-write table or a read-only view, with the condition that the binary representation of the row key and key values must match that of the Phoenix data types. The only additions made to the HBase table are the Phoenix coprocessors used for query processing. The Phoenix table is created with the same name as the existing HBase table.
create table "emp" ("empno" VARCHAR primary key,"ProfessionalData"."ename" VARCHAR,
"ProfessionalData"."job" VARCHAR,"ProfessionalData"."mgr" VARCHAR,"ProfessionalData"."hiredate" VARCHAR,
"IncomeData"."sal" VARCHAR,"IncomeData"."comm" VARCHAR,"IncomeData"."deptno" VARCHAR);
0: jdbc:phoenix:dn04> create table "emp" ("empno" VARCHAR primary key,"ProfessionalData"."ename" VARCHAR,
. . . . . . . . . . . . . . . .> "ProfessionalData"."job" VARCHAR,"ProfessionalData"."mgr" VARCHAR,"ProfessionalData"."hiredate" VARCHAR,
. . . . . . . . . . . . . . . .> "IncomeData"."sal" VARCHAR,"IncomeData"."comm" VARCHAR,"IncomeData"."deptno" VARCHAR);
14 rows affected (6.235 seconds)
The DDL used to create the table is case sensitive: if the HBase table name is lowercase, you have to put the name between double quotes. In HBase you don't model the possible key values or the structure of the row key; this is information you specify in Phoenix, beyond the table and column family names.
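Had you only needed read access, you could have mapped the same HBase table as a read-only view instead of the read-write table created above. A minimal sketch mirroring the column mapping used earlier; the view must carry the same name as the underlying HBase table, you only need to map the columns you care about, and UPSERT/DELETE are not allowed on such a view:
0: jdbc:phoenix:dn04> create view "emp" ("empno" VARCHAR primary key,
. . . . . . . . . . . . . . . .> "ProfessionalData"."ename" VARCHAR, "IncomeData"."sal" VARCHAR);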
--Check Table list
0: jdbc:phoenix:dn04> !tables
--Query Table
0: jdbc:phoenix:dn04> select * from "emp";
If you want to change the view from horizontal to vertical, type the following command in the shell and then try to view the data again:
0: jdbc:phoenix:dn04> !outputformat vertical
0: jdbc:phoenix:dn04> select * from "emp";
empno 7369
ename SMITH
job CLERK
mgr 7902
hiredate 17-DEC-80
sal 800
comm
deptno 20
empno 7499
ename ALLEN
job SALESMAN
mgr 7698
hiredate 20-FEB-81
sal 1600
comm 300
deptno 30
...
0: jdbc:phoenix:dn04> !outputformat table
Possible format values: [xmlattr, tsv, xmlelements, csv, vertical, table]
So you can now query all your existing HBase tables with SQL. With the help of Phoenix, you can point your business intelligence, reporting and other SQL-based tools at HBase and query it as if it were just another SQL database.
Create a Phoenix table that does not already exist in HBase
0: jdbc:phoenix:dn04> create table test(id integer primary key,name varchar);
--Check using hbase shell
hbase(main):044:0> list
--Insert data into table
0: jdbc:phoenix:dn04> UPSERT INTO test VALUES(1,'Inam');
--select data
0: jdbc:phoenix:dn04> select * from test;
+-----+-------+
| ID | NAME |
+-----+-------+
| 1 | Inam |
+-----+-------+
--Where Clause
0: jdbc:phoenix:dn04> select * from "emp" where "ename"='SMITH';
+--------+--------+--------+-------+------------+------+-------+---------+
| empno | ename | job | mgr | hiredate | sal | comm | deptno |
+--------+--------+--------+-------+------------+------+-------+---------+
| 7369 | SMITH | CLERK | 7902 | 17-DEC-80 | 800 | | 20 |
+--------+--------+--------+-------+------------+------+-------+---------+
1 row selected (0.067 seconds)
--Insert using Select
0: jdbc:phoenix:dn04> upsert into test select to_number("empno"),"ename" from "emp";
14 rows affected (0.065 seconds)
-- use of distinct
0: jdbc:phoenix:dn04> select distinct name from test;
--Count and group by
0: jdbc:phoenix:dn04> select count(id) from test group by id;
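Phoenix also supports secondary indexes (see the index commands in the SQL grammar at the end of this post). A minimal sketch on the test table created above; depending on your HBase configuration, indexes on mutable tables may need the Phoenix indexing settings that Ambari applies when Phoenix is enabled:
--Create and use a secondary index
0: jdbc:phoenix:dn04> create index test_name_idx on test(name);
0: jdbc:phoenix:dn04> select name from test where name = 'Inam';
0: jdbc:phoenix:dn04> drop index test_name_idx on test;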
Connect Phoenix via ODBC
You can connect any ODBC application to HBase via the Apache Phoenix ODBC driver. This enables you to use familiar business intelligence tools to run analysis on big datasets. You need to add the Phoenix Query Server for this purpose.
Go to Hosts in the Ambari console, click the Add button in the Components section, and select Phoenix Query Server from the drop-down list to add it. Then start the component and verify that you can access it. In my environment I have added it on the DN04 machine.
Go to the URL (http://dn04:8765/); if the Phoenix Query Server page loads, you are done with the setup.
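Besides ODBC, the Phoenix Query Server endpoint can also be used from the thin JDBC client that ships in the Phoenix client package. A quick check from the shell, assuming the same paths and host as earlier in this post:
[hbase@dn04 root]$ cd /usr/hdp/current/phoenix-client/bin/
[hbase@dn04 bin]$ ./sqlline-thin.py http://dn04:8765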
Now download the ODBC driver appropriate for your environment from the location below and install it.
Windows
32-bit: http://public-repo-1.hortonworks.com/HDP/phoenix-odbc/1.0.0.1000/windows/HortonworksPhoenixODBC32.msi
Next, in Control Panel, open ODBC Data Sources (64-bit) and configure a DSN for Phoenix by selecting the Hortonworks Phoenix ODBC Driver.
Now test with any tool of your choice; I tested with Excel. In Excel, go to the Data menu, select "From Other Sources" and then "From Microsoft Query", and choose the DSN created earlier. Set the required properties in the dialog boxes and return the data to Excel.
References
Phoenix Data Types
Phoenix Functions
Aggregate Functions
AVG, COUNT, MAX, MIN, SUM, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, STDDEV_POP, STDDEV_SAMP
String Functions
SUBSTR, TRIM, LTRIM, RTRIM, LENGTH, REGEXP_SUBSTR, REGEXP_REPLACE, UPPER, LOWER, REVERSE, TO_CHAR
Time and Date Functions
ROUND, TRUNCATE, TO_DATE, CURRENT_DATE, CURRENT_TIME
Other Functions
MD5, INVERT, TO_NUMBER, COALESCE
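As a quick illustration of a few of these functions against the tables used above (remember that the "emp" columns were mapped as VARCHAR, so numeric values go through TO_NUMBER):
0: jdbc:phoenix:dn04> select "deptno", count(*), max(to_number("sal")) from "emp" group by "deptno";
0: jdbc:phoenix:dn04> select upper(name), length(name) from test;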
SQL Grammar
Commands
SELECT, UPSERT VALUES, UPSERT SELECT, DELETE, CREATE TABLE, DROP TABLE, ALTER TABLE, CREATE INDEX, DROP INDEX, ALTER INDEX, EXPLAIN