Apache Phoenix is an open source, massively parallel, relational database engine that supports OLTP workloads on Hadoop, using Apache HBase as its backing store. It is a SQL abstraction layer for interacting with HBase: Phoenix translates SQL into native HBase API calls and provides JDBC, ODBC and Python drivers.
Need for Apache Phoenix
Hive was added to the Hadoop ecosystem to maintain and manage structured data in Hadoop, and it provides an SQL-like dialect, HiveQL, to query tables in the Hive data warehouse. But Hive does not provide record-level operations such as insert, update and delete.
To overcome this drawback of Hive, HBase was brought into the Hadoop ecosystem to support record-level operations. But the HBase shell supports only HBase commands such as 'scan', 'get', 'put' and 'list' to scan an entire table, fetch a row, write a row and list tables, respectively. HBase thus introduced its own set of commands and does not support the familiar SQL interface.
Use of a well-understood language like SQL makes it easier for people to use HBase. Rather than learning another proprietary API, they can simply use the language they already know to read and write their data.
Apache Phoenix Features
Phoenix is delivered as an embedded JDBC driver for HBase data and follows ANSI SQL standards wherever possible. Its main features include:
- Allows columns to be modeled as a multi-part row key or key/value cells.
- Full query support with predicate push down and optimal scan key formation.
- Versioned schema repository. Table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema.
- DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns.
- DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows.
- Compiles SQL queries into a series of HBase scans and runs those scans in parallel to produce regular JDBC result sets (see the EXPLAIN example after this list).
- It can seamlessly integrate with HBase, Pig, Flume and Sqoop.
- Joins are not completely supported. FULL OUTER JOIN and CROSS JOIN are not supported.
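For example, EXPLAIN lets you see the scan plan Phoenix compiles for a query. A minimal sketch against the "emp" table created later in this post; the plan text in the comments is only indicative and will vary with your Phoenix version and the table's regions:
0: jdbc:phoenix:dn04> EXPLAIN SELECT count(*) FROM "emp";
-- typically reports a plan along the lines of:
--   CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER emp
--       SERVER AGGREGATE INTO SINGLE ROW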
Enable Phoenix on HDP
On the Hortonworks Data Platform, no separate installation is required for Phoenix; you just have to enable it from the Ambari admin console.
1. Go to Ambari and select Services tab > HBase > Configs tab.
2. Scroll down to the Phoenix SQL settings.
3. Click the Enable Phoenix slider button.
A pop-up will appear asking you to describe the change you are making. Type "Enabled Phoenix" in it and click Save.
To connect to Phoenix, you need to specify the ZooKeeper quorum of your Hortonworks Data Platform cluster; you can find it under the Hosts link in the Ambari admin console. To launch the Phoenix shell (sqlline), execute the following commands.
[hbase@dn04 root]$ cd /usr/hdp/current/phoenix-client/bin/
[hbase@dn04 bin]$ ./sqlline.py dn04
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:dn04 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:dn04
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/phoenix/phoenix-4.7.0.2.6.1.0-129-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
17/11/06 12:45:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/06 12:45:22 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Connected to: Phoenix (version 4.7)
Driver: PhoenixEmbeddedDriver (version 4.7)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
84/84 (100%) Done
Done
sqlline version 1.1.8
0: jdbc:phoenix:dn04>
Create Phoenix table on existing HBase table
You can create a Phoenix table or view on a pre-existing HBase table; there is no need to move the data to Phoenix or convert it. Apache Phoenix supports table creation and versioned, incremental alterations through DDL commands, and the table metadata is stored in an HBase table and versioned. You can create either a read-write table or a read-only view, with the condition that the binary representation of the row key and key values must match that of the Phoenix data types. The only additions made to the HBase table are the Phoenix coprocessors used for query processing. The Phoenix table is created with the same name as the existing HBase table.
create table "emp" ("empno" VARCHAR primary key,"ProfessionalData"."ename" VARCHAR,
"ProfessionalData"."job" VARCHAR,"ProfessionalData"."mgr" VARCHAR,"ProfessionalData"."hiredate" VARCHAR,
"IncomeData"."sal" VARCHAR,"IncomeData"."comm" VARCHAR,"IncomeData"."deptno" VARCHAR);
0: jdbc:phoenix:dn04> create table "emp" ("empno" VARCHAR primary key,"ProfessionalData"."ename" VARCHAR,
. . . . . . . . . . . . . . . .> "ProfessionalData"."job" VARCHAR,"ProfessionalData"."mgr" VARCHAR,"ProfessionalData"."hiredate" VARCHAR,
. . . . . . . . . . . . . . . .> "IncomeData"."sal" VARCHAR,"IncomeData"."comm" VARCHAR,"IncomeData"."deptno" VARCHAR);
14 rows affected (6.235 seconds)
The DDL used to create the table is case sensitive: if the HBase table name is lowercase, you have to put the name between double quotes. In HBase you don't model the possible key values or the structure of the row key; this is information you specify in Phoenix, beyond the table and column family names.
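Had you only needed read access, you could have mapped the same HBase table as a read-only view instead of the read-write table created above. A minimal sketch mirroring the column mapping used earlier; the view must carry the same name as the underlying HBase table, you only need to map the columns you care about, and UPSERT/DELETE are not allowed on such a view:
0: jdbc:phoenix:dn04> create view "emp" ("empno" VARCHAR primary key,
. . . . . . . . . . . . . . . .> "ProfessionalData"."ename" VARCHAR, "IncomeData"."sal" VARCHAR);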
--Check Table list
0: jdbc:phoenix:dn04> !tables
--Query Table
0: jdbc:phoenix:dn04> select * from "emp";
If you want to change the view from horizontal to vertical, type the following command in the shell and then try to view the data again:
0: jdbc:phoenix:dn04> !outputformat vertical
0: jdbc:phoenix:dn04> select * from "emp";
empno 7369
ename SMITH
job CLERK
mgr 7902
hiredate 17-DEC-80
sal 800
comm
deptno 20
empno 7499
ename ALLEN
job SALESMAN
mgr 7698
hiredate 20-FEB-81
sal 1600
comm 300
deptno 30
...
0: jdbc:phoenix:dn04> !outputformat table
Possible format values: [xmlattr, tsv, xmlelements, csv, vertical, table]
So you can now query all your existing HBase tables with SQL. With the help of Phoenix, you can point your business intelligence, reporting and other SQL-based tools at HBase and query it as if it were just another SQL database.
Create a Phoenix table that does not already exist in HBase
0: jdbc:phoenix:dn04> create table test(id integer primary key,name varchar);
--Check using hbase shell
hbase(main):044:0> list
--Insert data into table
0: jdbc:phoenix:dn04> UPSERT INTO test VALUES(1,'Inam');
--select data
0: jdbc:phoenix:dn04> select * from test;
+-----+-------+
| ID | NAME |
+-----+-------+
| 1 | Inam |
+-----+-------+
--Where Clause
0: jdbc:phoenix:dn04> select * from "emp" where "ename"='SMITH';
+--------+--------+--------+-------+------------+------+-------+---------+
| empno | ename | job | mgr | hiredate | sal | comm | deptno |
+--------+--------+--------+-------+------------+------+-------+---------+
| 7369 | SMITH | CLERK | 7902 | 17-DEC-80 | 800 | | 20 |
+--------+--------+--------+-------+------------+------+-------+---------+
1 row selected (0.067 seconds)
--Insert using Select
0: jdbc:phoenix:dn04> upsert into test select to_number("empno"),"ename" from "emp";
14 rows affected (0.065 seconds)
-- use of distinct
0: jdbc:phoenix:dn04> select distinct name from test;
--Count and group by
0: jdbc:phoenix:dn04> select count(id) from test group by id;
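Phoenix also supports secondary indexes (see the index commands in the SQL grammar at the end of this post). A minimal sketch on the test table created above; depending on your HBase configuration, indexes on mutable tables may need the Phoenix indexing settings that Ambari applies when Phoenix is enabled:
--Create and use a secondary index
0: jdbc:phoenix:dn04> create index test_name_idx on test(name);
0: jdbc:phoenix:dn04> select name from test where name = 'Inam';
0: jdbc:phoenix:dn04> drop index test_name_idx on test;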
Connect Phoenix via ODBC
You can connect any ODBC application to HBase via the Apache Phoenix ODBC driver. This enables you to use familiar business intelligence tools to run analysis on big datasets. You need to add the Phoenix Query Server for this purpose.
Go to Hosts in the Ambari console, click the Add button in the Components section, and select Phoenix Query Server from the drop-down list to add it. Then start the component and verify that you can access it. In my environment I have added it on the DN04 machine.
Go to the URL (http://dn04:8765/); if the Phoenix Query Server page loads, you are done with the setup.
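Besides ODBC, the Phoenix Query Server endpoint can also be used from the thin JDBC client that ships in the Phoenix client package. A quick check from the shell, assuming the same paths and host as earlier in this post:
[hbase@dn04 root]$ cd /usr/hdp/current/phoenix-client/bin/
[hbase@dn04 bin]$ ./sqlline-thin.py http://dn04:8765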
Now download the ODBC driver appropriate for your environment from the location below and install it.
Windows
32-bit: http://public-repo-1.hortonworks.com/HDP/phoenix-odbc/1.0.0.1000/windows/HortonworksPhoenixODBC32.msi
Next, in Control Panel, open ODBC Data Sources (64-bit) and configure a DSN for Phoenix by selecting the Hortonworks Phoenix ODBC Driver.
Now test with any tool of your choice; I tested with Excel. In Excel, go to the Data menu, select "From Other Sources" and then "From Microsoft Query", and choose the DSN created earlier. Set the required properties in the dialog boxes and return the data to Excel.
References
Phoenix Data Types
Phoenix Functions
Aggregate Functions
AVG, COUNT, MAX, MIN, SUM, PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK, STDDEV_POP, STDDEV_SAMP
String Functions
SUBSTR, TRIM, LTRIM, RTRIM, LENGTH, REGEXP_SUBSTR, REGEXP_REPLACE, UPPER, LOWER, REVERSE, TO_CHAR
Time and Date Functions
ROUND, TRUNCATE, TO_DATE, CURRENT_DATE, CURRENT_TIME
Other Functions
MD5, INVERT, TO_NUMBER, COALESCE
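As a quick illustration of a few of these functions against the tables used above (remember that the "emp" columns were mapped as VARCHAR, so numeric values go through TO_NUMBER):
0: jdbc:phoenix:dn04> select "deptno", count(*), max(to_number("sal")) from "emp" group by "deptno";
0: jdbc:phoenix:dn04> select upper(name), length(name) from test;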
SQL Grammar
Commands
SELECT, UPSERT VALUES, UPSERT SELECT, DELETE, CREATE TABLE, DROP TABLE, ALTER TABLE, CREATE INDEX, DROP INDEX, ALTER INDEX, EXPLAIN