Prerequisite:
Complete my previous post Installing/Configuring PrestoDB
Presto enables you to connect to other databases using some connector, in order to perform queries and joins over several sources providing metadata and data for queries. In this post we will work with some connectors. A coordinator (a master daemon) uses connectors to get metadata (such as table schema) that is needed to build a query plan. Workers use connectors to get actual data that will be processed by them.
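The division of labor described above can be sketched in a few lines of plain Python. This is a toy model, not Presto's real connector SPI: the point is only that the coordinator asks a connector for metadata to plan the query, while workers ask it for the actual rows.

```python
# Toy sketch (illustrative only, not Presto's real connector interface):
# the coordinator uses a connector for metadata, workers use it for data.
class ToyConnector:
    def __init__(self, schema, rows):
        self._schema = schema
        self._rows = rows

    def get_metadata(self):
        # Used by the coordinator to build a query plan
        return self._schema

    def get_rows(self):
        # Used by workers to fetch the actual data to process
        return list(self._rows)

conn = ToyConnector(schema=["frnd_id", "frnd_name"],
                    rows=[(1, "Zeeshan"), (2, "Saeed")])
print(conn.get_metadata())   # → ['frnd_id', 'frnd_name']
print(len(conn.get_rows()))  # → 2
```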
Using mysql Connector
In MySQL itself we will use the usual SQL statements (create, insert, update, delete, drop); through the Presto connector, support is more limited, as we will see below.
Installing mysql (if not already installed)
1- Download MySQL 5.7 repo file :
[root@vbgeneric bin]# wget https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
--2017-06-15 22:20:40-- https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
Resolving dev.mysql.com (dev.mysql.com)... 137.254.60.11
Connecting to dev.mysql.com (dev.mysql.com)|137.254.60.11|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://repo.mysql.com//mysql57-community-release-el7-8.noarch.rpm [following]
--2017-06-15 22:20:41-- https://repo.mysql.com//mysql57-community-release-el7-8.noarch.rpm
Resolving repo.mysql.com (repo.mysql.com)... 104.117.138.198
Connecting to repo.mysql.com (repo.mysql.com)|104.117.138.198|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9116 (8.9K) [application/x-redhat-package-manager]
Saving to: ‘mysql57-community-release-el7-8.noarch.rpm’
100%[===========================================================================>] 9,116 --.-K/s in 0s
2017-06-15 22:20:42 (22.7 MB/s) - ‘mysql57-community-release-el7-8.noarch.rpm’ saved [9116/9116]
2- Install MySQL 5.7 repo file :
[root@vbgeneric bin]# rpm -ivh mysql57-community-release-el7-8.noarch.rpm
warning: mysql57-community-release-el7-8.noarch.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql57-community-release-el7-8 ################################# [100%]
3- Install MySQL 5.7 database server :
[root@vbgeneric bin]# yum install mysql-server -y
Loaded plugins: langpacks, ulninfo
mysql-connectors-community | 2.5 kB 00:00:00
mysql-tools-community | 2.5 kB 00:00:00
mysql57-community | 2.5 kB 00:00:00
(1/3): mysql-connectors-community/x86_64/primary_db | 14 kB 00:00:00
(2/3): mysql-tools-community/x86_64/primary_db | 33 kB 00:00:00
(3/3): mysql57-community/x86_64/primary_db | 106 kB 00:00:00
Resolving Dependencies
--> Running transaction check
---> Package mysql-community-server.x86_64 0:5.7.18-1.el7 will be installed
--> Processing Dependency: mysql-community-common(x86-64) = 5.7.18-1.el7 for package: mysql-community-server-5.7.18-1.el7.x86_64
....
....
Verifying : mysql-community-common-5.6.23-3.el7.x86_64 7/7
Installed:
mysql-community-server.x86_64 0:5.7.18-1.el7
Dependency Installed:
mysql-community-client.x86_64 0:5.7.18-1.el7 mysql-community-libs-compat.x86_64 0:5.7.18-1.el7
Dependency Updated:
mysql-community-common.x86_64 0:5.7.18-1.el7 mysql-community-libs.x86_64 0:5.7.18-1.el7
Complete!
4- Start MySQL server
[root@vbgeneric bin]# systemctl start mysqld
5- Connecting to the MySQL Server with the mysql Client
Once your MySQL server is up and running, you can connect to it as the superuser root with the mysql client.
[root@vbgeneric bin]# mysql -u root -p
Enter password:
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)
For installations using the MySQL Yum repository, MySQL SUSE repository, or RPM packages directly downloaded from Oracle, the generated root password is in the error log. View it with, for example, the following command:
[root@vbgeneric bin]# grep 'temporary password' /var/log/mysqld.log
2017-06-16T02:31:25.052590Z 1 [Note] A temporary password is generated for root@localhost: -Ekw6uk3R;Yp
Now use the above password
[root@vbgeneric bin]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.18
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Reset the root password with the following statement; until you do, you will not be able to exercise any superuser privileges, even though you are logged in as root.
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'yourPassword';
Query OK, 0 rows affected (0.00 sec)
6- Create Database and objects in MySQL Server
mysql> create database iub;
mysql> use iub;
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| iub |
| mysql |
| performance_schema |
| sys |
+--------------------+
mysql> create table friend(frnd_id int not null, frnd_name varchar(50),category varchar(100));
mysql> insert into friend values(1,'Zeeshan','SQL DBA') ;
mysql> insert into friend values(2,'Saeed','Java') ;
mysql> insert into friend values(3,'Ali','Oracle DBA');
mysql> insert into friend values(4,'Tanveer','Telend');
mysql> select * from friend;
+---------+-----------+------------+
| frnd_id | frnd_name | category |
+---------+-----------+------------+
| 1 | Zeeshan | SQL DBA |
| 2 | Saeed | Java |
| 3 | Ali | Oracle DBA |
| 4 | Tanveer | Telend |
+---------+-----------+------------+
Configure mysql Connector
To configure the MySQL connector, create a catalog properties file in etc/catalog named, for example, mysql.properties, to mount the MySQL connector as the mysql catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:
[presto@vbgeneric bin]$ vi $PRESTO_HOME/etc/catalog/mysql.properties
[presto@vbgeneric bin]$ cat $PRESTO_HOME/etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://localhost:3306
connection-user=root
connection-password=P@ssw0rd
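Before restarting Presto, it can be worth sanity-checking a catalog file for the keys a JDBC-style connector needs. The minimal sketch below parses simple key=value properties; the parser and the list of "required" keys are my own illustration, not part of Presto.

```python
# Minimal sketch (illustrative only): parse a Presto catalog .properties
# file and check that the keys a JDBC-style connector needs are present.
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

catalog = parse_properties("""\
connector.name=mysql
connection-url=jdbc:mysql://localhost:3306
connection-user=root
connection-password=P@ssw0rd
""")

required = ["connector.name", "connection-url", "connection-user"]
missing = [k for k in required if k not in catalog]
print("missing keys:", missing)  # → missing keys: []
```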
Connect Presto CLI
Let’s connect the MySQL storage plugin to the Presto server using command-line options. Here "iub" refers to a schema in the MySQL server.
[presto@vbgeneric bin]$ prestocli --server localhost:7070 --catalog mysql --schema iub
presto:iub>
List all the schemas in MySQL
presto:iub> show schemas from mysql;
Schema
--------------------
information_schema
iub
performance_schema
sys
(4 rows)
Query 20170616_041856_00003_prm89, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:02 [4 rows, 62B] [1 rows/s, 28B/s]
List Tables from Schema
presto:iub> show tables from mysql.iub;
Table
--------
friend
(1 row)
Query 20170616_042135_00004_prm89, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:02 [1 rows, 19B] [0 rows/s, 8B/s]
Describe Table
presto:iub> describe mysql.iub.friend;
Column | Type | Extra | Comment
-----------+--------------+-------+---------
frnd_id | integer | |
frnd_name | varchar(50) | |
category | varchar(100) | |
(3 rows)
Query 20170616_042312_00006_prm89, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:03 [3 rows, 198B] [1 rows/s, 74B/s]
Show Columns from Table
presto:iub> show columns from mysql.iub.friend;
Access Table Records
presto:iub> select * from mysql.iub.friend;
frnd_id | frnd_name | category
---------+-----------+------------
1 | Zeeshan | SQL DBA
2 | Saeed | Java
3 | Ali | Oracle DBA
4 | Tanveer | Telend
(4 rows)
Query 20170616_042606_00008_prm89, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:03 [4 rows, 0B] [1 rows/s, 0B/s]
Create Table Using AS Command
The MySQL connector doesn’t support a plain create table query, but you can create a table using the as command.
presto:iub> create table mysql.iub.frnd2 as select * from mysql.iub.friend;
CREATE TABLE: 4 rows
Query 20170616_042826_00009_prm89, FINISHED, 1 node
Splits: 35 total, 35 done (100.00%)
0:04 [4 rows, 0B] [0 rows/s, 0B/s]
Using JMX Connector
The JMX connector provides the ability to query JMX information from all nodes in a Presto cluster. This is very useful for monitoring or debugging. Java Management Extensions (JMX) provides information about the Java Virtual Machine and all of the software running inside it. Presto itself is heavily instrumented via JMX.
This connector can also be configured so that chosen JMX information will be periodically dumped and stored in memory for later access.
As we have already enabled the “jmx.properties” file under the “etc/catalog” directory, now connect the Presto CLI to use the JMX plugin.
presto:jmx> show schemas from jmx;
Schema
--------------------
current
history
information_schema
(3 rows)
presto:jmx> show tables from jmx.current;
Table
----------------------------------------------------------------------------------------------------------
com.facebook.presto.execution.executor:name=taskexecutor
com.facebook.presto.execution.scheduler:name=nodescheduler
com.facebook.presto.execution.scheduler:name=splitschedulerstats
com.facebook.presto.execution:name=queryexecution
com.facebook.presto.execution:name=querymanager
....
presto:jmx> select * from jmx.current."com.facebook.presto.execution.executor:name=taskexecutor";
node | blockedquantawalltime.alltime.count | blockedquantawalltime.alltim
--------------------------------------+-------------------------------------+-----------------------------
ffffffff-ffff-ffff-ffff-ffffffffffff | 403.0 | 24413
(1 row)
Using HIVE Connector
The Hive connector allows querying data stored in a Hive data warehouse. The Hive metastore service (default port 9083) stores the metadata for Hive tables and partitions in a relational database, and provides clients (including Hive) access to this information using the metastore service API. Presto uses Hive metastore service to get the hive table’s details. It does not use HiveQL or any part of Hive’s execution environment.
The Hive connector supports several file formats: ORC, Parquet, Avro, RCFile, SequenceFile, JSON, and Text.
Start Hive Metastore
[hdpsysuser@vbgeneric ~]$ hive --service metastore
which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/java/default/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/apache-hive-2.1.1-bin/bin:/usr/hadoopsw/db-derby-10.13.1.1-bin/bin:/usr/hadoopsw/apache-flume-1.7.0-bin/bin)
Starting Hive Metastore Server
Configuring Hive Connector
[presto@vbgeneric bin]$ vi $PRESTO_HOME/etc/catalog/hive.properties
[presto@vbgeneric bin]$ cat $PRESTO_HOME/etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
Check hive tables from hive prompt
I've already created a Hive database and tables which we will query from Presto, but first check from Hive itself.
hive> use flume;
hive> show tables;
OK
raw_tweets
tweets_data2
twitterdata
vw_parsed_tweets
Time taken: 0.272 seconds, Fetched: 4 row(s)
hive> select * from raw_tweets limit 1;
OK
{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:39 +0000 2017","in_reply_to_user_id_str":null,"source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone<\/a>","retweeted_status":{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"coordinates":null,"created_at":"Sun Jun 11 10:16:58 +0000 2017","truncated":false,"in_reply_to_user_id_str":null,"source":"<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone<\/a>","retweet_count":2,"retweeted":false,"geo":null,"filter_level":"low","in_reply_to_screen_name":null,"is_quote_status":false,"entities":{"urls":[],"hashtags":[{"indices":[74,95],"text":"شعب_قطر_في_قلب_سلمان"},{"indices":[96,117],"text":"_قطع_العلاقات_مع_قطر"}],"user_mentions":[],"symbols":[]},"id_str":"873846512484208640","in_reply_to_user_id":null,"favorite_count":5,"id":873846512484208640,"text":"UAE : Trouble maker \nSaudi : Support UAE but not sure \nBahrain : Retweet #شعب_قطر_في_قلب_سلمان #_قطع_العلاقات_مع_قطر","place":null,"contributors":null,"lang":"en","user":{"utc_offset":10800,"friends_count":306,"profile_image_url_https":"https://pbs.twimg.com/profile_images/615523415617712128/DMuJuuUu_normal.jpg","listed_count":0,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","default_profile_image":false,"favourites_count":483,"description":"ربِ بي منك فضلٌ ونعمه عطاً غير مجذوذ ولا منقوص","created_at":"Sun Jul 29 17:41:11 +0000 
2012","is_translator":false,"profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","protected":false,"screen_name":"im_aisha92","id_str":"724360172","profile_link_color":"1DA1F2","id":724360172,"geo_enabled":true,"profile_background_color":"C0DEED","lang":"ar","profile_sidebar_border_color":"C0DEED","profile_text_color":"333333","verified":false,"profile_image_url":"http://pbs.twimg.com/profile_images/615523415617712128/DMuJuuUu_normal.jpg","time_zone":"Kuwait","url":null,"contributors_enabled":false,"profile_background_tile":false,"profile_banner_url":"https://pbs.twimg.com/profile_banners/724360172/1446770569","statuses_count":2895,"follow_request_sent":null,"followers_count":187,"profile_use_background_image":true,"default_profile":true,"following":null,"name":"Alsulaiti Aisha","location":"Cardiff-UK","profile_sidebar_fill_color":"DDEEF6","notifications":null},"favorited":false},"retweet_count":0,"retweeted":false,"geo":null,"filter_level":"low","in_reply_to_screen_name":null,"is_quote_status":false,"id_str":"874026367360409600","in_reply_to_user_id":null,"favorite_count":0,"id":874026367360409600,"text":"RT @im_aisha92: UAE : Trouble maker \nSaudi : Support UAE but not sure \nBahrain : Retweet #شعب_قطر_في_قلب_سلمان #_قطع_العلاقات_مع_قطر","place":null,"lang":"en","favorited":false,"coordinates":null,"truncated":false,"timestamp_ms":"1497219099260","entities":{"urls":[],"hashtags":[{"indices":[90,111],"text":"شعب_قطر_في_قلب_سلمان"},{"indices":[112,133],"text":"_قطع_العلاقات_مع_قطر"}],"user_mentions":[{"indices":[3,14],"screen_name":"im_aisha92","id_str":"724360172","name":"Alsulaiti Aisha","id":724360172}],"symbols":[]},"contributors":null,"user":{"utc_offset":null,"friends_count":8,"profile_image_url_https":"https://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png","listed_count":0,"profile_background_image_url":"","default_profile_image":false,"favourites_count":0,"description":null,"created_at":"Wed Jun 
07 05:44:14 +0000 2017","is_translator":false,"profile_background_image_url_https":"","protected":false,"screen_name":"Reem01020398","id_str":"872328324122644482","profile_link_color":"1DA1F2","id":872328324122644482,"geo_enabled":false,"profile_background_color":"F5F8FA","lang":"en","profile_sidebar_border_color":"C0DEED","profile_text_color":"333333","verified":false,"profile_image_url":"http://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png","time_zone":null,"url":null,"contributors_enabled":false,"profile_background_tile":false,"statuses_count":65,"follow_request_sent":null,"followers_count":1,"profile_use_background_image":true,"default_profile":true,"following":null,"name":"Reem","location":null,"profile_sidebar_fill_color":"DDEEF6","notifications":null}}
Time taken: 11.043 seconds, Fetched: 1 row(s)
hive>
Start Presto CLI
Start Presto CLI to connect Hive storage plugin using the following command.
[presto@vbgeneric bin]$ prestocli --server localhost:7070 --catalog hive --schema flume
presto:flume> show schemas from hive;
Schema
--------------------
default
flume
information_schema
(3 rows)
presto:flume> show tables from hive.flume;
Table
------------------
raw_tweets
tweets_data2
twitterdata
vw_parsed_tweets
(4 rows)
presto:flume> select * from hive.flume.raw_tweets;
Query 20170616_051901_00005_aceay, RUNNING, 1 node, 16 splits
0:19 [ 0 rows, 0B] [ 0 rows/s, 0B/s] [ <=> ]
STAGES ROWS ROWS/s BYTES BYTES/s QUEUED RUN DONE
0.........R 0 0 0B 0B 0 0 0
1.......S 0 0 0B 0B 0 0 0
----------------------------------------------------------------------------------------------------------
{"quoted_status":{"extended_tweet":{"extended_entities":{"media":[{"display_url":"pic.twitter.com/iG9xY6R
{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:50 +0000 20
{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:50 +0000 20
{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:50 +0000 20
{"quoted_status":{"extended_tweet":{"entities":{"urls":[{"display_url":"twitter.com/ajenglish/stat\u2026"
{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"created_at":"Sun Jun 11 22:11:51 +0000 20
...
..
Test Performance
Run the query below in both Hive and Presto and observe the response time. I'm running an Oracle VM with a one-node cluster for this test; both Hive and Presto are running on the same VM.
presto:flume> select count(*) from raw_tweets;
_col0
-------
428
(1 row)
Query 20170616_101712_00002_g7yqb, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:20 [428 rows, 2.4MB] [21 rows/s, 125KB/s]
Using System Connector
The System connector provides information and metrics about the currently running Presto cluster. It makes this available via normal SQL queries. The System connector doesn’t need to be configured: it is automatically available via a catalog named system.
List the available system schemas:
presto:flume> SHOW SCHEMAS FROM system;
Schema
--------------------
information_schema
jdbc
metadata
runtime
(4 rows)
List the tables in one of the schemas:
presto:flume> SHOW TABLES FROM system.runtime;
Table
--------------
nodes
queries
tasks
transactions
(4 rows)
Query one of the tables:
presto:flume> SELECT * FROM system.runtime.nodes;
node_id | http_uri | node_version | coordinator | state
--------------------------------------+----------------------------+--------------+-------------+--------
ffffffff-ffff-ffff-ffff-ffffffffffff | http://192.168.56.101:7070 | 0.179 | true | active
(1 row)
Kill a running query:
presto:flume> CALL system.runtime.kill_query('20151207_215727_00146_tx3nr');
CALL
System Connector Tables
metadata.catalogs: The catalogs table contains the list of available catalogs.
metadata.schema_properties: The schema properties table contains the list of available properties that can be set when creating a new schema.
metadata.table_properties: The table properties table contains the list of available properties that can be set when creating a new table.
runtime.nodes: The nodes table contains the list of visible nodes in the Presto cluster along with their status.
runtime.queries: The queries table contains information about currently and recently running queries on the Presto cluster, including the original query text (SQL), the identity of the user who ran the query, and performance information such as how long the query was queued and analyzed.
runtime.tasks: The tasks table contains information about the tasks involved in a Presto query, including where they were executed and how many rows and bytes each task processed.
runtime.transactions: The transactions table contains the list of currently open transactions and related metadata, such as the create time, idle time, initialization parameters, and accessed catalogs.
Using Local File Connector
The local file connector allows querying data stored on the local file system of each worker. To configure the local file connector, create a catalog properties file under etc/catalog named, for example, localfile.properties with the following contents:
[presto@vbgeneric ~]$ mkdir /data/presto_data/localfiles
[presto@vbgeneric ~]$ vi $PRESTO_HOME/etc/catalog/localfile.properties
[presto@vbgeneric ~]$ cat $PRESTO_HOME/etc/catalog/localfile.properties
connector.name=localfile
presto-logs.http-request-log.location=/data/presto_data/var/log/http-request.log
The local file connector provides a single schema named logs and this connector can only process the http log format created by Presto itself.
[presto@vbgeneric ~]$ prestocli --server localhost:7070 --catalog localfile --schema logs
presto:logs> show tables;
Table
------------------
http_request_log
(1 row)
presto:logs> describe http_request_log;
Column | Type | Extra | Comment
-------------------+-----------+-------+---------
server_address | varchar | |
timestamp | timestamp | |
client_address | varchar | |
method | varchar | |
request_uri | varchar | |
user | varchar | |
agent | varchar | |
response_code | bigint | |
request_size | bigint | |
response_size | bigint | |
time_to_last_byte | bigint | |
trace_token | varchar | |
(12 rows)
presto:logs> select method,user,agent from http_request_log;
method | user | agent
--------+------+-----------------------------------------------------------------------------------------------------
GET | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
GET | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
POST | NULL | NULL
GET | NULL | NULL
POST | NULL | NULL
POST | NULL | NULL
POST | NULL | NULL
GET | NULL | NULL
POST | NULL | NULL
GET | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
GET | NULL | ffffffff-ffff-ffff-ffff-ffffffffffff
POST | NULL | NULL
GET | NULL | NULL
POST | NULL | NULL
POST | NULL | NULL
POST | NULL | NULL
GET | NULL | NULL
GET | NULL | Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safar
...
...
Using Hive and mysql as combined data source
Prepare data in mysql
Create a table in the MySQL database, e.g. iub.dept
mysql> CREATE TABLE iub.dept( deptno INT, dname VARCHAR(10), loc varchar(10));
load data into mysql table
insert into iub.dept values (10,'ACCOUNTING','NEW YORK');
insert into iub.dept values (20,'RESEARCH','DALLAS');
insert into iub.dept values (30,'SALES','CHICAGO');
insert into iub.dept values (40,'OPERATIONS','BOSTON');
mysql> select * from iub.dept;
+--------+------------+----------+
| deptno | dname | loc |
+--------+------------+----------+
| 10 | ACCOUNTING | NEW YORK |
| 20 | RESEARCH | DALLAS |
| 30 | SALES | CHICAGO |
| 40 | OPERATIONS | BOSTON |
+--------+------------+----------+
4 rows in set (0.00 sec)
Prepare data in hive
Create the emp table in the Hive database, e.g. scott.emp
create table scott.emp( empno STRING, ename STRING, job STRING,
mgr STRING, hiredate STRING, sal STRING,
comm STRING, deptno STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
load data into hive table
emp.csv
7369,SMITH,CLERK,7902,17-DEC-80,800,,20
7499,ALLEN,SALESMAN,7698,20-FEB-81,1600,300,30
7521,WARD,SALESMAN,7698,22-FEB-81,1250,500,30
7566,JONES,MANAGER,7839,02-APR-81,2975,,20
7654,MARTIN,SALESMAN,7698,28-SEP-81,1250,1400,30
7698,BLAKE,MANAGER,7839,01-MAY-81,2850,,30
7782,CLARK,MANAGER,7839,09-JUN-81,2450,,10
7788,SCOTT,ANALYST,7566,19-APR-87,3000,,20
7839,KING,PRESIDENT,,17-NOV-81,5000,,10
7844,TURNER,SALESMAN,7698,08-SEP-81,1500,0,30
7876,ADAMS,CLERK,7788,23-MAY-87,1100,,20
7900,JAMES,CLERK,7698,03-DEC-81,950,,30
7902,FORD,ANALYST,7566,03-DEC-81,3000,,20
7934,MILLER,CLERK,7782,23-JAN-82,1300,,10
7499,ALLEN,SALESMAN,7698,20-FEB-81,1600,300,30
7521,WARD,SALESMAN,7698,22-FEB-81,1250,500,30
7566,JONES,MANAGER,7839,02-APR-81,2975,,20
7654,MARTIN,SALESMAN,7698,28-SEP-81,1250,1400,30
7698,BLAKE,MANAGER,7839,01-MAY-81,2850,,30
7782,CLARK,MANAGER,7839,09-JUN-81,2450,,10
7788,SCOTT,ANALYST,7566,19-APR-87,3000,,20
7839,KING,PRESIDENT,,17-NOV-81,5000,,10
7844,TURNER,SALESMAN,7698,08-SEP-81,1500,0,30
7876,ADAMS,CLERK,7788,23-MAY-87,1100,,20
7900,JAMES,CLERK,7698,03-DEC-81,950,,30
7902,FORD,ANALYST,7566,03-DEC-81,3000,,20
7934,MILLER,CLERK,7782,23-JAN-82,1300,,10
LOAD DATA LOCAL INPATH '/tmp/mydata/emp.csv' OVERWRITE INTO TABLE scott.emp;
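The comma-delimited format above is worth a quick look: some rows leave the comm field empty (two adjacent commas), which Hive reads as NULL for that column. A short sketch with two sample rows from emp.csv shows how the fields split:

```python
import csv
import io

# Sketch: how the comma-delimited emp rows split into fields. The empty
# comm column (between two commas) parses as an empty string here;
# Hive would treat it as NULL. Sample rows taken from emp.csv above.
sample = (
    "7369,SMITH,CLERK,7902,17-DEC-80,800,,20\n"
    "7499,ALLEN,SALESMAN,7698,20-FEB-81,1600,300,30\n"
)
rows = list(csv.reader(io.StringIO(sample)))
print(rows[0])  # → ['7369', 'SMITH', 'CLERK', '7902', '17-DEC-80', '800', '', '20']
```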
Process Data
As we have data in both databases, we are ready to process data through hive and mysql catalogs which have been defined earlier.
With Presto, to specify a table that you wish to process, we use the catalog.database.table naming convention e.g. mysql.iub.dept or hive.scott.emp.
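The three-part naming convention splits cleanly into its components. The helper below is purely hypothetical (not part of Presto or its CLI); it just illustrates how a fully qualified name like mysql.iub.dept decomposes:

```python
# Hypothetical helper (not part of Presto): split a fully qualified
# Presto table name into its catalog, schema, and table parts.
def split_qualified_name(name):
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError("expected catalog.schema.table, got: " + name)
    catalog, schema, table = parts
    return {"catalog": catalog, "schema": schema, "table": table}

print(split_qualified_name("mysql.iub.dept"))
# → {'catalog': 'mysql', 'schema': 'iub', 'table': 'dept'}
```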
Connect with presto CLI
[presto@vbgeneric ~]$ prestocli --server localhost:7070
presto>
presto> SHOW TABLES FROM mysql.iub;
Table
--------
dept
friend
frnd2
(3 rows)
presto> SHOW TABLES FROM hive.scott;
Table
-------
emp
(1 row)
Run the join query now
presto> SELECT d.dname,e.ename from mysql.iub.dept d JOIN hive.scott.emp e ON d.deptno = e.deptno;
Query 20170616_143710_00007_mgvsh failed: line 1:80: '=' cannot be applied to integer, varchar
SELECT d.dname,e.ename from mysql.iub.dept d JOIN hive.scott.emp e ON d.deptno = e.deptno
The query failed because in Hive the emp table has the deptno column defined as STRING, while in MySQL deptno is defined as INT. So we cast in Presto and run the query again.
presto> SELECT d.dname,e.ename from mysql.iub.dept d JOIN hive.scott.emp e ON d.deptno = cast(e.deptno as integer)
-> ;
dname | ename
------------+--------
ACCOUNTING | MILLER
ACCOUNTING | KING
ACCOUNTING | CLARK
RESEARCH | FORD
RESEARCH | ADAMS
RESEARCH | SCOTT
RESEARCH | JONES
RESEARCH | SMITH
SALES | JAMES
SALES | TURNER
SALES | BLAKE
SALES | MARTIN
SALES | WARD
SALES | ALLEN
(14 rows)
Query 20170616_144316_00008_mgvsh, FINISHED, 1 node
Splits: 66 total, 66 done (100.00%)
0:05 [18 rows, 617B] [3 rows/s, 132B/s]
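Conceptually, the federated join above does the following (a plain-Python sketch over a few sample rows, not how Presto actually executes it): dept rows arrive from MySQL with deptno as an integer, emp rows from Hive with deptno as a string, so the string side is cast before comparing.

```python
# Plain-Python sketch of the cross-source join: dept comes from MySQL
# (deptno as int), emp comes from Hive (deptno as string). Sample data only.
dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")]
emp = [("SMITH", "20"), ("ALLEN", "30"), ("CLARK", "10"), ("KING", "10")]

joined = [
    (dname, ename)
    for deptno, dname in dept
    for ename, e_deptno in emp
    if deptno == int(e_deptno)  # the CAST(e.deptno AS integer) step
]
print(joined)
# → [('ACCOUNTING', 'CLARK'), ('ACCOUNTING', 'KING'),
#    ('RESEARCH', 'SMITH'), ('SALES', 'ALLEN')]
```

Note that OPERATIONS drops out, just as deptno 40 has no matching employees in the real query above.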
Run another query with summary function
select dname,count(*) number_of_employees
from hive.scott.emp e join mysql.iub.dept d
on cast(e.deptno as integer) = d.deptno
group by dname
order by dname
presto> select dname,count(*) number_of_employees
-> from hive.scott.emp e join mysql.iub.dept d
-> on cast(e.deptno as integer) = d.deptno
-> group by dname
-> order by dname;
dname | number_of_employees
------------+---------------------
ACCOUNTING | 3
RESEARCH | 5
SALES | 6
(3 rows)
Query 20170616_145245_00014_mgvsh, FINISHED, 1 node
Splits: 99 total, 99 done (100.00%)
0:05 [18 rows, 617B] [3 rows/s, 133B/s]
Using PostgreSQL Connector
The PostgreSQL connector allows querying and creating tables in an external PostgreSQL database. This can be used to join data between different systems like PostgreSQL and Hive, or between two different PostgreSQL instances.
Check PostgreSQL
I've Ambari setup containing postgreSQL, first check that PostgreSQL is running fine and play with some commands.
1- Connect as the PostgreSQL owner and connect to the server using the psql utility
[root@te1-hdp-rp-en01 ~]# su - postgres
-bash-4.2$ psql
postgres=# ALTER USER postgres WITH PASSWORD 'postgres';
2- List the databases
postgres=# \list
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
ambari | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres +
| | | | | postgres=CTc/postgres+
| | | | | ambari=CTc/postgres
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
3- Quit from psql
ctrl+d
postgres=# \q
4- Check the server version
postgres=# SELECT version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.13 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)
5- List of existing users in DB
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------+-----------
ambari | | {}
postgres | Superuser, Create role, Create DB, Replication | {}
postgres=# SELECT usename FROM pg_user;
usename
----------
postgres
ambari
6- Connect with a specific user; the ambari user password is bigdata
-bash-4.2$ psql -U ambari
7- Optionally, download pgAdmin3 to connect to the ambari database for the Ambari UI on the EN01 node.
8- A few more tests
ambari=> CREATE SCHEMA test;
ambari=> CREATE USER xxx PASSWORD 'yyy';
postgres=# CREATE USER xxx PASSWORD 'yyy';
ambari=> GRANT ALL ON SCHEMA test TO xxx;
ambari=> GRANT ALL ON ALL TABLES IN SCHEMA test TO xxx;
ambari=> CREATE TABLE test.test (coltest varchar(20));
ambari=> insert into test.test (coltest) values ('It works!');
ambari=> SELECT * from test.test;
Configuration
1- Create a catalog properties file (postgresql.properties) in etc/catalog on the Presto coordinator (EN01), at location /root/.prestoadmin/catalog
connector.name=postgresql
connection-url=jdbc:postgresql://en01:5432/ambari
connection-user=ambari
connection-password=bigdata
2- Check the status and restart the server for the new configuration above to take effect
[root@te1-hdp-rp-en01 ~]# su - presto
[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin server status -p presto
[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin topology show
[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin catalog add postgresql -p presto
[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin server restart -p presto
3- Connect with particular schema of particular catalog
[presto@te1-hdp-rp-en01 ~]$ /usr/hadoopsw/prestoadmin/prestocli --server te1-hdp-rp-en01:6060 --catalog postgresql --schema test
presto:test> select count(*) from postgresql.test.test;
4- Connect with presto server and query any catalog
[presto@te1-hdp-rp-en01 ~]$ /usr/hadoopsw/prestoadmin/prestocli --server te1-hdp-rp-en01:6060
presto> select count(*) from postgresql.test.test;
-- Query 2 catalogs at the same time
presto> select count(*) from postgresql.test.test,hive.flume.presto_vw_syslogs;
Check PostgreSQL
I have an Ambari setup that uses PostgreSQL. First, check that PostgreSQL is running fine and try out a few commands.
1- Connect as the PostgreSQL owner and open the database with the psql utility
[root@te1-hdp-rp-en01 ~]# su - postgres
-bash-4.2$ psql
postgres=# ALTER USER postgres WITH PASSWORD 'postgres';
2- List the existing databases
postgres=# \list
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
ambari | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres +
| | | | | postgres=CTc/postgres+
| | | | | ambari=CTc/postgres
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
3- Quit from psql
ctrl+d
postgres=# \q
4- Check the PostgreSQL version
postgres=# SELECT version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.13 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)
5- List the existing users/roles in the database
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------+-----------
ambari | | {}
postgres | Superuser, Create role, Create DB, Replication | {}
postgres=# SELECT usename FROM pg_user;
usename
----------
postgres
ambari
6- Connect as a specific user (the ambari user's password is bigdata)
-bash-4.2$ psql -U ambari
7- If required, pgAdmin III can be downloaded to connect to the ambari database (used by the Ambari UI) on the EN01 node.
8- A few more tests
ambari=> CREATE SCHEMA test;
-- CREATE USER requires the CREATEROLE/superuser privilege (ambari has neither, per \du above), so run it as postgres
ambari=> CREATE USER xxx PASSWORD 'yyy';
postgres=# CREATE USER xxx PASSWORD 'yyy';
ambari=> GRANT ALL ON SCHEMA test TO xxx;
ambari=> GRANT ALL ON ALL TABLES IN SCHEMA test TO xxx;
ambari=> CREATE TABLE test.test (coltest varchar(20));
ambari=> insert into test.test (coltest) values ('It works!');
ambari=> SELECT * from test.test;
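Note that GRANT ... ON ALL TABLES IN SCHEMA only affects tables that already exist at the time of the grant. Since test.test was created after that grant, the xxx user will not yet have privileges on it. A quick sketch, assuming the xxx user and test.test table created above:

```sql
-- Re-grant after creating test.test (or use ALTER DEFAULT PRIVILEGES
-- so future tables in the schema are covered automatically):
ambari=> GRANT ALL ON ALL TABLES IN SCHEMA test TO xxx;

-- Now, connected as the new user (psql -U xxx -d ambari):
INSERT INTO test.test (coltest) VALUES ('xxx works too');
SELECT * FROM test.test;
```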
Configuration
1- Create a catalog properties file (postgresql.properties) on the Presto coordinator (EN01) in /root/.prestoadmin/catalog; presto-admin deploys it to etc/catalog on the nodes
connector.name=postgresql
connection-url=jdbc:postgresql://en01:5432/ambari
connection-user=ambari
connection-password=bigdata
2- Check the server status, add the new catalog, and restart the servers so the configuration above takes effect
[root@te1-hdp-rp-en01 ~]# su - presto
[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin server status -p presto
[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin topology show
[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin catalog add postgresql -p presto
[presto@te1-hdp-rp-en01 ~]$ sudo /usr/hadoopsw/prestoadmin/presto-admin server restart -p presto
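Once the servers are back up, the new catalog can be sanity-checked from the Presto CLI before running real queries; these are standard Presto statements (table and schema names match the objects created above):

```sql
SHOW CATALOGS;                     -- postgresql should now be listed
SHOW SCHEMAS FROM postgresql;      -- PostgreSQL schemas, e.g. public and test
SHOW TABLES FROM postgresql.test;  -- should include the test table
DESCRIBE postgresql.test.test;     -- shows the coltest varchar(20) column
```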
3- Connect to a particular schema of a particular catalog
[presto@te1-hdp-rp-en01 ~]$ /usr/hadoopsw/prestoadmin/prestocli --server te1-hdp-rp-en01:6060 --catalog postgresql --schema test
presto:test> select count(*) from postgresql.test.test;
4- Connect to the Presto server and query any catalog
[presto@te1-hdp-rp-en01 ~]$ /usr/hadoopsw/prestoadmin/prestocli --server te1-hdp-rp-en01:6060
presto> select count(*) from postgresql.test.test;
-- Query 2 catalogs at the same time
presto> select count(*) from postgresql.test.test, hive.flume.presto_vw_syslogs;
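The comma-separated FROM list above is a Cartesian product; it only proves that both catalogs are reachable in one query. A realistic cross-catalog query joins on a shared column. The join condition below is hypothetical (these two demo tables have no common key), shown only to illustrate the syntax; the message column on the Hive view is an assumption:

```sql
-- Hypothetical: count syslog messages that mention a value stored in PostgreSQL.
SELECT p.coltest, count(*) AS hits
FROM postgresql.test.test p
JOIN hive.flume.presto_vw_syslogs s
  ON s.message LIKE '%' || p.coltest || '%'
GROUP BY p.coltest;
```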