If you have a Presto cluster as your processing layer, you can connect to it from Spark over JDBC using Scala.
1- Copy the Presto JDBC driver to the jars directory of the Spark installation, e.g. /opt/progs/spark-2.4.5-bin-hadoop2.7/jars
2- Run the Spark shell and connect using Scala
[solr@te1-hdp-rp-nn01 ~]$ spark-shell
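Alternatively, instead of copying the driver into the Spark installation, you can supply it at launch time. A minimal sketch following the pattern in the Spark JDBC docs; the JAR name presto-jdbc-0.234.jar and its path are placeholders, substitute the version you actually downloaded:
# hypothetical driver location; adjust to where you saved the JAR
spark-shell --driver-class-path /opt/progs/presto-jdbc-0.234.jar --jars /opt/progs/presto-jdbc-0.234.jar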
//######### READING FROM PRESTO ####################
//import org.apache.spark.sql.SQLContext;
//import org.apache.spark.sql.SparkSession;
//import org.apache.spark.SparkContext;
//val spark = SparkSession.builder.master("local").appName("Read From Presto").getOrCreate(); //spark session
//sc.stop(); //stop existing spark context
//val sc = new SparkContext(); //create your own spark context
val JDBC_DRIVER = "com.facebook.presto.jdbc.PrestoDriver"
val DB_URL = "jdbc:presto://x.x.44.135:6060/kudu/default"
//set the JDBC options (each option call mutates the reader and returns it)
val jdbcOptions = spark.read.format("jdbc")
jdbcOptions.option("driver", JDBC_DRIVER)
jdbcOptions.option("url", DB_URL)
jdbcOptions.option("user", "presto326")
//jdbcOptions.option("dbtable", "default.syslog") //dbtable and query are mutually exclusive; use one or the other
//load data into a DataFrame using the JDBC options
jdbcOptions.option("query", "SELECT * FROM default.syslog LIMIT 15") //query is pushed down to Presto
val df = jdbcOptions.load() //Presto is contacted here to resolve the schema; rows are fetched when an action runs
df.show()
df.createOrReplaceTempView("mysyslog") //temp view local to this Spark session (registerTempTable is deprecated since Spark 2.0)
df.printSchema()
//sqlContext.sql("select * from mysyslog limit 5").show() //pre-2.0 style, local query to Spark
spark.sql("select * from mysyslog limit 5").show() //local query to Spark
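Because DataFrameReader.option returns the reader itself, the step-by-step style above works, but the same read is more commonly written as a single chained expression. A sketch reusing the same (redacted) host, port, and example table from above:
val df = spark.read
  .format("jdbc")
  .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
  .option("url", "jdbc:presto://x.x.44.135:6060/kudu/default")
  .option("user", "presto326")
  .option("query", "SELECT * FROM default.syslog LIMIT 15")
  .load()
df.show()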
Notes:
1- SparkContext
- Main entry point for Spark functionality.
- A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
- Only one SparkContext may be active per JVM.
2- SparkSession
- The entry point to programming Spark with the Dataset and DataFrame API.
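In spark-shell both entry points are already created for you: the SparkSession is bound to spark and its SparkContext to sc. A quick illustration, safe to paste into the shell:
//spark (SparkSession) and sc (SparkContext) are pre-built by spark-shell
spark.version                        //e.g. 2.4.5
spark.sparkContext eq sc             //true: the session wraps the one active context per JVM
val acc = sc.longAccumulator("rows") //accumulators are created from the SparkContext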