Note: All posts take a practical approach and avoid lengthy theory. All have been tested on development servers. Please don't try any post on production servers until you are sure it is safe.

Monday, May 04, 2020

Connect to Presto from Spark

If you have a Presto cluster as your processing layer, you can connect to it from Spark using Scala.


1- Copy the Presto driver jar to the jars directory on the Spark master, e.g. /opt/progs/spark-2.4.5-bin-hadoop2.7/jars
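
Once the driver jar is on the classpath, a read from Presto could look roughly like the sketch below, which goes through Spark's generic JDBC data source. The coordinator host and port, the catalog/schema (hive/default), the table name my_table and the user name are all placeholders, not values from this post. The driver class shown is the one shipped with the prestodb JDBC jar; if your jar comes from the prestosql distribution, the class would be io.prestosql.jdbc.PrestoDriver instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PrestoReadSketch {
  def main(args: Array[String]): Unit = {
    // In spark-shell a session named `spark` already exists; this line is only
    // needed when running as a standalone application.
    val spark = SparkSession.builder().appName("presto-read-sketch").getOrCreate()

    // All connection values below are hypothetical placeholders.
    val df: DataFrame = spark.read
      .format("jdbc")
      // URL shape: jdbc:presto://<coordinator-host>:<port>/<catalog>/<schema>
      .option("url", "jdbc:presto://presto-coordinator:8080/hive/default")
      // Assumes the prestodb driver jar; adjust for your distribution.
      .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
      .option("dbtable", "my_table")
      // Presto expects a user name on the connection.
      .option("user", "spark")
      .load()

    df.printSchema()
    df.show(10)

    spark.stop()
  }
}
```

In spark-shell you can paste just the spark.read ... load() expression and work with the resulting DataFrame directly.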

Kudu Integration with Spark


Kudu integrates with Spark through the Data Source API. I downloaded the jar files from the location below:

https://jar-download.com/artifacts/org.apache.kudu/kudu-spark2_2.11/1.10.0/source-code
You can place the jar files in $SPARK_HOME/jars (e.g. /opt/progs/spark-2.4.5-bin-hadoop2.7/jars) if you don't want to use the --jars option with spark-shell.
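
With the kudu-spark jars in place, reading a Kudu table as a DataFrame could look like the following minimal sketch. The master address kudu-master:7051 and the table name my_table are placeholders, not values from this post. The "kudu" short format name is assumed to be available in the 1.10.0 connector; if it is not recognised in your setup, the fully qualified name org.apache.kudu.spark.kudu can be tried instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KuduReadSketch {
  def main(args: Array[String]): Unit = {
    // In spark-shell a session named `spark` already exists.
    val spark = SparkSession.builder().appName("kudu-read-sketch").getOrCreate()

    // Hypothetical connection settings; replace with your own master(s) and table.
    val kuduOptions = Map(
      "kudu.master" -> "kudu-master:7051",
      "kudu.table"  -> "my_table"
    )

    // Load the Kudu table through the Data Source API.
    val df: DataFrame = spark.read
      .options(kuduOptions)
      .format("kudu")
      .load()

    // Register a temporary view so the table can be queried with Spark SQL.
    df.createOrReplaceTempView("my_table_view")
    spark.sql("SELECT * FROM my_table_view LIMIT 10").show()

    spark.stop()
  }
}
```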

Sunday, May 03, 2020

Working with Ignite [In-Memory Data Grid]



Introduction

Apache Ignite is an open source In-Memory Data Grid (IMDG), distributed database, caching, and high-performance computing platform. It offers a wide range of features and integrates well with other Apache frameworks such as Hadoop, Spark, and Cassandra. We need it for its high performance and scalability: it keeps data in RAM for fast processing and linear scaling, so adding more nodes to the grid yields higher capacity and performance.

Working with Apache Kudu


Introduction


Kudu is a columnar storage manager developed for the Apache Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Unlike other storage options for big data analytics, Kudu isn't just a file format. It's a live storage system that supports low-latency, millisecond-scale access to individual rows. Kudu isn't designed to be an OLTP system; it is designed for fast processing of OLAP workloads.