Note: All the posts are based on practical approach avoiding lengthy theory. All have been tested on some development servers. Please don’t test any post on production servers until you are sure.

Sunday, May 03, 2020

Working with Ignite [In-Memory Data Grid]



Introduction                                                                        

Apache Ignite is an open source In-Memory Data Grid (IMDG), distributed database, caching and high performance computing platform. It offers a bucketload of features and integrates well with other Apache frameworks such as Hadoop, Spark, and Cassandra. We need it for its high performance and scalability: it keeps data in RAM for fast processing and scales close to linearly. As you add more nodes to the grid, you gain both capacity and performance.


NoSQL databases were introduced to mitigate RDBMS scalability issues. There are four types of NoSQL databases (key-value, document, column-family, and graph), each suited to different use cases. A NoSQL database scales many times better than an RDBMS, but it doesn't offer transactional consistency or relational SQL joins, so it still cannot help us scale a system that handles really high-volume transactional data. Apache Ignite offers caching APIs to process a high volume of ACID-compliant transactional data.

Ignite offers an ANSI SQL query API to query data, an API to perform CRUD on caches, ACID transactions, a compute and service grid, streaming and complex event processing, and Machine Learning APIs. NewSQL is a newer class of databases that offer ACID-compliant distributed transactions that can scale. Apache Ignite can be termed a NewSQL database.

In-Memory Data Grid (IMDG)

You can think of an IMDG as a distributed key-value store; both the key and the value must be serializable because they get transferred over the network. Apache Ignite stores objects in off-heap and on-heap memory (and on disk when native persistence is enabled). Apache Ignite's data grid operations, such as Create, Read, Update, and Delete (CRUD), are many times faster than RDBMS operations because traditional databases read and write data on disk (typically in B+ tree structures), whereas IMDG data is kept in memory.

In-Memory SQL Grid (IMSG)

Apache Ignite SQL Grid is a distributed data grid where you can execute ANSI SQL-99-compliant statements (SELECT, UPDATE, INSERT, MERGE, and DELETE queries) to manipulate a cache.

Compute Grid

Apache Ignite Compute Grid is a distributed in-memory MapReduce/ForkJoin or Splitter-Aggregator platform. It enables the parallel processing of data to reduce the overall processing time.
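
To make the split-and-aggregate idea concrete, here is a minimal single-JVM sketch in plain Java. It is only an analogy: a local thread pool stands in for the grid nodes, and the names SplitAggregateSketch and totalLength are made up for this post, they are not part of the Ignite API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: a thread pool plays the role of the grid nodes.
final class SplitAggregateSketch {

    // Split the text into words, measure each word in parallel, then sum the results.
    static int totalLength(String text, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Split step: one small job per word.
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (String word : text.trim().split("\\s+")) {
                jobs.add(word::length);
            }
            // Aggregate step: wait for every job and add up the partial results.
            int sum = 0;
            for (Future<Integer> partial : pool.invokeAll(jobs)) {
                sum += partial.get();
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(totalLength("Count characters using callable", 4));
    }
}
```

On a real cluster the jobs would be shipped to different nodes instead of local threads, but the shape of the work (split, run in parallel, aggregate) stays the same.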

Service Grid

What if we could deploy our service inside a MySQL, MS SQL Server, or Oracle database? The service would be collocated with the data and would process DB-related computational requests way faster than in the traditional deployment model. Service grid is a nice concept that lets you do just that: deploy a service to an Apache Ignite cluster.

Streaming and Complex Event Processing

Complex event processing enables real-time analytics on transactional event streams. It intercepts different events, then computes or detects patterns, and finally takes action or provides business insights. Apache Ignite has the capability to stream events from disparate sources and then perform complex event processing.

Ignite File System (IGFS)

Apache Ignite has an in-memory distributed filesystem interface to work with files in memory. IGFS stands for Ignite File System. IGFS accelerates Hadoop processing by keeping the files in memory and minimizing disk IO.

Clustering

Apache Ignite automatically detects when a new node joins the cluster, and likewise when a node stops or crashes, and transparently redistributes the data. This enables you to scale your system as you add more nodes. The coolest feature of this sophisticated clustering is that it can connect an Ignite node in a private cloud to a cluster running in a public cloud, such as AWS.

Messaging

Messaging is a communication protocol to decouple senders from receivers. Apache Ignite supports various models of data exchange between nodes.

Distributed data structures

Apache Ignite allows you to create distributed data structures and share them between the nodes. One really useful data structure is the ID generator. In many applications, ID generation is handled using a UUID or custom stored procedure logic, or by configuring tables to generate sequence IDs. A distributed ID generator residing in an in-memory grid can be orders of magnitude faster than these traditional ID generators, because it does not need a database round trip for every ID.
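
One reason such a generator can be fast is that each node can reserve a block of IDs and then hand them out locally. The sketch below illustrates that idea in plain Java, assuming a shared AtomicLong as a stand-in for the cluster-wide counter. The class BlockIdGeneratorSketch is hypothetical and is not Ignite's implementation (Ignite exposes this kind of structure as an atomic sequence).

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: every "node" reserves a block of IDs from a shared counter.
final class BlockIdGeneratorSketch {
    private final AtomicLong shared; // stand-in for the cluster-wide counter
    private final int blockSize;     // how many IDs a node reserves at once
    private long next;               // next ID to hand out locally
    private long limit;              // end (exclusive) of the reserved block

    BlockIdGeneratorSketch(AtomicLong shared, int blockSize) {
        this.shared = shared;
        this.blockSize = blockSize;
    }

    // Returns a unique ID; touches the shared counter only once per block.
    synchronized long nextId() {
        if (next == limit) {
            next = shared.getAndAdd(blockSize);
            limit = next + blockSize;
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong();
        BlockIdGeneratorSketch nodeA = new BlockIdGeneratorSketch(counter, 1000);
        BlockIdGeneratorSketch nodeB = new BlockIdGeneratorSketch(counter, 1000);
        System.out.println(nodeA.nextId()); // first ID of node A's block
        System.out.println(nodeB.nextId()); // first ID of node B's block
    }
}
```

The trade-off is that IDs are unique but not strictly ordered across nodes, and a node that stops leaves a gap in the sequence.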


Setting up Ignite (2.7.6)                                                         

1- Download the Ignite binary from the Ignite website
- Binary distribution: https://ignite.apache.org/download.cgi
- RPM packages: https://apache.bintray.com/ignite-rpm/
- Requires JDK 8


export IGNITE_HOME=/opt/apache-ignite-2.7.6-bin
export JAVA_HOME=/usr/java/jdk1.8.0_121
export PATH=$PATH:$IGNITE_HOME/bin:$JAVA_HOME/bin

Run Ignite

ignite.sh $IGNITE_HOME/config/default-config.xml

At this point a single-node Ignite cluster is running.

2- Configuration
- All configuration parameters are defined in an instance of the IgniteConfiguration class
- Set the parameters either programmatically or via an XML configuration file
- The XML configuration file is a Spring Bean definition file that must contain the IgniteConfiguration bean
- When starting a node from the command line, pass the configuration file as a parameter to ignite.sh
- If you don’t specify a configuration file, the default file $IGNITE_HOME/config/default-config.xml will be used

3- Memory Configuration
- By default, an Ignite node consumes up to 20% of the RAM available locally (the maximum size of the default data region), and in most cases this is the only parameter you might need to change.

4- Configure a client tool (SQuirreL)

You can configure any JDBC client tool to query Ignite. Below is the information needed to configure SQuirreL.

- Driver location: $IGNITE_HOME/libs/ignite-core-2.7.6.jar
- Put the driver in C:\Program Files\squirrel-sql-3.8.1\lib
- Register the driver using the SQuirreL "Add Drivers" tab
    - Driver Name: Apache Ignite Driver
    - Class Name: org.apache.ignite.IgniteJdbcThinDriver
    - Example URL: jdbc:ignite:thin://x.x.44.133/
    - Default port: Ignite's JDBC thin driver uses port 10800 by default
    - Save preferences
- Go to the "Alias" tab and add an alias
    - Name: IgniteConn
    - Driver: Apache Ignite Driver
    - URL: jdbc:ignite:thin://x.x.44.133/
- Connect to Ignite now

5- SQL Usage
- Default schemas
    - IGNITE schema: contains a number of system views with information about cluster nodes
    - PUBLIC schema: used by default whenever a schema is not specified
- Custom schemas
    - Custom schemas can be set via the sqlSchemas property of IgniteConfiguration. You can specify a list of schemas in the configuration before starting your cluster and then create objects in these schemas at runtime.
    - Example, in /opt/apache-ignite-2.7.6-bin/config/default-config.xml:

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="sqlSchemas">
            <list>
                <value>REALTIMEDB</value>
                <value>STAGEDB</value>
            </list>
        </property>
    </bean>
</beans>

To connect to a specific schema via, for example, a JDBC driver, provide the schema name in the connection string:    jdbc:ignite:thin://x.x.44.133/realtimedb
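
The same connection string works from application code through standard JDBC. The following is a rough sketch, assuming ignite-core-2.7.6.jar is on the classpath and a node is reachable on the default thin-driver port 10800. The class ThinJdbcSketch and the helper thinUrl are made up for this example, and 127.0.0.1 is a placeholder host.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical illustration of connecting to a specific schema with the thin driver.
final class ThinJdbcSketch {

    // Builds a thin-driver URL of the form jdbc:ignite:thin://host:port/schema
    static String thinUrl(String host, int port, String schema) {
        return "jdbc:ignite:thin://" + host + ":" + port + "/" + schema;
    }

    public static void main(String[] args) {
        // Placeholder host; pass the address of one of your nodes as the first argument.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        String url = thinUrl(host, 10800, "REALTIMEDB");

        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected, current schema: " + conn.getSchema());
        } catch (SQLException e) {
            // Raised when the driver jar is missing or no node is listening.
            System.out.println("Could not connect to " + url + ": " + e.getMessage());
        }
    }
}
```

With the schema in the URL, unqualified table names in your statements resolve against that schema instead of PUBLIC.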

5.1 Creating Tables

CREATE TABLE log_in_ignite(id double PRIMARY KEY,logline varchar ) WITH "template=replicated";

CREATE TABLE realtimedb.City ( --  new distributed cache is created 
id LONG PRIMARY KEY, name VARCHAR)
WITH "template=replicated"; -- Distributed cache related parameters are passed in the WITH clause of the statement. 

CREATE TABLE realtimedb.Person (
id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))
WITH "backups=1, affinityKey=city_id";  -- with 1 backup of data and city_id as the affinity key


- It is beneficial to collocate entries that will be accessed together
- Using an affinity key ensures that all entries with the same affinity key value are stored on the same node
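
To see why the affinity key gives collocation, here is a simplified sketch in plain Java. It assumes the partition is derived only from the affinity key value (city_id) and uses a naive modulo to map partitions to nodes; Ignite's real affinity function uses rendezvous hashing, so the actual node assignment will differ. The class AffinitySketch and its methods are hypothetical.

```java
// Hypothetical illustration: entries with the same affinity key value share a partition.
final class AffinitySketch {
    // 1024 is the default partition count of Ignite's rendezvous affinity function.
    static final int PARTITIONS = 1024;

    // The partition depends only on the affinity key value, not on the full primary key.
    static int partitionFor(long affinityKey) {
        return Math.floorMod(Long.hashCode(affinityKey), PARTITIONS);
    }

    // Naive partition-to-node mapping, a simplification for this sketch.
    static int nodeFor(long affinityKey, int nodeCount) {
        return partitionFor(affinityKey) % nodeCount;
    }

    // A Person row is placed by its city_id; the person id plays no role in placement.
    static int nodeForPerson(long personId, long cityId, int nodeCount) {
        return nodeFor(cityId, nodeCount);
    }

    public static void main(String[] args) {
        int nodes = 3;
        // Jane Roe (id 2) and Richard Miles (id 4) both have city_id 2.
        System.out.println("Jane Roe      -> node " + nodeForPerson(2, 2, nodes));
        System.out.println("Richard Miles -> node " + nodeForPerson(4, 2, nodes));
    }
}
```

Because both rows resolve to the same node, a query that filters or joins on city_id can be answered without moving Person rows between nodes.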

- Creating Indexes
CREATE INDEX idx_city_name ON City (name);
CREATE INDEX idx_person_name ON Person (name);
- Inserting Data
INSERT INTO City (id, name) VALUES (1, 'Forest Hill');
INSERT INTO City (id, name) VALUES (2, 'Denver');
INSERT INTO City (id, name) VALUES (3, 'St. Petersburg');
INSERT INTO Person (id, name, city_id) VALUES (1, 'John Doe', 3);
INSERT INTO Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);
INSERT INTO Person (id, name, city_id) VALUES (3, 'Mary Major', 1);
INSERT INTO Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);
- Querying Data

SELECT * FROM City;
SELECT name FROM City WHERE id = 1;
SELECT p.name, c.name FROM Person p, City c WHERE p.city_id = c.id;

- Modifying Data

UPDATE City SET name = 'Foster City' WHERE id = 2;
- Removing Data

DELETE FROM Person WHERE name = 'John Doe';

- MVCC (Multiversion Concurrency Control) must be enabled in order to use SQL transactions

5.2 SQLLine

- Command line tool for SQL connectivity in Ignite.
- Connecting to Ignite Cluster
      - /usr/share/apache-ignite/bin/sqlline.sh --verbose=true -u jdbc:ignite:thin://x.x.44.133/


Clustering                                                                     

- Ignite nodes can automatically discover each other
- TCP/IP Discovery is designed and optimized for deployments with 100s of nodes; ZooKeeper Discovery is meant for deployments with 100s and 1000s of nodes
- TcpDiscoverySpi is the default implementation of DiscoverySpi
- TcpDiscoveryMulticastIpFinder uses multicast to discover other nodes in the grid and is the default IP finder
- Configure this finder via a Spring XML file:
<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="sqlSchemas">
            <list>
                <value>REALTIMEDB</value>
                <value>STAGEDB</value>
            </list>
        </property>

        <!--
            Our Cluster Configuration
        -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                        <property name="addresses">
                            <list>
                                <!-- In distributed environment, replace with actual host IP address. -->
                                <value>x.x.44.133:47500..47509</value>
                                <value>x.x.44.134:47500..47509</value>
                                <value>x.x.44.135:47500..47509</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>