Please see my other blog for Oracle EBusiness Suite Posts - EBMentors

Note: All the posts take a practical approach and avoid lengthy theory. All have been tested on development servers. Please don't try any post on production servers until you are sure.

Tuesday, April 24, 2018

Configuring Logstash with Elasticsearch


Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash can dynamically unify data from disparate sources and normalize the data into destinations of your choice. Cleanse and democratize all your data for diverse advanced downstream analytics and visualization use cases. You can clean and transform your data during ingestion to gain near real-time insights at index or output time. Logstash comes out of the box with many aggregations and mutations along with pattern matching, geo mapping, and dynamic lookup capabilities.

Monday, April 23, 2018

Integrating Hadoop and Elasticsearch


Hadoop shines as a batch processing system, but serving real-time results can be challenging. For truly interactive data discovery, ES-Hadoop (the Elasticsearch-Hadoop connector) lets you index Hadoop data into the Elastic Stack to take full advantage of the speedy Elasticsearch engine and beautiful Kibana visualizations.

Thursday, April 19, 2018

Working with Elasticsearch


Elasticsearch is a distributed, scalable, real-time search and analytics engine built on top of Apache Lucene™. Lucene, a library, is arguably the most advanced, high-performance, and fully featured search engine library in existence today, among both open source and proprietary options. Elasticsearch enables you to search, analyze, and explore your data whether you need full-text search, real-time analytics of structured data, or a combination of the two.
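
To make "a combination of the two" concrete, here is a minimal sketch of a search request body that mixes a full-text `match` clause with a structured `term` filter inside a `bool` query. The field names `message` and `status` and the helper `build_search_body` are my own assumptions for illustration, they are not taken from the post. The code only builds the JSON body and does not contact a cluster.

```python
import json


def build_search_body(text, field="message", status=None):
    """Build a Query DSL body: full-text match plus an optional exact filter.

    `field` and the `status` field name are hypothetical; use your own mapping.
    """
    bool_query = {"must": [{"match": {field: text}}]}
    if status is not None:
        # A term filter matches exact values and does not affect scoring.
        bool_query["filter"] = [{"term": {"status": status}}]
    return {"query": {"bool": bool_query}}


body = build_search_body("connection timeout", status="error")
print(json.dumps(body, indent=2))
```

The resulting body would typically be sent to the `_search` endpoint of an index with whatever HTTP client you prefer.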

Tuesday, April 17, 2018

Configure Rsyslog with Any Log File

Modern Linux distros ship with Rsyslog, which includes some nice additional functionality: the imfile module can convert any standard text file into Syslog messages.
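
As a rough illustration of what "converting a text file into a Syslog message" means, the sketch below wraps each line of a file in a simplified syslog-style message. It is not how imfile is implemented; it only shows the idea of attaching a tag and a priority to every line. The priority value follows the standard syslog rule (facility * 8 + severity). The tag `myapp`, the helper names, and the choice of local7/info are assumptions, and the timestamp and hostname fields of a real syslog message are left out.

```python
FACILITY_LOCAL7 = 23  # standard syslog facility code for local7
SEVERITY_INFO = 6     # standard syslog severity code for informational


def to_syslog_message(line, tag, facility=FACILITY_LOCAL7, severity=SEVERITY_INFO):
    """Wrap one text line in a simplified syslog-style message."""
    pri = facility * 8 + severity  # syslog priority value
    return f"<{pri}>{tag}: {line.rstrip()}"


def lines_to_syslog(lines, tag):
    """Convert every non-empty line into one message, skipping blank lines."""
    return [to_syslog_message(line, tag) for line in lines if line.strip()]


# Hypothetical log lines standing in for the contents of a text file.
sample = ["app started\n", "\n", "disk usage at 91%\n"]
for message in lines_to_syslog(sample, tag="myapp"):
    print(message)
```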

Tuesday, March 20, 2018

HDFS Centralized Cache Management

Due to increasing memory capacity, many interesting working sets are able to fit in aggregate cluster memory. By using HDFS centralized cache management, applications can take advantage of the performance benefits of in-memory computation. Cluster cache state is aggregated and controlled by the NameNode, allowing application schedulers to place their tasks for cache locality.

Configuring ACLs on HDFS

ACLs extend the HDFS permission model to support more granular file access based on arbitrary combinations of users and groups. We will discuss how to use Access Control Lists (ACLs) on the Hadoop Distributed File System (HDFS).
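
To show what "arbitrary combinations of users and groups" looks like in practice, here is a small sketch that assembles an `hdfs dfs -setfacl -m` command from individual ACL entries. The user `alice`, the group `analysts`, the path `/data/reports`, and the helper functions are hypothetical. The code only builds and prints the command; it does not run it against a cluster.

```python
import shlex


def acl_entry(kind, name, perms):
    """Format one ACL entry such as 'user:alice:r-x'."""
    if kind not in ("user", "group"):
        raise ValueError(f"unsupported ACL entry type: {kind}")
    # Each position must be its permission letter or a dash.
    if len(perms) != 3 or any(c not in (letter, "-") for c, letter in zip(perms, "rwx")):
        raise ValueError(f"invalid permission string: {perms}")
    return f"{kind}:{name}:{perms}"


def setfacl_command(path, entries):
    """Build the argument list that modifies (-m) the ACL of a path."""
    return ["hdfs", "dfs", "-setfacl", "-m", ",".join(entries), path]


entries = [acl_entry("user", "alice", "r-x"), acl_entry("group", "analysts", "r--")]
command = setfacl_command("/data/reports", entries)
print(shlex.join(command))
```

After applying such a command, `hdfs dfs -getfacl <path>` can be used to review the resulting entries.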

Tuesday, March 13, 2018

Spooling Files to HBase using Flume


One of my team members wants to upload the contents of files placed in a specific directory (the spooling dir) to HBase for some analysis. For this purpose we will use Flume's spooldir source, which lets users and applications place files in the spooling dir and processes each line as one event to put into HBase. It is assumed that the Hadoop cluster and HBase are running; our environment is on HDP 2.6.
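
The "each line as one event" idea can be sketched in a few lines of Python. This is only a conceptual model of a spooling directory, not Flume itself and not the configuration used in the post: files dropped into a directory are read line by line, each line becomes one event, and a finished file is renamed so it is not read again. The `.COMPLETED` suffix mirrors the default suffix of Flume's spooling directory source; the function name and the event dictionary layout are assumptions.

```python
import tempfile
from pathlib import Path


def spool_events(spool_dir, done_suffix=".COMPLETED"):
    """Yield one event per line for every unprocessed file in spool_dir."""
    for path in sorted(Path(spool_dir).iterdir()):
        if not path.is_file() or path.name.endswith(done_suffix):
            continue  # skip directories and files already processed
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                yield {"file": path.name, "body": line.rstrip("\n")}
        # Mark the file as done so a later pass does not emit it again.
        path.rename(path.with_name(path.name + done_suffix))


# Small demo in a temporary directory standing in for the spooling dir.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "orders.txt").write_text("row1\nrow2\n", encoding="utf-8")
    for event in spool_events(demo_dir):
        print(event)
```

In the real setup a sink would write each event to an HBase table instead of printing it.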

Tuesday, February 06, 2018

Integrating Hadoop Cluster with Microsoft Azure Blob Storage


Azure Blob storage is a service for storing large amounts of unstructured object data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS. You can use Blob storage to expose data publicly to the world, or to store application data privately. All access to Azure Storage is done through a storage account.