Please see my other blog for Oracle EBusiness Suite Posts - EBMentors

Search This Blog

Note: All the posts are based on practical approach avoiding lengthy theory. All have been tested on some development servers. Please don’t test any post on production servers until you are sure.

Thursday, August 10, 2017

Working with Talend for Big Data (TOSBD)


Introduction                                                                               

Talend (eclipse based) provides unified development and management tools to integrate and process all of your data with an easy to use, visual designer. It helps companies become data driven by making data more accessible, improving its quality and quickly moving it where it’s needed for real-time decision making.
Talend for Big Data is built on top of Talend's data integration solution that enables users to access, transform, move and synchronize big data by leveraging the Apache Hadoop Big Data Platform and makes the Hadoop platform ever so easy to use.

Tuesday, August 08, 2017

Analyzing/Parsing syslogs using Hive and Presto


Scenario


My company asked me to provide the solution for syslog aggregation for all the environments so that they may be able to analyze and get insights. Logs should be captured first, then retained and finally processed by the analyst team in a way they already use to query/process with database. The requirements are not much clearer as well as volume of data can't be determined at the stage.

Wednesday, August 02, 2017

Working with Apache Cassandra (RHEL 7)


Introduction
Cassandra (created at Facebook for inbox search) like HBase is a NoSQL database, generally, it means you cannot manipulate the database with SQL. However, Cassandra has implemented CQL (Cassandra Query Language), the syntax of which is obviously modeled after SQL and designed to manage extremely large data sets with manipulation capabilities. It is a distributed database, clients can connect to any node in the cluster and access any data.

Tuesday, August 01, 2017

Hortonworks - Using HDP Spark SQL



Using SQLContext, Apache Spark SQL can read data directly from the file system. This is useful when the data you are trying to analyze does not reside in Apache Hive (for example, JSON files stored in HDFS).