


Monday, July 25, 2022

Centralized Logging with Fluentd/Fluent-bit and Minio

Fluentd is an open source data collector for building a unified logging layer. Once installed on a server, it runs in the background to collect, parse, transform, analyze and store various types of data. It is written in Ruby for flexibility, with performance-sensitive parts in C. td-agent is a stable distribution package of Fluentd with a 30-40 MB memory footprint.

Fluent Bit is a lightweight data forwarder for Fluentd, with a memory footprint of roughly 450 KB. It is specifically designed for forwarding data from the edge (containers, servers, embedded systems) to Fluentd aggregators.

Minio is an open source distributed object storage server written in Go, designed for private cloud infrastructure and providing S3-compatible storage functionality. Minio is well suited for storing unstructured data such as photos, videos, log files, backups, and container images. The size of an object can range from a few KB to a maximum of 5 TB.
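This post assumes a Minio server is already running (at http://hdpmaster:9000 in the examples below) with a bucket for the log data. If you still need to create the bucket, a minimal sketch with the Minio client mc, assuming the default minioadmin credentials, looks like this:

mc alias set local http://hdpmaster:9000 minioadmin minioadmin    # register the server under the alias "local"
mc mb local/logstash-output                                       # create the bucket the logs will land in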

Fluentd Setup                                    

1- Install fluentd (CentOS)

Download the td-agent package from the location below and place it in your desired directory.

https://www.fluentd.org/download

[hdpsysuser@hdpmaster apps]$ sudo yum localinstall td-agent-4.0.0-1.el7.x86_64.rpm

-- Check the status

[hdpsysuser@hdpmaster apps]$ sudo systemctl status td-agent.service

-- Start Service

[hdpsysuser@hdpmaster apps]$ sudo systemctl start td-agent.service



2- Test posting Sample Logs via HTTP

The default configuration (/etc/td-agent/td-agent.conf) is to receive logs at an HTTP endpoint and route them to stdout. For td-agent logs, see /var/log/td-agent/td-agent.log.
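For reference, the relevant portion of that default file looks roughly like this (a sketch; the shipped td-agent.conf contains more comments and options):

<source>
  @type http
  port 8888
</source>

<match debug.**>
  @type stdout
</match>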

You can post sample log records with curl command:

[hdpsysuser@hdpmaster apps]$ curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test

[hdpsysuser@hdpmaster apps]$ tail -n 1 /var/log/td-agent/td-agent.log

2022-07-07 09:57:50.897853249 +0300 debug.test: {"json":"message"}


3- Manually running fluentd

Although we installed fluentd as a service, we can also run the process manually if required, for example for testing or debugging purposes.

/opt/td-agent/bin/fluentd -c /etc/td-agent/td-agent.conf
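Running it manually is also a convenient way to validate configuration changes; fluentd's --dry-run option parses the configuration and exits without starting the workers:

/opt/td-agent/bin/fluentd -c /etc/td-agent/td-agent.conf --dry-run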

4- Configure the fluentd

The configuration file (/etc/td-agent/td-agent.conf) allows the user to control the input and output behavior of Fluentd by

1- selecting input and output plugins
2- specifying the plugin parameters

Fluentd assumes the configuration file is UTF-8 or ASCII. The configuration file consists of the following directives:

source directives determine the input sources, i.e. where the data comes from. Fluentd input sources are enabled by selecting and configuring the desired input plugins using source directives. Fluentd's standard input plugins include http and forward: http provides an HTTP endpoint to accept incoming HTTP messages, whereas forward provides a TCP endpoint to accept TCP packets. The source submits events to the Fluentd routing engine. An event consists of three entities: tag, time and record. The tag is a string separated by dots (e.g. myapp.access) and is used to direct Fluentd's internal routing engine. The time field is specified by input plugins and must be in Unix time format. The record is a JSON object.
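For example, the curl command from step 2 above produced an event along these lines:

tag:    debug.test
time:   1657177070 (the Unix time corresponding to 2022-07-07 09:57:50 +0300)
record: {"json":"message"}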

match directives determine the output destinations; they tell Fluentd what to do with matching events. A match directive looks for events with matching tags and processes them. The most common use of the match directive is to output events to other systems; for this reason, the plugins that correspond to the match directive are called output plugins. Fluentd's standard output plugins include file and forward.
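A minimal match block writing matching events to disk with the file output plugin might look like this (a sketch; the tag and path are illustrative):

<match myapp.access>
  @type file
  path /var/log/fluent/access
</match>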

filter directives determine the event processing pipelines. This directive has the same syntax as match, but filters can be chained to form a processing pipeline.

system directives set system-wide configuration

label directives group the output and filter for internal routing

worker directives restrict the enclosed directives to specific workers

@include directives include other files
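As a quick illustration of label, the sketch below (names are illustrative) sends events from one source to a named label, whose inner match directives then handle them instead of the top-level ones:

<source>
  @type http
  port 8888
  @label @STAGING
</source>

<label @STAGING>
  <match **>
    @type stdout
  </match>
</label>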


-- Example Configuration 

Below is the example configuration, which uses the "forward" input plugin. It listens on a TCP socket to receive the event stream. This plugin is mainly used to receive event logs from other Fluentd instances, the fluent-cat command, or Fluentd client libraries, and is by far the most efficient way to retrieve the records.

The received event goes to the record_transformer filter first. The record_transformer filter adds a host_param field to the event; the filtered event then goes to the s3 output plugin (match).

The s3 output plugin writes records to an S3-compatible object storage service, Minio in our case. By default, it creates files on an hourly basis. This means that when you first import records using the plugin, no file is created immediately; the file is created once the timekey condition has been met. To change the output frequency, modify the timekey value in the buffer section. For example, with timekey 3600 and timekey_wait 1m, a record arriving at 10:15 falls into the 10:00-11:00 chunk, which is flushed to Minio at about 11:01.


<system>
  process_name fluentd-test
  workers 2
</system>

<source>
  @type forward
  bind 0.0.0.0
  port 8888
</source>

<filter {msglog,audlog}>
  @type record_transformer
  <record>
    host_param "#{Socket.gethostname}"
  </record>
</filter>

<match {msglog,audlog}>
  @type s3
  aws_key_id minioadmin              # The access key for Minio
  aws_sec_key minioadmin             # The secret key for Minio
  s3_bucket logstash-output          # The bucket to store the log data
  s3_endpoint http://hdpmaster:9000  # The endpoint URL (like "http://localhost:9000/")
  s3_region us-east-1                # See the region settings of your Minio server
  path logs/                         # This prefix is added to each file
  #path logs/%Y/%m/%d/file-%H%M      # Alternative: date-based prefix for each file
  force_path_style true              # This prevents the AWS SDK from breaking the endpoint URL
  time_slice_format %Y%m%d%H%M       # This timestamp is added to each file name
  <buffer>
    timekey 3600                     # 1 hour partition
    timekey_wait 1m
  </buffer>
</match>

After the configuration is done, run fluentd:

[hdpsysuser@hdpmaster ~]$ /opt/td-agent/bin/fluentd -c /etc/td-agent/td-agent.conf
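You can push a quick test event into the forward source with the fluent-cat utility shipped with td-agent (the tag must be msglog or audlog to match the directives above; the message content is illustrative):

[hdpsysuser@hdpmaster ~]$ echo '{"message":"hello minio"}' | /opt/td-agent/bin/fluent-cat msglog --host hdpmaster --port 8888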




Fluent-bit Setup                                      

Now that we have set up Fluentd to collect all the logs, it is time to forward logs to it using our lightweight log forwarder, Fluent Bit.

1- Download and install fluent-bit (CentOS)

[hdpsysuser@hdpmaster ~]$ curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh

-- Check the status

[hdpsysuser@hdpmaster ~]$ systemctl status fluent-bit.service

[hdpsysuser@hdpmaster ~]$ sudo systemctl start fluent-bit.service

[hdpsysuser@hdpmaster ~]$ systemctl status fluent-bit.service


2- Test standard output
[hdpsysuser@hdpmaster ~]$ /opt/fluent-bit/bin/fluent-bit -i cpu -F stdout -m '*' -o null




3- Manually running fluent-bit

Although we installed fluent-bit as a service, we can also start it manually for testing and debugging purposes. The configuration file is located at /etc/fluent-bit/fluent-bit.conf.

[hdpsysuser@hdpmaster ~]$ sudo /opt/fluent-bit/bin/fluent-bit  -c /etc/fluent-bit/fluent-bit.conf



4- Configure the fluent-bit

Fluent Bit uses a configuration file (/etc/fluent-bit/fluent-bit.conf) to define how the service behaves. The file uses a configuration schema with three concepts. A section is defined by a name or title inside brackets. A section may contain entries; an entry is defined by a line of text that contains a Key and a Value. Fluent Bit configuration files are based on a strict indented mode, which means each configuration file must follow the same pattern of alignment from left to right. By default an indentation level of four spaces is suggested.
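As a small illustration of this schema (a section title in brackets, followed by indented Key/Value entries; the values here are illustrative):

[SERVICE]
    Flush        5
    Daemon       off
    Log_Level    info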

-- Example Configuration

The SERVICE section defines global properties of the service; for example, Parsers_File denotes the path of a parsers configuration file. Multiple Parsers_File entries can be defined within the section.
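A parsers file, such as the parserFile.conf referenced in the example below, holds one or more PARSER sections. A minimal sketch for JSON-formatted log lines could look like this (the name and time format are illustrative):

[PARSER]
    Name        json
    Format      json
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z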

An INPUT section defines a source (related to an input plugin). The Name is mandatory and lets Fluent Bit know which input plugin should be loaded. The Tag is mandatory for all plugins except the forward input plugin (as it provides dynamic tags).

A FILTER section defines a filter (related to a filter plugin); for example, the Match key denotes a pattern to match against the tags of incoming records. It is case sensitive and supports the star (*) character as a wildcard. The Name is mandatory and lets Fluent Bit know which filter plugin should be loaded. Match or Match_Regex is mandatory for all plugins; if both are specified, Match_Regex takes precedence.
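For instance, a grep filter that keeps only records whose log field matches a pattern could look like this (a sketch; the field name and pattern are illustrative):

[FILTER]
    Name   grep
    Match  msglog
    Regex  log error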

The OUTPUT section specifies a destination that certain records should follow after a Tag match. Currently, Fluent Bit can route up to 256 OUTPUT plugins.


-- /etc/fluent-bit/fluent-bit.conf

[SERVICE]
    Parsers_File parserFile.conf

[INPUT]
    Name        tail
    Path        /var/log/messages
    Tag         msglog

[INPUT]
    Name        tail
    Path        /var/log/audit/audit.log
    Tag         audlog

[FILTER]
    Name  grep
    Match *
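    # No Regex/Exclude rule is set, so this grep filter passes all records through unchanged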

[OUTPUT]
    Name forward
    Match *
    Host hdpmaster
    Port 8888
    Retry_Limit False



-- Manually start fluent-bit and forward from fluent-bit to fluentd

-- Send CPU stats. Using the CPU input plugin as an example, we flush CPU metrics to Fluentd with the tag fluent_bit:

bin/fluent-bit -i INPUT -o forward://HOST:PORT
[root@hdpmaster ~]# /opt/fluent-bit/bin/fluent-bit -i cpu -t fluent_bit -o forward://hdpmaster:8888

On the Fluentd side, you will see the CPU metrics gathered in the last few seconds.

-- send OS logs

[hdpsysuser@hdpmaster ~]$ sudo /opt/fluent-bit/bin/fluent-bit  -c /etc/fluent-bit/fluent-bit.conf  -o forward://hdpmaster:8888



Verify Flow                                              

Now that Fluentd and Fluent Bit are running, you can verify from Minio whether the logs are being received.
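Besides the Minio browser, you can also check from the command line with the mc client (assuming the "local" alias registered during the Minio setup sketch above):

[hdpsysuser@hdpmaster ~]$ mc ls --recursive local/logstash-output/logs/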



Analyze Received Logs                         

After the logs are received, you can analyze them using Presto; for details, please check my related Presto post, from which I'm pasting the relevant section below.

-- create external table in Presto
create table minio.default.mytable2 (col varchar(65535)) 
with (format='TEXTFILE', external_location='s3a://logstash-output/logs');



-- create view in Presto
create view pvw_myt2 as
select
regexp_extract(col, '{.*}') as stmt
from minio.default.mytable2;

-- query the view 
presto:default> select stmt from pvw_myt2 where stmt like '%Message%' limit 20;
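Since the fluent-bit tail input stores each line under the log key by default, and our record_transformer filter added host_param, you can also pull individual fields out of the extracted JSON (a sketch; the keys depend on your actual record layout):

presto:default> select json_extract_scalar(stmt, '$.log') as line, json_extract_scalar(stmt, '$.host_param') as host from pvw_myt2 limit 20;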



