Fluentd is an open source data collector for building a unified logging layer. Once installed on a server, it runs in the background to collect, parse, transform, analyze and store various types of data. It is written in Ruby for flexibility, with performance-sensitive parts in C. td-agent is a stable distribution package of Fluentd with a memory footprint of about 30-40MB.
Fluent Bit is a lightweight data forwarder for Fluentd (with a memory footprint of about 450KB). It is specifically designed for forwarding data from the edge (containers, servers, embedded systems) to Fluentd aggregators.
Minio is an open source distributed object storage server written in Go, designed for private cloud infrastructure and providing S3 storage functionality. Minio is well suited for storing unstructured data such as photos, videos, log files, backups, and container images. The size of an object can range from a few KB to a maximum of 5TB.
Fluentd Setup
1- Install fluentd (CentOS)
Download the td-agent package from the location below and place it in your desired directory.
https://www.fluentd.org/download
[hdpsysuser@hdpmaster apps]$ sudo yum localinstall td-agent-4.0.0-1.el7.x86_64.rpm
-- Check the status
[hdpsysuser@hdpmaster apps]$ sudo systemctl status td-agent.service
-- Start Service
[hdpsysuser@hdpmaster apps]$ sudo systemctl start td-agent.service
2- Test posting Sample Logs via HTTP
The default configuration (/etc/td-agent/td-agent.conf) is to receive logs at an HTTP endpoint and route them to stdout. For td-agent logs, see /var/log/td-agent/td-agent.log.
You can post a sample log record with the curl command and then check the last line of the td-agent log:
[hdpsysuser@hdpmaster apps]$ curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test
[hdpsysuser@hdpmaster apps]$ tail -n 1 /var/log/td-agent/td-agent.log
2022-07-07 09:57:50.897853249 +0300 debug.test: {"json":"message"}
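The same request can also be built from a script. The following is a minimal Python sketch, assuming the default td-agent HTTP input on localhost:8888 as used above. The function names build_post and send_event are illustrative only and are not part of Fluentd. The record is form-encoded under the json key, as in the curl command.

```python
import json
import urllib.parse
import urllib.request


def build_post(tag, record, host="localhost", port=8888):
    """Build (but do not send) a POST request equivalent to the curl call."""
    url = f"http://{host}:{port}/{tag}"
    # Form-encode the record under the "json" key, like curl -d 'json={...}'.
    body = urllib.parse.urlencode({"json": json.dumps(record)}).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def send_event(tag, record, **kwargs):
    """Send the event and return the HTTP status (needs a running Fluentd)."""
    request = build_post(tag, record, **kwargs)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Only build the request here; call send_event(...) against a live endpoint.
req = build_post("debug.test", {"json": "message"})
print(req.get_method(), req.full_url)
```

Calling send_event("debug.test", {"json": "message"}) against a running td-agent should then produce a log line like the one shown above.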
3- Manually running fluentd
Although we installed Fluentd as a service, we can also run the process manually if required, for example for testing or debugging purposes.
/opt/td-agent/bin/fluentd -c /etc/td-agent/td-agent.conf
4- Configure the fluentd
The configuration file (/etc/td-agent/td-agent.conf) lets the user control the input and output behavior of Fluentd by
1- selecting input and output plugins
2- specifying the plugin parameters
Fluentd assumes the configuration file is encoded in UTF-8 or ASCII. The configuration file consists of the following directives:
source directives determine the input sources, i.e. where all the data comes from. Fluentd input sources are enabled by selecting and configuring the desired input plugins using source directives. Fluentd standard input plugins include http and forward. The http plugin provides an HTTP endpoint to accept incoming HTTP messages, whereas forward provides a TCP endpoint to accept TCP packets. The source submits events to the Fluentd routing engine. An event consists of three entities: tag, time and record. The tag is a string separated by dots (e.g. myapp.access) and is used as the directions for the Fluentd internal routing engine. The time field is specified by input plugins and must be in the Unix time format. The record is a JSON object.
match directives determine the output destinations, i.e. they tell Fluentd what to do. A match directive looks for events with matching tags and processes them. The most common use of the match directive is to output events to other systems. For this reason, the plugins that correspond to the match directive are called output plugins. Fluentd standard output plugins include file and forward.
filter directives determine the event processing pipelines. This directive has the same syntax as match, but filter directives can be chained to form a processing pipeline.
system directives set system-wide configuration
label directives group the output and filter directives for internal routing
worker directives limit a configuration to specific workers
@include directives include other files
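To make the event structure and tag-based routing described above more concrete, here is a small Python sketch. It is a simplified model written for illustration, not Fluentd's own implementation: the Event class and the tag_matches function are hypothetical names, and only the *, ** and {a,b} pattern forms are handled (no nested braces, no multiple space-separated patterns).

```python
import re
from dataclasses import dataclass


@dataclass
class Event:
    tag: str      # dot-separated routing key, e.g. "myapp.access"
    time: float   # Unix time
    record: dict  # JSON-like payload


def _regex_source(pattern):
    """Translate a tag pattern into regular expression source text."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith(".**", i):    # zero or more trailing tag parts
            out.append(r"(?:\..*)?")
            i += 3
        elif pattern.startswith("**.", i):  # zero or more leading tag parts
            out.append(r"(?:.*\.)?")
            i += 3
        elif pattern.startswith("**", i):   # anything at all
            out.append(r".*")
            i += 2
        elif pattern[i] == "*":             # exactly one tag part
            out.append(r"[^.]*")
            i += 1
        elif pattern[i] == "{":             # {a,b}: any one alternative
            end = pattern.index("}", i)
            alternatives = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(_regex_source(a) for a in alternatives) + ")")
            i = end + 1
        else:                               # literal character
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def tag_matches(pattern, tag):
    """Return True if the whole tag matches the pattern."""
    return re.fullmatch(_regex_source(pattern), tag) is not None


event = Event(tag="msglog", time=1657187870.0, record={"log": "sample"})
print(tag_matches("{msglog,audlog}", event.tag))
```

With this model, a pattern such as {msglog,audlog} accepts events tagged msglog or audlog and rejects any other tag, while myapp.* accepts myapp.access but not myapp.access.log.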
-- Example Configuration
Below is an example configuration that uses the forward input plugin, which listens on a TCP socket to receive the event stream. This plugin is mainly used to receive event logs from other Fluentd instances, the fluent-cat command, or Fluentd client libraries. This is by far the most efficient way to retrieve the records.
The received event goes to the record_transformer filter first. The record_transformer filter adds a host_param field to the event, and the filtered event then goes to the s3 output plugin (match).
The s3 output plugin writes records into an S3-compatible object storage service (Minio in this setup). By default, it creates files on an hourly basis. This means that when you first import records using the plugin, no file is created immediately. The file is created when the timekey condition has been met. To change the output frequency, modify the timekey value in the buffer section.
<system>
  process_name fluentd-test
  workers 2
</system>
<source>
  @type forward
  bind 0.0.0.0
  port 8888
</source>
<filter {msglog,audlog}>
  @type record_transformer
  <record>
    host_param "#{Socket.gethostname}"
  </record>
</filter>
<match {msglog,audlog}>
  @type s3
  aws_key_id minioadmin # The access key for Minio
  aws_sec_key minioadmin # The secret key for Minio
  s3_bucket logstash-output # The bucket to store the log data
  s3_endpoint http://hdpmaster:9000 # The endpoint URL (like "http://localhost:9000/")
  s3_region us-east-1 # See the region settings of your Minio server
  path logs/ # This prefix is added to each file
  #path logs/%Y/%m/%d/file-%H%M # This prefix is added to each file
  force_path_style true # This prevents AWS SDK from breaking endpoint URL
  time_slice_format %Y%m%d%H%M # This timestamp is added to each file name
  <buffer>
    timekey 3600 # 1 hour partition
    timekey_wait 1m
  </buffer>
</match>
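The filter and buffer settings in the configuration above can be illustrated with a short Python sketch. It is only a model of the behavior, under two assumptions: time slices are aligned in UTC, and a chunk becomes eligible for flushing once timekey plus timekey_wait seconds have passed since the start of its slice. The function names add_host_param and chunk_window are illustrative and do not exist in Fluentd.

```python
import socket
from datetime import datetime, timezone


def add_host_param(record):
    """Mimic the record_transformer step: return a copy with host_param added."""
    return {**record, "host_param": socket.gethostname()}


def chunk_window(event_time, timekey=3600, timekey_wait=60):
    """Return (slice start, earliest flush time) for an event, both in UTC."""
    ts = int(event_time.timestamp())
    start = ts - ts % timekey              # align to the start of the time slice
    flush = start + timekey + timekey_wait  # slice end plus the grace period
    return (datetime.fromtimestamp(start, tz=timezone.utc),
            datetime.fromtimestamp(flush, tz=timezone.utc))


event_time = datetime(2022, 7, 7, 9, 57, 50, tzinfo=timezone.utc)
start, flush = chunk_window(event_time)
print(start.strftime("%Y%m%d%H%M"))  # slice rendered like time_slice_format
print(flush.isoformat())
print(add_host_param({"log": "sample"}))
```

In this model, an event received at 09:57:50 belongs to the 09:00 slice, and its file would not be written before 10:01, which is why no object appears in the bucket right after the first records arrive.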
After the configuration is done, run Fluentd.
[hdpsysuser@hdpmaster ~]$ /opt/td-agent/bin/fluentd -c /etc/td-agent/td-agent.conf
Now that Fluentd is set up to collect the logs, it is time to forward logs to it using our lightweight log forwarder, Fluent Bit.
Fluent Bit Setup
1- Download and install fluent-bit (CentOS)
[hdpsysuser@hdpmaster ~]$ curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
-- Check the status
[hdpsysuser@hdpmaster ~]$ systemctl status fluent-bit.service
[hdpsysuser@hdpmaster ~]$ sudo systemctl start fluent-bit.service
[hdpsysuser@hdpmaster ~]$ systemctl status fluent-bit.service
2- Test standard output
[hdpsysuser@hdpmaster ~]$ /opt/fluent-bit/bin/fluent-bit -i cpu -F stdout -m '*' -o null
3- Manually running fluent-bit
Although we installed Fluent Bit as a service, we can also start it manually for testing and debugging purposes. The configuration file is located at /etc/fluent-bit/fluent-bit.conf
[hdpsysuser@hdpmaster ~]$ sudo /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf
4- Configure the fluent-bit
Fluent Bit can use a configuration file (/etc/fluent-bit/fluent-bit.conf) to define how the service will behave. The file's schema is defined by three concepts: sections, entries (key/value pairs), and an indented configuration mode. A section is defined by a name or title inside brackets. A section may contain entries; an entry is a line of text that contains a key and a value. Fluent Bit configuration files use a strict indented mode, which means that each configuration file must follow the same pattern of alignment from left to right. By default, an indentation level of four spaces is suggested.
-- Example Configuration
The SERVICE section defines global properties of the service; for example, Parsers_File gives the path of a parsers configuration file. Multiple Parsers_File entries can be defined within the section.
An INPUT section defines a source (related to an input plugin). The Name is mandatory and lets Fluent Bit know which input plugin should be loaded. The Tag is mandatory for all plugins except for the input forward plugin (as it provides dynamic tags).
A FILTER section defines a filter (related to a filter plugin). For example, the Match key denotes a pattern to match against the tags of incoming records. It is case sensitive and supports the star (*) character as a wildcard. The Name is mandatory and lets Fluent Bit know which filter plugin should be loaded. Match or Match_Regex is mandatory for all plugins. If both are specified, Match_Regex takes precedence.
The OUTPUT section specifies a destination that certain records should follow after a Tag match. Currently, Fluent Bit can route up to 256 OUTPUT plugins.
-- /etc/fluent-bit/fluent-bit.conf
[SERVICE]
    Parsers_File parserFile.conf
[INPUT]
    Name tail
    Path /var/log/messages
    Tag msglog
[INPUT]
    Name tail
    Path /var/log/audit/audit.log
    Tag audlog
[FILTER]
    Name grep
    Match *
[OUTPUT]
    Name forward
    Match *
    Host hdpmaster
    Port 8888
    Retry_Limit False
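The section, entry and indentation rules described earlier can be sketched as a tiny parser. This is a simplified Python illustration, not Fluent Bit's actual parser: parse_classic_config is a hypothetical name, and features such as @INCLUDE, @SET and environment variables are deliberately ignored.

```python
def parse_classic_config(text):
    """Parse sections and indented key/value entries into (name, dict) pairs."""
    sections = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))  # a new section starts here
            continue
        # An entry must belong to a section and must be indented.
        if not sections or raw[0] not in " \t":
            raise ValueError(f"line {lineno}: entry must be indented under a section")
        parts = line.split(None, 1)  # key, then the rest of the line as value
        sections[-1][1][parts[0]] = parts[1] if len(parts) > 1 else ""
    return sections


SAMPLE = """\
[INPUT]
    Name tail
    Path /var/log/messages
    Tag  msglog
[OUTPUT]
    Name  forward
    Match *
    Host  hdpmaster
    Port  8888
"""

for name, entries in parse_classic_config(SAMPLE):
    print(name, entries)
```

Feeding the parser an entry that is not indented raises a ValueError, which mirrors why the indentation in the configuration file matters.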
-- Manually start fluent-bit and forward from fluent-bit to fluentd
-- send CPU stats
The general form of the command is:
bin/fluent-bit -i INPUT -o forward://HOST:PORT
Using the CPU input plugin as an example, we flush CPU metrics to Fluentd with the tag fluent_bit:
[root@hdpmaster ~]# /opt/fluent-bit/bin/fluent-bit -i cpu -t fluent_bit -o forward://hdpmaster:8888
On the Fluentd side, you will then see the CPU metrics gathered over the last few seconds.
-- send OS logs
[hdpsysuser@hdpmaster ~]$ sudo /opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf -o forward://hdpmaster:8888
Now that Fluentd and Fluent Bit are running, you can verify from Minio whether the logs are being received.
After the logs are received, you can analyze them using Presto; for details, please check the post below.
I'm pasting the relevant section from that post.
-- create external table in Presto
create table minio.default.mytable2 (col varchar(65535))
with (format='TEXTFILE', external_location='s3a://logstash-output/logs');
-- create view in presto
create view pvw_myt2 as
select regexp_extract(col, '{.*}') as stmt
from minio.default.mytable2;
-- query the view
presto:default> select stmt from pvw_myt2 where stmt like '%Message%' limit 20;
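To see what the view extracts from each stored line, the same regular expression can be tried locally. The Python sketch below assumes each line in the bucket holds a timestamp, the tag and the JSON record separated by tabs. The sample line is made up for illustration and is not real output, and extract_stmt is a hypothetical helper name.

```python
import json
import re

# Greedy match from the first "{" to the last "}", like '{.*}' in the view.
JSON_PART = re.compile(r"\{.*\}")


def extract_stmt(line):
    """Return the JSON part of a stored log line, or None if there is none."""
    match = JSON_PART.search(line)
    return match.group(0) if match else None


# Hypothetical stored line: time, tag and record separated by tabs.
line = '2022-07-07T09:57:50+03:00\tmsglog\t{"log":"sample Message text","host_param":"hdpmaster"}'
stmt = extract_stmt(line)
record = json.loads(stmt)
print(record["log"])
```

Once the JSON part is isolated, it can be parsed into a record, which is the local counterpart of filtering stmt with like '%Message%' in Presto.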