fluentd history (No.46)

A log collection tool. It is more often used through the packaged distribution td-agent than on its own.

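Installing td-agent
td-agent on Mac
Plugins
Configuration file
Plugins
- Plugin list
- Handling file names that contain dates
format
- Main formats
- Filtering regular expressions
Running
- Troubleshooting
Tips
- Re-ingesting td-agent logs
- Ingesting existing logs
filter
- Settings
rewrite
- Installation
- Settings
Filter
- Filter configuration example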

Installing td-agent

On Amazon Linux it is a breeze.

http://dev.classmethod.jp/cloud/td-agent2-amazon-linux/

I followed the page above. The same steps work on CentOS.

To run it manually, do the following:

td-agent on Mac

Download the dmg file and install it.
Start and stop it with the following commands:

sudo launchctl load /Library/LaunchDaemons/td-agent.plist
sudo launchctl unload /Library/LaunchDaemons/td-agent.plist

td-agent version summary

Findings as of 2015/10. With yum, packages basically keep being updated to newer versions.

Date	OS	td-agent	fluentd
2015/10	CentOS6		0.12.12
2015/10	CentOS7		0.12.12
2015/10	AmazonLinux 15.09		0.12.12
2016/01	Azure CentOS7		0.12.19

The version mystery

Although the hosts use the same repository and the same rpm version, 0.10.60 and 0.10.55 are mixed.

$ td-agent --version
td-agent 0.10.55
$ rpm -qi td-agent
Name        : td-agent                     Relocations: (not relocatable)
Version     : 1.1.21                            Vendor: Treasure Data, Inc.
Release     : 0                             Build Date: 2014-10-20 17:31:13
Install Date: 2015-08-12 14:02:46           Build Host: ip-10-123-31-198.ec2.internal
Group       : System Environment/Daemons    Source RPM: td-agent-1.1.21-0.src.rpm
Size        : 103551538                        License: APL2
Signature   : DSA/SHA1, 2014-10-20 22:07:39, Key ID 1093db45a12e206f
URL         : http://treasure-data.com/
Summary     : td-agent
Description :

Adding the yum repository

vi /etc/yum.repos.d/td.repo

[treasuredata]
name=TreasureData
baseurl=http://packages.treasure-data.com/redhat/$basearch
gpgcheck=0

To install v2 (the default these days):

[treasuredata]
name=TreasureData
baseurl=http://packages.treasuredata.com/2/redhat/$releasever/$basearch
gpgcheck=1
gpgkey=https://packages.treasuredata.com/GPG-KEY-td-agent

Upgrading td-agent

yum remove td-agentxxx
curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

Plugins need to be reinstalled.

Installing td-agent

yum install td-agent

Plugins

Plugin locations

td-agent 0.10.55(32bit)	/usr/lib/fluent/ruby/lib/ruby/gems/1.9.1/gems
td-agent 0.10.55(64bit)	/usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems
td-agent 0.12.12	/opt/td-agent/embedded/lib/ruby/gems/2.1.0/

Installing plugin gems

Use the fluent-gem found under the locations above: plugins must be installed with the Ruby managed by td-agent. Note that the location of fluent-gem differs by OS and version.

RedHat5	/usr/lib/fluent/ruby/bin/fluent-gem
RedHat6	/opt/td-agent/embedded/bin/fluent-gem
AmazonLinux	/usr/lib64/fluent/ruby/bin/fluent-gem

/usr/lib64/fluent/ruby/bin/fluent-gem install fluent-plugin-zabbix
/opt/td-agent/embedded/bin/fluent-gem install fluent-plugin-forest
/usr/lib/fluent/ruby/bin/fluent-gem install fluent-plugin-record-reformer

Placing plugins directly

/etc/td-agent/plugin/in_xxx.rb or out_xxx.rb

Configuration file

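source defines the input and match does the processing. A single match cannot perform multiple processing steps, so when you want to process events with several plugins in turn, give them a new tag and route that tag to the next match.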

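<match apache.access>
  type file
  path /var/tmp/apache_all.log
  # When using a wildcard, wrap the path in double quotes!
  # path "/var/tmp/*_access_log"

  # Re-tag so the next <match> picks the events up. Note: out_file itself
  # does not re-emit events; this needs an output plugin that re-emits
  # with a new tag (e.g. rewrite_tag_filter or record-reformer).
  tag next.apache.access
</match>
<match next.apache.access>
  type file
  path /var/tmp/apache_all2.log
</match>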

Including configuration files

@include conf.d/*.conf

Using environment variables in the configuration file

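The --use-v1-config argument is required; it is passed in /etc/init.d/td-agent (v1 config is the default from td-agent 2, but the script still seems to pass it).
Set the variables in /etc/sysconfig/td-agent or a similar file.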

<source>
  type tail
  tag var.tmp
  path "/var/tmp/#{ENV['TD_HOSTNAME']}"
  format none
</source>

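Note that the value is not expanded unless it is wrapped in double quotes. "#{Socket.gethostname}" can be used to get the hostname. However, it did not work inside a <match> section such as the one below, which largely defeats the purpose: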

<match raw.dummy>
  type file
  path "/var/tmp/#{ENV['HOME']}/test.log"
</match>

It did work with @include.

@include "#{ENV['TD_HOSTNAME']}.conf"

HOME is the home directory of the user running td-agent; by default /var/lib/td-agent/.
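The expansion relies on Ruby string interpolation, which only happens inside double quotes. A minimal plain-Ruby illustration of that rule (this is not fluentd itself; the value web01 is a made-up example):

```ruby
require 'socket'

# Made-up value for illustration; in td-agent it would come from /etc/sysconfig/td-agent
ENV['TD_HOSTNAME'] = 'web01'

# Double quotes: #{...} is evaluated, so the variable is expanded
expanded = "/var/tmp/#{ENV['TD_HOSTNAME']}"

# Single quotes: #{...} stays as literal text, nothing is expanded
literal = '/var/tmp/#{ENV["TD_HOSTNAME"]}'

# Socket.gethostname returns this machine's hostname, as in "#{Socket.gethostname}"
host_path = "/var/tmp/#{Socket.gethostname}.log"

puts expanded   # /var/tmp/web01
puts literal    # /var/tmp/#{ENV["TD_HOSTNAME"]}
puts host_path
```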

Setting field types

types size:integer,response_time:integer
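Values captured by a parser are strings, and types tells fluentd which fields to convert. A rough plain-Ruby sketch of the idea; parse_types and apply_types are hypothetical helpers, not fluentd's API, and only integer is handled here:

```ruby
# Hypothetical helper: "size:integer,response_time:integer" -> {"size"=>"integer", ...}
def parse_types(spec)
  spec.split(",").map { |pair| pair.strip.split(":", 2) }.to_h
end

# Hypothetical helper: convert only the listed fields, leave the rest as strings
def apply_types(record, types)
  record.each_with_object({}) do |(key, value), out|
    out[key] = types[key] == "integer" ? value.to_i : value
  end
end

types  = parse_types("size:integer,response_time:integer")
record = { "path" => "/index.html", "size" => "1234", "response_time" => "56" }
typed  = apply_types(record, types)
p typed
```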

Listening for HTTP on port 8888

# http://localhost:8888/<tag>?json=<json>
<source>
  type http
  port 8888
</source>

With type forward, HTTP access is not possible, but td-agent still listens on that port.
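For reference, an event can be sent to the HTTP input from Ruby's standard library. A minimal sketch, assuming td-agent is running with the http source on localhost:8888; the tag debug.test and the helper names are hypothetical. The tag goes in the URL path and the record goes in a form field named json, as in the `?json=<json>` comment:

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: in_http takes the tag from the URL path
def event_uri(tag, host: "localhost", port: 8888)
  URI("http://#{host}:#{port}/#{tag}")
end

# Hypothetical helper: the record is sent as JSON in a form field called "json"
def form_params(record)
  { "json" => JSON.generate(record) }
end

# Sends one event; needs a running td-agent listening on the port above
def post_event(tag, record)
  Net::HTTP.post_form(event_uri(tag), form_params(record))
end

puts event_uri("debug.test")
puts URI.encode_www_form(form_params({ "message" => "hello" }))
# post_event("debug.test", { "message" => "hello" })  # uncomment when td-agent is running
```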

Automatically adding the hostname to tags or fields

http://www.fluentd.org/guides/recipes/apache-add-hostname

To add it to a field, a filter section is the best choice

<filter web.*>
  type record_transformer
  <record>
    service_name ${tag_parts[1]}
  </record>
</filter>
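${tag_parts[1]} refers to the second dot-separated part of the tag (indexes start at 0). The plain-Ruby sketch below only mimics that lookup; the tag web.shop and the record are made-up examples:

```ruby
# Mimics ${tag_parts[N]}: the tag split on "." (0-based index)
def tag_part(tag, index)
  tag.split(".")[index]
end

tag    = "web.shop"                   # made-up tag matching <filter web.*>
record = { "path" => "/index.html" }  # made-up record

# Roughly what the <record> block adds to each event
transformed = record.merge("service_name" => tag_part(tag, 1))
p transformed
```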

Removing records works the same way. Several exclude lines can be listed.

<filter apache.access>
  type grep
  exclude1 statuscode (200|301|302|304)
</filter>

Writing the configuration in a DSL

Change the configuration file loaded by /usr/sbin/td-agent to a .rb file, editing it as follows.

#!/opt/td-agent/embedded/bin/ruby
ENV["GEM_HOME"]="/opt/td-agent/embedded/lib/ruby/gems/2.1.0/"
ENV["GEM_PATH"]="/opt/td-agent/embedded/lib/ruby/gems/2.1.0/"
#ENV["FLUENT_CONF"]="/etc/td-agent/td-agent.conf"
ENV["FLUENT_CONF"]="/etc/td-agent/test.rb"
ENV["FLUENT_PLUGIN"]="/etc/td-agent/plugin"
ENV["FLUENT_SOCKET"]="/var/run/td-agent/td-agent.sock"
load "/opt/td-agent/embedded/bin/fluentd"

The configuration file itself loops over an array, as below. Setting values after type need to be wrapped in "".

['hoge','fuga'].each do |i|
  match ("foo#{i}.#{ENV['HOSTNAME']}") {
    type :stdout
  }
end
source {
  type :tail
  path "/var/tmp/hoge.log"
}

# Apache settings written in the DSL
apache_hash = { "access" => "apache", "error" => "apache_error" }
apache_hash.each do |key, value|
  source {
    type :tail
    path "/var/log/httpd/*_#{key}_log"
    format "#{value}"
    tag "apache_#{key}"
    pos_file "/tmp/td-agent/apache_#{key}.pos"
  }
end

Checking the configuration

td-agent -c /etc/td-agent/test.rb --dry-run

The expanded configuration is written to /var/log/td-agent/td-agent.log in XML-like form:

 <match foohoge.**>
   type stdout
 </match>
 <match foofuga.**>
   type stdout
 </match>

Description of types (subtypes)

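type name	Summary
null	Discards events without forwarding them
forest	Lets the tag name be used as a substitution variable; use it to apply similar settings to many tags at once
rewrite_tag_filter	Assigns tags using regular expressions
record_modifier	Adds new attributes, e.g. the hostname to Apache logs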

Forwarding local files

<source>
  type tail
  format apache
  path /var/log/httpd/*_access_log
  tag apache.access
  pos_file /tmp/fluentd-apache.pos
</source>
<match apache.access>
  type s3
  aws_key_id
  aws_sec_key
  s3_bucket bucket_name
  s3_endpoint bucket_name.s3-website-ap-northeast-1.amazonaws.com
  path logs/
  buffer_path /var/tmp/fluentd
  time_slice_format %Y%m%d/%H_apache.log
  time_slice_wait 30m
  # Data is PUT to S3 at this interval: 1440 requests a day, nearly went cloud-bankrupt!
  flush_interval 60s
</match>
<source>
  type   tail
  path   /var/log/httpd/error_log
  format apache_error
  tag    apache.error
  pos_file /tmp/apache_error.pos
</source>
# Send the output to Fluentd's standard log
<match apache.error>
  type stdout
</match>

<source>
  type tail
  path /var/log/httpd/access_log
  pos_file /var/tmp/access_log.pos
  tag httpd
  format none
</source>
# Send the output to Fluentd's standard log
<match httpd>
  type stdout
</match>

Plugins

Plugin list

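Plugin name	Description
copy	For sending events to several destinations, e.g. forwarding and saving to a file
rewrite_tag_filter	Rewrites tags depending on conditions
forest	Very handy for applying settings to many similar tags at once
fluent-plugin-map	Rewrites record contents
fluent-plugin-record-reformer	Also rewrites records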

Handling file names that contain dates

<source>
  type tail
  format none
  path /var/tmp/%Y%m%d%H.log
  tag tail_ex_test
  pos_file /tmp/td-agent/tail_ex_test.pos
  refresh_interval 10
</source>

Date placeholders follow Ruby's strftime format! In the example above, the watched file name becomes e.g. 2015083122.log (for 22:00 on 2015-08-31).
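Since the placeholders are Ruby's Time#strftime directives, the resolved file name can be checked in plain Ruby. A quick check with a fixed, made-up time:

```ruby
# Path pattern from the <source> above
pattern = "/var/tmp/%Y%m%d%H.log"

# Made-up time: 2015-08-31 22:00 (local time)
t = Time.new(2015, 8, 31, 22, 0, 0)

puts t.strftime(pattern)         # /var/tmp/2015083122.log
puts Time.now.strftime(pattern)  # the file that would be watched right now
```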

format

Main formats

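Format name	Input example	Notes
none		Uses the input as-is
none_with_hostname		Input string plus host information
ltsv	domain:example.com	Labeled TSV
apache2	Apache combined log	Fails if the log format is customized
apache_error	Apache error log	Fails if the log format is customized
csv,tsv	example.com,/hoge	Define the keys separately, e.g. keys domain,path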

Filtering regular expressions

Writing your own format requires knowledge of Ruby regular expressions.

Apache (formats other than combined)

The regex for the date part is very tedious: \[(?<time>[^\]]+)\] is the one to use. You also have to specify time_format.

format /^(?<host>[^ ]+) [^ ]+ [^ ]+ \[(?<time>[^\]]+)\] (?<message>[^ ]+).*$/
time_format %d/%b/%Y:%T %z
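Because format uses Ruby regular expressions and time_format uses strptime-style directives, both can be checked without fluentd. A sketch with a made-up log line; it assumes fluentd parses the time the same way as Ruby's Time.strptime:

```ruby
require 'time'

# Regex and time_format from the config above
FORMAT = /^(?<host>[^ ]+) [^ ]+ [^ ]+ \[(?<time>[^\]]+)\] (?<message>[^ ]+).*$/
TIME_FORMAT = "%d/%b/%Y:%T %z"

# Made-up Apache log line for testing
line = '192.168.0.1 - - [01/Sep/2015:16:28:23 +0900] "GET /index.html HTTP/1.1" 200 1234'

m = FORMAT.match(line) or abort "no match"
time = Time.strptime(m[:time], TIME_FORMAT)

puts m[:host]     # 192.168.0.1
puts m[:message]  # "GET  (first token after the date, quote included)
puts time.utc     # 2015-09-01 07:28:23 UTC
```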

Reference site

http://diary.tachibanakikaku.com/2013/12/fluentdformat.html

Testing regular expressions locally

#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
require 'time'
require 'fluent/log'
require 'fluent/config'
require 'fluent/engine'
require 'fluent/parser'
$log ||= Fluent::Log.new
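# Fill in the values to test
log = ''           # a sample log line
format = //        # the regular expression used in format
time_format = ''   # the time_format, e.g. %d/%b/%Y:%T %z
parser = Fluent::TextParser::RegexpParser.new(format, 'time_format' => time_format)
# call returns [time, record] ([nil, nil] if the regex does not match)
p parser.call(log)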

/usr/lib64/fluent/ruby/bin/ruby fluenttest.ruby
# the ruby path is different on Amazon Linux
/opt/td-agent/embedded/bin/ruby fluenttest.ruby

Online test site

Fluentular: a Fluentd regular expression editor http://fluentular.herokuapp.com/

Running

Troubleshooting

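There is no way wildcards can't be used in the input file path! → to be fixed later
Reading fails with an error unless the td-agent group has permission to read the file.
combined logs did not match the pattern... the log format may have been customized, so this needs investigating. → Customized formats are not ingested!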

Tips

Ingesting the secure and messages logs

http://y-ken.hatenablog.com/entry/fluentd-syslog-permission

Re-ingesting td-agent logs

You would think they could be ingested as-is, but they have to be rearranged into JSON.

cut -f1,3 fluent_test.log | awk -F'\t' '{print "{\"timestamp\":\"" $1 "\","  substr($2,2)}'

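The input settings also have to be spelled out in detail, as below. If time_key and time_format are not specified, the time of ingestion ends up as the event time instead of the time recorded in the log.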

<source>
  type tail
  tag recover
  path /var/tmp/recover.log
  format json
  time_key timestamp
  time_format "%Y-%m-%dT%H:%M:%S%z"
</source>
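The cut/awk one-liner can also be written in Ruby, which makes the JSON handling explicit. A sketch, assuming td-agent's file output lines look like time<TAB>tag<TAB>JSON record (the sample line is made up); the last lines show how the timestamp field is read with the time_format above:

```ruby
require 'json'
require 'time'

# Made-up sample line, assuming the out_file layout: time<TAB>tag<TAB>JSON record
line = %Q(2015-09-01T16:28:23+09:00\tapache.access\t{"host":"192.168.0.1","code":"200"})

# Same idea as `cut -f1,3 | awk ...`: drop the tag (column 2) and put the
# time (column 1) into the record under "timestamp"
def to_recover_json(line)
  time, _tag, record = line.chomp.split("\t", 3)
  { "timestamp" => time }.merge(JSON.parse(record)).to_json
end

out = to_recover_json(line)
puts out

# With time_key timestamp / time_format above, the event time comes from this field
event_time = Time.strptime(JSON.parse(out)["timestamp"], "%Y-%m-%dT%H:%M:%S%z")
puts event_time.utc
```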

Ingesting existing logs

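Changing the pos file doesn't work! It hurts that only the tail plugin is available. In the end, overwriting the file solved it, but because the whole file is read at once, the following warnings appear: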

2015-09-01 16:28:23 +0900 [warn]: Size of the emitted data exceeds buffer_chunk_limit.
2015-09-01 16:28:23 +0900 [warn]: This may occur problems in the output plugins ``at this server.``
2015-09-01 16:28:23 +0900 [warn]: To avoid problems, set a smaller number to the buffer_chunk_limit
2015-09-01 16:28:23 +0900 [warn]: in the forward output ``at the log forwarding server.``

Setting the output's buffer_chunk_limit to 100M made the warnings disappear.

A new tool called Embulk has come out, so I'm hoping to use it in the future.

filter

Recent versions use this.

Settings

Place it before the match!

<filter foo.bar>
  type grep
  regexp1 message cool
  regexp2 hostname ^web\d+\.example\.com$
  exclude1 message uncool
</filter>

With multiple conditions, regexpN lines are combined with AND, while excludeN lines are combined with OR. In general, exclude is probably the one to use.
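The AND/OR rule can be pictured with a small plain-Ruby model. This is a hypothetical re-implementation for illustration, not filter_grep's code; the patterns mirror the example above and the records are made up:

```ruby
# Patterns mirroring the <filter foo.bar> example above
REGEXPS  = { "message" => /cool/, "hostname" => /^web\d+\.example\.com$/ }
EXCLUDES = { "message" => /uncool/ }

# Hypothetical model of the grep rules, for illustration only
def keep?(record)
  # regexpN: every pattern must match (AND)
  all_match = REGEXPS.all? { |key, re| re =~ record[key].to_s }
  # excludeN: a single matching pattern is enough to drop the record (OR)
  excluded = EXCLUDES.any? { |key, re| re =~ record[key].to_s }
  all_match && !excluded
end

records = [
  { "message" => "cool",   "hostname" => "web1.example.com" }, # kept
  { "message" => "uncool", "hostname" => "web1.example.com" }, # dropped by exclude1
  { "message" => "cool",   "hostname" => "db1.example.com" },  # dropped: regexp2 fails
]
records.each { |r| puts "#{keep?(r) ? 'keep' : 'drop'}: #{r}" }
```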

rewrite

Not very usable, so consider Filter instead!

Installation

/usr/lib64/fluent/ruby/bin/fluent-gem install fluent-plugin-rewrite

Settings

<match test.log>
  type rewrite
  remove_prefix test
  add_prefix reformed
  <rule>
    key message
    pattern hoge
    replace fuga
  </rule>
</match>

Filter

Available from v0.12. Fine on AWS, but the CentOS family has v0.10, so it can't be used there.

Filter configuration example

<source>
  type dummy
  tag raw.dummy
  dummy {"message":"[WARN] warning[tab]message[tab]"}
</source>
<filter raw.**>
  type grep
  regexp1 message WARN
</filter>
<filter raw.**>
  type record_transformer
  enable_ruby true
  <record>
    tag ${tag}
    hostname "#{Socket.gethostname}"
    replaced ${message.gsub(/tab/,'\t')}
  </record>
</filter>
<match raw.**>
  type stdout
</match>
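Note that in Ruby '\t' in single quotes is a backslash followed by t, not a tab, and gsub leaves that sequence as-is in the replacement. If the replaced field is meant to contain a real tab, a double-quoted "\t" is what produces one. Plain Ruby showing the difference (whether record_transformer's ${...} evaluates it identically is an assumption, not tested here):

```ruby
message = "[WARN] warning[tab]message[tab]"

# As written in the config: '\t' is two characters, a backslash and "t"
single = message.gsub(/tab/, '\t')

# With double quotes, "\t" is an actual tab character
double = message.gsub(/tab/, "\t")

p single
p double
```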