Support RFC5424 message format #15
Loggly, Logentries and other SaaS logging services use RFC5424 for shipping logs to their servers. Even if Logstash has its own format (Lumberjack), I favor using syslog for interoperability with those services. I've started to write my own parsing in Logstash to work around the limitations of the built-in syslog input:

input {
tcp {
type => "syslog"
port => 5544
tags => []
add_field => { "_index_name" => "syslog" }
}
}
filter {
# Manually parse the log, as we want to support both RFC3164 and RFC5424
grok {
break_on_match => true
match => [
"message", "%{SYSLOG5424LINE}",
"message", "%{SYSLOGLINE}"
]
}
if [syslog5424_ts] {
# Handle RFC5424 formatted Syslog messages
mutate {
remove_field => [ "message", "host" ]
add_tag => [ "syslog5424" ]
}
mutate {
# Use a friendlier naming scheme
rename => {
"syslog5424_app" => "program"
"syslog5424_msg" => "message"
"syslog5424_host" => "host"
}
remove_field => [ "syslog5424_ver", "syslog5424_proc" ]
}
if [syslog5424_pri] {
# Calculate facility and severity from the syslog PRI value
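# e.g. PRI 165: severity = 165 % 8 = 5 (notice), facility = 165 / 8 = 20 (local4)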
ruby {
code => "event['severity'] = event['syslog5424_pri'].modulo(8)"
}
ruby {
code => "event['facility'] = (event['syslog5424_pri'] / 8).floor"
}
mutate {
remove_field => [ "syslog5424_pri" ]
}
}
if [syslog5424_sd] {
# All structured data needs to be in format [key=value,key=value,...]
mutate {
# Remove wrapping brackets
gsub => [ "syslog5424_sd", "[\[\]]", "" ]
}
kv {
# Convert the structured data into Logstash fields
source => "syslog5424_sd"
field_split => ","
value_split => "="
remove_field => [ "syslog5424_sd" ]
}
}
date {
match => [ "syslog5424_ts", "ISO8601" ]
remove_field => [ "syslog5424_ts", "timestamp" ]
}
}
else {
# Handle RFC3164 formatted Syslog messages
mutate {
add_tag => [ "syslog3164" ]
}
}
}
output {
# ...
}

This works pretty well in combination with Logspout.
+1
+1
@arabold Thanks for the config, but do you realise that the snippet above parses attributes in a form incompatible with RFC5424? The implementation listed above is incorrect and lacking in multiple ways if intended to parse RFC5424, quite aside from using a kv format incompatible with RFC5424. This is not meant as a critique of arabold's code, but rather to show those interested in using the RFC5424 format that the above code is not RFC5424 compliant, and how much more would be needed to properly parse and handle it. As usual, this is my simplistic opinion and may be incorrect and incomplete:
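For reference, RFC5424 STRUCTURED-DATA is a sequence of bracketed SD elements, each an SD-ID followed by space-separated PARAM-NAME="PARAM-VALUE" pairs, e.g. (adapted from the examples in RFC 5424 itself):

```
[exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"]
```

That is quite different from the key=value,key=value,... form that the kv filter above is set up to split on.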
If you want to, you can augment the above with this code in place of the kv filter. You should also remove the mutate, as it breaks the format of the field. The result should be RFC 5424 compliant, without taking into account encoding issues that may be present.
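As a rough sketch of the idea only (simplified, not handling escaped quotes or brackets; a fuller version appears further down this thread), the kv block could be swapped for a ruby filter along these lines, operating on the un-mangled syslog5424_sd field:

```
ruby {
  # Sketch: parse RFC5424 STRUCTURED-DATA ([SD-ID name="value" ...]...) into a nested hash
  code => '
    sd = {}
    event["syslog5424_sd"].to_s.scan(/\[(.*?[^\\])\]/) do |element|
      id, params = element[0].split(" ", 2)
      sd[id] = {}
      (params || "").scan(/(\S+)="(.*?)"/) { |name, value| sd[id][name] = value }
    end
    event["syslog5424_sd"] = sd
  '
}
```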
You would probably want to strip double quotes from param_value by modifying the relevant line in the above ruby filter:
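As a purely hypothetical illustration of that idea (the variable names here are placeholders, not the omitted original line), stripping quotes in Ruby could be as simple as:

```ruby
# hypothetical: remove any double quote characters from the captured value
param_value = raw_param_value.delete('"')
```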
Is there any sign of this feature being implemented in the near future? I need to capture syslog from 2000+ Cisco switches and routers, and currently it is nearly impossible to do so.
In case anyone else ends up here looking for a way to work with syslog in either RFC5424 or RFC3164: I had to make some tweaks to the above suggestions to support logstash 5.x. I also noted an issue in the RegEx that caused failures if an SD Element has no SD Params. According to the spec an SD Element can have 0+ Params.

def extract_syslog5424_sd(syslog5424_sd)
sd = {}
syslog5424_sd.scan(/\[(?<element>.*?[^\\])\]/) do |element|
data = element[0].match(/(?<sd_id>[^\ ]+)(?<sd_params> .*)?/)
sd[data[:sd_id]] = {}
next if data.nil? || data[:sd_params].nil?
data[:sd_params].scan(/(.*?[=]".*?[^\\]")/) do |set|
set = set[0].match(/(?<param_name>.*?)[=]\"(?<param_value>.*)\"/)
sd[data[:sd_id]][set[:param_name].lstrip] = set[:param_value]
end
end
sd
end
event.set('syslog5424_sd', extract_syslog5424_sd(event.get('syslog5424_sd')))

Note also that I had to concatenate that with
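For reference, given structured data like the abbreviated example below (taken from RFC 5424's examples), the function above returns a nested hash keyed by the full SD-ID:

```ruby
# illustrative call and result only
extract_syslog5424_sd('[exampleSDID@32473 iut="3" eventSource="Application"][examplePriority@32473 class="high"]')
# => { "exampleSDID@32473"     => { "iut" => "3", "eventSource" => "Application" },
#      "examplePriority@32473" => { "class" => "high" } }
```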
Thanks for the template to work with!
There's a small bug in @ABitMoreDepth's code: if you send an empty value (e.g. an empty "" param value), the structured data params get parsed incorrectly. The following handles that case:

filter {
grok {
match => {
"message" => "<%{NONNEGINT:syslog_pri}>%{NONNEGINT:version}%{SPACE}(?:-|%{TIMESTAMP_ISO8601:syslog_timestamp})%{SPACE}(?:-|%{IPORHOST:hostname})%{SPACE}(?:%{SYSLOG5424PRINTASCII:program}|-)%{SPACE}(?:-|%{SYSLOG5424PRINTASCII:process_id})%{SPACE}(?:-|%{SYSLOG5424PRINTASCII:message_id})%{SPACE}(?:-|(?<structured_data>(\[.*?[^\\]\])+))(?:%{SPACE}%{GREEDYDATA:syslog_message}|)"
}
add_tag => [ "match" ]
}
if "match" in [tags] {
syslog_pri {
remove_field => "syslog_pri"
}
date {
match => [ "syslog_timestamp", "ISO8601" ]
remove_field => "syslog_timestamp"
}
if [structured_data] {
ruby {
code => '
def extract_syslog5424_sd(syslog5424_sd)
sd = {}
syslog5424_sd.scan(/\[(?<element>.*?[^\\])\]/) do |element|
data = element[0].match(/(?<sd_id>[^\ ]+)(?<sd_params> .*)?/)
sd_id = data[:sd_id].split("@", 2)[0]
sd[sd_id] = {}
next if data.nil? || data[:sd_params].nil?
data[:sd_params].scan(/ (.*?[=](?:""|".*?[^\\]"))/) do |set|
set = set[0].match(/(?<param_name>.*?)[=]\"(?<param_value>.*)\"/)
sd[sd_id][set[:param_name]] = set[:param_value]
end
end
sd
end
event.set("[sd]", extract_syslog5424_sd(event.get("[structured_data]")))
'
remove_field => "structured_data"
}
}
}
}
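As a quick sanity check, feeding the config above one of the example messages from RFC 5424 (shown here without the leading BOM in the free-text part):

```
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] An application event log entry...
```

should, with the stock grok patterns, yield roughly: syslog_pri 165 (expanded by the syslog_pri filter to facility local4, severity notice), hostname mymachine.example.com, program evntslog, message_id ID47, the free text in syslog_message, and an sd field of { "exampleSDID" => { "iut" => "3", "eventSource" => "Application", "eventID" => "1011" } }.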
+1 for handling this natively
+1
While I agree it would be nice for logstash to natively support RFC5424, some pointers from a lot of experience with this amounted to:
If you want to go with the rsyslog + kafka approach, I did build a container, docker-rsyslog, to tackle it. However, I noticed that rsyslog upstream is finally maturing their support for containers and there are official rsyslog images. Not necessarily promoting rsyslog here, given it (in my opinion) has a fairly complicated configuration syntax and its documentation lags behind, often referencing the old legacy-style syslogd config syntax, but I can vouch for it performing remarkably well. If you want to stick to pure logstash, my hack was the following:
And this is an example logstash conf using the above
If you look at the above mess, running rsyslog and dealing with its cumbersome config to produce nice JSON output is possibly worth the pain anyhow.
That doesn't parse the structured data in the message into separate fields. It also isn't correctly extracting the structured data, because it doesn't take into account that there can be a ] inside a quoted param value (escaped as \]). Try putting this through your rule:
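For instance, a message whose structured data carries an escaped closing bracket in a param value (illustrative fragment, using the RFC's example enterprise number):

```
[example@32473 note="contains an embedded \] bracket"] free-form message text
```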
Yip, what I did was limited and abandoned once I hit performance limitations, and I hadn't yet worried about structured data... @ABitMoreDepth's approach and your adaptation of it will likely work better for using native logstash, which, given lower volumes or lots of CPU to throw at the problem, is less effort than the rsyslog way of things (which can also neatly parse structured element data into JSON objects and index directly to elastic if you like)... Hacking at logstash to handle syslog at high volumes is something I recommend against (unless someone comes up with some small miracles to make this input plugin perform better).
@JPvRiel do you have any numbers for the kind of performance you were seeing before you moved away from logstash? I haven't had the chance to get much volume through the logstash plugin yet (a couple of thousand events per second max, but only in short bursts). I was planning to scale the logstash container and handle volume horizontally when the need arises, but it's an interesting point you make about CPU consumption. I just wondered what kind of throughput you got to before starting to feel that logstash wasn't going to scale well enough for you? 1 million events per second is pretty hefty!
It was about 2 years ago that I moved away from my regex/grok pattern hackery for RFC5424 in logstash and onto rsyslog. At the time, I don't recall benchmarking it in detail, since one can't simply just run

However, I didn't go as far as the above because I know rsyslog has "hand crafted" parser modules for RFC3164 and RFC5424 aimed specifically at just those kinds of syslog headers, without needing to rely on heavier generic pattern-matching regexes. I'm venturing a safe bet that rsyslog (C code with targeted, header-specific string matching) versus logstash (JVM + JRuby + regex) is going to be no contest.
Just firewalls, DNS, and proxies logging into syslog pushes volume quickly. Add the trend of microservices, and even medium-size organisations can start loading a central syslog service.
Working plugin. Thanks to AbitMoreDepth for the ruby script (logstash-plugins/logstash-input-syslog#15 (comment))
This issue has been open for over 5 years. Any chance logstash will get native support for RFC 5424 soonish? The RFC will literally be older than our interns soon, and parts of our equipment use RFC 5424 only :|
@Trolldemorted Just my two cents. Possibly look at alternative solutions for parsing (RFC 5424) logs. Some that come to my mind:
Yes, exactly! I ended up using rsyslog. Once you work past its niche config syntax (bit of a learning curve!), it's able to output syslog as JSON. It also has an elasticsearch or kafka module for output. My workaround was rsyslog in a container (https://github.com/JPvRiel/docker-rsyslog), but I also used kafka since logstash buffering/queuing support wasn't a feature until fairly recently. From this example, I learnt rsyslog has mature and performant syslog handling features (it consumes much less CPU compared to logstash!), including parsing both RFC3164 and RFC5424 and being able to deal with odd legacy operating systems like Solaris and AIX, neither of which follow the RFCs nicely.
Is there something fundamentally hard about implementing this natively in Logstash? RFC5424 is widespread and the workarounds all seem pretty ugly. Surprised no contributors are weighing in on this since @jordansissel opened the issue ...
While the RFC is widespread, you'll be surprised at how many vendors don't properly follow it and add special quirks. For teams who have just one or two properly formatted syslog sources, sure, better native RFC5424 module support would be helpful. But try to do this at scale in an enterprise with odd things like Solaris, AIX, and security devices that don't even follow RFC3164 nicely, and you'll find that the 10+ years of hacking done in rsyslog can handle most situations with performant C code much better than any attempt to re-implement all that quirky logic in logstash. Whether implemented in Java (or worse, JRuby), I doubt it's going to be quick or compete with rsyslog in terms of performance. Rsyslog also has an

Again, too bad rsyslog's custom config syntax is even worse than Logstash's config / domain-specific language... Hopefully some day the legacy way of logging with syslog dies (it's over 20 years old now) and more mature JSON-based formats with HTTP event collection patterns emerge. Until then, you'll be hard pressed to find a better fully open source implementation than rsyslog.
I fully agree …
Hands down, look at http://vector.dev/. It is not yet feature complete with Logstash, but we have already replaced parts of what was previously done with Logstash. It can even replace most of the Beats products, like Filebeat.
(This issue was originally filed by @suyograo at elastic/logstash#1667)
Logstash has the syslog input, which only supports messages in RFC3164 (with some modifications). It would be useful to add a codec which supports RFC5424 messages and could be used with inputs like TCP. With this support, users would not have to use a grok filter with the SYSLOG5424LINE pattern.
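For context, the workaround today looks roughly like the sketch below (a minimal illustration; the port number and type are arbitrary, and SYSLOG5424LINE is the stock grok pattern for an RFC5424 line):

```
input {
  tcp {
    port => 5514
    type => "syslog"
  }
}
filter {
  grok {
    match => { "message" => "%{SYSLOG5424LINE}" }
  }
}
```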