Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add elastic search ingester module #82

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

FUSAKLA
Copy link
Contributor

@FUSAKLA FUSAKLA commented Jan 28, 2022

Adding new module with support to read events from elastic search.
Tested locally and looks ok and working just right.

The module remembers the last timestamp from each log (user needs to specify which field holds the timestamp and its format).
It periodically (interval configurable) queries elastic for max X documents(also configurable) since the last timestamp.
So if slow of very much behind, it should be catching up until reading the most recent documents.

It converts every key in the root of the document from elastic to a key in the event. If its value is printed using fmt.Sprint so if there is some nested JSON it will be rendered as a string map representation for example. In case user does not use structured logging, it supports parsing of the field with the source log (configurable) same way the tailer module does(specifying regular expression with named groups)

still to be done:

  • tests
  • more documentation

@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Jan 28, 2022

@david-vavra PTAL

@FUSAKLA FUSAKLA force-pushed the fus-elastic-ingester branch from b977bb1 to 33d8209 Compare January 28, 2022 02:09
@FUSAKLA FUSAKLA requested a review from david-vavra January 28, 2022 08:54
@FUSAKLA FUSAKLA force-pushed the fus-elastic-ingester branch from 33d8209 to 7c3c671 Compare January 28, 2022 14:32
@FUSAKLA FUSAKLA force-pushed the fus-elastic-ingester branch from 7c3c671 to bc3bfef Compare January 31, 2022 23:11
@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Jan 31, 2022

Ok, some tests are there and documentation also, I think it's gtg

@FUSAKLA FUSAKLA marked this pull request as ready for review January 31, 2022 23:23
@FUSAKLA FUSAKLA requested a review from rudo-thomas March 17, 2022 09:17
@FUSAKLA
Copy link
Contributor Author

FUSAKLA commented Mar 17, 2022

@rudo-thomas PTAL, and please feel free to rewrite my czenglish in the documentation if not clear enough 🙏

@@ -5,6 +5,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased
### Added
- [#82](https://github.com/seznam/slo-exporter/pull/82) New module `elasticSerachIngester`, for more info see [the docs](./docs/modules/elasticsearch_ingester.md)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [#82](https://github.com/seznam/slo-exporter/pull/82) New module `elasticSerachIngester`, for more info see [the docs](./docs/modules/elasticsearch_ingester.md)
- [#82](https://github.com/seznam/slo-exporter/pull/82) New module `elasticSearchIngester`, for more info see [the docs](./docs/modules/elasticsearch_ingester.md)


```yaml
# OPTIONAL Debug logging
debug: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I checked what this option actually does, I would prefer it to be refactored to sth like - ES client log level.

timeout: "5s"
# Enable/disable sniffing, autodiscovery of other nodes in Elasticsearch cluster
sniffing: true
# Enable/disable healtchecking of the Elasticsearch nodes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Enable/disable healtchecking of the Elasticsearch nodes
# Enable/disable healthchecking of the Elasticsearch nodes

# REQUIRED Document filed name containing a timestamp of the event
timestampField: "@timestamp"
# REQUIRED Golang time parse format to parse the timestamp from the timestampField
timestampFormat: "2006-01-02T15:04:05Z07:00" # See # https://www.geeksforgeeks.org/time-formatting-in-golang/ for common examples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's reference something more official here :)

https://pkg.go.dev/time#Time.Format

}

var clientCaCertPool *x509.CertPool
if config.CaCertFile != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't if I understand the TLSClientConfig correctly, but this seems rather confusing to me.

I suggest to refactor this to one of the two:

  • either append config.CaCertFile to both TLSClientConfig.ClientCAs and TLSClientConfig.RootCAs
  • or use explicit config options so that it clear how config.CaCertFile is applied. This mean dropping config.CaCertFile and replacing it with something like config.clientCaCertFile, config.serverCaCertFile

if config.Debug {
opts = append(opts, elasticV7.SetTraceLog(logger), elasticV7.SetInfoLog(logger))
}
if config.Username != "" || config.Password != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not this be mutual exclusive to tls auth?

client *elasticV7.Client
}

func (v *v7Client) RangeSearch(ctx context.Context, index, timestampField string, since time.Time, size int, query string, timeout time.Duration) ([]json.RawMessage, int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we are not really consistent with this across the slo-exporter code, but please add documention for the public methods.

Help: "Timestamp of the last processed document, next fetch will read since this timestamp",
})
missingRawLogField = prometheus.NewCounter(prometheus.CounterOpts{
Name: "missing_raw_log_filed_total",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Name: "missing_raw_log_filed_total",
Name: "missing_raw_log_field_total",

fields map[string]string
}

type tailer struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document the fields. At least/especially timestamp*, rawLog*.

}
}

timeFiled, ok := newDoc.fields[t.timestampField]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> timeField

@david-vavra
Copy link
Contributor

Just some nits, documentation and typos fix needed. Otherwise ok, gj. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants