
NullPointerException on /_prometheus/metrics with Elasticsearch 7.8.0 #295

Open
chaitrahegde115 opened this issue Oct 29, 2020 · 12 comments


@chaitrahegde115

chaitrahegde115 commented Oct 29, 2020

Hi,
I am using Elasticsearch 7.8.0 and the elasticsearch-prometheus plugin 7.8.0.0.
I can see many warning logs in Elasticsearch related to a NullPointerException. The logs are below.

`"log":"path: /_prometheus/metrics, params: {}"
java.lang.NullPointerException: null
        at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updatePerIndexContextMetrics(PrometheusMetricsCollector.java:468) ~[?:?]
        at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updatePerIndexMetrics(PrometheusMetricsCollector.java:461) ~[?:?]
        at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updateMetrics(PrometheusMetricsCollector.java:947) ~[?:?]
        at org.elasticsearch.rest.prometheus.RestPrometheusMetricsAction$1.buildResponse(RestPrometheusMetricsAction.java:97) ~[?:?]
        at org.elasticsearch.rest.prometheus.RestPrometheusMetricsAction$1.buildResponse(RestPrometheusMetricsAction.java:80) ~[?:?]
        at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction.gatherRequests(TransportNodePrometheusMetricsAction.java:127) [prometheus-exporter-7.8.0.0.jar:7.8.0.0]
        at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction.access$400(TransportNodePrometheusMetricsAction.java:76) [prometheus-exporter-7.8.0.0.jar:7.8.0.0]
        at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction$1.onResponse(TransportNodePrometheusMetricsAction.java:136) [prometheus-exporter-7.8.0.0.jar:7.8.0.0]
        at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction$1.onResponse(TransportNodePrometheusMetricsAction.java:132) [prometheus-exporter-7.8.0.0.jar:7.8.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.TransportService$6.handleResponse(TransportService.java:633) [elasticsearch-7.8.0.jar:7.8.0]
        at com.floragunn.searchguard.transport.SearchGuardInterceptor$RestoringTransportResponseHandler.handleResponse(SearchGuardInterceptor.java:259) [search-guard-suite-security-7.8.0-43.0.0-144.jar:7.8.0-43.0.0-144]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1163) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:213) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:226) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:205) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:117) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:78) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:692) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) [elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:73) [transport-netty4-client-7.8.0.jar:7.8.0]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
        at java.lang.Thread.run(Thread.java:834) [?:?]
@lukas-vlcek
Collaborator

How large and busy is your cluster? Could this be caused by internal metric requests timing out? Do you see any timeout-related logs in the ES log?

@chaitrahegde115
Author

chaitrahegde115 commented Nov 2, 2020

Hi,
The cluster is not very busy, as I am only running some automated test cases in a pipeline.
I don't have any logs other than those shared above.

@chaitrahegde115
Author

Hi,
Is there any update?

@jamshid

jamshid commented May 11, 2021

FWIW I also get it on Elasticsearch 7.5.2:

[2021-05-11T15:21:48,063][WARN ][r.suppressed             ] [ivory4.X] path: /_prometheus/metrics, params: {}
java.lang.NullPointerException: null
        at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updatePerIndexContextMetrics(PrometheusMetricsCollector.java:468) ~[?:?]
        at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updatePerIndexMetrics(PrometheusMetricsCollector.java:461) ~[?:?]
        at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.updateMetrics(PrometheusMetricsCollector.java:947) ~[?:?]
        at org.elasticsearch.rest.prometheus.RestPrometheusMetricsAction$1.buildResponse(RestPrometheusMetricsAction.java:87) ~[?:?]
        at org.elasticsearch.rest.prometheus.RestPrometheusMetricsAction$1.buildResponse(RestPrometheusMetricsAction.java:70) ~[?:?]
        at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) ~[elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:70) [elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64) [elasticsearch-7.5.2.jar:7.5.2]
        at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction.gatherRequests(TransportNodePrometheusMetricsAction.java:127) [prometheus-exporter-7.5.2.0.jar:7.5.2.0]
        at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction.access$400(TransportNodePrometheusMetricsAction.java:76) [prometheus-exporter-7.5.2.0.jar:7.5.2.0]
        at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction$1.onResponse(TransportNodePrometheusMetricsAction.java:136) [prometheus-exporter-7.5.2.0.jar:7.5.2.0]
        at org.elasticsearch.action.TransportNodePrometheusMetricsAction$AsyncAction$1.onResponse(TransportNodePrometheusMetricsAction.java:132) [prometheus-exporter-7.5.2.0.jar:7.5.2.0]
...

I've always been confused by Elasticsearch Prometheus exporters and Grafana dashboards; I guess there is no official one? This one also seems popular, and I wish I knew whether it would be worth switching to:
https://github.com/justwatchcom/elasticsearch_exporter

@lukas-vlcek
Collaborator

lukas-vlcek commented May 12, 2021

@jamshid Did you change the prometheus.indices setting in any way? It is true by default; have you changed it to false?

Can you share output of:

curl '<elasticsearch>/_cluster/settings?include_defaults=true&filter_path=defaults.prometheus'
curl '<elasticsearch>/_cluster/settings?include_defaults=true&filter_path=transient.prometheus'
curl '<elasticsearch>/_cluster/settings?include_defaults=true&filter_path=persistent.prometheus'

@jamshid

jamshid commented May 13, 2021

Sure, thanks:

curl 'ivory4:9200/_cluster/settings?include_defaults=true&filter_path=defaults.prometheus'   \
     'ivory4:9200/_cluster/settings?include_defaults=true&filter_path=transient.prometheus'  \
     'ivory4:9200/_cluster/settings?include_defaults=true&filter_path=persistent.prometheus'
{"defaults":{"prometheus":{"cluster":{"settings":"true"},"indices":"true"}}}
{}
{}

FWIW it doesn't happen all the time; sometimes 5 times in half an hour, sometimes twice within a minute.

@rafis

rafis commented Jul 13, 2021

@jamshid @chaitrahegde115 What scrape_interval do you use? Could increasing scrape_interval help?
@lukas-vlcek Any update? It seems we all have default settings and average-to-high load on Elasticsearch.

@Mohitj252

Mohitj252 commented Nov 10, 2021

I am also facing the same issue. I was going through the code (https://github.com/vvanholl/elasticsearch-prometheus-exporter/blob/7.8.0.0/src/main/java/org/compuscene/metrics/prometheus/PrometheusMetricsCollector.java) and found that idx.getDocs() can return null (based on reading the code), so calling getCount() on it can cause the NullPointerException. Please help me figure out how to fix this issue. FYI, the issue occurs intermittently.
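A simplified view of the failing pattern (CommonStats is the real Elasticsearch stats type; the wrapper class and method below are just for illustration):

import org.elasticsearch.action.admin.indices.stats.CommonStats;

class Repro {
    // Line 468 effectively does this before recording the gauge:
    static long docCount(CommonStats idx) {
        return idx.getDocs().getCount(); // NullPointerException whenever getDocs() returns null
    }
}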

@lukas-vlcek
Collaborator

@Mohitj252 Interesting. A quick workaround would be to wrap the entire line (no. 468) in something like the following:

try { ... } catch (Exception e) { log.warn("Skipping index_doc_number metric", e); }

However, that would result in a missing metric in the Prometheus sample whenever the exception occurs.
It would be better to check how the call to getCount() is handled in the Elasticsearch AL2 code.
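For illustration, here is what that try/catch wrapper could look like as a self-contained sketch; safelyUpdate, the Runnable, and the logger are hypothetical stand-ins for the actual statement on line 468, not the plugin's real code:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

class MetricUpdateGuard {
    private static final Logger logger = LogManager.getLogger(MetricUpdateGuard.class);

    // Wraps the metric update so one bad index does not abort the
    // whole /_prometheus/metrics response.
    static void safelyUpdate(String indexName, Runnable update) {
        try {
            update.run();
        } catch (Exception e) {
            // The metric is simply absent from this Prometheus sample.
            logger.warn("Skipping index_doc_number metric for index [" + indexName + "]", e);
        }
    }
}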

@Mohitj252

Mohitj252 commented Nov 11, 2021

Hi @lukas-vlcek,
1. Adding a try/catch will not solve our problem. It would be better to focus on handling the null condition for getDocs: if getDocs is null, we create a new object (just an idea) and pass all the required info to the newly created object. What is your opinion on this approach?
2. If you think this is an issue in Elasticsearch itself, can you please fix it in an upcoming Elasticsearch version?
Thanks in advance.

@lukas-vlcek
Collaborator

Oh sorry, I misread what you said. In that case I think it makes sense to do something like

if (idx.getDocs() != null) {
    // handle all metrics that require getDocs()
}

Why getDocs() can be null is another topic. I cannot tell off the top of my head whether it can be "fixed" directly in Elasticsearch without knowing the reason behind it.
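Fleshed out a bit, the guard could look like this; a sketch only, where recordGauge and the second metric name are illustrative stand-ins for the plugin's actual catalog calls:

import org.elasticsearch.action.admin.indices.stats.CommonStats;
import org.elasticsearch.index.shard.DocsStats;

class PerIndexDocMetrics {
    // Hypothetical stand-in for however the plugin records a gauge.
    static void recordGauge(String metric, long value, String indexName) {
        // e.g. hand the value to the Prometheus catalog here
    }

    static void updateDocMetrics(String indexName, CommonStats idx) {
        DocsStats docs = idx.getDocs(); // observed to be null intermittently in this issue
        if (docs == null) {
            return; // skip doc metrics for this scrape; they reappear once stats are available
        }
        recordGauge("index_doc_number", docs.getCount(), indexName);
        recordGauge("index_doc_deleted_number", docs.getDeleted(), indexName);
    }
}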

@Mohitj252

Hi @lukas-vlcek, can you please check and discuss the issue with the Elastic team to understand why it occurs intermittently and how I can get past this NullPointerException?
