[BUG] Egeria Metadata Server crashes silently while processing massive amount of elements #235
Do you have further indications of what may have happened? Was anything recorded in the statefulset or pod stats? Was the topic connector the only workload?
Also, how was the behaviour after the crash? Did the system recover? Was any data lost? Did the connector resume retrieving topics?
For example, I had an issue with my Jupyter container (a bug) which led to the container restarting and I'd see something like:
I'm not sure how useful it would be, but the exit code may help us confirm whether running out of heap (for example) could be the cause. What about the pod itself, rather than ...
If none of that helps, we probably either need to try and
Just checking the log again - I presume this is the metadata server, not where you are running the integration connector? What repository connector are you using? I'm guessing it could be XTDB? If we can't get any info from the suggestions above quickly, I wonder if it's worth testing also with local graph or in-memory - that wouldn't completely rule a core issue out (as this could relate to timing, concurrent requests etc) but may give some extra insight. If it is XTDB, what's the XTDB configuration like?
We have limited licenses for the 'YourKit' Java profiler (there are other tools available, including IntelliJ + async-profiler for example), which can help in pinning down memory issues. That may be an option if the other approaches don't get us answers.
We use a local JanusGraph repository. On the first restart after the crash, the server is unable to connect to JanusGraph. This looks like a problem with JanusGraph. Would you recommend using XTDB instead of JanusGraph?
@juergenhemelt I think generally we would suggest XTDB - it looks like a more scalable, reliable approach, and some performance analysis has been done on it. That being said, there's no evidence one way or the other yet to suggest your issue is specific to graph; we don't know. So whilst you could try it, equally it may be good to understand why it broke even with graph. Were you using the default JanusGraph setup (which uses BerkeleyDB)? I think capturing the data would help - interesting to see if we can figure out whether, for example, the JVM failed due to heap exhaustion. Or perhaps OpenShift killed it due to resource limits - perhaps you could also check the OpenShift dashboard to see what it's reporting? I'd be interested in trying to reproduce, though I may struggle to do this until later next week. I presume you just pointed your topic integrator at a Strimzi deployment hosting lots of topics? Do you have any assets to share that may speed up my attempt to reproduce (server config, chart config etc)?
I'm also intrigued by the fact you said the graph failed to work after the first restart? I think you are using customised helm charts? It's possibly a problem there, or perhaps not. Do you have any logs from this failure to work on restart? For example, if the metadata collection id stored in the server config document gets out of sync with the actual repository (graph), then Egeria will throw an error and, at least in >=3.10, fail to start the server.
This is an interesting kind of issue - how do we debug the JVM apparently going away? If the cause isn't external and nothing shows up in the k8s logs, I wonder if we need to do something like:
^ I'll raise an issue on this. OR maybe we can somehow modify the startup script of the container so that it does not exit (as you'd normally want) on this error condition, so that we could then interactively connect and analyze. But it still might be easier to just connect to the JVM using an agent, which should give us similar information (ie using the YourKit profiler or similar). For your issue, another alternative might be to try and run just the MDS server locally.
I have just switched to XTDB in-memory. The good news is that the performance is tremendously better, so I can test faster than before. The bad news is that I still have the same problem. I will continue analyzing.
Good to hear the performance is better. Please do share any data you capture, especially from 'describe' against the k8s objects.
Indeed, the
Obviously an out-of-memory issue.
The resource limits in Kubernetes were at 1GB memory. I increased them and now it works. For the 547 topics it requires 3.4GB memory. I think that's expected behaviour as I am still on in-memory XTDB.
Did you have explicit 'resources' entries in your modified charts? Is that where you were setting the 1GB limit? When the pod restarted, did it continue processing OK? In a k8s environment we need to expect pods to fail - for example, nodes may fail or be rebooted, and the pod rescheduled elsewhere. It's important that processing then continues normally - including resuming where it left off. Are there any aspects of that which are problematic with the testing you have done? I think this aspect is worth pursuing - we'd mostly expect the Java heap to be exceeded (for example if we had an effective memory leak), with an exception, before the memory allocated to the container ran out? Need to look at whether we are following best practice and/or how to do better.
Yes, I have explicitly set resource limits. This is a policy in my organization. It was set to 1GB. I have now switched from XTDB in-memory to RocksDB persistence. That brings the used memory down again to less than 1GB. So the in-memory XTDB was the one which took all the memory. It could be good to check that XTDB is not using more than the available memory and to log an exception if it does. But I can imagine that this is not easy to implement.
If we get killed by k8s for exceeding limits, there's nothing we can do in the Java application - only ensure recovery through the k8s deployment. If we hit a memory exception (ie out of heap) in the JVM, there's not much we can do (especially anything that allocates memory). An output message, maybe (and agreed). I'll keep the issue open to look into this behaviour. There may be something about the container/image we use, or its settings, that is allowing the JVM to continue unhindered (ie not running aggressive GC) even where physical memory is restricted. Could we do better? I don't know, but I see it as an important question for production. There are also questions around monitoring, our heap best practice, and which GC algorithm to use. We also need to check that operations continue normally after the pod is killed and restarted elsewhere, including, for example, how we handle commits of the Kafka offset. Likely a combination of understand, document, improve. I'll modify the title.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.
I am seeing similar (if not the same) behavior when running the Lab helm charts with the notebook "working-with-standard-models". In particular, if I run with the in-memory repository, the pod silently fails. I am using Rancher Desktop with an allocation of 14GB and 4 cores. When I switch to the graph DB repository, it usually works but occasionally dies (it happened once or twice). I suspect that the JVM may need more resources?
I'm fairly sure the orchestration environment is killing our JVM - hence why we see no last runtime exception from the JVM. I was wondering what memory the JVM is seeing. In the Java 8 timeframe this was an issue. See https://docs.openshift.com/container-platform/3.11/dev_guide/application_memory_sizing.html for example, where it was essential to set flags like
Moving forward, we now have better JVM options. There's now a flag. But we also had a change in the kernel from cgroups v1 to cgroups v2, and it was only Java 15 that added support for cgroups v2, so on older JVMs we're back to the pre-container days of incorrect memory calculations. From Java 15 onwards this seems corrected. I expect OpenShift & other modern distros are using kernels with cgroups v2. So, now that we are on v4 of Egeria and using Java 17, it may be that these issues have been addressed by default. @dwolfson were you running with 3.15 or 4? I wonder if both fail in the same way.
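For illustration only (the specific flag referred to above isn't quoted here), a modern JVM can size its heap as a fraction of the container's memory limit rather than needing a fixed -Xmx. A hypothetical way of passing such options through an environment variable might look like:

```yaml
# Hypothetical sketch - JAVA_OPTS_APPEND is an assumed variable name
# (honoured by some Red Hat OpenJDK images); the flags themselves are
# standard HotSpot options.
env:
  - name: JAVA_OPTS_APPEND
    value: "-XX:MaxRAMPercentage=80.0 -XX:+ExitOnOutOfMemoryError"
```

With a percentage-based heap, the same image should size itself proportionally whatever memory limit the pod is given.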
Checking some real environments - Egeria 3.15, in OpenShift, 16GB node, default resource limits:
This may mean the problem is simpler - we're not setting a container size limit at all, so it really can be as big as what's available on a node, and the subtlety above isn't relevant. And each of our processes will see the same, so the potential is for a massive over-allocation. So we need to ensure limits are applied - and configurable. This isn't an Egeria or even container consideration; rather, it's at the deployment level, ie in the charts (or operator). We must allow this to be configurable, just as other apps do, or a user creating a deployment directly in YAML would add it there. This consideration is part of the sizing activity needed in any deployment: what exactly to set those values to. Platforms like OpenShift may also have the ability to dynamically override resource specifications - for example see https://docs.openshift.com/container-platform/4.11/nodes/clusters/nodes-cluster-overcommit.html - though we note that the default is not to overcommit.
Running in Rancher desktop on macOS is interesting:
as here it is telling me there is NO limit (unlimited), which is potentially even worse
In summary:
@planetf1 My testing has been with 3.15 so far - I'm happy to switch to testing with 4 if you think that would be helpful. I was running more tests with the Jupyter labs today and found several more ways to make Egeria crash - not just loading the huge archive in the working-with-standard-models lab, but also with the automated curation and other notebooks. It seems like garbage collection isn't happening (often enough?). It often crashes if I try to run 2 or 3 notebooks after configuring Egeria.
Thanks for testing. Can you try with v4? I think it would be most productive to focus there as we have changes in the base image and JVM level. That being said, I am almost certain the behaviour will be the same. I do suspect there's no garbage collection with the current sizing (so maybe experimenting as below is more fruitful). Our pods need to have properties similar to
Currently the charts use
'request' is the MINIMUM required to allow the pod to be scheduled (so adding up all the requests effectively defines the minimum RAM needed), whilst the LIMIT is the most that can be used. Units above are MB for RAM and 1/1000 of a CPU unit for CPU. CPU is less critical - the minimum can probably be pretty low, and the max can be higher. You can try editing the charts and adding these in if you wish - ie replace that section with something like the example below, and install the chart from disk (to do that, be in the directory which contains odpi-egeria-lab, and refer to the chart as just 'odpi-egeria-lab', not 'egeria/odpi-egeria-lab'). We'll need to update the charts to set these with sensible default values that are also configurable. So trying to figure out the limits is the next thing - we want them to be as small as possible. Some platforms allow defaults to be set, perhaps using an admission controller (this modifies the Kubernetes documents as they are ingested; it's a technique for validation & annotation, or enforcing any kind of change/override). There may also be a default setting in the k8s implementation itself - ie defined by the control plane. I don't know if Rancher Desktop has this - it looks like that's set to unlimited. At least on OpenShift it's the node's physical memory.
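The original snippet isn't reproduced in this extract; purely as an illustrative sketch (placeholder values, not tuned recommendations), the resources section might look something like:

```yaml
# Illustrative placeholder values only - not tuned recommendations
resources:
  requests:
    memory: "500Mi"
    cpu: "100m"
  limits:
    memory: "1Gi"
    cpu: "2000m"
```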
The image is from charts/odpi-egeria-lab/templates/egeria-core.yaml - we have 4 distinct ones, one for each platform (some refactoring is needed really!)
An alternative is to edit the definitions once deployed. Perhaps the easiest way is to edit the deployment ie
Then edit one of these ie with
Make the same edit as above to the resources section, then save, then delete the pod, ie with ... There are other ways to edit too - you can read more at https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/use-resource-controller-to-dynamically-modify-the-upper-limit-of-resources-for-a-pod
The JVM says:
[jboss@lab-odpi-egeria-lab-core-0 ~]$ java -XshowSettings:system -version
openjdk version "11.0.18" 2023-01-17 LTS
I will try the above.
Memory is unlimited, so why would Java do garbage collection? That's certainly a likely route to having the container killed. The fix will be to ensure we have a resource declaration.
I've added support for setting memory limits to the 'lab' chart (prerelease of 4.0). Docs need to be added. Quick test - the usual data catalog lab worked fine with default values set, whilst for the standard models lab, both 'core' and 'dev' platforms crashed. There is still no exception logged at the end of the prior container run - ie it just stops.
Tested again and was able to run the standard models lab when the Egeria pods were allocated 2GB, with usage peaking around 1400+ MB. Max heap is 80%, so we're in that region. I've updated the defaults; it's in chart 4.0.0-prerelease.2. Perhaps you can find some better refinements & validate what values make sense - at least as a first pass. I think we'll need to return to analyze in more depth.
Thanks Nigel, did you test it with the "Working with Standard Models" notebook? Which repository was configured? I ran some quick tests this evening with the prerelease chart and had failures with all three repositories, even though the memory didn't seem to exceed 1400 MB. I will try to test a bit more carefully tomorrow.
Tested the std models lab using the in-memory repo. On a 3x16GB OpenShift cluster the test ran successfully with no pod restarts. For Rancher Desktop on Mac, a 12GB cluster would not allow all pods to be scheduled due to insufficient memory. This demonstrates a challenge in setting values.
Some reduction is likely possible with closer monitoring using metrics. On Rancher, after a test run the memory used was:
So
With the graph repository, this notebook took 16 minutes to run (instead of <2 minutes - Rancher Desktop). No failure, but rather unusable. It did complete with no pod restarts, and a similar amount of memory used (but we need JVM stats to properly analyse).
Resource limits & JVM options can currently be set by creating a file, ie ~/etc/rs.yaml, containing:
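(The file contents aren't reproduced in this extract; the following is a hypothetical reconstruction - the key names are assumptions, so check the chart's values.yaml for the real structure.)

```yaml
# Hypothetical sketch of ~/etc/rs.yaml - key names and values are assumptions,
# not copied from the chart
core:
  resources:
    requests:
      memory: "1Gi"
      cpu: "100m"
    limits:
      memory: "2Gi"
      cpu: "2000m"
```

Such a file would then be passed to helm with a standard values override, e.g. `-f ~/etc/rs.yaml`.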
This entry is for 'core'. For our other platforms add similar sections for 'datalake', 'dev', 'factory', and also for our UI components 'ui', 'uistatic', 'presentation'. Optionally, settings can be added for our other containers 'nginx', 'zookeeper', 'kafka' with the same resources section, ie:
etc. Once we understand the behaviour and have established sensible defaults, we can then consider how to make this simpler for the user. Another important part here is 'jvmopts', which applies to all of our containers. One useful value is to set it to ... Here's an example just to enable metrics (but no changes to the default resources):
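The example itself isn't shown in this extract; purely as a hypothetical illustration (a standard JVM diagnostic flag, not necessarily the 'metrics' setting meant above):

```yaml
# Hypothetical illustration only - standard JVM GC logging, not necessarily
# the 'metrics' value referred to above
jvmopts: "-verbose:gc"
```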
If this proves useful, we can make this a default setting - perhaps even in the Egeria server chassis itself, or else just the container or charts. See odpi/egeria-docs#718, where I've added some examples of using this URL, and through which we will provide further docs on this.
See #245, which adds some example snippets to help define resources for the containers used in the lab chart. I've been looking at the memory profile when importing CIM using the YourKit(tm) analyser. The default when run locally is to use G1 - this shows fairly smooth behaviour with minimal (<1ms) GC delays. It does allow memory to get close to the heap size before kicking in, leading to some sawtooth behaviour, but it's consistent. The default in our container is to use ParallelGC. This shows different behaviour - it cleans garbage much more quickly, and keeps the memory footprint smaller (as low as 250MB in a 1GB JVM), but we see large numbers of GC cycles. Additionally, GCs block threads for an extended period, with frequent delays approaching 100s. This was replicated locally using
Some differences from the container - it doesn't set -Xmx, but instead sets a fraction of RAM. This makes sense in the resource-limited container where we set a 1GB or 2GB limit, but on a 32GB machine locally we need to be more precise. Of these parameters, the option to exit the VM on OOM makes sense, as this is in a k8s environment where we would want the pod to restart rather than malfunction. These settings are inherited from the Red Hat ubi9 OpenJDK container we use - they are defaults. However, they seem inconsistent with our workload.
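A hedged sketch of the kind of experiment involved (the flags are standard HotSpot options inferred from the description above, not copied from the container's actual startup settings); they could be passed through the chart's jvm options or equally on a local java command line:

```yaml
# Hedged sketch - flags inferred from the description above, not copied from
# the image; swap -XX:+UseParallelGC for -XX:+UseG1GC to compare collectors
jvmopts: "-XX:+UseParallelGC -XX:MaxRAMPercentage=80.0 -XX:+ExitOnOutOfMemoryError"
```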
Further observation. As suspected, different platforms return varying values of physical memory, so for now, ie
for a container with a request/limit of 1GB. However, even with this set, the datalake container is killed by the OS with OOM, ie the last trace is
Note
So what is Java doing? At least in terms of the heap & GC:
This shows the heap at the time of failure was only 410MB - well under the 1GB system limit. This means we need to look at other memory consumption within the pod, both in terms of other Java data as well as any other processes. From local tests, the other Java usage should be small (which is why 80% for heap is a good starting point).
Restricting the metaspace to 100MB resulted in
200MB & Egeria runs fine, but even with the resource limit set at ~1.2GB (heap limit about 800, metaspace 800), we still exceed permitted memory usage & get terminated. Even if we don't enable ... This might suggest memory is allocated from the native process heap. Kafka has a buffer.memory, but this defaults to 32MB.
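For anyone repeating this, a hedged sketch of the kind of options involved (placeholder values; these are standard HotSpot flags, not necessarily the exact settings used above):

```yaml
# Hedged sketch - placeholder values; standard HotSpot options for capping
# metaspace and enabling native memory tracking. With NMT on, non-heap
# allocations can be inspected inside the container via:
#   jcmd <pid> VM.native_memory summary
jvmopts: "-XX:MaxMetaspaceSize=200m -XX:NativeMemoryTracking=summary"
```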
Retesting locally, this same Java process was reporting 3390MB at an OS level:
This was taken with the heap limited to 1GB (-Xmx1024m), but no other limits or JVM customization.
Kafka producers will use up to buffer.memory (ie 32MB) to store messages client-side if they cannot be sent to the message broker quickly enough; the client API will block for up to max.block.ms (60s) before returning a client API failure. This buffer size applies to each Kafka producer. OMRS instances will have the highest workload, and for Coco we have 3 cohorts (so ~100MB of RAM). Worth trying:
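As a sketch of that experiment (these are standard Kafka producer settings; exactly where they are wired into the Egeria event bus configuration isn't shown in this thread):

```yaml
# Standard Kafka producer settings shown as a hypothetical property map -
# halving the per-producer buffer to reduce client-side memory
producer:
  buffer.memory: "16777216"   # 16MB rather than the 32MB default
  max.block.ms: "60000"       # client blocks up to 60s before failing
```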
Tested again, 4GB resource limit, 1GB heap. Core & datalake still approach the real memory limit:
Core - maxes at 2.7GB
I'm going to pursue:
We need to understand memory usage in more detail. There is a lot of potential consumption in terms of file system buffers, even outside the heap & metaspace. For now, the prudent approach is to remove the memory limits & JVM configuration - the option to add them remains, and snippets are provided in config/values. This at least should mean most of the scenarios work most of the time on a decent configuration (the goal was to be more precise than this, but we don't have all the information to do that yet). @dwolfson New charts are published. Would you like to try with the defaults?
Is there an existing issue for this?
Current Behavior
While processing 500+ elements (KafkaTopics from StrimziMonitorIntegrationConnector) the Egeria Metadata Server crashes silently. Last few lines of log:
Expected Behavior
Should have processed every KafkaTopic element.
Steps To Reproduce
Environment
Any Further Information?
No response