Process Isolation is very slow as compared to HyperV Containers on Server 2019 #459

Open
saraf-akshay opened this issue Jan 25, 2024 · 17 comments
Labels: bug (Something isn't working), perf (Speed, efficiency, optimization concerns)

@saraf-akshay

Describe the bug
Slowness in cloning source when running multiple containers simultaneously in process isolation.

| Isolation Mode | Time in Git clone | Containers running in parallel | Comments |
|---|---|---|---|
| Process | 9 mins | 1 | |
| HyperV | 8.5 mins | 1 | |
| Process | 21 mins | 10 | <-- This is the problem |
| HyperV | 11 mins | 10 | |

As the number of containers on the server increases, container performance degrades significantly, but only in process isolation. I am not worried about minor performance differences. The same degradation occurs when I compile in these containers using nmake.
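To put numbers on the timings reported above, here is a small Python sketch that computes how much each isolation mode slows down when going from 1 to 10 parallel containers, and how process isolation compares with HyperV at each container count. The timings are copied from the report; the function and variable names are illustrative only.

```python
# Git clone times in minutes, copied from the timings reported in this issue.
clone_minutes = {
    ("process", 1): 9.0,
    ("hyperv", 1): 8.5,
    ("process", 10): 21.0,
    ("hyperv", 10): 11.0,
}


def scaling_factor(mode: str) -> float:
    """How many times slower a clone gets when going from 1 to 10 containers."""
    return clone_minutes[(mode, 10)] / clone_minutes[(mode, 1)]


def process_vs_hyperv(containers: int) -> float:
    """Ratio of process-isolation time to HyperV time at a given container count."""
    return clone_minutes[("process", containers)] / clone_minutes[("hyperv", containers)]


if __name__ == "__main__":
    for mode in ("process", "hyperv"):
        print(f"{mode}: {scaling_factor(mode):.2f}x slower at 10 containers than at 1")
    for count in (1, 10):
        print(f"{count} container(s): process/hyperv = {process_vs_hyperv(count):.2f}")
```

With these figures, process isolation becomes about 2.3x slower under load while HyperV becomes about 1.3x slower, and process isolation ends up roughly 1.9x slower than HyperV at 10 containers.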

These 10 containers I mentioned above are triggered by a Jenkins pipeline using Kubernetes. Here is the yaml code I used:

apiVersion: v1
kind: Pod
spec:
  tolerations:
  - effect: NoSchedule
    key: custom/build-hosts
    operator: Exists
  containers:
  - name: jnlp
    image: <image link redacted>
    command:
    - powershell
    args:
    - cp -R C:\\privconf\\*  C:\\Users\\ContainerAdministrator;
    - C:\\jenkinsscript\\jenkins.ps1
    resources:
      limits:
        cpu: 12
        memory: 16Gi
      requests:
        cpu: 12
        memory: 16Gi
    env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_HOST_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    volumeMounts:
    - mountPath: /privconf
      name: credential-volume
    - mountPath: /gitcache
      name: cache-volume
    - mountPath: /jenkinsscript
      name: jenkins-script
  volumes:
  - hostPath:
      path: D:/agentconf
      type: ""
    name: credential-volume
  - hostPath:
      path: D:/agentcache
      type: ""
    name: cache-volume
  - configMap:
      defaultMode: 420
      name: jenkins-script
    name: jenkins-script
  nodeSelector:
    custom/fcds: test_akshay

The HyperV Data was gathered using Docker Swarm, as K8S doesn't support HyperV Isolation.

dockerSwarm {
    label "docker-agent"
    image "<image link redacted>"
    limitsNanoCPUs 12000000000
    limitsMemoryBytes 17179860384
    reservationsNanoCPUs 12000000000
    reservationsMemoryBytes 17179860384
}
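For anyone comparing the two configurations, the following Python sketch converts the Kubernetes-style limits (`cpu: 12`, `memory: 16Gi`) into the nano-CPU and byte units used in the Swarm block. The helper names are hypothetical and exist only for this illustration.

```python
GIB = 1024 ** 3                 # bytes in one GiB (the Kubernetes "Gi" suffix)
NANO_PER_CPU = 1_000_000_000    # Docker expresses CPU limits in units of 1e-9 CPUs


def cpus_to_nano(cpus: float) -> int:
    """Convert a CPU count to Docker NanoCPUs."""
    return round(cpus * NANO_PER_CPU)


def gib_to_bytes(gib: float) -> int:
    """Convert GiB to bytes."""
    return round(gib * GIB)


if __name__ == "__main__":
    print(cpus_to_nano(12))    # CPU limit in NanoCPUs
    print(gib_to_bytes(16))    # memory limit in bytes
```

Here 12 CPUs correspond to 12000000000 nano-CPUs, matching the Swarm block. Exactly 16 GiB is 17179869184 bytes; the memory value in the Swarm block (17179860384) is 8,800 bytes lower, a difference that is negligible for this comparison and may be intentional or a transcription slip.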

The physical host that I ran it on is a bare-metal server with 208 logical cores (104 physical cores) with Hyper-Threading enabled.
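As a quick sanity check on CPU provisioning, this Python sketch compares the CPU requested by 10 containers at 12 CPUs each against the 208 logical cores of the host. Only CPU is checked because the host's memory size is not stated in this issue; the function name is illustrative.

```python
LOGICAL_CORES = 208  # logical cores on the host, as stated in this issue


def cpu_headroom(containers: int, cpus_per_container: int,
                 logical_cores: int = LOGICAL_CORES) -> int:
    """Logical cores left over after every container gets its full CPU limit."""
    return logical_cores - containers * cpus_per_container


if __name__ == "__main__":
    print(cpu_headroom(10, 12))      # cores left with 10 containers at 12 CPUs each
    print(LOGICAL_CORES // 12)       # most 12-CPU containers that fit without oversubscription
```

With 10 containers the CPU limits add up to 120 logical cores, leaving 88 unallocated, so the CPU limits alone do not oversubscribe the host.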

To Reproduce
To reproduce the issue, start 10 containers in parallel on the same host at the same time, each cloning the same repository.
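A minimal Python sketch of such a reproduction outside of Jenkins and Kubernetes is shown below. It is not the setup used in this issue: it assumes the Docker CLI is on the PATH of a Windows host, that the image contains `git`, and that the image name, repository URL, and clone destination `C:\src` are placeholders you replace. It uses the standard `docker run` flags `--isolation`, `--cpus`, and `--memory`.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def build_clone_command(isolation, image, repo_url, cpus=12, memory="16g"):
    """Build a `docker run` command that clones repo_url inside a container."""
    if isolation not in ("process", "hyperv"):
        raise ValueError(f"unsupported isolation mode: {isolation}")
    return [
        "docker", "run", "--rm",
        f"--isolation={isolation}",
        f"--cpus={cpus}",
        f"--memory={memory}",
        image,
        "git", "clone", repo_url, "C:\\src",  # destination path is an assumption
    ]


def timed_run(command, runner=subprocess.run):
    """Run one command and return its wall-clock duration in seconds."""
    start = time.monotonic()
    runner(command, check=True)
    return time.monotonic() - start


def run_parallel(command, count, runner=subprocess.run):
    """Start `count` copies of the command concurrently and collect durations."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(timed_run, command, runner) for _ in range(count)]
        return [future.result() for future in futures]
```

To compare modes, call `run_parallel(build_clone_command("process", image, repo_url), 10)` and then the same with `"hyperv"`, and compare the longest duration in each list. Threads start the containers nearly, but not exactly, simultaneously.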

Expected behavior
The expectation is for Process Isolation to perform on par with or better than HyperV Isolation.

Configuration:

  • Edition: Windows Server 2019
  • Base Image being used: jenkins/inbound-agent:3107.v665000b_51092-7-jdk11-windowsservercore-ltsc2019
  • Container engine: Docker
  • Container Engine version:
Client:
Version:           25.0.0
API version:       1.44
Go version:        go1.21.6
Git commit:        e758fe5
Built:             Thu Jan 18 17:10:49 2024
OS/Arch:           windows/amd64
Context:           default

Server: Docker Engine - Community
Engine:
 Version:          25.0.0
 API version:      1.44 (minimum version 1.24)
 Go version:       go1.21.6
 Git commit:       615dfdf
 Built:            Thu Jan 18 17:09:34 2024
 OS/Arch:          windows/amd64
 Experimental:     false

Additional context

I have verified that there is no resource overprovisioning. Windows Defender is disabled, and all my processes (including git and git-lfs) and the directories where source code is checked out are on the exclusion list, as mentioned in #149.
I have also verified that I have the Defender fix that was released in #345.

@saraf-akshay saraf-akshay added the bug Something isn't working label Jan 25, 2024
@ntrappe-msft ntrappe-msft added the triage New and needs attention label Jan 25, 2024
@fady-azmy-msft
Contributor

fady-azmy-msft commented Jan 29, 2024

Hey @saraf-akshay, could you share what you're seeing with Windows Server 2022 process isolation?

We no longer ship OS-level fixes for Windows Server 2019 because it is out of mainstream support (we only address security fixes): https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2019

@saraf-akshay
Author

@fady-azmy-msft : Thanks for your response. I'm working on preparing a server with Server 2022. It might take a couple days. I'll keep you posted.

@saraf-akshay
Author

saraf-akshay commented Feb 6, 2024

@fady-azmy-msft, @ntrappe-msft: There is still slowness.

Server 2022 is a lot better than Server 2019. In Process Isolation compared to HyperV Isolation, Server 2019 was 2x slower, whereas Server 2022 is 1.25x slower. This is with 10 containers running in parallel on one host (essentially trying to run the host at its full capacity), with the resource (CPU and memory) restrictions shown in the YAML file in my first comment.

@fady-azmy-msft fady-azmy-msft removed the triage New and needs attention label Feb 13, 2024
@nickcva

nickcva commented Feb 14, 2024

Here is what I have experienced with process isolation compared to Hyper-V isolation. I have seen cascading container failures, and even containers that crash and can never recover; they have to be redeployed. Performance on my SHIR containers is night-and-day better now with Hyper-V isolation.

Host Running 2019 DC
Container 2019 core latest

Azure/Azure-Data-Factory-Integration-Runtime-in-Windows-Container#7

@ntrappe-msft ntrappe-msft added the perf Speed, efficiency, optimization concerns label Feb 26, 2024
@saraf-akshay
Author

Hello @Howard-Haiyang-Hao @fady-azmy-msft @ntrappe-msft
Just checking in. Any update on this?

Contributor

This issue has been open for 30 days with no updates.
@Howard-Haiyang-Hao, please provide an update or close this issue.

@nickcva

nickcva commented Jun 4, 2024

I'm now running 80+ SHIR containers with Hyper-V isolation successfully, with little to no issues. Without Hyper-V isolation, the most I could run was about 25, and that also created issues that caused containers to corrupt themselves completely at random. Please make a Linux-compatible SHIR application for ADF / Synapse!

@doctorpangloss

Can you run this without using host paths?

@sbingham-MET

sbingham-MET commented Oct 16, 2024

This is still very much an issue for AKS users, and was raised with MSFT support (#2309150040010155) back in October 2023. There is a similar finding here: https://forums.docker.com/t/docker-slower-to-copy-files-and-run-compiler-in-server-2019-than-windows-10/113938/2

AKS does not support Hyper-V isolation, only process isolation. Hyper-V support is on their roadmap, but there is no date: Azure/AKS#1792

MSFT support eventually told us to try Linux containers since there was no resolution in sight, despite 4 months of back and forth with enterprise support. That is unfortunate when you have to support applications that are Windows-dependent.
