Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to execute scripts via Kubernetes Jobs #690

Merged
merged 30 commits into from
Nov 29, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
4666cec
Add Kubernetes Client SDK
APErebus Nov 15, 2023
8e74d6e
Support executing scripts via Kubernetes Job
APErebus Nov 16, 2023
b8d66f4
Continued changes
APErebus Nov 16, 2023
548af9d
Add ScriptStateStore support to k8s jobs
APErebus Nov 19, 2023
4b2cc93
Add support for supplying volume info via env vars
APErebus Nov 20, 2023
7f998e8
Set up for running in WSL
APErebus Nov 20, 2023
4d8b21d
Adjust run config
APErebus Nov 20, 2023
4a4670f
Fix a number of issues with running k8s scripts
APErebus Nov 21, 2023
ecb14b0
Update script runner with new code
APErebus Nov 21, 2023
282580e
Try building the scriptrunner as a self-contained app
APErebus Nov 22, 2023
0b2ad67
Build ScriptRunner as self-contained single file package
APErebus Nov 22, 2023
21d66e6
Run the script runner as the entry point
APErebus Nov 22, 2023
6fa5a4c
Run bash script to handle output and error logs
APErebus Nov 24, 2023
a057ba7
Read from job streams and write to Output.log
APErebus Nov 27, 2023
af2b333
Remove Kubernetes.ScriptRunner
APErebus Nov 27, 2023
966c6c2
Self-cleanup
APErebus Nov 27, 2023
f579f1c
Cleanup after self-review
APErebus Nov 27, 2023
8843207
More minor cleanup
APErebus Nov 27, 2023
f7cb8a8
Remove ScriptRunner configuration
APErebus Nov 27, 2023
5069e7a
Fix solution
APErebus Nov 27, 2023
b524284
Fix rebase issue
APErebus Nov 27, 2023
571df71
Revert global.json
APErebus Nov 27, 2023
6d2b9e5
Remove duplicate package references
APErebus Nov 27, 2023
cd625dd
Remove run configuration
APErebus Nov 27, 2023
5092c07
Merge branch 'main' into ap/k8s-jobs-script-execution
APErebus Nov 27, 2023
96ad11b
Update environment variables
APErebus Nov 28, 2023
4a20af7
Merge branch 'main' into ap/k8s-jobs-script-execution
APErebus Nov 28, 2023
97f639b
Change to exception filter
APErebus Nov 29, 2023
a2a48b7
PR feedback
APErebus Nov 29, 2023
e5f2f63
Merge branch 'main' into ap/k8s-jobs-script-execution
APErebus Nov 29, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@
<PackageReference Include="NuGet.Packaging.Core" Version="3.6.0-octopus-58692" />
<PackageReference Include="NuGet.Packaging.Core.Types" Version="3.6.0-octopus-58692" />
<PackageReference Include="NuGet.Versioning" Version="3.6.0-octopus-58692" />
<PackageReference Include="Portable.BouncyCastle" Version="1.8.10" />
<PackageReference Include="Portable.BouncyCastle" Version="1.9.0" />
<PackageReference Include="System.Collections" Version="4.3.0" />
<PackageReference Include="System.Collections.Concurrent" Version="4.3.0" />
<PackageReference Include="System.ComponentModel.Annotations" Version="5.0.0" />
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,11 @@ public void Dispose()
}

public void WriteOutput(ProcessOutputSource source, string message)
=> WriteOutput(source, message, DateTimeOffset.UtcNow);

public void WriteOutput(ProcessOutputSource source, string message, DateTimeOffset occurred)
{
Console.WriteLine($"{DateTime.UtcNow} {source} {message}");
Console.WriteLine($"{occurred} {source} {message}");
switch (source)
{
case ProcessOutputSource.Debug:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,13 @@
using Octopus.Diagnostics;
using Octopus.Tentacle.CommonTestUtils.Builders;
using Octopus.Tentacle.Configuration;
using Octopus.Tentacle.Configuration.Instances;
using Octopus.Tentacle.Contracts;
using Octopus.Tentacle.Contracts.ScriptServiceV3Alpha;
using Octopus.Tentacle.Diagnostics;
using Octopus.Tentacle.Kubernetes;
using Octopus.Tentacle.Scripts;
using Octopus.Tentacle.Scripts.Kubernetes;
using Octopus.Tentacle.Services.Scripts;
using Octopus.Tentacle.Util;

Expand All @@ -37,14 +40,12 @@ public void SetUp()
workspaceFactory = new ScriptWorkspaceFactory(octopusPhysicalFileSystem, homeConfiguration, new SensitiveValueMasker());
stateStoreFactory = new ScriptStateStoreFactory(octopusPhysicalFileSystem);

var scriptExecutorFactory = new ScriptExecutorFactory(
new Lazy<LocalShellScriptExecutor>(() =>
new LocalShellScriptExecutor(
PlatformDetection.IsRunningOnWindows ? new PowerShell() : new Bash(),
Substitute.For<ISystemLog>())));
var localShellScriptExecutor = new LocalShellScriptExecutor(
PlatformDetection.IsRunningOnWindows ? new PowerShell() : new Bash(),
Substitute.For<ISystemLog>());

service = new ScriptServiceV3Alpha(
scriptExecutorFactory,
localShellScriptExecutor,
workspaceFactory,
stateStoreFactory);
}
Expand Down
5 changes: 5 additions & 0 deletions source/Octopus.Tentacle/ExternalInit.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
namespace System.Runtime.CompilerServices
APErebus marked this conversation as resolved.
Show resolved Hide resolved
{
//This is needed to support `record` types in .NET 4.8
internal static class IsExternalInit {}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
using k8s;

namespace Octopus.Tentacle.Kubernetes
{
public interface IKubernetesClientConfigProvider
{
KubernetesClientConfiguration Get();
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
using System;
using k8s;

namespace Octopus.Tentacle.Kubernetes
{
class InClusterKubernetesClientConfigProvider : IKubernetesClientConfigProvider
{
public KubernetesClientConfiguration Get()
{
return KubernetesClientConfiguration.InClusterConfig();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we are running in a cluster, this will just use the ambient service account auth

}
}
}
33 changes: 33 additions & 0 deletions source/Octopus.Tentacle/Kubernetes/KubernetesClusterService.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
using System;
using System.Threading.Tasks;
using k8s;
using Nito.AsyncEx;

namespace Octopus.Tentacle.Kubernetes
{
public interface IKubernetesClusterService
{
Task<ClusterVersion> GetClusterVersion();
}

public class KubernetesClusterService : KubernetesService, IKubernetesClusterService
{
readonly AsyncLazy<ClusterVersion> lazyVersion;
public KubernetesClusterService(IKubernetesClientConfigProvider configProvider)
: base(configProvider)
{
//As the cluster version isn't going to change without restarting, we just cache the version in an AsyncLazy
lazyVersion = new AsyncLazy<ClusterVersion>(async () =>
{
var versionInfo = await Client.Version.GetCodeAsync();

return new ClusterVersion(int.Parse(versionInfo.Major), int.Parse(versionInfo.Minor));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the patch number of a k8s cluster is not provided by the cluster itself (and isn't important for what we need it for)

});
}

public async Task<ClusterVersion> GetClusterVersion()
=> await lazyVersion;
}

public record ClusterVersion(int Major, int Minor);
}
18 changes: 18 additions & 0 deletions source/Octopus.Tentacle/Kubernetes/KubernetesConfig.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
using System;

namespace Octopus.Tentacle.Kubernetes
{
public static class KubernetesConfig
{
public static string Namespace => GetRequiredEnvVar("OCTOPUS__K8STENTACLE__NAMESPACE", "Unable to determine Kubernetes namespace.");
public static string JobServiceAccountName => GetRequiredEnvVar("OCTOPUS__K8STENTACLE__JOBSERVICEACCOUNTNAME", "Unable to determine Kubernetes Job service account name.");
public static string JobVolumeYaml => GetRequiredEnvVar("OCTOPUS__K8STENTACLE__JOBVOLUMEYAML", "Unable to determine Kubernetes Job volume yaml.");
public static bool UseJobs => bool.TryParse(Environment.GetEnvironmentVariable("OCTOPUS__K8STENTACLE__USEJOBS"), out var useJobs) && useJobs;

public const int JobTtlSeconds = 1800; //30min

static string GetRequiredEnvVar(string variable, string errorMessage)
=> Environment.GetEnvironmentVariable(variable)
?? throw new InvalidOperationException($"{errorMessage} The environment variable '{variable}' must be defined with a non-null value.");
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace Octopus.Tentacle.Kubernetes
{
public interface IKubernetesJobContainerResolver
{
Task<string> GetContainerImageForCluster();
}

public class KubernetesJobContainerResolver : IKubernetesJobContainerResolver
{
readonly IKubernetesClusterService clusterService;

public KubernetesJobContainerResolver(IKubernetesClusterService clusterService)
{
this.clusterService = clusterService;
}

static readonly List<Version> KnownLatestContainerTags = new()
{
new(1, 26, 3),
new(1, 27, 3),
new(1, 28, 2),
};

public async Task<string> GetContainerImageForCluster()
{
var clusterVersion = await clusterService.GetClusterVersion();

//find the highest tag for this cluster version
var tagVersion = KnownLatestContainerTags.FirstOrDefault(tag => tag.Major == clusterVersion.Major && tag.Minor == clusterVersion.Minor);

var tag = tagVersion?.ToString(3) ?? "latest";

return $"octopuslabs/k8s-workertools:{tag}";
}
}
}
86 changes: 86 additions & 0 deletions source/Octopus.Tentacle/Kubernetes/KubernetesJobService.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
using System;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using k8s;
using k8s.Autorest;
using k8s.Models;
using Octopus.Tentacle.Contracts;

namespace Octopus.Tentacle.Kubernetes
{
public interface IKubernetesJobService
{
Task<V1Job?> TryGet(ScriptTicket scriptTicket, CancellationToken cancellationToken);
string BuildJobName(ScriptTicket scriptTicket);
Task CreateJob(V1Job job, CancellationToken cancellationToken);
void Delete(ScriptTicket scriptTicket);
Task Watch(ScriptTicket scriptTicket, Func<V1Job, bool> onChange, Action<Exception> onError, CancellationToken cancellationToken);
}

public class KubernetesJobService : KubernetesService, IKubernetesJobService
{
public KubernetesJobService(IKubernetesClientConfigProvider configProvider)
: base(configProvider)
{
}

public async Task<V1Job?> TryGet(ScriptTicket scriptTicket, CancellationToken cancellationToken)
{
var jobName = BuildJobName(scriptTicket);

try
{
return await Client.ReadNamespacedJobStatusAsync(jobName, KubernetesConfig.Namespace, cancellationToken: cancellationToken);
}
catch (HttpOperationException opException)
when (opException.Response.StatusCode == HttpStatusCode.NotFound)
{
return null;
}
}

public async Task Watch(ScriptTicket scriptTicket, Func<V1Job, bool> onChange, Action<Exception> onError, CancellationToken cancellationToken)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than polling the TryGet endpoint continually, this creates an IAsyncEnumerable which yields when there is a change

{
var jobName = BuildJobName(scriptTicket);

using var response = Client.BatchV1.ListNamespacedJobWithHttpMessagesAsync(
KubernetesConfig.Namespace,
//only list this job
fieldSelector: $"metadata.name=={jobName}",
watch: true,
timeoutSeconds: KubernetesConfig.JobTtlSeconds,
cancellationToken: cancellationToken);

await foreach (var (type, job) in response.WatchAsync<V1Job, V1JobList>(onError, cancellationToken: cancellationToken))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting pattern 👍

{
//we are only watching for modifications
if (type != WatchEventType.Modified)
continue;

var stopWatching = onChange(job);
if (stopWatching)
break;
}
}

public string BuildJobName(ScriptTicket scriptTicket) => $"octopus-{scriptTicket.TaskId}".ToLowerInvariant();

public async Task CreateJob(V1Job job, CancellationToken cancellationToken)
{
await Client.CreateNamespacedJobAsync(job, KubernetesConfig.Namespace, cancellationToken: cancellationToken);
}

public void Delete(ScriptTicket scriptTicket)
{
try
{
Client.DeleteNamespacedJob(BuildJobName(scriptTicket), KubernetesConfig.Namespace);
}
catch
{
//we are comfortable silently consuming this as the jobs have a TTL that will clean it up anyway
}
}
}
}
20 changes: 20 additions & 0 deletions source/Octopus.Tentacle/Kubernetes/KubernetesModule.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
using Autofac;

namespace Octopus.Tentacle.Kubernetes
{
public class KubernetesModule : Module
{
protected override void Load(ContainerBuilder builder)
{
builder.RegisterType<KubernetesJobService>().As<IKubernetesJobService>().SingleInstance();
builder.RegisterType<KubernetesClusterService>().As<IKubernetesClusterService>().SingleInstance();
builder.RegisterType<KubernetesJobContainerResolver>().As<IKubernetesJobContainerResolver>().SingleInstance();

#if DEBUG
builder.RegisterType<LocalMachineKubernetesClientConfigProvider>().As<IKubernetesClientConfigProvider>().SingleInstance();
#else
builder.RegisterType<InClusterKubernetesClientConfigProvider>().As<IKubernetesClientConfigProvider>().SingleInstance();
#endif
}
}
}
14 changes: 14 additions & 0 deletions source/Octopus.Tentacle/Kubernetes/KubernetesService.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
using k8sClient = k8s.Kubernetes;

namespace Octopus.Tentacle.Kubernetes
{
public abstract class KubernetesService
{
protected k8sClient Client { get; }

protected KubernetesService(IKubernetesClientConfigProvider configProvider)
{
Client = new k8sClient(configProvider.Get());
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
using System;
using k8s;

namespace Octopus.Tentacle.Kubernetes
{
class LocalMachineKubernetesClientConfigProvider : IKubernetesClientConfigProvider
{
public KubernetesClientConfiguration Get()
{
#if DEBUG
var kubeConfigEnvVar = Environment.GetEnvironmentVariable("KUBECONFIG");
return KubernetesClientConfiguration.BuildConfigFromConfigFile(kubeConfigEnvVar);
#else
throw new NotSupportedException("Local machine configuration is only supported when debugging.");
#endif
}
}
}
51 changes: 51 additions & 0 deletions source/Octopus.Tentacle/Kubernetes/bootstrapRunner.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
#! /usr/bin/bash

WORK_DIR=$1
STDOUT_LOG="$WORK_DIR/stdout.log"
STDERR_LOG="$WORK_DIR/stderr.log"

format() {
now=$(date -u +"%Y-%m-%dT%H:%M:%S.%N%z")
echo "$now|$2" | tee -a "$1"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use tee so the logs are written to the job/pod logs as well as the log files

}

logStdOut() {
while read -r IN
do
format "$STDOUT_LOG" "$IN"
done
}

logStdErr() {
while read -r IN
do
format "$STDERR_LOG" "$IN"
done
}

#ensure these files exist
rm -f "$STDOUT_LOG";
rm -f "$STDERR_LOG";
touch "$STDOUT_LOG"
touch "$STDERR_LOG"

#pass the remaining args (skipping the first which is the working directory)
shift

BOOTSTRAP_SCRIPT=$1

#This is the args for the Bootstrap script
shift

exec > >(logStdOut)
exec 2> >(logStdErr >&2)

/bin/bash $BOOTSTRAP_SCRIPT "$@"

# Write a message to say the job has completed
echo "##octopus[stdout-verbose]"
echo "Kubernetes Job completed"
echo "##octopus[stdout-default]"

# This ungodly hack is to stop the pod from being killed before the last log has been flushed
sleep 0.250 #250ms
Comment on lines +50 to +51
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the pod is destroyed before the file logs are flushed. This little hack just gives the pod time to flush the last redirected output to the files

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 yeah I feel your pain. Might be an idea to pipe one last verbose message to the logs so that we can tell if someone ever complains about logs getting cut out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a great idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a2a48b7

Loading