Promitor to support Azure Workload Identity with UserAssignedManagedIdentity #2218

Open
dks0296586 opened this issue Jan 24, 2023 · 13 comments
Labels
feature-request New feature requests

Comments

@dks0296586

Proposal

With aad-pod-identity being deprecated in favor of Azure Workload Identity, Promitor should support Workload Identity.

In my testing using the current version of Resource Discovery, attempting to use Workload Identity results in the following error:
AADSTS70021: No matching federated identity record found for presented assertion.

I don't believe this is a configuration issue on my end, as I have verified the configuration using the azwi quick-start guide and got that working as expected.

Component

Resource Discovery, Scraper

Contact Details

[email protected]

@dks0296586 added the feature-request label Jan 24, 2023
@dks0296586
Author

I was able to find a workaround to get Workload Identity working by using these helm values:

    podLabels:  
      azure.workload.identity/use: "true"  
    rbac:  
      serviceAccount:  
        create: false  
        name: workload-identity-sa  
    azureAuthentication:  
      mode: "UserAssignedManagedIdentity"  
      identity:  
        id: <Client ID>  

azureAuthentication.identity.id must be set to the client ID associated with the workload-identity-sa service account. A value is required to pass Promitor startup validation, and it has to be the correct client ID because it is passed to ManagedIdentityCredential, where an incorrect value would override the correct one:
tokenCredential = new ManagedIdentityCredential(authenticationInfo.IdentityId, tokenCredentialOptions);
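
For reference, the workaround above assumes that the workload-identity-sa service account already exists (hence create: false). A minimal sketch of such a service account, following the Azure Workload Identity conventions, could look like the fragment below. The namespace is an assumption made for illustration, and <Client ID> is the same placeholder used in the helm values above.

```yaml
# Sketch only: pre-created service account referenced by rbac.serviceAccount.name.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: workload-identity-sa
  namespace: promitor  # assumption: use the namespace Promitor is deployed in
  annotations:
    # Client ID of the user-assigned managed identity to federate with.
    azure.workload.identity/client-id: "<Client ID>"
```

The federated identity credential on the managed identity has to reference this exact service account; its subject takes the form system:serviceaccount:<namespace>:<service account name>. A mismatch in subject, issuer, or audience typically surfaces as an AADSTS70021 error.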

I have made some small changes to the AzureAuthenticationFactory to make this a little more streamlined. @tomkerkhove would you like me to submit a PR for review?

@tomkerkhove
Owner

100%, thanks a ton!

@dks0296586
Author

I assumed that the changes and workaround config would also work for the Scraper. While testing the Scraper, I am now getting different identity issues.

I will spend some time looking at the Azure Monitor auth flow to see if I can find a resolution.

[20:56:50 FTL] Failed to scrape resource for metric 'azure_storageaccount_success_server_latency_api_name_average'
System.Net.Http.HttpRequestException: Code: 400 ReasonReasonPhrase: Bad Request Body: {"error":"invalid_request","error_description":"Identity not found"}
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.RetrieveTokenFromIMDSWithRetryAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetTokenFromIMDSEndpointAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderForVirtualMachineAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderAsync(CancellationToken cancellationToken)
at Microsoft.Rest.TokenCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.AzureCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperations.ListWithHttpMessagesAsync(String resourceUri, String metricnamespace, Dictionary`2 customHeaders, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperationsExtensions.ListAsync(IMetricDefinitionsOperations operations, String resourceUri, String metricnamespace, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsImpl.ListByResourceAsync(String resourceId, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsImpl.Microsoft.Azure.Management.Monitor.Fluent.IMetricDefinitions.ListByResourceAsync(String resourceId, CancellationToken cancellationToken)
at Promitor.Integrations.AzureMonitor.AzureMonitorClient.GetMetricDefinitionsAsync(String resourceId) in /src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs:line 128
at Promitor.Integrations.AzureMonitor.AzureMonitorClient.QueryMetricAsync(String metricName, String metricDimension, AggregationType aggregationType, TimeSpan aggregationInterval, String resourceId, String metricFilter, Nullable`1 metricLimit) in /src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs:line 84
at Promitor.Core.Scraping.AzureMonitorScraper`1.ScrapeResourceAsync(String subscriptionId, ScrapeDefinition`1 scrapeDefinition, TResourceDefinition resourceDefinition, AggregationType aggregationType, TimeSpan aggregationInterval) in /src/Promitor.Core.Scraping/AzureMonitorScraper.cs:line 54
at Promitor.Core.Scraping.Scraper`1.ScrapeAsync(ScrapeDefinition`1 scrapeDefinition) in /src/Promitor.Core.Scraping/Scraper.cs:line 78

@dks0296586 changed the title from "Promitor to support Azure Workload Identity with UserDefinedIdentity" to "Promitor to support Azure Workload Identity with UserAssignedManagedIdentity" Feb 3, 2023
@tomkerkhove
Owner

Maybe this is also because of #2160 & #2209

@dks0296586
Author

It looks to be due to the Azure Monitor integration using Microsoft.Azure.Management.ResourceManager.Fluent.Authentication instead of the Azure SDK/Azure Identity library that Resource Discovery uses.

There is a way to get it working with the Scraper by using the Azure Workload Identity proxy sidecar.

@tomkerkhove
Owner

Interesting, thanks for sharing!

@davecaplinger

davecaplinger commented Jul 21, 2023

I'm having the same issue - workload-identity with resource-discovery works great using:

podLabels:
  azure.workload.identity/use: "true"
rbac:
  serviceAccount:
    create: false
    name: workload-identity-sa
azureAuthentication:
  mode: UserAssignedManagedIdentity

but similar settings for scraper don't work (note that I have to also explicitly add the clientId for the workload identity, and provide tenantId and a default subscriptionId, or startup validation will fail):

podLabels:
  azure.workload.identity/use: "true"
rbac:
  serviceAccount:
    create: false
    name: workload-identity-sa
azureAuthentication:
  mode: UserAssignedManagedIdentity
  identity:
    id: 00000000-0000-0000-0000-000000000000
azureMetadata:
  tenantId: 00000000-0000-0000-0000-000000000000
  subscriptionId: 00000000-0000-0000-0000-000000000000

and I get "Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request":

[14:13:16 FTL] Failed to scrape resource for metric 'azure_service_bus_active_messages'
System.Net.Http.HttpRequestException: Code: 400 ReasonReasonPhrase: Bad Request Body: {"error":"invalid_request","error_description":"Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request"}
   at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.RetrieveTokenFromIMDSWithRetryAsync(String resource, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetTokenFromIMDSEndpointAsync(String resource, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderForVirtualMachineAsync(String resource, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderAsync(CancellationToken cancellationToken)
   at Microsoft.Rest.TokenCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.AzureCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperations.ListWithHttpMessagesAsync(String resourceUri, String metricnamespace, Dictionary`2 customHeaders, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperationsExtensions.ListAsync(IMetricDefinitionsOperations operations, String resourceUri, String metricnamespace, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsImpl.ListByResourceAsync(String resourceId, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsImpl.Microsoft.Azure.Management.Monitor.Fluent.IMetricDefinitions.ListByResourceAsync(String resourceId, CancellationToken cancellationToken)
   at Promitor.Integrations.AzureMonitor.AzureMonitorClient.GetMetricDefinitionsAsync(String resourceId) in /src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs:line 128
   at Promitor.Integrations.AzureMonitor.AzureMonitorClient.QueryMetricAsync(String metricName, String metricDimension, AggregationType aggregationType, TimeSpan aggregationInterval, String resourceId, String metricFilter, Nullable`1 metricLimit) in /src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs:line 84
   at Promitor.Core.Scraping.AzureMonitorScraper`1.ScrapeResourceAsync(String subscriptionId, ScrapeDefinition`1 scrapeDefinition, TResourceDefinition resourceDefinition, AggregationType aggregationType, TimeSpan aggregationInterval) in /src/Promitor.Core.Scraping/AzureMonitorScraper.cs:line 54
   at Promitor.Core.Scraping.Scraper`1.ScrapeAsync(ScrapeDefinition`1 scrapeDefinition) in /src/Promitor.Core.Scraping/Scraper.cs:line 78

Unfortunately I can't get the workload identity proxy sidecar workaround to work (I think because I don't have the legacy pod identity support in my AKS cluster).

I'm not a C# developer at all, but I suspect that if it were possible to get to the default case in this switch statement and rely on DefaultAzureCredential() then it might "just work" as a result of the environment variables that Workload Identity injects (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_FEDERATED_TOKEN_FILE, and AZURE_AUTHORITY_HOST):

            switch (authenticationInfo.Mode)
            {
                case AuthenticationMode.ServicePrincipal:
                    tokenCredential = new ClientSecretCredential(tenantId, authenticationInfo.IdentityId, authenticationInfo.Secret, tokenCredentialOptions);
                    break;
                case AuthenticationMode.UserAssignedManagedIdentity:
                    var clientId = authenticationInfo.GetIdentityIdOrDefault();
                    tokenCredential = new ManagedIdentityCredential(clientId, tokenCredentialOptions);
                    break;
                case AuthenticationMode.SystemAssignedManagedIdentity:
                    tokenCredential = new ManagedIdentityCredential(options:tokenCredentialOptions);
                    break;
                default:
                    tokenCredential = new DefaultAzureCredential();   // <-- doesn't appear to be possible to get here, but I think this might work
                    break;
            }

(The reason I think this might work is because I've used Azure Identity's Python SDK with user-assigned managed identity and DefaultAzureCredential() with no problems.)
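
For context, the Workload Identity mutating webhook injects these variables into pods labelled azure.workload.identity/use: "true" and mounts a projected service account token. The fragment below is an approximate sketch of what the mutated pod spec looks like, written from the Azure Workload Identity documentation rather than taken from a Promitor deployment. The container name is hypothetical, and exact values may differ per cluster and webhook version.

```yaml
# Approximate shape of a pod after webhook mutation (sketch, not captured output).
spec:
  serviceAccountName: workload-identity-sa
  containers:
    - name: promitor-agent-scraper  # hypothetical container name
      env:
        - name: AZURE_CLIENT_ID      # taken from the service account annotation
          value: "<Client ID>"
        - name: AZURE_TENANT_ID
          value: "<Tenant ID>"
        - name: AZURE_FEDERATED_TOKEN_FILE
          value: /var/run/secrets/azure/tokens/azure-identity-token
        - name: AZURE_AUTHORITY_HOST
          value: https://login.microsoftonline.com/
      volumeMounts:
        - name: azure-identity-token
          mountPath: /var/run/secrets/azure/tokens
          readOnly: true
  volumes:
    - name: azure-identity-token
      projected:
        sources:
          - serviceAccountToken:
              path: azure-identity-token
              audience: api://AzureADTokenExchange
              expirationSeconds: 3600
```

Inspecting the running pod with kubectl describe pod is a quick way to confirm whether the injection actually happened.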

@tomkerkhove
Owner

Based on the configuration, though, it should use ManagedIdentityCredential, which is reflected in the logs above:

at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.RetrieveTokenFromIMDSWithRetryAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetTokenFromIMDSEndpointAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderForVirtualMachineAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderAsync(CancellationToken cancellationToken)
at Microsoft.Rest.TokenCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.AzureCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperations.ListWithHttpMessagesAsync(String resourceUri, String metricnamespace, Dictionary`2 customHeaders, CancellationToken cancellationToken)

However, this is using AAD identity, not workload identity. I haven't used the proxy myself; maybe @dks0296586 can help a bit with this?

@dks0296586
Author

@davecaplinger
Here are the labels and annotations I needed to use for the sidecar proxy to work:

securityContext:  
  runAsNonRoot: false #Required for Azure Workload Identity Proxy Injection  
podLabels:  
  azure.workload.identity/use: "true"  
annotations:  
  azure.workload.identity/inject-proxy-sidecar: "true"  
  azure.workload.identity/proxy-sidecar-port: "8080"  

I don't believe you need to have aad-pod-identity configured to make use of the workload identity sidecar. Since Resource Discovery is working with workload identity, I think it's just a matter of getting the sidecar proxy working correctly.
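
Combining these settings with the scraper values posted earlier in this thread gives roughly the following values file. It is a sketch assembled from the comments above, not a tested configuration: the GUIDs are placeholders, and the top-level placement of securityContext, podLabels, and annotations is taken as posted.

```yaml
# Sketch: scraper helm values with the Workload Identity proxy sidecar enabled.
securityContext:
  runAsNonRoot: false  # reported above as required for proxy injection
podLabels:
  azure.workload.identity/use: "true"
annotations:
  azure.workload.identity/inject-proxy-sidecar: "true"
  azure.workload.identity/proxy-sidecar-port: "8080"
rbac:
  serviceAccount:
    create: false
    name: workload-identity-sa
azureAuthentication:
  mode: UserAssignedManagedIdentity
  identity:
    id: 00000000-0000-0000-0000-000000000000  # client ID placeholder
azureMetadata:
  tenantId: 00000000-0000-0000-0000-000000000000        # placeholder
  subscriptionId: 00000000-0000-0000-0000-000000000000  # placeholder
```

Note that runAsNonRoot: false relaxes the pod's security posture, so it is worth checking against your cluster's policies before rolling it out.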

@davecaplinger

I was missing the securityContext setting; I'll give that a shot. Thanks!

@bartwitkowski

Hi, we have the same problem, but it works when we set securityContext.runAsNonRoot = false.

Unfortunately, giving the container root privileges is not a safe solution.
We have Azure Defender on AKS firing alerts (recommendations) when this setting is used.

Is there any way to avoid it? Why does agent-scraper need root privileges on the node?

@tomkerkhove @dks0296586 any news on this?

@tomkerkhove
Owner

No, unfortunately I am not actively contributing code anymore, but I am happy to review PRs.

Learn more at https://blog.tomkerkhove.be/2023/12/09/the-future-of-promitor/

@jayendranarumugam

Hi @dks0296586 / @davecaplinger, I have raised PR #2578 to introduce SdkDefault, which will use DefaultAzureCredential. I'm not sure whether it will directly help, but judging from the discussion it might be relevant!
