Inconsistent behavior and unable to update aws.ssm.Document #2555

Closed
aureq opened this issue Jun 8, 2023 · 20 comments · Fixed by #3353
Labels: area/providers, awaiting-upstream, customer/feedback, kind/bug, resolution/fixed

Comments

@aureq
Member

aureq commented Jun 8, 2023

What happened?

(context from customer)

I have a simple Pulumi app that deploys an AWS SSM Document (Automation Runbook). When the Runbook doesn't exist, it is created as expected ✅.

However, if the YAML data is changed, then Pulumi fails to update the document ❌. The AWS API returns the following message.

error: 1 error occurred:
        * updating urn:pulumi:case-3012::zendesk::aws:ssm/document:Document::nodeBuildRunbook-doc: 1 error occurred:
        * updating SSM Document (nodeBuildRunbook): ValidationException: 1 validation error detected: Value at 'documentVersion' failed to satisfy constraint: Member must satisfy regular expression pattern: ([$]LATEST|[$]DEFAULT|[$]APPROVED|[$]PENDING_REVIEW|^[1-9][0-9]*$)
        status code: 400, request id: bad1c566-66aa-40cd-b474-17076ccff4f4

Furthermore, when the Pulumi app is run a second time, Pulumi returns no error. However, the SSM Document isn't updated either ‼ (I checked in the AWS Console). This makes things really complicated.

I've attached the verbose logs (-v=9) in case there's any relevant information.

logs-error.txt
logs-no-error.txt

Finally, to avoid possible confusion: the YAML data may contain a documentVersion field (under inputs), but setting or removing this field doesn't change anything.

Expected Behavior

  1. The Automation Runbook Document should be updated based on the updated YAML data.
  2. If the update fails, then Pulumi should consistently return an error.

Steps to reproduce

  1. Create a new Pulumi app (files provided below).
  2. Run the Pulumi app with pulumi up. The deployment should finish as expected. A new SSM document is visible in the AWS Console.
  3. Update the runbook.yaml and change a description field.
  4. Run pulumi up. The command ends with the error shown above ❌.
  5. Run pulumi up a second time (no changes). The command completes as if the Document had been updated, but the console still shows the previous content ‼.

Output of pulumi about

CLI          
Version      3.69.0
Go Version   go1.20.4
Go Compiler  gc

Plugins
NAME    VERSION
aws     5.41.0
awsx    1.0.2
docker  3.6.1
nodejs  unknown

Host     
OS       debian
Version  11.7
Arch     x86_64

This project is written in nodejs: executable='/usr/local/bin/node' version='v18.16.0'

Current Stack: menfin/zendesk/case-3012

TYPE                           URN
pulumi:pulumi:Stack            urn:pulumi:case-3012::zendesk::pulumi:pulumi:Stack::zendesk-case-3012
pulumi:providers:aws           urn:pulumi:case-3012::zendesk::pulumi:providers:aws::default_5_41_0
aws:ssm/parameter:Parameter    urn:pulumi:case-3012::zendesk::aws:ssm/parameter:Parameter::newRelicWinServiceConfig
aws:iam/role:Role              urn:pulumi:case-3012::zendesk::aws:iam/role:Role::ssmAutomation-role
aws:ssm/parameter:Parameter    urn:pulumi:case-3012::zendesk::aws:ssm/parameter:Parameter::newRelicWinEventConfig
aws:ssm/document:Document      urn:pulumi:case-3012::zendesk::aws:ssm/document:Document::nodeBuildRunbook-doc
aws:iam/rolePolicy:RolePolicy  urn:pulumi:case-3012::zendesk::aws:iam/rolePolicy:RolePolicy::eventBridge-nodeBuildRunbook-policy


Found no pending operations associated with case-3012

Backend        
Name           pulumi.com
URL            https://app.pulumi.com/aureq
User           aureq
Organizations  aureq, team-ce, menfin, menfin-team, demo, pulumi

Dependencies:
NAME            VERSION
@pulumi/pulumi  3.69.0
@types/node     16.18.34
handlebars      4.7.7
@pulumi/aws     5.41.0
@pulumi/awsx    1.0.2

Pulumi locates its logs in /tmp by default

Additional context

package.json

{
    "name": "l",
    "main": "index.ts",
    "devDependencies": {
        "@types/node": "^16"
    },
    "dependencies": {
        "@pulumi/aws": "^5.0.0",
        "@pulumi/awsx": "^1.0.0",
        "@pulumi/pulumi": "^3.0.0",
        "handlebars": "^4.7.7"
    }
}

index.ts

import * as aws from "@pulumi/aws";
import * as handlebars from 'handlebars';
import * as fs from 'fs';
import * as path from 'path';
import * as pulumi from '@pulumi/pulumi';

/**
 * Function to handle all the resources needed for instances entering the Warm Pool
 * as well as launching directly into the Active Pool.
 *
 * @param instanceLaunchLch
 * @param ssmAutomationRoleArn
 * @param eventBridgeRole
 * @param opts
 */

const ssmAutomationRole = new aws.iam.Role('ssmAutomation-role', {
    name: 'ssmAutomation-role',
    assumeRolePolicy: JSON.stringify({
        Version: '2012-10-17',
        Statement: [{
            Action: 'sts:AssumeRole',
            Effect: 'Allow',
            Sid: '',
            Principal: {
                Service: 'ssm.amazonaws.com',
            },
        }],
    }),
});


export const runbook = (ssmAutomationRole: aws.iam.Role) => {
    // Create File Object
    const runbookFile = handlebars.compile(
        fs.readFileSync(path.join(__dirname, './runbook.yaml'), 'utf8'),
    );

    // Create SSM Parameter(s) for NewRelic Config
    const newRelicWinEventConfigParam = new aws.ssm.Parameter('newRelicWinEventConfig', {
        name: '/platform/newRelicWinEventConfig',
        type: 'String',
        value: 'test new relic event config value'
    });

    const newRelicWinServiceConfigParam = new aws.ssm.Parameter('newRelicWinServiceConfig', {
        name: '/platform/newRelicWinServiceConfig',
        type: 'String',
        value: 'test new relic service config value'
    });

    // Inject Parameters into Template
    const nodeBuildRunbookTemplate = pulumi.all([ssmAutomationRole.arn, newRelicWinEventConfigParam.name, newRelicWinServiceConfigParam.name,]).apply(([ssmRoleArn, newRelicWinEventConfigParamName, newRelicWinServiceConfigParamName,]) => {
        return runbookFile({
            assumeRoleArn: ssmRoleArn,
            newRelicWinEventConfigParamName,
            newRelicWinServiceConfigParamName,
            newRelicLicenseKey: '1234567890123456789',
            newRelicQueueDepth: '1000'
        });
    });

    // Create the SSM Automation Runbook
    const nodeBuildRunbookDoc = new aws.ssm.Document('nodeBuildRunbook-doc', {
        name: 'nodeBuildRunbook',
        content: nodeBuildRunbookTemplate,
        documentType: 'Automation',
        documentFormat: 'YAML',
    });

    const policyStatementResource = pulumi.all([nodeBuildRunbookDoc.arn]).apply(([arn]) => {
        return `${arn.replace('document', 'automation-definition',)}:${'$DEFAULT'}`
    });

    /*
    * This Role Policy grants permission to the Event Bridge Role to invoke
    * the Runbook for Building Instances.
    */
    new aws.iam.RolePolicy('eventBridge-nodeBuildRunbook-policy',{
        name: 'eventBridge-nodeBuildRunbook-policy',
        role: ssmAutomationRole.id,
        policy: {
            Version: '2012-10-17',
            Statement: [{
                Action: ['ssm:StartAutomationExecution'],
                Effect: 'Allow',
                Resource: policyStatementResource,
            }],
        },
    });

    return nodeBuildRunbookDoc;
};


// Emulating external call to function from parent typescript file in full project
runbook(ssmAutomationRole)

runbook.yaml

description: |
  *Some description*

schemaVersion: '0.3'
assumeRole: '\{{ AutomationAssumeRole }}'
parameters:
  AutomationAssumeRole:
    type: String
    default: '{{ assumeRoleArn }}'
    description: (Required) The ARN of the role that allows automation to perform the actions on your behalf.
  InstanceId:
    type: String
    description: (Required) AMI Source EC2 instance ID
  Region:
    type: String
    description: AWS Region
    default: 'us-west-2'
mainSteps:
  ##############################################################################
  - name: 'Wait_for_SSM_Agent'
    description: SSM Agent Needs to be Ready
    action: aws:waitForAwsResourceProperty
    timeoutSeconds: 3600
    inputs:
      Service: ssm
      Api: DescribeInstanceInformation
      InstanceInformationFilterList:
        - key: InstanceIds
          valueSet: ['\{{ InstanceId }}']
      PropertySelector: '$..PingStatus'
      DesiredValues:
        - Online
    isCritical: 'true'

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@aureq added the kind/bug and needs-triage labels Jun 8, 2023
@thomas11
Contributor

thomas11 commented Jun 9, 2023

Hi @aureq, I'm not quite sure yet what's happening here, but I notice the error is about the documentVersion property, which your YAML runbook does not specify. I suspect we mistakenly send an invalid default value. Could you try explicitly specifying a documentVersion such as $LATEST?

@thomas11 added the awaiting-feedback label and removed the needs-triage label Jun 9, 2023
@dpreble-cisco

dpreble-cisco commented Jun 18, 2023

I have tried setting documentVersion to $LATEST to no avail. Could you please provide an example of a full SSM aws.ssm.Document with the content setting the documentVersion?

@thomas11
Contributor

Hi @dpreble-cisco, we don't have a full working example, I'm afraid. Were you able to solve this or work around it in the meantime?

@dpreble-cisco

@thomas11 Unfortunately, no. Revisiting it now as I need to make an update and I am feeling the burn.

@oliparcol

I have the exact same issue with a Command document type. The strange thing about the missing documentVersion is that it seems more like a value returned by the endpoint: https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_CreateDocument.html (but the regex doesn't match fully).

@thomas11
Contributor

Hi @dpreble-cisco and @oliparcol, sorry you're still struggling with this. We'll take another look at Pulumi.

In the meantime, you could unblock yourself by creating the resource outside of Pulumi (console, CLI, or SDK) and then importing it.

@thomas11 added the area/providers label and removed the awaiting-feedback label Jul 24, 2023
@seanlogan-wh

Looks similar to hashicorp/terraform-provider-aws#31131

@mnlumi added the customer/feedback label Jul 27, 2023
@dgivens

dgivens commented Aug 25, 2023

I'm currently working around this by adding the following resource options

const doc = new aws.ssm.Document(name, {
...
},
{
  replaceOnChanges: ["*"],
  deleteBeforeReplace: true,
});

@eli-fine-res

We were able to work around the issue by downgrading to
pulumi==3.75.0
pulumi-aws==5.32.0

@thomas11
Contributor

thomas11 commented Sep 1, 2023

> We were able to work around the issue by downgrading to
> pulumi==3.75.0
> pulumi-aws==5.32.0

Hi @eli-fine-res, that's a great data point. What versions did you downgrade from?

@eli-fine-res

We were trying to be on pulumi-aws==5.42.

@eli-fine-res

We looked back through the pulumi-aws changelog and found a version that uses an earlier version of the hashicorp provider than the one described in the link above as having the issue.

@thomas11
Contributor

thomas11 commented Sep 1, 2023

Thanks. I suspect that pulumi-aws 5.34.0 is the last one that doesn't have this issue, corresponding to upstream 4.60. Their next one 4.61 has this issue which looks related.

@mikhailshilkov added the awaiting-upstream label Oct 23, 2023
@mark-bixler

Is there any update on this? Was hoping v6.x would include a fix. It does not 😢

@jcefoli

jcefoli commented Nov 1, 2023

> Is there any update on this? Was hoping v6.x would include a fix. It does not 😢

No, v6 just added other bugs from Terraform upstream 🤦

@iwahbe self-assigned this Nov 22, 2023
@VenelinMartinov
Contributor

Possibly related, also has a repro: pulumi/pulumi#15066

@t0yv0 assigned t0yv0 and unassigned iwahbe Jan 5, 2024
@t0yv0
Member

t0yv0 commented Jan 5, 2024

I have a narrowed down repro that succeeds in TF but fails in Pulumi:

#!/usr/bin/env bash

set -euo pipefail

export AWS_PROFILE=devsandbox
export PATH="/Users/t0yv0/code/pulumi-aws/bin:$PATH"

# Globs must stay unquoted so the shell expands them.
rm -f ./*-log.json
rm -f ./*-state.json

pulumi destroy --yes
pulumi config set step 1
pulumi up --yes --skip-preview
pulumi stack export > step1-state.json
pulumi config set step 2
PULUMI_DEBUG_GRPC="$PWD/up12-log.json" pulumi up --yes --skip-preview || echo "IGNORE"
pulumi stack export > step2-state.json

The Pulumi program used by the script:

import * as aws from "@pulumi/aws";
import * as pulumi from '@pulumi/pulumi';

let config = new pulumi.Config();
let step = config.getNumber("step") || 1;

const doc = `
---
schemaVersion: "0.3"
description: Executes a patching event on the instance followed by a healthcheck
parameters:
  InstanceIds:
    type: StringList
    description: The instance to target
mainSteps:
  - name: InvokePatchEvent
    action: aws:runCommand
    inputs:
      DocumentName: AWS-RunPatchBaseline
      InstanceIds: "{{ InstanceIds }}"
      OutputS3BucketName: "{output_s3_bucket_name}"
      OutputS3KeyPrefix: "{STEP}"
      Parameters:
        Operation: Scan
`;

let content = doc.replace("{STEP}", String(step));

// Create the SSM Automation Runbook
const nodeBuildRunbookDoc = new aws.ssm.Document('nodeBuildRunbook-doc', {
    name: 'nodeBuildRunbook',
    content: content,
    documentType: 'Automation',
    documentFormat: 'YAML',
});

@t0yv0
Member

t0yv0 commented Jan 5, 2024

In resourceDocumentUpdate:

			input := &ssm.UpdateDocumentInput{
				Content:         aws.String(d.Get("content").(string)),
				DocumentFormat:  aws.String(d.Get("document_format").(string)),
				DocumentVersion: aws.String(d.Get("default_version").(string)),
				Name:            aws.String(d.Id()),
			}

The error is coming out of this call:

			output, err := conn.UpdateDocumentWithContext(ctx, input)

Because Pulumi is passing an empty string "" to DocumentVersion during the update.
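
To make the failure concrete, here is a minimal sketch that checks candidate documentVersion values against the pattern quoted in the ValidationException. The pattern string is copied from the error message in the original report; the helper name isValidDocumentVersion is hypothetical, and a JavaScript RegExp only approximates how the AWS service evaluates the constraint.

```typescript
// Pattern copied verbatim from the ValidationException in the issue report.
const DOCUMENT_VERSION_PATTERN =
    "([$]LATEST|[$]DEFAULT|[$]APPROVED|[$]PENDING_REVIEW|^[1-9][0-9]*$)";

// Hypothetical helper: true when the value satisfies the quoted pattern.
// Assumption: a JavaScript RegExp behaves like the server-side check for
// the simple inputs used here.
function isValidDocumentVersion(value: string): boolean {
    return new RegExp(DOCUMENT_VERSION_PATTERN).test(value);
}
```

Under these assumptions, isValidDocumentVersion("1") and isValidDocumentVersion("$LATEST") return true, while isValidDocumentVersion("") returns false, which is consistent with the 400 response when an empty DocumentVersion is sent.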

In DiffCustomizer:

				if d.HasChange("content") {
					...
					if err := d.SetNewComputed("default_version"); err != nil {
						return err
					}
				}

This code seems to be called by Pulumi and resets default_version to "".

Although in the state prior to the update it is available as:

                    "defaultVersion": "1",

This is a Computed field:

			"default_version": {
				Type:     schema.TypeString,
				Computed: true,
			},

@t0yv0
Member

t0yv0 commented Jan 8, 2024

Using differential debugging I narrowed down the difference to pulumi/pulumi-terraform-bridge#1505

Under actual TF, PlanResourceChange runs first and produces a triple (config, state, plan) that is passed to ApplyResourceChange. When ApplyResourceChange runs, it recovers a Diff object using DiffFromValues, which intentionally does not run diff customizers:

https://github.com/hashicorp/terraform-plugin-sdk/blob/master/helper/schema/grpc_provider.go#L1013a
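
The difference between the two lifecycles can be sketched with a toy model. This is only an illustration of the behavior described in this thread, not code from the bridge or the TF SDK: every type and function name below (DocState, Diff, customizeDiff, readDefaultVersion, and the two versionSent helpers) is hypothetical, and modelling a computed field as reading back "" is an assumption based on the observation above that default_version ends up empty during the update.

```typescript
// Toy model only: none of these names exist in the bridge or the TF SDK.
interface DocState {
    content: string;
    defaultVersion: string;
}

interface Diff {
    // Names of fields marked "computed" (value unknown until after apply).
    computed: string[];
}

// Stand-in for the diff customizer: a content change marks defaultVersion
// as computed, loosely mirroring SetNewComputed("default_version").
function customizeDiff(prior: DocState, newContent: string, diff: Diff): void {
    if (prior.content !== newContent) {
        diff.computed.push("defaultVersion");
    }
}

// Stand-in for d.Get("default_version"). Assumption: a field marked
// computed reads as "", otherwise the prior state value is returned.
function readDefaultVersion(prior: DocState, diff: Diff): string {
    return diff.computed.indexOf("defaultVersion") !== -1 ? "" : prior.defaultVersion;
}

// Path A: the diff used for the update has had the customizer applied.
function versionSentWithCustomizer(prior: DocState, newContent: string): string {
    const diff: Diff = { computed: [] };
    customizeDiff(prior, newContent, diff);
    return readDefaultVersion(prior, diff);
}

// Path B: the diff is rebuilt at apply time without running customizers.
function versionSentWithoutCustomizer(prior: DocState): string {
    const diff: Diff = { computed: [] };
    return readDefaultVersion(prior, diff);
}
```

With a prior state of { content: "v1", defaultVersion: "1" } and new content "v2", path A yields "" and path B yields "1". In this model, only path B produces a DocumentVersion that the update call would accept.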

t0yv0 added a commit to pulumi/pulumi-terraform-bridge that referenced this issue Jan 29, 2024
Fixes #1505

Requires pulumi/terraform-plugin-sdk#35

sdk-v2 bridge has a new option that changes the implementation of
resource lifecycle to go through TF SDKv2 gRPC methods.

```go
// Selectively opt-in resources that pass the filter to using PlanResourceChange. Resources are
// identified by their TF type name such as aws_ssm_document.
func WithPlanResourceChange(filter func(tfResourceType string) bool) providerOption { //nolint:revive
```

Methods from the TF SDKv2 lifecycle integrated with this flag:

- PlanResourceChange
- ApplyResourceChange
- ReadResource 
- ImportResourceState

Enables fixing:  pulumi/pulumi-aws#2555

Known differences: state returned by the new method will include
explicit entries `"foo": null` for properties that are known by the
schema but not set in the state. This seems to be benign for repeated
diffs and refresh applications.
t0yv0 added a commit that referenced this issue Jan 30, 2024
Fixes #2555 - aws.ssm.Document no longer gets into incorrectly failed state on updates.  The fix is propagated via pulumi-terraform-bridge update.
@pulumi-bot added the resolution/fixed label Jan 30, 2024
@t0yv0 added this to the 0.100 milestone Jan 31, 2024