Automating Cloud Resource Cleanup with AWS CloudFormation StackSets and Lambda

Introduction

In dynamic cloud environments, resources are frequently created, modified, and deleted. Over time, this can lead to orphaned resources – resources that are no longer actively used but still incur costs and can pose security risks. Manually identifying and cleaning up these resources across multiple AWS accounts and regions is tedious and error-prone.

This article explores how to automate cloud resource cleanup using AWS CloudFormation StackSets and Lambda functions. We’ll leverage StackSets to deploy a cleanup Lambda function to multiple accounts and regions, enabling centralized management and consistent resource cleanup policies. This approach ensures that unused resources are identified and removed, optimizing costs and improving overall security posture.

Understanding the Components

Before diving into the implementation, let’s understand the key components involved:

  • AWS CloudFormation StackSets: StackSets allow you to create, update, or delete stacks across multiple AWS accounts and regions with a single operation. This is crucial for deploying our cleanup Lambda function consistently across our entire organization.

  • AWS Lambda: Lambda is a serverless compute service that allows you to run code without provisioning or managing servers. We’ll use Lambda to execute the resource cleanup logic.

  • IAM Roles and Policies: Appropriate IAM roles and policies are essential for granting the Lambda function the necessary permissions to identify and delete resources in each account and region.

Implementation Steps

Here’s a step-by-step guide to automating cloud resource cleanup:

1. Design the Cleanup Logic (Lambda Function):

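The core of our automation is the Lambda function. This function will identify and delete unused resources based on predefined criteria. Here’s an example Python script using the Boto3 library that terminates EC2 instances which are stopped and were launched more than 30 days ago. Note that LaunchTime reflects the most recent launch rather than the moment the instance was stopped, so it is only a rough proxy for inactivity: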

import datetime

import boto3
from botocore.exceptions import ClientError

# Stopped instances launched more than this many days ago are terminated
INACTIVITY_THRESHOLD_DAYS = 30

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    region = ec2.meta.region_name
    print(f"Cleaning up unused EC2 instances in region: {region}")

    # LaunchTime is timezone-aware (UTC), so compare it against an aware "now"
    now = datetime.datetime.now(datetime.timezone.utc)

    # Paginate so large result sets are fully covered, and let EC2
    # return only stopped instances
    paginator = ec2.get_paginator('describe_instances')
    pages = paginator.paginate(
        Filters=[{'Name': 'instance-state-name', 'Values': ['stopped']}]
    )

    for page in pages:
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                instance_id = instance['InstanceId']
                launch_time = instance['LaunchTime']

                # Calculate the age of the instance in whole days
                age = (now - launch_time).days

                if age > INACTIVITY_THRESHOLD_DAYS:
                    print(f"Terminating stopped instance: {instance_id}, launched on: {launch_time}")
                    try:
                        ec2.terminate_instances(InstanceIds=[instance_id])
                        print(f"Successfully terminated instance: {instance_id}")
                    except ClientError as e:
                        # Log and continue so one failure does not abort the whole run
                        print(f"Error terminating instance {instance_id}: {e}")
                else:
                    print(f"Skipping stopped instance {instance_id}: launched {age} days ago.")

    return {
        'statusCode': 200,
        'body': 'EC2 instance cleanup completed.'
    }

Important Considerations:

  • Resource Types: Extend this script to handle other resource types like EBS volumes, snapshots, S3 buckets, and more.
  • Cleanup Criteria: Define specific criteria for identifying unused resources. This might include inactivity periods, specific tags, or resource states.
  • Dry Run: Implement a “dry run” mode to simulate the cleanup process without actually deleting resources. This allows you to verify the cleanup logic before executing it in production.
  • Error Handling: Implement robust error handling to gracefully handle exceptions and prevent the Lambda function from failing.
  • Logging: Implement detailed logging to track the cleanup process and identify any issues.
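The dry-run and cleanup-criteria points above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the DoNotDelete tag key, the find_candidates and cleanup helper names, and the dry_run flag are all hypothetical names introduced here. The selection logic is kept separate from the API calls so it can be checked against a canned describe_instances-style response without touching real instances.

```python
import datetime

# Hypothetical tag key (an assumption): instances carrying it are never terminated
PROTECT_TAG_KEY = "DoNotDelete"


def find_candidates(reservations, now, threshold_days=30):
    """Return IDs of stopped, unprotected instances launched more than threshold_days ago."""
    candidates = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            tag_keys = {tag["Key"] for tag in instance.get("Tags", [])}
            if PROTECT_TAG_KEY in tag_keys:
                continue  # explicitly protected by tag
            if instance["State"]["Name"] != "stopped":
                continue  # only stopped instances are considered
            if (now - instance["LaunchTime"]).days > threshold_days:
                candidates.append(instance["InstanceId"])
    return candidates


def cleanup(ec2, dry_run=True, threshold_days=30):
    """Terminate candidates, or only report them when dry_run is True.

    ec2 is expected to be an EC2 client such as boto3.client("ec2").
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    reservations = []
    for page in ec2.get_paginator("describe_instances").paginate():
        reservations.extend(page["Reservations"])

    candidates = find_candidates(reservations, now, threshold_days)
    if candidates and not dry_run:
        ec2.terminate_instances(InstanceIds=candidates)
    return {"dry_run": dry_run, "candidates": candidates}
```

Defaulting dry_run to True means a misconfigured invocation reports candidates instead of deleting them; a real deployment might read the flag from the event payload or an environment variable.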

2. Create an IAM Role for the Lambda Function:

The Lambda function needs an IAM role with permissions to describe and delete the target resources. Here’s an example IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:TerminateInstances"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}
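The policy above allows terminating any instance in the account. One possible way to narrow it, sketched here as an assumption rather than as part of the original design, is to allow ec2:TerminateInstances only on instances that carry an opt-in tag. The AutoCleanup tag key and the build_cleanup_policy helper are hypothetical names; the sketch simply builds the policy document as a Python dictionary and prints it as JSON.

```python
import json


def build_cleanup_policy(tag_key="AutoCleanup", tag_value="true"):
    """Build a policy document that only allows terminating instances with a given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # DescribeInstances does not support resource-level permissions
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances"],
                "Resource": "*",
            },
            {
                # Termination is limited to instances tagged tag_key=tag_value
                "Effect": "Allow",
                "Action": ["ec2:TerminateInstances"],
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_cleanup_policy(), indent=4))
```

With a policy like this, untagged instances are protected even if the cleanup logic has a bug, at the cost of requiring teams to tag the instances they consider disposable.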

3. Package and Deploy the Lambda Function:

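Package the Python script and any dependencies into a ZIP file, with lambda_function.py at the root of the archive so that it matches the handler name used in the template. Upload the ZIP file to an S3 bucket, or deploy the Lambda function directly using the AWS CLI or CloudFormation. Keep in mind that Lambda can only load code from a bucket in the same region as the function, so a multi-region StackSet needs a copy of the ZIP file in a bucket in each target region, and each bucket must be readable from the target accounts.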
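As a rough sketch of the packaging step, the standard-library zipfile module can build the archive. The package_lambda and upload_package helper names are assumptions made for this example, and the upload helper expects an S3 client such as boto3.client("s3") to be passed in.

```python
import zipfile
from pathlib import Path


def package_lambda(source_file, zip_path):
    """Zip a single handler file so that it sits at the root of the archive."""
    source_file = Path(source_file)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname drops the directory part so the handler module is found at the root
        archive.write(source_file, arcname=source_file.name)
    return Path(zip_path)


def upload_package(s3, zip_path, bucket, key):
    """Upload the archive using an S3 client passed in by the caller."""
    s3.upload_file(str(zip_path), bucket, key)
```

A call such as package_lambda("lambda_function.py", "lambda_function.zip") followed by upload_package(s3, "lambda_function.zip", "your-s3-bucket", "lambda_function.zip") would produce the object that the S3Key parameter points at.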

4. Create a CloudFormation StackSet:

Create a CloudFormation template that defines the Lambda function and its associated IAM role. This template will be used by StackSets to deploy the function to multiple accounts and regions.

AWSTemplateFormatVersion: '2010-09-09'
Description: Deploys a Lambda function for cleaning up unused EC2 instances.

Parameters:
  LambdaFunctionName:
    Type: String
    Description: The name of the Lambda function.
    Default: EC2CleanupLambda

  S3BucketName:
    Type: String
    Description: The S3 bucket where the Lambda function code is stored.

  S3Key:
    Type: String
    Description: The S3 key of the Lambda function code.

Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
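      # IAM roles are global to the account, so include the region in the name
      # to avoid a name collision when the StackSet targets several regions
      RoleName: !Sub "${LambdaFunctionName}-ExecutionRole-${AWS::Region}"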
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: !Sub "${LambdaFunctionName}-Policy"
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - ec2:DescribeInstances
                  - ec2:TerminateInstances
                Resource: "*"
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: "arn:aws:logs:*:*:*"

  CleanupLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Ref LambdaFunctionName
      Handler: lambda_function.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        S3Bucket: !Ref S3BucketName
        S3Key: !Ref S3Key
      Runtime: python3.12
      Timeout: 300
      MemorySize: 128

5. Deploy the StackSet:

Use the AWS CLI or CloudFormation console to create and deploy the StackSet, specifying the target AWS accounts and regions where you want to deploy the Lambda function. The administrator account must be allowed to create roles and deploy resources in the target accounts. With self-managed permissions, this means an AWSCloudFormationStackSetAdministrationRole in the administrator account and an AWSCloudFormationStackSetExecutionRole in every target account that trusts the administrator account. With service-managed permissions, AWS Organizations sets up the required roles for you.

Example AWS CLI command:

aws cloudformation create-stack-set \
    --stack-set-name ResourceCleanupStackSet \
    --template-body file://cleanup-lambda.yaml \
    --parameters ParameterKey=LambdaFunctionName,ParameterValue=EC2CleanupLambda ParameterKey=S3BucketName,ParameterValue=your-s3-bucket ParameterKey=S3Key,ParameterValue=lambda_function.zip \
    --capabilities CAPABILITY_NAMED_IAM

aws cloudformation create-stack-instances \
    --stack-set-name ResourceCleanupStackSet \
    --accounts '["111122223333", "444455556666"]' \
    --regions '["us-east-1", "us-west-2"]'
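The same two calls can be made from Python with Boto3, which also makes it easy to wait for the stack instance operation to finish. The sketch below is an illustration under assumptions: the deploy_stack_set helper name and the polling interval are made up for this example, and the CloudFormation client (for instance boto3.client("cloudformation")) is passed in by the caller.

```python
import time


def deploy_stack_set(cfn, name, template_body, parameters, accounts, regions,
                     poll_seconds=15):
    """Create a StackSet, add stack instances, and wait for the operation to end."""
    cfn.create_stack_set(
        StackSetName=name,
        TemplateBody=template_body,
        Parameters=[
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in parameters.items()
        ],
        # The template sets RoleName, so the named-IAM capability is required
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )

    operation_id = cfn.create_stack_instances(
        StackSetName=name, Accounts=accounts, Regions=regions
    )["OperationId"]

    # Poll until the operation leaves its in-progress states
    while True:
        status = cfn.describe_stack_set_operation(
            StackSetName=name, OperationId=operation_id
        )["StackSetOperation"]["Status"]
        if status not in ("QUEUED", "RUNNING", "STOPPING"):
            return status
        time.sleep(poll_seconds)
```

The function returns the final operation status (for example SUCCEEDED or FAILED), so a deployment script can decide whether to proceed or to inspect the individual stack instances.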

6. Schedule the Lambda Function:

Use CloudWatch Events (now EventBridge) to schedule the Lambda function to run periodically (e.g., daily or weekly). This ensures that resource cleanup is performed automatically on a regular basis.
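A schedule needs three pieces: the rule itself, permission for EventBridge to invoke the function, and a target that links the two. The following Boto3 sketch shows one way to wire them together for a single account and region. The schedule_cleanup helper name, the statement ID, and the target ID are assumptions made for this example, and both clients (for instance boto3.client("events") and boto3.client("lambda")) are passed in by the caller.

```python
def schedule_cleanup(events, lambda_client, function_name, function_arn,
                     rule_name="ResourceCleanupSchedule",
                     schedule="cron(0 0 * * ? *)"):
    """Create a scheduled rule and attach the cleanup function as its target."""
    # 1. Create (or update) the scheduled rule; the default runs daily at midnight UTC
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]

    # 2. Allow EventBridge to invoke the function, scoped to this rule only
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Point the rule at the function
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "cleanup-lambda", "Arn": function_arn}],
    )
    return rule_arn
```

Calling add_permission a second time with the same statement ID raises a conflict error, so a rerunnable script would need to handle that case. When the function is deployed through a StackSet, it is probably more practical to declare the rule and the invoke permission in the same template so that every account and region gets the schedule automatically.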

Example CloudWatch Event Rule (runs daily at midnight UTC):

{
  "name": "ResourceCleanupSchedule",
  "scheduleExpression": "cron(0 0 * * ? *)"
}

https://en.dymripper.com/posts/2025-05-24-automating-cloud-resource-cleanup-with-aws-cloudformation-stacksets-and-lambda/
Author: DYMripper
Published at: 2025-05-24