Introduction
In dynamic cloud environments, resources are frequently created, modified, and deleted. Over time, this can lead to orphaned resources: resources that are no longer in use but still incur costs and can pose security risks. Manually identifying and cleaning up these resources across multiple AWS accounts and regions is tedious and error-prone.
This article explores how to automate cloud resource cleanup using AWS CloudFormation StackSets and Lambda functions. We’ll use StackSets to deploy a cleanup Lambda function to multiple accounts and regions, enabling centralized management and consistent cleanup policies. This approach helps ensure that unused resources are identified and removed, which reduces costs and improves overall security posture.
Understanding the Components
Before diving into the implementation, let’s understand the key components involved:
AWS CloudFormation StackSets: StackSets allow you to create, update, or delete stacks across multiple AWS accounts and regions with a single operation. This is crucial for deploying our cleanup Lambda function consistently across our entire organization.
AWS Lambda: Lambda is a serverless compute service that allows you to run code without provisioning or managing servers. We’ll use Lambda to execute the resource cleanup logic.
IAM Roles and Policies: Appropriate IAM roles and policies are essential for granting the Lambda function the necessary permissions to identify and delete resources in each account and region.
Implementation Steps
Here’s a step-by-step guide to automating cloud resource cleanup:
1. Design the Cleanup Logic (Lambda Function):
The core of our automation is the Lambda function, which identifies and deletes unused resources based on predefined criteria. The following Python example uses the Boto3 library and treats an EC2 instance as unused when it is stopped and was launched more than 30 days ago:
import datetime

import boto3


def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    region = boto3.session.Session().region_name
    print(f"Cleaning up unused EC2 instances in region: {region}")

    # Define the inactivity threshold (e.g., 30 days)
    inactivity_threshold = 30
    now = datetime.datetime.now(datetime.timezone.utc)

    # Page through all EC2 instances; results may be split across pages,
    # so a single describe_instances call can miss some of them
    paginator = ec2.get_paginator('describe_instances')
    for page in paginator.paginate():
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                instance_id = instance['InstanceId']
                launch_time = instance['LaunchTime']

                # Calculate the age of the instance. Note that this is the time
                # since launch, not the time since the instance was stopped.
                age = (now - launch_time).days

                # Check if the instance is stopped and older than the threshold
                if instance['State']['Name'] == 'stopped' and age > inactivity_threshold:
                    print(f"Terminating stopped instance: {instance_id}, launched on: {launch_time}")
                    try:
                        ec2.terminate_instances(InstanceIds=[instance_id])
                        print(f"Successfully terminated instance: {instance_id}")
                    except Exception as e:
                        print(f"Error terminating instance {instance_id}: {e}")
                else:
                    print(f"Instance {instance_id} is not stopped or not older than the threshold.")

    return {
        'statusCode': 200,
        'body': 'EC2 instance cleanup completed.'
    }
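The script measures age from the launch time, which says little about how long an instance has been idle. One common workaround, sketched below, is to read the timestamp that EC2 usually embeds in the `StateTransitionReason` field of a stopped instance (for example `User initiated (2023-01-15 10:20:30 GMT)`). That string format is an observed convention rather than a documented contract, so the hypothetical helpers `parse_stop_time` and `days_stopped` return `None` when it does not match.

```python
import datetime
import re

# Assumed format of the timestamp inside StateTransitionReason,
# e.g. "User initiated (2023-01-15 10:20:30 GMT)".
_STOP_TIME_RE = re.compile(r"\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) GMT\)")


def parse_stop_time(instance):
    """Return the stop time as an aware UTC datetime, or None if unavailable."""
    reason = instance.get("StateTransitionReason", "")
    match = _STOP_TIME_RE.search(reason)
    if match is None:
        return None
    parsed = datetime.datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=datetime.timezone.utc)


def days_stopped(instance, now=None):
    """Return whole days since the instance stopped, or None if unknown."""
    stopped_at = parse_stop_time(instance)
    if stopped_at is None:
        return None
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return (now - stopped_at).days
```

In the handler, `days_stopped(instance)` could replace the launch-time age, with a fallback to the launch time whenever it returns `None`.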
Important Considerations:
- Resource Types: Extend this script to handle other resource types like EBS volumes, snapshots, S3 buckets, and more.
- Cleanup Criteria: Define specific criteria for identifying unused resources. This might include inactivity periods, specific tags, or resource states.
- Dry Run: Implement a “dry run” mode to simulate the cleanup process without actually deleting resources. This allows you to verify the cleanup logic before executing it in production.
- Error Handling: Implement robust error handling to gracefully handle exceptions and prevent the Lambda function from failing.
- Logging: Implement detailed logging to track the cleanup process and identify any issues.
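The criteria, dry-run, and logging points above can be combined in a small sketch. It separates a pure selection step from the destructive step, so the selection can be checked without touching AWS. The names `select_candidates` and `cleanup`, the `DoNotDelete` protection tag, and the `dry_run` flag are assumptions made for illustration; the EC2 client is passed in rather than created inside the functions.

```python
import datetime
import logging

logger = logging.getLogger(__name__)


def select_candidates(instances, now, threshold_days=30, protect_tag="DoNotDelete"):
    """Return IDs of stopped instances older than the threshold and not protected."""
    candidates = []
    for instance in instances:
        tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
        if protect_tag in tags:
            continue  # explicitly protected by tag, regardless of its value
        if instance["State"]["Name"] != "stopped":
            continue
        age_days = (now - instance["LaunchTime"]).days
        if age_days > threshold_days:
            candidates.append(instance["InstanceId"])
    return candidates


def cleanup(ec2_client, instance_ids, dry_run=True):
    """Terminate the given instances, or only log them when dry_run is True."""
    terminated = []
    for instance_id in instance_ids:
        if dry_run:
            logger.info("[dry run] would terminate %s", instance_id)
            continue
        try:
            ec2_client.terminate_instances(InstanceIds=[instance_id])
            terminated.append(instance_id)
            logger.info("terminated %s", instance_id)
        except Exception:
            # Log with traceback and keep going with the remaining instances
            logger.exception("failed to terminate %s", instance_id)
    return terminated
```

Defaulting `dry_run` to `True` means a misconfigured deployment only logs what it would delete. EC2 API calls also accept their own `DryRun=True` parameter, which checks permissions without performing the action and could complement this flag.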
2. Create an IAM Role for the Lambda Function:
The Lambda function needs an IAM role with permissions to describe and delete the target resources. Here’s an example IAM policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
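The policy above allows the function to terminate any instance in the account. A tighter variant, sketched here as a Python builder, keeps `ec2:DescribeInstances` on `*` (that action does not support resource-level permissions) but limits `ec2:TerminateInstances` to instances that carry an opt-in tag. The function name `build_cleanup_policy` and the `AutoCleanup=true` tag are hypothetical choices for illustration.

```python
import json


def build_cleanup_policy(tag_key="AutoCleanup", tag_value="true"):
    """Build an IAM policy document that only allows terminating tagged instances."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Describe calls cannot be scoped to individual resources
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances"],
                "Resource": "*",
            },
            {
                # Termination is limited to instances with the opt-in tag
                "Effect": "Allow",
                "Action": ["ec2:TerminateInstances"],
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_cleanup_policy(), indent=2))
```

With such a policy, an attempt to terminate an untagged instance is denied by IAM even if the cleanup logic selects it by mistake.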
3. Package and Deploy the Lambda Function:
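Package the Python script and any dependencies into a ZIP file. Upload the ZIP file to an S3 bucket, or deploy the Lambda function directly using the AWS CLI or CloudFormation. Lambda can only load code from a bucket in the same region as the function, so a multi-region deployment needs the ZIP file available in a bucket in each target region.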
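As a minimal sketch of the packaging step, the standard-library `zipfile` module is enough when the function has no third-party dependencies (Boto3 is already available in the Lambda Python runtimes). The helper name `package_lambda` is an assumption, and the file name `lambda_function.py` is chosen so that it matches a handler setting of `lambda_function.lambda_handler`.

```python
import pathlib
import zipfile


def package_lambda(source_file, zip_path):
    """Write source_file into zip_path at the archive root and return the path."""
    source = pathlib.Path(source_file)
    target = pathlib.Path(zip_path)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname keeps the module at the archive root, where Lambda
        # looks for "lambda_function" when resolving the handler
        archive.write(source, arcname=source.name)
    return target


if __name__ == "__main__":
    print(package_lambda("lambda_function.py", "lambda_function.zip"))
```

The resulting archive can then be uploaded with `aws s3 cp` or with Boto3's `upload_file` method on an S3 client.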
4. Create a CloudFormation StackSet:
Create a CloudFormation template that defines the Lambda function and its associated IAM role. This template will be used by StackSets to deploy the function to multiple accounts and regions.
AWSTemplateFormatVersion: '2010-09-09'
Description: Deploys a Lambda function for cleaning up unused EC2 instances.
Parameters:
  LambdaFunctionName:
    Type: String
    Description: The name of the Lambda function.
    Default: EC2CleanupLambda
  S3BucketName:
    Type: String
    Description: The S3 bucket where the Lambda function code is stored (must be in the same region as the function).
  S3Key:
    Type: String
    Description: The S3 key of the Lambda function code.
Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      # IAM role names are account-wide, so the region is included to avoid
      # collisions when the stack is deployed to several regions of one account
      RoleName: !Sub "${LambdaFunctionName}-${AWS::Region}-ExecutionRole"
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: !Sub "${LambdaFunctionName}-Policy"
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - ec2:DescribeInstances
                  - ec2:TerminateInstances
                Resource: "*"
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: "arn:aws:logs:*:*:*"
  CleanupLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Ref LambdaFunctionName
      Handler: lambda_function.lambda_handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        S3Bucket: !Ref S3BucketName
        S3Key: !Ref S3Key
      Runtime: python3.12
      Timeout: 300
      MemorySize: 128
5. Deploy the StackSet:
Use the AWS CLI or the CloudFormation console to create the StackSet, then add stack instances for the target AWS accounts and regions. The account you deploy from must be allowed to create roles and resources in the target accounts. With self-managed permissions, this means an AWSCloudFormationStackSetAdministrationRole role in the administrator account and an AWSCloudFormationStackSetExecutionRole role in each target account that trusts the administrator account. If the accounts belong to AWS Organizations, service-managed permissions let StackSets create the required roles for you.
Example AWS CLI command:
# CAPABILITY_NAMED_IAM is required because the template sets an explicit RoleName
aws cloudformation create-stack-set \
  --stack-set-name ResourceCleanupStackSet \
  --template-body file://cleanup-lambda.yaml \
  --parameters \
    ParameterKey=LambdaFunctionName,ParameterValue=EC2CleanupLambda \
    ParameterKey=S3BucketName,ParameterValue=your-s3-bucket \
    ParameterKey=S3Key,ParameterValue=lambda_function.zip \
  --capabilities CAPABILITY_NAMED_IAM

aws cloudformation create-stack-instances \
  --stack-set-name ResourceCleanupStackSet \
  --accounts '["111122223333", "444455556666"]' \
  --regions '["us-east-1", "us-west-2"]'
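Creating stack instances is asynchronous: the call returns an operation ID while the deployment continues in the background. The sketch below polls that operation with Boto3's `describe_stack_set_operation` until it finishes. The function name `wait_for_stack_set_operation` and its polling defaults are assumptions; the CloudFormation client is passed in so that the logic can be exercised without AWS access.

```python
import time


def wait_for_stack_set_operation(cfn_client, stack_set_name, operation_id,
                                 poll_seconds=15, max_polls=120, sleep=time.sleep):
    """Poll a StackSet operation until it reaches a final status and return it."""
    for _ in range(max_polls):
        response = cfn_client.describe_stack_set_operation(
            StackSetName=stack_set_name,
            OperationId=operation_id,
        )
        status = response["StackSetOperation"]["Status"]
        if status not in ("QUEUED", "RUNNING", "STOPPING"):
            return status  # e.g. SUCCEEDED, FAILED, or STOPPED
        sleep(poll_seconds)
    raise TimeoutError(
        f"operation {operation_id} on {stack_set_name} did not finish in time"
    )


# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# cfn = boto3.client("cloudformation")
# op = cfn.create_stack_instances(
#     StackSetName="ResourceCleanupStackSet",
#     Accounts=["111122223333", "444455556666"],
#     Regions=["us-east-1", "us-west-2"],
# )
# print(wait_for_stack_set_operation(cfn, "ResourceCleanupStackSet", op["OperationId"]))
```

A `FAILED` result usually warrants a look at the individual stack instances, since one account or region can fail while the others succeed.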
6. Schedule the Lambda Function:
Use Amazon EventBridge (formerly CloudWatch Events) to run the Lambda function on a schedule (e.g., daily or weekly), so that resource cleanup happens automatically on a regular basis. The rule must exist in every account and region where the function is deployed.
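For a single account and region, the scheduling step can be sketched with three Boto3 calls: `put_rule` creates the schedule, `add_permission` lets EventBridge invoke the function, and `put_targets` attaches the function to the rule. The helper name `schedule_cleanup`, the statement ID, and the target ID are assumptions for illustration, and both clients are passed in as parameters.

```python
def schedule_cleanup(events_client, lambda_client, function_arn,
                     rule_name="ResourceCleanupSchedule",
                     schedule="cron(0 0 * * ? *)"):
    """Create a scheduled rule that invokes the cleanup function; return the rule ARN."""
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,  # default: daily at midnight UTC
        State="ENABLED",
    )["RuleArn"]

    # Allow EventBridge to invoke the function, but only from this rule
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "cleanup-lambda", "Arn": function_arn}],
    )
    return rule_arn
```

Running this twice fails on `add_permission` because the statement ID already exists, so it suits one-off setup. In a StackSet deployment it is probably simpler to declare the rule and the permission in the template itself, so that they are created alongside the function in every account and region.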
Example CloudWatch Event Rule:
{
"name": "ResourceCleanupSchedule",
"scheduleExpression": "cron(0 0 * * ? *)", // Runs daily at midnight UTC
