AWS Lambda Error Monitoring through Terraform using AWS Chatbot

This post will detail how to configure AWS Chatbot to provide Slack notifications based on the error metrics of AWS Lambda functions.

What we will build at a high level is as follows:

  1. A simple Lambda Function
  2. A CloudWatch alarm to monitor errors of our Lambda Function
  3. An SNS Topic to receive alerts from the CloudWatch alarm and forward these alerts on
  4. Configure AWS Chatbot to subscribe to the SNS Topic and send alerts to Slack
  5. IAM permissions to tie everything together

Prerequisites:

An initialised Terraform directory
An AWS account with CLI credentials set up
A Slack Workspace

To start, we'll need to provide the initial configuration for AWS Chatbot in the AWS console. This involves provisioning the Chatbot service and providing some IAM permissions, as well as your Slack Workspace as a destination. Chatbot has the capability to allow you to run AWS CLI commands from your Slack Workspace - which could open up potential security concerns, so we'll restrict access to this functionality for this example. The full guide to initial Chatbot setup can be found here: https://docs.aws.amazon.com/chatbot/latest/adminguide/getting-started.html#setting-up

Now, let's jump into our Terraform code. Firstly, let's create a sample Lambda Function that we'll use. Create an src directory in your project and add an index.py file with the following code:

src/index.py
def lambda_handler(event, context):
    # Lambda calls the handler with the event payload and a context object.
    print('hello from lambda')
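To check the alerting pipeline end to end later on, it helps to have a handler that can fail on demand. The variant below is only a sketch: the `fail` key in the event is a hypothetical convention for this example, not something Lambda defines. It relies on the fact that an unhandled exception in the handler is reported by Lambda as a function error.

```python
def lambda_handler(event, context):
    """Return a small result, or raise if the (hypothetical) 'fail' flag is set."""
    # 'fail' is an assumed, test-only key in the invocation payload.
    if isinstance(event, dict) and event.get("fail"):
        # An unhandled exception is what Lambda records as a function error.
        raise RuntimeError("simulated failure for alarm testing")
    print("hello from lambda")
    return {"status": "ok"}
```

Invoking the function with a payload such as `{"fail": true}` should then produce an error without changing its normal behaviour.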

We need an IAM Policy that will allow the Lambda to publish metrics and logs to CloudWatch:

main.tf
resource "aws_iam_policy" "lambda_policy" {
  name = "example-lambda-policy"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "cloudwatch:PutMetricData",
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Effect = "Allow"
        Resource = [
          "*"
        ]
      }
    ]
  })
}

We'll also need the IAM role itself for the Lambda Function to use, which references the IAM Policy:

main.tf
resource "aws_iam_role" "lambda_role" {
  name                = "example-lambda-role"
  managed_policy_arns = [aws_iam_policy.lambda_policy.arn]
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Sid    = ""
        Principal = {
          Service = "lambda.amazonaws.com"
        }
      },
    ]
  })
}

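Now we can create the Lambda Function itself, which will use the IAM Role. Lambda expects a deployment package rather than a raw .py file, so we first zip src/index.py using the archive_file data source. This comes from the hashicorp/archive provider, so you will need to run terraform init again to install it. The handler value takes the form <file name>.<function name>:

main.tf
data "archive_file" "lambda_zip" {
  type        = "zip"
  source_file = "${path.module}/src/index.py"
  output_path = "${path.module}/lambda.zip"
}

resource "aws_lambda_function" "test_lambda" {
  filename         = data.archive_file.lambda_zip.output_path
  source_code_hash = data.archive_file.lambda_zip.output_base64sha256
  function_name    = "hello-world"
  role             = aws_iam_role.lambda_role.arn
  handler          = "index.lambda_handler"
  runtime          = "python3.9"
}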

Next, we can create an SNS Topic, along with a Topic policy that allows CloudWatch to publish alarm messages to it.

main.tf
resource "aws_sns_topic" "lambda_errors_topic" {
  name = "lambda_errors_topic"
}

resource "aws_sns_topic_policy" "lambda_errors_sns_topic_policy" {
  arn = aws_sns_topic.lambda_errors_topic.arn
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "sns:Publish",
        ]
        Effect   = "Allow"
        Resource = aws_sns_topic.lambda_errors_topic.arn
        Principal = {
          Service = "cloudwatch.amazonaws.com"
        }
      },
    ]
  })
}

Next, we can create a CloudWatch alarm that will trigger when the Lambda Function throws any errors:

main.tf
resource "aws_cloudwatch_metric_alarm" "lambda_alarm" {
  alarm_name          = "lambda-errors-alarm"
  alarm_description   = "Errors alarm for the lambda function"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = 3600
  statistic           = "Sum" # total number of errors in the period
  threshold           = 1
  actions_enabled     = true
  alarm_actions       = [aws_sns_topic.lambda_errors_topic.arn]
  ok_actions          = [aws_sns_topic.lambda_errors_topic.arn]
  dimensions = {
    FunctionName = aws_lambda_function.test_lambda.function_name
  }
}

This code configures an alarm that will trigger in CloudWatch if one or more errors occur in our Lambda Function. We use the Sum statistic so that the threshold of 1 is compared against the total number of errors in the period. The period is set to 3600 (one hour in seconds) and evaluation_periods to 1, so the Errors metric is aggregated over one-hour windows and a single breaching window is enough to trigger the alarm. In the alarm_actions attribute, we've set the ARN of our SNS Topic, meaning the alarm will publish a message to the Topic when it fires; ok_actions does the same when the alarm returns to the OK state.
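To see why the choice of statistic matters for a threshold of 1, here is a rough model of a single alarm evaluation. It is a simplification rather than CloudWatch's actual algorithm: it assumes each invocation contributes one sample of 1 (error) or 0 (success) to the Errors metric, and the `breaches` helper is a name made up for this sketch.

```python
def breaches(samples, statistic, threshold=1.0):
    """Apply GreaterThanOrEqualToThreshold to one period's worth of samples."""
    if not samples:
        # Simplification: treat a period with no data as not breaching.
        return False
    if statistic == "Sum":
        value = sum(samples)
    elif statistic == "Average":
        value = sum(samples) / len(samples)
    else:
        raise ValueError(f"unsupported statistic: {statistic}")
    return value >= threshold


# One failed invocation out of four within the hour (1 = error, 0 = success).
hour = [0, 1, 0, 0]
print(breaches(hour, "Sum"))      # True
print(breaches(hour, "Average"))  # False
```

Under this model, Sum breaches the threshold as soon as one error occurs, whereas Average only reaches 1 when every invocation in the period fails.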

Next, we need an IAM role that will allow the AWS Chatbot service to send messages to our Slack Workspace:

main.tf
resource "aws_iam_role" "chatbot_slack_configuration_role" {
  name               = "chatbot_slack_configuration_role"
  assume_role_policy = <<-EOF
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "chatbot.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
  EOF
}

Finally, we can use a third party module which allows us to provide the configuration for our Chatbot service:

main.tf
module "chatbot_slack_configuration" {
  source  = "waveaccounting/chatbot-slack-configuration/aws"
  version = "1.1.0"

  configuration_name = "lambda_monitoring"
  iam_role_arn       = aws_iam_role.chatbot_slack_configuration_role.arn
  logging_level      = "NONE"
  slack_channel_id   = "<YOUR_SLACK_CHANNEL_ID>"   # replace with your channel ID
  slack_workspace_id = "<YOUR_SLACK_WORKSPACE_ID>" # replace with your workspace ID

  guardrail_policies = ["arn:aws:iam::aws:policy/AWSDenyAll"]
  user_role_required = false

  sns_topic_arns = [
    aws_sns_topic.lambda_errors_topic.arn,
  ]
}

Now if we run terraform plan, we should see a list of the resources that Terraform will create. If all looks good, run terraform apply. Once the resources exist, any Lambda errors should trigger an alert in Slack.
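Rather than waiting for a real error, one way to check the Slack notification path is to set the alarm state by hand. The sketch below uses the CloudWatch SetAlarmState API through boto3, which is an assumption of this example (boto3 is not otherwise used in this post). The `force_alarm` helper is a hypothetical name, and the alarm name is the one from the configuration above. CloudWatch moves the alarm back to its real state at the next evaluation.

```python
def force_alarm(cloudwatch, alarm_name="lambda-errors-alarm"):
    """Temporarily set the alarm to ALARM so that its SNS action fires."""
    cloudwatch.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="ALARM",
        StateReason="Manual test of the Slack notification path",
    )


def make_client():
    """Build a CloudWatch client; needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party, imported lazily so force_alarm has no hard dependency

    return boto3.client("cloudwatch")


# Usage, with valid credentials for the account that holds the alarm:
# force_alarm(make_client())
```

If the alarm was not already in the ALARM state, a message should appear in the configured Slack channel shortly afterwards.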