Finding S3 Batch Operations Failures in CloudTrail

At work, I recently got the distinct opportunity to copy millions of objects from one S3 bucket to another.

There are roughly a dozen separate ways to do this (as with everything in AWS), but the "right" way is to use an S3 Batch Operation to copy everything listed in an S3 Inventory Report.

The only problem with an S3 Batch Operation is that it fails in surprising and hidden ways, especially if there's a misconfigured IAM permission. For example, our most recent job was failing due to a missing KMS permission. To determine what the missing permission was, we would typically head to CloudTrail and hunt down the failed requests.

Because S3 Batch Operations run as an assumed role, hunting down these logs can be slightly more difficult, but we finally found a reliable way to do it.

The first, and most important, piece is to hunt down the S3 Batch Operation's Job ID. You'll find it right at the top of the job's details screen.

S3 Batch Operation Job Details Screen

Next, shoot on over to CloudTrail and filter by User Name. The value you'll want to use is s3-batch-operations_{Job ID}, where {Job ID} is your S3 Batch Operation's Job ID retrieved in the previous step.

CloudTrail Filtering on Job ID User
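If you'd rather not click through the console, the same filter can be sketched with boto3. This is a rough sketch, not a polished tool: the function names and the hours_back parameter are made up for illustration, and it assumes the failed calls carry an errorCode field in their CloudTrail record and that your credentials are allowed to call cloudtrail:LookupEvents.

```python
import json
from datetime import datetime, timedelta, timezone


def batch_job_username(job_id):
    """Build the CloudTrail User Name the batch job's assumed role shows up as."""
    return f"s3-batch-operations_{job_id}"


def failed_events(events):
    """Yield (event name, error code, error message) for events that failed."""
    for event in events:
        # Each lookup result carries the full record as a JSON string.
        record = json.loads(event["CloudTrailEvent"])
        if "errorCode" in record:
            yield (
                record.get("eventName"),
                record["errorCode"],
                record.get("errorMessage", ""),
            )


def lookup_job_events(job_id, hours_back=24):
    """Page through CloudTrail event history for one job (needs AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3 installed

    end = datetime.now(timezone.utc)
    paginator = boto3.client("cloudtrail").get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[
            {"AttributeKey": "Username", "AttributeValue": batch_job_username(job_id)}
        ],
        StartTime=end - timedelta(hours=hours_back),
        EndTime=end,
    )
    for page in pages:
        yield from page["Events"]
```

Looping over failed_events(lookup_job_events(job_id)) with your own Job ID and printing each tuple should surface the denied calls without any spreadsheet work.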

Heads Up! If you've got KMS enabled for the Job, then you're going to get a whole heck of a lot of logs. I tend to download the logs and look through them in Excel, but when you're moving more than 19 million records, you're going to have a bad time. If you need to dig into the failure reasons any further, I recommend using an Athena table.
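For something in between Excel and Athena, a few lines of Python can boil a downloaded log file down to a count of failures per call. Treat this as a sketch under assumptions: it expects a JSON file shaped like {"Records": [...]} (the layout CloudTrail uses for the log files it delivers to S3), and the function names are hypothetical.

```python
import json
from collections import Counter


def load_records(path):
    """Read a JSON log file shaped like {"Records": [...]} (assumed layout)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["Records"]


def summarize_errors(records):
    """Count failed records by (eventSource, eventName, errorCode), most common first."""
    counts = Counter()
    for record in records:
        # Successful calls have no errorCode, so they are skipped.
        if "errorCode" in record:
            key = (
                record.get("eventSource"),
                record.get("eventName"),
                record["errorCode"],
            )
            counts[key] += 1
    return counts.most_common()
```

Calling summarize_errors(load_records(path)) on a downloaded file gives you a short list of (source, call, error) combinations with their counts, which is usually enough to spot the one permission that keeps getting denied.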


In case anyone is curious, the missing permission for the role was kms:GenerateDataKey*. KMS is super fun.