What is the reason for the duplicate records?
The Lambda function did not advance the Kinesis data stream pointer to the next record after the error.
The Lambda event source used asynchronous invocation.
The Lambda function did not handle the error, and the Lambda service attempted to reprocess the data.
The Lambda function did not keep up with the amount of data that was coming from the Kinesis data stream.
Explanations:
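When a Lambda function reads from Kinesis through an event source mapping, it receives the records from each shard in order, in batches. The function code does not advance the stream position itself; the Lambda service tracks the position for each shard and moves it forward only after a batch is processed successfully. A position that did not advance is therefore a consequence of the unhandled error, not something the function failed to do, so this option does not identify the root cause.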
Asynchronous invocation applies to event sources that push events to Lambda, such as Amazon S3 or Amazon SNS. For Kinesis, Lambda polls the stream and invokes the function synchronously, so this is not the reason for the duplicate records.
When the function fails on a record (for example, while parsing a malformed record) and does not handle the error, the invocation fails for the whole batch. The Lambda service then retries that batch, by default until it succeeds or the records expire. Records that were already processed before the failure are processed again on every retry, which produces duplicate records downstream.
If the function cannot keep up with the stream, processing falls behind and records are delayed; records that are still unprocessed when the stream's retention period ends are lost. Neither outcome causes records to be reprocessed or duplicated.
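
The whole-batch retry described for the unhandled-error option can be illustrated with a small local simulation. This is only a sketch, not the Lambda service itself: `handler`, `deliver_with_retries`, the `downstream` list, and the `max_attempts` limit are hypothetical names chosen for illustration, and the retry loop is a simplified stand-in for the service redelivering a failed batch.

```python
import json


def handler(batch, downstream):
    """Hypothetical handler: writes each record's id downstream, in order.

    A malformed record raises json.JSONDecodeError, which is not caught here,
    so the whole invocation fails even though earlier records were written.
    """
    for raw in batch:
        record = json.loads(raw)  # raises on malformed input
        downstream.append(record["id"])


def deliver_with_retries(batch, downstream, max_attempts=3):
    """Simplified stand-in for retrying the entire batch after a failure."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        try:
            handler(batch, downstream)
            break  # batch succeeded, stop retrying
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue  # redeliver the same batch from its first record
    return attempts


# The third record is malformed, so every attempt fails at the same point.
batch = ['{"id": 1}', '{"id": 2}', 'not-json', '{"id": 4}']
downstream = []
attempts = deliver_with_retries(batch, downstream)
print(attempts, downstream)  # 3 [1, 2, 1, 2, 1, 2]
```

Records 1 and 2 are written once per attempt, while record 4 is never reached, which matches the symptom in the question: duplicates downstream caused by an unhandled error followed by a batch retry.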