Where to go from here
Designing a scalable, fault-tolerant, and high-performance system is no small feat. Here are a few topics and questions to explore next:
- Lambda reserved and provisioned concurrency.
- Lambda failure destinations, dead-letter queues, the SQS visibility timeout, and reporting batch item failures.
- Does the ordering of messages matter?
- Do all events go to the same SQS queue, or to different queues because some of them are high volume?
- Lambda has a maximum timeout of 15 minutes. Does your task take longer than that? If so, do you need a long-running solution, such as a Docker container, to process your tasks?
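
As a starting point for the "reporting batch item failures" item above, here is a minimal sketch of a Python Lambda handler that consumes an SQS batch and reports only the messages that failed. It assumes the event source mapping has `ReportBatchItemFailures` enabled. The `process_message` function and the `order_id` field are hypothetical placeholders for your own business logic, not part of any AWS API.

```python
import json


def process_message(body):
    """Hypothetical business logic: raise on any message we cannot handle."""
    if "order_id" not in body:
        raise ValueError("missing order_id")


def handler(event, context):
    """Process an SQS batch and report per-message failures to Lambda."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # Only this message is retried; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    # An empty list tells Lambda that the whole batch succeeded.
    return {"batchItemFailures": failures}
```

Messages listed in `batchItemFailures` become visible again once the visibility timeout expires. If the queue has a dead-letter queue configured, they move there after exceeding the queue's maximum receive count.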