Event-Driven Architecture (EDA) is an architectural pattern that uses events to communicate between applications asynchronously. Events are messages created when a change of state happens, for example when a customer orders a product or when an application completes a task.
Traditionally, applications talked to each other in synchronous or asynchronous request/response mode. In the world of distributed systems and microservices, event-driven architecture provides loose coupling and allows multiple disparate services to be integrated quickly and seamlessly.
Why the shift?
Asynchronous request/response communication also provides loose coupling; however, the sender has no idea about the status of its request and ends up polling for it. Polling means more code (when to poll, how many times, and at what interval) and more network calls, and the load multiplies when many senders poll the same server. To avoid this, the server can notify the senders about the status, preferably via a message queue in between, which reduces the number of network calls.
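To make the cost of polling concrete, here is a minimal Python sketch that simulates both approaches. `JobServer`, `poll_until_done`, and `wait_for_notification` are hypothetical names, and an in-process `queue.Queue` stands in for the message queue; the sketch counts status calls instead of making real network requests.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class JobServer:
    """Hypothetical server whose job reports DONE on the Nth status check."""
    ticks_to_finish: int
    status_calls: int = 0  # counts simulated network calls made by polling senders
    notifications: queue.Queue = field(default_factory=queue.Queue)

    def get_status(self) -> str:
        # Pull model: every call stands in for one network round trip.
        self.status_calls += 1
        return "DONE" if self.status_calls >= self.ticks_to_finish else "PENDING"

    def finish_and_notify(self, job_id: str) -> None:
        # Push model: the server publishes the final status once, onto a queue.
        self.notifications.put({"job_id": job_id, "status": "DONE"})


def poll_until_done(server: JobServer, max_attempts: int = 10) -> int:
    """Poll until DONE and return the number of calls made; give up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        if server.get_status() == "DONE":
            return attempt
        # A real sender would also have to choose a sleep interval / backoff here.
    raise TimeoutError("job not finished after max_attempts polls")


def wait_for_notification(server: JobServer, timeout: float = 1.0) -> dict:
    """Block on the queue instead of calling the server repeatedly."""
    return server.notifications.get(timeout=timeout)


if __name__ == "__main__":
    poller = JobServer(ticks_to_finish=5)
    print("polling calls:", poll_until_done(poller))
    pushed = JobServer(ticks_to_finish=5)
    pushed.finish_and_notify("job-1")
    print(wait_for_notification(pushed), "status calls:", pushed.status_calls)
```

With `ticks_to_finish=5`, the polling sender makes five status calls, while the notified sender makes none and simply receives one message from the queue.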
Loose coupling means that the event producer does not know who the consumers are or how many of them there are. Every event carries information about its producer, so a consumer can pick that up from the event itself. Each event should have a unique event ID to support idempotency, and the schema can be agreed between the systems beforehand (share only what is required). However, neither ordering nor de-duplication is guaranteed, so consumers have to handle out-of-order and duplicate events themselves, and when the same message is received more than once, they need to store or cache what they have already processed.
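A minimal sketch of a consumer that tolerates both duplicates and out-of-order delivery. It relies on the unique `event_id` described above, and additionally assumes a per-entity `version` number assigned by the producer; the `version` field and the in-memory set standing in for a durable store or cache are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str   # unique per event; used to detect duplicates
    entity_id: str  # the thing whose state changed, e.g. an order ID
    version: int    # assumed: producer increments this per entity
    state: str


@dataclass
class IdempotentConsumer:
    # In production these would live in a durable store or cache, not in memory.
    processed_ids: set = field(default_factory=set)
    latest: dict = field(default_factory=dict)  # entity_id -> (version, state)

    def handle(self, event: Event) -> str:
        if event.event_id in self.processed_ids:
            return "duplicate"  # redelivered event: do nothing a second time
        self.processed_ids.add(event.event_id)
        current = self.latest.get(event.entity_id)
        if current is not None and current[0] >= event.version:
            return "stale"  # arrived out of order; a newer state is already applied
        self.latest[event.entity_id] = (event.version, event.state)
        return "applied"


if __name__ == "__main__":
    consumer = IdempotentConsumer()
    placed = Event("evt-1", "order-42", 1, "PLACED")
    shipped = Event("evt-2", "order-42", 2, "SHIPPED")
    for e in (shipped, placed, shipped):
        print(e.event_id, consumer.handle(e))
```

Delivering version 2, then the late version 1, then version 2 again yields `applied`, `stale`, and `duplicate`, and the consumer ends up holding `(2, "SHIPPED")` for `order-42`.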
Components of an EDA:
Event Producer (or Publisher): generates events and sends them to a message queue or another broker.
Event Consumer: the target services that pick up those events.
Event Bus: the broker in between, which can be any implementation such as AWS SQS/EventBridge or RabbitMQ.
Event: the message generated by an application, containing metadata and a payload describing the state change.
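The four components can be sketched in a few lines of Python. `InMemoryEventBus` is a hypothetical stand-in for a real broker (no persistence, retries, or network), and the `Event` fields loosely mirror the metadata (`id`, `source`, `detail_type`, `time`) and payload (`detail`) split described above; the exact field names are assumptions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class Event:
    source: str       # metadata: which producer emitted the event
    detail_type: str  # metadata: what kind of state change happened
    detail: dict      # payload: the data describing the change
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    time: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


Handler = Callable[[Event], None]


class InMemoryEventBus:
    """Hypothetical stand-in for a broker; no persistence, retries, or network."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = {}

    def subscribe(self, detail_type: str, handler: Handler) -> None:
        # Consumers register interest in a type of event, not in a producer.
        self._subscribers.setdefault(detail_type, []).append(handler)

    def publish(self, event: Event) -> None:
        # The producer does not learn who (if anyone) received the event.
        for handler in self._subscribers.get(event.detail_type, []):
            handler(event)


def place_order(bus: InMemoryEventBus, order_id: str) -> None:
    """Producer: emits an OrderPlaced event and moves on."""
    bus.publish(Event(source="shop.orders", detail_type="OrderPlaced",
                      detail={"orderId": order_id}))


if __name__ == "__main__":
    bus = InMemoryEventBus()
    billed, shipped = [], []
    bus.subscribe("OrderPlaced", lambda e: billed.append(e.detail["orderId"]))
    bus.subscribe("OrderPlaced", lambda e: shipped.append(e.detail["orderId"]))
    place_order(bus, "o-1")
    print(billed, shipped)  # ['o-1'] ['o-1']
```

`place_order` only knows about the bus; billing and shipping consumers can be added or removed without touching the producer.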
How AWS EventBridge is better
EventBridge is a serverless event bus that supports multiple producers and consumers, with rules that filter events and transform them before delivery. EventBridge has become an orchestration layer between applications: it can read the event metadata, determine the targets, and respect the TTL (maximum event age). Events can be categorized as command/data, query/status, request/task, or response. Add traceability attributes to the events for observability.
Because EventBridge can transform events itself, often no Lambda function is required, which means less code, fewer failures, fewer IAM rules, and lower monthly costs. It also has native integrations with AWS services, and an API destinations feature for calling external applications using basic auth, API keys, or OAuth, with credentials stored in Secrets Manager (at a cost absorbed by EventBridge). There are also rate limiting, retry, and timeout features. Partners such as Datadog, MongoDB, New Relic, and Salesforce support native EventBridge integrations.
EventBridge also offers archive and replay (archive events by type and replay them later), along with a circuit breaker as a service. However, there is no ordering guarantee and no speed control.
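To illustrate filtering and transformation without a Lambda function, the sketch below emulates a small subset of EventBridge behavior locally: exact-value matching in event patterns (nested objects, lists of allowed values) and an input-transformer-style template with `<name>` placeholders. It is not the EventBridge API. Real patterns also support prefix, numeric, `anything-but` and other matchers, array-valued event fields, and richer input paths than the dotted paths handled here.

```python
from typing import Any


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge-style matching: nested dicts recurse,
    lists hold the allowed exact values for a field."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def resolve(path: str, event: dict) -> Any:
    """Resolve a simple '$.a.b' path (no array indexing or wildcards)."""
    value: Any = event
    for part in path.removeprefix("$.").split("."):
        value = value[part]
    return value


def transform(input_paths: dict, template: str, event: dict) -> str:
    """Input-transformer-style step: extract named values, fill <name> placeholders."""
    result = template
    for name, path in input_paths.items():
        result = result.replace(f"<{name}>", str(resolve(path, event)))
    return result


if __name__ == "__main__":
    pattern = {
        "source": ["shop.orders"],
        "detail-type": ["OrderPlaced"],
        "detail": {"status": ["PAID"]},
    }
    event = {
        "source": "shop.orders",
        "detail-type": "OrderPlaced",
        "detail": {"orderId": "o-1", "status": "PAID", "total": 42},
    }
    if matches(pattern, event):
        print(transform({"id": "$.detail.orderId"}, "Order <id> is ready to ship", event))
```

Here the rule matches only paid orders from `shop.orders` and reshapes the event into `Order o-1 is ready to ship` for the target, which is the kind of glue that would otherwise live in a small Lambda function.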
Things start getting messy when multiple applications generate events and failures happen during event routing: dead-letter queues then need to be implemented and managed. Processing the dead-letter queues needs some Lambda functions, and eventually the system becomes tightly coupled again.
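The following sketch shows why dead-letter handling adds code: a failed delivery is retried a fixed number of times, an event older than a maximum age (TTL) is not retried at all, and anything that still fails is parked in a dead-letter list that something has to redrive later. `RetryingDispatcher` and its parameters are hypothetical, and timestamps are plain numbers so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable

Target = Callable[[dict], None]


@dataclass
class RetryingDispatcher:
    """Hypothetical router: retries a target, then parks failures in a dead-letter list."""
    max_attempts: int = 3
    max_age_seconds: float = 60.0
    dead_letters: list = field(default_factory=list)

    def deliver(self, event: dict, target: Target, now: float) -> bool:
        # `now` and event["published_at"] are plain numbers to keep the sketch deterministic.
        error = "no attempt made"
        for _ in range(self.max_attempts):
            if now - event["published_at"] > self.max_age_seconds:
                error = "expired"  # older than the maximum event age: stop retrying
                break
            try:
                target(event)
                return True
            except Exception as exc:  # a real router would back off between attempts
                error = repr(exc)
        self.dead_letters.append({"event": event, "error": error})
        return False

    def redrive(self, target: Target, now: float) -> int:
        """Replay parked events: the extra processing code that DLQs bring with them."""
        pending, self.dead_letters = self.dead_letters, []
        return sum(self.deliver(item["event"], target, now) for item in pending)


if __name__ == "__main__":
    dispatcher = RetryingDispatcher()

    def unavailable(event: dict) -> None:
        raise ConnectionError("target unavailable")

    print(dispatcher.deliver({"id": "e1", "published_at": 0}, unavailable, now=10))
    print(dispatcher.dead_letters)
    received: list = []
    print(dispatcher.redrive(received.append, now=20), received)
```

Note that `redrive` needs to know the right target for each parked event; in a real system that knowledge tends to end up hard-coded in the redrive function, which is how the coupling creeps back in.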
Best practice: keep one EventBridge event bus for local services to talk to each other, and a separate gatekeeper event bus for cross-account communication. This follows domain-driven design by respecting the domain boundary. The options are an enterprise-wide event bus, a domain-level bus, or a bounded-context bus, weighing complexity, ownership, governance, and rules management. Use SQS when you need FIFO ordering, batch processing, and throughput control.
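A gatekeeper bus can be sketched as a filter at the domain boundary. In this hypothetical Python model, `DomainBus` stands in for a local bus and `GatekeeperBus` forwards only event types on an explicit public allow-list, dropping fields whose names start with `internal` so that only what is required leaves the domain. The allow-list and the field-naming convention are assumptions; with EventBridge itself, cross-account forwarding is configured as a rule whose target is an event bus in the other account.

```python
from typing import Callable

Handler = Callable[[dict], None]


class DomainBus:
    """Hypothetical in-memory stand-in for one domain's local event bus."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._rules: list[tuple[str, Handler]] = []

    def add_rule(self, detail_type: str, handler: Handler) -> None:
        self._rules.append((detail_type, handler))

    def put(self, event: dict) -> None:
        # Deliver to every rule whose detail-type matches exactly.
        for detail_type, handler in self._rules:
            if event["detail-type"] == detail_type:
                handler(event)


class GatekeeperBus:
    """Only allow-listed event types cross the boundary to the remote domain."""

    def __init__(self, public_types: set, remote: DomainBus) -> None:
        self.public_types = public_types
        self.remote = remote
        self.blocked: list[dict] = []

    def forward(self, event: dict) -> None:
        if event["detail-type"] not in self.public_types:
            self.blocked.append(event)  # internal event: stays inside the domain
            return
        # Share only what is required: drop fields marked as internal.
        public = {k: v for k, v in event.items() if not k.startswith("internal")}
        self.remote.put(public)


if __name__ == "__main__":
    orders, shipping = DomainBus("orders"), DomainBus("shipping")
    gate = GatekeeperBus({"OrderPlaced"}, remote=shipping)
    received: list = []
    shipping.add_rule("OrderPlaced", received.append)
    orders.add_rule("OrderPlaced", gate.forward)
    orders.add_rule("DiscountApplied", gate.forward)
    orders.put({"detail-type": "OrderPlaced", "detail": {"orderId": "o-1"}, "internal-trace": "t-9"})
    orders.put({"detail-type": "DiscountApplied", "detail": {"pct": 10}})
    print(received, len(gate.blocked))
```

The design choice here is that the owning domain, not the consumers, decides what becomes a public event, which keeps governance and rules management at the boundary.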
Factors affecting reliability:
Broken network connection / planned downtime / volume of traffic / response times
When not to use EDA
When you have a sequence of tasks in which one task makes an API call and the next steps depend on the response of that call. This needs a synchronous API call.
New feature: AWS Step Functions can now make HTTP calls directly, so a workflow can call an API and use the response in its next step.
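For the sequential case, a Step Functions workflow can make the blocking call and branch on its response. The sketch below builds an Amazon States Language definition as a Python dict using the HTTP Task resource (`arn:aws:states:::http:invoke`). The endpoint, connection ARN, the `approved` field in the response, and the state names are hypothetical, and the HTTP Task parameter names should be checked against the current Step Functions documentation before use.

```python
import json

# Hypothetical values: replace with a real EventBridge connection ARN and endpoint.
CONNECTION_ARN = "arn:aws:events:us-east-1:123456789012:connection/example-conn/abc"
CREDIT_API = "https://api.example.com/credit-check"

definition = {
    "Comment": "Step 2 depends on the synchronous response of step 1",
    "StartAt": "CheckCredit",
    "States": {
        "CheckCredit": {
            "Type": "Task",
            "Resource": "arn:aws:states:::http:invoke",  # HTTP Task: call the API directly
            "Parameters": {
                "ApiEndpoint": CREDIT_API,
                "Method": "POST",
                "Authentication": {"ConnectionArn": CONNECTION_ARN},
                "RequestBody.$": "$.customer",
            },
            # Assumes the API returns JSON such as {"approved": true}.
            "ResultSelector": {"approved.$": "$.ResponseBody.approved"},
            "ResultPath": "$.credit",
            "Next": "IsApproved",
        },
        "IsApproved": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.credit.approved", "BooleanEquals": True, "Next": "PlaceOrder"}
            ],
            "Default": "Rejected",
        },
        "PlaceOrder": {"Type": "Succeed"},  # placeholder for the dependent next step
        "Rejected": {"Type": "Fail", "Error": "CreditDenied"},
    },
}

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

The Choice state cannot run until `CheckCredit` returns, which is exactly the synchronous dependency that an event-driven, fire-and-forget design does not provide.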