Notification System


Notifications have become a staple in many applications in recent years, serving as a means of delivering critical information such as breaking news, updates, events, and promotions to users. This has made them an integral part of our daily routine. You are tasked with designing a notification system that encompasses more than just mobile push notifications. Three common notification formats include mobile push notifications, SMS messages, and emails.

Constructing a notification system capable of sending millions of notifications daily is a complex challenge. It requires an in-depth understanding of the notification landscape. The interview question is intentionally left open-ended and vague, requiring the candidate to ask clarifying questions.

We could first ask which notification types are supported; let's assume push notifications, SMS messages, and emails. Let's also assume that we prioritize quick delivery but tolerate a slight delay under high load. The system should support iOS devices, Android devices, and laptops/desktops. Notifications can be triggered by client applications or scheduled on the server side, and a user can opt out of receiving any notifications. For the daily volume, let's assume 5 million push notifications, 500k SMS messages, and 3 million emails. As for the non-functional requirements, we want a scalable, highly available, and fault-tolerant distributed system.
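As a rough back-of-envelope check, the daily volumes assumed above can be turned into a per-second rate. The sketch below does that arithmetic; the peak multiplier of 10 is a hypothetical value chosen for illustration and is not part of the stated requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily volumes assumed in the requirements above, in the order they are listed.
daily_volumes = [5_000_000, 500_000, 3_000_000]

# Hypothetical burst multiplier (for example, a breaking-news fan-out).
PEAK_FACTOR = 10

total_per_day = sum(daily_volumes)
average_per_second = total_per_day / SECONDS_PER_DAY
peak_per_second = average_per_second * PEAK_FACTOR

print(f"total per day: {total_per_day:,}")
print(f"average per second: {average_per_second:.1f}")
print(f"assumed peak per second: {peak_per_second:.1f}")
```

The average works out to roughly 98 notifications per second, so the sizing is driven less by the steady-state rate than by bursts and by the latency of the third-party services.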

To begin, we will look at how each notification type works at a high level. To send an iOS push notification, three main components are necessary:

  • Provider: This component builds notification requests and sends them to the Apple Push Notification Service (APNS). To build a push notification, the provider supplies a device token (a unique identifier for the target device) and a payload (JSON containing the notification's content).
  • APNS: A remote service provided by Apple to distribute push notifications to iOS devices.
  • iOS Device: The end client that receives push notifications.
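The provider's job of pairing a device token with a JSON payload can be sketched as follows. This is a minimal illustration, not a working APNS client: the `PushRequest` class, the `build_ios_push` helper, and the sample token are hypothetical names, and only the `aps` dictionary with its `alert`, `badge`, and `sound` keys follows Apple's documented payload format. Actually sending the request (authentication, HTTP/2 transport) is left out.

```python
import json
from dataclasses import dataclass


@dataclass
class PushRequest:
    """Hypothetical container for what the provider hands to APNS."""
    device_token: str  # unique identifier of the target device
    payload: dict      # JSON-serializable notification body


def build_ios_push(device_token: str, title: str, body: str, badge: int = 1) -> PushRequest:
    """Build a provider-side request; the "aps" dictionary carries the alert."""
    payload = {
        "aps": {
            "alert": {"title": title, "body": body},
            "badge": badge,
            "sound": "default",
        }
    }
    return PushRequest(device_token=device_token, payload=payload)


# The token below is a placeholder, not a real device token.
request = build_ios_push("a1b2c3d4", "Game Request", "Bob wants to play chess")
print(json.dumps(request.payload, indent=2))
```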

For Android devices, the notification flow is similar. Instead of APNS, Firebase Cloud Messaging (FCM) is commonly used to send push notifications. SMS messages are often sent through third-party SMS services such as Twilio or Nexmo, most of which are commercial. For email, companies can set up their own mail servers, but many choose commercial email services such as SendGrid, which offer better delivery rates and data analytics.

Sending notifications requires user contact information such as mobile device tokens, phone numbers, or email addresses. API servers collect this information and store it in a database when a user installs the app or signs up for the first time. Services, each of which can be a microservice, a cron job, or a distributed system, trigger notification-sending events. For example, a billing service sends email reminders for due payments, and a shopping website sends SMS messages about package delivery. The notification system, which is the core of the process, provides APIs for these services and builds notification payloads for third-party services. Third-party services are responsible for delivering notifications to users, so the system must make it easy to plug in or remove a provider. It is also important to consider that a third-party service might not be available in every market. Users receive notifications on their iOS or Android devices, by SMS, or by email.
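One way to picture the stored contact information is a user record that holds an email address, a phone number, and any number of device tokens. The sketch below is an in-memory illustration under that assumption; the class names, field names, and the `contact_points` helper are hypothetical, and a real system would keep this data in the database described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    device_token: str
    platform: str  # "ios" or "android"


@dataclass
class User:
    user_id: int
    email: Optional[str] = None
    phone_number: Optional[str] = None
    devices: List[Device] = field(default_factory=list)
    opted_out: bool = False  # the user chose not to receive notifications


def contact_points(user: User, channel: str) -> List[str]:
    """Return the addresses a notification on `channel` should be sent to."""
    if user.opted_out:
        return []
    if channel == "email":
        return [user.email] if user.email else []
    if channel == "sms":
        return [user.phone_number] if user.phone_number else []
    if channel == "push":
        # A user may have several devices, so push can fan out to many tokens.
        return [device.device_token for device in user.devices]
    raise ValueError(f"unknown channel: {channel}")


alice = User(
    user_id=1,
    email="alice@example.com",
    devices=[Device("token-ios", "ios"), Device("token-android", "android")],
)
print(contact_points(alice, "push"))
print(contact_points(alice, "sms"))
```

Keeping devices in a separate list reflects that one user can own several devices, while the opt-out flag is checked before any address is returned.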

With only one notification server, we face problems such as a single point of failure, difficulty scaling, and performance bottlenecks. To improve the design, the database and cache are moved out of the notification server and used to store metadata (information about notifications, users, and settings); more notification servers are added with automatic horizontal scaling; and message queues are introduced to decouple system components. (See Facebook Memcached and Amazon Aurora to learn more about how distributed caches and databases are designed.)

In the improved design, services use APIs provided by the notification servers to send notifications. The notification servers carry out basic validations, fetch metadata from the cache or database, and place notification data in message queues for processing. They also handle TLS termination, request deduplication, request decryption, and response encryption, hiding this logic from the calling services. Cached information includes user info, device info, and notification templates. The database stores data about users, notifications, and settings.

Message queues serve as buffers and allow notifications to be processed in parallel, with each notification type assigned its own queue. Note that we could also have used a SQL or NoSQL database or a distributed cache as the temporary storage for notifications awaiting delivery. Since there is no causal relationship between notifications, we don't need complex querying or transactional guarantees, so a message queue will do just fine. Workers pull notification events from the message queues and send them to third-party services, which then deliver them to user devices. Workers can be autoscaled: we add workers during traffic peaks and remove some when request volume drops.
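The queue-per-type arrangement can be sketched with Python's standard `queue` and `threading` modules standing in for a real message broker. Everything here is illustrative: the channel names, the `enqueue` and `worker` functions, and `send_to_third_party` (a stub that only records what it was asked to send) are assumptions, not part of any particular queueing product.

```python
import queue
import threading

CHANNELS = ("push", "sms", "email")

# One queue per notification type, so a slow channel does not block the others.
queues = {channel: queue.Queue() for channel in CHANNELS}

delivered = []
delivered_lock = threading.Lock()


def send_to_third_party(channel, notification):
    """Stub for the call to APNS, FCM, an SMS service, or an email service."""
    with delivered_lock:
        delivered.append((channel, notification["to"]))


def enqueue(notification):
    """What a notification server does after validation: route by type."""
    channel = notification.get("channel")
    if channel not in queues:
        raise ValueError(f"unsupported channel: {channel}")
    queues[channel].put(notification)


def worker(channel):
    """Pull events from one queue and hand them to the third-party service."""
    q = queues[channel]
    while True:
        notification = q.get()
        try:
            if notification is None:  # sentinel: stop this worker
                return
            send_to_third_party(channel, notification)
        finally:
            q.task_done()


threads = [threading.Thread(target=worker, args=(c,), daemon=True) for c in CHANNELS]
for t in threads:
    t.start()

enqueue({"channel": "email", "to": "alice@example.com", "body": "Payment due"})
enqueue({"channel": "sms", "to": "+15550100", "body": "Package delivered"})
enqueue({"channel": "push", "to": "token-ios", "body": "New message"})

for q in queues.values():
    q.join()            # wait until every queued notification is processed
for c in CHANNELS:
    queues[c].put(None)  # ask each worker to stop
for t in threads:
    t.join()

print(sorted(delivered))
```

Scaling a channel up or down then amounts to starting or stopping worker threads (or, in a real deployment, worker processes) for that channel's queue.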

A crucial requirement for a notification system is that it must not lose data. Notifications may be delayed or reordered, but they must never be lost entirely. This is accomplished by storing notification information in a database and adding a retry mechanism. The database serves as a persistent log that ensures the data survives failures (similar to the write-ahead log, or WAL, that databases use to protect transactions against data loss). Exactly-once delivery cannot be guaranteed in a distributed system: network partitions or long waits for acknowledgements can produce duplicate messages. To reduce duplicates, deduplication and idempotency are implemented.

To prevent users from being inundated with notifications, a limit may be imposed on the number of notifications a user can receive. This matters because excessive notifications can lead users to disable notifications completely (check out Rate Limiter).

If a third-party service fails to send a notification, the notification is put back on the message queue to be retried. If the issue persists, an alert is sent to the developers.

Push notification APIs for iOS or Android apps use an appKey and appSecret for security. Only authorized clients are allowed to send push notifications via our APIs.

To understand customer behavior, it is important to track metrics such as click rate. The analytics service implements event tracking, and integration with the notification system is usually necessary. Check out Tiny URL for more information on how an analytics flow can be built into our system.
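The reliability points above (a persistent log, deduplication, and retries) can be sketched together. This is a simplified illustration under several assumptions: a dictionary stands in for the notification log database, retries happen in-process with exponential backoff rather than by re-enqueueing on a message queue, and the names `deliver`, `DeliveryError`, `MAX_ATTEMPTS`, and the event ID format are all hypothetical.

```python
import time

MAX_ATTEMPTS = 3

# Stand-in for the notification log database: event_id -> status.
notification_log = {}


class DeliveryError(Exception):
    """Raised by a sender when the third-party service fails."""


def deliver(event_id, send, base_delay=0.01, sleep=time.sleep):
    """Send one notification at most once per event_id, retrying on failure."""
    # Deduplication: an event that was already sent is not sent again.
    if notification_log.get(event_id) == "sent":
        return "duplicate"

    # Record the event before sending so it survives a crash mid-delivery.
    notification_log[event_id] = "pending"

    for attempt in range(MAX_ATTEMPTS):
        try:
            send()
            notification_log[event_id] = "sent"
            return "sent"
        except DeliveryError:
            if attempt < MAX_ATTEMPTS - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff

    # Retries exhausted: this is where developers would be alerted.
    notification_log[event_id] = "failed"
    return "failed"


calls = {"count": 0}


def flaky_send():
    """Hypothetical sender that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise DeliveryError("third-party service unavailable")


first = deliver("evt-1", flaky_send)
second = deliver("evt-1", flaky_send)  # same event arrives again
print(first, second, calls["count"])
```

Note that this only reduces duplicates: if the process crashes after `send()` succeeds but before the log is updated, the event is still marked pending and would be sent again on recovery, which is why exactly-once delivery remains out of reach.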