Chat System
Functional requirements for a chat application include the following features:
- One-on-one chat with low latency to ensure a fast and responsive communication experience.
- Small group chat that allows up to 100 people to participate and exchange text, images, and videos.
- "Last seen" and online presence status to show users when their contacts were last active on the platform.
- Multi-device support so that users can log into the same account from multiple devices simultaneously.
- Push notifications to alert users of new messages and keep them up-to-date with their chats.
Non-functional requirements for a chat application include the following performance objectives:
- Low latency: The chat application should have minimal delay in delivering messages to ensure a real-time communication experience.
- High availability: The platform should be available and accessible at all times to ensure seamless communication.
- No lag: Performance should not degrade noticeably during peak traffic, so users see no lag even when load is high.
- Scalability: The chat application should be capable of scaling to support 2 billion users, 1.5 billion monthly active users, and handle 60 billion messages per day. This means the system should be designed to handle large amounts of data and users while maintaining high performance and reliability.
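To make the scalability target concrete, the daily message volume can be converted into an average per-second rate. The short calculation below is only a back-of-the-envelope sketch; the peak factor of 3 is an assumed illustrative value, not a figure from the requirements.

```python
# Back-of-the-envelope throughput derived from the stated target
# of 60 billion messages per day.
MESSAGES_PER_DAY = 60_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

average_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY

# ASSUMPTION: traffic peaks at roughly 3x the average. This multiplier is
# hypothetical and only illustrates how headroom might be estimated.
PEAK_FACTOR = 3
peak_per_second = average_per_second * PEAK_FACTOR

print(f"average: {average_per_second:,.0f} messages/s")    # average: 694,444 messages/s
print(f"assumed peak: {peak_per_second:,.0f} messages/s")  # assumed peak: 2,083,333 messages/s
```

An average of roughly 700 thousand messages per second is what motivates the horizontally scalable components described in the rest of this section.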

Over the years, several techniques have been used to simulate a server-initiated connection, including polling, long polling, and WebSocket. HTTP with keep-alive can be an effective solution for maintaining a persistent connection with the chat service. This technique reduces the number of TCP handshakes and has been used by popular chat applications, such as Facebook.
- Polling: the client repeatedly asks the server whether there are any new messages. This can be resource-intensive and inefficient, because most of the time the server simply answers "no".
- Long polling: the client holds the connection open until new messages are available or a timeout is reached. This technique has several limitations. With round-robin load balancing, the server that receives a message from the sender may not be the one holding the receiver's long-polling connection. It is difficult for the server to detect when a client has disconnected. It is also inefficient, because the client keeps reconnecting after every timeout even when the user is not actively chatting.
- WebSocket: a bi-directional, persistent connection that starts as an HTTP connection and is upgraded to a WebSocket connection via a well-defined handshake. The persistent connection allows the server to send updates to the client, and it is generally not impacted by firewalls because it uses the same ports as HTTP/HTTPS connections (80 and 443).
Although HTTP is suitable for sending messages on the sender side, there is no technical reason not to use WebSocket for sending as well, given its bi-directional capabilities.
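As an illustration of the WebSocket upgrade handshake mentioned above, the sketch below computes the `Sec-WebSocket-Accept` value a server returns in its `101 Switching Protocols` response. The GUID and the sample key come from RFC 6455; the function name `compute_accept_key` is a hypothetical name chosen for this example, and a real service would normally rely on a WebSocket library rather than doing this by hand.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept_key(sec_websocket_key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key header.

    The server concatenates the client's key with the GUID, hashes the result
    with SHA-1, and base64-encodes the digest.
    """
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the example in RFC 6455.
sample_key = "dGhlIHNhbXBsZSBub25jZQ=="
accept = compute_accept_key(sample_key)
print(accept)  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client sends `Upgrade: websocket` together with its key over an ordinary HTTP request. Once the server answers with the matching accept value, both sides keep the same TCP connection open and exchange WebSocket frames in either direction.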
The chat system includes several components to provide its functionalities:
- Load Balancers - distribute load evenly across the WebSocket Handlers and other services.
- WebSocket Handlers - Each handler keeps a bi-directional connection open with the users mapped to it and sends an event whenever there is a new message for one of those users, which makes the message appear on the user's device. It also caches the mapping for its own connections and, for a short time, the mapping for other users (to know where to send a message).
- WebSocket Manager - This component manages the mapping of users to WebSockets. It sits in front of a distributed cache (Redis), which stores the mapping between users and WebSocket Handlers.
- Message Service - This service is responsible for storing and retrieving messages and for sending group message events (or any other events we're interested in) to the Kafka cluster. It connects to the NoSQL database where the messages are stored. To choose between a relational database and a NoSQL database, we need to consider the types of data and their read/write patterns. In one-on-one chat apps, the ratio of reads to writes is approximately 1:1. Relational databases can struggle with large amounts of data, because large indexes make random access slow. For this reason, key-value stores are often used in reliable chat applications such as Facebook Messenger, which uses HBase, and Discord, which uses Cassandra.
- Media service handles messages containing videos or images and connects to the CDN and S3 storage. Frequently accessed images are stored in the CDN for fast retrieval. It makes sense to compute multiple hashes of each image (for example, from identifying feature vectors or matrices) and compare them against the images already stored, so that we don't waste storage on duplicate media.
- User service is responsible for handling all user-related data, such as user profiles, settings, and permissions. It sits on top of a relational database such as MySQL.
- Group service holds information about groups, such as the group admins, the list of members, and similar data. Both the User and Group services connect to a distributed cache to speed up the retrieval of frequently accessed information.
- Group Message Handler listens for events from the Kafka cluster. When it receives an event for sending a group message, it contacts the Group Service to get the list of users belonging to the group, then calls the WebSocket Manager to get the list of WebSocket Handlers through which the message is sent to all users in that group.
- Analytics service collects events for each user activity and can do any precomputation or batching before sending an event to the Kafka cluster.
- Activity Log is the interface for archiving all activity logs in a NoSQL database (we could use a wide-column store, which is suitable for analytics).
- Spark and Hadoop clusters run computation-intensive operations using stream and batch processing to gain insight into the messages sent, events, trends, and so on.
- Presence service listens to heartbeats from users and keeps their online status updated in a NoSQL database.
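The interaction between the Group Message Handler, the Group Service, and the WebSocket Manager can be sketched in a few lines. This is a minimal in-memory sketch under stated assumptions, not the actual implementation: plain dictionaries stand in for Redis and the group database, the Kafka event is a plain dictionary, and all class, method, and field names are hypothetical. Handing offline users to push notifications is also an assumption made for the example.

```python
from collections import defaultdict


class WebSocketManager:
    """Maps each user to the WebSocket Handler holding their connection.

    ASSUMPTION: a dict stands in for the distributed cache (Redis).
    """

    def __init__(self):
        self._user_to_handler = {}

    def register(self, user_id, handler_id):
        self._user_to_handler[user_id] = handler_id

    def unregister(self, user_id):
        self._user_to_handler.pop(user_id, None)

    def handlers_for(self, user_ids):
        """Group connected users by handler; return offline users separately."""
        by_handler = defaultdict(list)
        offline = []
        for user_id in user_ids:
            handler_id = self._user_to_handler.get(user_id)
            if handler_id is None:
                offline.append(user_id)
            else:
                by_handler[handler_id].append(user_id)
        return dict(by_handler), offline


class GroupService:
    """Returns group membership. ASSUMPTION: a dict stands in for the database."""

    def __init__(self, members_by_group):
        self._members_by_group = members_by_group

    def members(self, group_id):
        return self._members_by_group.get(group_id, [])


def fan_out_group_message(event, group_service, ws_manager):
    """Resolve which handlers must deliver a group message event.

    `event` mimics a record consumed from Kafka (hypothetical field names).
    """
    recipients = [
        user_id
        for user_id in group_service.members(event["group_id"])
        if user_id != event["sender_id"]  # the sender already has the message
    ]
    return ws_manager.handlers_for(recipients)


groups = GroupService({"g1": ["alice", "bob", "carol", "dave"]})
manager = WebSocketManager()
manager.register("alice", "handler-1")
manager.register("bob", "handler-1")
manager.register("carol", "handler-2")  # dave is not connected

event = {"group_id": "g1", "sender_id": "alice", "text": "hello"}
by_handler, offline = fan_out_group_message(event, groups, manager)
print(by_handler)  # {'handler-1': ['bob'], 'handler-2': ['carol']}
print(offline)     # ['dave']
```

Grouping recipients by handler means each handler receives one request per group message instead of one per user. In this sketch, the offline list is what would be handed to the push notification path.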
To ensure unique message identification and sequencing, a local sequence number generator is used. This means that IDs are unique only within a specific one-on-one or group channel. This approach works well because maintaining the order of messages within each channel is sufficient. Since we also have to support multiple devices, each device keeps its own device ID and the ID of the latest message it has received. When a new batch of messages arrives from the WebSocket Handler, only messages with a higher ID than the latest one recorded on the device are considered new.
Monitoring and horizontal scaling under increased load are handled in the same way as in the Social Network design.
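The per-channel message IDs and the multi-device check described above can be illustrated as follows. This is a simplified in-memory sketch under stated assumptions: the names `ChannelSequencer`, `Device`, and `receive_batch` are hypothetical, and a production generator would need to persist its counters and coordinate across servers.

```python
from collections import defaultdict
from itertools import count


class ChannelSequencer:
    """Local sequence generator: IDs are unique only within one channel."""

    def __init__(self):
        # One independent counter per channel, each starting at 1.
        self._counters = defaultdict(lambda: count(1))

    def next_id(self, channel_id):
        return next(self._counters[channel_id])


class Device:
    """Tracks, per channel, the highest message ID this device has seen."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.latest_seen = defaultdict(int)  # channel_id -> latest message ID

    def receive_batch(self, channel_id, batch):
        """Return only the messages newer than the latest one recorded."""
        new_messages = sorted(
            (m for m in batch if m["id"] > self.latest_seen[channel_id]),
            key=lambda m: m["id"],
        )
        if new_messages:
            self.latest_seen[channel_id] = new_messages[-1]["id"]
        return new_messages


sequencer = ChannelSequencer()
group_messages = [
    {"id": sequencer.next_id("group-42"), "text": text}
    for text in ("hi", "how are you?", "see you soon")
]
direct_message = {"id": sequencer.next_id("dm-7"), "text": "ping"}

phone = Device("phone")
laptop = Device("laptop")

first = phone.receive_batch("group-42", group_messages[:2])
second = phone.receive_batch("group-42", group_messages[1:])  # overlaps with first
catch_up = laptop.receive_batch("group-42", group_messages)

print([m["id"] for m in first])     # [1, 2]
print([m["id"] for m in second])    # [3]
print([m["id"] for m in catch_up])  # [1, 2, 3]
print(direct_message["id"])         # 1
```

The direct message receives ID 1 even though the group channel has already issued IDs 1 to 3, which shows that IDs are scoped to a channel. The overlapping second batch shows how a device discards messages it has already recorded.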