Caching
Caching is a common technique for improving system performance by temporarily copying frequently accessed data to fast storage located close to the application. It is most effective when an application instance repeatedly reads the same data, especially when the original data store is slow relative to the cache, subject to high contention, or far enough away that network latency slows access. Cloud applications commonly use two types of cache: in-memory caches and shared caches.
With an in-memory cache, data is held locally on the computer running an instance of the application; a shared cache can be accessed by several application instances running on different computers. Shared caching ensures that different application instances see the same view of the cached data, but it can be slower to access, and running a separate cache service adds complexity to the solution.
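As a minimal sketch of the difference, the example below contrasts a process-local dict with a shared Redis cache. It assumes the third-party redis-py client and a Redis server reachable at localhost:6379; both the client and the connection details are assumptions for illustration.

```python
import redis  # third-party client (pip install redis); assumed available

# In-memory cache: a plain dict held in this process's memory.
# Fast to access, but private to this application instance.
local_cache: dict[str, str] = {}
local_cache["user:42"] = "Alice"

# Shared cache: an external service such as Redis that every
# application instance can reach over the network.
shared_cache = redis.Redis(host="localhost", port=6379)  # assumed address
shared_cache.set("user:42", "Alice")
print(shared_cache.get("user:42"))  # b'Alice' -- visible to all instances
```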
Determining whether to prepopulate the cache, load it on demand, or combine both approaches requires performance testing and usage analysis. Caching tends to be less useful for dynamic data: either the cached information becomes stale very quickly, or the overhead of keeping the cache synchronized with the original data store cancels out the benefit of caching.
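For example, prepopulation might look like the following sketch, run once at application startup. The `data_store.load_hot_items()` call is a hypothetical bulk read standing in for whatever the usage analysis identifies as frequently accessed data.

```python
def warm_cache(cache: dict, data_store) -> None:
    """Prepopulate (warm) the cache at startup with data expected to be hot.

    `data_store.load_hot_items()` is a hypothetical bulk read; items not
    loaded here would still be cached on demand at first access.
    """
    for item in data_store.load_hot_items():
        cache[item.key] = item.value
```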
Some commercial caching solutions implement read-through and write-through caching: data is loaded into the cache on demand, and modifications made through the cache are written back to the original data store. When an application fetches data, the cache service determines whether the data is currently held in the cache; if not, it retrieves the data from the original data store, adds it to the cache, and then returns it to the application. For systems that do not provide read-through and write-through caching, the Cache-Aside pattern can emulate these strategies, as sketched below. In some scenarios it can even be advantageous to cache highly volatile data without immediately persisting changes to the original data store.
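A minimal sketch of Cache-Aside reads and writes follows, assuming hypothetical `cache` and `data_store` interfaces in place of a real cache client and database:

```python
def get_item(key, cache, data_store, ttl_seconds=300):
    """Cache-Aside read: consult the cache first, fall back to the store."""
    value = cache.get(key)
    if value is None:                           # cache miss
        value = data_store.read(key)            # fetch from the system of record
        cache.set(key, value, ttl=ttl_seconds)  # populate for later readers
    return value

def update_item(key, value, cache, data_store):
    """Cache-Aside write: update the store, then invalidate the stale entry."""
    data_store.write(key, value)
    cache.delete(key)                           # next read repopulates the cache
```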
Cached data can expire, at which point it is removed from the cache and the application must retrieve it again from the original data store. Cache services may also evict data on a least-recently-used (LRU) basis or according to other policies.
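The sketch below combines both behaviors, time-based expiration and LRU eviction, using only the Python standard library; the capacity and TTL values are arbitrary choices.

```python
import time
from collections import OrderedDict

class LRUCacheWithTTL:
    """Evicts the least-recently-used entry when full; entries also expire."""

    def __init__(self, capacity: int = 128, ttl_seconds: float = 60.0):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._data: OrderedDict[str, tuple[float, object]] = OrderedDict()

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:   # expired: treat as a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)         # mark as most recently used
        return value

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (time.monotonic() + self.ttl, value)
        if len(self._data) > self.capacity:  # evict least recently used
            self._data.popitem(last=False)
```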
Depending on the data's characteristics and the probability of collisions, two different concurrency approaches can be used (both are sketched in the example following these descriptions):
Optimistic concurrency: before applying its changes, the application checks whether the data in the cache has been modified since it was retrieved. If the data is unchanged, the changes are applied; if not, the application must decide whether to update it anyway. This approach suits situations where updates are infrequent or collisions are unlikely.
Pessimistic concurrency: the application locks the data in the cache when retrieving it, preventing other instances from changing it. This guarantees that collisions cannot occur, but it can block other instances that need the same data, limiting the scalability of the solution, so it should be used only for brief operations. It suits situations where collisions are more likely, for example when an application updates several items in the cache and must apply those changes consistently.
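A minimal sketch of both approaches against a shared cache, using the redis-py client as an illustration (the client, key names, and timeout are assumptions): the optimistic path relies on Redis's WATCH/MULTI check-and-set, while the pessimistic path holds an explicit lock for the duration of the update.

```python
import redis

r = redis.Redis()  # assumed connection to a shared Redis cache

def optimistic_update(key: str, transform) -> bool:
    """Optimistic: apply the change only if the key was not modified meanwhile."""
    with r.pipeline() as pipe:
        try:
            pipe.watch(key)              # abort the transaction if key changes
            current = pipe.get(key)      # read the current cached value
            pipe.multi()                 # start the transactional part
            pipe.set(key, transform(current))
            pipe.execute()               # raises WatchError on a collision
            return True
        except redis.WatchError:
            return False                 # another instance won; caller decides

def pessimistic_update(key: str, transform) -> None:
    """Pessimistic: lock the key so no other instance can change it."""
    with r.lock(f"lock:{key}", timeout=5):   # keep the critical section brief
        current = r.get(key)
        r.set(key, transform(current))
```

In redis-py, `pipe.watch()` puts the pipeline into immediate-execution mode so the read happens right away, while `pipe.multi()` buffers the write until `execute()`, which fails if the watched key changed in the meantime.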