Discover the intricacies of managing server-side persistent sessions in software development, from understanding their definition and use cases to navigating potential pitfalls. In this blog post, we delve into a hypothetical scenario, explore typical characteristics, pitfalls to avoid, and optimization strategies, providing a comprehensive overview of the challenges and solutions associated with server-side persistent sessions.
By Senne Dirkx
On Sun Oct 08 2023
In my relatively short career, I've encountered several tasks that demanded fixing or designing features falling under the umbrella of application-level server-side persistent sessions. While navigating through these challenges, I made a few mistakes. However, each misstep became a valuable lesson. Today, I'm excited to share these insights with you.
What are server-side persistent sessions?
In the context of software systems, a session refers to a period of interaction between a user and a software system. It could involve tracking user actions, storing temporary information, or ensuring the user remains logged in. Server-side persistent sessions are data objects that capture information or actions about users on the server and are persisted in a database. They are predominantly stateful, meaning that they store information about the user or the user's actions. It is important to note they are entirely different from client-side browser sessions or sticky sessions.
Opting for server-side persistent sessions can be motivated for numerous reasons. This includes hiding sensitive session information from the client-side, data collection for future analysis and service improvement, sharing user sessions between the user's many devices, authorization and access control policies and much more. They are widely used in authentication/authorization systems, the finance/banking sector, e-commerce applications, and highly regulated industries, among others.
Let's delve into a hypothetical scenario to illustrate the concept. Suppose you're designing a banking application where a user must perform a lengthy multi-step transaction that the user is only allowed to perform 20 times per month. Here's how a server-side persistent session could enhance the user experience:
The user could start the transaction on a mobile device but quit in the middle of it and continue it later that day on his personal computer.
The software system can block the completion of the session if it detects 20 completed sessions within the past month, preventing misuse.
The data can be processed at a later stage to determine how exactly users use your application.
Server-side persistent sessions exhibit distinctive characteristics—they are stored over extended periods, often spanning years, and sometimes indefinitely. However, their active phase is relatively short-lived, typically lasting only a few days, and in some cases, just minutes. This imparts a unique usage pattern to persistent sessions: they experience frequent updates in their initial stages, evolving into read-only entities as they reach their conclusion.
From my own experience, without targeted measures, this database query pattern is difficult for databases to optimize. The table will most likely become one of your system's largest. The more data it has, the more irrelevant data it has to scan through while doing queries. What makes things even worse is that sessions are usually updated on every user action. And if your system has countless users doing many actions, things can go bad fast for the overall system's performance.
Let's say our system persists server-side sessions containing the following properties/columns in a session database table:
ID(integer, auto-incrementing primary key)
SessionID(uuid, unique, stored on the user's client to relate user actions to a server-side session)
CreatedAt(timestamp, when the session was created)
UpdatedAt(timestamp, when the session was last updated)
NbPosts(integer, number of blog posts the user has visited during the session)
UserID(integer, the unique ID of the user associated with the session)
The user's client would store the
SessionID and send this along with every request the client makes. The system would try to fetch the session related with this
SessionID. If a session does not exist yet, the system would check for a previous user session (using the
UserID) and use that if it hasn't expired yet.
On each and every post the user opens in the system, the session's
NbPosts must be updated. A new session is created if the user's previous session was inactive for more than 1 hour. Let's say your system has 1000 daily active users with each user having 3 sessions per day and 10 posts visited in a session on average. This counts up to 30000 session updates per day. After 1 month, the system would already store 90000 sessions and after a year already over a million sessions would be persisted.
This setup emphasizes the intricate dynamics of server-side persistent sessions, shedding light on the nuanced challenges they pose in database management and performance optimization.
A good software engineer or database administrator would know that a database like this without carefully chosen indexes would not behave well. Therefore let's create an index on the
SessionID column and perhaps a composite index on the
UserId column together with the
UpdatedAt column. Perfectly reasonable, right? The index on the
SessionID will greatly improve lookups when the user provides a
SessionID and the composite index will improve searches for the last session for a user. However, we'll see that this is not a great idea.
An important pitfall of an index is that simply the drawbacks are not considered. I've often seen developers just add indexes on every column that (they think) might need an index... maybe. This is obviously not without a cost. Indexes can claim large chunks of storage space. To add tot that, the speed of inserts and updates is firmly reduced by adding indexes. With each row insertion, the database not only places it in the table but also in a strategically correct spot within the indexes.
Index on a highly volatile column
Adding an index on a column like
UpdatedAt is suboptimal. It's values are ever-increasing, which can actually help with insert performance. But the value itself is updated so many times for each follow-up user action that it's initial placement within the index is quickly outdated. This leads to page splits, a fundamental cause of index fragmentation.
Indexes on UUID columns like
SessionID are notorious for index fragmentation. This is because the inserted values are randomly generated and not ever-increasing, which some database engines have difficulty to effectively build indexes on. Regular index maintenance and rebuilding are strongly advised to mitigate this issue.
In navigating these pitfalls, it becomes evident that thoughtful consideration is crucial when deciding where to apply indexes. It's not just about the benefits they bring but also about understanding the potential drawbacks and finding a balance that optimizes both read and write operations. Now, let's delve into optimization strategies.
Caching is a great and (once set up) relatively easy-to-use mechanism to improve any data fetching operation. Tools like Redis and Memcached are exceptionally fast. When looking for a session given
SessionID or given a
UserId, a simple lookup in the cache will cost you practically nothing. And if it's a hit, you may completely skip the slowing database query and thus you are improving general performance of your application.
But like most things in life, there's some downsides. And with in-memory caching, one of them is... well, memory. Memory is a finite resource, and while an in-memory cache is advantageous, it's not limitless. To add on top of that, the bigger the cache, the slower it tends to be.
The second downside is that your session state in the cache will not always be in perfect sync with the sessions stored in the database. Depending on the cache writing policy, it may take some time for cache or database items to be fully updated. You often want to give the cache items a relatively small lifetime, so that the cache is updated using the database's state fairly regularly. Striking a balance between cache lifetime and consistency is essential, especially for applications requiring strong consistency.
Because there will probably be loads of sessions and a limited in-memory cache with items with a short lifetime, cache misses will often occur. Which means, the application will fall back to our slow database query. Thus, it's crucial to complement caching with improvements to the underlying queries.
Asynchronous state building
If you don't have the business requirement (such as an access policy) to create the sessions synchronously and you're already using an event-based architecture, it is likely a great idea to build these sessions asynchronously from transactional events. Every action the user makes will persist an event in your system. A background worker/daemon asynchronously follows the traces of these events to build the session entities from these events. This is what's known as event sourcing projections.
Frankly, this strategy just delays the (possibly inefficient) session fetching and updating to a later stage, but it can still drastically improve the user's experience on the application, because it is done in the background. A drawback of this approach is that you'll likely need to store many individual events that will eventually take up a considerably amount a storage. This is an example of the classic speed and memory trade off.
Split up the session table
To actually improve the underlying query's performance, we can split up the session table into two distinct tables. One is a lookup table, containing all expired and old sessions with the necessary indexes for fast lookup. The second one is a volatile and much smaller table solely used for creating and updating active sessions. If this table can stay relatively small, no indexes are necessary, improving insertion and deletions speeds. A background worker or daemon will move sessions once they are expired from the volatile table to the lookup table.
This approach offers the best of both worlds. Because there are no indexes, they will not be able to cause index fragmentation and they will not be able to slow down the insert and update operations. However, extensive querying can be efficiently performed on the lookup table, which has all necessary indexes. One disadvantage of this approach is the fact that active sessions will not be part of extensive query operations because they are simply not in the lookup table yet.
These optimizations present diverse approaches to addressing the challenges of server-side persistent sessions, each with its advantages and trade-offs. The key is to tailor your strategy to the specific needs and nuances of your application architecture.
In the realm of server-side persistent sessions, careful consideration and strategic decision-making are paramount. Navigating through pitfalls such as inefficient database queries requires a thoughtful approach. We explored the potential downsides of indexing and discussed optimization strategies, from in-memory caching to asynchronous state building and table splitting. Each strategy comes with its own set of advantages and trade-offs, emphasizing the importance of tailoring solutions to the specific needs of your application architecture.