
Proposal: Implementation of Short-Term In-Memory Cache in Express for Mitigating Traffic Spikes and Preventing DDoS #6395

andrehrferreira opened this issue Mar 16, 2025 · 8 comments

@andrehrferreira

andrehrferreira commented Mar 16, 2025

Good morning everyone =)

Currently, Express is widely used by applications of various sizes and purposes, serving as the foundation for numerous critical web services. However, a common challenge in high-traffic applications is the need to handle unexpected spikes in simultaneous access, which can be caused by both legitimate events (such as promotions, launches, and viral content) and DDoS attacks. While there are various strategies to address these situations, the current Express architecture requires that all requests go through routing, middleware, and handlers, potentially generating unnecessary overhead before a request is even processed.

The proposed solution is the implementation of a short-term in-memory cache, with an expiration time between 2 and 5 seconds, to store and reuse the most frequently accessed responses within this short period. This concept is already widely used in high-performance HTTP servers and complements traditional strategies such as CDNs, Redis, and other external caching layers, significantly reducing the impact of traffic spikes on an application.

This approach consists of intercepting requests before they reach the routing and middleware layer, checking whether the same response can be delivered to multiple requests within the same short timeframe. This technique proves particularly efficient for GET requests, where most responses are identical for different users over a short period, such as static HTML pages, public query APIs, and frequently accessed content.

The benefits of this strategy include:

  • Reduction in server load, preventing the need to repeatedly process identical requests.
  • Fewer unnecessary database and external service accesses, reducing latency and operational costs.
  • Minimization of the impact of traffic spikes, preventing bottlenecks and improving application stability.
  • Enhanced resilience against DDoS attacks, as cached responses reduce processing overhead.

To ensure an efficient implementation, a robust cache key generation mechanism is essential, avoiding collisions and ensuring that stored responses accurately match incoming requests. The use of MurmurHash3 for fast, low-collision hash generation, combined with URLSearchParams for request parameter normalization, has proven to be an effective approach for this scenario.
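
For illustration, a cache key could be derived along these lines (a hypothetical sketch; a MurmurHash3 implementation would come from a userland library, so Node's built-in crypto hash stands in here to keep the example self-contained):

// Hypothetical sketch of cache-key generation; not part of Express today.
// URLSearchParams normalizes the query string so that "?b=2&a=1" and
// "?a=1&b=2" produce the same key.
const crypto = require('crypto');

function buildCacheKey(method, url) {
  const [path, query = ''] = url.split('?');
  const params = new URLSearchParams(query);
  params.sort(); // normalize parameter order
  const normalized = `${method}:${path}?${params.toString()}`;
  return crypto.createHash('sha1').update(normalized).digest('hex');
}

// buildCacheKey('GET', '/products?b=2&a=1') === buildCacheKey('GET', '/products?a=1&b=2')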

Unlike solutions relying solely on Redis or other external caching systems, this approach eliminates the latency associated with TCP-based queries, as the cache resides directly in the application’s memory. Additionally, because the cache expires within a few seconds, the window for serving outdated information stays very small, even in scenarios where data changes rapidly.

Implementing this system within Express would provide applications with a native mechanism for handling massive access loads, without relying on external solutions or additional processing layers. This approach has already proven effective in various modern HTTP server scenarios and could significantly impact Express’s scalability and resilience.

Automatic Cache Cleanup and Memory Control

To ensure efficient memory management, an automatic cache cleanup system should be implemented. Since this cache is short-lived, a timed eviction mechanism can be used to remove expired cache entries, ensuring that outdated responses are not stored unnecessarily.

Additionally, a maximum memory threshold should be defined, preventing the cache from growing uncontrollably and consuming excessive system resources. When the memory limit is reached, the system should adopt a Least Recently Used (LRU) strategy to remove older or less frequently accessed cache entries, keeping only the most relevant responses available.
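
As a rough illustration of that eviction policy (a hypothetical sketch, not an existing Express API; the class and option names are made up, and maxEntries stands in for the memory threshold since a true byte-based limit would require response-size accounting), a Map can act as a small LRU store because it preserves insertion order:

// Hypothetical sketch of TTL-based cleanup plus LRU eviction.
class ShortLivedCache {
  constructor({ ttl = 3000, maxEntries = 1000 } = {}) {
    this.ttl = ttl;
    this.maxEntries = maxEntries; // stand-in for a memory threshold
    this.store = new Map(); // Map preserves insertion order: oldest entry first
  }

  get(key) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.store.delete(key); // timed eviction of an expired entry
      return undefined;
    }
    // Refresh recency: re-insert so this key moves to the "newest" end
    this.store.delete(key);
    this.store.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.store.delete(key); // re-inserting refreshes recency for existing keys
    if (this.store.size >= this.maxEntries) {
      // Limit reached: evict the least recently used entry,
      // which is the first key in insertion order.
      this.store.delete(this.store.keys().next().value);
    }
    this.store.set(key, { value, expires: Date.now() + this.ttl });
  }
}
// A periodic sweep (e.g., via setInterval) could additionally purge expired
// entries proactively instead of waiting for them to be read.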

Why This Implementation Should Not Be an Optional Module

For this solution to be truly effective, it should not be implemented as an optional middleware, but rather be directly integrated into the core request handling layer of Express. If implemented as a middleware, Express would still need to resolve the route and execute the entire middleware stack before reaching the caching logic. This would lead to a significant performance loss, as each request would still pass through unnecessary processing steps before benefiting from the cache.

Currently, Express's router is not as efficient as other solutions like find-my-way, used in frameworks such as Fastify. Routing in Express involves iterating over registered routes and middleware, which introduces additional overhead, especially in high-throughput applications. By integrating the cache mechanism before route resolution, the server can immediately return cached responses, avoiding unnecessary routing computations.

Furthermore, the effectiveness of a caching system diminishes if the request has already passed through a large middleware stack before reaching it. The more middleware an application has, the less noticeable the performance gain will be, as Express will still need to process the request through multiple layers before determining if a cached response exists.

To maximize efficiency, this caching mechanism must be implemented at the pre-processing stage of the handler, intercepting and analyzing the raw HTTP request before Express begins routing and executing middleware. By doing so, the system can determine within microseconds whether a cached response can be returned, avoiding unnecessary computations and significantly improving response times in high-load environments.
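
To make the intended placement concrete, here is a rough sketch (hypothetical, and not how Express is wired today) of a cache lookup sitting in front of the Express handler, so that a hit is answered before routing, middleware, or req/res augmentation ever run:

// Hypothetical sketch: cache check placed in front of the Express app.
const http = require('http');
const express = require('express');

const app = express();
app.get('/api/status', (req, res) => res.json({ status: 'ok', ts: Date.now() }));

// key -> { body, contentType, expires }; populating the cache is omitted here
// and would require capturing the outgoing response (e.g., by wrapping res.end).
const cache = new Map();

const server = http.createServer((req, res) => {
  if (req.method === 'GET') {
    const hit = cache.get(req.url);
    if (hit && Date.now() < hit.expires) {
      // Answer from memory without Express routing, middleware, or the
      // req/res augmentation step ever running.
      res.writeHead(200, { 'Content-Type': hit.contentType });
      return res.end(hit.body);
    }
  }
  // Cache miss: hand the raw request to the Express app as usual.
  app(req, res);
});

server.listen(3000);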

Conclusion

The proposed implementation will significantly reduce infrastructure costs, especially in scenarios with a high volume of repeated requests. By integrating short-term in-memory caching directly into the core of Express, the framework becomes more robust and better equipped to handle large amounts of traffic on a single instance.

Additionally, this approach enhances Express's resilience against DDoS attacks and sudden traffic spikes, ensuring that frequently accessed routes do not repeatedly consume computational resources unnecessarily. By reducing the need for redundant request processing, database queries, and middleware execution, this implementation allows Express to operate with greater efficiency and scalability.

Ultimately, this improvement would make Express a stronger choice for high-performance applications, enabling it to compete more effectively with modern alternatives while maintaining its simplicity and widespread adoption.

@dpopp07
Contributor

dpopp07 commented Mar 17, 2025

Thanks for taking the time to write this up! It's an interesting proposal. I'm inclined to think that this is out of scope for an intentionally-minimalist project like Express, and is better handled by other components in the overall architecture for a high-traffic service, but I'm curious what others think.

Implementing this system within Express would provide applications with a native mechanism for handling massive access loads, without relying on external solutions or additional processing layers.

If a service truly is handling massive loads, it probably already should be relying on external components, and response caching is something that is likely better handled by a purpose-built reverse proxy set up in front of the application. That's currently the recommendation in the Express documentation.

@andrehrferreira
Author

andrehrferreira commented Mar 17, 2025

I fully agree that using a load balancer, such as Nginx, in front of the application is the best practice. This remains an effective solution, but in scenarios involving DDoS attacks or massive traffic spikes, even with a load balancer distributing requests, the application can still become overwhelmed.

I experienced this firsthand while using Express during Black Friday when my Google Analytics recorded traffic surges of thousands of requests per second due to a promotion by a major Brazilian YouTuber. We observed a 90/10 distribution pattern, where 90% of the requests were concentrated on just 10% of the endpoints.

In this scenario, Express was repeatedly processing the same requests, resolving routes, and executing multiple middleware layers to serve identical content to different users. For instance, if a request had to go through 10 middleware functions before reaching the controller—where it would then query Redis to retrieve a JSON response—this introduced unnecessary latency, directly impacting the application's scalability and performance.

The goal of this approach is not to replace load balancers but to complement them by preventing unnecessary calls to the router and middleware, which can be incredibly slow depending on the scenario. By optimizing how frequently these layers are executed for repetitive requests, we can significantly reduce overhead and improve response times.

@xxsuperdreamxx

@andrehrferreira, a practical approach would be to split your Express backend into microservices and use load balancing for those services. As you mentioned, 90% of requests are concentrated on a small number of endpoints, so there's no need to distribute the entire Express backend.

When it comes to DDoS prevention, the most effective strategies operate at the server level rather than within Express itself. In our setup, we utilize four DDoS prevention systems—Cloudflare, Sucuri, FortiDDoS, and another system I cannot disclose. Each system runs on separate virtual machines to avoid adding unnecessary overhead.

For serving static or repetitive content, leveraging a reverse proxy to cache responses is a great optimization. Additionally, we've implemented strategies like storing repetitive data in Local Storage or JWTs, which has helped us reduce traffic by 40% on certain endpoints.

Using 10 middleware functions seems excessive. It might be a good time to review and optimize your code logic. It's also worth considering whether some of the libraries you're using could be replaced with native Node.js or Bun alternatives. For instance, I’ve swapped out bcrypt for the native Node scrypt module, which has worked well. Another optimization we implemented was migrating from RSA encryption to EdDSA. These small adjustments can have a significant impact, particularly since we both know that JavaScript, being inherently single-threaded, isn't the most efficient language for certain tasks.
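
For reference, the scrypt swap looks roughly like this (a minimal sketch; the 16-byte salt and 64-byte key length are illustrative parameter choices):

// Illustrative swap from the bcrypt package to Node's built-in scrypt.
const { scrypt, randomBytes, timingSafeEqual } = require('crypto');

function hashPassword(password) {
  return new Promise((resolve, reject) => {
    const salt = randomBytes(16);
    scrypt(password, salt, 64, (err, derivedKey) => {
      if (err) return reject(err);
      // Store salt and derived key together, hex-encoded
      resolve(`${salt.toString('hex')}:${derivedKey.toString('hex')}`);
    });
  });
}

function verifyPassword(password, stored) {
  return new Promise((resolve, reject) => {
    const [saltHex, keyHex] = stored.split(':');
    scrypt(password, Buffer.from(saltHex, 'hex'), 64, (err, derivedKey) => {
      if (err) return reject(err);
      // Constant-time comparison of the two 64-byte keys
      resolve(timingSafeEqual(derivedKey, Buffer.from(keyHex, 'hex')));
    });
  });
}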

I hope that helps!

@andrehrferreira
Author

@xxsuperdreamxx

Hey there,

Just to clarify — the example I mentioned was purely hypothetical. In practice, my projects already have a custom HTTP server in place. That said, I’d like to share my thoughts on the topic.

First of all, I completely disagree with using microservices when there’s no real justification. Valid use cases for microservices include things like consuming queues and handling heavy workloads such as payment processing, image or video manipulation, and so on. Outside of those scenarios, a well-structured monolith can be far more efficient and maintainable.

Caching layers, WAFs, Cloudflare, and NGINX still play a critical role in any solid architecture. The problem is that many developers don’t properly configure caching headers. For example, if your application relies on ETag to return a 304, even for an OPTIONS request, Express will still resolve routes, run the middleware stack, and generate the response body just to compute the ETag. Without proper Cache-Control headers, all your efforts to leverage CDN-level caching on the API become pointless.
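
As a minimal illustration (the route and max-age values here are hypothetical), explicitly setting Cache-Control is what allows a CDN or reverse proxy to absorb repeated traffic before it ever reaches Express:

// Illustrative only: without an explicit Cache-Control header, most CDNs and
// reverse proxies will not cache API responses, so every request still
// reaches Express.
const express = require('express');
const app = express();

app.get('/api/products', (req, res) => {
  res.set('Cache-Control', 'public, max-age=60, stale-while-revalidate=30');
  res.json({ products: [] }); // placeholder payload
});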

Furthermore, tools like Cloudflare’s WAF and DDoS protection are effective when dealing with actual attacks — but that doesn't solve the issue of legitimate high traffic reaching the application, which, while not extremely common, is still a valid scenario for public APIs.

That’s why I’d like to open a broader discussion on introducing an optional intermediate layer between Node’s native HTTP server and the router. The goal is to reduce unnecessary processing — especially considering the known performance bottleneck in the creation of req and res objects, which I’ve discussed in a separate issue #5998.

In summary, I believe this feature — even as an optional toggle — could serve as a valuable optimization strategy. It can be enabled or disabled as needed, and in many cases, it would result in noticeable performance improvements across the board.

@Harsh0

Harsh0 commented Apr 6, 2025

Thanks for opening this discussion @andrehrferreira — it’s a solid proposal, and I agree with @dpopp07 that this is best left as an external module rather than being embedded directly into Express core.

While short-term in-memory caching (2–5s) at the application layer can reduce redundant work during traffic spikes, it’s important to consider some operational caveats before implementing it directly inside Express:

  1. High HTTP Connection Overhead: Express apps under heavy load often struggle not due to CPU or memory limitations, but due to an explosion in active HTTP connections, especially when upstream network bandwidth and TCP handling aren’t optimized. Adding an in-process cache doesn’t reduce the number of TCP connections the server has to handle — which often remains the bottleneck.
  2. Limited Global Reach: App-layer caching is bound by the instance’s region or deployment zone. If your app serves a global user base, caching at the CDN level provides low-latency experiences globally, something an Express-local cache cannot.
  3. Memory Management Complexity: As discussed in the proposal, LRU and memory thresholds can help, but caching in a long-lived server process requires careful memory monitoring and lifecycle handling — something Express core isn’t naturally designed to manage across varying workloads and environments.

Instead, a more scalable and resilient approach is to offload the majority of repeated traffic to a smart CDN caching layer, as I detail in this article. With well-configured edge caching and cache-key normalization, CDNs like Cloudflare, Akamai, or CloudFront can serve content rapidly and absorb both legitimate spikes and certain classes of DDoS attacks without hitting your origin server at all.

Benefits of such smart CDN caching:

  • Handles legitimate high-traffic events with low latency globally
  • Absorbs high QPS load at the edge, preserving your app server
  • Reduces origin connection churn and keeps your Express stack lean

App-layer caching (as a userland module) can still complement this for APIs or pages that can’t be cached at CDN due to auth/state constraints, but in most cases, pushing caching upstream yields better resilience and performance.

So, +1 for the idea, but I’d recommend keeping this as a well-documented module and encouraging CDN-based edge caching as the first line of defense.

@andrehrferreira
Author

@Harsh0

I agree that the best practice for any large web application is to use a CDN; that is not the point of this discussion. What I am bringing up for analysis is the layer that sits between Node.js receiving the HTTP request and the router’s preprocessing. Let me try to explain better.

Today, the Express lifecycle takes every incoming request and hands it to a handler function that wraps the standard HTTP req/res, injecting methods and properties using techniques that significantly hurt processing performance. The router is then checked to see whether a registered route matches the request’s path and method, which also carries a certain amount of latency compared to other solutions such as find-my-way. If the programmer has defined middleware layers such as etag, compression, and body-parser (the basics), we get a serial loop until the handler function defined in the application is reached. And when caching is done through Redis, for example, a TCP request is then made to retrieve the data from the memory of the server hosting Redis, often stored as a JSON string, and/or the whole pipeline still has to run to resolve the route, access the database, and so on. Do you notice the detour that was taken just for the API to return a simple JSON response?

@Harsh0

Harsh0 commented Apr 6, 2025

Thanks for the clarification, @andrehrferreira — I now better understand the motivation behind targeting the pre-router, pre-middleware layer in the request lifecycle. You’re absolutely right: for high-traffic public GET endpoints, even small latencies from route resolution, middleware stack (etag, body-parser, etc.), and handler wrapping can accumulate and degrade performance.

That said, I still believe the Express core should stay minimal and unopinionated. But your idea is still very much implementable as a reusable external module that hooks into the app at the earliest possible point.

Here’s a simplified example showing how one might build such a caching module for short-lived GET responses, using a Map and attaching it directly in the middleware chain before the router:

// smartShortCache.js
function smartShortCache(ttl = 3000) {
  const cache = new Map();
  return function (req, res, next) {
    if (req.method !== 'GET') return next();
    const key = req.originalUrl;
    const now = Date.now();
    if (cache.has(key)) {
      const { value, expires } = cache.get(key);
      if (now < expires) {
        // Note: only the body is cached in this sketch; headers such as
        // Content-Type are not replayed and would need to be stored as well.
        res.setHeader('X-Cache-Hit', 'true');
        return res.send(value);
      }
      cache.delete(key); // Expired
    }
    // Hijack res.send to store the response before passing it through
    const originalSend = res.send.bind(res);
    res.send = (body) => {
      // Only cache successful responses so transient errors are not replayed
      if (res.statusCode >= 200 && res.statusCode < 300) {
        cache.set(key, { value: body, expires: now + ttl });
      }
      res.setHeader('X-Cache-Hit', 'false');
      return originalSend(body);
    };
    next();
  };
}
module.exports = smartShortCache;

Usage in Express app:

const express = require('express');
const smartShortCache = require('./smartShortCache');
const app = express();
app.use(smartShortCache(3000)); // 3s short-term cache
app.get('/api/status', (req, res) => {
  res.json({ status: 'ok', timestamp: Date.now() });
});
app.listen(3000, () => console.log('Server running on port 3000'));

This way, developers can use this early short-term caching layer as needed, without altering Express internals. It aligns with your design intent — intercepting the request early, before routing and heavy middleware — while still keeping Express core lightweight and composable.

You’ve brought up an important gap in most developers’ awareness: how much processing happens before your handler even runs. And I think with your idea implemented like this, it can be adopted by anyone facing the problem — and even optimized further with LRU caching, stats tracking, etc.

Would love to collaborate or refine this module together if you’re open to it!

@andrehrferreira
Author

@Harsh0

Yes, I can help, but it’s outside my current context: I’m not using Express for now, at least until issue #5998 is resolved.

I chose to rewrite the entire stack myself, solving the most critical performance problems for my application. Let me explain why the solution you proposed doesn’t solve the problem. Today, Express’s biggest performance cost lies in the handler that builds the enhanced req/res objects, because of the setPrototypeOf and Object.defineProperty calls. When the cache is applied as a middleware, all of that work has already been done by the time your function (req, res, next) runs: it already receives the processed req and res. So the main objective, which is to reduce the number of times that processing happens, is not really achieved.
