
HTTP layer

The production HTTP server in @lazarv/react-server is built on Node.js node:http (or node:http2 for HTTPS without proxy) and includes built-in support for keep-alive management, request timeouts, admission control, health check endpoints, and graceful shutdown. These features are critical when running behind a load balancer (e.g. AWS ALB/NLB, k8s Ingress) to prevent 502 errors, connection exhaustion, and dropped requests during deployments.

All HTTP layer options live under the server section of your config file. Every value has a safe default that works well with common load balancer configurations.

react-server.config.mjs
```javascript
export default {
  server: {
    keepAliveTimeout: 65000,
    headersTimeout: 66000,
    requestTimeout: 30000,
    maxConcurrentRequests: 100,
    maxBodyBytes: 32 * 1024 * 1024,
    shutdownTimeout: 25000,
  },
};
```
| Option | Default | Description |
| --- | --- | --- |
| keepAliveTimeout | 65000 | How long (ms) the server keeps idle connections open. Must exceed your load balancer's idle timeout to prevent 502 errors. AWS ALB defaults to 60 s, so 65 s is a safe starting point. |
| headersTimeout | 66000 | Maximum time (ms) to wait for the client to send the full request headers. Must exceed keepAliveTimeout. |
| requestTimeout | 30000 | Maximum time (ms) for the client to send the complete request (headers and body). Set to 0 to disable. |
| maxConcurrentRequests | 0 | Maximum number of concurrent requests before the server responds with 503 Service Busy. Set to 0 to disable admission control. |
| maxBodyBytes | 0 (disabled) | Pre-parse cap on the raw request body, in bytes, enforced before the WHATWG Request is constructed. Set to a positive value (e.g. 32 * 1024 * 1024) to apply the cap directly in the runtime. |
| shutdownTimeout | 25000 | After receiving SIGTERM/SIGINT, the server stops accepting new connections and waits up to this duration (ms) for in-flight requests to complete before force-exiting. Should be less than your Kubernetes terminationGracePeriodSeconds (default 30 s). |

The body cap defaults to 0 (disabled) because most production deployments already enforce body limits at a reverse proxy, CDN, or platform edge, and in that topology a second runtime-level cap adds little defence in depth. Set maxBodyBytes to a positive value when you want the runtime itself to enforce the cap: typically when nothing sits in front of it (single-host deployments, local-only services), or as a belt-and-braces setting alongside an upstream limit.

When the cap is active, oversized request bodies are rejected at the HTTP layer before any handler sees the request. Two paths handle the cap:

  1. Declared Content-Length check. If the client sent a Content-Length greater than the cap, the server responds 413 Payload Too Large immediately and reads zero body bytes. This is the cheap path — honest clients with a declared length bail here, with a clean response status.
  2. Streaming counter during read. Handles missing or lying Content-Length (chunked transfer, attacker-controlled headers). Bytes are counted as they arrive through a wrapping Transform; on overflow the underlying socket is destroyed immediately to bound resource usage. The connection close surfaces on the client side as a socket-level error rather than a 413 status — this is the trade-off for not reading the rest of the attacker-controlled payload just to deliver a courtesy status code.
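The two paths above can be sketched as plain functions. This is a minimal illustration, not the runtime's code: checkDeclaredLength and createByteCounter are hypothetical names, and in a real Node server the counter would sit inside a Transform stream whose overflow branch destroys the socket (for example with req.socket.destroy()) instead of reading on.

```javascript
// Path 1 (hypothetical helper): reject up front when the declared
// Content-Length exceeds the cap, before reading any body bytes.
function checkDeclaredLength(contentLengthHeader, maxBodyBytes) {
  if (!maxBodyBytes || contentLengthHeader == null) return { ok: true };
  const declared = Number(contentLengthHeader);
  if (Number.isFinite(declared) && declared > maxBodyBytes) {
    return { ok: false, status: 413 }; // 413 Payload Too Large
  }
  return { ok: true };
}

// Path 2 (hypothetical helper): count bytes as chunks arrive. It never
// buffers chunks; it only reports overflow so the caller can drop the socket.
function createByteCounter(maxBodyBytes) {
  let seen = 0;
  return {
    add(chunk) {
      seen += chunk.byteLength;
      return maxBodyBytes > 0 && seen > maxBodyBytes ? "overflow" : "ok";
    },
    get bytesSeen() {
      return seen;
    },
  };
}
```

Keeping the counting rule separate from the stream plumbing makes it easy to test; the wrapping stream only has to call add() per chunk and act on "overflow".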

The cap applies to every body-bearing POST / PUT / PATCH / DELETE regardless of route or content-type. It is independent of, and runs before, the per-decode limits in serverFunctions.limits.* — those still apply afterwards inside the Server Function decoder.

Memory peak is bounded by the wrapping stream's highWaterMark (~16 KiB) regardless of the rejected payload's size — the wrapper observes bytes as they flow but never buffers them. Time is bounded by the HTTP server's requestTimeout (default 30s, configurable via server.requestTimeout).

server.maxBodyBytes bounds total wire bytes, but it cannot defend against attacks that fit inside any reasonable body cap: many small files or fields, very long field names, or a single part that is too large for your app yet still under the total cap.

server.multipart.* lets you cap the shape of individual parts during streaming parsing. When any sub-limit is set to a positive value, multipart requests are parsed via busboy (instead of the platform Request.formData()), enforcing the configured limits as bytes flow. Overflow on any limit rejects with HTTP 413 before the offending part is fully buffered.

react-server.config.mjs
```javascript
export default {
  server: {
    multipart: {
      maxFileSize: 10 * 1024 * 1024, // 10 MiB per file
      maxFieldSize: 1 * 1024 * 1024, // 1 MiB per text field
      maxFiles: 10, // up to 10 files per request
      maxFields: 100, // up to 100 text fields per request
      maxParts: 200, // 200 total parts (files + fields)
      maxFieldNameSize: 200, // 200 bytes per field name
    },
  },
};
```
| Limit | Defends against |
| --- | --- |
| maxFileSize | Oversized file uploads, even within a generous body cap |
| maxFieldSize | File-as-field smuggling; oversized text values |
| maxFiles | Many-file submissions allocating many File wrappers |
| maxFields | High-cardinality field attacks |
| maxParts | Total entries cap (files and fields combined) |
| maxFieldNameSize | Long-field-name string allocation attacks |

All sub-limits default to 0 (disabled). When every sub-limit is disabled, busboy is never invoked and multipart bodies pass through to the platform parser unchanged — zero overhead.
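A minimal sketch of the per-part bookkeeping, assuming a parser that reports each part's header and its running byte count. needsStreamingParser, createMultipartLimiter, and MultipartLimitError are hypothetical names. busboy itself accepts a limits object with similarly named options (fileSize, fieldSize, files, fields, parts, fieldNameSize); how the runtime maps server.multipart.* onto them is not shown here.

```javascript
// Hypothetical error type; status 413 mirrors the documented rejection.
class MultipartLimitError extends Error {
  constructor(limit) {
    super(`multipart limit exceeded: ${limit}`);
    this.status = 413;
    this.limit = limit;
  }
}

// The streaming parser is only needed when at least one sub-limit is positive.
function needsStreamingParser(limits = {}) {
  return Object.values(limits).some((value) => value > 0);
}

// Tracks per-request counts and per-part sizes; 0 or missing means disabled.
function createMultipartLimiter(limits = {}) {
  const exceeds = (name, value) => limits[name] > 0 && value > limits[name];
  const encoder = new TextEncoder();
  const counts = { parts: 0, files: 0, fields: 0 };

  return {
    // Call when a part's headers have been parsed, before reading its body.
    startPart(fieldName, isFile) {
      counts.parts += 1;
      counts[isFile ? "files" : "fields"] += 1;
      if (exceeds("maxParts", counts.parts)) throw new MultipartLimitError("maxParts");
      if (isFile && exceeds("maxFiles", counts.files)) throw new MultipartLimitError("maxFiles");
      if (!isFile && exceeds("maxFields", counts.fields)) throw new MultipartLimitError("maxFields");
      if (exceeds("maxFieldNameSize", encoder.encode(fieldName).byteLength)) {
        throw new MultipartLimitError("maxFieldNameSize");
      }
    },
    // Call with the running byte count of the current part as data arrives,
    // so an oversized part is rejected before it is fully buffered.
    checkPartSize(isFile, bytesSoFar) {
      const name = isFile ? "maxFileSize" : "maxFieldSize";
      if (exceeds(name, bytesSoFar)) throw new MultipartLimitError(name);
    },
  };
}
```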

The parsed FormData is functionally equivalent to what the platform parser would have produced (filename, MIME type, size, and bytes are preserved). Only Content-Transfer-Encoding per part diverges — the HTML5 spec dropped it for multipart/form-data and modern browsers never emit it, so this affects nothing in practice. An A/B equivalence test in the integration suite asserts the property.

The cap applies on every adapter target the runtime ships. The Node path consumes the raw incoming request directly with busboy; the edge / serverless path adapts the Web Request body to the same parser via Node's Web Streams interop. Per-part cap semantics are identical on both paths because they share the same parser core. The body cap (server.maxBodyBytes) is similarly portable — declared Content-Length is checked from the headers, then the body is read up to maxBodyBytes + 1 and rejected with 413 immediately if it overflows. On native-edge runtimes without Node-compatibility APIs, the per-part multipart cap silently downgrades to the platform parser; the body cap continues to apply.
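For the edge / serverless path, the "read up to the cap, then stop" rule can be sketched against a WHATWG Request. readBodyWithCap is a hypothetical helper, not a runtime export; it checks the declared length first, then stops as soon as the running total passes maxBodyBytes and cancels the rest of the stream instead of buffering it.

```javascript
// Hypothetical helper: returns { status: 413 } on overflow, otherwise the bytes.
async function readBodyWithCap(request, maxBodyBytes) {
  // Declared length first (a missing header reads as 0 and passes).
  const declared = Number(request.headers.get("content-length"));
  if (Number.isFinite(declared) && declared > maxBodyBytes) return { status: 413 };
  if (!request.body) return { status: 200, bytes: new Uint8Array(0) };

  const reader = request.body.getReader();
  const chunks = [];
  let total = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBodyBytes) {
      await reader.cancel(); // stop pulling the rest of the payload
      return { status: 413 };
    }
    chunks.push(value);
  }

  // Concatenate the accepted chunks into one buffer.
  const bytes = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return { status: 200, bytes };
}
```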

server.csrf defends server-function action POSTs against Cross-Site Request Forgery by validating the request's Origin (or Referer) header against a trusted-origin set.

The threat is narrower than it first looks. JS-driven action calls (fetch() with the custom react-server-action header) are already safe: any custom header makes the request not CORS-simple, so the browser preflights it, and the runtime refuses unsolicited cross-origin preflights. What needs explicit defence is form-submit action POSTs: a `<form method="POST">` with a multipart/form-data body and a `$ACTION_ID_<token>` field. That shape is CORS-simple (browsers send it without preflight), so a malicious site can submit such a form cross-origin unless the receiving app validates the source.

react-server.config.mjs
```javascript
export default {
  server: {
    csrf: {
      mode: "lax", // default
      allowedOrigins: [
        "https://host.example.com",
        /^https:\/\/[^.]+\.partner\.com$/,
      ],
    },
  },
};
```
| Mode | Origin / Referer missing | Origin present and trusted | Origin present and untrusted |
| --- | --- | --- | --- |
| "lax" (default) | allow | allow | 403 |
| "strict" | 403 | allow | 403 |
| false / "off" | allow | allow | allow |
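The mode table can be expressed as a small decision function. This is a sketch: csrfDecision is hypothetical, isTrusted stands in for the runtime's trusted-origin check, and deriving the origin from Referer when Origin is absent is an assumption about how the fallback works. The error codes are the documented x-react-server-action-error values.

```javascript
// Hypothetical decision function mirroring the mode table.
function csrfDecision(headers, mode, isTrusted) {
  if (mode === false || mode === "off") return { ok: true };

  // Prefer Origin; fall back to the origin part of Referer.
  let origin = headers.get("origin");
  if (!origin) {
    const referer = headers.get("referer");
    if (referer) {
      try {
        origin = new URL(referer).origin;
      } catch {
        origin = null; // unparseable Referer counts as missing
      }
    }
  }

  if (!origin) {
    return mode === "strict"
      ? { ok: false, status: 403, error: "csrf_origin_missing" }
      : { ok: true }; // "lax" allows requests with no source information
  }
  return isTrusted(origin)
    ? { ok: true }
    : { ok: false, status: 403, error: "csrf_origin_mismatch" };
}
```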

The trusted-origin set is built implicitly from your existing config:

  1. The request's own resolved origin (proxy-aware), so same-origin form posts always work without configuration
  2. server.origin — the canonical configured identity
  3. server.cors.origin / origins when configured with explicit values (not * or true), since CORS-trusted partners are usually CSRF-trusted too
  4. server.csrf.allowedOrigins — explicit additions for cases where CSRF trust differs from CORS trust
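One way to merge these four sources into a single predicate, as a sketch: buildOriginMatcher and its option names are hypothetical. Entries can be exact origin strings or RegExps, matching the allowedOrigins example, and wildcard CORS values are skipped because they do not imply CSRF trust.

```javascript
// Hypothetical: combine the trusted-origin sources into one matcher.
function buildOriginMatcher({
  requestOrigin,
  serverOrigin,
  corsOrigins = [],
  allowedOrigins = [],
}) {
  const entries = [
    requestOrigin, // same-origin posts always work
    serverOrigin, // configured canonical identity
    ...corsOrigins.filter((o) => o !== "*" && o !== true), // explicit CORS partners only
    ...allowedOrigins, // explicit CSRF-only additions
  ].filter(Boolean);

  return (origin) =>
    entries.some((entry) =>
      entry instanceof RegExp ? entry.test(origin) : entry === origin,
    );
}
```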

Remote components: the case that needs explicit configuration. When a host app embeds remote components from this app, the user's browser sees forms whose action targets the remote (this app). On submit, the browser POSTs cross-origin to the remote with `Origin: <host origin>`. Without an entry in server.csrf.allowedOrigins, the remote rejects the legitimate form submit with 403. This is by design: the remote operator must explicitly declare which host origins may invoke their action endpoints.

remote-app/react-server.runtime.config.mjs
```javascript
export default {
  server: {
    cors: true,
    csrf: {
      allowedOrigins: [
        "https://host.example.com",
        "https://staging-host.example.com",
      ],
    },
  },
};
```

Rejection response: HTTP 403 Forbidden with header x-react-server-action-error: csrf_origin_mismatch (or csrf_origin_missing in strict mode without an Origin). The handler never runs and the body is never parsed.

Out of scope for this feature: token-based CSRF (double-submit cookie or per-session nonce). That is a stricter defence appropriate for high-value actions, but it requires session awareness that the runtime can't synthesize on your behalf. Apps that need it can implement it as middleware in front of action dispatch.
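If you do add token-based CSRF, the core of a double-submit-cookie check is small. This sketch is framework-agnostic and assumes a cookie named csrf_token whose value the page echoes back in a hidden form field or header; the cookie name, the helper names, and where you hook the check in are all hypothetical. The OWASP CSRF Prevention Cheat Sheet recommends the signed variant (the token is an HMAC bound to the session), which this sketch does not implement.

```javascript
// Read one cookie value from a WHATWG Headers object (no decoding).
function readCookie(headers, name) {
  for (const pair of (headers.get("cookie") ?? "").split(";")) {
    const separator = pair.indexOf("=");
    if (separator === -1) continue;
    if (pair.slice(0, separator).trim() === name) return pair.slice(separator + 1).trim();
  }
  return null;
}

// Compare without returning early on the first mismatching character.
function safeEqual(a, b) {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i += 1) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

// submittedToken comes from a hidden form field or a request header.
function verifyDoubleSubmit(headers, submittedToken, cookieName = "csrf_token") {
  const cookieToken = readCookie(headers, cookieName);
  return Boolean(cookieToken && submittedToken && safeEqual(cookieToken, submittedToken));
}
```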

Node.js defaults keepAliveTimeout to 5 seconds, which is far too low for environments with a load balancer. If the server closes an idle connection before the load balancer does, the load balancer may send a request on a connection the server has already torn down, resulting in a 502 Bad Gateway.

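The default values in @lazarv/react-server are chosen to avoid this: keepAliveTimeout (65 s) outlasts the 60 s AWS ALB default idle timeout, so the load balancer always closes an idle connection before the server does, and headersTimeout (66 s) is set just above keepAliveTimeout.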
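The two invariants can be checked mechanically before deploying. checkTimeouts is a hypothetical helper, and lbIdleTimeoutMs is your load balancer's idle timeout in milliseconds. On a plain Node.js server the same values are the keepAliveTimeout, headersTimeout, and requestTimeout properties of http.Server.

```javascript
// Hypothetical pre-deploy check for the keep-alive invariants.
function checkTimeouts({ keepAliveTimeout, headersTimeout }, lbIdleTimeoutMs) {
  const problems = [];
  // The server must hold idle connections longer than the load balancer does.
  if (keepAliveTimeout <= lbIdleTimeoutMs) {
    problems.push("keepAliveTimeout must exceed the load balancer idle timeout");
  }
  // headersTimeout must stay above keepAliveTimeout.
  if (headersTimeout <= keepAliveTimeout) {
    problems.push("headersTimeout must exceed keepAliveTimeout");
  }
  return problems;
}
```

With the defaults and a 60 s ALB, checkTimeouts returns an empty list; with Node's own 5000 ms keep-alive default it reports the first problem.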

When maxConcurrentRequests is set to a value greater than 0, the server tracks in-flight requests and responds with 503 Service Busy (with a Retry-After: 1 header) when the limit is reached. This prevents thundering-herd scenarios where all requests compete for CPU/memory simultaneously, causing all of them to be slow rather than serving some fast and rejecting others.

The counter is decremented after the response is fully sent, ensuring accurate tracking even for streaming responses. On error paths, the counter is also properly decremented.
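A minimal sketch of static admission control, assuming the caller turns a null result into 503 with Retry-After: 1. createAdmissionControl is hypothetical. The release function is idempotent, so it can be wired to both the success and error paths without double-decrementing; with node:http, the response's close event is a natural place to call it, since it fires once the response has finished or the connection was aborted.

```javascript
// Hypothetical static concurrency gate; 0 disables the limit.
function createAdmissionControl(maxConcurrentRequests) {
  let inflight = 0;
  return {
    get inflight() {
      return inflight;
    },
    // Returns a release function, or null when the server is full.
    tryAcquire() {
      if (maxConcurrentRequests > 0 && inflight >= maxConcurrentRequests) return null;
      inflight += 1;
      let released = false;
      return () => {
        if (released) return; // safe to call from several paths
        released = true;
        inflight -= 1;
      };
    },
  };
}
```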

@lazarv/react-server ships with an adaptive backpressure system that is enabled by default in production. It uses Event Loop Utilization (ELU), read via performance.eventLoopUtilization(), as a direct measure of Node.js event loop saturation. Unlike CPU% or latency-based algorithms, ELU is unaffected by workload heterogeneity (switching between fast and slow routes) and only rises when the event loop itself is genuinely saturated.
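Sampling ELU means taking two readings of performance.eventLoopUtilization() and comparing their idle and active times; Node can also compute the delta for you if you pass the earlier reading as an argument. createEluSampler is a hypothetical name, and calling sample() once per sampleWindow is an assumption about how readings are scheduled.

```javascript
// Hypothetical sampler: each call returns ELU (0..1) since the previous call.
function createEluSampler() {
  let last = performance.eventLoopUtilization();
  return function sample() {
    const now = performance.eventLoopUtilization();
    const idle = now.idle - last.idle;
    const active = now.active - last.active;
    last = now;
    const total = idle + active;
    if (total <= 0) return 0; // no time elapsed, or loop not started yet
    // Clamp against timer jitter between the idle and active measurements.
    return Math.min(1, Math.max(0, active / total));
  };
}
```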

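The control loop uses AIMD (Additive Increase, Multiplicative Decrease): once per sampleWindow, the concurrency limit grows by a small fixed step while ELU stays below eluMax, and is cut by a multiplicative factor when ELU exceeds it, always staying between minLimit and maxLimit.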

The limiter starts wide open (initialLimit = maxLimit) and has zero overhead on the fast path — it is invisible under normal load and only tightens when the event loop is genuinely saturated.
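One AIMD step can be sketched as a pure function called once per sampleWindow. The +1 increase and 0.9 decrease factor are illustrative assumptions, not the runtime's constants, and nextLimit is a hypothetical name. Starting at maxLimit matches the wide-open initial state described above.

```javascript
// Hypothetical AIMD step: returns the next concurrency limit.
function nextLimit(
  limit,
  elu,
  { minLimit = 1, maxLimit = 1000, eluMax = 0.95, decreaseFactor = 0.9 } = {},
) {
  if (elu > eluMax) {
    // Multiplicative decrease: back off quickly when the loop is saturated.
    return Math.max(minLimit, Math.floor(limit * decreaseFactor));
  }
  // Additive increase: probe upward slowly while there is headroom.
  return Math.min(maxLimit, limit + 1);
}
```

The asymmetry is the point of AIMD: cutting by a fraction sheds load quickly once ELU crosses eluMax, while the one-step increase looks for headroom slowly enough to avoid oscillating.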

To customize or disable it, use server.backpressure:

react-server.config.mjs
```javascript
export default {
  server: {
    backpressure: {
      enabled: true, // set to false to disable
      initialLimit: 1000, // starting limit (defaults to maxLimit)
      minLimit: 1, // floor
      maxLimit: 1000, // ceiling
      eluMax: 0.95, // skip queuing above 95% ELU
      sampleWindow: 1000, // recalculate every 1s
      smoothingFactor: 0.2, // EWMA latency smoothing
      queueSize: 100, // max requests waiting for a slot
      queueTimeout: 5000, // max wait time (ms) before 503
    },
  },
};
```
| Option | Default | Description |
| --- | --- | --- |
| enabled | true | Enable adaptive backpressure. Set to false to disable and fall back to static maxConcurrentRequests. |
| initialLimit | maxLimit | Starting concurrency limit. Defaults to maxLimit (start wide open, tighten under overload). |
| minLimit | 1 | Floor: the adaptive limit never drops below this. |
| maxLimit | 1000 | Ceiling: capped by maxConcurrentRequests when both are set. |
| eluMax | 0.95 | ELU level (0–1) above which the limit decreases and excess requests skip the queue. |
| sampleWindow | 1000 | Interval (ms) for recalculation and ELU sampling. |
| smoothingFactor | 0.2 | EWMA factor (0–1) for latency smoothing. Higher = more reactive. |
| queueSize | 100 | Maximum requests waiting in the backpressure queue. When full, additional requests are immediately rejected with 503. |
| queueTimeout | 5000 | Maximum time (ms) a request waits in the queue before being rejected with 503. Should be shorter than your load balancer's request timeout. |

When both backpressure.enabled and maxConcurrentRequests are configured, the static limit acts as the hard ceiling for the adaptive limit. This gives you a safety net: the algorithm can explore up to maxConcurrentRequests but never exceed it.
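The ceiling rule can be written as a small resolver. resolveLimits is hypothetical and only models the documented relationships: maxConcurrentRequests, when positive, caps maxLimit, and initialLimit defaults to (and never exceeds) the resulting ceiling.

```javascript
// Hypothetical resolver for the effective adaptive-limit bounds.
function resolveLimits({ maxConcurrentRequests = 0, backpressure = {} } = {}) {
  const { minLimit = 1, maxLimit = 1000 } = backpressure;
  // A positive static limit is the hard ceiling for the adaptive one.
  const ceiling =
    maxConcurrentRequests > 0 ? Math.min(maxLimit, maxConcurrentRequests) : maxLimit;
  // Start wide open unless an explicit initialLimit is given.
  const initialLimit = Math.min(backpressure.initialLimit ?? ceiling, ceiling);
  return { minLimit, maxLimit: ceiling, initialLimit };
}
```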

How the queue works

Instead of immediately rejecting requests when the concurrency limit is reached, the limiter places them in a bounded FIFO queue. When an in-flight request completes, the freed slot is handed directly to the next queued waiter rather than returning to the general pool — ensuring fair ordering.

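A queued request leaves the queue in one of two ways: an in-flight request completes and hands it the freed slot, or it waits longer than queueTimeout and is rejected with 503.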

This absorbs short traffic bursts transparently while still shedding load during sustained overload.
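A minimal sketch of a bounded FIFO queue with direct slot handoff and a per-waiter timeout. createQueuedLimiter and its stats() shape are hypothetical, the limit is fixed here rather than adaptive, and the eluMax shortcut (skipping the queue under saturation) is not modeled.

```javascript
// Hypothetical limiter: acquire() resolves to a release function or rejects
// with an error carrying status 503 (queue full or queue timeout).
function createQueuedLimiter({ limit, queueSize = 100, queueTimeout = 5000 }) {
  let inflight = 0;
  const queue = [];
  const busy = (reason) => Object.assign(new Error(reason), { status: 503 });

  function makeRelease() {
    let done = false;
    return () => {
      if (done) return;
      done = true;
      const next = queue.shift();
      if (next) {
        // Hand the slot straight to the oldest waiter; inflight is unchanged.
        clearTimeout(next.timer);
        next.resolve(makeRelease());
      } else {
        inflight -= 1;
      }
    };
  }

  function acquire() {
    if (inflight < limit) {
      inflight += 1;
      return Promise.resolve(makeRelease());
    }
    if (queue.length >= queueSize) return Promise.reject(busy("queue full"));
    return new Promise((resolve, reject) => {
      const waiter = { resolve, timer: null };
      waiter.timer = setTimeout(() => {
        const index = queue.indexOf(waiter);
        if (index !== -1) queue.splice(index, 1);
        reject(busy("queue timeout"));
      }, queueTimeout);
      queue.push(waiter);
    });
  }

  return { acquire, stats: () => ({ inflight, queued: queue.length }) };
}
```

Handing the slot to the next waiter inside release(), instead of decrementing and letting new arrivals race for it, is what keeps ordering fair: a request that just arrived cannot overtake one that has been waiting.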

Tip: Start with the defaults and monitor. The limiter exposes stats (current limit, inflight count, queue depth, ELU, smoothed latency) that you can pipe into your observability stack to tune the parameters for your workload.

The production server exposes two built-in endpoints for Kubernetes liveness and readiness probes. These endpoints are registered at the very top of the middleware chain, bypassing all other middleware for minimal latency.

| Endpoint | Purpose | Response |
| --- | --- | --- |
| `/__react_server_health__` | Liveness probe | 200 ok: the process is alive |
| `/__react_server_ready__` | Readiness probe | 200 ok when the worker thread is running; 503 not ready when the worker has exited |
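The short-circuit can be sketched against WHATWG Request/Response. handleProbe and isWorkerAlive are hypothetical names; returning null stands for "not a probe, continue down the middleware chain".

```javascript
// Hypothetical probe handler, checked before any other middleware.
function handleProbe(request, isWorkerAlive) {
  const { pathname } = new URL(request.url);
  if (pathname === "/__react_server_health__") {
    return new Response("ok", { status: 200 }); // liveness: process is up
  }
  if (pathname === "/__react_server_ready__") {
    return isWorkerAlive()
      ? new Response("ok", { status: 200 })
      : new Response("not ready", { status: 503 });
  }
  return null; // not a probe
}
```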

Example Kubernetes pod spec:

```yaml
livenessProbe:
  httpGet:
    path: /__react_server_health__
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /__react_server_ready__
    port: 3000
  initialDelaySeconds: 3
  periodSeconds: 5
```

Tip: Point your liveness probe at `/__react_server_health__` rather than `/`. The health endpoint returns instantly without touching the SSR pipeline, so it won't false-fail under heavy rendering load.

When the server receives SIGTERM or SIGINT:

  1. It stops accepting new connections
  2. In-flight requests are allowed to complete
  3. After shutdownTimeout milliseconds, the process force-exits
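The three steps map onto a signal handler like the following sketch. createShutdownHandler is hypothetical; server is anything with a Node-style close(callback), such as an http.Server, whose close() stops accepting new connections and calls back once existing ones have ended. The exit function is injectable only so the sketch can be tested.

```javascript
// Hypothetical SIGTERM/SIGINT handler implementing drain-then-exit.
function createShutdownHandler({
  server,
  shutdownTimeout = 25000,
  exit = (code) => process.exit(code),
}) {
  let shuttingDown = false;
  return function onSignal() {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    // Step 3: force-exit if in-flight requests have not drained in time.
    // unref() so this timer alone does not keep the event loop alive.
    const timer = setTimeout(() => exit(1), shutdownTimeout);
    timer.unref();
    // Steps 1 and 2: stop accepting connections, let existing ones finish.
    server.close((err) => {
      clearTimeout(timer);
      exit(err ? 1 : 0);
    });
  };
}
```

Register it with process.on("SIGTERM", handler) and process.on("SIGINT", handler).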

In cluster mode, the primary process waits for all workers to drain before exiting. If a worker dies unexpectedly during normal operation, it is automatically restarted — rather than taking down the entire service.

This ensures zero-downtime rolling deployments on Kubernetes and other container orchestrators. The default shutdownTimeout of 25 seconds leaves a 5-second buffer within the default k8s terminationGracePeriodSeconds of 30 seconds.