Liteset benchmark results

A side-by-side comparison of Liteset (Litestar + Uvicorn, async) against Apache Superset 6.0.0 (Flask + Gunicorn, sync) on identical hardware, dataset and load profile. The only variable between runs is the backend's concurrency model — both stacks run 4 workers against the same constrained PostgreSQL 16 instance and the same SSB Scale Factor 10 dataset (~60 M rows). See the methodology for the full test-bench description.

Headline metrics

Throughput (RPS)

↑ better

Dashboard Fan-Out, 200 concurrent users

Apache Superset: 1.27 req/s
Liteset: 10.57 req/s

Median response time

↓ better

Dashboard Fan-Out, 200 concurrent users

Apache Superset: 134,000 ms
Liteset: 4,500 ms

Error rate

↓ better

Dashboard Fan-Out, 200 concurrent users

Apache Superset: 32.8 %
Liteset: 7.4 %

Throughput at 1 s I/O latency

↑ better

Controlled IO Latency Sweep, 50 users

Apache Superset: 2.47 req/s
Liteset: 25.52 req/s

These four metrics tell a single story: moving the web layer from blocking Gunicorn workers to ASGI event loops (one per Uvicorn worker) changes the cost model of an IO-bound Superset deployment. While one request waits on a 10–50 s analytical query, an async worker keeps serving others; a sync worker is simply blocked on socket.recv() and cannot accept new work.
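The blocking behaviour described above can be illustrated with a minimal, self-contained sketch. It is not Liteset or Superset code: a four-thread pool calling time.sleep stands in for four sync workers, asyncio.sleep stands in for an awaited query, and the request count and delay are arbitrary illustrative values.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

WORKERS = 4        # matches the 4 workers used in the benchmark
REQUESTS = 20      # illustrative request count (assumption)
IO_DELAY = 0.05    # seconds of simulated IO wait per request (assumption)


def blocking_request() -> None:
    # A sync worker is occupied for the whole wait and can take no other work.
    time.sleep(IO_DELAY)


async def async_request() -> None:
    # An awaited wait hands control back to the event loop.
    await asyncio.sleep(IO_DELAY)


def run_sync() -> float:
    """Elapsed seconds for REQUESTS blocking requests on a pool of WORKERS."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        for _ in range(REQUESTS):
            pool.submit(blocking_request)
    # Leaving the with-block waits for every submitted request to finish.
    return time.perf_counter() - start


async def _run_all() -> None:
    await asyncio.gather(*(async_request() for _ in range(REQUESTS)))


def run_async() -> float:
    """Elapsed seconds for REQUESTS awaited requests on one event loop."""
    start = time.perf_counter()
    asyncio.run(_run_all())
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sync pool of {WORKERS}: {run_sync():.2f} s")
    print(f"one event loop:        {run_async():.2f} s")
```

With these values the pool needs at least five rounds of waiting (20 requests over 4 workers), while the event loop overlaps all 20 waits and finishes in roughly one delay.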

Scenario 1 — Dashboard Fan-Out

Models a typical analyst: loading a dashboard with 3–13 charts, each firing its own SQL query to the analytical DB. Run at 200 concurrent users for 15 minutes.

[Figure: Scenario 1, throughput (RPS) over the run. Chart titles are in Russian; figures are reproduced from the diploma testing report.]

[Figure: Scenario 1, p95 tail latency over the run.]

| Metric | Apache Superset (sync) | Liteset (async) | Δ |
| --- | --- | --- | --- |
| Requests served | 1,129 | 9,510 | 8.4× |
| RPS (aggregate) | 1.27 | 10.57 | 8.3× |
| Median response (ms) | 134,000 | 4,500 | 29.8× |
| p95 (ms) | 300,000 (timeout) | 133,000 | 2.3× |
| Error rate | 32.8 % | 7.4 % | −25.4 pp |
| CPU avg / max (%) | 12 / 169 | 121 / 386 | |
| RAM max (MB) | 856 | 900 | +5 % |

Throughput rose 8.3× and median latency dropped from 134 s to 4.5 s (29.8×). In the sync system, 132 of 200 csrf_token requests — a trivial call that never touches the analytical DB — timed out, because every worker was blocked on a heavy query. The async backend's higher CPU usage (121 % avg vs 12 %) reflects that Uvicorn is actively working in the event loop, while Gunicorn spends most of its time blocked on system calls.
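The Δ figures quoted above follow directly from the raw columns. As a worked check, the sketch below recomputes them from the Scenario 1 values; the helper name `ratio` and the dictionary layout are illustrative only.

```python
def ratio(larger: float, smaller: float) -> float:
    """How many times larger one value is than the other, to one decimal."""
    return round(larger / smaller, 1)


# Values copied from the Scenario 1 table.
superset = {"requests": 1129, "rps": 1.27, "median_ms": 134_000, "error_pct": 32.8}
liteset = {"requests": 9510, "rps": 10.57, "median_ms": 4_500, "error_pct": 7.4}

requests_gain = ratio(liteset["requests"], superset["requests"])
rps_gain = ratio(liteset["rps"], superset["rps"])
# For latency, lower is better, so the gain is sync divided by async.
median_gain = ratio(superset["median_ms"], liteset["median_ms"])
# Error rates are compared in percentage points, not as a ratio.
error_delta_pp = round(liteset["error_pct"] - superset["error_pct"], 1)

print(requests_gain, rps_gain, median_gain, error_delta_pp)
```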

Scenario 2 — SQL Lab Interactive Session

Models a data engineer running SSB queries sequentially via the SQL Lab API. Run at 50 concurrent users for 10 minutes.

[Figure: Scenario 2, throughput (RPS) over the run.]

[Figure: Scenario 2, p95 tail latency over the run.]

Because each user runs queries sequentially and the bottleneck is the 4-vCPU PostgreSQL instance, heavy-query throughput is at parity — that is the expected result. The difference shows up in the responsiveness of infrastructure endpoints that don't touch the analytical DB:

Infrastructure endpoint response time (median, ms):

| Endpoint | Apache Superset (sync) | Liteset (async) | Δ |
| --- | --- | --- | --- |
| /security/csrf_token | 810 | 13 | 62× |
| /security/login | 26,000 | 380 | 68× |
| /database/ | 1,800 | 190 | 9.5× |

Aggregate results:

| Metric | Apache Superset (sync) | Liteset (async) | Δ |
| --- | --- | --- | --- |
| Requests served | 262 | 262 | |
| Errors | 10 (3.8 %) | 9 (3.4 %) | −0.4 pp |
| p95 aggregate (ms) | 291,000 | 278,000 | −4.5 % |
| Median SSB queries (ms) | 135,000–184,000 | 101,000–233,000 | ≈ parity |
| CPU avg / max (%) | 3.8 / 97 | 5.5 / 223 | |
| RAM max (MB) | 822 | 975 | +19 % |

In the sync stack, light endpoints degrade by two orders of magnitude under load — csrf_token median climbed to 810 ms and reached 103 s at p90, while login took 26 s. For the user this is the difference between a UI that hangs on sign-in and one that stays responsive while analytical queries run in the background.
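Why a trivial endpoint inherits the latency of the heavy queries ahead of it can be sketched with a small deterministic queue model. This is an idealisation, not a model of Gunicorn or Uvicorn internals: all requests arrive at once and are served in order, the 30 s heavy query and 13 ms light request are illustrative values, and the async case is approximated by giving every request its own slot so that all waits overlap.

```python
import heapq


def fifo_completion_times(service_ms: list[int], workers: int) -> list[int]:
    """Completion time (ms) of each request when all arrive at t=0 and are
    served in order by a fixed pool of workers that each handle one at a time."""
    free_at = [0] * workers          # min-heap: when each worker next becomes free
    heapq.heapify(free_at)
    finished = []
    for service in service_ms:
        start = heapq.heappop(free_at)   # earliest-free worker takes the request
        finish = start + service
        heapq.heappush(free_at, finish)
        finished.append(finish)
    return finished


HEAVY_MS = 30_000   # illustrative analytical query time (assumption)
LIGHT_MS = 13       # illustrative light-endpoint time (assumption)

# Eight heavy queries are queued ahead of one light request.
queue = [HEAVY_MS] * 8 + [LIGHT_MS]

blocked = fifo_completion_times(queue, workers=4)[-1]
# Idealised async case: waits overlap, modelled as one slot per request.
overlapped = fifo_completion_times(queue, workers=len(queue))[-1]

print(f"light request behind 4 blocking workers: {blocked} ms")
print(f"light request with overlapping waits:    {overlapped} ms")
```

In this toy model the light request waits for two full rounds of heavy queries before any of the four workers is free, which is the same mechanism that pushes csrf_token and login latency up in the sync stack.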

Scenario 3 — Controlled IO Latency Sweep

Isolates the effect of IO latency on throughput by replacing real SQL with pg_sleep at fixed delays (10 ms – 5 s). Run at 50 concurrent users, 2 minutes per delay. This is the cleanest comparison of the two architectures because DB variability is removed entirely.
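As a rough illustration of the substitution, the sketch below builds one pg_sleep statement per delay. PostgreSQL's pg_sleep takes its argument in seconds; the six delays are the ones listed in this scenario's results table, and the helper name and the way statements are generated are assumptions, since the actual Locust scripts are described on the methodology page rather than here.

```python
# The six delays of the sweep, in milliseconds.
DELAYS_MS = [10, 50, 100, 500, 1000, 5000]


def pg_sleep_statement(delay_ms: int) -> str:
    """SQL that makes PostgreSQL wait for delay_ms (pg_sleep expects seconds)."""
    return f"SELECT pg_sleep({delay_ms / 1000:g});"


statements = [pg_sleep_statement(ms) for ms in DELAYS_MS]

for statement in statements:
    print(statement)
```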

[Figure: Scenario 3, throughput (RPS) against controlled pg_sleep IO delay (log-scale x-axis).]

RPS at each IO delay (50 users):

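| IO delay | Apache Superset RPS | Liteset RPS | Ratio |
| --- | --- | --- | --- |
| 10 ms | 12.87 | 30.29 | 2.4× |
| 50 ms | 9.24 | 30.80 | 3.3× |
| 100 ms | 5.44 | 30.64 | 5.6× |
| 500 ms | 3.95 | 30.23 | 7.7× |
| 1 s | 2.47 | 25.52 | 10.3× |
| 5 s | 0.59 | 5.90 | 10.0× |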

Async throughput stays roughly constant (~30 RPS) from 10 ms to 500 ms: the event loop simply switches between coroutines at each await. Sync throughput falls as the delay grows and is capped by the theoretical ceiling of 4 workers / delay (4 RPS at 1 s, 0.8 RPS at 5 s); the measured 2.47 and 0.59 RPS sit below that ceiling. The advantage grows with IO latency, reaching about 10× at 1–5 s, exactly the regime BI platforms operate in. The drop to 25.52 RPS at 1 s and 5.90 RPS at 5 s on Liteset is the asyncpg connection pool saturating, not the event loop.
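The workers / delay ceiling and the ratio column can be checked with a few lines of arithmetic. The sketch below uses only the figures from the table above; the function and variable names are illustrative.

```python
WORKERS = 4

# (delay in seconds, measured Superset RPS, measured Liteset RPS) from the table
SWEEP = [
    (0.01, 12.87, 30.29),
    (0.05, 9.24, 30.80),
    (0.1, 5.44, 30.64),
    (0.5, 3.95, 30.23),
    (1.0, 2.47, 25.52),
    (5.0, 0.59, 5.90),
]


def sync_ceiling(delay_s: float, workers: int = WORKERS) -> float:
    """Upper bound on RPS when each worker completes one request per delay."""
    return workers / delay_s


ratios = [round(async_rps / sync_rps, 1) for _, sync_rps, async_rps in SWEEP]

for (delay_s, sync_rps, _), speedup in zip(SWEEP, ratios):
    ceiling = sync_ceiling(delay_s)
    print(f"{delay_s:>5} s  ceiling {ceiling:6.1f} RPS  "
          f"sync {sync_rps:5.2f} RPS ({sync_rps / ceiling:.0%} of ceiling)  "
          f"async/sync {speedup}x")
```

At 1 s the ceiling is 4 RPS and the measured 2.47 RPS is about 62 % of it. At 10 ms the ceiling is 400 RPS against a measured 12.87 RPS, which suggests that at short delays per-request overhead, rather than the number of workers, is what limits the sync stack.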

Memory

The testing report records resident memory under Dashboard Fan-Out at 900 MB for Liteset against 856 MB for Apache Superset (+5 %), which it attributes to holding coroutine state and the asyncpg connection pool. Measured against readiness criterion НФТ-2, a non-functional requirement that RSS ≤ baseline × 1.15 (about 984 MB for an 856 MB baseline), this is reported as met.
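The criterion is a single inequality, so it can be written out as a worked check. The function name and default factor below are illustrative; the 856 MB and 900 MB figures are the Dashboard Fan-Out values quoted above.

```python
def meets_rss_criterion(candidate_mb: float, baseline_mb: float,
                        factor: float = 1.15) -> bool:
    """True when candidate RSS stays within baseline RSS times the factor."""
    return candidate_mb <= baseline_mb * factor


BASELINE_MB = 856   # Apache Superset, Dashboard Fan-Out
CANDIDATE_MB = 900  # Liteset, Dashboard Fan-Out

limit_mb = round(BASELINE_MB * 1.15, 1)
passed = meets_rss_criterion(CANDIDATE_MB, BASELINE_MB)

print(f"limit {limit_mb} MB, measured {CANDIDATE_MB} MB, met: {passed}")
```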

Caveats

  • These are macro benchmarks of end-to-end backend behaviour, not the speed of any individual SQLAlchemy query or chart-rendering routine.
  • The PostgreSQL instance is deliberately under-provisioned (4 vCPU against ~60 M rows) so that analytical queries take 5–50 s. This makes the workload IO-bound, which is where the async model helps most and where real BI deployments live.
  • Both stacks run 4 workers to isolate the concurrency model as the single variable between runs.
  • Frontend behaviour is identical by construction; nothing here measures rendering time.

Reproducing

Hardware, PostgreSQL tuning, the SSB SF=10 fixture and the Locust scripts are described on the methodology page. Full Locust CSV exports (stats, stats_history, failures) and docker stats captures for every run are archived in the diploma testing report; each scenario was repeated 3 times for statistical stability.