Benchmark methodology

This page describes how Liteset was benchmarked against upstream Apache Superset 6.0.0. The goal is fair, reproducible measurement — same hardware, same dataset, same load profile, only the backend's concurrency model changes.

The single variable between every pair of runs is the backend: Flask + Gunicorn (4 sync workers) versus Litestar + Uvicorn (4 workers + uvloop). The database, data, network and load generator are identical.

A personal note from the author

This is my first load-testing effort at this scale, and it was done as part of a diploma thesis rather than in a dedicated benchmarking lab. I've tried to keep the methodology fair and reproducible, but I may well have made mistakes somewhere. Please don't judge too harshly — and if you're interested and notice any errors or weak spots, I'm genuinely open to criticism and feedback.

Test bench

| Component | vCPU | RAM | Disk | Notes |
|---|---|---|---|---|
| Backend VM | 4 | 8 GB | | 100 % vCPU cap; Flask/Gunicorn or Litestar/Uvicorn |
| Locust VM | 2 | 4 GB | | load generator, same LAN |
| PostgreSQL 16 | 4 | 16 GB | 50 GB NVMe | metadata + analytical data; the IO bottleneck |

Host: Dell PowerEdge R420, 2× Xeon E5-2420 v2 (2.6 GHz), DDR3 ECC, NVMe SSD, Rocky Linux 9. All services run in Docker Compose.

PostgreSQL is deliberately under-provisioned relative to the data volume so analytical queries take 5–50 s, making the workload IO-bound. Tuning: shared_buffers = 4 GB, work_mem = 64 MB, effective_cache_size = 12 GB, random_page_cost = 8, max_parallel_workers = 4.
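One way to apply these settings in a Docker Compose deployment is to pass them to the server as `postgres -c name=value` flags. The sketch below assumes that approach (it is not necessarily how this bench was configured) and only builds the argument list; the helper name `postgres_command` is hypothetical.

```python
# Sketch: turn the PostgreSQL tuning listed above into `postgres -c name=value`
# flags, e.g. for a Compose `command:` entry. Helper name is hypothetical.
PG_TUNING = {
    "shared_buffers": "4GB",
    "work_mem": "64MB",
    "effective_cache_size": "12GB",
    "random_page_cost": "8",
    "max_parallel_workers": "4",
}


def postgres_command(settings: dict[str, str]) -> list[str]:
    """Build an argv list for the postgres server with one -c flag per setting."""
    argv = ["postgres"]
    for name, value in settings.items():
        argv += ["-c", f"{name}={value}"]
    return argv


if __name__ == "__main__":
    print(" ".join(postgres_command(PG_TUNING)))
```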

Software under test

| Variant | Stack |
|---|---|
| Apache Superset 6.0.0 | Flask + Gunicorn, 4 sync workers, 300 s timeout |
| Liteset | Litestar + Uvicorn, 4 workers + uvloop, async event loop |

Equal worker counts (4 each) isolate the concurrency model: a sync worker handles one request at a time and blocks on IO; an async worker multiplexes many coroutines in one process. Both run against the same PostgreSQL instance with no schema changes — superset_config.py is loaded unmodified.
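The difference can be illustrated with a simplified, single-process model. Here an `asyncio.Semaphore(4)` stands in for 4 blocking workers: each request holds a slot for its entire IO wait. The unbounded case stands in for an event loop that keeps accepting requests while earlier ones are waiting. The delay and request count are arbitrary assumptions, and the real deployments run 4 processes each.

```python
from __future__ import annotations

import asyncio
import time

IO_DELAY_S = 0.05  # simulated database wait per request (assumption)
N_REQUESTS = 12    # requests arriving at the same moment (assumption)
WORKERS = 4        # same worker count as both backends


async def handle(slot: asyncio.Semaphore | None) -> None:
    """One request that spends all its time waiting on IO."""
    if slot is None:
        # Event-loop model: the wait yields, so other requests proceed.
        await asyncio.sleep(IO_DELAY_S)
    else:
        # Blocking-worker model: the slot stays occupied until IO completes.
        async with slot:
            await asyncio.sleep(IO_DELAY_S)


async def run(slot: asyncio.Semaphore | None) -> float:
    """Wall time to finish all requests under the given model."""
    start = time.perf_counter()
    await asyncio.gather(*(handle(slot) for _ in range(N_REQUESTS)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    blocking = await run(asyncio.Semaphore(WORKERS))
    multiplexed = await run(None)
    return blocking, multiplexed


if __name__ == "__main__":
    b, m = asyncio.run(main())
    print(f"4 blocking workers: {b:.3f} s, event loop: {m:.3f} s")
```

With 12 simultaneous requests and 50 ms of simulated IO, the semaphore version needs about three rounds (roughly 0.15 s) and the unbounded version about one (roughly 0.05 s); exact timings vary by machine.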

Dataset

The analytical database is loaded with the Star Schema Benchmark (SSB) at Scale Factor 10 — ~60 M rows in LINEORDER plus four dimension tables (CUSTOMER, SUPPLIER, PART, DATE). SSB is a standard analytical benchmark with deterministic generation (ssb-dbgen), realistic JOIN/aggregation queries and wide academic use. On top of it, 10 dashboards of 6–12 charts each implement the four standard SSB Query Flights (Q1.1–Q4.3).
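As a sanity check on the stated size, SSB tables grow with the scale factor roughly as below. The factors are assumed from the SSB specification's scaling rules rather than measured on this bench; DATE is omitted because it covers a fixed seven-year calendar regardless of scale factor.

```python
import math


def ssb_row_counts(sf: int) -> dict[str, int]:
    """Approximate SSB table cardinalities for scale factor `sf` (assumed rules)."""
    return {
        "LINEORDER": 6_000_000 * sf,                       # ~6 M rows per SF
        "CUSTOMER": 30_000 * sf,
        "SUPPLIER": 2_000 * sf,
        "PART": 200_000 * math.floor(1 + math.log2(sf)),   # grows logarithmically
    }


if __name__ == "__main__":
    for table, rows in ssb_row_counts(10).items():
        print(f"{table:<10} ~{rows:,}")
```

At SF 10 this puts LINEORDER at about 60 M rows, so the fact table dominates IO by more than an order of magnitude over every dimension table.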

Load generator

Load is generated with Locust from a separate VM on the same LAN to minimise network latency.

Workloads

Scenario 1 — Dashboard Fan-Out

Loading a dashboard with 3–13 charts, each firing its own SQL query. 200 concurrent users, 15 minutes. Each user loads a random dashboard with a 3–5 s think time. Stresses the backend's ability to run many IO-bound operations in parallel.
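Why fan-out stresses a blocking backend can be estimated with a simple scheduling model. The sketch assumes a single dashboard on an otherwise idle backend, with each chart request occupying one worker for its full query latency; it ignores the other users and the fact that PostgreSQL itself slows down as concurrent queries pile up. Function names are hypothetical.

```python
import heapq


def sync_makespan(chart_latencies_s: list[float], workers: int) -> float:
    """Time until the last chart finishes when each request holds one blocking
    worker for its whole latency; requests are taken in arrival order."""
    free_at = [0.0] * workers  # moment each worker becomes idle
    heapq.heapify(free_at)
    finish = 0.0
    for latency in chart_latencies_s:
        start = heapq.heappop(free_at)  # earliest-idle worker takes the request
        end = start + latency
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def async_makespan(chart_latencies_s: list[float]) -> float:
    """If the backend never blocks, all waits overlap and the slowest chart dominates."""
    return max(chart_latencies_s, default=0.0)
```

For a dashboard of 8 charts that each take 5 s, `sync_makespan([5.0] * 8, 4)` gives 10.0 s while `async_makespan([5.0] * 8)` gives 5.0 s; with 200 users loading dashboards at once, the blocking queue grows further.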

Scenario 2 — SQL Lab Interactive Session

A data engineer running SSB queries sequentially via the SQL Lab API, waiting for each result. 50 concurrent users, 10 minutes; each user runs 10 queries drawn from the SSB Query Flights. Measures latency under concurrent access to the analytical DB, as well as the responsiveness of infrastructure endpoints alongside heavy queries.
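A single simulated session might be assembled as below. The endpoint path `/api/v1/sqllab/execute/` and the `database_id`/`sql`/`runAsync` payload fields are assumptions based on upstream Superset's SQL Lab REST API; the host, the seeded sampling without repeats, and `build_session` itself are hypothetical. The sketch only builds the requests and leaves out authentication (the reused session cookie, plus a CSRF token if enabled).

```python
import json
import random
import urllib.request

# The 13 SSB queries across the four Query Flights.
SSB_QUERIES = [
    "Q1.1", "Q1.2", "Q1.3",
    "Q2.1", "Q2.2", "Q2.3",
    "Q3.1", "Q3.2", "Q3.3", "Q3.4",
    "Q4.1", "Q4.2", "Q4.3",
]

EXECUTE_PATH = "/api/v1/sqllab/execute/"  # assumed SQL Lab execute endpoint


def build_session(base_url: str, database_id: int, sql_by_id: dict[str, str],
                  n_queries: int = 10, seed: int = 0):
    """Pick `n_queries` distinct SSB queries and build one POST request for each."""
    rng = random.Random(seed)  # per-user seed keeps a session reproducible
    picked = rng.sample(SSB_QUERIES, n_queries)
    requests = []
    for qid in picked:
        # runAsync=False: the caller waits for the result (assumption).
        body = {"database_id": database_id, "sql": sql_by_id[qid], "runAsync": False}
        requests.append(urllib.request.Request(
            base_url + EXECUTE_PATH,
            data=json.dumps(body).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return picked, requests
```

Sending them sequentially, as the scenario describes, would be a loop of `urllib.request.urlopen(req)` calls, each waiting for its response before issuing the next.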

Scenario 3 — Controlled IO Latency Sweep

Queries a pg_sleep-based virtual dataset at fixed delays (10 ms, 50 ms, 100 ms, 500 ms, 1 s, 5 s). 50 concurrent users, 2 minutes per delay. Removes SQL variability so the only variable is the concurrency model — the clearest demonstration of the async advantage.
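Little's law gives a rough ceiling for this sweep: if at most C requests are in service at once and each takes at least d seconds, throughput cannot exceed C / d. With 4 blocking workers C is 4, while an event loop can in principle keep all 50 users' waits in flight, assuming the database connection pool allows it. Zero think time and negligible overhead are assumed, and the virtual-dataset SQL is a hypothetical shape for a pg_sleep-based query.

```python
USERS = 50
SYNC_WORKERS = 4
DELAYS_S = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0]  # delay steps from the sweep


def sleep_sql(delay_s: float) -> str:
    """Hypothetical virtual-dataset SQL: one row returned after pg_sleep(delay_s)."""
    return f"SELECT {delay_s} AS delay_s, 1 AS value FROM pg_sleep({delay_s})"


def throughput_ceiling(concurrency: int, delay_s: float) -> float:
    """Little's law bound X <= C / d; think time and non-IO overhead are ignored."""
    return concurrency / delay_s


if __name__ == "__main__":
    for d in DELAYS_S:
        blocking = throughput_ceiling(min(USERS, SYNC_WORKERS), d)
        multiplexed = throughput_ceiling(USERS, d)
        print(f"{d:>5} s  blocking <= {blocking:8.1f} RPS  async <= {multiplexed:8.1f} RPS")
```

At d = 1 s this gives 4 RPS for the blocking model versus 50 RPS for the async model. At 10 ms the async bound (5,000 RPS) is far too optimistic, because CPU overhead rather than IO wait becomes the limit.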

Metrics

Six metric groups per scenario:

| Metric | What it measures (source) |
|---|---|
| Throughput (RPS) | Successful requests per second (Locust) |
| Response time | Median, p95 and p99 (Locust); tail latency reveals queueing |
| Error rate | Percentage of 4xx/5xx responses |
| CPU usage | Average and peak backend-container CPU (docker stats) |
| Resident memory (RSS) | Backend process footprint (docker stats) |
| PostgreSQL connections | Active connections via pg_stat_activity; detects connection leaks |
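These metrics can be computed from raw samples roughly as follows. This is a sketch: the nearest-rank percentile is one common definition (Locust buckets response times, so its figures can differ slightly), the CPU helper assumes docker stats `CPUPerc` strings, and the pg_stat_activity query is an assumed form of the connection check. All helpers expect non-empty input.

```python
import math


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate(status_codes: list[int]) -> float:
    """Share of responses with a 4xx/5xx status, in percent."""
    errors = sum(1 for code in status_codes if code >= 400)
    return 100.0 * errors / len(status_codes)


def cpu_avg_peak(cpu_perc: list[str]) -> tuple[float, float]:
    """Average and peak from docker stats CPUPerc strings such as '153.20%'.
    On a 4-vCPU container values can reach 400 %."""
    values = [float(v.rstrip("%")) for v in cpu_perc]
    return sum(values) / len(values), max(values)


# Assumed connection check; dropping the WHERE clause also counts idle
# connections, which is where a leak would accumulate.
ACTIVE_CONNECTIONS_SQL = "SELECT count(*) FROM pg_stat_activity WHERE state = 'active'"
```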

Procedure

Each scenario follows a standardised five-step procedure:

  1. Reset — restart the backend and Redis containers to clear state.
  2. Warm-up — 60 s of minimal load, discarded.
  3. Main run — steady load at the target user count. The nominal length is 5 minutes, but the per-scenario durations above take precedence (15 minutes, 10 minutes, and 2 minutes per delay step).
  4. Capture — export Locust CSVs (stats, stats_history, failures) and docker stats.
  5. Repeat — each scenario is run 3 times to check that results are stable across runs.
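The procedure can be scripted; the sketch below only builds the command lines for one repetition. The Compose service names, host URL, warm-up user count and spawn rates are hypothetical. The `locust` flags (`-f`, `--headless`, `-H`, `-u`, `-r`, `-t`, `--csv`) and `docker compose restart` / `docker stats --no-stream --format` are standard CLI options.

```python
import shlex

# Hypothetical names; adjust to the actual Compose file.
BACKEND_SERVICE = "backend"
REDIS_SERVICE = "redis"
HOST = "http://backend:8088"


def run_plan(locustfile: str, users: int, run_time: str, tag: str,
             repeat: int) -> list[list[str]]:
    """Commands for one repetition of the procedure (built, not executed)."""
    prefix = f"results/{tag}_run{repeat}"
    return [
        # 1. Reset: restart backend and Redis to clear state.
        ["docker", "compose", "restart", BACKEND_SERVICE, REDIS_SERVICE],
        # 2. Warm-up: 60 s of light load; no CSVs are written, so it is discarded.
        ["locust", "-f", locustfile, "--headless", "-H", HOST,
         "-u", "5", "-r", "5", "-t", "60s"],
        # 3. Main run; --csv writes <prefix>_stats.csv, <prefix>_stats_history.csv
        #    and <prefix>_failures.csv, among others.
        ["locust", "-f", locustfile, "--headless", "-H", HOST,
         "-u", str(users), "-r", "10", "-t", run_time, "--csv", prefix],
        # 4. Capture: one docker stats sample; in practice sampled repeatedly during step 3.
        ["docker", "stats", "--no-stream", "--format",
         "{{.Name}},{{.CPUPerc}},{{.MemUsage}}"],
    ]


if __name__ == "__main__":
    # 5. Repeat: three repetitions of, e.g., Scenario 1 (200 users, 15 minutes).
    for i in range(1, 4):
        for cmd in run_plan("scenario1_fanout.py", 200, "15m", "fanout", i):
            print(shlex.join(cmd))
```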

Comparability

Both backends are deployed:

  • In the same Docker Compose stack — only the backend image is swapped
  • With identical PostgreSQL, Redis and Celery — unchanged between runs
  • With the same dataset and dashboard fixtures — bootstrapped from the same SSB snapshot
  • With the same authentication path — itsdangerous-signed session cookies issued once and reused

Reproducing

The exact Locust scripts, infrastructure manifests and raw CSV results are kept alongside the diploma testing report. Each result on the Results page corresponds to one of the three scenarios above.
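For anyone re-deriving the headline numbers from the raw CSVs, a minimal parser for a Locust `*_stats.csv` might look like this. The column names (`Name`, `Request Count`, `Failure Count`, `Requests/s`, `Failures/s`, `50%`, `95%`, `99%`) are assumed from Locust 2.x output. Successful throughput is taken as `Requests/s` minus `Failures/s` to match the definition in the metrics table.

```python
import csv
import io


def summarize_locust_stats(csv_text: str) -> dict[str, float]:
    """Headline numbers from the 'Aggregated' row of a Locust *_stats.csv."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Name"] == "Aggregated":
            requests = int(row["Request Count"])
            failures = int(row["Failure Count"])
            return {
                "success_rps": float(row["Requests/s"]) - float(row["Failures/s"]),
                "p50_ms": float(row["50%"]),
                "p95_ms": float(row["95%"]),
                "p99_ms": float(row["99%"]),
                "error_rate_pct": 100.0 * failures / requests if requests else 0.0,
            }
    raise ValueError("no Aggregated row found")
```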