FAQ

How does Liteset differ from Apache Superset?

Liteset is an async port of Apache Superset 6.0.0, rewritten from Flask/WSGI onto Litestar/ASGI. Business logic and user-visible behaviour are identical — only the web-layer implementation differs. See the comparison page for the full breakdown.

Can I drop in Liteset on top of an existing Apache Superset install?

Yes. Liteset guarantees compatibility at the level of:

the metadata DB schema (same Alembic revisions),
the REST API (same routes, response shapes, field names),
Flask-signed session cookies and CSRF tokens,
the SPA frontend (frontend code is not modified),
Celery workers (used as-is).

Stop Apache Superset, start Liteset against the same database — users keep working without re-login.

Where do I find the testing report?

The full methodology, workloads and benchmark results live under Benchmarks.

How big of a dataset can Liteset handle?

Liteset can work with even gigantic databases! Liteset acts as a thin layer above your underlying databases or data engines, which do all the processing. Liteset simply visualizes the results of the query.

The key to achieving acceptable performance in Liteset is whether your database can execute queries and return results at a speed that is acceptable to your users. If you experience slow performance with Liteset, benchmark and tune your data warehouse.

What are the computing specifications required to run Liteset?

The specs of your Liteset installation depend on how many users you have and what their activity is, not on the size of your data. Superset admins in the community have reported 8 GB of RAM and 2 vCPUs as adequate to run a moderately sized instance; Liteset's async stack typically lowers the resident-memory floor versus a Gunicorn pre-fork model. To develop Liteset, e.g., compile code or build images, you may need more power.

Monitor your resource usage and increase or decrease as needed. Note that Liteset usage has a tendency to occur in spikes, e.g., if everyone in a meeting loads the same dashboard at once.

Liteset's application metadata does not require a very large database to store it, though the log file grows over time.

Can I join / query multiple tables at one time?

Not in the Explore or Visualization UI. A Liteset SQLAlchemy datasource can only be a single table or a view.

When working with tables, the solution would be to create a table that contains all the fields needed for your analysis, most likely through some scheduled batch process.

A view is a simple logical layer that abstracts an arbitrary SQL query as a virtual table. This allows you to join and union multiple tables and to apply transformations using arbitrary SQL expressions. The limitation is your database's performance, since Liteset effectively runs a query on top of your query (the view). A good practice is to join your main large table only to one or more small tables, and to avoid GROUP BY in the view where possible: Liteset does its own GROUP BY, and doing the work twice can slow queries down.

Whether you use a table or a view, performance depends on how fast your database can deliver the result to users interacting with Liteset.

However, if you are using SQL Lab, there is no such limitation. You can write SQL queries to join multiple tables as long as your database account has access to the tables.
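The view approach described above can be sketched with Python's built-in sqlite3 module. The table and column names (orders, regions, orders_enriched) are hypothetical, and the final query is only an approximation of the kind of aggregate a chart issues on top of a view.

```python
import sqlite3

# Hypothetical tables: one large fact table and one small lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region_id INTEGER, amount REAL);
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO regions VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);

    -- The view only joins; it deliberately contains no GROUP BY.
    CREATE VIEW orders_enriched AS
    SELECT o.id, o.amount, r.name AS region
    FROM orders AS o
    JOIN regions AS r ON r.id = o.region_id;
    """
)

# Roughly the kind of aggregate a chart would run on top of the view.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders_enriched "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rows)
```

Because the view stays a plain join, the grouping happens only once, in the outer query.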

How do I create my own visualization?

We recommend reading the instructions in Creating Visualization Plugins.

Can I upload and visualize CSV data?

Absolutely! Read the instructions here to learn how to enable and use CSV upload.

Why are my queries timing out?

There are many possible causes for why a long-running query might time out.

When you run a long query from SQL Lab, Liteset by default lets it run for up to 6 hours before Celery kills it. To change this limit, set the timeout in your configuration. For example:

SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 60 * 6

If you see timeouts (504 Gateway Time-out) when loading a dashboard or an Explore slice, you are probably behind a gateway or proxy server (such as Nginx). If the gateway does not receive a timely response from the Liteset server (which is busy processing long queries), it sends a 504 status code to the client directly. Liteset has a client-side timeout limit to address this issue. If a query does not come back within the client-side timeout (60 seconds by default), Liteset displays a warning message instead of letting the gateway timeout message appear. If your gateway has a longer timeout limit, you can change the timeout setting in superset_config.py:

SUPERSET_WEBSERVER_TIMEOUT = 60

Why is the map not visible in the geospatial visualization?

You need to register a free account at Mapbox.com, obtain an API key, and add it to .env at the key MAPBOX_API_KEY:

MAPBOX_API_KEY = "longstringofalphanumer1c"

How to limit the timed refresh on a dashboard?

By default, the dashboard timed refresh feature automatically re-queries every slice on a dashboard according to a set schedule. Sometimes, however, you won't want all of the slices to be refreshed, especially if some data is slow-moving or some slices run heavy queries. To exclude specific slices from the timed refresh process, add the timed_refresh_immune_slices key to the dashboard JSON Metadata field:

{
    "filter_immune_slices": [],
    "expanded_slices": {},
    "filter_immune_slice_fields": {},
    "timed_refresh_immune_slices": [324]
}

In the example above, if a timed refresh is set for the dashboard, then every slice except 324 will be automatically re-queried on schedule.

Slice refresh is also staggered over the specified period. You can turn off this staggering by setting stagger_refresh to false, and change the stagger period by setting stagger_time to a value in milliseconds, both in the JSON Metadata field:

{
    "stagger_refresh": false,
    "stagger_time": 2500
}

Here, the entire dashboard will refresh at once if periodic refresh is on. The stagger time of 2.5 seconds is ignored.
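If you maintain many dashboards, the JSON Metadata edit for timed refresh can be prepared in a script. The following is a minimal sketch using only the json module. The helper name exclude_from_timed_refresh and the slice ID 512 are hypothetical, and the result still has to be pasted or otherwise saved into the dashboard's JSON Metadata field.

```python
import json


def exclude_from_timed_refresh(json_metadata: str, slice_ids: list[int]) -> str:
    """Return JSON metadata with slice_ids added to timed_refresh_immune_slices."""
    # An empty field is treated as an empty metadata object.
    metadata = json.loads(json_metadata) if json_metadata.strip() else {}
    immune = set(metadata.get("timed_refresh_immune_slices", []))
    immune.update(slice_ids)
    # Sorting keeps the output stable and removes duplicates.
    metadata["timed_refresh_immune_slices"] = sorted(immune)
    return json.dumps(metadata, indent=4)


current = (
    '{"filter_immune_slices": [], "expanded_slices": {}, '
    '"timed_refresh_immune_slices": [324]}'
)
updated = exclude_from_timed_refresh(current, [512, 324])
print(updated)
```

All other keys in the metadata are left untouched, so the helper can be applied to a field that already holds other settings.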

Why does Liteset freeze/hang/not respond when started (my home directory is NFS mounted)?

By default, Liteset creates and uses an SQLite database at ~/.superset/superset.db. SQLite is known not to work well on NFS because of NFS's broken file-locking implementation.

You can override this path using the SUPERSET_HOME environment variable.

Another workaround is to change where Liteset stores the SQLite database by adding the following to superset_config.py:

SQLALCHEMY_DATABASE_URI = 'sqlite:////new/location/superset.db?check_same_thread=false'
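If the directory is decided at deploy time, the same URI can be assembled in superset_config.py rather than hard-coded. This is a minimal sketch, assuming a POSIX path on a local (non-NFS) disk. The helper name sqlite_metadata_uri is hypothetical.

```python
import os


def sqlite_metadata_uri(data_dir: str) -> str:
    """Build a SQLite URI for superset.db inside data_dir (a local, non-NFS path)."""
    db_path = os.path.join(data_dir, "superset.db")
    # Three slashes introduce the path; an absolute path adds the fourth.
    return f"sqlite:///{db_path}?check_same_thread=false"


SQLALCHEMY_DATABASE_URI = sqlite_metadata_uri("/new/location")
print(SQLALCHEMY_DATABASE_URI)
```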

You can read more about customizing Liteset using the configuration file here.

What if the table schema changed?

Table schemas evolve, and Liteset needs to reflect that. It's pretty common in the life cycle of a dashboard to want to add a new dimension or metric. To get Liteset to discover your new columns, go to Data -> Datasets, click the edit icon next to the dataset whose schema has changed, and hit Sync columns from source on the Columns tab. Behind the scenes, the new columns get merged. Afterwards, you may want to edit the dataset again to configure the new columns on the Columns tab, check the appropriate boxes, and save.

What database engine can I use as a backend for Liteset?

To clarify, the database backend is an OLTP database used by Liteset to store its internal information like your list of users and dashboard definitions. While Liteset supports a variety of databases as data sources, only a few database engines are supported for use as the OLTP backend / metadata store.

Liteset targets PostgreSQL as its primary metadata backend (the async runtime uses asyncpg). SQLite is supported for local development. MySQL is inherited from the Apache Superset surface but is not exercised by the Liteset async driver matrix. It has been reported that Microsoft SQL Server does not work as a Superset backend. Column-store, non-OLTP databases are not designed for this type of workload.

How can I configure OAuth authentication and authorization?

Liteset ports Flask-AppBuilder's auth model to AsyncSecurityManager, so OAuth provider configuration keys (OAUTH_PROVIDERS, AUTH_TYPE = AUTH_OAUTH, AUTH_USER_REGISTRATION, AUTH_USER_REGISTRATION_ROLE, etc.) are honoured as-is in superset_config.py. See the OAuth section of Configuring Liteset for the full list of supported settings. The original Flask-AppBuilder configuration example remains a useful reference for the provider dictionary shape.
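As an illustration of the provider dictionary shape, here is a sketch of a superset_config.py fragment modelled on the Google example from the Flask-AppBuilder documentation. It rests on several assumptions: the environment variable names are hypothetical, the role name is only an example, and AUTH_OAUTH is defined locally (its value in Flask-AppBuilder is 4) because this page does not say which module Liteset exposes the constant from.

```python
import os

# In Flask-AppBuilder, AUTH_OAUTH is the integer constant 4. It is defined
# locally here as an assumption; import it from the appropriate module if
# your installation provides one.
AUTH_OAUTH = 4

AUTH_TYPE = AUTH_OAUTH
AUTH_USER_REGISTRATION = True          # create a user record on first login
AUTH_USER_REGISTRATION_ROLE = "Gamma"  # example role for newly created users

OAUTH_PROVIDERS = [
    {
        "name": "google",
        "icon": "fa-google",
        "token_key": "access_token",
        "remote_app": {
            # Hypothetical environment variable names; keep secrets out of the file.
            "client_id": os.environ.get("GOOGLE_CLIENT_ID", ""),
            "client_secret": os.environ.get("GOOGLE_CLIENT_SECRET", ""),
            "api_base_url": "https://www.googleapis.com/oauth2/v2/",
            "client_kwargs": {"scope": "email profile"},
            "access_token_url": "https://accounts.google.com/o/oauth2/token",
            "authorize_url": "https://accounts.google.com/o/oauth2/auth",
        },
    }
]
```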

Is there a way to force the dashboard to use specific colors?

It is possible on a per-dashboard basis by providing a mapping of labels to colors in the JSON Metadata attribute using the label_colors key. You can use either the full hex color, a named color, like red, coral or lightblue, or the index in the current color palette (0 for first color, 1 for second etc). Example:

{
    "label_colors": {
        "foo": "#FF69B4",
        "bar": "lightblue",
        "baz": 0
    }
}

Does Liteset work with [insert database engine here]?

The Connecting to Databases section provides the best overview for supported databases. Database engines not listed on that page may work too. We rely on the community to contribute to this knowledge base.

For a database engine to be supported in Liteset through the SQLAlchemy connector, it needs a Python-compliant SQLAlchemy dialect and a DBAPI driver. Databases that have limited SQL support may work as well. For instance, it's possible to connect to Druid through the SQLAlchemy connector even though Druid does not support joins and subqueries. Another key element of database support is the Liteset Database Engine Specification interface. This interface allows for defining database-specific configurations and logic that go beyond the SQLAlchemy and DBAPI scope. This includes features like:

date-related SQL functions that allow Liteset to fetch different time granularities when running time-series queries
whether the engine supports subqueries. If false, Liteset may run 2-phase queries to compensate for the limitation
methods around processing logs and inferring the percentage of completion of a query
technicalities as to how to handle cursors and connections if the driver is not standard DBAPI

Liteset also ships a set of native async engine specs (postgres.py via asyncpg, mysql.py via asyncmy, clickhouse.py via aiochclient, trino.py via aiotrino) plus a sync_fallback.py wrapper that runs any sync SQLAlchemy dialect on a thread executor.
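The general idea of running a synchronous driver on a thread executor can be illustrated with the standard library alone. This is a sketch of the technique, not the contents of sync_fallback.py: sqlite3 stands in for an arbitrary blocking DBAPI driver, and the function names are hypothetical.

```python
import asyncio
import sqlite3


def run_query_sync(sql: str) -> list[tuple]:
    """Blocking query through a synchronous DBAPI driver (sqlite3 as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


async def run_query(sql: str) -> list[tuple]:
    """Run the blocking call on a worker thread so the event loop stays free."""
    return await asyncio.to_thread(run_query_sync, sql)


async def main() -> list[list[tuple]]:
    # Both queries are handed to worker threads and awaited concurrently.
    return await asyncio.gather(run_query("SELECT 1"), run_query("SELECT 2"))


results = asyncio.run(main())
print(results)
```

Each connection is opened and closed inside the worker thread that uses it, which avoids sharing a synchronous connection across threads.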

Beyond the SQLAlchemy connector, it’s also possible, though much more involved, to extend Liteset and write your own connector. The only example of this at the moment is the Druid connector, which is getting superseded by Druid’s growing SQL support and the recent availability of a DBAPI and SQLAlchemy driver. If the database you are considering integrating has any kind of SQL support, it’s probably preferable to go the SQLAlchemy route. Note that for a native connector to be possible the database needs to have support for running OLAP-type queries and should be able to do things that are typical in basic SQL:

aggregate data
apply filters
apply HAVING-type filters
be schema-aware, expose columns and types

Does Liteset offer a public API?

Yes. Liteset offers a public REST API, and the formal surface of that API is expanding steadily. You can read more about this API and interact with it using Swagger here.

Some of the vision for the collection of endpoints under /api/v1 was originally specified in upstream SIP-17; Liteset reproduces that contract 1:1 across its 37+ async controllers.

The API is documented using Swagger and is auto-generated by Litestar from the controller handlers and msgspec DTOs. It is exposed at /swagger/v1.
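A client session against /api/v1 might look like the sketch below, which uses only urllib from the standard library. The endpoint paths (/api/v1/security/login, /api/v1/dashboard/), the "db" provider value, and the access_token and count response fields are taken from the upstream Apache Superset API and are assumed to carry over unchanged. The base URL and credentials are placeholders.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8088"  # placeholder host and port


def login_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST request that exchanges credentials for an access token."""
    body = json.dumps(
        {"username": username, "password": password, "provider": "db", "refresh": True}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/security/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_dashboards_request(access_token: str) -> urllib.request.Request:
    """Build the GET request that lists dashboards using a bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/dashboard/",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def main() -> None:
    """Log in, then print how many dashboards the user can see."""
    token = fetch_json(login_request("admin", "admin"))["access_token"]
    dashboards = fetch_json(list_dashboards_request(token))
    print(dashboards["count"])
```

Calling main() requires a running instance at BASE_URL; the two request builders can be inspected without any network access.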

There are other undocumented [private] ways to interact with Liteset programmatically that offer no guarantees and are not recommended but may fit your use case temporarily:

using the ORM (SQLAlchemy 2.0 async) directly
altering the source code in your fork

How can I see usage statistics (e.g., monthly active users)?

This functionality is not included with Liteset, but you can extract and analyze Liteset's application metadata to see what actions have occurred. By default, user activities are logged in the logs table in Liteset's metadata database. One company has published a write-up of how they analyzed Superset usage, including example queries.
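As a rough illustration, monthly active users can be derived from the logs table with a single aggregate query. The sketch below runs against an in-memory SQLite stand-in with made-up rows. The column names user_id and dttm follow the upstream Superset logs table and are an assumption here, so check them against your own metadata database first.

```python
import sqlite3

# Stand-in for the metadata database, with a simplified logs table.
# Column names user_id and dttm are assumed from upstream Superset.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE logs (id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT, dttm TEXT);
    INSERT INTO logs (user_id, action, dttm) VALUES
        (1, 'dashboard', '2024-01-03 09:00:00'),
        (1, 'explore',   '2024-01-15 10:30:00'),
        (2, 'dashboard', '2024-01-20 14:00:00'),
        (2, 'dashboard', '2024-02-02 08:15:00');
    """
)

# Count distinct users per calendar month.
monthly_active_users = conn.execute(
    """
    SELECT strftime('%Y-%m', dttm) AS month, COUNT(DISTINCT user_id) AS active_users
    FROM logs
    WHERE user_id IS NOT NULL
    GROUP BY month
    ORDER BY month
    """
).fetchall()
print(monthly_active_users)
```

On a PostgreSQL metadata database, the month bucket would typically be computed with date_trunc('month', dttm) instead of strftime.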

What does Hours Offset in the Edit Dataset view do?

In the Edit Dataset view, you can specify a time offset. This field lets you configure the number of hours to be added or subtracted from the time column. This can be used, for example, to convert UTC time to local time.
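The effect of the offset is plain timestamp arithmetic, sketched here with the standard library. The function name is hypothetical and the offset of -5 hours is only an example.

```python
from datetime import datetime, timedelta


def apply_hours_offset(value: datetime, hours_offset: int) -> datetime:
    """Shift a timestamp by a whole number of hours (negative values subtract)."""
    return value + timedelta(hours=hours_offset)


utc_value = datetime(2024, 3, 1, 12, 0, 0)
local_value = apply_hours_offset(utc_value, -5)
print(local_value)  # 2024-03-01 07:00:00
```

A fixed offset like this does not follow daylight saving time changes, so it suits time zones with a constant offset from UTC.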

Does Liteset collect any telemetry data?

No. Liteset does not ship a telemetry pixel, does not call out to Scarf, and does not publish images to the apache/superset registry. Container images built from this repository are push-only to the operator's own registry and contain no third-party analytics hooks.

Does Liteset have an archive panel or trash bin from which a user can recover deleted assets?

No. Currently, there is no way to recover a deleted Liteset dashboard/chart/dataset/database from the UI. The upstream Apache Superset project has an ongoing discussion about implementing such a feature.

Hence, it is recommended to take periodic backups of the metadata database. For recovery, you can launch a recovery instance of a Liteset server with the backed-up copy of the DB attached and use the Export Dashboard button in the Liteset UI (or the superset export-dashboards CLI command). Then, take the .zip file and import it into the current Liteset instance.

Alternatively, you can programmatically take regular exports of the assets as a backup.
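One way to script such exports is to wrap the CLI command mentioned above and call it from a scheduler. The sketch below assumes that the superset executable is on the PATH and that export-dashboards accepts -f for the output file, as in the upstream Apache Superset CLI. The backup directory and function names are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def export_command(backup_dir: Path, now: datetime) -> list[str]:
    """Build the CLI call that writes a timestamped dashboard export."""
    target = backup_dir / f"dashboards_{now:%Y%m%dT%H%M%SZ}.zip"
    # The -f flag is assumed from the upstream Superset CLI.
    return ["superset", "export-dashboards", "-f", str(target)]


def run_backup(backup_dir: Path) -> None:
    """Create the backup directory if needed and run one export."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    command = export_command(backup_dir, datetime.now(timezone.utc))
    # check=True raises CalledProcessError if the export fails.
    subprocess.run(command, check=True)
```

Running run_backup(Path("/var/backups/liteset")) from cron or a similar scheduler would then produce one archive per run.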

I ran a security scan of the Liteset container image and it showed dozens of "high" and "critical" vulnerabilities! Can you release a version of Liteset without these?

You are talking about dependency CVEs: identified vulnerabilities in software that Liteset uses. Most of these CVEs are in the Linux kernel or Python, both of which have many other people working on their security.

Liteset inherits its CVE posture from Apache Superset 6.0.0 plus the additional async stack (Litestar, asyncpg, uvloop). Dependencies are regularly bumped to newer versions, and pull requests that fix dependency CVEs are welcome.

For vulnerabilities in Liteset itself, see the Liteset CVEs page for the list of tracked issues and reporting instructions. Inherited Apache Superset CVEs are also tracked at the upstream Apache Superset CVEs page.
