Denormalize Data #1
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Was debugging some issues while implementing ScyllaDB support and received a warning that the query I was using (based off on the original Fake DB queries) was not efficient. GH Copilot offered a suggestion that seemed fair. Note that it is mentioning the users queries, this could be applied elsewhere as well.
The warning occurs because ScyllaDB (and Cassandra) requires that queries use indexed columns or primary key components for efficient lookups. When you query by a non-primary key column (like
nameindbGetUserByName), ScyllaDB needs to scan the entire table, which can lead to unpredictable performance. By default, ScyllaDB blocks such queries unless you explicitly allow them usingALLOW FILTERING.Options to Resolve the Warning
1. Add
ALLOW FILTERING(Quick Fix)You can append
ALLOW FILTERINGto the query to bypass the restriction. However, this is not recommended for production use because it can lead to performance issues on large datasets.When to Use: Only use this approach if the dataset is small or if you are certain that the query will not cause performance issues.
2. Create a Secondary Index
If you frequently query by
name, you can create a secondary index on thenamecolumn. This allows ScyllaDB to efficiently look up rows byname.CQL Command to Create the Index:
Updated Query:
Once the index is created, you can use the same query without
ALLOW FILTERING:When to Use: Use this approach if querying by
nameis a common operation and the dataset is large. Secondary indexes work well for low-cardinality columns (e.g.,name).3. Denormalize the Data
Another approach is to create a separate table specifically for querying by
name. This is a common pattern in ScyllaDB and Cassandra, where data is denormalized to optimize for specific query patterns.CQL Schema:
Insert Data into Both Tables:
When adding a user, insert the data into both the
userstable and theusers_by_nametable.Query by Name:
Now you can query the
users_by_nametable efficiently:When to Use: Use this approach if querying by
nameis critical and you want to avoid the overhead of secondary indexes.Recommendation
ALLOW FILTERINGas a quick fix.Denormalization is the most scalable and performant solution for large datasets, as it aligns with ScyllaDB's design principles.
Fixed in latest commit, ended up just doing a secondary index