Scaling multi-tenant Go applications: Choosing the right database partitioning approach

Multi-tenant applications face a fundamental challenge: how do you efficiently store and query data for tenants of vastly different sizes? Consider the typical scenario where your platform serves both enterprise clients with hundreds of thousands of users and small businesses with just a handful. With traditional database partitioning strategies, you are likely to run into these common issues:

- Partition imbalance: Large tenants create oversized partitions while small tenants waste allocated resources
- Hot partitions: High-activity tenants overwhelm individual database partitions, creating performance bottlenecks
- Inefficient queries: User-specific lookups require scanning entire tenant datasets
- Resource contention: Mixed workloads compete for the same database resources

Azure Cosmos DB has been a go-to solution for multi-tenant applications due to its global distribution, automatic scaling, and flexible data models. Its partition-based architecture naturally aligns with tenant isolation requirements, making it attractive for SaaS platforms, IoT applications, and content management systems.

However, even with these capabilities, the fundamental multi-tenant partitioning challenges persist. This blog post examines how these issues might manifest and explores an approach to solving multi-tenant scaling challenges in Go applications using Azure Cosmos DB. You'll learn how to implement this using the Go SDK for Azure Cosmos DB, focusing on how to achieve efficient data distribution and query performance.

[alert type="note" heading=""]Check the GitHub repository for the code examples used in this blog[/alert]

Challenges with a multi-tenant SaaS solution

Imagine you're building a multi-tenant SaaS platform that manages user sessions and activities across different organizations using Azure Cosmos DB. In such a setup, tenant variability is a significant challenge.
Enterprise clients may have over 50,000 users generating millions of session events, while small businesses might only have 10 to 50 users with minimal activity. Mid-market companies typically fall in between, with 500 to 5,000 users and moderate usage. This wide range of tenant sizes and activity levels creates unique challenges for data partitioning and resource allocation in the database.

This is how you might define your user session data model using a single partition key:

    type UserSession struct {
        ID        string    `json:"id"`
        TenantID  string    `json:"tenantId"` // Single partition key
        UserID    string    `json:"userId"`
        SessionID string    `json:"sessionId"`
        Activity  string    `json:"activity"`
        Timestamp time.Time `json:"timestamp"`
    }

This approach has several challenges. First, partition sizes become imbalanced: enterprise tenants push against the 20 GB logical partition limit while small tenants use minimal storage, resulting in uneven resource utilization across physical partitions. Second, hot partition bottlenecks can develop when large tenants reach the 10,000 RU/s physical partition limit during peak usage periods. Third, user queries become inefficient because looking up individual user sessions requires scanning entire tenant partitions, which consumes unnecessary Request Units. Finally, cross-tenant analytics suffer, as queries spanning multiple tenants become expensive cross-partition operations.

Hierarchical Partition Keys to the rescue

Hierarchical partition keys (HPKs) implement subpartitioning: you can define up to three levels of partition key hierarchy. This leads to better data distribution and query routing compared to traditional single-level partitioning.
Instead of forcing all tenant data into a single partition boundary, you can create logical subdivisions that align with your actual access patterns.

Mapping this to the multi-tenant challenges described earlier, hierarchical partition keys allow you to define a three-level partitioning scheme:

- Level 1: Primary partition key (e.g., tenantId) - provides tenant isolation
- Level 2: Secondary partition key (e.g., userId) - distributes data within tenants
- Level 3: Tertiary partition key (e.g., sessionId) - provides fine-grained distribution

This creates a logical partition path like ["Enterprise-Corp", "user-1001", "session-abc123"] instead of just ["Enterprise-Corp"]. Large tenants can be subdivided by user and session, which helps avoid hot partitions. Instead of one massive "Enterprise-Corp" partition, you get manageable partitions like ["Enterprise-Corp", "user-1001", "session-abc123"], ["Enterprise-Corp", "user-1002", "session-def456"], and so on.

Now, we can refactor the user session data model as follows:

    type UserSession struct {
        ID        string    `json:"id"`
        TenantID  string    `json:"tenantId"`  // Level 1: Tenant isolation
        UserID    string    `json:"userId"`    // Level 2: User distribution
        SessionID string    `json:"sessionId"` // Level 3: Session granularity
        Activity  string    `json:"activity"`
        Timestamp time.Time `json:"timestamp"`
    }

Your queries can now be efficiently routed to only the subset of physical partitions that contain the relevant data. Specifying the full or partial partition key path avoids a cross-partition query across all the partitions, which is a common problem with single partition keys.
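As a rough mental model of this routing behavior, the sketch below classifies a lookup by how many leading levels of the hierarchical key it supplies. It is not SDK code and does not reflect how the service is implemented internally. The names `Routing`, `prefixLen`, and `RoutingFor` are hypothetical, and using an empty string to mean "level not supplied" is an assumption made purely for illustration.

```go
package main

import "fmt"

// Routing is a hypothetical label for how widely a lookup is spread.
type Routing string

const (
	SinglePartition Routing = "single-partition"
	Targeted        Routing = "targeted cross-partition"
	FanOut          Routing = "fan-out"
)

// prefixLen counts how many leading key levels are supplied.
// Assumption: an empty string means the level was not supplied.
// Counting stops at the first gap, because only a contiguous
// prefix starting at level 1 can narrow the routing.
func prefixLen(levels ...string) int {
	n := 0
	for _, level := range levels {
		if level == "" {
			break
		}
		n++
	}
	return n
}

// RoutingFor classifies a lookup on (tenantId, userId, sessionId).
func RoutingFor(tenantID, userID, sessionID string) Routing {
	switch prefixLen(tenantID, userID, sessionID) {
	case 3:
		return SinglePartition // full key path
	case 0:
		return FanOut // no usable prefix, every partition is consulted
	default:
		return Targeted // partial prefix, a subset of partitions
	}
}

func main() {
	fmt.Println(RoutingFor("Enterprise-Corp", "user-1001", "session-abc123"))
	fmt.Println(RoutingFor("Enterprise-Corp", "user-1001", ""))
	fmt.Println(RoutingFor("Enterprise-Corp", "", ""))
	fmt.Println(RoutingFor("", "user-1001", ""))
}
```

In this model, supplying tenantId and sessionId without userId counts only as a one-level prefix, because the supplied levels must be contiguous from the top of the hierarchy.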
The query filter determines how narrowly a request is routed:

- Session details: WHERE tenantId = 'Enterprise-Corp' AND userId = 'user-1001' AND sessionId = 'session-abc123' provides single-partition access
- User-specific queries: WHERE tenantId = 'Enterprise-Corp' AND userId = 'user-1001' pinpoints exact data location
- Tenant-wide queries: WHERE tenantId = 'Enterprise-Corp' only targets relevant partitions

Each logical partition (tenant-user-session combination) can scale independently, allowing tenant data to exceed the traditional 20GB limit and maintain optimal performance. Targeted queries consume fewer Request Units by avoiding unnecessary cross-partition scans, directly reducing operational expenses.

Hierarchical Partition Keys in action with the Go SDK for Azure Cosmos DB

To explore the concepts, we will use a Go application that loads sample user session data into Azure Cosmos DB and queries it using the hierarchical partition keys.

Load data into Azure Cosmos DB

Run the loader to populate the database with sample data that uses hierarchical partition keys. It's a CLI application that generates user session data for users in different tenant types (Enterprise, Mid-market, Small business) and inserts it into the Azure Cosmos DB container.

Clone the repository and change into the load directory:

    git clone https://github.com/abhirockzz/cosmosdb-go-hierarchical-partition-keys
    cd cosmosdb-go-hierarchical-partition-keys/load

Build the data loader application and run it.
The database and container will be created automatically if they do not exist.

    go build -o data-loader main.go
    ./data-loader -rows 100 -database -container -endpoint "https://your-account.documents.azure.com:443/"

Here is the code snippet demonstrating how the container is created with hierarchical partition keys:

    //...
    partitionKeyDef := azcosmos.PartitionKeyDefinition{
        Kind:    azcosmos.PartitionKeyKindMultiHash,
        Version: 2, // Version 2 is required for hierarchical partition keys
        Paths: []string{
            "/tenantId",  // Level 1: Tenant isolation
            "/userId",    // Level 2: User distribution
            "/sessionId", // Level 3: Session granularity
        },
    }

    // Create container properties
    containerProperties := azcosmos.ContainerProperties{
        ID:                     containerName,
        PartitionKeyDefinition: partitionKeyDef,
    }

    // Create container with 400 RU/s throughput
    throughputProperties := azcosmos.NewManualThroughputProperties(400)
    _, err = databaseClient.CreateContainer(ctx, containerProperties, &azcosmos.CreateContainerOptions{
        ThroughputProperties: &throughputProperties,
    })
    //....

UpsertItem is used to add data:

    // Create hierarchical partition key (TenantID, UserID, SessionID)
    partitionKey := azcosmos.NewPartitionKeyString(session.TenantID).
        AppendString(session.UserID).
        AppendString(session.SessionID)

    // Insert the record using UpsertItem (insert or update if exists)
    _, err = containerClient.UpsertItem(ctx, partitionKey, sessionJSON, nil)

Let's dive into the queries that demonstrate how to retrieve data using hierarchical partition keys.

Query patterns

Let's examine how different query patterns perform with hierarchical partition keys. To execute these examples, you can comment out the relevant sections in the main function of the query/main.go file, set the required environment variables, and run the application.

    export COSMOS_DB_ENDPOINT=https://your-account.documents.azure.com:443/
    export COSMOS_DB_DATABASE_NAME=
    export COSMOS_DB_CONTAINER_NAME=

    cd cosmosdb-go-hierarchical-partition-keys/query
    go run main.go

1. Point Read

This is the most efficient way to fetch data: you retrieve a single item using its unique ID and full partition key path. This avoids any cross-partition overhead.

Take a look at the executePointRead function that performs a point read operation:

    func executePointRead(id, tenantId, userId, sessionId string) {
        // Create a partition key using the full partition key values
        pk := azcosmos.NewPartitionKeyString(tenantId).AppendString(userId).AppendString(sessionId)

        // Perform a point read operation
        resp, err := container.ReadItem(context.Background(), pk, id, nil)
        if err != nil {
            log.Fatalf("Failed to read item: %v", err)
        }

        var queryResult QueryResult
        if err := json.Unmarshal(resp.Value, &queryResult); err != nil {
            log.Fatalf("Failed to unmarshal item: %v", err)
        }
        //.....
    }

2. Session-Specific Data

This query is routed to the single logical and physical partition that contains the data for the specified values of tenantId, userId, and sessionId.

    func queryWithFullPartitionKey(tenantID, userID, sessionID string) {
        query := "SELECT * FROM c WHERE c.tenantId = @tenantId AND c.userId = @userId AND c.sessionId = @sessionId"

        // The full key path pins the query to one logical partition
        pkFull := azcosmos.NewPartitionKeyString(tenantID).AppendString(userID).AppendString(sessionID)

        pager := container.NewQueryItemsPager(query, pkFull, &azcosmos.QueryOptions{
            QueryParameters: []azcosmos.QueryParameter{
                {Name: "@tenantId", Value: tenantID},
                {Name: "@userId", Value: userID},
                {Name: "@sessionId", Value: sessionID},
            },
        })

        for pager.More() {
            page, err := pager.NextPage(context.Background())
            if err != nil {
                log.Fatal(err)
            }
            for _, item := range page.Items {
                var queryResult QueryResult
                if err := json.Unmarshal(item, &queryResult); err != nil {
                    log.Fatal(err)
                }
                // log the results
            }
        }
    }

3. User-Specific Data (Targeted Cross-Partition)

This is a targeted cross-partition query that returns data for a specific user in the tenant. It is routed to the specific subset of logical and physical partitions that contain data for the specified values of tenantId and userId.

    func queryWithTenantAndUserID(tenantID, userID string) {
        query := "SELECT * FROM c WHERE c.tenantId = @tenantId AND c.userId = @userId"

        // since we don't have the full partition key, we use an empty partition key
        emptyPartitionKey := azcosmos.NewPartitionKey()

        pager := container.NewQueryItemsPager(query, emptyPartitionKey, &azcosmos.QueryOptions{
            QueryParameters: []azcosmos.QueryParameter{
                {Name: "@tenantId", Value: tenantID},
                {Name: "@userId", Value: userID},
            },
        })

        for pager.More() {
            page, err := pager.NextPage(context.Background())
            if err != nil {
                log.Fatal(err)
            }
            fmt.Println("==========================================")
            for _, item := range page.Items {
                var queryResult QueryResult
                if err := json.Unmarshal(item, &queryResult); err != nil {
                    log.Fatal(err)
                }
                // log the results
            }
        }
    }

4. Tenant-Wide Data (Efficient Cross-Partition)

This is a targeted cross-partition query that returns data for all users in a tenant. It is routed to the specific subset of logical and physical partitions that contain data for the specified value of tenantId.

The queryWithSinglePKParameter function lets you query with a single partition key parameter, which can be tenantId, userId, or sessionId. For the tenant-wide case, pass tenantId.

    func queryWithSinglePKParameter(paramType, paramValue string) {
        // Only allow known field names, since paramType is interpolated into the query text
        if paramType != "tenantId" && paramType != "userId" && paramType != "sessionId" {
            log.Fatalf("Invalid parameter type: %s", paramType)
        }

        query := fmt.Sprintf("SELECT * FROM c WHERE c.%s = @param", paramType)
        emptyPartitionKey := azcosmos.NewPartitionKey()

        pager := container.NewQueryItemsPager(query, emptyPartitionKey, &azcosmos.QueryOptions{
            QueryParameters: []azcosmos.QueryParameter{
                {Name: "@param", Value: paramValue},
            },
        })

        for pager.More() {
            page, err := pager.NextPage(context.Background())
            if err != nil {
                log.Fatal(err)
            }
            fmt.Printf("Results for %s: %s\n", paramType, paramValue)
            fmt.Println("==========================================")
            for _, item := range page.Items {
                var queryResult QueryResult
                if err := json.Unmarshal(item, &queryResult); err != nil {
                    log.Fatal(err)
                }
                // log the results
            }
        }
    }

5. User or Session Across All Tenants (Fan-Out)

Both types of queries will be routed to all physical partitions, resulting in a fan-out cross-partition query.

    SELECT * FROM c WHERE c.userId = 'user-1001'
    SELECT * FROM c WHERE c.sessionId = 'session-abc123'

[alert type="warning" heading=""]This type of query is not efficient and should be avoided in production scenarios. It is included here for completeness, but you should design your application to avoid such queries whenever possible. [/alert]

Conclusion

Multi-tenant applications face inherent scaling challenges with traditional single-level partitioning: tenant size variability, hot partitions, and inefficient query patterns that impact both performance and cost.
Hierarchical partition keys in Azure Cosmos DB address these issues by enabling intelligent data distribution across multiple partition levels, maintaining tenant isolation while achieving better resource utilization. By aligning your partition strategy with actual access patterns, you can build applications that scale naturally with tenant growth while maintaining predictable performance characteristics.

Check out the documentation for the azcosmos package (Go SDK). For more information on hierarchical partition keys, refer to the official documentation.

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.

To easily build your first database, watch our Get Started videos on YouTube and explore ways to dev/test free.