As a Database-as-a-Service platform hosting several hundred thousand MongoDB cloud databases, we field plenty of calls from developers looking to increase their database’s performance. And, without question, the most common performance issue we see is a result of improper indexing (or a complete lack thereof). The good news: It’s fixable.
Proper indexing is critical because even one unindexed query is enough to prompt significant performance degradation. MongoDB performs a collection scan when a query is not indexed, iterating through every single document in a collection to find results. By contrast, an indexed query performs and scales much better because it inspects far fewer documents in memory. Without sufficient indexing, the hardware costs for a performant database increase significantly.
Here are some indexing tips for keeping your database happy, healthy, and humming.
Create Indexes on Query Select Fields
Let’s say you have a collection named “people” and you want to search for all documents where the person is named “Adam.” The query would be:
db.people.find({"name": "Adam"})
To properly index this query, you would create a Single Field Index on the “name” field:
db.collection.createIndex({"name": 1})
By creating an index, the query no longer has to examine all the documents in the “people” collection. Instead, only documents matching {“name”: “Adam”} will enter memory.
What if you want to search on multiple fields? For example, find all documents where the “name” field matches “Adam” and the “age” field matches 30. The query would be:
db.people.find({"name": "Adam", "age": 30})
To properly index this query, you would create a Compound Index on both the “name” and “age” fields:
db.collection.createIndex({"name": 1, "age": 1})
Similarly, only documents matching {“name”: “Adam”, “age”: 30}) will enter memory.
Sort Efficiently Using Indexes
When you use the MongoDB sort() method, you can specify the sort order—ascending (1) or descending (-1)—for the result set. If you do not index for the sort field, MongoDB will sort the results at query time. Sorting at query time uses CPU resources and delays the response to the application. However, when an index includes all fields used to select and sort the result set in proper order, MongoDB does not need to sort at query time. Instead, results are already sorted in the index, and can be returned immediately.
Let’s revisit the “people” collection and search for all documents where the person is named “Adam,” but now sort all the results by shortest to tallest. The query would be:
db.people.find({"name": "Adam"}).sort({"height": 1})
To properly index this query, you would create a Compound Index on both the “name” and “height” fields:
db.collection.createIndex({"name": 1, "height": 1})
With this index, you also can efficiently sort the results from tallest to shortest because MongoDB can traverse an index in either direction.
Note, however, that searching on multiple fields with a sort is more complex. If you are using a sort() with compound indexes, you should review the MongoDB documentation.
Handle Complex Fields as Appropriate
MongoDB offers developers many powerful ways to structure data. You may choose to utilize more complex data fields like arrays, subdocuments, or geospatial features. You also will want to ensure that your queries and indexes are properly configured for these fields. More information on best indexing practices for these fields can be found in the MongoDB documentation:
- Arrays: https://docs.mongodb.com/manual/core/index-multikey/
- Subdocuments: https://docs.mongodb.com/manual/core/index-single/#create-an-index-on-embedded-document
- Geospatial: https://docs.mongodb.com/manual/applications/geospatial-indexes/
Exercise Care with New Application Queries and ad hoc Reporting
Unindexed queries have the potential to significantly disrupt your database’s working set and performance. It’s not uncommon to unintentionally run an unindexed query. You might be excited to test a new application feature that utilizes a new query, or need to generate an ad hoc report for the marketing team. Before you execute a new query, be sure to first check the collection you’re running the query on to see if the right index is there! You can list all the indexes on a given collection with the getIndexes() command:
db.people.getIndexes()
Ideally, an optimal index already exists for the query. However, if there’s no index, you’ll want to consult with your team to determine next steps. You may decide to create a new index for the query, or perhaps run the query during off-peak hours on a Secondary replica set member.