Data Modeling in NoSQL: Embedding vs Referencing
In MongoDB, the guiding principle is "Data that is accessed together should be stored together." Unlike SQL, there is no generic "best" schema; the right design depends on your application's query patterns.
1. Embedding (One-to-Few)
Embedding related data inside the parent document (e.g., tags in a post) keeps reads fast because everything the application needs comes back from a single query instead of several.
{ "_id": 1, "title": "MongoDB Basics", "tags": ["NoSQL", "Database", "Performance"] }
2. Referencing (One-to-Many / Many-to-Many)
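When data is large or frequently updated independently, use referencing (normalizing): store the related data in its own collection and link documents using their _id.
- Example: a user document and its posts. You don't want to embed 10,000 posts inside a user document, because the document keeps growing and can eventually hit the 16MB limit.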
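The user-and-posts example can be sketched without a database. The snippet below is a minimal illustration that uses in-memory Python structures as stand-ins for two collections. The names `users`, `posts`, `user_id`, and `find_posts_by_user` are assumptions made for this sketch, not part of any MongoDB API.

```python
# In-memory stand-ins for two MongoDB collections (hypothetical data).
users = {
    1: {"_id": 1, "name": "Ada"},
}
posts = [
    {"_id": 101, "user_id": 1, "title": "MongoDB Basics"},
    {"_id": 102, "user_id": 1, "title": "Schema Design"},
    {"_id": 103, "user_id": 2, "title": "Unrelated Post"},
]


def find_posts_by_user(user_id):
    """Mimic a query that filters posts on the referencing field user_id."""
    return [post for post in posts if post["user_id"] == user_id]


# The user document stays small; its posts are fetched by reference.
user = users[1]
user_posts = find_posts_by_user(user["_id"])
print([post["title"] for post in user_posts])
```

The trade-off is visible here: the user document never grows as posts are added, but reading a user together with their posts takes two lookups instead of one.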
3. The 16MB Limit and Outliers
A single document in MongoDB cannot exceed 16MB. If you have "outliers" (e.g., a post with 1 million comments), the comments will not fit in the post document, so you must store them in a separate collection that references the post.
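One way to handle an outlier is to check a document's approximate size and move the oversized array out when it exceeds the budget. The sketch below is an assumption-laden illustration: it estimates size from UTF-8 JSON, which only approximates the real BSON size, and the helper names and the `has_overflow_comments` flag are hypothetical.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16MB document limit


def approx_size_bytes(document):
    """Rough size estimate using UTF-8 JSON; real BSON sizes will differ."""
    return len(json.dumps(document).encode("utf-8"))


def split_outlier(post, budget_bytes=MAX_DOCUMENT_BYTES):
    """Return (post_doc, comment_docs).

    If the post fits within the budget, comments stay embedded and
    comment_docs is empty. Otherwise the comments are moved into separate
    documents that reference the post by its _id.
    """
    if approx_size_bytes(post) <= budget_bytes:
        return post, []
    slim_post = {k: v for k, v in post.items() if k != "comments"}
    slim_post["has_overflow_comments"] = True  # hypothetical marker field
    comment_docs = [
        {"post_id": post["_id"], "text": text}
        for text in post.get("comments", [])
    ]
    return slim_post, comment_docs


# Demonstration with a tiny budget so the split is easy to see.
big_post = {"_id": 1, "title": "Viral post", "comments": ["x" * 50] * 10}
slim, comments = split_outlier(big_post, budget_bytes=100)
print(len(comments), "comments moved out of the post document")
```

In practice you would likely split well before reaching the hard limit, since large documents are also slower to read and update.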
Schema Design Patterns
- Bucket Pattern: Grouping data by time (e.g., 100 sensor readings in one document).
- Schema Versioning Pattern: Keeping a schema_version field to handle data migrations gracefully without downtime.
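Both patterns can be sketched in plain Python. The example below is illustrative only: the bucket layout (`sensor_id`, `first_ts`, `last_ts`, `count`, `readings`) and the version 1 to version 2 change (a `phone` string becoming a `contacts` dict) are assumptions invented for this sketch. It also assumes the readings arrive already sorted by time.

```python
BUCKET_SIZE = 100  # readings per bucket, as in the sensor example


def bucket_readings(sensor_id, readings, bucket_size=BUCKET_SIZE):
    """Group time-ordered readings into bucket documents."""
    buckets = []
    for start in range(0, len(readings), bucket_size):
        chunk = readings[start:start + bucket_size]
        buckets.append({
            "sensor_id": sensor_id,
            "first_ts": chunk[0]["ts"],
            "last_ts": chunk[-1]["ts"],
            "count": len(chunk),
            "readings": chunk,
        })
    return buckets


def upgrade_user(doc):
    """Upgrade a hypothetical user document to schema_version 2 on read.

    Assumed version 1 stored a single "phone" string; assumed version 2
    stores a "contacts" dict. A missing schema_version counts as version 1.
    """
    if doc.get("schema_version", 1) >= 2:
        return doc
    upgraded = {k: v for k, v in doc.items() if k != "phone"}
    upgraded["contacts"] = {"phone": doc.get("phone")}
    upgraded["schema_version"] = 2
    return upgraded


readings = [{"ts": i, "value": i * 0.5} for i in range(250)]
print([b["count"] for b in bucket_readings("sensor-1", readings)])
print(upgrade_user({"_id": 1, "phone": "555-0100"}))
```

Because `upgrade_user` runs when a document is read, old and new document shapes can coexist in the same collection while the migration proceeds gradually.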
Avoid "Joins" in your Code
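While MongoDB has $lookup for joining collections inside an aggregation pipeline, it can be expensive, especially on large collections. If you find yourself using $lookup too often, your schema is probably too normalized. Aim for data locality: keep data that is read together in the same document.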
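To make the contrast concrete, here is a sketch of what a $lookup stage looks like when written as the list of dicts that a Python driver would accept. The collection and field names (`posts`, `user_id`) are hypothetical, and nothing here connects to a database.

```python
# Aggregation pipeline expressed as plain Python data.
# Collection and field names are hypothetical.
lookup_pipeline = [
    {"$match": {"_id": 1}},
    {
        "$lookup": {
            "from": "posts",            # collection to join against
            "localField": "_id",        # field on the user document
            "foreignField": "user_id",  # field on each post document
            "as": "posts",              # name of the output array field
        }
    },
]

# With an embedded design the same read is a plain filter, no join stage.
embedded_filter = {"_id": 1}

print(len(lookup_pipeline), "stages with $lookup vs a single filter")
```

With a driver such as PyMongo, a pipeline like this would typically be passed to a collection's `aggregate` method, while the embedded read would be a simple `find_one` with the filter.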