Data Modeling in NoSQL: Embedding vs Referencing

A guiding principle of MongoDB schema design is: "Data that is accessed together should be stored together." Relational (SQL) design starts from normalization. In MongoDB there is no single "best" schema; the right model depends on your application's query patterns.

1. Embedding (One-to-Few)

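Embedding related data inside a parent document (e.g., tags inside a post) makes reads fast, because the parent and its related data come back from a single query instead of several. This works best when the embedded data is small and bounded, as in one-to-few relationships.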

```json
{
  "_id": 1,
  "title": "MongoDB Basics",
  "tags": ["NoSQL", "Database", "Performance"]
}
```
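To make the single-read idea concrete, here is a minimal Python sketch. It uses plain dicts in place of a live MongoDB collection so that it runs without a server. The helper names `find_one_by_id` and `find_by_tag` are hypothetical. `find_by_tag` imitates how a MongoDB filter such as `{"tags": "NoSQL"}` matches any document whose array contains that value.

```python
# Plain dicts stand in for MongoDB documents; no database is required.
posts = [
    {"_id": 1, "title": "MongoDB Basics", "tags": ["NoSQL", "Database", "Performance"]},
    {"_id": 2, "title": "Indexing Tips", "tags": ["Database", "Performance"]},
]


def find_one_by_id(collection, doc_id):
    """Return the document with the given _id, or None (like find_one({"_id": ...}))."""
    return next((doc for doc in collection if doc["_id"] == doc_id), None)


def find_by_tag(collection, tag):
    """Mimic the filter {"tags": tag}: a match on an array field succeeds
    when the array contains the value."""
    return [doc for doc in collection if tag in doc.get("tags", [])]


post = find_one_by_id(posts, 1)
# One lookup returns the title and the embedded tags together.
print(post["title"], post["tags"])  # MongoDB Basics ['NoSQL', 'Database', 'Performance']
print([p["_id"] for p in find_by_tag(posts, "Performance")])  # [1, 2]
```

With PyMongo, the equivalent reads would be `posts.find_one({"_id": 1})` and `posts.find({"tags": "NoSQL"})`. Each is a single query.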

2. Referencing (One-to-Many / Many-to-Many)

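Use referencing (normalization) when related data meets any of these conditions:

* it is large
* it grows without bound
* it is frequently updated independently of its parent

Store it in its own collection and link the documents through their _id values.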

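Example: a user and their posts. You don't want to embed 10,000 posts inside a user document. The document would keep growing with every new post and could approach the 16MB limit. Instead, store posts in their own collection and have each post hold its author's _id.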
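A minimal Python sketch of the user/posts example follows. It again uses lists of dicts rather than a live database. The `author_id` field and the `get_user_with_posts` helper are hypothetical names. The point is that the user document stays the same size no matter how many posts exist. The cost is a second query.

```python
# Posts live in their own "collection" and point to their author via author_id
# (a hypothetical field name), instead of being embedded in the user document.
users = [{"_id": 101, "name": "alice"}]

posts = [
    {"_id": 1, "author_id": 101, "title": "First post"},
    {"_id": 2, "author_id": 101, "title": "Second post"},
    {"_id": 3, "author_id": 202, "title": "Someone else's post"},
]


def get_user_with_posts(user_id):
    """Two separate reads: one for the user, one for that user's posts."""
    user = next((u for u in users if u["_id"] == user_id), None)
    if user is None:
        return None
    # Second query, comparable to db.posts.find({"author_id": user_id})
    user_posts = [p for p in posts if p["author_id"] == user_id]
    return {"user": user, "posts": user_posts}


result = get_user_with_posts(101)
print(result["user"]["name"], [p["title"] for p in result["posts"]])
# alice ['First post', 'Second post']
```

In a real deployment you would typically index `author_id` so the second query stays fast as the posts collection grows.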

3. The 16MB Limit and Outliers

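A single BSON document in MongoDB cannot exceed 16MB (16,777,216 bytes). Some parents are "outliers", for example a viral post with 1 million comments. Embedding every child would push such a document past the limit, so those comments must be stored in a separate collection instead.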
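A rough way to see the limit in action is to estimate a document's size before deciding to embed. This Python sketch makes one labeled assumption: it measures compact JSON, which only approximates BSON size. `approx_size_bytes` and `fits_in_one_document` are hypothetical helpers.

```python
import json

# Maximum BSON document size in MongoDB: 16MB (16,777,216 bytes).
MAX_DOC_BYTES = 16 * 1024 * 1024


def approx_size_bytes(doc):
    """Estimate a document's size from its compact UTF-8 JSON encoding.

    Assumption: JSON size only approximates BSON size (BSON adds type bytes
    and length prefixes), so treat the result as a rough estimate.
    """
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def fits_in_one_document(doc, limit=MAX_DOC_BYTES):
    """True if the estimated size stays within the document size limit."""
    return approx_size_bytes(doc) <= limit


comment = {"user": "u1", "text": "Nice intro"}

typical_post = {"_id": 1, "title": "MongoDB Basics", "comments": [comment] * 50}
outlier_post = {"_id": 2, "title": "Viral post", "comments": [comment] * 1_000_000}

print(fits_in_one_document(typical_post))  # True
print(fits_in_one_document(outlier_post))  # False: the estimate is roughly 34 MB
```

If PyMongo is installed, `len(bson.encode(doc))` gives the exact BSON size instead of an estimate.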

Schema Design Patterns

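Bucket Pattern: Group time-series data into buckets, for example 100 consecutive sensor readings in one document, instead of storing one document per reading. This reduces the number of documents and index entries.
Schema Versioning Pattern: Store a schema_version field in each document. The application can then recognize old and new document shapes and migrate them gradually, without downtime.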
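Both patterns can be sketched in plain Python. This is an illustration under stated assumptions, not MongoDB's implementation:

* `add_reading` (hypothetical) fills per-sensor buckets of 100 readings.
* `upgrade` (hypothetical) migrates an imagined v1 document to v2 when the document is read. The v1 document kept a single `phone` field; v2 uses a `contacts` list.

```python
# --- Bucket pattern (sketch) ---
BUCKET_SIZE = 100  # readings per bucket, matching the example above


def add_reading(buckets, sensor_id, ts, value, bucket_size=BUCKET_SIZE):
    """Append a reading to the sensor's open bucket; start a new bucket when it is full."""
    open_bucket = next(
        (b for b in reversed(buckets)
         if b["sensor_id"] == sensor_id and b["count"] < bucket_size),
        None,
    )
    if open_bucket is None:
        open_bucket = {"sensor_id": sensor_id, "start": ts, "count": 0, "readings": []}
        buckets.append(open_bucket)
    open_bucket["readings"].append({"ts": ts, "value": value})
    open_bucket["count"] += 1
    open_bucket["end"] = ts  # timestamp of the newest reading in this bucket


buckets = []
for second in range(250):
    add_reading(buckets, "sensor-1", second, 20.0 + second % 5)
print([b["count"] for b in buckets])  # [100, 100, 50]


# --- Schema versioning pattern (sketch) ---
def upgrade(doc):
    """Migrate a document to the latest shape (v2) when it is read.

    Documents without schema_version are treated as v1 (a hypothetical
    layout with a single "phone" string field).
    """
    version = doc.get("schema_version", 1)
    if version == 1:
        doc = dict(doc)  # copy so the caller's document is not mutated
        phone = doc.pop("phone", None)
        doc["contacts"] = [{"type": "phone", "value": phone}] if phone else []
        doc["schema_version"] = 2
    return doc


old = {"_id": 7, "name": "alice", "phone": "555-0100"}
print(upgrade(old))
# {'_id': 7, 'name': 'alice', 'contacts': [{'type': 'phone', 'value': '555-0100'}], 'schema_version': 2}
```

Against a real collection, the bucket write is commonly a single `update_one` call with `upsert=True`. That call filters on the sensor and `count < 100` and applies `$push` and `$inc`. Under schema versioning, a background job can later rewrite any remaining v1 documents.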

Avoid "Joins" in Your Queries

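MongoDB supports joins through the $lookup aggregation stage, but $lookup is relatively expensive compared with reading a single document. Heavy use of $lookup in your frequent queries usually means your schema is too normalized. Aim for data locality: store data that is read together in the same document.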
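To show what a join adds, this sketch writes a `$lookup` stage in the shape an aggregation pipeline expects: `from`, `localField`, `foreignField`, `as`. It then models the stage with a naive Python left outer join. The collection and field names are hypothetical. `simulate_lookup` is only an illustration of the extra work, not how the server executes the stage.

```python
# A $lookup stage as passed to aggregate(); "posts" and "author_id" are hypothetical.
pipeline = [
    {
        "$lookup": {
            "from": "posts",
            "localField": "_id",
            "foreignField": "author_id",
            "as": "posts",
        }
    }
]


def simulate_lookup(docs, foreign_docs, local_field, foreign_field, as_field):
    """Naive model of an equality-match $lookup: a left outer join that adds
    an array of matching foreign documents to every input document.

    This scans foreign_docs once per input document, which illustrates the
    extra work a join implies compared with reading one embedded document.
    """
    joined = []
    for doc in docs:
        key = doc.get(local_field)
        matches = [f for f in foreign_docs if f.get(foreign_field) == key]
        joined.append({**doc, as_field: matches})  # input docs are left unchanged
    return joined


users = [{"_id": 101, "name": "alice"}, {"_id": 202, "name": "bob"}]
posts = [
    {"_id": 1, "author_id": 101, "title": "First post"},
    {"_id": 2, "author_id": 101, "title": "Second post"},
    {"_id": 3, "author_id": 202, "title": "Bob's post"},
]

stage = pipeline[0]["$lookup"]
joined = simulate_lookup(users, posts, stage["localField"], stage["foreignField"], stage["as"])
print([(u["name"], len(u["posts"])) for u in joined])  # [('alice', 2), ('bob', 1)]
```

With PyMongo, `db.users.aggregate(pipeline)` would run the real stage on the server. If most requests need this join, consider whether the data read together could be stored together.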

Study Guide

Data Modeling

Learning Objectives