MongoDB

Complete MongoDB cheat sheet covering CRUD, query operators, aggregation pipeline, Mongoose ODM, indexing, and schema design patterns.

MongoDB is a document database. Data is stored as BSON documents (binary JSON) in collections. No tables, no rows, no fixed schema by default.

Hierarchy: Database → Collections → Documents. A document is a JSON-like object. A collection is a group of documents (loosely like a SQL table). Documents in the same collection don't need to have the same fields.

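Every document has a unique _id field. If you don't supply one, an ObjectId is generated automatically. An ObjectId is 12 bytes: a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value unique to the process, and a 3-byte incrementing counter. Because the timestamp comes first, ObjectIds sort roughly by creation time.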
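
As a minimal sketch of the timestamp encoding described above, the snippet below decodes the creation time from an ObjectId's hex string by hand. The function name objectIdToDate and the sample id are made up for illustration; in real code the driver's ObjectId already offers getTimestamp().

```javascript
// Decode the creation time embedded in an ObjectId's 24-character hex string.
// The first 4 bytes (8 hex characters) hold seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Hypothetical id whose timestamp bytes are 0x0000003c (60 seconds).
console.log(objectIdToDate("0000003c0000000000000000").toISOString());
// prints "1970-01-01T00:01:00.000Z"
```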

MongoDB is schema-flexible but that doesn't mean schemaless in practice. Use Mongoose (ODM) in Node to enforce structure, validations, and relationships at the application layer.

insert

insertOne({ name: "Alice", age: 25 }) — insert one document. Returns insertedId.

insertMany([doc1, doc2]) — insert array. Returns insertedIds.

If _id is not provided, MongoDB generates an ObjectId automatically.

If _id already exists, insertOne throws a duplicate key error.

find

findOne({ email: "a@b.com" }) — first matching document or null.

find({ age: { $gt: 18 } }) — returns a cursor. Call .toArray() or iterate.

find({}).sort({ name: 1 }) — 1 = ascending, -1 = descending.

find({}).limit(10).skip(20) — pagination.

find({}, { projection: { name: 1, _id: 0 } }) — include/exclude fields. 1 = include, 0 = exclude. Can't mix include and exclude (except _id).

countDocuments(filter) — count matching documents.

distinct("field", filter) — unique values of a field.
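
The limit/skip pagination above can be sketched as a small helper that turns a 1-based page number into the two values. The helper name pageToSkipLimit is an assumption for illustration, and the driver call is shown only as a comment because it needs a live collection.

```javascript
// Convert a 1-based page number into the skip/limit pair used by find().
function pageToSkipLimit(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

const { skip, limit } = pageToSkipLimit(3, 10);
console.log(skip, limit); // prints "20 10"

// With a driver collection this would be used as (not executed here):
// collection.find({}).sort({ _id: 1 }).skip(skip).limit(limit).toArray()
```

Sorting before skipping keeps pages stable. Because skip still walks past the skipped documents, deep pages get slower; a range filter on the sort field is a common alternative.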

update

updateOne(filter, update) — update first match.

updateMany(filter, update) — update all matches.

replaceOne(filter, newDoc) — replace entire document (keeps _id).

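Always use update operators with updateOne and updateMany. Current drivers reject an update document that contains no operators, and older shell helpers replaced the whole document instead. When a full replacement is what you want, use replaceOne.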

Update operators:

$set: { field: val } — set field value.

$unset: { field: "" } — remove a field.

$inc: { count: 1 } — increment by value.

$push: { tags: "new" } — push to array.

$pull: { tags: "old" } — remove from array by value.

$addToSet: { tags: "x" } — push only if not already present.

$pop: { arr: 1 } — remove last (1) or first (-1) element.

$rename: { oldName: "newName" } — rename a field.

{ upsert: true } option — insert if not found.

findOneAndUpdate(filter, update, { returnDocument: "after" }) returns the updated document in a single operation. The default, returnDocument: "before", returns the document as it was before the update.
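
To make the effect of a few operators concrete, here is a toy in-memory model of $set, $inc, $push, and $addToSet on top-level fields. It is a sketch for illustration only: applyUpdate is a made-up helper, not a driver function, and it does not reflect how the server works internally.

```javascript
// Toy model of four update operators on top-level fields.
// It only mirrors the visible effect on a document; values are primitives.
function applyUpdate(doc, update) {
  const out = { ...doc }; // shallow copy so the input is left untouched
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    out[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    out[field] = (out[field] ?? 0) + amount; // a missing field starts from 0
  }
  for (const [field, value] of Object.entries(update.$push ?? {})) {
    out[field] = [...(out[field] ?? []), value]; // always appends
  }
  for (const [field, value] of Object.entries(update.$addToSet ?? {})) {
    const current = out[field] ?? [];
    // appends only when the value is not already present
    out[field] = current.includes(value) ? [...current] : [...current, value];
  }
  return out;
}

const before = { name: "Alice", visits: 1, tags: ["a"] };
const after = applyUpdate(before, {
  $set: { name: "Alicia" },
  $inc: { visits: 1 },
  $push: { tags: "a" }, // the duplicate is kept: tags becomes ["a", "a"]
});
console.log(after);
console.log(applyUpdate(before, { $addToSet: { tags: "a" } }).tags); // still ["a"]
```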

delete

deleteOne(filter) — delete first match.

deleteMany(filter) — delete all matches.

deleteMany({}) — delete ALL documents in collection (dangerous).

findOneAndDelete(filter) — delete and return the document.

drop() — delete the entire collection including indexes.

comparison

$eq — equal (default, rarely written explicitly)

$ne — not equal

$gt / $gte — greater than / or equal

$lt / $lte — less than / or equal

$in: [val1, val2] — value is in array

$nin: [val1, val2] — value is NOT in array

Example: { age: { $gte: 18, $lte: 65 } }

logical

$and: [cond1, cond2] — all conditions must match. Implicit when you use multiple fields.

$or: [cond1, cond2] — at least one must match.

$nor: [cond1, cond2] — none must match.

$not: { $gt: 5 } — inverts the condition.

Example: { $or: [{ age: { $lt: 18 } }, { age: { $gt: 65 } }] }

element & evaluation

$exists: true/false — field exists or not.

$type: "string" — field is of specified BSON type.

$regex: /pattern/ — regex match on string field.

$expr: { $gt: ["$field1", "$field2"] } — compare two fields in same document. Use aggregation expressions.

$where: "fn" — JS expression. Avoid — slow and security risk.

array operators

{ tags: "mongodb" } — matches if array contains this value.

{ tags: { $all: ["a", "b"] } } — array must contain all values.

{ tags: { $size: 3 } } — array has exactly 3 elements.

{ results: { $elemMatch: { score: { $gt: 80 }, grade: "A" } } } matches when at least one element of the results array satisfies all conditions. This matters when you test multiple conditions on the same element.

Without $elemMatch, { "results.score": { $gt: 80 }, "results.grade": "A" } can match even when the two conditions are satisfied by different array elements.
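
The difference is easiest to see on a concrete document. The sketch below mimics both matching modes in plain JavaScript; the results field and its values are hypothetical, and the code only imitates the semantics rather than running a real query.

```javascript
// One document whose array elements each satisfy only one of the conditions.
const doc = {
  results: [
    { score: 90, grade: "B" },
    { score: 70, grade: "A" },
  ],
};

const scoreOver80 = (el) => el.score > 80;
const gradeA = (el) => el.grade === "A";

// Like { results: { $elemMatch: { score: { $gt: 80 }, grade: "A" } } }:
// a single element has to satisfy every condition.
const withElemMatch = doc.results.some((el) => scoreOver80(el) && gradeA(el));

// Like { "results.score": { $gt: 80 }, "results.grade": "A" }:
// each condition may be satisfied by a different element.
const withDotNotation = doc.results.some(scoreOver80) && doc.results.some(gradeA);

console.log(withElemMatch); // false
console.log(withDotNotation); // true
```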

The aggregation pipeline transforms documents through a sequence of stages. Each stage takes documents in, outputs documents to the next stage. Extremely powerful — can join, group, reshape, compute, and filter.

db.collection.aggregate([stage1, stage2, ...])

filtering & shaping

$match — filter documents. Same syntax as find queries. Put early in pipeline to reduce documents before expensive stages.

$project — include/exclude/reshape fields. Can add computed fields. { fullName: { $concat: ["$first", " ", "$last"] } }

$limit — keep only first N documents.

$skip — skip N documents.

$sort: { field: 1 } — sort documents.

$count: "total" — output single document with count.

grouping

$group: { _id: "$category", total: { $sum: "$price" } } — group by field, compute accumulations.

Accumulators: $sum, $avg, $min, $max, $count, $push (array of values), $addToSet (unique array), $first, $last.

_id: null — group ALL documents into one (compute grand totals).

_id: { year: { $year: "$date" }, month: { $month: "$date" } } — group by multiple fields or expressions.
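
A pipeline is just an array of stage objects, so it can be built and inspected before it is sent to the server. The sketch below assumes hypothetical fields status, category, and price; the aggregate call is left as a comment because it needs a live collection.

```javascript
// Build a pipeline: filter early, group by category, then sort by revenue.
function revenueByCategory(status) {
  return [
    { $match: { status } }, // reduce the input before the expensive stage
    {
      $group: {
        _id: "$category",
        total: { $sum: "$price" }, // sum of price per category
        count: { $sum: 1 }, // number of documents per category
      },
    },
    { $sort: { total: -1 } }, // highest revenue first
  ];
}

const pipeline = revenueByCategory("paid");
console.log(JSON.stringify(pipeline, null, 2));

// With a driver collection (not executed here):
// const rows = await collection.aggregate(pipeline).toArray();
```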

joins & arrays

$lookup — left join from another collection:

{ from: "orders", localField: "_id", foreignField: "userId", as: "orders" } — results in an array field.

$unwind: "$arrayField" — deconstruct array — outputs one document per array element. Use after $lookup if you want flat documents instead of nested arrays.

$unwind: { path: "$arr", preserveNullAndEmptyArrays: true } keeps documents whose array is missing, null, or empty.

$addFields — add or overwrite fields without removing others (vs $project).

$replaceRoot: { newRoot: "$nested" } — promote a nested document to the top level.
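
Putting the join stages together, a sketch of a users-to-orders pipeline might look like this. The collection name orders, the userId and total fields, and the helper name are assumptions for illustration.

```javascript
// Join each user with their orders, then flatten to one document per order.
function usersWithOrders() {
  return [
    {
      $lookup: {
        from: "orders", // collection to join
        localField: "_id", // field on the user
        foreignField: "userId", // field on the order
        as: "orders", // output array field
      },
    },
    // Keep users that have no orders instead of dropping them.
    { $unwind: { path: "$orders", preserveNullAndEmptyArrays: true } },
    // Add a field without removing the others; 0 when there is no order.
    { $addFields: { orderTotal: { $ifNull: ["$orders.total", 0] } } },
  ];
}

console.log(JSON.stringify(usersWithOrders(), null, 2));
```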

aggregation expressions

String: $concat, $toUpper, $toLower, $substrCP (replaces the deprecated $substr), $trim, $split

Math: $add, $subtract, $multiply, $divide, $mod, $round, $abs

Date: $year, $month, $dayOfMonth, $dateToString

Array: $size, $slice, $arrayElemAt, $filter, $map, $reduce

Conditional: $cond: { if, then, else }, $ifNull: [field, default], $switch

Type: $type, $toInt, $toString, $toDate, $convert

schema & model

Schema defines the structure and validation rules. Model is the compiled schema — your interface to the collection.

new Schema({ name: String, age: Number }) — shorthand types.

Full field definition: { type: String, required: true, unique: true, default: "active", trim: true, lowercase: true, minlength: 3, maxlength: 50, enum: ["a","b"], match: /regex/ }. Note that unique: true builds a unique index; it is not a validator.

Nested objects: just nest schema objects. Or use new Schema({}) for subdocument with its own _id.

Array of strings: [String]. Array of subdocuments: [new Schema({...})].

Ref (populate): { type: Schema.Types.ObjectId, ref: "User" }

mongoose.model("User", userSchema) — model name = singular PascalCase. Mongoose pluralizes and lowercases to find the collection ("users").

schema options & virtuals

{ timestamps: true } auto-adds createdAt and updatedAt fields. Worth enabling on most schemas.

{ versionKey: false } — removes the __v field.

Virtuals — computed properties not stored in DB:

schema.virtual("fullName").get(function() { return this.first + " " + this.last })

{ toJSON: { virtuals: true } } — include virtuals when converting to JSON (e.g. res.json(doc)).

Transform: { toJSON: { transform(doc, ret) { delete ret.__v; ret.id = ret._id; delete ret._id } } } — clean up output.
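
The transform above is an ordinary function, so it can be written and tried on a plain object before it is attached to a schema. This is a sketch: cleanJson is a made-up name, and the sample object merely stands in for the ret value Mongoose would pass.

```javascript
// Standalone toJSON transform: expose "id", hide _id and __v.
function cleanJson(doc, ret) {
  ret.id = String(ret._id); // ObjectIds stringify to their hex form
  delete ret._id;
  delete ret.__v;
  return ret;
}

// Plain object standing in for the "ret" argument of the transform.
const sample = { _id: "64b7f0c2a1b2c3d4e5f60718", __v: 0, name: "Alice" };
console.log(cleanJson(null, sample));

// In a schema this would be wired up as (not executed here):
// new Schema(definition, { toJSON: { virtuals: true, transform: cleanJson } })
```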

mongoose CRUD

new User(data) then .save() — create and save. Runs validators.

User.create(data) — shorthand for new + save.

User.find(filter) — returns array of Mongoose documents.

User.findById(id) — shorthand for findOne({ _id: id }).

User.findByIdAndUpdate(id, update, { new: true, runValidators: true }). Here new: true returns the updated doc, and runValidators applies schema validation to the update.

User.findByIdAndDelete(id)

User.findOneAndUpdate(filter, update, options)

User.updateMany(filter, update)

User.deleteMany(filter)

User.countDocuments(filter)

Query chaining: User.find().sort().limit().select().lean()

.lean() — returns plain JS objects instead of Mongoose documents. Much faster for read-only operations. No .save(), no virtuals, no getters.

middleware (hooks)

Run functions before or after operations. Defined on schema before compiling to model.

schema.pre("save", async function() { ... }) — before saving. this = document. Use for hashing passwords.

schema.post("save", function(doc) { ... }) — after saving.

schema.pre("findOneAndUpdate", function() { ... }). Here this = query, not document. Use this.getFilter(), this.getUpdate().

Document middleware: save, validate, init, and deleteOne/updateOne when registered with { document: true } (older Mongoose versions also had remove).

Query middleware: find, findOne, updateOne, deleteOne, etc.

Aggregate middleware: aggregate.

findByIdAndUpdate bypasses document middleware: pre("save") hooks never run for it, and schema validators run only if you pass runValidators: true. Put hook logic for these calls in query middleware such as pre("findOneAndUpdate").

populate

User.findById(id).populate("posts") — replaces ObjectId reference with actual document.

Select specific fields: .populate("posts", "title createdAt")

Nested populate: .populate({ path: "posts", populate: { path: "comments" } })

Multiple fields: .populate("author").populate("category")

Populate runs additional DB queries under the hood, at least one per populated path. For complex cases, $lookup in an aggregation pipeline can be more efficient.

instance & static methods

Instance method — called on a document instance:

schema.methods.comparePassword = async function(candidate) { return bcrypt.compare(candidate, this.password) }

Static method — called on the Model:

schema.statics.findByEmail = function(email) { return this.findOne({ email }) }

Query helper — chain on queries:

schema.query.byStatus = function(status) { return this.where({ status }) } then User.find().byStatus("active")

indexing strategy

Indexes make queries fast. Without an index, MongoDB scans every document (collection scan). With an index, it jumps directly to matching documents.

db.collection.createIndex({ field: 1 }) — single field index. 1 = ascending, -1 = descending.

createIndex({ field1: 1, field2: -1 }) — compound index. Order matters — matches queries that filter/sort by field1, or field1+field2. Not field2 alone.

createIndex({ field: 1 }, { unique: true }) — unique constraint.

createIndex({ field: 1 }, { sparse: true }) — only index documents where field exists.

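createIndex({ field: 1 }, { expireAfterSeconds: 3600 }) creates a TTL index. Documents are deleted once the field's date is older than the given number of seconds. A background task does the deletion periodically, so removal is not instant. The field must hold a Date (or an array of Dates).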

createIndex({ title: "text", body: "text" }) — text index. Enables $text search.

createIndex({ location: "2dsphere" }) — geospatial index.
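
The prefix rule for compound indexes can be sketched with a deliberately simplified model. The helper below only counts how many leading index keys an equality filter covers; the real query planner also weighs sorts, ranges, and selectivity, so treat this as an illustration under that assumption, with usablePrefixLength being a made-up name.

```javascript
// Count how many leading keys of a compound index a filter can use.
function usablePrefixLength(indexKeys, filterFields) {
  const fields = new Set(filterFields);
  let n = 0;
  while (n < indexKeys.length && fields.has(indexKeys[n])) {
    n++;
  }
  return n;
}

// Key order of createIndex({ field1: 1, field2: -1 })
const index = ["field1", "field2"];

console.log(usablePrefixLength(index, ["field1"])); // 1
console.log(usablePrefixLength(index, ["field1", "field2"])); // 2
console.log(usablePrefixLength(index, ["field2"])); // 0, field2 alone is not a prefix
```

To check what a real query does, run it with .explain("executionStats") and look for IXSCAN versus COLLSCAN in the winning plan.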

embedding vs referencing

Embed when: data is always accessed together, child data belongs exclusively to parent, document won't exceed 16MB limit, array won't grow unbounded.

Reference when: data is shared across documents, child data grows unboundedly (comments on viral post), you need to query child independently.

Example — embed: user's address inside user document. Reference: user's orders in a separate orders collection.

Hybrid: embed a summary, reference the full object. E.g. store last 10 comments embedded, rest in separate collection.
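
As a rough illustration of the two shapes, here are an embedded address and referenced orders as plain objects. All names and values are hypothetical, and string ids stand in for ObjectIds.

```javascript
// Embedded: the address lives inside the user document.
const user = {
  _id: "u1",
  name: "Alice",
  address: { city: "Lisbon", zip: "1000-001" },
};

// Referenced: each order lives in its own collection and points back to the user.
const orders = [
  { _id: "o1", userId: "u1", total: 40 },
  { _id: "o2", userId: "u1", total: 15 },
  { _id: "o3", userId: "u2", total: 99 },
];

// Embedded data comes back with the parent, no second lookup needed.
console.log(user.address.city); // prints "Lisbon"

// Referenced data needs a second query, equivalent to find({ userId: user._id }).
const userOrders = orders.filter((o) => o.userId === user._id);
console.log(userOrders.length); // prints 2
```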

common patterns

Bucket pattern — group time-series data into buckets (e.g. one document per hour of sensor readings). Reduces document count, improves aggregation.

Outlier pattern — store typical case efficiently, use a flag to handle rare large cases differently.

Computed pattern — pre-compute and store aggregated values (e.g. average rating) that would be expensive to compute every request. Update on write.

Attribute pattern — for documents with many similar fields that vary (product attributes). Store as array of key-value pairs instead of many sparse fields. Easier to index.

Subset pattern — store hot data (recent 10 orders) in main document, rest in a separate collection.
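
The bucket pattern usually comes down to one upsert per reading. The sketch below builds the filter, update, and options for an hourly bucket; the field names (sensorId, bucketStart, readings, count, sum) and the helper name are assumptions for illustration. Keeping count and sum in the bucket also shows the computed pattern.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Build the arguments for appending one reading to its hourly bucket.
function bucketUpsert(sensorId, timestamp, value) {
  // Round the timestamp down to the start of its UTC hour.
  const bucketStart = new Date(Math.floor(timestamp.getTime() / HOUR_MS) * HOUR_MS);
  return {
    filter: { sensorId, bucketStart },
    update: {
      $push: { readings: { t: timestamp, v: value } },
      $inc: { count: 1, sum: value }, // running totals, updated on write
    },
    options: { upsert: true }, // the first reading of the hour creates the bucket
  };
}

const op = bucketUpsert("s1", new Date("2024-01-01T10:37:00Z"), 21.5);
console.log(op.filter.bucketStart.toISOString()); // prints "2024-01-01T10:00:00.000Z"

// With a driver collection (not executed here):
// await collection.updateOne(op.filter, op.update, op.options);
```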

relationships

One-to-one — embed the related document or reference by ID.

One-to-many (few) — embed array of documents directly.

One-to-many (many/unbounded) — reference from child back to parent (userId on each order). Don't embed.

Many-to-many — two collections, each stores array of IDs referencing the other. Or a join collection if relationship has its own data.

The key question: how is this data accessed? Optimize schema around your access patterns, not your data relationships.

transactions

MongoDB supports multi-document ACID transactions since v4.0 (replica sets) and v4.2 (sharded clusters).

session = await client.startSession()

session.withTransaction(async () => { ... }) — all operations in the callback are atomic. Auto-retries on transient errors.

Pass { session } to every operation inside the transaction.

Transactions have performance overhead. For single-document operations, MongoDB is already atomic — no transaction needed. Design schema to minimize cross-document transactions.
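
The steps above can be sketched as a transfer between two accounts. The function assumes client is a connected MongoClient on a replica set; the database, collection, and field names (bank, accounts, balance) are hypothetical.

```javascript
// Move `amount` between two accounts inside one transaction.
async function transfer(client, fromId, toId, amount) {
  const accounts = client.db("bank").collection("accounts");
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      // The balance check is part of the filter, so an overdraft matches nothing.
      const debit = await accounts.updateOne(
        { _id: fromId, balance: { $gte: amount } },
        { $inc: { balance: -amount } },
        { session } // every operation must carry the session
      );
      if (debit.matchedCount !== 1) {
        // Throwing inside the callback aborts the transaction.
        throw new Error("insufficient funds or unknown account");
      }
      await accounts.updateOne(
        { _id: toId },
        { $inc: { balance: amount } },
        { session }
      );
    });
  } finally {
    await session.endSession(); // always release the session
  }
}
```

Usage would be await transfer(client, fromId, toId, 25) from inside an async function. If either update fails, neither balance changes.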

Mongoose connection: mongoose.connect(MONGO_URI, { dbName: "myapp" })

Connection string formats: local mongodb://localhost:27017/dbname, Atlas mongodb+srv://user:pass@cluster.mongodb.net/dbname.

Connection events: mongoose.connection.on("connected", fn), "error", "disconnected".

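Connection pooling: Mongoose maintains a pool of connections through the driver. In Mongoose 6+ (driver 4.x) the default maxPoolSize is 100; older versions used the poolSize option with a default of 5. Tune it with { maxPoolSize: N } to match your load.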

Graceful shutdown: mongoose.connection.close() on SIGTERM. Close connection before process exits.

mongoose.set("debug", true) — logs all Mongoose operations to console. Useful in development to see generated queries.

mongoose.set("strict", true) — default. Fields not in schema are ignored on save. Set to false to allow extra fields (not recommended).

common traps

Mongoose findByIdAndUpdate never runs pre("save") hooks and skips validators by default. Pass { runValidators: true } for validation, and put hook logic in query middleware such as pre("findOneAndUpdate").

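Updating without operators is a trap: current drivers reject an operator-less update, older shell helpers replaced the whole document, and Mongoose wraps a plain object in $set by default. Be explicit and use $set, or replaceOne for a full replacement.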

$push allows duplicates. Use $addToSet for unique arrays.

Model.find() returns Mongoose documents (heavy). Use .lean() for plain objects when you don't need document methods.

ObjectId is not a string — compare with .equals() or toString() in JS. Mongoose handles this in queries but not in JS comparisons.

16MB document limit — embedding unbounded arrays (all comments, all messages) will eventually break.

things to know cold

MongoDB is horizontally scalable via sharding — distributes data across multiple machines using a shard key.

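Replica sets: one primary plus multiple secondaries. The primary handles all writes and, by default, all reads. Secondaries replicate the primary's data and can serve reads if you set a read preference, at the cost of possibly stale results. If the primary goes down, the set automatically elects a new one.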

Atlas — MongoDB's managed cloud service. Handles replica sets, backups, scaling automatically.

BSON types beyond JSON: ObjectId, Date, Binary, Decimal128, Int32/Int64, RegExp.

Aggregation vs find: for simple queries use find. For grouping, joining, computing, reshaping — use aggregation.

$text search requires a text index. For production full-text search, use Atlas Search (built on Lucene) or Elasticsearch.