SQL COUNT and DISTINCT: Practical Guide with Real-World Examples & Performance Tips

Hey there! So you're working with SQL databases and need to figure out how many unique orders came through last quarter? Or maybe how many distinct customers bought specific products? That's exactly where COUNT and DISTINCT become your best friends. I remember struggling with this early in my career – writing queries that returned duplicate records and wondering why my reports looked wrong. That frustration taught me to master these tools properly.

What COUNT Actually Does (And Where People Mess Up)

Let's cut straight to it: COUNT() tallies rows in your results. But here's where folks get tripped up:

Syntax	What It Counts	NULL Handling	Real-Life Use Case
`COUNT(*)`	All rows in table/results	Includes NULLs	Total website visits
`COUNT(column)`	Non-NULL values in that column	Excludes NULLs	Completed user registrations

I once saw a junior dev spend hours debugging why user counts didn't match – turned out they used COUNT(email) when emails could be NULL. Rookie mistake. Use COUNT(*) when you need absolute row counts. Simple as that.
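To see the difference for yourself, here's a minimal sketch using Python's built-in sqlite3 module so it runs anywhere. The users table and its four rows are made up for illustration; two of the emails are NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (user_id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
)

# COUNT(*) counts rows; COUNT(email) skips rows where email IS NULL.
total_rows, non_null_emails = conn.execute(
    "SELECT COUNT(*), COUNT(email) FROM users"
).fetchone()
print(total_rows, non_null_emails)
```

The gap between the two numbers is exactly the number of NULL emails, which is the mismatch that junior dev was chasing.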

Common COUNT Patterns You'll Use Daily

Total records: SELECT COUNT(*) FROM orders
Active users: SELECT COUNT(user_id) FROM users WHERE last_login > '2023-01-01'
Orders by status: SELECT status, COUNT(*) FROM orders GROUP BY status
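Here's a small runnable sketch of the total and per-status patterns, again with Python's sqlite3. The orders table and the status values are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("shipped",), ("pending",), ("shipped",), ("cancelled",), ("shipped",)],
)

# Total records.
total_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Orders by status: one output row per distinct status value.
status_counts = dict(
    conn.execute("SELECT status, COUNT(*) FROM orders GROUP BY status").fetchall()
)
print(total_orders, status_counts)
```

The per-status counts always add up to the total, which makes a handy sanity check on grouped reports.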

DISTINCT: Your Duplicate Data Killer

DISTINCT eliminates duplicate rows from your results. But it's not magic – I've seen queries slow to a crawl because someone used DISTINCT on a huge table without indexes. Here's what you need to know:

Scenario	Without DISTINCT	With DISTINCT	Why It Matters
Product colors table	Red, Red, Blue, Green	Red, Blue, Green	Accurate inventory options
Customer countries	USA, UK, USA, FR, UK	USA, UK, FR	Marketing region planning

Where DISTINCT bites you: When applied to multiple columns. SELECT DISTINCT city, country gives unique combinations of both columns – "Paris, France" and "Paris, USA" count as different entries. Makes sense when you think about it, but catches many off guard.
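A quick sketch of that behavior, using Python's sqlite3 and an invented customers table: DISTINCT on one column and DISTINCT on two columns give different row counts from the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Paris", "France"), ("Paris", "USA"), ("Paris", "France"), ("London", "UK")],
)

# DISTINCT on one column: uniqueness is judged on city alone.
cities = [row[0] for row in conn.execute(
    "SELECT DISTINCT city FROM customers ORDER BY city"
)]

# DISTINCT on two columns: uniqueness is judged on the (city, country) pair.
pairs = conn.execute(
    "SELECT DISTINCT city, country FROM customers ORDER BY city, country"
).fetchall()
print(cities, pairs)
```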

When You SHOULDN'T Use DISTINCT

On primary keys (they're already unique!)
As a quick fix for JOIN duplicates (fix the JOIN condition instead)
With large text/BLOB columns (kills performance)
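The JOIN point is easier to believe with numbers. Below is a minimal sketch (Python's sqlite3, hypothetical orders and order_items tables) where a one-to-many join inflates both the row count and the revenue total. One possible fix, assuming you only need one row per order, is to aggregate the child table before joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10, 100.0), (2, 11, 50.0)])
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (2, "A")],
)

# Naive join: order 1 appears three times, once per item.
naive = conn.execute(
    "SELECT COUNT(*), SUM(o.amount) "
    "FROM orders o JOIN order_items i ON i.order_id = o.order_id"
).fetchone()

# Fix the join shape: collapse items to one row per order first.
fixed = conn.execute(
    "SELECT COUNT(*), SUM(o.amount) "
    "FROM orders o "
    "JOIN (SELECT order_id, COUNT(*) AS n_items "
    "      FROM order_items GROUP BY order_id) i "
    "  ON i.order_id = o.order_id"
).fetchone()
print(naive, fixed)
```

Slapping DISTINCT on the naive query could hide the duplicate rows in a listing, but it would not repair the inflated SUM, which is why fixing the join is the better habit.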

The Power Combo: COUNT(DISTINCT)

This is where COUNT and DISTINCT really pay off together. Need to know how many unique visitors your site had today? SELECT COUNT(DISTINCT user_id) FROM site_activity WHERE date = CURRENT_DATE. Done. But watch these gotchas:

Database Compatibility Note

Every major database supports COUNT(DISTINCT column), but support for multiple columns varies: MySQL accepts COUNT(DISTINCT col1, col2), while SQL Server and Oracle reject it. For a portable way to count distinct pairs, use a subquery:

SELECT COUNT(*) FROM (SELECT DISTINCT city, country FROM customers) AS temp
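Here's the subquery approach as a runnable sketch with Python's sqlite3. The customers rows are invented, and I've added a pair with a NULL country on purpose: the subquery form keeps it as its own pair, whereas MySQL's multi-column COUNT(DISTINCT ...) is documented to skip rows where any expression is NULL, so the two forms can disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Paris", "France"), ("Paris", "USA"), ("Paris", "France"),
        ("London", "UK"), ("Berlin", None), ("Berlin", None),
    ],
)

# Deduplicate the pairs first, then count the surviving rows.
pair_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT city, country FROM customers) AS temp"
).fetchone()[0]
print(pair_count)
```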

Real talk: I once tried COUNT(DISTINCT) on a 500-million-row table without proper indexes. The query ran for 40 minutes before I killed it. Lesson learned – always check execution plans!

Essential COUNT(DISTINCT) Patterns

Business Question	SQL Solution	Performance Tip
How many unique products sold per category?	`SELECT category, COUNT(DISTINCT product_id) FROM sales GROUP BY category`	Add index on (category, product_id)
Daily unique visitors	`SELECT visit_date, COUNT(DISTINCT user_id) FROM visits GROUP BY visit_date`	Partition table by date
Customers buying multiple items	`SELECT COUNT(DISTINCT customer_id) FROM orders WHERE item_count > 1`	Filter before counting distinct
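As a sketch of the first pattern in the table, here's the per-category query with the suggested composite index, run through Python's sqlite3. The sales rows and the index name are made up; whether the index actually helps depends on your engine and data, so treat it as something to verify in the execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, product_id INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("toys", 1), ("toys", 1), ("toys", 2), ("books", 3), ("books", 3)],
)

# Composite index covering the GROUP BY column and the DISTINCT column.
conn.execute("CREATE INDEX idx_sales_category_product ON sales (category, product_id)")

products_per_category = conn.execute(
    "SELECT category, COUNT(DISTINCT product_id) "
    "FROM sales GROUP BY category ORDER BY category"
).fetchall()
print(products_per_category)
```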

Performance Tuning: Making COUNT DISTINCT Fly

Let's be honest - COUNT(DISTINCT) can be slow, because the database has to track every value it has already seen. Here's what I've learned optimizing these queries:

Index smartly: Add indexes on columns used in DISTINCT, WHERE, and GROUP BY
Approximate counts: Use APPROX_COUNT_DISTINCT() in BigQuery or approx_count_distinct() in Spark SQL when a small error (typically a few percent) is an acceptable trade for much less memory and runtime
Pre-aggregate: Create summary tables nightly for frequent queries
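The pre-aggregation idea can be sketched like this, using Python's sqlite3. The visits data and the daily_unique_visitors table name are hypothetical, and in practice a nightly job would rebuild or append to the summary. The sketch also shows one trap: distinct counts are not additive, so you can't sum daily uniques to get uniques over a longer period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visit_date TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("2024-01-01", 1), ("2024-01-01", 1), ("2024-01-01", 2),
        ("2024-01-02", 2), ("2024-01-02", 3),
    ],
)

# Summary table: one row per day, computed once instead of per dashboard load.
conn.execute(
    "CREATE TABLE daily_unique_visitors AS "
    "SELECT visit_date, COUNT(DISTINCT user_id) AS unique_users "
    "FROM visits GROUP BY visit_date"
)
daily = conn.execute(
    "SELECT visit_date, unique_users FROM daily_unique_visitors ORDER BY visit_date"
).fetchall()

# User 2 visited on both days, so summing the daily numbers double-counts them.
summed_daily = sum(count for _, count in daily)
true_two_day_uniques = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM visits"
).fetchone()[0]
print(daily, summed_daily, true_two_day_uniques)
```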

Warning: NULLs in COUNT DISTINCT

COUNT(DISTINCT email) ignores NULL values completely. If you need all the NULLs to count as one extra distinct value, do this (pick a placeholder that can never appear as a real value):


    SELECT COUNT(DISTINCT COALESCE(email, 'NULL_PLACEHOLDER')) FROM users

(But honestly? Reconsider your data model if NULLs need special counting)
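Here's that workaround as a runnable sketch in Python's sqlite3, with made-up emails. I've also included an alternative that avoids a sentinel string by adding 1 when any NULL exists; that variant is my own suggestion, not something every team will prefer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("a@example.com",), ("a@example.com",), (None,), (None,), ("b@example.com",)],
)

plain, with_placeholder, without_sentinel = conn.execute(
    "SELECT "
    "  COUNT(DISTINCT email), "                                 # NULLs ignored
    "  COUNT(DISTINCT COALESCE(email, 'NULL_PLACEHOLDER')), "   # NULLs become one value
    "  COUNT(DISTINCT email) "
    "    + MAX(CASE WHEN email IS NULL THEN 1 ELSE 0 END) "     # add 1 if any NULL exists
    "FROM users"
).fetchone()
print(plain, with_placeholder, without_sentinel)
```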

GROUP BY vs DISTINCT: Which to Choose?

Both deduplicate data but serve different purposes:

Operation	Best For	Performance	My Preference
`DISTINCT`	Simple duplicate removal	Faster for small datasets	When I need just unique values
`GROUP BY`	Aggregations (COUNT, SUM, AVG)	Better for large grouped data	When counting distinct per group

Pro tip: For complex aggregations, GROUP BY often outperforms DISTINCT + subqueries, but optimizers differ. Test both with EXPLAIN (EXPLAIN PLAN FOR in Oracle).

When GROUP BY Replaces DISTINCT

Instead of:


  SELECT DISTINCT department FROM employees

You can write:


  SELECT department FROM employees GROUP BY department

They return identical results, and most modern optimizers produce the same plan for both. If one form is faster on your database, the execution plan will show why.
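A tiny check of the equivalence, sketched with Python's sqlite3 and an invented employees table. It only confirms that the results match; it says nothing about speed on your engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", "sales"), ("Bob", "sales"), ("Cy", "eng"), ("Di", "hr"), ("Ed", "eng")],
)

# Sort in Python because neither query guarantees an output order.
distinct_rows = sorted(
    conn.execute("SELECT DISTINCT department FROM employees").fetchall()
)
group_rows = sorted(
    conn.execute("SELECT department FROM employees GROUP BY department").fetchall()
)
print(distinct_rows == group_rows, distinct_rows)
```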

Real-World Problems Solved by Count and Distinct in SQL

Let's get practical. Here are actual scenarios where these commands save the day:

E-Commerce Analysis

Unique daily shoppers: COUNT(DISTINCT customer_id)
Products in multiple categories: COUNT(DISTINCT category_id) per product
Abandoned carts: COUNT(DISTINCT session_id) WHERE checkout_complete = 0

User Analytics

Monthly active users (MAU): COUNT(DISTINCT user_id) WHERE last_active BETWEEN ...
Feature adoption rate: COUNT(DISTINCT user_id) who used feature X
Cross-platform usage: COUNT(DISTINCT device_id) per user
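To make one of these concrete, here's a sketch of a feature adoption rate in Python's sqlite3. The events table, the feature names, and the choice of "export" as feature X are all assumptions for illustration. The trick is a CASE with no ELSE branch: it yields NULL for non-matching rows, and COUNT(DISTINCT ...) ignores NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, feature TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "export"), (1, "export"), (2, "search"),
     (3, "export"), (4, "search"), (4, "search")],
)

active_users, adopters = conn.execute(
    "SELECT "
    "  COUNT(DISTINCT user_id), "
    "  COUNT(DISTINCT CASE WHEN feature = 'export' THEN user_id END) "
    "FROM events"
).fetchone()

adoption_rate = adopters / active_users
print(active_users, adopters, adoption_rate)
```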

Honestly? I use some form of COUNT(DISTINCT) in almost every analytics report I build. It's that fundamental.

Advanced Tactics: Window Functions and CTEs

When basic COUNT DISTINCT isn't enough:

Counting Distinct Over Time

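Rolling 7-day unique users. Most engines (PostgreSQL, MySQL, SQL Server, SQLite) reject DISTINCT inside a window aggregate, and ROWS BETWEEN counts rows rather than calendar days, so a correlated subquery over the date range is the safer route (interval syntax varies by database):

  SELECT
    d.date,
    (SELECT COUNT(DISTINCT v.user_id)
     FROM visits v
     WHERE v.date BETWEEN d.date - INTERVAL '6' DAY AND d.date) AS rolling_7d_users
  FROM (SELECT DISTINCT date FROM visits) d
  ORDER BY d.date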
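To check the rolling logic on a small dataset, here's a minimal sketch in Python's sqlite3. It uses a correlated subquery over a calendar range rather than a windowed COUNT(DISTINCT), which SQLite (like several other engines) does not accept. The visit_date column name, the sample rows, and SQLite's date() modifier syntax are specifics of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visit_date TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("2024-01-01", 1),
        ("2024-01-03", 1), ("2024-01-03", 2),
        ("2024-01-08", 3),
        ("2024-01-10", 2),
    ],
)

# For each day with traffic, count distinct users seen in the 7 calendar
# days ending on that day (the day itself plus the 6 days before it).
rolling = conn.execute(
    "SELECT d.visit_date, "
    "  (SELECT COUNT(DISTINCT v.user_id) FROM visits v "
    "   WHERE v.visit_date BETWEEN date(d.visit_date, '-6 days') AND d.visit_date) "
    "FROM (SELECT DISTINCT visit_date FROM visits) d "
    "ORDER BY d.visit_date"
).fetchall()
print(rolling)
```

Days with no visits don't appear in the output here; joining against a calendar table would be one way to fill them in.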

Complex Counting with CTEs

Users purchasing from multiple categories:


  WITH user_cats AS (
    SELECT user_id, COUNT(DISTINCT category) AS cat_count
    FROM purchases
    GROUP BY user_id
  )
  SELECT
    COUNT(*) FILTER (WHERE cat_count >= 3) AS power_users,
    COUNT(*) FILTER (WHERE cat_count = 1) AS single_cat_users
  FROM user_cats
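The FILTER clause isn't available everywhere (PostgreSQL has it, for example, but MySQL and SQL Server don't), so here's a sketch of the same CTE using SUM(CASE ...) instead, which should run on most engines. It's executed through Python's sqlite3 with made-up purchases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        (1, "a"), (1, "b"), (1, "c"), (1, "a"),  # user 1: 3 categories
        (2, "a"),                                # user 2: 1 category
        (3, "a"), (3, "b"),                      # user 3: 2 categories
        (4, "b"), (4, "b"),                      # user 4: 1 category
    ],
)

power_users, single_cat_users = conn.execute(
    "WITH user_cats AS ( "
    "  SELECT user_id, COUNT(DISTINCT category) AS cat_count "
    "  FROM purchases GROUP BY user_id "
    ") "
    "SELECT "
    "  SUM(CASE WHEN cat_count >= 3 THEN 1 ELSE 0 END), "
    "  SUM(CASE WHEN cat_count = 1 THEN 1 ELSE 0 END) "
    "FROM user_cats"
).fetchone()
print(power_users, single_cat_users)
```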

Your COUNT DISTINCT FAQ Answered

Does COUNT(DISTINCT) work with multiple columns?

In standard SQL, no. Use a subquery: SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM table) AS t, or check your DB's docs (some, like MySQL, support COUNT(DISTINCT col1, col2)).

Why is my COUNT DISTINCT query so slow?

Three main culprits: Missing indexes on the distinct columns, huge dataset sizes, or doing DISTINCT before filtering. Add WHERE clauses first, create appropriate indexes, and consider approximate counts.

How does NULL behave in COUNT DISTINCT?

It depends on where DISTINCT appears. SELECT DISTINCT nullable_col treats all NULLs as identical and returns a single NULL row. COUNT(DISTINCT nullable_col) skips NULLs entirely, so that NULL row is never counted - careful with this difference!
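A short sketch of the two behaviors side by side, using Python's sqlite3 and invented emails: SELECT DISTINCT keeps one NULL row, while COUNT(DISTINCT ...) leaves NULLs out of the tally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("a@example.com",), (None,), (None,), ("a@example.com",), ("b@example.com",)],
)

# SELECT DISTINCT collapses the two NULLs into a single NULL row.
distinct_rows = conn.execute("SELECT DISTINCT email FROM users").fetchall()

# COUNT(DISTINCT ...) does not count NULL at all.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT email) FROM users"
).fetchone()[0]
print(len(distinct_rows), count_distinct)
```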

Can I use DISTINCT and ORDER BY together?

Absolutely: SELECT DISTINCT department FROM employees ORDER BY department. But avoid ordering by columns that aren't in the SELECT list; with DISTINCT, databases such as PostgreSQL and SQL Server reject that.

What's faster: DISTINCT or GROUP BY?

For simple deduplication, they're similar. For per-group aggregates, GROUP BY is the natural fit and usually beats stitching DISTINCT subqueries together. Always test with your specific data and indexes.

Mistakes I've Made (So You Don't Have To)

After 10 years of SQL work, here's my hall of shame with COUNT and DISTINCT:

Overusing DISTINCT as a band-aid: Masked underlying JOIN issues that later caused data inconsistencies
Forgetting NULLs in COUNT: Led to undercounted metrics in financial reports
COUNT DISTINCT on UUID columns: Brought analytics database to its knees
Assuming DISTINCT applies to first column only: Wasted hours debugging "wrong" counts

The worst? Running a COUNT DISTINCT on production during peak hours. Got paged at 2 AM when the system slowed to a crawl. Don't be like me - test big queries on replicas first!

Choosing the Right Tool for the Job

Alternatives to COUNT DISTINCT and when they shine:

Technique	Best Used When	Example
EXISTS()	Checking for presence (ignore counts)	"Did customer buy product X?"
ROW_NUMBER()	Getting first/last occurrence	"Customer's initial purchase"
Approximate functions	Speed critical, precision optional	Real-time dashboard metrics
Bitmaps	Extremely high cardinality data	User activity across billions
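For the EXISTS row in the table, here's a minimal sketch in Python's sqlite3. The orders table, the product codes, and the bought() helper are hypothetical. The point is that a presence check can stop at the first matching row instead of counting them all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "X"), (1, "X"), (2, "Y")],
)


def bought(customer_id: int, product: str) -> bool:
    """Return True if the customer has at least one order for the product."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = ? AND product = ?)",
        (customer_id, product),
    ).fetchone()
    return bool(row[0])


customer_1_bought_x = bought(1, "X")
customer_2_bought_x = bought(2, "X")
print(customer_1_bought_x, customer_2_bought_x)
```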

At the end of the day, nothing beats COUNT(DISTINCT) for straightforward unique value counting. Just use it wisely.

Got war stories with COUNT DISTINCT? I once spent three days debugging why counts decreased after a "fix" - turned out someone changed a LEFT JOIN to INNER JOIN. The joys of SQL! What's your battle scar?

September 26, 2025