The Hidden Cost of ORDER BY NEWID()
Fetching a random row from a table is a surprisingly common requirement — random banner ads, sample data, rotating API credentials. The instinctive solution in SQL Server is elegant-looking but conceals a serious performance trap.
```sql
-- Looks innocent. Isn't.
SELECT TOP 1 *
FROM LARGE_TABLE
ORDER BY NEWID();
```
What SQL Server actually does
NEWID() generates a fresh GUID for every single row in the table. SQL Server must then sort the entire result set by those GUIDs before it can hand back the top one. On a table with a million rows you are generating a million GUIDs, sorting a million rows, and discarding 999,999 of them.
The problem: On large tables, ORDER BY NEWID() performs a full table scan and a full sort — O(n log n) work — regardless of how many rows you need. It cannot use any index for ordering.
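Outside SQL, the same cost profile is easy to see. Here is a hypothetical Python sketch (function names are mine, not from the article) of what the random-sort approach amounts to, next to the constant-time pick you get when rows are directly addressable:

```python
import random

rows = list(range(1_000_000))  # stand-in for LARGE_TABLE

def random_row_by_sort(rows):
    # What ORDER BY NEWID() amounts to: tag every row with a fresh random key,
    # sort the whole set by those keys, then keep only the first row.
    return sorted(rows, key=lambda _: random.random())[0]  # O(n log n)

def random_row_by_index(rows):
    # The cheap alternative when rows are addressable by position: one random index.
    return rows[random.randrange(len(rows))]  # O(1)

assert random_row_by_sort(rows[:1000]) in rows
assert random_row_by_index(rows) in rows
```

The sort-based version touches every row even though all but one result is discarded, which is exactly the waste described above.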
A faster alternative: seek, don’t sort
The key insight is to convert the “random sort” into a “random seek”. If we can generate a random Id value cheaply and then let the clustered index do the work, we avoid scanning the table entirely.
```sql
DECLARE @Min INT = (SELECT MIN(Id) FROM LARGE_TABLE);
DECLARE @Max INT = (SELECT MAX(Id) FROM LARGE_TABLE);

SELECT TOP 1 *
FROM LARGE_TABLE
WHERE Id >= @Min + ABS(CHECKSUM(NEWID()) % (@Max - @Min + 1))
ORDER BY Id ASC;
```
MAX(Id) and MIN(Id) are single index seeks on the primary key. CHECKSUM(NEWID()) produces a random integer without sorting anything. The WHERE Id >= clause then performs a single index seek from that point, and TOP 1 with ORDER BY Id ASC returns the first row at or after it.
The result: Two index seeks to get the range, one index seek to find the row. Constant time regardless of table size.
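The seek trick can be modeled outside SQL Server. Below is a small Python sketch (my own illustrative names) using a sorted list of Ids as a stand-in for the clustered index, with bisect playing the role of the index seek:

```python
import bisect
import random

def random_row_by_seek(ids):
    """ids: sorted list of existing Ids (gaps allowed), standing in for the clustered index."""
    lo, hi = ids[0], ids[-1]                     # MIN(Id) and MAX(Id): two cheap lookups
    target = lo + random.randrange(hi - lo + 1)  # random point anywhere in the key range
    return ids[bisect.bisect_left(ids, target)]  # first Id >= target: the "index seek"

ids = [1, 2, 5, 9, 10]  # gaps where rows were deleted
assert random_row_by_seek(ids) in ids
```

Running this many times shows Id 9 coming up roughly four times as often as Id 2 (four target values map to it versus one), which is precisely the gap bias covered in the caveats below.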
Performance at a glance
| Approach | Reads | Sort | Cost vs. table size |
|---|---|---|---|
| ORDER BY NEWID() | Full scan | Full sort | O(n log n) |
| CHECKSUM seek | 3 index seeks | None | O(1) |
Three caveats to know
1. Id gaps cause mild bias. If rows have been deleted, gaps in the Id sequence mean rows immediately after a gap are slightly more likely to be selected. For most use cases — sampling, rotation, A/B testing — this is an acceptable trade-off.
2. Ids may not start at 1. This is why we offset by @Min rather than hardcoding zero. If your identity seed started at 1000, a plain ABS(CHECKSUM(NEWID())) % @Max would often produce a value in the 0–999 gap below the first Id, and the >= seek would map every one of those values to the first row, skewing selection heavily toward it.
3. CHECKSUM can return INT_MIN. The absolute value of -2,147,483,648 does not fit in a 32-bit INT, so ABS(INT_MIN) raises an arithmetic overflow error in SQL Server. The fix is to apply the modulo before the ABS: the remainder is always smaller in magnitude than the divisor, keeping the intermediate value safely within range.
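The overflow caveat can be reproduced outside T-SQL by modeling 32-bit arithmetic. This is a hedged Python sketch: the mod helper mimics T-SQL's sign-of-dividend % semantics, and the span value is illustrative.

```python
INT32_MIN = -2**31      # the one CHECKSUM value whose absolute value has no 32-bit home
INT32_MAX = 2**31 - 1

def tsql_mod(a, n):
    # T-SQL's % keeps the sign of the dividend (truncated division), unlike Python's %
    r = abs(a) % n
    return -r if a < 0 else r

span = 1_000_000  # hypothetical (@Max - @Min + 1)

# Wrong order: ABS first. The true absolute value exceeds INT32_MAX,
# which is why SQL Server raises an arithmetic overflow error.
assert abs(INT32_MIN) > INT32_MAX

# Right order: modulo first. The remainder's magnitude is below the divisor,
# so ABS always yields a representable, non-negative offset.
r = tsql_mod(INT32_MIN, span)
assert abs(r) < span <= INT32_MAX
```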
When you don’t need randomness at all
For round-robin rotation across a fixed set of rows — such as alternating between API credentials or cookie sessions — true randomness is unnecessary overhead. A deterministic slot based on the current second is even cheaper:
```sql
-- Rotates across N accounts, one per second, no writes required
WHERE Slot = DATEPART(SECOND, GETUTCDATE()) % TotalAccounts
```
This resolves to a constant integer comparison — effectively a single index seek — and scales to any number of accounts automatically. No tracking table, no writes, no contention.
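The same idea in application code is a pure function of the current second. A sketch with hypothetical function names, assuming access to a UTC clock:

```python
from datetime import datetime, timezone

def slot_for(second: int, total_accounts: int) -> int:
    # Deterministic round-robin: the same second always maps to the same slot,
    # so no tracking table, write, or lock is needed.
    return second % total_accounts

def current_slot(total_accounts: int) -> int:
    return slot_for(datetime.now(timezone.utc).second, total_accounts)

# Seconds 0-59 cycle through every slot when total_accounts divides 60 evenly.
assert sorted({slot_for(s, 5) for s in range(60)}) == [0, 1, 2, 3, 4]
```

One wrinkle worth noting: when the account count does not divide 60 evenly, slots at the start of the cycle receive slightly more seconds than the rest, so the rotation is near-uniform rather than exact.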
The takeaway: whenever you reach for ORDER BY NEWID(), ask whether you actually need true randomness or just approximate distribution. In most production scenarios, a cheap seek beats an expensive sort by several orders of magnitude.