The Hidden Cost of ORDER BY NEWID()
Fetching a random row from a table is a surprisingly common requirement — random banner ads, sample data, rotating API credentials. The instinctive solution in SQL Server is elegant-looking but conceals a serious performance trap.
-- Looks innocent. Isn't.
SELECT TOP 1 * FROM LARGE_TABLE ORDER BY NEWID()
What SQL Server actually does
NEWID() generates a fresh GUID for every single row in the table. SQL Server must then sort the entire result set by those GUIDs before it can hand back the top one. On a table with a million rows you are generating a million GUIDs, sorting a million rows, and discarding 999,999 of them.
The problem: On large tables, ORDER BY NEWID() performs a full table scan and a full sort — O(n log n) work — regardless of how many rows you need. It cannot use any index for ordering.
A faster alternative: seek, don’t sort
The key insight is to convert the “random sort” into a “random seek”. If we can generate a random Id value cheaply and then let the clustered index do the work, we avoid scanning the table entirely.
DECLARE @Min INT = (SELECT MIN(Id) FROM LARGE_TABLE)
DECLARE @Max INT = (SELECT MAX(Id) FROM LARGE_TABLE)

SELECT TOP 1 *
FROM LARGE_TABLE
WHERE Id >= @Min + ABS(CHECKSUM(NEWID()) % (@Max - @Min + 1))
ORDER BY Id ASC
MAX(Id) and MIN(Id) are single index seeks on the primary key. CHECKSUM(NEWID()) generates a random integer without sorting anything. The WHERE Id >= clause then performs a single index seek from that point forward, and ORDER BY Id ASC TOP 1 picks up the very next row.
The result: two index seeks to get the range and one more to find the row. Each seek is O(log n) on the clustered index, so the cost stays effectively constant regardless of table size.
Performance at a glance
| Approach | Reads | Sort | Scales with table size? |
|---|---|---|---|
| ORDER BY NEWID() | Full scan | Full sort | O(n log n) |
| CHECKSUM seek | 3 index seeks | None | O(log n) |
Three caveats to know
1. Id gaps cause mild bias. If rows have been deleted, gaps in the Id sequence mean rows immediately after a gap are slightly more likely to be selected. For most use cases — sampling, rotation, A/B testing — this is an acceptable trade-off.
2. Ids may not start at 1. This is why we use @Min rather than hardcoding zero. If your identity seed started at 1000, CHECKSUM(NEWID()) % MAX(Id) could generate values 0–999, all smaller than every Id; the Id >= filter would then match the entire table and you'd disproportionately get the first row.
3. CHECKSUM can return INT_MIN. ABS(-2147483648) raises an arithmetic overflow error in SQL Server, because +2147483648 does not fit in an INT. The fix is to apply the modulo before the ABS, keeping the intermediate value safely within range.
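To make the third caveat concrete, here is a small illustration (the literal 1000 is just a demonstration modulus, not a value from the query above):

```sql
-- Unsafe: ABS() is applied to the raw CHECKSUM value. If
-- CHECKSUM(NEWID()) happens to return -2147483648 (INT_MIN),
-- ABS() cannot represent +2147483648 as an INT and raises an
-- arithmetic overflow error.
-- SELECT ABS(CHECKSUM(NEWID())) % 1000;

-- Safe: the modulo runs first, so the value passed to ABS()
-- is already within the range (-1000, 1000).
SELECT ABS(CHECKSUM(NEWID()) % 1000) AS RandomOffset;
```

The failure is rare (one value in about 4.3 billion), which makes it exactly the kind of bug that passes testing and surfaces in production.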
When you don’t need randomness at all
For round-robin rotation across a fixed set of rows — such as alternating between API credentials or cookie sessions — true randomness is unnecessary overhead. A deterministic slot based on the current second is even cheaper:
-- Rotates across N accounts, one per second, no writes required
WHERE Slot = DATEPART(SECOND, GETUTCDATE()) % TotalAccounts
This resolves to a constant integer comparison — effectively a single index seek — and scales to any number of accounts automatically. No tracking table, no writes, no contention.
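Concretely, the rotation might look like this; the ApiAccounts table and its Slot column are hypothetical, with Slot assumed to be a contiguous 0-based index:

```sql
-- Pick the account whose slot matches the current UTC second,
-- wrapped around the total number of accounts.
SELECT TOP 1 *
FROM ApiAccounts
WHERE Slot = DATEPART(SECOND, GETUTCDATE())
           % (SELECT COUNT(*) FROM ApiAccounts);
```

One design note: DATEPART(SECOND, ...) only spans 0–59, so with more than 60 accounts some slots would never be selected; a minute- or millisecond-based counter would be needed for larger sets.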
The takeaway: whenever you reach for ORDER BY NEWID(), ask whether you actually need true randomness or just approximate distribution. In most production scenarios, a cheap seek beats an expensive sort by several orders of magnitude.
Cost-Effective SQL Server Database Restore on Microsoft Azure: Using SMB Shares

1) Motivation Behind the Process
Managing costs efficiently on Microsoft Azure is a crucial aspect for many businesses, especially when it comes to managing resources like SQL Server databases. One area where I found significant savings was in the restoration of SQL Server databases.
Traditionally, I restored databases using a managed disk. The process involved downloading a ZIP file, unzipping it to a .bak file, and then restoring it to the main OS disk. However, there was a significant issue with this setup: the cost of the managed disk.
Even though database restores happened only once every six months, I was still paying for the full capacity of the managed disk: 500GB of provisioned space. That meant paying for unused storage for extended periods, a significant waste of resources and money.
To tackle this issue, I switched to using Azure Storage Accounts with file shares (standard, not premium), which provided a more cost-effective approach. By restoring the database from an SMB share, I could pay only for the data usage, rather than paying for provisioned capacity on a managed disk. Additionally, I could delete the ZIP and BAK files after the restore process was complete, further optimizing storage costs.
2) Issues and Solutions
While the transition to using an Azure Storage Account for database restores was a great move in terms of cost reduction, it wasn’t without its challenges. One of the main hurdles I encountered during this process was SQLCMD reporting that the .bak file did not exist, even though it clearly did.
Symptoms of the Problem
The error message was:
Msg 3201, Level 16, State 2, Server [ServerName], Line 1
Cannot open backup device '\\<UNC Path>\Backups\GeneralPurpose.bak'. Operating system error 3(The system cannot find the path specified.)
Msg 3013, Level 16, State 1, Server [ServerName], Line 1
RESTORE DATABASE is terminating abnormally.
This was perplexing because I had confirmed that the .bak file existed at the UNC path and that the path was accessible from my system.
Diagnosis
To diagnose the issue, I started by enabling xp_cmdshell in SQL Server. This extended stored procedure allows the execution of operating system commands, which is very helpful for troubleshooting such scenarios.
First, I enabled xp_cmdshell by running the following commands:
-- Enable advanced options
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Enable xp_cmdshell
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
Once xp_cmdshell was enabled, I ran a simple DIR command to verify if SQL Server could access the backup file share:
EXEC xp_cmdshell 'dir \\<UNC Path>\Backups\GeneralPurpose.bak';
The result indicated that the SQL Server service account did not have proper access to the SMB share, and that’s why it couldn’t find the .bak file.
Solution
To resolve this issue, I had to map the network share explicitly within SQL Server using the net use command, which allows SQL Server to authenticate to the SMB share.
Here’s the solution I implemented:
EXEC xp_cmdshell 'net use Z: \\<UNC Path> /user:localhost\<user> <PASSWORD>';
Explanation
- Mapping the Network Drive: The net use command maps the SMB share to a local drive letter (in this case, Z:), which makes it accessible to SQL Server.
- Authentication: The /user: flag specifies the username and password needed to authenticate to the share. In my case, I used an account (e.g., localhost\fsausse) with the correct credentials.
- Accessing the Share: After mapping the network drive, I could access the .bak file on the SMB share via its mapped path (Z:). SQL Server would then be able to restore the database without the "file not found" error.
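With the drive mapped, the restore itself runs against Z:. This is a sketch only; the database name and WITH options are assumptions based on the .bak filename in the error message, not the exact command from my environment:

```sql
-- Restore from the mapped SMB share (Z: was mapped via net use).
-- WITH REPLACE overwrites an existing database of the same name;
-- STATS = 10 reports progress every 10 percent.
RESTORE DATABASE GeneralPurpose
FROM DISK = 'Z:\Backups\GeneralPurpose.bak'
WITH REPLACE, STATS = 10;
```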
Once the restore was completed, I could remove the drive mapping with:
EXEC xp_cmdshell 'net use Z: /delete';
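Since xp_cmdshell is a security-sensitive feature, it is also worth turning it back off once the restore is done, mirroring the enable steps above:

```sql
-- Disable xp_cmdshell and hide advanced options again
EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;
```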
This approach ensured that SQL Server had the necessary permissions to access the file on the SMB share, and I could restore my database efficiently, only paying for the data usage on Azure Storage.
Conclusion
By transitioning from a managed disk to an SMB share on Azure Storage, I significantly reduced my costs during database restores. The issue with SQL Server not finding the .bak file was quickly diagnosed and resolved by enabling xp_cmdshell, mapping the network share, and ensuring proper authentication. This process allows me to restore databases in a more cost-effective manner, paying only for the data used during the restore, and avoiding unnecessary storage costs between restores.
For businesses looking to optimize Azure costs, this method provides an efficient, scalable solution for managing large database backups with minimal overhead.