The Misleading IndexOutOfRangeException That Means “Your List Isn’t Thread-Safe”
If you’ve ever seen a stack trace like this in a .NET application:
```
System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at System.Collections.Generic.List`1.Enumerator.MoveNext()
   at System.Linq.Enumerable.WhereListIterator`1.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at YourCode.SomeMethod(...)
```
…and stared at the offending line — something innocuous like myList.Where(x => x.IsActive).ToList() — wondering how on earth a LINQ query could be indexing outside an array, you’ve run into one of the most misleadingly named exceptions in the framework.
The message says “index out of bounds.” The actual problem is a race condition.
What the stack trace is telling you
Read the frames from the bottom up. Your code called ToList(). ToList() calls the List<T> constructor that takes an IEnumerable<T>. That constructor enumerates the Where iterator, which in turn enumerates the underlying List<T> via its Enumerator. And it’s Enumerator.MoveNext() that throws.
So the iterator is walking your list, and at some point it tries to read element N, and N is past the end of the internal array.
How can that happen? The list got smaller — or its internal buffer got swapped — while the enumerator was mid-walk.
List<T> is implemented as a wrapper around a T[] array plus a _size field. The enumerator captures a reference to the list at construction time, then on each MoveNext() it increments an index and reads list._items[index]. If another thread calls Remove, Clear, or triggers a resize via Add between two MoveNext() calls, the array your enumerator is reading from may have been replaced with a smaller one, or the items may have been shuffled. The result: an index that was valid when you started iterating is no longer valid, and you get IndexOutOfRangeException.
Why you don’t get the “nice” exception
List<T> does have a version-check mechanism. Every mutation increments an internal _version field, and the enumerator records the version it started with. If MoveNext() notices the version has changed, it throws InvalidOperationException: Collection was modified; enumeration operation may not execute — the exception most .NET developers have seen at least once and immediately recognise as a concurrency or mid-loop-mutation bug.
The catch: that version check happens after the index increment and array access. If the racing thread mutates the list in a way that shrinks the array between those two operations, you hit the raw IndexOutOfRangeException before the version check ever runs. Same root cause, much worse error message.
This is also why the bug is so frustrating to reproduce. It depends on precise interleaving of two threads down to the instruction level. You can hammer it in a test loop and not see it for a million iterations, then it triggers twice in five minutes in production.
The minimal reproduction
```csharp
var list = new List<int>();
for (int i = 0; i < 1000; i++) list.Add(i);

// Thread A — repeatedly enumerate
Task.Run(() =>
{
    while (true)
    {
        var copy = list.Where(x => x > 0).ToList();
    }
});

// Thread B — repeatedly mutate
Task.Run(() =>
{
    var rng = new Random();
    while (true)
    {
        if (list.Count > 0) list.RemoveAt(rng.Next(list.Count));
        list.Add(rng.Next());
    }
});
```
Run that for a few seconds and you’ll get either InvalidOperationException or IndexOutOfRangeException — sometimes both, on different runs. The exception you get is essentially a coin flip determined by exactly when the second thread’s mutation lands relative to the first thread’s bounds check.
The diagnostic giveaway
The single most useful signal in the stack trace is this frame:
at System.Collections.Generic.List`1.Enumerator.MoveNext()
IndexOutOfRangeException originating from List<T>.Enumerator.MoveNext almost always means concurrent modification. The enumerator’s own code is straightforward — there’s no realistic way for it to compute a bad index on its own. Something outside the enumerator changed the list while it was looking away.
If your stack trace shows that frame, stop looking for off-by-one errors in your LINQ predicate. Start looking for which other thread is writing to the same list.
Fixing it
The fix depends on the access pattern. In rough order of preference:
Use the right collection type. System.Collections.Concurrent has thread-safe equivalents tuned for different shapes of access:
- ConcurrentBag<T> — many writers, occasional bulk drain. No keyed lookup or removal by item.
- ConcurrentQueue<T> / ConcurrentStack<T> — FIFO / LIFO producer-consumer pipelines.
- ConcurrentDictionary<TKey, TValue> — by far the most generally useful. Supports thread-safe add, remove, lookup, and snapshot enumeration via .Keys and .Values. If you’re keying items by an ID, this is almost always what you want.
Crucially: .Values on a ConcurrentDictionary returns a snapshot, so iterating it is safe even if other threads are mutating the dictionary at the same time. No exceptions, no locks.
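For instance, if each item is keyed by an ID, the shared list converts naturally to a dictionary. A minimal sketch (Thing, Catalog, and the member names here are illustrative, not from the original code):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

record Thing(int Id, bool IsActive);

static class Catalog
{
    // One dictionary replaces the shared List<Thing>; the item's ID is the key.
    private static readonly ConcurrentDictionary<int, Thing> _things = new();

    public static void Upsert(Thing t) => _things[t.Id] = t;           // thread-safe write
    public static void Remove(int id) => _things.TryRemove(id, out _); // thread-safe remove

    // .Values is a snapshot, so this enumeration cannot throw
    // IndexOutOfRangeException even while writers run concurrently.
    public static List<Thing> GetActive() =>
        _things.Values.Where(t => t.IsActive).ToList();
}
```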
Lock around access. If you’re stuck with List<T> — maybe the API surface is fixed, or the contention is low enough that the overhead doesn’t matter — wrap every read and every write in a lock on a dedicated private object. Every read and every write. Missing one is enough to bring the bug back.
```csharp
private static readonly object _lock = new object();
private static readonly List<Thing> _items = new List<Thing>();

public static List<Thing> GetActiveThings()
{
    lock (_lock)
    {
        return _items.Where(t => t.IsActive).ToList();
    }
}

public static void AddThing(Thing t)
{
    lock (_lock)
    {
        _items.Add(t);
    }
}
```
The ToList() inside the lock is deliberate — it forces the enumeration to complete before the lock is released, so the returned list is a safe, isolated copy the caller can work with at leisure.
Snapshot, then iterate. A halfway measure for read-mostly workloads:
```csharp
var snapshot = Volatile.Read(ref _items).ToArray();
```
…paired with copy-on-write semantics for the writers. This is the pattern behind ImmutableList<T> from System.Collections.Immutable, which is worth knowing about for scenarios with many readers and rare writers.
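A minimal copy-on-write sketch along those lines, reusing the illustrative Thing record from the earlier sketch (ImmutableInterlocked ships in System.Collections.Immutable):

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using System.Threading;

static class ThingStore
{
    // Readers always see a complete, immutable snapshot;
    // writers publish a new list by swapping the reference.
    private static ImmutableList<Thing> _items = ImmutableList<Thing>.Empty;

    public static void Add(Thing t) =>
        // Retries the transform until the compare-and-swap wins any race.
        ImmutableInterlocked.Update(ref _items, list => list.Add(t));

    public static List<Thing> GetActive() =>
        // Volatile.Read picks up the latest published reference; the list
        // behind it can never change, so enumeration is always safe.
        Volatile.Read(ref _items).Where(t => t.IsActive).ToList();
}
```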
What I’d take away from this
Two things.
First: when you see IndexOutOfRangeException coming out of a LINQ chain on a List<T>, your first hypothesis should be “another thread is writing to this list,” not “my predicate has a bug.” The stack trace looks like a logic error and it almost never is.
Second: List<T> being non-thread-safe is one of those facts every .NET developer knows in the abstract and still trips over in practice, because the framework gives you no help at all until something explodes. There’s no ThrowIfShared mode, no Roslyn analyzer that flags static List<T> fields, no runtime check at write time. The only feedback you get is a confusing exception from deep inside an enumerator, possibly weeks after deployment.
The fix is almost always “use the right type from System.Collections.Concurrent.” It costs you nothing in code clarity and saves you from a class of bug that’s genuinely painful to track down once it’s loose in production.
Thread Pool Exhaustion in ASP.NET: The Async Database Trap
If you’ve ever migrated a working ASP.NET application from synchronous database calls to async, and suddenly found yourself hitting connection pool timeouts under load, you’ve likely fallen into one of the most subtle and destructive traps in the .NET ecosystem: sync-over-async deadlock.
The Symptom
Everything works fine in development. You push to production, traffic picks up, and then:
Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
Your database isn’t overloaded. Your queries are fast. But connections are being swallowed and never returned.
What Actually Happens
To understand the deadlock, you first need to understand two things: the ASP.NET synchronization context, and what blocking on an async method actually does.
The Synchronization Context
In classic ASP.NET (WebForms and MVC on the traditional pipeline), each request runs with a synchronization context that ensures continuations — the code that runs after an await — resume on the same thread that started the request. This is a design choice that simplifies state management, but it has a fatal implication when you block.
The Deadlock Sequence
Consider this code:
```csharp
// Somewhere in a sync method:
var result = GetDataAsync().Result; // ← the problem
```
Here’s what happens step by step:
1. Thread A handles the request and calls GetDataAsync().Result
2. Thread A is now blocked — it’s sleeping, waiting for the Task to complete
3. GetDataAsync() runs its SQL query asynchronously and completes
4. The async machinery looks for a thread to resume on — but the synchronization context says it must resume on Thread A
5. Thread A is blocked waiting for the task. The task is waiting for Thread A. Neither can proceed.
This is a classic deadlock. The thread never releases, the SQL connection it holds is never returned to the pool, and every subsequent request that hits the same code path adds another frozen thread and another stranded connection.
Why It Only Surfaces Under Load
With light traffic, the thread pool has spare threads. The continuation sneaks onto a different free thread and completes before the pool runs dry. As concurrency increases, all available threads become blocked, no free thread exists to run any continuation, and the whole system seizes.
This is why the bug can pass development and staging entirely undetected.
The Broken Pattern
```csharp
public DataTable GetUserData(string userId)
{
    // Blocking on an async method — dangerous in ASP.NET
    return GetUserDataAsync(userId).Result;
}

public async Task<DataTable> GetUserDataAsync(string userId)
{
    using var conn = new SqlConnection(connectionString);
    using var cmd = new SqlCommand("sp_GetUser @1", conn);
    cmd.Parameters.AddWithValue("@1", userId);

    await conn.OpenAsync();
    using var reader = await cmd.ExecuteReaderAsync();

    var dt = new DataTable();
    dt.Load(reader);
    return dt;
}
```
The async method itself is fine. The problem is the caller blocking on it with .Result.
The Fix: Async All the Way Down
The only correct solution is to await the entire call chain without any blocking calls. There must be no .Result, .Wait(), or .GetAwaiter().GetResult() anywhere in the path from the entry point to the database.
```csharp
// ✅ Correct: full async chain
public async Task<DataTable> GetUserDataAsync(string userId)
{
    using var conn = new SqlConnection(connectionString);
    using var cmd = new SqlCommand("sp_GetUser @1", conn);
    cmd.Parameters.AddWithValue("@1", userId);

    await conn.OpenAsync();
    using var reader = await cmd.ExecuteReaderAsync();

    var dt = new DataTable();
    dt.Load(reader);
    return dt;
}
```
And the caller:
```csharp
var data = await GetUserDataAsync(userId); // ✅ not .Result
```
The WebForms Special Case
WebForms Page_Load is synchronous by signature, which tempts developers to block. The correct bridge is RegisterAsyncTask:
```csharp
protected void Page_Load(object sender, EventArgs e)
{
    RegisterAsyncTask(new PageAsyncTask(DoWorkAsync));
}

private async Task DoWorkAsync()
{
    var data = await GetUserDataAsync(userId);
    // ... use data
}
```
RegisterAsyncTask is ASP.NET’s own sanctioned mechanism for running async work from a sync page lifecycle event. It does not block, does not hold threads, and allows the page pipeline to handle async completion correctly.
Coexisting Sync and Async
A pragmatic migration strategy — rather than converting everything at once — is to maintain both sync and async versions of database methods, and use each only from the appropriate call path:
```csharp
// Sync version — for legacy sync call paths
public static DataTable BoundPopulateDataTable(string command, string[] parameters)
{
    using var conn = new SqlConnection(ConnectionString);
    using var cmd = new SqlCommand(command, conn);
    cmd.Parameters.AddRange(ConvertSqlParameters(parameters).ToArray());

    conn.Open();
    using var reader = cmd.ExecuteReader();

    var dt = new DataTable();
    dt.Load(reader);
    return dt;
}

// Async version — only called from async paths
public static async Task<DataTable> BoundPopulateDataTableAsync(string command, string[] parameters)
{
    using var conn = new SqlConnection(ConnectionString);
    using var cmd = new SqlCommand(command, conn);
    cmd.Parameters.AddRange(ConvertSqlParameters(parameters).ToArray());

    await conn.OpenAsync();
    using var reader = await cmd.ExecuteReaderAsync();

    var dt = new DataTable();
    dt.Load(reader);
    return dt;
}
```
The discipline required is simple: never call the async version from a sync context, and never block on it.
Quick Diagnostic Checklist
If you’re seeing connection pool timeouts after an async migration, scan your codebase for these patterns:
| Pattern | Risk |
|---|---|
| someTask.Result | ❌ Deadlock |
| someTask.Wait() | ❌ Deadlock |
| someTask.GetAwaiter().GetResult() | ❌ Deadlock |
| await someTask | ✅ Safe |
| RegisterAsyncTask(...) | ✅ Safe WebForms bridge |
Summary
The async deadlock in ASP.NET is invisible under low load, catastrophic under real traffic, and trivially easy to introduce during a migration. The root cause is always the same: blocking a thread on an async operation inside a synchronization context that needs that thread to resume.
The rule is simple and absolute: if you make a method async, every caller must also be async, all the way to the top of the call stack. There are no shortcuts. .Result is not a shortcut — it’s a time bomb.
Done correctly, async database access is genuinely more scalable. Done incorrectly, it’s worse than sync in every way.
The dangers of Parallel.ForEach(… , async (item)) in IIS
A single, trivial exception — one that your code already has a catch block for — shouldn’t be able to bring down your entire IIS web server. But it can, and it will, if you combine Parallel.ForEach with an async lambda. This post explains exactly why it happens, how to spot it in the Windows Event Log, and how to fix it permanently.
The Setup
You have a method that needs to perform the same async operation against multiple items — calling a set of external APIs, processing a batch of records, sending a collection of requests. You reach for Parallel.ForEach because it sounds like the right tool: parallel work, multiple items, run them all at once. You even add a try/catch inside the lambda because you’re being responsible. It looks like this:
```csharp
Parallel.ForEach(items, async (item) =>
{
    try
    {
        var result = await ProcessItemAsync(item);
        lock (results) { results.Add(result); }
    }
    catch (ItemNotFoundException)
    {
        // item not found - fine, skip it
    }
    catch (Exception ex)
    {
        lock (errors) { errors.Add(ex); }
    }
});
```
This looks safe. It has error handling. It uses async/await. It compiles without a single warning. And it will crash your IIS worker process (w3wp.exe) the moment any exception is thrown after an await.
Why It Crashes: The async void Trap
Parallel.ForEach was designed before async/await existed in C#. It expects a synchronous Action<T> delegate. When you pass it an async lambda, something subtle and dangerous happens: the compiler silently treats the lambda as returning void rather than Task.
This is the async void anti-pattern, and it has one devastating property: any exception thrown inside it cannot be caught by any caller. It escapes directly to the thread’s synchronisation context — and on a raw ThreadPool thread, that means it goes completely unhandled.
Here is the exact sequence of events that kills your server:
1. Parallel.ForEach fires the lambda for each item in the collection
2. Each lambda hits the first await and suspends, returning control immediately
3. Parallel.ForEach sees each lambda return (as void) and considers its job done
4. Parallel.ForEach exits — the method returns to its caller — everything looks fine
5. Milliseconds later, the awaited operations complete and the continuations resume on raw ThreadPool threads
6. An exception is thrown inside one of those continuations
7. The try/catch inside the lambda? It only catches exceptions thrown before the first await. After the await, the lambda has already returned as far as Parallel.ForEach is concerned
8. The exception has no owner, no observer, no catch block — it propagates to the ThreadPool itself
9. In .NET 4.0 and later, an unhandled exception on a ThreadPool thread terminates the process
10. w3wp.exe crashes. IIS restarts the application pool. All in-flight requests are lost
The particularly insidious part is that the try/catch gives you a false sense of security. You can see it right there in the code. But it doesn’t work the way you expect once an await is involved.
A Minimal Reproduction
You don’t need a complex codebase to reproduce this. The following is all it takes:
```csharp
public static void CrashIIS()
{
    Parallel.ForEach(new[] { 1, 2, 3 }, async (item) =>
    {
        await Task.Delay(100); // simulate any async I/O
        throw new Exception("This kills w3wp.exe");
        // After the await, this runs on an orphaned ThreadPool thread
        // The process terminates
    });
    // Parallel.ForEach has already returned here
    // The crash happens 100ms later
}
```
Call that from any ASP.NET request handler — a controller action, an HttpHandler, anywhere — and your application pool will crash within moments. The caller gets no exception. The HTTP response may even succeed before the crash occurs. The next user to make any request gets a 503.
Even wrapping the call in a try/catch at the call site doesn’t help:
```csharp
try
{
    Parallel.ForEach(new[] { 1, 2, 3 }, async (item) =>
    {
        await Task.Delay(100);
        throw new Exception("Crash");
    });
}
catch (Exception ex)
{
    // This NEVER fires.
    // The exception doesn't happen until after Parallel.ForEach
    // has already exited this try block entirely.
    Log(ex);
}
```
The catch block is long gone by the time the exception is thrown. This is what makes the pattern so dangerous — it looks exception-safe at every level, and isn’t.
How It Appears in the Windows Event Log
When this crash occurs, it leaves a very specific fingerprint in the Windows Event Log. Open Event Viewer → Windows Logs → Application and look for two entries appearing within seconds of each other.
Entry 1: .NET Runtime — Unhandled Exception
Source: .NET Runtime
Event ID: 1026
```
Application: w3wp.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: YourNamespace.YourException
   at YourClass.YourMethod()
   at SomeClass+<SomeMethod>d__3.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task)
   at SomeClass+<>c__DisplayClass4_0+<<YourParallelMethod>b__0>d.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Threading.ExecutionContext.RunInternal(...)
   at System.Threading.ExecutionContext.Run(...)
   at System.Threading.QueueUserWorkItemCallback.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
```
The key indicators are at the bottom of the stack trace:
- QueueUserWorkItemCallback.ExecuteWorkItem()
- ThreadPoolWorkQueue.Dispatch()
These tell you the exception surfaced on a raw ThreadPool work item with no managed owner — the classic signature of an orphaned async continuation. You will also see compiler-generated state machine names like <YourParallelMethod>b__0>d.MoveNext() in the trace, confirming the exception came from inside an async lambda. The angle brackets and the b__ notation are the C# compiler’s naming convention for anonymous methods and lambdas.
Entry 2: Application Error — w3wp.exe Fault
Source: Application Error
Event ID: 1000
```
Faulting application name: w3wp.exe
Faulting module name: KERNELBASE.dll
Exception code: 0xe0434352
```
Exception code 0xe0434352 is the Windows error code for a managed (.NET) exception that has escaped to the Win32 layer. It’s the OS-level record of a .NET exception killing a process. When you see this code combined with KERNELBASE.dll as the faulting module, a .NET unhandled exception is the cause.
What to Look For — Summary
| Signal | Where | What it means |
|---|---|---|
| ThreadPoolWorkQueue.Dispatch() at bottom of stack | Event ID 1026, .NET Runtime | Exception from orphaned async continuation |
| Compiler-generated names like b__0>d.MoveNext() | Event ID 1026, .NET Runtime | Exception came from inside an async lambda |
| Exception code 0xe0434352 | Event ID 1000, Application Error | .NET exception killed the process |
| Faulting module: KERNELBASE.dll | Event ID 1000, Application Error | Managed exception, not a native crash |
| Both entries within seconds of each other | Application log | Single event caused immediate process termination |
The Effect on IIS
When w3wp.exe terminates due to an unhandled exception, IIS detects the process death and marks the application pool as faulted. Depending on your Rapid Fail Protection settings (found in IIS Manager → Application Pools → Advanced Settings), IIS will either:
- Restart the worker process automatically — users experience a brief outage and then service resumes, with the first request after restart being slow due to application warm-up
- Disable the application pool if failures occur too frequently within the Rapid Fail Protection window (default: 5 failures in 5 minutes) — this results in a persistent 503 until an administrator manually starts the pool again
This is worth understanding because thread exhaustion and this crash pattern look identical from the outside — both produce 503 errors — but they behave very differently. Thread exhaustion self-recovers when load drops. A crashed application pool requires either automatic restart (if Rapid Fail Protection hasn’t tripped) or manual intervention. If your team is regularly performing IISResets to recover from outages, a crash like this is a more likely culprit than thread exhaustion.
The Fix
The correct replacement for Parallel.ForEach with async work is Task.WhenAll, which is async-native and properly propagates exceptions back to the awaiting caller:
```csharp
public static async Task<IReadOnlyList<Result>> ProcessAllAsync(IEnumerable<Item> items)
{
    var tasks = items.Select(async item =>
    {
        try
        {
            return await ProcessItemAsync(item);
        }
        catch (ItemNotFoundException)
        {
            return Result.Empty;
        }
    });

    // All items processed in parallel.
    // Failures are collected on the combined task and re-thrown here,
    // to a proper awaiter that can handle them.
    return await Task.WhenAll(tasks);
}
```
With Task.WhenAll:

- All items are processed in parallel — no performance regression
- Every async continuation is properly tracked by the Task infrastructure
- Exceptions are collected on the combined task (exposed as an AggregateException on its Exception property) and re-thrown when awaited — to a caller that can actually handle them
- The process does not terminate
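One design note: Task.WhenAll starts everything at once, which is fine for twenty items but not for twenty thousand. A hedged variant that caps concurrency with SemaphoreSlim (the limit of 8 is an arbitrary example; Item, Result and ProcessItemAsync are the same illustrative names as above):

```csharp
public static async Task<Result[]> ProcessAllThrottledAsync(IEnumerable<Item> items)
{
    using var gate = new SemaphoreSlim(8); // at most 8 items in flight at once

    var tasks = items.Select(async item =>
    {
        await gate.WaitAsync();
        try
        {
            return await ProcessItemAsync(item);
        }
        finally
        {
            gate.Release();
        }
    }).ToList(); // materialise so every task is started before we await

    return await Task.WhenAll(tasks);
}
```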
As an immediate safety net while refactoring, you can also add a global handler in Global.asax that prevents process termination from unobserved task exceptions:
```csharp
// In Application_Start (Global.asax)
TaskScheduler.UnobservedTaskException += (sender, args) =>
{
    Logger.Error("Unobserved task exception", args.Exception);
    args.SetObserved(); // prevents process termination
};
```
This is a safety net, not a fix — the underlying orphaned tasks still exist and their results are still lost. But it prevents a single unhandled background exception from taking down your entire server while you work through a proper refactor.
The Rule
The rule to remember is simple: never pass an async lambda to Parallel.ForEach. The two are fundamentally incompatible. Parallel.ForEach has no understanding of Task, does not await the work it fires, and any exception thrown after the first await inside your lambda will be orphaned on the ThreadPool. In .NET 4.0 and later, that means process termination.
The pattern is particularly easy to introduce because it compiles cleanly, looks reasonable, and even appears to have proper error handling. The only sign something is wrong is your server going down.
When you need parallel async work, use Task.WhenAll. It was designed for exactly this purpose.
Found this useful? If you’re diagnosing IIS instability, check your application pool’s Rapid Fail Protection settings and review Event Viewer’s Application log for Event ID 1026 with ThreadPoolWorkQueue.Dispatch() at the bottom of the stack trace — that’s the fingerprint that points directly to this pattern.
Batch AI Processing: Why Multithreading is the Wrong Instinct
When developers first encounter a large-scale AI classification job — say, two million records that each need to be sent to an LLM for analysis — the instinct is immediately familiar: spin up threads, parallelise the work, saturate the API. It’s the same pattern that works for database processing, file I/O, HTTP scraping. More threads, more throughput.
With LLM APIs, that instinct leads you straight into a wall. And the wall has a name: TPM.
The Problem with Multithreading LLM Calls
Most LLM APIs — OpenAI included — impose a Tokens Per Minute (TPM) limit. This is a rolling window, not a per-request limit. Every token you send in a prompt, and every token the model returns, counts against it.
The naive multithreaded approach burns through this budget in a way that’s both wasteful and hard to control:
The system prompt repeats on every request. If your prompt is 700 tokens and you’re running 20 threads firing one request each, that burst costs 14,000 tokens of prompt overhead alone — before the model has classified a single record. Against a 200,000 TPM limit, a sustained 14,000 tokens per second exhausts the entire minute’s budget in about 14 seconds.
Burst behaviour triggers rate limits unpredictably. The TPM limit is a rolling window. Twenty threads firing simultaneously create a spike that can exceed the per-minute budget in seconds, even if your average rate would be well within limits. The API returns 429 errors, your retry logic kicks in, those retries themselves consume tokens, and the situation compounds.
Thread count is a blunt instrument. Dialling concurrency up and down doesn’t map cleanly to token consumption because request latency varies. A batch that takes 500ms doesn’t consume the same tokens as one that takes 1,500ms, but both hold a thread slot for their duration.
The Better Model: Semantic Batching
The insight that changes everything is this: the system prompt is a fixed overhead, and you should amortise it across as many classifications as possible per API call.
Instead of:
```
Thread 1: [system prompt 700 tokens] + [address 1: 15 tokens] → [result: 15 tokens]
Thread 2: [system prompt 700 tokens] + [address 2: 15 tokens] → [result: 15 tokens]
...
× 20 threads
Total: 14,000 tokens for 20 classifications
```
You send:
```
[system prompt 700 tokens] + [addresses 1-20: 300 tokens] → [results 1-20: 100 tokens]
Total: 1,100 tokens for 20 classifications
```
That’s a 12× reduction in token consumption for the same work. Suddenly your 200,000 TPM budget — which could only sustain ~270 single-record requests per minute — supports ~3,600 classifications per minute. No extra threads needed.
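As a minimal sketch of the request side (the tuple shape and method name are illustrative, not from the original code), the batched user message is just the records concatenated, each carrying its ID; the reason for the IDs is covered next:

```csharp
using System.Collections.Generic;
using System.Text;

// Build one user message carrying a whole batch, so the system prompt
// is paid for once per API call rather than once per record.
static string BuildBatchMessage(IEnumerable<(long Id, string Text)> batch)
{
    var sb = new StringBuilder();
    foreach (var record in batch)
        sb.AppendLine($"id={record.Id}: {record.Text}"); // ID travels with each record
    return sb.ToString();
}
```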
Key Implementation Details
1. Include an ID in Both Request and Response
The most important correctness detail in batch processing is never rely on positional alignment.
If you send 20 addresses and ask the model to return 20 results, it might return 19. Now you don’t know which one it dropped. If you’re matching by position, records from item 7 onwards get silently misclassified.
The fix is to include a unique identifier in both directions:
```
User message:
id=548033: product X
id=548034: product Y
...

System prompt format instruction:
Reply ONLY with a JSON array. Format: [{"id":548033,"c":"E"}, ...]
```
Now you build a dictionary from the response keyed on id, and match each input item explicitly. A missing id means that specific record gets skipped and retried on the next run. Everything else classifies correctly regardless of what the model dropped.
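A hedged sketch of that matching logic using System.Text.Json, against the [{"id":548033,"c":"E"}] shape above; batch and SaveClassification are illustrative stand-ins:

```csharp
using System.Collections.Generic;
using System.Text.Json;

static Dictionary<long, string> ParseBatchResponse(string json)
{
    var results = new Dictionary<long, string>();
    using var doc = JsonDocument.Parse(json);
    foreach (var el in doc.RootElement.EnumerateArray())
    {
        // Tolerate malformed elements: a bad item becomes a skipped
        // record retried on the next run, not a failed batch.
        if (el.TryGetProperty("id", out var id) && el.TryGetProperty("c", out var c))
            results[id.GetInt64()] = c.GetString()!;
    }
    return results;
}

// Usage: match each input explicitly by ID — never by position.
// var results = ParseBatchResponse(responseJson);
// foreach (var item in batch)
//     if (results.TryGetValue(item.Id, out var code))
//         SaveClassification(item.Id, code);   // found: persist it
//     // else: leave NULL; the next run picks it up automatically
```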
2. Resolve Labels Locally
The model doesn’t need to return the full label text. "Prime City Professionals" costs tokens on every response item. A single letter costs one token.
Keep a static dictionary in your code:
```csharp
private static readonly Dictionary<string, string> Labels = new()
{
    { "A", "Prime Product" },
    { "B", "Budget Product" },
    // ...
};
```
The model returns "c":"A", you look up the label locally. This also eliminates a class of hallucination errors where the model invents a label name slightly different from your taxonomy.
Note: even "category" vs "c" matters at scale. In the OpenAI tokenizer, "category" is 3 tokens; "c" is 1. Across 100,000 batch calls, that’s 200,000 tokens — small but free.
3. Track TPM with a Rolling Window, Not Concurrency
Rather than trying to infer safe concurrency from trial and error, measure what you’re actually consuming and throttle directly on that signal.
```csharp
// One shared rolling window of (timestamp, tokens) pairs
var tokenWindow = new Queue<(DateTime Time, long Tokens)>();

// On each successful response, record tokens used with a timestamp
tokenWindow.Enqueue((DateTime.UtcNow, inputTokens + outputTokens));

// Before each request, prune entries older than 60 seconds and sum the rest
var cutoff = DateTime.UtcNow.AddSeconds(-60);
while (tokenWindow.Count > 0 && tokenWindow.Peek().Time < cutoff)
    tokenWindow.Dequeue();
long tpmUsed = tokenWindow.Sum(x => x.Tokens);

// Throttle graduated to usage
if (tpmUsed > tpmLimit * 0.98) Thread.Sleep(2000);
else if (tpmUsed > tpmLimit * 0.95) Thread.Sleep(800);
else if (tpmUsed > tpmLimit * 0.85) Thread.Sleep(300);
```
This gives you automatic, self-correcting throttling that responds to real consumption rather than guessing from thread counts. If a batch of records happens to have longer addresses, the window fills faster and the delay kicks in sooner. No manual tuning required.
4. Resumability via Cursor Pagination
For a job that takes hours or days, stopping and restarting must be safe and cheap. The key is two things working together:
Write results immediately after each batch, not at the end of a page. If you crash mid-page, you’ve lost one batch (20 records), not a thousand.
Use a NULL-check filter combined with cursor pagination. The query for unclassified records looks like:
```sql
WHERE segment_category IS NULL AND id > {lastId} ORDER BY id LIMIT 1000
```
On restart, lastId resets to 0, but the IS NULL filter automatically skips everything already classified. The cursor (id > lastId) keeps the query fast on large tables — OFFSET pagination slows to a crawl at millions of rows because the database still has to scan all preceding rows to find the offset position.
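Putting the two rules together, the outer loop might look like this sketch (the helper names are illustrative, and Enumerable.Chunk requires .NET 6 or later):

```csharp
// Resumable outer loop: safe to kill and restart at any point.
long lastId = 0; // resets to 0 on restart; the IS NULL filter skips finished rows
while (true)
{
    // Runs: WHERE segment_category IS NULL AND id > @lastId ORDER BY id LIMIT 1000
    var page = GetUnclassifiedPage(lastId, limit: 1000);
    if (page.Count == 0) break; // nothing left to classify

    foreach (var batch in page.Chunk(20)) // 20 records per LLM call
    {
        var results = ClassifyBatch(batch);
        SaveResults(results); // write immediately — a crash loses one batch, not a page
    }

    lastId = page[^1].Id; // cursor advances past this page
}
```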
5. Handle Partial Batches Gracefully with Skip vs Error
Not all failures are equal. Distinguish between:
- Error: something went wrong that warrants logging (HTTP 500, persistent 429 after retries, DB connection failure). These need attention.
- Skip: the record wasn’t returned in this batch response. Leave it NULL in the database, it will be picked up automatically on the next run. No log noise needed.
This distinction keeps your error output meaningful. If every missing batch item logs as an error, a run with 0.1% skip rate produces thousands of error lines that mask real problems.
The Result
What started as a job estimated at 16–67 days with a naive multithreaded approach settled to around 7 hours using semantic batching — processing two million records through a rate-limited API without a single configuration change to the API account.
The throughput improvement didn’t come from more concurrency. It came from being smarter about what gets sent in each request.
The general principle applies beyond LLM classification: whenever you have a fixed overhead per API call (authentication, context, schema), the correct optimisation is to amortise that overhead across as much work as possible per call, not to fire more calls in parallel.
Summary of Patterns
| Pattern | Naive approach | Better approach |
|---|---|---|
| Throughput | More threads | Larger batches |
| Rate limiting | Catch 429, retry | Track TPM rolling window, throttle proactively |
| Result matching | Positional array index | ID-keyed dictionary |
| Label resolution | Ask model for full text | Return code, resolve locally |
| Resumability | Track page offset | NULL-check filter + cursor pagination |
| Failure handling | All failures are errors | Skip vs Error distinction |
| DB resilience | Crash on connection drop | Exponential backoff retry |
The instinct to parallelise is correct in principle — you want to keep the API busy. But with token-limited LLM APIs, the right parallelism is within a single request, not across many simultaneous ones.
The Hidden Cost of ORDER BY NEWID()
Fetching a random row from a table is a surprisingly common requirement — random banner ads, sample data, rotating API credentials. The instinctive solution in SQL Server is elegant-looking but conceals a serious performance trap.
```sql
-- Looks innocent. Isn't.
SELECT TOP 1 * FROM LARGE_TABLE ORDER BY NEWID()
```
What SQL Server actually does
NEWID() generates a fresh GUID for every single row in the table. SQL Server must then sort the entire result set by those GUIDs before it can hand back the top one. On a table with a million rows you are generating a million GUIDs, sorting a million rows, and discarding 999,999 of them.
The problem: On large tables, ORDER BY NEWID() performs a full table scan and a full sort — O(n log n) work — regardless of how many rows you need. It cannot use any index for ordering.
A faster alternative: seek, don’t sort
The key insight is to convert the “random sort” into a “random seek”. If we can generate a random Id value cheaply and then let the clustered index do the work, we avoid scanning the table entirely.
```sql
DECLARE @Min INT = (SELECT MIN(Id) FROM LARGE_TABLE)
DECLARE @Max INT = (SELECT MAX(Id) FROM LARGE_TABLE)

SELECT TOP 1 *
FROM LARGE_TABLE
WHERE Id >= @Min + ABS(CHECKSUM(NEWID()) % (@Max - @Min + 1))
ORDER BY Id ASC
```
MAX(Id) and MIN(Id) are single index seeks on the primary key. CHECKSUM(NEWID()) generates a random integer without sorting anything. The WHERE Id >= clause then performs a single index seek from that point forward, and ORDER BY Id ASC TOP 1 picks up the very next row.
The result: Two index seeks to get the range, one index seek to find the row. Constant time regardless of table size.
Performance at a glance
| Approach | Reads | Sort | Scales with table size? |
|---|---|---|---|
| ORDER BY NEWID() | Full scan | Full sort | O(n log n) |
| CHECKSUM seek | 3 index seeks | None | O(1) |
Three caveats to know
1. Id gaps cause mild bias. If rows have been deleted, gaps in the Id sequence mean rows immediately after a gap are slightly more likely to be selected. For most use cases — sampling, rotation, A/B testing — this is an acceptable trade-off.
2. Ids may not start at 1. This is why we use @Min rather than hardcoding zero. If your identity seed started at 1000, CHECKSUM(NEWID()) % MAX(Id) would generate values from 0 upwards, and every value below 1000 would fall before the first row — so the query would disproportionately return the very first row in the table.
3. CHECKSUM can return INT_MIN. ABS(INT_MIN) overflows back to negative in SQL Server. The fix is to apply the modulo before the ABS, keeping the intermediate value safely within range.
When you don’t need randomness at all
For round-robin rotation across a fixed set of rows — such as alternating between API credentials or cookie sessions — true randomness is unnecessary overhead. A deterministic slot based on the current second is even cheaper:
```sql
-- Rotates across N accounts, one per second, no writes required
WHERE Slot = DATEPART(SECOND, GETUTCDATE()) % TotalAccounts
```
This resolves to a constant integer comparison — effectively a single index seek — and scales to any number of accounts automatically. No tracking table, no writes, no contention.
The takeaway: whenever you reach for ORDER BY NEWID(), ask whether you actually need true randomness or just approximate distribution. In most production scenarios, a cheap seek beats an expensive sort by several orders of magnitude.
Enrich Your Qualtrics Surveys with Real-Time Respondent Data Using AvatarAPI
Qualtrics is excellent at capturing what respondents tell you. But what if you could automatically fill in what you already know — or can discover — the moment they enter their email address?
AvatarAPI resolves an email address into rich profile data in real time: a profile photo, full name, city, country, and the social network behind it. By embedding this lookup directly into your Qualtrics survey flow, you collect more information about each respondent without asking a single extra question.
What Data Does AvatarAPI Return?
When you pass an email address to the API, it returns the following fields — all of which can be mapped into Qualtrics Embedded Data and used anywhere in your survey:
| Field | Description |
|---|---|
Image | URL to the respondent’s profile photo |
Name | Resolved full name |
City | City of residence |
Country | Country code |
Valid | Whether the email address is real and reachable |
IsDefault | Whether the avatar is a fallback/generic image |
Source.Name | The social network the data came from |
RawData | The complete JSON payload |
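For orientation, a response might look roughly like the following; the field values and nesting shown here are assumptions for illustration, matching the field names in the table above:

```json
{
  "Image": "https://.../profile.jpg",
  "Name": "Jane Doe",
  "City": "London",
  "Country": "GB",
  "Valid": true,
  "IsDefault": false,
  "Source": { "Name": "Google" },
  "RawData": "{ ... }"
}
```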
Watch the Video Walkthrough
Before diving into the written steps, watch this complete tutorial — from configuring the Web Service element to rendering the avatar photo on a results page:
Step-by-Step Integration Guide
You can either follow these steps from scratch, or import the ready-made AvatarAPI.qsf template file directly into Qualtrics (see Step 8).
Step 1 — Get Your AvatarAPI Credentials
Sign up at avatarapi.com to obtain a username and password. A free demo account is available for evaluation — use the credentials demo / demo to test before going live.
The API endpoint you will call is:
https://avatarapi.com/v2/api.aspx
Step 2 — Create an Email Capture Question
In your Qualtrics survey, add a Text Entry question with a Single Line selector. This is where respondents will enter their email address.
Note the Question ID assigned to this question (e.g. QID3) — you will reference it when configuring the Web Service. You can find the QID by opening the question’s advanced options.
Tip: Add email format validation via Add Validation → Content Validation → Email to ensure the value passed to the API is always well-formed.
Step 3 — Add a Web Service Element to Your Survey Flow
Navigate to Survey Flow (the flow icon in the left sidebar). Click Add a New Element Here and choose Web Service. Position this element after the block containing your email question and before your results block.
Configure it as follows:
- URL: https://avatarapi.com/v2/api.aspx
- Method: POST
- Content-Type: application/json
Step 4 — Set the Request Body Parameters
Under Set Request Parameters, switch to Specify Body Params and add these three key-value pairs:
{ "username": "your_username", "password": "your_password", "email": "${q://QID3/ChoiceTextEntryValue}"}
The Qualtrics piped text expression ${q://QID3/ChoiceTextEntryValue} dynamically inserts whatever email the respondent typed. Replace QID3 with the actual QID of your email question if it differs.
Step 5 — Map the API Response to Embedded Data Fields
Scroll down to Map Fields from Response. Add one row for each field you want to capture. The From Response column is the JSON key returned by AvatarAPI; the To Field column is the Embedded Data variable name.
| From Response (JSON key) | To Field (Embedded Data) |
|---|---|
Image | Image |
Name | Name |
Valid | Valid |
City | City |
Country | Country |
IsDefault | IsDefault |
Source.Name | Source.Name |
RawData | RawData |
Note: Qualtrics stores these variables automatically — you don’t need to pre-declare them as Embedded Data elsewhere in the flow, though doing so in the survey flow header keeps things organised.
Step 6 — Display the Avatar Photo on a Results Page
Add a Descriptive Text / Graphic question in a block placed after the Web Service call in your flow.
In the rich-text editor, switch to the HTML source view and paste this snippet:
```html
<img src="${e://Field/Image}" alt="Profile Picture" style="width:100px; height:100px; border-radius:50%;"/>
```
The expression ${e://Field/Image} inserts the profile photo URL at runtime. The border-radius: 50% gives it a circular crop for a polished appearance.
You can display other fields using the same pattern:
```
Name: ${e://Field/Name}
City: ${e://Field/City}
Country: ${e://Field/Country}
Source: ${e://Field/Source.Name}
```
Step 7 — Test with the Demo Account
Before going live, test the integration using the demo credentials. Enter a well-known email address (such as a Gmail address you know has a Google profile photo) to verify the image and data return correctly.
After a test submission, check the Survey Data tab — all mapped fields (Image, Name, City, Country, etc.) should appear as columns alongside your standard question responses.
Rate limits & production use: The demo credentials are shared and rate-limited. Swap in your own account credentials before publishing a live survey to ensure reliable performance.
Step 8 — Import the Ready-Made QSF Template
Rather than building from scratch, you can import the AvatarAPI.qsf file directly into Qualtrics. This gives you a pre-configured survey with the email question, Web Service flow, and image display block already set up.
To import: go to Create a new project → Survey → Import a QSF file and upload AvatarAPI.qsf. Then update the Web Service credentials to your own username and password, and you’re ready to publish.
How the Survey Flow Works
Once configured, your survey flow has this simple three-part structure:
- Block — Respondent enters their email address
- Web Service — Silent POST to
avatarapi.com/v2/api.aspx; response fields mapped to Embedded Data - Block — Results page displays the avatar photo and enriched profile data
The respondent experiences a seamless survey: they enter their email on page one, the API call fires silently between pages, and they see a personalised result — including their own profile photo — on page two.
Practical Use Cases
Lead enrichment surveys — Capture a prospect’s email and automatically resolve their name, city, and country without asking. Append this data to your CRM export from Qualtrics.
Event registration flows — Display the registrant’s photo back to them as a confirmation step, increasing engagement and reducing drop-off.
Email validation checkpoints — Use the Valid flag in a branch logic condition to route respondents with unresolvable addresses to a correction screen or alternative path.
Research panels — Enrich responses with geographic signals without asking respondents to self-report location, reducing survey length and improving data quality.
Get Started
- API documentation & sign-up: avatarapi.com
- API endpoint: https://avatarapi.com/v2/api.aspx
- Demo credentials: username demo / password demo
- Video tutorial: Watch on YouTube
Building a Wildcard Catch-All POP3 Mail Server on Ubuntu
Receive mail for any address on any subdomain — no per-account configuration required
Introduction
This guide walks through setting up a wildcard catch-all mail server on Ubuntu using Postfix and Dovecot. The goal is to receive email sent to any address on any subdomain of your domain — for example, anything@abc.yourdomain.com or test@xyz.yourdomain.com — without having to configure individual mailboxes in advance.
This is particularly useful for testing, disposable address systems, API integrations, and mail sink setups where you want to capture inbound mail programmatically. The server will not send mail — only receive it. Mail older than 24 hours is automatically purged.
Architecture Overview
The stack consists of three components working together:
- Postfix — receives inbound SMTP and delivers to a local virtual mailbox
- Dovecot — serves POP3 access to the mailbox
- A single catch-all mailbox — all mail for all subdomains and addresses funnels into one Maildir
Rather than creating individual accounts, everything is routed to a single mailbox. A POP3 client connects with one username and password to retrieve all mail regardless of which address or subdomain it was sent to.
Part 1 — DNS Configuration
How Wildcard MX Records Work
MX records must point to a hostname, not an IP address directly. This means two DNS records are needed: an MX record pointing to a mail hostname, and an A record resolving that hostname to your server’s IP address.
Create the following records in your DNS provider (AWS Route 53 or equivalent):
| Record Name | Type / Value |
|---|---|
| *.yourdomain.com | MX — 10 mail.yourdomain.com |
| mail.yourdomain.com | A — your.server.ip.address |
The wildcard MX record *.yourdomain.com matches any single-level subdomain lookup. When a sending mail server looks up the MX record for abc.yourdomain.com, it matches the wildcard and is directed to mail.yourdomain.com, which in turn resolves to your server’s IP via the A record.
Note that the wildcard covers one subdomain level deep. Mail to anything@abc.yourdomain.com is covered. A deeper level such as anything@a.b.yourdomain.com would require a separate record.
Verifying DNS Records
From a Windows machine, use nslookup to verify records have propagated:
```bash
# Check the MX record
nslookup -type=MX abc.yourdomain.com

# Check the A record for the mail host
nslookup mail.yourdomain.com

# Query AWS nameservers directly (before public propagation)
nslookup -type=NS yourdomain.com
nslookup -type=MX abc.yourdomain.com ns-123.awsdns-45.com
```
You can also use dnschecker.org to check propagation across multiple global resolvers simultaneously.
Part 2 — Server Setup
Install Postfix and Dovecot
```bash
sudo apt update
sudo apt install postfix dovecot-pop3d -y
```
During the Postfix installation prompt, select Internet Site and enter your domain name (e.g. yourdomain.com) when asked for the mail name.
Configure Postfix
Edit the main Postfix configuration file:
```bash
sudo nano /etc/postfix/main.cf
```
Add or update the following values:
```
myhostname = mail.yourdomain.com
mydomain = yourdomain.com

# Leave mydestination empty — we use virtual mailboxes instead
mydestination =

# Accept mail for any subdomain matching the wildcard
virtual_mailbox_domains = regexp:/etc/postfix/virtual_domains
virtual_mailbox_base = /var/mail/vhosts
virtual_mailbox_maps = regexp:/etc/postfix/virtual_mailbox
virtual_minimum_uid = 100
virtual_uid_maps = static:5000
virtual_gid_maps = static:5000

# Required to prevent open relay
smtpd_relay_restrictions = permit_mynetworks, reject_unauth_destination
```
Create the virtual domains file — this regexp matches any subdomain of your domain:
```bash
sudo nano /etc/postfix/virtual_domains
```
```
/^.+\.yourdomain\.com$/ OK
```
Create the virtual mailbox map — this catches all addresses and routes them to a single catchall mailbox:
```bash
sudo nano /etc/postfix/virtual_mailbox
```
```
/^.+@.+\.yourdomain\.com$/ catchall/
```
Rebuild the aliases database (required to avoid a startup warning):
```bash
newaliases
```
Create the Virtual Mail User and Mailbox
Postfix delivers mail as a dedicated system user (vmail). Create the user, group, and mailbox directory:
```bash
sudo groupadd -g 5000 vmail
sudo useradd -u 5000 -g 5000 -d /var/mail/vhosts -s /sbin/nologin vmail
sudo mkdir -p /var/mail/vhosts/catchall
sudo chown -R vmail:vmail /var/mail/vhosts
```
Configure Dovecot for POP3
Enable the POP3 protocol in the main Dovecot config:
```bash
sudo nano /etc/dovecot/dovecot.conf
```
```
protocols = pop3
```
Set the mail location to the catchall Maildir:
```bash
sudo nano /etc/dovecot/conf.d/10-mail.conf
```
```
mail_location = maildir:/var/mail/vhosts/catchall
```
Allow plaintext authentication (suitable for internal/trusted use — see the TLS note at the end for public-facing deployments):
```bash
sudo nano /etc/dovecot/conf.d/10-auth.conf
```
```
disable_plaintext_auth = no
auth_mechanisms = plain login

passdb {
  driver = passwd-file
  args = /etc/dovecot/users
}

userdb {
  driver = static
  args = uid=5000 gid=5000 home=/var/mail/vhosts/catchall
}
```
Create the Dovecot users file with your chosen credentials:
```bash
sudo nano /etc/dovecot/users
```

```
# Format: username:{PLAIN}password
mailuser:{PLAIN}yourpasswordhere
```
Start the Services
```bash
sudo systemctl restart postfix
sudo systemctl restart dovecot
```
Verify Postfix is running:
```bash
postfix status
```
Check the mail log for any errors:
```bash
tail -30 /var/log/mail.log
```
Part 3 — Firewall Configuration
Cloud Firewall (Linode / AWS / equivalent)
Open the following inbound ports in your cloud provider’s firewall. On Linode this is found under Networking > Firewalls in the dashboard. Changes apply immediately with no reboot required.
| Port / Protocol | Purpose |
|---|---|
| 22 TCP | SSH (ensure this is always open) |
| 25 TCP | SMTP inbound (receiving mail) |
| 110 TCP | POP3 (retrieving mail) |
UFW on the Ubuntu Instance
```bash
sudo ufw allow 22/tcp
sudo ufw allow 25/tcp
sudo ufw allow 110/tcp
sudo ufw enable
sudo ufw status
```
Always confirm port 22 is allowed before enabling UFW to avoid locking yourself out of SSH.
Part 4 — Testing
Test SMTP Locally
From the server itself, connect to Postfix on port 25 and send a test message. Use 127.0.0.1 rather than localhost to avoid IPv6 connection issues:
```bash
telnet 127.0.0.1 25
```
You should immediately see the greeting banner:
```
220 mail.yourdomain.com ESMTP Postfix
```
Then send a test message interactively:
```
EHLO test.com
MAIL FROM:<test@test.com>
RCPT TO:<anything@abc.yourdomain.com>
DATA
Subject: Test mail

Hello this is a test.
.
QUIT
```

Note the lone `.` on its own line — that is how SMTP terminates the DATA section.
Each step should return a 250 OK response. The RCPT TO line is the critical one — if the wildcard regexp is configured correctly, Postfix will accept any subdomain address. After QUIT, verify the mail landed in the mailbox:
```bash
tail -20 /var/log/mail.log
ls -la /var/mail/vhosts/catchall/new/
```
You should see a file in the new/ directory — that is the email in Maildir format.
Test POP3 Locally
```bash
telnet 127.0.0.1 110
```
Dovecot should respond with:
```
+OK Dovecot (Ubuntu) ready.
```
Then authenticate and list messages:
```
USER mailuser
PASS yourpasswordhere
LIST
RETR 1
QUIT
```
A successful LIST response showing message count confirms the full chain is working: inbound SMTP via Postfix, delivery to virtual Maildir, and POP3 retrieval via Dovecot.
Part 5 — Automatic Mail Purge
To automatically delete mail older than 24 hours, add a cron job:
```bash
sudo crontab -e
```
Add the following line:
```
0 * * * * find /var/mail/vhosts/catchall -type f -mmin +1440 -delete
```
This runs every hour and removes any file in the catchall mailbox older than 1440 minutes (24 hours).
Optional — Silence the Backwards Compatibility Warning
Postfix logs a harmless warning about backwards-compatible default settings. To silence it:
```bash
postconf compatibility_level=3.6
postfix reload
```
Security Notes
- Port 110 transmits credentials in plaintext. For any public-facing deployment, configure Dovecot with TLS and use POP3S on port 995 instead.
- The smtpd_relay_restrictions = permit_mynetworks, reject_unauth_destination setting prevents your server from acting as an open relay — do not remove this.
- Consider rate limiting inbound SMTP connections if the server is publicly accessible to reduce spam load.
- The vmail system user has no login shell (nologin) and cannot be used to access the system interactively.
Summary
With Postfix and Dovecot configured as described above, your server will:
- Accept inbound SMTP for any address on any subdomain of your domain
- Deliver all mail into a single catch-all Maildir with no per-account configuration
- Expose all received mail via POP3 using a single username and password
- Automatically purge mail older than 24 hours
- Require no restart or reconfiguration when new subdomains or addresses are used
How to Detect and Fix Squid Proxy Abuse
Running an open HTTP proxy on the internet, even temporarily for testing, can quickly attract unwanted attention. Within minutes of deploying an unsecured Squid proxy, malicious actors can discover and abuse it for scanning, attacks, or hiding their origin. Here’s how to spot the warning signs and lock down your proxy.
Symptoms of Proxy Abuse
1. Proxy Stops Working
The most obvious symptom is that your proxy simply stops responding to legitimate requests. Connections timeout or hang indefinitely, even though the Squid service appears to be running.
2. Cache File Descriptor Warnings
When checking the Squid service status, you see repeated warnings like:
WARNING: Your cache is running out of file descriptors
This occurs because the proxy is handling far more concurrent connections than expected for a small test server.
3. Service Shows Active But Unresponsive
The systemd status shows Squid as “active (running)” with normal startup messages, but actual proxy requests fail:
```bash
$ sudo systemctl status squid
● squid.service - Squid Web Proxy Server
     Active: active (running)
```
Yet when you try to use it:
```bash
$ curl -x http://your-proxy:8888 https://example.com
curl: (28) Failed to connect to example.com port 443 after 21060ms
```
4. High Memory or CPU Usage
A small EC2 instance (t2.micro or t3.micro) that should be mostly idle shows elevated resource consumption.
How to Verify Proxy Abuse
Check the Access Logs
The quickest way to confirm abuse is to examine the Squid access log:
```bash
sudo tail -100 /var/log/squid/access.log
```
What to look for:
A healthy proxy used only by you should show:
- One or two source IP addresses (yours)
- Requests to legitimate domains
- Occasional HTTPS CONNECT requests
An abused proxy will show:
- Dozens of different source IP addresses
- CONNECT requests to random IP addresses (not domain names)
- Strange ports: SSH (22), Telnet (23), email ports (25, 587, 993, 110), random high ports
- High frequency of requests
Example of Abuse
Here’s what an abused proxy log looks like:
```
1770200370.089  59842 172.234.115.25 TCP_TUNNEL/503 0 CONNECT 188.64.128.123:22
1770200370.089  59842 51.83.10.33 TCP_TUNNEL/503 0 CONNECT 188.64.132.53:443
1770200370.089  59841 172.234.115.25 TCP_TUNNEL/503 0 CONNECT 188.64.128.4:22
1770200370.089  59332 185.90.61.84 TCP_TUNNEL/503 0 CONNECT 188.64.129.251:8000
1770200370.089    214 91.202.74.22 TCP_TUNNEL/503 0 CONNECT 188.64.129.143:23
1770200370.191    579 51.83.10.33 TCP_TUNNEL/200 39 CONNECT 188.64.128.4:8021
1770200370.235  11227 51.83.10.33 TCP_TUNNEL/200 176 CONNECT 188.64.131.66:587
```
Notice:
- Multiple unique source IPs
- Connections to SSH (port 22), Telnet (port 23), SMTP (port 587)
- Targets are raw IP addresses, not domain names
- Hundreds of requests per minute
This is classic behavior of attackers using your proxy to scan the internet for vulnerable services.
Count Unique IPs
To see how many different IPs are using your proxy:
```bash
sudo awk '{print $3}' /var/log/squid/access.log | sort | uniq -c | sort -rn | head -20
```
If you see more than a handful of IPs (especially if you’re the only legitimate user), your proxy is being abused.
Check Current Connections
See active connections to your proxy port:
```bash
sudo netstat -tn | grep :8888
```
A legitimate test proxy should have 0-2 active connections. Dozens of connections indicate abuse.
How to Fix It Immediately
1. Lock Down the AWS Security Group
The fastest fix is to restrict access at the network level:
Via AWS Console:
- Navigate to EC2 → Security Groups
- Select the security group attached to your proxy instance
- Click “Edit inbound rules”
- Find the rule for your proxy port (e.g., 8888)
- Change Source from 0.0.0.0/0 to “My IP” — AWS will auto-detect and fill in your current public IP
- Click “Save rules”
The change takes effect immediately – no restart required.
2. Restart Squid to Kill Existing Connections
Even after locking down the security group, existing connections may persist:
```bash
sudo systemctl restart squid
```
3. Clear the Logs
Start fresh to verify the abuse has stopped:
```bash
# Stop Squid
sudo systemctl stop squid

# Clear logs
sudo truncate -s 0 /var/log/squid/access.log
sudo truncate -s 0 /var/log/squid/cache.log

# Clear cache if you're seeing file descriptor warnings
sudo rm -rf /var/spool/squid/*
sudo squid -z

# Restart
sudo systemctl start squid
```
4. Verify It’s Fixed
Watch the log in real-time:
```bash
sudo tail -f /var/log/squid/access.log
```
If tail -f just sits there with no output, that’s good – it means no requests are coming through.
Now test from your own machine:
```bash
curl -x http://your-proxy-ip:8888 https://ifconfig.me
```
You should immediately see your request appear in the log, and nothing else.
Prevention Best Practices
For Testing Environments
- Always use IP whitelisting – never expose a proxy to 0.0.0.0/0, even for testing
- Use non-standard ports – while not security through obscurity, it reduces automated scanning
- Set up authentication – Even basic auth is better than nothing
- Monitor logs – Check periodically for unexpected traffic
- Terminate when done – Don’t leave test infrastructure running
Minimal Squid Config with Authentication
For slightly better security, add basic authentication:
```bash
# Install htpasswd
sudo apt install apache2-utils

# Create password file
sudo htpasswd -c /etc/squid/passwords testuser

# Edit squid.conf
sudo nano /etc/squid/squid.conf
```
Add these lines:
```
http_port 8888

# Basic authentication
auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/passwords
auth_param basic realm proxy
acl authenticated proxy_auth REQUIRED
http_access allow authenticated
http_access deny all
cache deny all
```
Restart Squid and now clients must authenticate:
bash
curl -x http://testuser:password@your-proxy:8888 https://example.com
For Production
If you need a production proxy:
- Use a proper reverse proxy like nginx or HAProxy with TLS
- Implement OAuth or certificate-based authentication
- Use AWS PrivateLink or VPC peering instead of public exposure
- Enable detailed logging and monitoring
- Set up rate limiting
- Consider managed solutions like AWS API Gateway or CloudFront
Conclusion
Open proxies are magnets for abuse. Automated scanners continuously sweep the internet looking for misconfigured proxies to exploit. The symptoms are often subtle – file descriptor warnings, poor performance, or timeouts – but the fix is straightforward: restrict access to only trusted IP addresses at the network level.
For testing purposes, AWS Security Groups provide the perfect solution: instant IP-based access control with no performance overhead. Combined with monitoring the Squid access logs, you can quickly detect and eliminate abuse before it impacts your testing or incurs unexpected costs.
Remember: if you’re running a temporary test proxy, lock it down to your IP from the start. It only takes minutes for automated scanners to find and abuse an open proxy.
Key Takeaways:
- ✅ Always restrict proxy access via security groups/firewall rules
- ✅ Monitor access logs for unexpected IP addresses
- ✅ Watch for file descriptor warnings as an early sign of abuse
- ✅ Clear logs and restart after securing to verify the fix
- ✅ Terminate test infrastructure when finished to avoid ongoing costs
Benchmarking reCAPTCHA v3 Solver Services: Speed vs Quality Analysis

When implementing automated systems that need to solve reCAPTCHA v3 challenges, choosing the right solver service can significantly impact both your success rate and operational costs. We conducted a comprehensive benchmark test of five popular reCAPTCHA v3 solving services to compare their performance in terms of both speed and quality scores.
The Results
We tested five major captcha solving services: CapSolver, 2Captcha, AntiCaptcha, NextCaptcha, and DeathByCaptcha. Each service was evaluated both with and without residential proxy support (using Decodo residential proxies).
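For context on how such timings can be collected, here is a minimal sketch of the create-then-poll pattern these APIs share, written against a CapSolver-style createTask/getTaskResult endpoint. The task type and field names here are assumptions to verify against each provider's documentation:
bash
# Hypothetical timing harness for a single reCAPTCHA v3 solve.
# Field names and task type are illustrative; check the provider docs.
KEY="YOUR_API_KEY"
START=$(date +%s%3N)

# Submit the solve task and capture its ID
TASK_ID=$(curl -s https://api.capsolver.com/createTask \
  -H "Content-Type: application/json" \
  -d '{"clientKey":"'"$KEY"'","task":{"type":"ReCaptchaV3TaskProxyLess","websiteURL":"https://example.com","websiteKey":"SITE_KEY","pageAction":"login"}}' \
  | jq -r '.taskId')

# Poll until the token is ready
while :; do
  RESULT=$(curl -s https://api.capsolver.com/getTaskResult \
    -H "Content-Type: application/json" \
    -d "{\"clientKey\":\"$KEY\",\"taskId\":\"$TASK_ID\"}")
  [ "$(echo "$RESULT" | jq -r '.status')" = "ready" ] && break
  sleep 1
done

echo "Solve time: $(( $(date +%s%3N) - START )) ms"
echo "$RESULT" | jq -r '.solution.gRecaptchaResponse'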
Speed Performance Rankings
Fastest to Slowest (without proxy):
- CapSolver – 3,383ms (3.4 seconds)
- NextCaptcha – 6,725ms (6.7 seconds)
- DeathByCaptcha – 16,212ms (16.2 seconds)
- AntiCaptcha – 17,069ms (17.1 seconds)
- 2Captcha – 36,149ms (36.1 seconds)
With residential proxy:
- CapSolver – 5,101ms (5.1 seconds)
- DeathByCaptcha – 10,861ms (10.9 seconds)
- NextCaptcha – 10,875ms (10.9 seconds)
- 2Captcha – 25,749ms (25.7 seconds)
- AntiCaptcha – Failed (task type not supported with proxy)
Quality Score Results
Here’s where the results become particularly interesting: all services that successfully completed the challenge returned identical scores of 0.10. This uniformly low score across all providers suggests we’re observing a fundamental characteristic of how these services interact with Google’s reCAPTCHA v3 system rather than differences in solver quality.
What Do These Results Tell Us?
1. The Score Mystery
A reCAPTCHA v3 score of 0.10 is at the very bottom of Google’s scoring range (0.0-1.0), indicating that Google’s system detected these tokens as very likely originating from bots. This consistent result across all five services reveals several important insights:
Why such low scores?
- reCAPTCHA v3 uses machine learning trained on actual site traffic patterns
- Without established traffic history, the system defaults to suspicious scores
- Commercial solver services are inherently detectable by Google’s sophisticated fingerprinting
- The test environment may lack the organic traffic patterns needed for v3 to generate higher scores
As mentioned in our research, CleanTalk found that reCAPTCHA v3 often returns consistent scores in test environments without production traffic. The system needs time to “learn” what normal traffic looks like for a given site before it can effectively differentiate between humans and bots.
2. Speed is the Real Differentiator
Since all services returned the same quality score, speed becomes the primary differentiator:
CapSolver emerged as the clear winner, solving challenges in just 3.4 seconds without proxy and 5.1 seconds with proxy. This represents a 10x speed advantage over the slowest service (2Captcha at 36 seconds).
NextCaptcha came in second place with respectable times of 6.7 seconds (no proxy) and 10.9 seconds (with proxy), making it a solid middle-ground option.
DeathByCaptcha and AntiCaptcha performed similarly at around 16-17 seconds without proxy, though AntiCaptcha failed to support proxy-based solving for this captcha type.
2Captcha was significantly slower at 36 seconds without proxy, though it did improve to 25.7 seconds with proxy enabled.
3. Proxy Support Variations
Proxy support proved inconsistent across services:
- Most services handled proxies well, with CapSolver, NextCaptcha, DeathByCaptcha, and 2Captcha all successfully completing challenges through residential proxies
- AntiCaptcha failed with proxy, returning an “ERROR_TASK_NOT_SUPPORTED” error, suggesting their proxy-based reCAPTCHA v3 implementation may have limitations
- Proxy impact on speed varied: Some services (2Captcha, DeathByCaptcha) were faster with proxy, while others (CapSolver, NextCaptcha) were slower
4. Success Rates
All services except AntiCaptcha (with proxy) achieved 100% success rates, meaning they reliably returned valid tokens. However, the validity of a token doesn’t correlate with its quality score—all tokens were valid but all received low scores from Google.
Practical Implications
For High-Volume Operations
If you’re processing thousands of captchas daily, CapSolver’s 3-5 second solve time provides a massive throughput advantage. At scale, this speed difference translates to:
- Processing 1,000 captchas with CapSolver: ~56 minutes
- Processing 1,000 captchas with 2Captcha: ~10 hours
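Those totals are just the per-solve latency multiplied out, assuming strictly sequential solving (parallel workers shrink the wall-clock time accordingly):
bash
# Back-of-envelope throughput: N sequential solves at the measured latency
N=1000
for ms in 3383 36149; do
  awk -v n="$N" -v ms="$ms" 'BEGIN {
    printf "%d solves x %d ms = %.1f min (%.1f h)\n", n, ms, n*ms/60000, n*ms/3600000
  }'
done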
For Quality-Sensitive Applications
The uniform 0.10 scores reveal a hard truth: commercial reCAPTCHA v3 solvers may not produce high-quality tokens that pass strict score thresholds. If your target site requires scores above 0.5 or 0.7, these services may not be suitable regardless of which one you choose.
Cost Considerations
Since all services returned the same quality, cost-per-solve becomes the tiebreaker alongside speed:
- CapSolver: ~$1.00 per 1,000 solves
- 2Captcha: ~$2.99 per 1,000 solves
- AntiCaptcha: ~$2.00 per 1,000 solves
CapSolver offers the best speed-to-cost ratio in this comparison.
The Bigger Picture: reCAPTCHA v3 Limitations
These results illuminate a broader challenge with reCAPTCHA v3 solver services. Google’s v3 system is fundamentally different from v2:
- v2 presented challenges that could be solved by humans or AI
- v3 analyzes behavior patterns, browser fingerprints, and site-specific traffic history
Commercial solvers can generate valid tokens, but those tokens carry telltale signatures that Google’s machine learning readily identifies. The consistently low scores suggest that Google has effective detection mechanisms for solver-generated traffic.
When Might Scores Improve?
Based on research and documentation:
- Production environments with real organic traffic may see better scores
- Time – letting reCAPTCHA v3 “train” on a site for days or weeks
- Mixed traffic – solver tokens mixed with legitimate user traffic
- Residential proxies – though our test showed this alone doesn’t improve scores
Conclusions and Recommendations
If Speed Matters Most
Choose CapSolver. Its 3-5 second solve times are unmatched, and at $1 per 1,000 solves, it’s also the most cost-effective option.
If You Need Proxy Support
Avoid AntiCaptcha for proxy-based v3 solving. CapSolver, NextCaptcha, and DeathByCaptcha all handled residential proxies successfully.
If Quality Scores Matter
Reconsider using solver services entirely. The uniform 0.10 scores suggest that commercial solvers may not be suitable for sites with strict score requirements. Consider alternative approaches:
- Browser automation with real user simulation
- Residential proxy networks with actual human solvers
- Challenging whether reCAPTCHA v3 is the right solution for your use case
The Bottom Line
For raw performance in a test environment, CapSolver dominated with the fastest solve times and lowest cost. However, the universal 0.10 quality scores across all services reveal that speed and cost may be moot points if your application requires high-quality scores that pass Google’s bot detection.
The real takeaway? reCAPTCHA v3 is doing its job—it successfully identifies solver-generated tokens regardless of which service you use. If you need high scores, you’ll need more sophisticated approaches than simply purchasing tokens from commercial solving services.
This benchmark was conducted in January 2026 using production API credentials for all services. Tests were performed with both direct connections and residential proxy infrastructure. Individual results may vary based on site configuration, traffic patterns, and Google’s evolving detection systems.
Migrating Google Cloud Run to Scaleway: Bringing Your Cloud Infrastructure Back to Europe
Introduction: Why European Cloud Sovereignty Matters Now More Than Ever

In an era of increasing geopolitical tensions, data sovereignty concerns, and evolving international relations, European companies are reconsidering their dependence on US-based cloud providers. The EU’s growing emphasis on digital sovereignty, combined with uncertainties around US data access laws like the CLOUD Act and recent political developments, has made many businesses uncomfortable with storing sensitive data on American infrastructure.
For EU-based companies running containerized workloads on Google Cloud Run, there’s good news: migrating to European alternatives like Scaleway is surprisingly straightforward. This guide will walk you through the technical process of moving your Cloud Run services to Scaleway’s Serverless Containers—keeping your applications running while bringing your infrastructure back under European jurisdiction.
Why Scaleway?
Scaleway, a French cloud provider founded in 1999, offers a compelling alternative to Google Cloud Run:
- 🇪🇺 100% European: All data centers located in France, Netherlands, and Poland
- 📜 GDPR Native: Built from the ground up with European data protection in mind
- 💰 Transparent Pricing: No hidden costs, generous free tiers, and competitive rates
- 🔒 Data Sovereignty: Your data never leaves EU jurisdiction
- ⚡ Scale-to-Zero: Just like Cloud Run, pay only for actual usage
- 🌱 Environmental Leadership: Strong commitment to sustainable cloud infrastructure
Most importantly: Scaleway Serverless Containers are technically equivalent to Google Cloud Run. Both are built on Knative, meaning your containers will run identically on both platforms.
Prerequisites
Before starting, ensure you have:
- An existing Google Cloud Run service
- Windows machine with PowerShell
- gcloud CLI installed and authenticated
- A Scaleway account (free to create)
- Skopeo installed (we’ll cover this)
Understanding the Architecture
Both Google Cloud Run and Scaleway Serverless Containers work the same way:
- You provide a container image
- The platform runs it on-demand via HTTPS endpoints
- It scales automatically (including to zero when idle)
- You pay only for execution time
The migration process is simply:
- Copy your container image from Google’s registry to Scaleway’s registry
- Deploy it as a Scaleway Serverless Container
- Update your DNS/endpoints
No code changes required—your existing .NET, Node.js, Python, Go, or any other containerized application works as-is.
Step 1: Install Skopeo (Lightweight Docker Alternative)
Since we’re on Windows and don’t want to run full Docker Desktop, we’ll use Skopeo—a lightweight tool designed specifically for copying container images between registries.
Install via winget:
powershell
winget install RedHat.Skopeo
Or download directly from: https://github.com/containers/skopeo/releases
Why Skopeo?
- No daemon required: No background services consuming resources
- Direct registry-to-registry transfer: Images never touch your local disk
- Minimal footprint: ~50MB vs. several GB for Docker Desktop
- Perfect for CI/CD: Designed for automation and registry operations
Configure Skopeo’s Trust Policy
Skopeo requires a policy file to determine which registries to trust. Create it:
powershell
# Create the config directory
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.config\containers"

# Create a permissive policy that trusts all registries
@"
{
  "default": [
    { "type": "insecureAcceptAnything" }
  ],
  "transports": {
    "docker-daemon": {
      "": [{ "type": "insecureAcceptAnything" }]
    }
  }
}
"@ | Out-File -FilePath "$env:USERPROFILE\.config\containers\policy.json" -Encoding utf8
For production environments, you might want a more restrictive policy that only trusts specific registries:
powershell
@"{ "default": [{"type": "reject"}], "transports": { "docker": { "gcr.io": [{"type": "insecureAcceptAnything"}], "europe-west2-docker.pkg.dev": [{"type": "insecureAcceptAnything"}], "rg.fr-par.scw.cloud": [{"type": "insecureAcceptAnything"}] } }}"@ | Out-File -FilePath "$env:USERPROFILE\.config\containers\policy.json" -Encoding utf8
Step 2: Find Your Cloud Run Container Image
Your Cloud Run service uses a specific container image. To find it:
Via gcloud CLI (recommended):
bash
gcloud run services describe YOUR-SERVICE-NAME \
  --region=YOUR-REGION \
  --project=YOUR-PROJECT \
  --format='value(spec.template.spec.containers[0].image)'

This returns the full image URL, something like:
europe-west2-docker.pkg.dev/your-project/cloud-run-source-deploy/your-service@sha256:abc123...
Via Google Cloud Console:
- Navigate to Cloud Run in the console
- Click your service
- Go to the “Revisions” tab
- Look for “Container image URL”
The @sha256:... digest is important—it ensures you’re copying the exact image currently running in production.
Step 3: Set Up Scaleway Container Registry
Create a Scaleway Account
- Sign up at https://console.scaleway.com/
- Complete email verification
- Navigate to the console
Create a Container Registry Namespace
- Go to Containers → Container Registry
- Click Create namespace
- Choose a region (Paris, Amsterdam, or Warsaw)
- Important: Choose the same region where you’ll deploy your containers
- Enter a namespace name (e.g., my-containers or production); it must be unique within that region, and may contain only lowercase letters, numbers, and hyphens
- Set Privacy to Private
- Click Create
Your registry URL will be: rg.fr-par.scw.cloud/your-namespace
Create API Credentials
- Click your profile → API Keys (or visit https://console.scaleway.com/iam/api-keys)
- Click Generate API Key
- Give it a name (e.g., “container-migration”)
- Save the Secret Key securely—it’s only shown once
- Note both the Access Key and Secret Key
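Optionally, you can log in to the registry once with Skopeo so that later commands should be able to omit explicit credentials. Scaleway uses the literal username nologin with your secret key as the password (a sketch, with the key read from an environment variable):
bash
# Store registry credentials for subsequent skopeo commands
skopeo login rg.fr-par.scw.cloud --username nologin --password "$SCW_SECRET_KEY"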
Step 4: Copy Your Container Image
Now comes the magic—copying your container directly from Google to Scaleway without downloading it locally.
Authenticate and Copy:
powershell
# Set your Scaleway secret key as environment variable (more secure)
$env:SCW_SECRET_KEY = "your-scaleway-secret-key-here"

# Copy the image directly between registries
skopeo copy `
  --src-creds="oauth2accesstoken:$(gcloud auth print-access-token)" `
  --dest-creds="nologin:$env:SCW_SECRET_KEY" `
  docker://europe-west2-docker.pkg.dev/your-project/cloud-run-source-deploy/your-service@sha256:abc123... `
  docker://rg.fr-par.scw.cloud/your-namespace/your-service:latest

What's Happening:
- --src-creds: Authenticates with Google using your gcloud session
- --dest-creds: Authenticates with Scaleway using your API key
- Source URL: Your Google Artifact Registry image
- Destination URL: Your Scaleway Container Registry

The transfer happens directly between registries—your Windows machine just orchestrates it. Even a multi-GB container copies in minutes.

Verify the Copy:
- Go to https://console.scaleway.com/registry/namespaces
- Click your namespace
- You should see your service image listed with the latest tag

Step 5: Deploy to Scaleway Serverless Containers
Create a Serverless Container Namespace:
- Navigate to Containers → Serverless Containers
- Click Create namespace
- Choose the same region as your Container Registry
- Give it a name (e.g., production-services)
- Click Create

Deploy Your Container:
- Click Create container
- Image source: Select "Scaleway Container Registry"
- Choose your namespace and image
- Configuration:
  - Port: Set to the port your app listens on (usually 8080 for Cloud Run apps)
  - Environment variables: Copy any env vars from Cloud Run
  - Resources: Memory: start with what you used in Cloud Run; vCPU: 0.5-1 vCPU is typical
  - Scaling: Min scale 0 (enables scale-to-zero, just like Cloud Run); Max scale: set based on expected traffic (e.g., 10)
- Click Deploy container

Get Your Endpoint:
After deployment (1-2 minutes), you'll receive an HTTPS endpoint:
https://your-container-namespace-xxxxx.functions.fnc.fr-par.scw.cloud
This is your public API endpoint—no API Gateway needed, SSL included for free.
Step 6: Test Your Service
powershell
# Test the endpoint
Invoke-WebRequest -Uri "https://your-container-url.functions.fnc.fr-par.scw.cloud/your-endpoint"
Your application should respond identically to how it did on Cloud Run.
Understanding the Cost Comparison
Google Cloud Run Pricing (Typical):
- vCPU: $0.00002400/vCPU-second
- Memory: $0.00000250/GB-second
- Requests: $0.40 per million
- Plus: API Gateway, Load Balancer, or other routing costs
Scaleway Serverless Containers:
- vCPU: €0.00001/vCPU-second (€1.00 per 100k vCPU-s)
- Memory: €0.000001/GB-second (€0.10 per 100k GB-s)
- Requests: Free (no per-request charges)
- HTTPS endpoint: Free (included)
- Free Tier: 200k vCPU-seconds + 400k GB-seconds per month
Example Calculation:
For an API handling 1 million requests/month, 200ms average response time, 1 vCPU, 2GB memory:
Google Cloud Run:
- vCPU: 1M × 0.2s × $0.000024 = $4.80
- Memory: 1M × 0.2s × 2GB × $0.0000025 = $1.00
- Requests: 1M × $0.0000004 = $0.40
- Total: ~$6.20/month
Scaleway:
- vCPU: 200k vCPU-s → Free (within free tier)
- Memory: 400k GB-s → Free (within free tier)
- Total: €0.00/month
Even beyond free tiers, Scaleway is typically 30-50% cheaper, with no surprise charges.
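A quick sanity-check sketch using the rates and free tier quoted above, so you can plug in your own traffic profile:
bash
# Estimate monthly Scaleway cost from request volume, duration, and resources
REQS=1000000; SECS=0.2; VCPU=1; MEM_GB=2
awk -v r="$REQS" -v s="$SECS" -v c="$VCPU" -v m="$MEM_GB" 'BEGIN {
  vcpu_s = r * s * c;   # total vCPU-seconds consumed
  gb_s   = r * s * m;   # total GB-seconds consumed
  cost = 0;
  if (vcpu_s > 200000) cost += (vcpu_s - 200000) * 0.00001;   # EUR per vCPU-s beyond free tier
  if (gb_s   > 400000) cost += (gb_s   - 400000) * 0.000001;  # EUR per GB-s beyond free tier
  printf "vCPU-s: %d, GB-s: %d, estimated cost: EUR %.2f/month\n", vcpu_s, gb_s, cost;
}'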
Key Differences to Be Aware Of
Similarities (Good News):
✅ Both use Knative under the hood
✅ Both support HTTP, HTTP/2, WebSocket, gRPC
✅ Both scale to zero automatically
✅ Both provide HTTPS endpoints
✅ Both support custom domains
✅ Both integrate with monitoring/logging
Differences:
- Cold start: Scaleway takes ~2-5 seconds (similar to Cloud Run)
- Idle timeout: Scaleway scales to zero after 15 minutes of inactivity (Cloud Run's idle window varies)
- Regions: Limited to EU (Paris, Amsterdam, Warsaw) vs. Google’s global presence
- Ecosystem: Smaller ecosystem than GCP (but rapidly growing)
When Scaleway Makes Sense:
- ✅ Your primary users/customers are in Europe
- ✅ GDPR compliance is critical
- ✅ You want to avoid US jurisdiction over your data
- ✅ You prefer transparent, predictable pricing
- ✅ You don’t need GCP-specific services (BigQuery, etc.)
When to Consider Carefully:
- ⚠️ You need global edge distribution (though you can use CDN)
- ⚠️ You’re heavily integrated with other GCP services
- ⚠️ You need GCP’s machine learning services
- ⚠️ Your customers are primarily in Asia/Americas
Additional Migration Considerations
Environment Variables and Secrets:
Scaleway offers Secret Manager integration. Copy your Cloud Run secrets:
- Go to Secret Manager in Scaleway
- Create secrets matching your Cloud Run environment variables
- Reference them in your container configuration
Custom Domains:
Both platforms support custom domains. In Scaleway:
- Go to your container settings
- Add custom domain
- Update your DNS CNAME to point to Scaleway’s endpoint
- SSL is handled automatically
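Once the CNAME is in place, you can confirm it resolves before switching traffic (the hostname here is a placeholder):
bash
# Check that the CNAME points at the Scaleway endpoint
nslookup -type=CNAME api.example.com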
Databases and Storage:
If you’re using Cloud SQL or Cloud Storage:
- Databases: Consider Scaleway’s Managed PostgreSQL/MySQL or Serverless SQL Database
- Object Storage: Scaleway Object Storage is S3-compatible
- Or: Keep using GCP services (cross-cloud is possible, but adds latency)
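Since the object storage speaks the S3 protocol, existing S3 tooling should work by pointing it at Scaleway's endpoint. A sketch with the AWS CLI, where s3.fr-par.scw.cloud is the Paris-region endpoint and your Scaleway API key serves as the credentials:
bash
# List buckets through Scaleway's S3-compatible endpoint
aws s3 ls --endpoint-url https://s3.fr-par.scw.cloud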
Monitoring and Logging:
Scaleway provides Cockpit (based on Grafana):
- Automatic logging for all Serverless Containers
- Pre-built dashboards
- Integration with alerts and metrics
- Similar to Cloud Logging/Monitoring
The Broader Picture: European Digital Sovereignty
This migration isn’t just about cost savings or technical features—it’s about control.
Why EU Companies Are Moving:
- Legal Protection: GDPR protections are stronger when data never leaves EU jurisdiction
- Political Risk: Reduces exposure to US government data requests under CLOUD Act
- Supply Chain Resilience: Diversification away from Big Tech dependency
- Supporting European Tech: Strengthens the European cloud ecosystem
- Future-Proofing: As digital sovereignty regulations increase, early movers are better positioned
The Economic Argument:
Every euro spent with European cloud providers:
- Stays in the European economy
- Supports European jobs and innovation
- Builds alternatives to US/Chinese tech dominance
- Strengthens Europe’s strategic autonomy
Conclusion: A Straightforward Path to Sovereignty
Migrating from Google Cloud Run to Scaleway Serverless Containers is technically simple—often taking just a few hours for a typical service. The containers are identical, the pricing is competitive, and the operational model is the same.
But beyond the technical benefits, there’s a strategic argument: as a European company, every infrastructure decision is a choice about where your data lives, who has access to it, and which ecosystem you’re supporting.
Scaleway (and other European cloud providers) aren’t perfect replacements for every GCP use case. But for containerized APIs and web services—which represent the majority of Cloud Run workloads—they’re absolutely production-ready alternatives that keep your infrastructure firmly within European jurisdiction.
In 2026’s geopolitical landscape, that’s not just a nice-to-have—it’s increasingly essential.
Resources
- Scaleway Serverless Containers: https://www.scaleway.com/en/serverless-containers/
- Scaleway Documentation: https://www.scaleway.com/en/docs/
- Skopeo Documentation: https://github.com/containers/skopeo
- European Cloud Providers: Research Scaleway, OVHcloud, Hetzner, and others
- EU Digital Sovereignty: European Commission digital strategy resources
Have you migrated your infrastructure back to Europe? Share your experience in the comments below.