Archive
The dangers of Parallel.ForEach(… , async (item)) in IIS
A single, trivial exception — one that your code already has a catch block for — shouldn’t be able to bring down your entire IIS web server. But it can, and it will, if you combine Parallel.ForEach with an async lambda. This post explains exactly why it happens, how to spot it in the Windows Event Log, and how to fix it permanently.
The Setup
You have a method that needs to perform the same async operation against multiple items — calling a set of external APIs, processing a batch of records, sending a collection of requests. You reach for Parallel.ForEach because it sounds like the right tool: parallel work, multiple items, run them all at once. You even add a try/catch inside the lambda because you’re being responsible. It looks like this:
```csharp
Parallel.ForEach(items, async (item) =>
{
    try
    {
        var result = await ProcessItemAsync(item);
        lock (results) { results.Add(result); }
    }
    catch (ItemNotFoundException)
    {
        // item not found - fine, skip it
    }
    catch (Exception ex)
    {
        lock (errors) { errors.Add(ex); }
    }
});
```
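This looks safe. It has error handling. It uses async/await. It compiles without a single warning. But Parallel.ForEach returns before any of the awaited work has finished, so results and errors are read while they are still being filled. And the moment an exception escapes the lambda after an await (a rethrow, a narrower catch, a failure inside a catch block), it will crash your IIS worker process (w3wp.exe).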
Why It Crashes: The async void Trap
Parallel.ForEach was designed before async/await existed in C#. It expects a synchronous Action<T> delegate. When you pass it an async lambda, something subtle and dangerous happens: the compiler silently treats the lambda as returning void rather than Task.
This is the async void anti-pattern, and it has one devastating property: an exception that escapes an async void method cannot be caught by any caller. With no Task to carry it, the runtime rethrows it on whatever synchronisation context was current when the lambda started, and on a raw ThreadPool thread, where there is none, that means it goes completely unhandled.
Here is the exact sequence of events that kills your server:
- Parallel.ForEach fires the lambda for each item in the collection
- Each lambda hits the first await and suspends, returning control immediately
- Parallel.ForEach sees each lambda return (as void) and considers its job done
- Parallel.ForEach exits, the method returns to its caller, and everything looks fine
- Milliseconds later, the awaited operations complete and the continuations resume on raw ThreadPool threads
- An exception is thrown inside one of those continuations
- The try/catch inside the lambda? It still covers the code inside its own try block, but it is the last line of defence. Anything it lets through (a type it doesn't catch, a rethrow, a failure inside a catch block, code outside the try) has nowhere to go, because the lambda has already returned as far as Parallel.ForEach is concerned
- The exception has no owner, no observer, no catch block further up the stack, so it propagates to the ThreadPool itself
- In .NET 4.0 and later, an unhandled exception on a ThreadPool thread terminates the process
- w3wp.exe crashes. IIS restarts the application pool. All in-flight requests are lost
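The particularly insidious part is that the try/catch gives you a false sense of security. You can see it right there in the code. But once an await is involved it is the only protection you have: no try/catch at the call site, and no handler further up the stack, will ever see what slips past it.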
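The early return in the middle of that sequence is easy to observe without crashing anything. The console sketch below is illustrative only: the class name, the counter and the delay values are assumptions, not part of the original code. It counts how many lambdas have finished at the moment Parallel.ForEach returns.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class EarlyReturnDemo
{
    public static async Task Main()
    {
        int completed = 0;

        // The async lambda binds to Action<int>, so the compiler emits it as async void.
        Parallel.ForEach(new[] { 1, 2, 3 }, async item =>
        {
            await Task.Delay(200);                 // simulate async I/O
            Interlocked.Increment(ref completed);  // runs later, on a ThreadPool thread
        });

        // Parallel.ForEach has returned, but the continuations have not run yet.
        Console.WriteLine($"Right after ForEach: {completed} of 3 done");

        // Hypothetical grace period so the orphaned continuations can finish.
        await Task.Delay(1000);
        Console.WriteLine($"One second later:    {completed} of 3 done");
    }
}
```

On a typical run the first count is still 0 and the second is 3. Nothing in the calling code waited for that work; it simply happened to finish before the process exited.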
A Minimal Reproduction
You don’t need a complex codebase to reproduce this. The following is all it takes:
```csharp
public static void CrashIIS()
{
    Parallel.ForEach(new[] { 1, 2, 3 }, async (item) =>
    {
        await Task.Delay(100); // simulate any async I/O

        // After the await, this runs on an orphaned ThreadPool thread
        // The process terminates
        throw new Exception("This kills w3wp.exe");
    });

    // Parallel.ForEach has already returned here
    // The crash happens 100ms later
}
```
Call that from any ASP.NET request handler (a controller action, an HttpHandler, anywhere) and your application pool will crash within moments. The caller gets no exception. The HTTP response may even succeed before the crash occurs. Other requests in flight on that worker process are cut off, and once repeated crashes trip Rapid Fail Protection, every new request gets a 503.
Even wrapping the call in a try/catch at the call site doesn’t help:
```csharp
try
{
    Parallel.ForEach(new[] { 1, 2, 3 }, async (item) =>
    {
        await Task.Delay(100);
        throw new Exception("Crash");
    });
}
catch (Exception ex)
{
    // This NEVER fires.
    // The exception doesn't happen until after Parallel.ForEach
    // has already exited this try block entirely.
    Log(ex);
}
```
The catch block is long gone by the time the exception is thrown. This is what makes the pattern so dangerous — it looks exception-safe at every level, and isn’t.
How It Appears in the Windows Event Log
When this crash occurs, it leaves a very specific fingerprint in the Windows Event Log. Open Event Viewer → Windows Logs → Application and look for two entries appearing within seconds of each other.
Entry 1: .NET Runtime — Unhandled Exception
Source: .NET Runtime
Event ID: 1026
```
Application: w3wp.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: YourNamespace.YourException
   at YourClass.YourMethod()
   at SomeClass+<SomeMethod>d__3.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task)
   at SomeClass+<>c__DisplayClass4_0+<<YourParallelMethod>b__0>d.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Threading.ExecutionContext.RunInternal(...)
   at System.Threading.ExecutionContext.Run(...)
   at System.Threading.QueueUserWorkItemCallback.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
```
The key indicators are at the bottom of the stack trace:
- QueueUserWorkItemCallback.ExecuteWorkItem()
- ThreadPoolWorkQueue.Dispatch()
These tell you the exception surfaced on a raw ThreadPool work item with no managed owner, the classic signature of an orphaned async continuation. You will also see compiler-generated state machine names like <<YourParallelMethod>b__0>d.MoveNext() in the trace, confirming the exception came from inside an async lambda. The angle brackets and the b__ notation are the C# compiler’s naming convention for anonymous methods and lambdas.
Entry 2: Application Error — w3wp.exe Fault
Source: Application Error
Event ID: 1000
```
Faulting application name: w3wp.exe
Faulting module name: KERNELBASE.dll
Exception code: 0xe0434352
```
Exception code 0xe0434352 is the Windows error code for a managed (.NET) exception that has escaped to the Win32 layer. It’s the OS-level record of a .NET exception killing a process. When you see this code combined with KERNELBASE.dll as the faulting module, a .NET unhandled exception is the cause.
What to Look For — Summary
| Signal | Where | What it means |
|---|---|---|
| ThreadPoolWorkQueue.Dispatch() at bottom of stack | Event ID 1026, .NET Runtime | Exception from orphaned async continuation |
| Compiler-generated names like b__0>d.MoveNext() | Event ID 1026, .NET Runtime | Exception came from inside an async lambda |
| Exception code 0xe0434352 | Event ID 1000, Application Error | .NET exception killed the process |
| Faulting module: KERNELBASE.dll | Event ID 1000, Application Error | Managed exception, not a native crash |
| Both entries within seconds of each other | Application log | Single event caused immediate process termination |
The Effect on IIS
When w3wp.exe terminates due to an unhandled exception, IIS detects the process death and marks the application pool as faulted. Depending on your Rapid Fail Protection settings (found in IIS Manager → Application Pools → Advanced Settings), IIS will either:
- Restart the worker process automatically — users experience a brief outage and then service resumes, with the first request after restart being slow due to application warm-up
- Disable the application pool if failures occur too frequently within the Rapid Fail Protection window (default: 5 failures in 5 minutes) — this results in a persistent 503 until an administrator manually starts the pool again
This is worth understanding because thread exhaustion and this crash pattern look identical from the outside — both produce 503 errors — but they behave very differently. Thread exhaustion self-recovers when load drops. A crashed application pool requires either automatic restart (if Rapid Fail Protection hasn’t tripped) or manual intervention. If your team is regularly performing IISResets to recover from outages, a crash like this is a more likely culprit than thread exhaustion.
The Fix
The correct replacement for Parallel.ForEach with async work is Task.WhenAll, which is async-native and properly propagates exceptions back to the awaiting caller:
```csharp
public static async Task<IReadOnlyList<Result>> ProcessAllAsync(IEnumerable<Item> items)
{
    var tasks = items.Select(async item =>
    {
        try
        {
            return await ProcessItemAsync(item);
        }
        catch (ItemNotFoundException)
        {
            return Result.Empty;
        }
    });

    // All items are processed concurrently.
    // If any task fails, this await rethrows the first exception to a proper awaiter;
    // the full set is kept on the WhenAll task's Exception property (an AggregateException).
    return await Task.WhenAll(tasks);
}
```
With Task.WhenAll:
- All items are processed in parallel — no performance regression
- Every async continuation is properly tracked by the Task infrastructure
- Exceptions are collected on the returned task as an AggregateException, and awaiting it rethrows the first of them to a caller that can handle it
- The process does not terminate
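How failures come back from Task.WhenAll is worth seeing once, because `await` and the task's Exception property report them differently. The sketch below uses a hypothetical ProcessAsync stand-in in which odd inputs fail; none of these names come from the code above.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllExceptionsDemo
{
    // Hypothetical stand-in for real async work: odd inputs fail.
    private static async Task<int> ProcessAsync(int item)
    {
        await Task.Delay(50);
        if (item % 2 == 1)
            throw new InvalidOperationException($"item {item} failed");
        return item * 10;
    }

    public static async Task Main()
    {
        Task<int[]> all = Task.WhenAll(new[] { 1, 2, 3 }.Select(i => ProcessAsync(i)));

        try
        {
            await all;
        }
        catch (InvalidOperationException first)
        {
            // await unwraps the AggregateException and rethrows a single exception.
            Console.WriteLine($"Caught by await: {first.Message}");

            // Every failure is still recorded on the task itself.
            foreach (Exception ex in all.Exception.InnerExceptions)
                Console.WriteLine($"Recorded on task: {ex.Message}");
        }
    }
}
```

Both failures appear in the "Recorded on task" lines even though the catch block receives only one. If you need every error, for logging or per-item retries, keep a reference to the WhenAll task rather than relying on the caught exception alone.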
While refactoring, you can also add a global handler in Global.asax that logs exceptions from tasks nobody awaited:

```csharp
// In Application_Start (Global.asax)
TaskScheduler.UnobservedTaskException += (sender, args) =>
{
    Logger.Error("Unobserved task exception", args.Exception);
    args.SetObserved(); // marks the exception as observed so it is not escalated
};
```

This is a safety net, not a fix, and it is a narrower net than it looks. The handler fires only for faulted Task objects that were never awaited, and only when the garbage collector finalises them. An async void lambda has no Task, so an exception escaping one never reaches this handler and still terminates the process. Subscribing to AppDomain.CurrentDomain.UnhandledException lets you log that exception on the way down, but it cannot stop the shutdown. The only real protection is the refactor.
The Rule
The rule to remember is simple: never pass an async lambda to Parallel.ForEach. The two are fundamentally incompatible. Parallel.ForEach has no understanding of Task, does not await the work it fires, and any exception that escapes your lambda after its first await will be orphaned on the ThreadPool. In .NET 4.0 and later, that means process termination.
The pattern is particularly easy to introduce because it compiles cleanly, looks reasonable, and even appears to have proper error handling. The only sign something is wrong is your server going down.
When you need parallel async work, use Task.WhenAll. It was designed for exactly this purpose.
Found this useful? If you’re diagnosing IIS instability, check your application pool’s Rapid Fail Protection settings and review Event Viewer’s Application log for Event ID 1026 with ThreadPoolWorkQueue.Dispatch() at the bottom of the stack trace — that’s the fingerprint that points directly to this pattern.
Taiwan motorcycle plate lookup via #API

Our vehicle API network expands into Taiwan with full motorcycle registration lookups — make, age, engine size, and emissions test history, all in one call.
Taiwan has one of the highest motorcycle densities in the world, with over 14 million registered two-wheelers on its roads. Today, we’re pleased to announce that the /CheckTaiwan endpoint is live — bringing motorcycle registration data from Taiwan’s national vehicle database into our global API network.
What the endpoint returns
A single call to /CheckTaiwan with a Taiwanese plate number returns structured vehicle data covering identity, registration history, and emissions compliance records:
- Make — Manufacturer name (Chinese & romanised)
- Age / Registration year — Year of manufacture and license issue date
- Engine size — Displacement in cc and engine cycle type
- Inspection records — Full emissions test history with HC, CO, and CO₂ readings
Sample lookup: MWN-0076
Here’s what a real response looks like for a 2018 Kymco (光陽) motorcycle:
```json
{
  "Description": "光陽",
  "RegistrationYear": "2018",
  "CarMake": { "CurrentTextValue": "光陽" },
  "EngineSize": "149",
  "ManufactureDate": "01/08/2018",
  "LicenseIssueDate": "05/09/2018",
  "EngineCycle": "四行程",
  "TestRecords": [
    {
      "LicensePlate": "MWN-0076",
      "InspectionType": "定期檢驗",
      "HC_ppm": "102",
      "CO_pct": "0.1",
      "CO2_pct": "14.8",
      "Result": "合格",
      "TestDate": "20240822"
    },
    {
      "LicensePlate": "MWN-0076",
      "InspectionType": "定期檢驗",
      "HC_ppm": "315",
      "CO_pct": "0.1",
      "CO2_pct": "14.9",
      "Result": "合格",
      "TestDate": "20240718"
    }
  ]
}
```
The TestRecords array is particularly valuable — it provides a full chronological emissions test history, with pass/fail status (合格 = pass), hydrocarbon and carbon monoxide readings, and the serial number of each inspection. This supports fleet compliance monitoring, insurance underwriting, and second-hand vehicle verification use cases.
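Every value in the sample response arrives as a string, including numbers and dates, so a consumer has to convert them. The sketch below shows one way to do that with System.Text.Json against a trimmed copy of the sample. It assumes TestDate is always in yyyyMMdd form, as in the sample, and the class name is illustrative.

```csharp
using System;
using System.Globalization;
using System.Text.Json;

public static class TaiwanResponseDemo
{
    public static void Main()
    {
        // Trimmed copy of the sample response (one test record, fewer fields).
        const string json = @"{
          ""Description"": ""光陽"",
          ""EngineSize"": ""149"",
          ""TestRecords"": [
            { ""HC_ppm"": ""102"", ""CO_pct"": ""0.1"", ""Result"": ""合格"", ""TestDate"": ""20240822"" }
          ]
        }";

        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        string make = root.GetProperty("Description").GetString();
        int engineCc = int.Parse(root.GetProperty("EngineSize").GetString(), CultureInfo.InvariantCulture);
        Console.WriteLine($"{make}, {engineCc} cc");

        foreach (JsonElement rec in root.GetProperty("TestRecords").EnumerateArray())
        {
            // Assumption: dates use the compact yyyyMMdd form seen in the sample.
            DateTime testDate = DateTime.ParseExact(
                rec.GetProperty("TestDate").GetString(), "yyyyMMdd", CultureInfo.InvariantCulture);
            int hcPpm = int.Parse(rec.GetProperty("HC_ppm").GetString(), CultureInfo.InvariantCulture);
            bool passed = rec.GetProperty("Result").GetString() == "合格"; // 合格 = pass

            Console.WriteLine($"{testDate:yyyy-MM-dd}: HC {hcPpm} ppm, passed = {passed}");
        }
    }
}
```

Parsing with the invariant culture avoids surprises on servers whose locale uses a comma as the decimal separator, which matters for fields such as CO_pct.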
API endpoint
The endpoint is live now at:
https://www.chepaiapi.tw/api/reg.asmx?op=CheckTaiwan
Full documentation and interactive testing are available at chepaiapi.tw.
Expanding Chinese-language coverage
This launch also deepens our Chinese-language vehicle data coverage. Alongside Taiwan, our mainland China vehicle lookup service at chepaiapi.cn continues to serve customers requiring PRC plate data — together forming a comprehensive Chinese-language API offering across both sides of the strait.
Use cases
The Taiwan endpoint is well-suited to:
- Insurers pricing two-wheeler policies
- Logistics platforms operating scooter fleets
- Used vehicle marketplaces verifying provenance
- KYC and compliance workflows touching Taiwanese vehicle assets
The inclusion of emissions test records is a differentiator that goes beyond simple registration confirmation — providing genuine due diligence depth for any platform that needs it.
Taiwan joins our network of 55+ country vehicle lookup APIs. We’ll continue expanding coverage across Asia Pacific throughout 2026.
Batch AI Processing: Why Multithreading is the Wrong Instinct
When developers first encounter a large-scale AI classification job — say, two million records that each need to be sent to an LLM for analysis — the instinct is immediately familiar: spin up threads, parallelise the work, saturate the API. It’s the same pattern that works for database processing, file I/O, HTTP scraping. More threads, more throughput.
With LLM APIs, that instinct leads you straight into a wall. And the wall has a name: TPM.
The Problem with Multithreading LLM Calls
Most LLM APIs — OpenAI included — impose a Tokens Per Minute (TPM) limit. This is a rolling window, not a per-request limit. Every token you send in a prompt, and every token the model returns, counts against it.
The naive multithreaded approach burns through this budget in a way that’s both wasteful and hard to control:
The system prompt repeats on every request. If your prompt is 700 tokens and you’re running 20 threads that each fire one request per second, you’re spending 14,000 tokens per second just on prompt overhead, before the model has classified a single record. With a 200,000 TPM limit, that is 7% of the minute’s budget (4.2 seconds’ worth) gone in one second.
Burst behaviour triggers rate limits unpredictably. The TPM limit is a rolling window. Twenty threads firing simultaneously create a spike that can exceed the per-minute budget in seconds, even if your average rate would be well within limits. The API returns 429 errors, your retry logic kicks in, those retries themselves consume tokens, and the situation compounds.
Thread count is a blunt instrument. Dialling concurrency up and down doesn’t map cleanly to token consumption because request latency varies. A batch that takes 500ms doesn’t consume the same tokens as one that takes 1,500ms, but both hold a thread slot for their duration.
The Better Model: Semantic Batching
The insight that changes everything is this: the system prompt is a fixed overhead, and you should amortise it across as many classifications as possible per API call.
Instead of:
```
Thread 1: [system prompt 700 tokens] + [address 1: 15 tokens] → [result: 15 tokens]
Thread 2: [system prompt 700 tokens] + [address 2: 15 tokens] → [result: 15 tokens]
...
× 20 threads
Total: 14,600 tokens for 20 classifications (14,000 of them system prompt)
```

You send:

```
[system prompt 700 tokens] + [addresses 1-20: 300 tokens] → [results 1-20: 100 tokens]
Total: 1,100 tokens for 20 classifications
```

That’s a 13× reduction in token consumption for the same work. Suddenly your 200,000 TPM budget, which could only sustain ~270 single-record requests per minute, supports ~3,600 classifications per minute. No extra threads needed.
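The arithmetic behind those figures is short enough to check in code. The sketch below uses only the numbers quoted in this section (700-token prompt, 15 tokens in and out per single record, 300 in and 100 out per batch of 20, 200,000 TPM); the class and variable names are illustrative.

```csharp
using System;

public static class TokenBudgetDemo
{
    public static void Main()
    {
        const int tpmLimit = 200_000;   // tokens per minute
        const int systemPrompt = 700;
        const int batchSize = 20;

        // One record per request: prompt + one address in + one result out.
        int perSingleRequest = systemPrompt + 15 + 15;                        // 730
        // Twenty records per request: prompt + 300 tokens in + 100 tokens out.
        int perBatchRequest = systemPrompt + 300 + 100;                       // 1,100

        int singlePerMinute = tpmLimit / perSingleRequest;                    // 273
        int batchedPerMinute = tpmLimit / perBatchRequest * batchSize;        // 181 * 20 = 3,620

        double tokensPerRecordBatched = (double)perBatchRequest / batchSize;  // 55
        double reduction = perSingleRequest / tokensPerRecordBatched;         // about 13.3

        Console.WriteLine($"Unbatched: {singlePerMinute} classifications per minute");
        Console.WriteLine($"Batched:   {batchedPerMinute} classifications per minute");
        Console.WriteLine($"Tokens per record fall from {perSingleRequest} to {tokensPerRecordBatched}, a factor of {reduction:F1}");
    }
}
```

Swapping in your own prompt size and batch size shows quickly where the gain flattens out: once the per-record tokens dominate the prompt, larger batches add little.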
Key Implementation Details
1. Include an ID in Both Request and Response
The most important correctness detail in batch processing: never rely on positional alignment.
If you send 20 addresses and ask the model to return 20 results, it might return 19. Now you don’t know which one it dropped. If you’re matching by position, records from item 7 onwards get silently misclassified.
The fix is to include a unique identifier in both directions:
```
User message:
id=548033: product X
id=548034: product Y
...

System prompt format instruction:
Reply ONLY with a JSON array. Format: [{"id":548033,"c":"E"}, ...]
```
Now you build a dictionary from the response keyed on id, and match each input item explicitly. A missing id means that specific record gets skipped and retried on the next run. Everything else classifies correctly regardless of what the model dropped.
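A minimal sketch of that matching step, assuming the reply follows the `[{"id":…,"c":…}]` format above. The third id and the simulated reply are made up for illustration, and System.Text.Json is one possible parser, not necessarily the one used in the original job.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class IdMatchingDemo
{
    public static void Main()
    {
        // Ids sent in this batch (548035 is a hypothetical third item).
        long[] sentIds = { 548033, 548034, 548035 };

        // Simulated model reply that silently dropped id 548034.
        const string reply = @"[{""id"":548033,""c"":""E""},{""id"":548035,""c"":""A""}]";

        // Build the lookup keyed on id, never on array position.
        var codeById = new Dictionary<long, string>();
        using (JsonDocument doc = JsonDocument.Parse(reply))
        {
            foreach (JsonElement entry in doc.RootElement.EnumerateArray())
                codeById[entry.GetProperty("id").GetInt64()] = entry.GetProperty("c").GetString();
        }

        foreach (long id in sentIds)
        {
            if (codeById.TryGetValue(id, out string code))
                Console.WriteLine($"{id} -> {code}");
            else
                Console.WriteLine($"{id} skipped, left for the next run");
        }
    }
}
```

Here 548033 maps to E, 548035 maps to A, and 548034 is reported as skipped. With positional matching, 548034 would have been given the code that belongs to 548035.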
2. Resolve Labels Locally
The model doesn’t need to return the full label text. "Prime City Professionals" costs tokens on every response item. A single letter costs one token.
Keep a static dictionary in your code:
```csharp
private static readonly Dictionary<string, string> Labels = new()
{
    { "A", "Prime Product" },
    { "B", "Budget Product" },
    // ...
};
```
The model returns "c":"A", you look up the label locally. This also eliminates a class of hallucination errors where the model invents a label name slightly different from your taxonomy.
Note: even "category" vs "c" matters at scale. In the OpenAI tokenizer, "category" is 3 tokens; "c" is 1. That saves 2 tokens on every result item, so across 100,000 batch calls of 20 items each it adds up to 4 million tokens. Small, but free.
3. Track TPM with a Rolling Window, Not Concurrency
Rather than trying to infer safe concurrency from trial and error, measure what you’re actually consuming and throttle directly on that signal.
```csharp
// Rolling 60-second window of (timestamp, tokens) entries
var window = new Queue<(DateTime t, int tok)>();

// On each successful response, record tokens used with a timestamp
window.Enqueue((DateTime.UtcNow, inputTokens + outputTokens));

// Before each request, prune entries older than 60 seconds and sum the rest
var cutoff = DateTime.UtcNow.AddSeconds(-60);
while (window.Count > 0 && window.Peek().t < cutoff) window.Dequeue(); // Count check: Peek throws on an empty queue
long tpmUsed = window.Sum(x => (long)x.tok);

// Throttle graduated to usage
if (tpmUsed > tpmLimit * 0.98) Thread.Sleep(2000);
else if (tpmUsed > tpmLimit * 0.95) Thread.Sleep(800);
else if (tpmUsed > tpmLimit * 0.85) Thread.Sleep(300);
```
This gives you automatic, self-correcting throttling that responds to real consumption rather than guessing from thread counts. If a batch of records happens to have longer addresses, the window fills faster and the delay kicks in sooner. No manual tuning required.
4. Resumability via Cursor Pagination
For a job that takes hours or days, stopping and restarting must be safe and cheap. The key is two things working together:
Write results immediately after each batch, not at the end of a page. If you crash mid-page, you’ve lost one batch (20 records), not a thousand.
Use a NULL-check filter combined with cursor pagination. The query for unclassified records looks like:
```sql
WHERE segment_category IS NULL AND id > {lastId} ORDER BY id LIMIT 1000
```
On restart, lastId resets to 0, but the IS NULL filter automatically skips everything already classified. The cursor (id > lastId) keeps the query fast on large tables — OFFSET pagination slows to a crawl at millions of rows because the database still has to scan all preceding rows to find the offset position.
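The interplay between the NULL filter and the cursor can be simulated without a database. In the sketch below an in-memory list stands in for the table, LINQ mirrors the query, and one record is deliberately left unclassified to imitate an item the model dropped. The Row type, the page and batch sizes, and the placeholder result are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResumableRunDemo
{
    // Hypothetical in-memory stand-in for the table; a null Category means "not yet classified".
    private sealed class Row
    {
        public long Id;
        public string Category;
    }

    public static void Main()
    {
        List<Row> table = Enumerable.Range(1, 10).Select(i => new Row { Id = i }).ToList();
        table[0].Category = "A";   // ids 1 and 2 were written by an earlier, interrupted run
        table[1].Category = "B";

        const int pageSize = 4;
        const int batchSize = 2;
        long lastId = 0;           // the cursor starts from 0 on every restart

        while (true)
        {
            // Mirrors: WHERE segment_category IS NULL AND id > {lastId} ORDER BY id LIMIT {pageSize}
            List<Row> page = table
                .Where(r => r.Category == null && r.Id > lastId)
                .OrderBy(r => r.Id)
                .Take(pageSize)
                .ToList();
            if (page.Count == 0) break;

            for (int i = 0; i < page.Count; i += batchSize)
            {
                foreach (Row row in page.Skip(i).Take(batchSize))
                {
                    if (row.Id == 5) continue;  // simulate a record the model dropped: stays NULL
                    row.Category = "X";         // placeholder result, written as soon as the batch returns
                }
            }

            lastId = page[page.Count - 1].Id;   // advance the cursor past this page
        }

        string remaining = string.Join(", ", table.Where(r => r.Category == null).Select(r => r.Id));
        Console.WriteLine($"Still NULL after this run: {remaining}");
    }
}
```

The run ends with only id 5 still NULL, ready for the next run. The cursor does a second job here besides speed: without `id > lastId`, a record the model keeps dropping would be fetched again on every page and the loop would never finish.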
5. Handle Partial Batches Gracefully with Skip vs Error
Not all failures are equal. Distinguish between:
- Error: something went wrong that warrants logging (HTTP 500, persistent 429 after retries, DB connection failure). These need attention.
- Skip: the record wasn’t returned in this batch response. Leave it NULL in the database, it will be picked up automatically on the next run. No log noise needed.
This distinction keeps your error output meaningful. If every missing batch item logs as an error, a run with 0.1% skip rate produces thousands of error lines that mask real problems.
The Result
What started as a job estimated at 16–67 days with a naive multithreaded approach settled to around 7 hours using semantic batching — processing two million records through a rate-limited API without a single configuration change to the API account.
The throughput improvement didn’t come from more concurrency. It came from being smarter about what gets sent in each request.
The general principle applies beyond LLM classification: whenever you have a fixed overhead per API call (authentication, context, schema), the correct optimisation is to amortise that overhead across as much work as possible per call, not to fire more calls in parallel.
Summary of Patterns
| Pattern | Naive approach | Better approach |
|---|---|---|
| Throughput | More threads | Larger batches |
| Rate limiting | Catch 429, retry | Track TPM rolling window, throttle proactively |
| Result matching | Positional array index | ID-keyed dictionary |
| Label resolution | Ask model for full text | Return code, resolve locally |
| Resumability | Track page offset | NULL-check filter + cursor pagination |
| Failure handling | All failures are errors | Skip vs Error distinction |
| DB resilience | Crash on connection drop | Exponential backoff retry |
The instinct to parallelise is correct in principle — you want to keep the API busy. But with token-limited LLM APIs, the right parallelism is within a single request, not across many simultaneous ones.