Batch AI Processing: Why Multithreading is the Wrong Instinct

When developers first encounter a large-scale AI classification job — say, two million records that each need to be sent to an LLM for analysis — the instinct is immediately familiar: spin up threads, parallelise the work, saturate the API. It’s the same pattern that works for database processing, file I/O, HTTP scraping. More threads, more throughput.

With LLM APIs, that instinct leads you straight into a wall. And the wall has a name: TPM.


The Problem with Multithreading LLM Calls

Most LLM APIs — OpenAI included — impose a Tokens Per Minute (TPM) limit. This is a rolling window, not a per-request limit. Every token you send in a prompt, and every token the model returns, counts against it.

The naive multithreaded approach burns through this budget in a way that’s both wasteful and hard to control:

The system prompt repeats on every request. If your prompt is 700 tokens and you’re running 20 threads each firing one request per second, you’re spending 14,000 tokens per second just on prompt overhead — before the model has classified a single record. With a 200,000 TPM limit, that’s 7% of the minute’s budget gone every second; the whole budget is exhausted in about 14 seconds.

Burst behaviour triggers rate limits unpredictably. The TPM limit is a rolling window. Twenty threads firing simultaneously create a spike that can exceed the per-minute budget in seconds, even if your average rate would be well within limits. The API returns 429 errors, your retry logic kicks in, those retries themselves consume tokens, and the situation compounds.

Thread count is a blunt instrument. Dialling concurrency up and down doesn’t map cleanly to token consumption because request latency varies. A batch that takes 500ms doesn’t consume the same tokens as one that takes 1,500ms, but both hold a thread slot for their duration.


The Better Model: Semantic Batching

The insight that changes everything is this: the system prompt is a fixed overhead, and you should amortise it across as many classifications as possible per API call.

Instead of:

Thread 1: [system prompt 700 tokens] + [address 1: 15 tokens] → [result: 15 tokens]
Thread 2: [system prompt 700 tokens] + [address 2: 15 tokens] → [result: 15 tokens]
...× 20 threads
Total: 14,600 tokens for 20 classifications

You send:

[system prompt 700 tokens] + [addresses 1-20: 300 tokens] → [results 1-20: 100 tokens]
Total: 1,100 tokens for 20 classifications

That’s a 13× reduction in token consumption for the same work. Suddenly your 200,000 TPM budget — which could only sustain ~270 single-record requests per minute — supports ~3,600 classifications per minute. No extra threads needed.
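The arithmetic behind those throughput figures can be sketched directly, as a back-of-envelope check using the token counts from the example above:

```python
# Token cost per classification under each strategy, using the counts above.
TPM_LIMIT = 200_000

single_total = 700 + 15 + 15        # prompt + one address + one result
batch_total  = 700 + 300 + 100      # prompt + 20 addresses + 20 compact results

per_item_single = single_total       # 730 tokens per classification
per_item_batch  = batch_total / 20   # 55 tokens per classification

print(round(TPM_LIMIT / per_item_single))  # ~270 classifications per minute
print(round(TPM_LIMIT / per_item_batch))   # ~3,600 classifications per minute
```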


Key Implementation Details

1. Include an ID in Both Request and Response

The most important correctness rule in batch processing is this: never rely on positional alignment.

If you send 20 addresses and ask the model to return 20 results, it might return 19. Now you don’t know which one it dropped. If you’re matching by position, every record after the dropped item gets silently misclassified.

The fix is to include a unique identifier in both directions:

User message:
id=548033: product X
id=548034: product Y
...
System prompt format instruction:
Reply ONLY with a JSON array. Format: [{"id":548033,"c":"E"}, ...]

Now you build a dictionary from the response keyed on id, and match each input item explicitly. A missing id means that specific record gets skipped and retried on the next run. Everything else classifies correctly regardless of what the model dropped.
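A minimal sketch of that matching step in Python, assuming the JSON array format shown above (the helper name is hypothetical):

```python
import json

def match_results(batch_ids, response_text):
    """Split one batch into (matched {id: code}, skipped [ids])."""
    by_id = {item["id"]: item["c"] for item in json.loads(response_text)}
    matched = {rid: by_id[rid] for rid in batch_ids if rid in by_id}
    skipped = [rid for rid in batch_ids if rid not in by_id]
    return matched, skipped

# The model silently dropped id 548034; it stays NULL and is retried next run
reply = '[{"id":548033,"c":"E"},{"id":548035,"c":"A"}]'
matched, skipped = match_results([548033, 548034, 548035], reply)
print(matched)   # {548033: 'E', 548035: 'A'}
print(skipped)   # [548034]
```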

2. Resolve Labels Locally

The model doesn’t need to return the full label text. "Prime Product" costs tokens on every response item. A single letter costs one token.

Keep a static dictionary in your code:

csharp

private static readonly Dictionary<string, string> Labels = new()
{
    { "A", "Prime Product" },
    { "B", "Budget Product" },
    // ...
};

The model returns "c":"A", you look up the label locally. This also eliminates a class of hallucination errors where the model invents a label name slightly different from your taxonomy.

Note: even "category" vs "c" matters at scale. In the OpenAI tokenizer, "category" is 3 tokens; "c" is 1. Across 100,000 response items, that’s 200,000 tokens — small but free.

3. Track TPM with a Rolling Window, Not Concurrency

Rather than trying to infer safe concurrency from trial and error, measure what you’re actually consuming and throttle directly on that signal.

csharp

// On each successful response, record tokens used with a timestamp
tokenWindow.Enqueue((Time: DateTime.UtcNow, Tokens: inputTokens + outputTokens));

// Before each request, prune entries older than 60 seconds and sum the rest
var cutoff = DateTime.UtcNow.AddSeconds(-60);
while (tokenWindow.Count > 0 && tokenWindow.Peek().Time < cutoff)
    tokenWindow.Dequeue();
long tpmUsed = tokenWindow.Sum(x => (long)x.Tokens);

// Throttle in graduated steps as usage approaches the limit
if (tpmUsed > tpmLimit * 0.98) Thread.Sleep(2000);
else if (tpmUsed > tpmLimit * 0.95) Thread.Sleep(800);
else if (tpmUsed > tpmLimit * 0.85) Thread.Sleep(300);

This gives you automatic, self-correcting throttling that responds to real consumption rather than guessing from thread counts. If a batch of records happens to have longer addresses, the window fills faster and the delay kicks in sooner. No manual tuning required.

4. Resumability via Cursor Pagination

For a job that takes hours or days, stopping and restarting must be safe and cheap. The key is two things working together:

Write results immediately after each batch, not at the end of a page. If you crash mid-page, you’ve lost one batch (20 records), not a thousand.

Use a NULL-check filter combined with cursor pagination. The query for unclassified records looks like:

sql

WHERE segment_category IS NULL AND id > {lastId} ORDER BY id LIMIT 1000

On restart, lastId resets to 0, but the IS NULL filter automatically skips everything already classified. The cursor (id > lastId) keeps the query fast on large tables — OFFSET pagination slows to a crawl at millions of rows because the database still has to scan all preceding rows to find the offset position.
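Put together, the outer loop looks roughly like this (a sketch with the database stubbed by an in-memory list; all names are hypothetical):

```python
def fetch_page(rows, last_id, page_size=1000):
    """Stand-in for: WHERE category IS NULL AND id > last_id ORDER BY id LIMIT 1000."""
    return [r for r in rows if r["category"] is None and r["id"] > last_id][:page_size]

def run(rows, batch_size=20):
    last_id = 0                              # always restart from 0
    while True:
        page = fetch_page(rows, last_id)
        if not page:
            break                            # nothing left unclassified
        for i in range(0, len(page), batch_size):
            batch = page[i:i + batch_size]
            for r in batch:                  # a real classify_batch(...) call goes here
                r["category"] = "A"
            # results are written immediately after each batch, not at page end
        last_id = page[-1]["id"]             # advance the cursor

rows = [{"id": i, "category": None} for i in range(1, 2501)]
rows[4]["category"] = "B"                    # already classified: skipped by the filter
run(rows)
print(all(r["category"] is not None for r in rows))   # True
```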

5. Handle Partial Batches Gracefully with Skip vs Error

Not all failures are equal. Distinguish between:

  • Error: something went wrong that warrants logging (HTTP 500, persistent 429 after retries, DB connection failure). These need attention.
  • Skip: the record wasn’t returned in this batch response. Leave it NULL in the database; it will be picked up automatically on the next run. No log noise needed.

This distinction keeps your error output meaningful. If every missing batch item logs as an error, a run with 0.1% skip rate produces thousands of error lines that mask real problems.
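One way that triage can look in code (a sketch; the names are hypothetical and the `results` dict stands in for the database table, where a missing key means NULL):

```python
import logging

results = {}   # stands in for the database table; missing key == NULL

def record_outcome(record_id, code=None, error=None):
    if error is not None:
        # Error: log it, this needs human attention
        logging.error("record %s failed: %s", record_id, error)
    elif code is None:
        # Skip: stay silent, the NULL filter retries it on the next run
        pass
    else:
        results[record_id] = code

record_outcome(1, code="A")            # classified and written
record_outcome(2)                      # skipped, no log noise
record_outcome(3, error="HTTP 500")    # logged as a real error
print(sorted(results))   # [1]  (records 2 and 3 remain NULL for the next run)
```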


The Result

What started as a job estimated at 16–67 days with a naive multithreaded approach settled to around 7 hours using semantic batching — processing two million records through a rate-limited API without a single configuration change to the API account.

The throughput improvement didn’t come from more concurrency. It came from being smarter about what gets sent in each request.

The general principle applies beyond LLM classification: whenever you have a fixed overhead per API call (authentication, context, schema), the correct optimisation is to amortise that overhead across as much work as possible per call, not to fire more calls in parallel.


Summary of Patterns

Pattern | Naive approach | Better approach
Throughput | More threads | Larger batches
Rate limiting | Catch 429, retry | Track TPM rolling window, throttle proactively
Result matching | Positional array index | ID-keyed dictionary
Label resolution | Ask model for full text | Return code, resolve locally
Resumability | Track page offset | NULL-check filter + cursor pagination
Failure handling | All failures are errors | Skip vs Error distinction
DB resilience | Crash on connection drop | Exponential backoff retry

The instinct to parallelise is correct in principle — you want to keep the API busy. But with token-limited LLM APIs, the right parallelism is within a single request, not across many simultaneous ones.

Fixing Chrome’s “Aw, Snap!” STATUS_ACCESS_VIOLATION in CDP Automation

How a race condition between page navigation and JavaScript execution causes STATUS_ACCESS_VIOLATION crashes — and how to fix it properly.

C# · .NET · Puppeteer / CDP · Chrome Automation 💥

😬 Aw, Snap! Something went wrong displaying this page.

Chrome’s infamous crash page — and the Windows error behind it: STATUS_ACCESS_VIOLATION (0xC0000005). If you’re automating Chrome via CDP and seeing this, a race condition is likely the culprit.

If you’ve built a browser automation system using the Chrome DevTools Protocol — whether through Puppeteer, Playwright, or a custom CDP client — you may have encountered Chrome processes dying with a STATUS_ACCESS_VIOLATION error. The browser just vanishes. No clean exception, no useful log output. Just Chrome’s “Aw, Snap!” page, or worse, a completely dead process.

This error code (0xC0000005 on Windows) means the process attempted to read or write memory it doesn’t own. It’s a hard native crash, well below the level where .NET exception handling can help you. A try/catch around your CDP call won’t save you.

The Usual Suspects

Most writeups on this error point to GPU driver issues, sandbox misconfiguration, or DLL injection from antivirus software — and those are all valid causes. The standard advice is to throw flags like --disable-gpu, --no-sandbox, and --disable-dev-shm-usage at the problem until it goes away.

But there’s another cause that gets far less attention: injecting JavaScript into a page that’s mid-navigation. This is a timing issue, and it’s surprisingly easy to introduce.

The Race Condition

Consider a common automation pattern: click a button, wait for the resulting page to load, then execute some JavaScript on the new page. A naive implementation might look like this:

problematic

// Click a button that triggers navigation
await ExecuteJavascript("document.querySelector('button').click();");

// Arbitrary delay, then poll readyState
await Task.Delay(1000);
while (true)
{
    await Task.Delay(500);
    var readyState = await ExecuteJavascript("document.readyState");
    if (readyState == "complete") break;
}

This pattern has a fundamental flaw. After clicking the button, Chrome begins tearing down the current document and loading a new one. There is a window — however brief — where the old document is gone but the new one isn’t yet attached. If ExecuteJavascript fires a CDP Runtime.evaluate command into that gap, Chrome is asked to execute JavaScript in a context that no longer exists.

The result isn’t a clean error. Chrome’s internal state becomes inconsistent, and the access violation follows.

Why does it only crash sometimes? Because the race condition is non-deterministic. On a fast machine or a fast network, the new page loads before the poll fires and everything works. On a slower day, the poll lands in the gap and Chrome crashes. This makes the bug look intermittent and hardware-dependent, when it’s actually a logic error.

What the Timeline Looks Like

1. Button click dispatched — CDP sends Runtime.evaluate with the click expression. Navigation begins.

2. ⚠ Danger zone begins — the old document is being torn down; the new document is not yet attached to the frame.

3. document.readyState polled — if Runtime.evaluate fires here, Chrome has no valid document context to evaluate against.

4. STATUS_ACCESS_VIOLATION — Chrome dereferences a null or freed pointer. The process dies. “Aw, Snap!”

5. New document attached (if we were lucky) — if the poll happened to land here instead, readyState returns “loading” or “complete” and everything works fine.

The Fix: Let Chrome Tell You

The correct solution is to stop guessing when navigation is complete and instead subscribe to Chrome’s own navigation events. CDP exposes Page.loadEventFired precisely for this purpose — it fires when the new page’s load event has completed, meaning the document is fully attached and ready for JavaScript execution.

fixed

private async Task WaitForPageLoad(ChromeSession session, int timeoutMs = 30000)
{
    var tcs = new TaskCompletionSource<bool>();

    session.Subscribe<LoadEventFiredEvent>(e => tcs.TrySetResult(true));

    var timeoutTask = Task.Delay(timeoutMs);
    var completed = await Task.WhenAny(tcs.Task, timeoutTask);

    if (completed == timeoutTask)
        throw new TimeoutException("Page load timed out");
}

Critically, the subscription must be set up before triggering navigation — not after. Otherwise there’s a small window where the event fires before you’re listening:

usage

// Subscribe FIRST, then trigger navigation
var pageLoad = WaitForPageLoad(chromeSession);

await ExecuteJavascript("document.querySelector('button').click();");

// Now wait — Chrome will signal when it's actually ready
await pageLoad;

// Safe to execute JavaScript on the new page
await ExecuteJavascript("/* your code here */");
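The same subscribe-before-trigger ordering in Python with asyncio, using a stub session in place of a real CDP client (all names here are hypothetical; the point is that the future exists before navigation starts):

```python
import asyncio

class StubCdpSession:
    """Stand-in for a CDP session: fires Page.loadEventFired shortly after a click."""
    def __init__(self):
        self._listeners = []

    def subscribe(self, event, callback):
        if event == "Page.loadEventFired":
            self._listeners.append(callback)

    async def evaluate(self, js):
        if "click" in js:                 # simulate navigation finishing later
            async def fire():
                await asyncio.sleep(0.05)
                for cb in self._listeners:
                    cb()
            asyncio.ensure_future(fire())

def wait_for_page_load(session, timeout=30.0):
    # Subscribe immediately; return an awaitable the caller resolves later
    fut = asyncio.get_running_loop().create_future()
    session.subscribe("Page.loadEventFired",
                      lambda: fut.done() or fut.set_result(True))
    return asyncio.wait_for(fut, timeout)

async def main():
    session = StubCdpSession()
    page_load = wait_for_page_load(session)   # subscribe FIRST
    await session.evaluate("document.querySelector('button').click()")
    await page_load                           # then wait for Chrome's own signal
    print("safe to evaluate on the new page")

asyncio.run(main())
```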

Why Not Just Catch the Exception?

A STATUS_ACCESS_VIOLATION is a native Windows exception originating inside the Chrome process itself. It is entirely outside the .NET runtime. Wrapping your CDP calls in try/catch does nothing — there is no managed exception to catch. The Chrome process simply dies.

Similarly, adding more Task.Delay calls doesn’t fix the race condition — it just makes it less likely to trigger on any given run, while leaving the underlying problem completely intact.

Applies to Puppeteer and Other CDP Clients Too

This issue isn’t specific to C# or any particular CDP library. The same race condition can occur in Node.js with Puppeteer, Python with pyppeteer, or any system that drives Chrome via the DevTools Protocol. Puppeteer’s page.waitForNavigation() and Playwright’s page.waitForLoadState() exist precisely to solve this problem — they’re wrappers around the same navigation lifecycle events.

If you’re rolling a custom CDP client in any language, the principle is the same: never rely on arbitrary delays or polling to determine when a page is ready for JavaScript execution. Subscribe to Page.loadEventFired or Page.frameStoppedLoading, and let Chrome do the signalling.


Summary

Root cause: JavaScript injected via CDP (Runtime.evaluate) during a page transition hits Chrome in an inconsistent internal state, causing a STATUS_ACCESS_VIOLATION native crash.

Why it’s intermittent: The race condition depends on timing — fast loads mask it, slow loads expose it.

The fix: Subscribe to Page.loadEventFired before triggering navigation, and await the event before executing any JavaScript. Never use Task.Delay or document.readyState polling as a substitute for proper navigation events.


Cracking the Code: Estimating a Car’s Age from Its Argentine Licence Plate

Technical Deep-dive · Vehicle Data

How sequential plate issuance, a little combinatorics, and 200,000 training records let you estimate a registration year from seven characters. By Infinite Loop Development · March 2026

Try it live

The techniques described in this post are implemented in ar.matriculaapi.com — an API that returns full vehicle details for any Argentine licence plate, including an estimated registration year for plates where the exact date is unknown.

Every Argentine licence plate encodes its own approximate birth year. You just have to know how to read it.

Argentina has issued plates in two distinct sequential formats over the past three decades. Because they are allocated in strict national order, a plate’s position in that sequence maps — with surprising precision — to the year the vehicle was registered. This post explains the technique: the encoding, the boundary estimation, and the confidence model.


Two Formats, One Principle

Argentina uses two plate formats, each covering a different era.

OZY040

Pre-Mercosur · ABC123 · ≈ 1990 – 2016

AC875MD

Mercosur · AB123CD · 2016 – present

The Pre-Mercosur format (ABC123) consists of three letters followed by three digits. It was Argentina’s standard from the early 1990s through approximately 2016, when the country transitioned to the regional Mercosur standard.

The Mercosur format (AB123CD) uses two letters, three digits, then two more letters. It began with AA000AA and has been incrementing steadily ever since, shared across the Mercosur bloc — which is why plates from Brazil, Uruguay, and Paraguay share the same structure.

The critical insight is that both formats were issued sequentially at a national level. A plate allocated in 2019 will always have a higher sequence number than one from 2018. This makes year estimation a matter of finding where a given plate falls in the sequence.


Encoding a Plate to a Single Integer

To compare plates across their sequence, we convert each plate to a single integer using mixed-radix encoding — the same idea as a number system that switches base partway through.

Mercosur encoding

The Mercosur plate AB123CD has four alphabetic components (each 0–25) and one numeric component (0–999). Treating letters as base-26 and the number as base-1000:

Python

def encode_mercosur(plate: str) -> int:
    """
    Encode an AB123CD Mercosur plate to a sequence integer.
    AA000AA = 0, AA000AB = 1, ... AZ999ZZ = 17,575,999, BA000AA = 17,576,000 ...
    """
    p = plate.upper().replace(" ", "").replace("-", "")
    assert len(p) == 7, "Mercosur plates are 7 characters"

    l1 = ord(p[0]) - ord('A')  # 0-25
    l2 = ord(p[1]) - ord('A')  # 0-25
    n  = int(p[2:5])             # 0-999
    l3 = ord(p[5]) - ord('A')  # 0-25
    l4 = ord(p[6]) - ord('A')  # 0-25

    return (l1 * 26 + l2) * 676_000 \
         + n              * 676     \
         + l3             * 26      \
         + l4

A few spot checks: AA000AA → 0. AA000AB → 1. AA001AA → 676. AB000AA → 676,000. The encoding is monotonically increasing — every plate that comes later in the alphabet maps to a strictly higher integer.

Pre-Mercosur encoding

The earlier ABC123 format encodes similarly, but with three leading letters instead of two:

Python

def encode_pre_mercosur(plate: str) -> int:
    """
    Encode an ABC123 pre-Mercosur plate to a sequence integer.
    AAA000 = 0, AAA001 = 1, ... ZZZ999 = 17,575,999
    """
    p = plate.upper().replace(" ", "").replace("-", "")
    assert len(p) == 6, "Pre-Mercosur plates are 6 characters"

    l1 = ord(p[0]) - ord('A')  # 0-25
    l2 = ord(p[1]) - ord('A')  # 0-25
    l3 = ord(p[2]) - ord('A')  # 0-25
    n  = int(p[3:])             # 0-999

    return (l1 * 676 + l2 * 26 + l3) * 1000 + n

Learning the Boundaries from Real Data

Encoding gives us integers. To turn those integers into years, we need to know which sequence ranges correspond to which years. This is where training data comes in.

Using over 200,000 Argentine plates with known registration years, we computed the mean sequence number per year. This gives a representative centroid for each year’s plate population:

The year boundary cut points are simply the midpoints between adjacent means. This produces clean, non-overlapping ranges — every sequence integer maps to exactly one year:

Seq range | Estimated year | Cut point derivation
< 671,619 | 2016 | (294k + 1,048k) / 2
671,619 – 1,467,170 | 2017 | (1,048k + 1,885k) / 2
1,467,170 – 2,207,136 | 2018 | (1,885k + 2,528k) / 2
2,207,136 – 2,719,720 | 2019 | (2,528k + 2,910k) / 2
2,719,720 – 3,088,306 | 2020 | (2,910k + 3,265k) / 2
3,088,306 – 3,472,769 | 2021 | (3,265k + 3,679k) / 2
3,472,769 – 3,888,534 | 2022 | (3,679k + 4,097k) / 2
3,888,534 – 4,307,005 | 2023 | (4,097k + 4,516k) / 2
4,307,005 – 4,773,859 | 2024 | (4,516k + 5,030k) / 2
≥ 4,773,859 | 2025 | (open upper bound)

The naive approach used min/max ranges per year — but these overlap badly. Using the mean and splitting at midpoints gives clean, unambiguous boundaries.
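The midpoint rule itself is a one-liner. Here it is with the rounded “k” means from the table above (the production cuts use exact means, so the numbers differ slightly):

```python
# Annual mean sequence numbers, rounded to the nearest 1,000 per the table
ANNUAL_MEANS = {
    2016: 294_000,   2017: 1_048_000, 2018: 1_885_000, 2019: 2_528_000,
    2020: 2_910_000, 2021: 3_265_000, 2022: 3_679_000, 2023: 4_097_000,
    2024: 4_516_000, 2025: 5_030_000,
}

def cut_points(means):
    """Midpoints between adjacent annual means give non-overlapping year bands."""
    years = sorted(means)
    return [((means[a] + means[b]) // 2, a) for a, b in zip(years, years[1:])]

for cut, year in cut_points(ANNUAL_MEANS):
    print(f"seq < {cut:,} → {year}")
# First band: seq < 671,000 → 2016 (the table's exact figure is 671,619)
```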


The Full Estimator in Python

Python

import re
from dataclasses import dataclass
from typing import Optional

# ── Mercosur year boundaries (midpoints between annual means) ──────────────
# Derived from 200k+ Argentine plates with known registration years.
# Each January, recalculate the mean for the new year and add one entry:
#   new_cut = (mean_prev_year + mean_new_year) / 2
# ──────────────────────────────────────────────────────────────────────────
MERCOSUR_CUTS = [
    (671_619,   2016),
    (1_467_170, 2017),
    (2_207_136, 2018),
    (2_719_720, 2019),
    (3_088_306, 2020),
    (3_472_769, 2021),
    (3_888_534, 2022),
    (4_307_005, 2023),
    (4_773_859, 2024),
    (float('inf'), 2025),
    # Add 2026 here in January 2027:
    # (cut_2025_2026, 2025), (float('inf'), 2026),
]

# Pre-Mercosur: two-letter prefix → dominant year
# Derived from same training dataset. Stable — no new plates since ~2016.
PRE_MERCOSUR_PREFIX = {
    # A block 1994-1996
    'AA': lambda n: 1994 if n < 500 else 1995,
    'AB': 1995, 'AC': 1995, 'AD': 1995, 'AE': 1995, 'AF': 1995,
    'AG': 1995, 'AH': 1995, 'AI': 1995, 'AJ': 1995, 'AK': 1995,
    'AL': 1995, 'AM': 1995, 'AN': 1995,
    'AO': lambda n: 1995 if n < 400 else 1996,
    'AP': 1996, 'AQ': 1996, 'AR': 1996, 'AS': 1996, 'AT': 1996,
    'AU': 1996, 'AV': 1996, 'AW': 1996, 'AX': 1996, 'AY': 1996, 'AZ': 1996,
    # B block 1996-1998
    'BA': 1996, 'BB': 1996,
    'BC': lambda n: 1996 if n < 300 else 1997,
    'BD': 1997, 'BE': 1997, 'BF': 1997, 'BG': 1997, 'BH': 1997,
    'BI': 1997, 'BJ': 1997, 'BK': 1997, 'BL': 1997, 'BM': 1997,
    'BN': 1997, 'BO': 1997,
    'BP': lambda n: 1997 if n < 500 else 1998,
    'BQ': 1998, 'BR': 1998, 'BS': 1998, 'BT': 1998, 'BU': 1998,
    'BV': 1998, 'BW': 1998, 'BX': 1998, 'BY': 1998, 'BZ': 1998,
    # C-P blocks follow same pattern; see full table at ar.matriculaapi.com
    # ... (abbreviated for readability)
}

PRE_MERCOSUR_RE = re.compile(r'^[A-Z]{3}\d{3}$')
MERCOSUR_RE     = re.compile(r'^[A-Z]{2}\d{3}[A-Z]{2}$')


@dataclass
class PlateEstimate:
    input_plate:    str
    format:         str         # 'MERCOSUR' | 'PRE-MERCOSUR' | 'MERCOSUR-IMPORT' | 'UNKNOWN'
    estimated_year: Optional[int]
    confidence:     str         # 'HIGH' | 'MEDIUM' | 'LOW'
    sequence_num:   Optional[int]
    notes:          Optional[str] = None


def estimate_year(plate: str) -> PlateEstimate:
    p = plate.upper().replace(" ", "").replace("-", "")

    # ── Mercosur AB123CD ─────────────────────────────────────────────────────
    if MERCOSUR_RE.match(p):
        seq = encode_mercosur(p)

        if seq < 7:
            return PlateEstimate(plate, 'MERCOSUR-IMPORT', None, 'LOW', seq,
                notes='Sequence predates Argentine rollout; likely import')

        year = next(y for cut, y in MERCOSUR_CUTS if seq < cut)

        # Confidence: MEDIUM if within 5% of a band's width of its boundary
        confidence = 'HIGH'
        cuts = [c for c, _ in MERCOSUR_CUTS[:-1]]       # finite cuts only
        gaps = [b - a for a, b in zip(cuts, cuts[1:])]  # band widths
        for cut, gap in zip(cuts, gaps + [gaps[-1]]):   # reuse last width for open band
            if abs(seq - cut) < gap * 0.05:
                confidence = 'MEDIUM'
                break

        return PlateEstimate(plate, 'MERCOSUR', year, confidence, seq)

    # ── Pre-Mercosur ABC123 ───────────────────────────────────────────────────
    if PRE_MERCOSUR_RE.match(p):
        prefix = p[:2]
        num    = int(p[3:])
        entry  = PRE_MERCOSUR_PREFIX.get(prefix)

        if entry is None:
            return PlateEstimate(plate, 'PRE-MERCOSUR', None, 'LOW', None,
                notes=f'Prefix {prefix!r} not in training data')

        year = entry(num) if callable(entry) else entry

        confidence = ('LOW'    if prefix[0] in 'RSTUV'
                 else 'MEDIUM' if prefix <= 'DZ'
                 else 'HIGH')

        return PlateEstimate(plate, 'PRE-MERCOSUR', year, confidence,
                              encode_pre_mercosur(p))

    return PlateEstimate(plate, 'UNKNOWN', None, 'LOW', None,
        notes='Does not match any known Argentine plate format')


# ── Quick demo ────────────────────────────────────────────────────────────
for test in ['AC601QQ', 'AC875MD', 'OZY040', 'GDA123', 'AB 172UC']:
    r = estimate_year(test)
    print(f"{r.input_plate:10} → {r.estimated_year}  [{r.confidence}]  {r.format}")

Why Pre-Mercosur Is Trickier

You might expect pre-Mercosur plates to work the same way as Mercosur — encode to an integer, find the range. But the raw data tells a different story: the sequence number ranges for adjacent years overlap almost completely.

The reason is that Argentina’s provinces received independent plate allocations. Buenos Aires, Córdoba, and Mendoza were all issuing plates simultaneously from their own provincial ranges. A national sequence number alone therefore can’t pinpoint a year — the number space was being consumed by 23 provinces in parallel.

What does cluster cleanly by year is the two-letter prefix. The national allocation advanced through the alphabet over time, so GA–GT plates are overwhelmingly from 2007, HT–HZ from 2009, and so on. The training data confirms this: for the densest prefixes, over 90% of plates share a single dominant year.

At prefix boundaries — AO, BC, GT, HS, and others — the numeric suffix provides a secondary signal. A plate in the GT prefix with a low number (GT100) likely precedes one with a high number (GT850) by several months, straddling a year boundary.


Confidence and Outliers

The estimator returns one of three confidence levels:

HIGH  The plate sits comfortably within a year band — more than 5% away from any boundary. For Mercosur plates from 2017 onwards (where training density is highest) this is the common case.

MEDIUM  The plate is within 5% of a year cut point, or belongs to a pre-Mercosur prefix with moderate training data. The estimate is most likely correct but the adjacent year is plausible — late registrations, delivery delays, and data entry lag all introduce real ambiguity near boundaries.

LOW  The prefix is sparse (old R/S/T/U/V plates, typically pre-1995) or the plate is flagged as a likely import. Imports arise when a Mercosur-format plate has a sequence number that predates Argentina’s 2016 rollout — these vehicles were most likely registered in Brazil, Uruguay, or Paraguay and subsequently imported.


Keeping It Fresh

The Mercosur boundaries need updating once a year. The process is three steps:

1. Collect a fresh batch of plates with known years.
2. Compute the mean sequence number for the new year’s cohort.
3. Set the new cut point as (mean_prev + mean_new) / 2 and append it to the table.

No existing cut points change — you’re only ever adding one new entry to the bottom of the list. The pre-Mercosur prefix table is stable and needs no maintenance, since no new plates in that format have been issued since approximately 2016.


Beyond Year Estimation

Year estimation is useful on its own — for insurance quoting, fleet valuation, or fraud detection — but it becomes much more powerful when combined with a full vehicle lookup.

The Argentine Matricula API takes any plate (pre-Mercosur or Mercosur) and returns the complete vehicle record: make, model, year, engine, and more. For records where the registration year is missing or uncertain, the sequence-based estimate described here fills the gap automatically, with a confidence flag so downstream consumers know how much to trust it.

Argentine vehicle lookup API

Try a plate lookup at ar.matriculaapi.com. The API is available for integration via vehicleapi.com and supports batch queries, JSON responses, and per-format confidence scoring.

© 2026 Infinite Loop Development Ltd  


The Hidden Cost of ORDER BY NEWID()

Fetching a random row from a table is a surprisingly common requirement — random banner ads, sample data, rotating API credentials. The instinctive solution in SQL Server is elegant-looking but conceals a serious performance trap.

-- Looks innocent. Isn't.
SELECT TOP 1 * FROM LARGE_TABLE ORDER BY NEWID()

What SQL Server actually does

NEWID() generates a fresh GUID for every single row in the table. SQL Server must then sort the entire result set by those GUIDs before it can hand back the top one. On a table with a million rows you are generating a million GUIDs, sorting a million rows, and discarding 999,999 of them.

The problem: On large tables, ORDER BY NEWID() performs a full table scan and a full sort — O(n log n) work — regardless of how many rows you need. It cannot use any index for ordering.

A faster alternative: seek, don’t sort

The key insight is to convert the “random sort” into a “random seek”. If we can generate a random Id value cheaply and then let the clustered index do the work, we avoid scanning the table entirely.

DECLARE @Min INT = (SELECT MIN(Id) FROM LARGE_TABLE)
DECLARE @Max INT = (SELECT MAX(Id) FROM LARGE_TABLE)
SELECT TOP 1 *
FROM LARGE_TABLE
WHERE Id >= @Min + ABS(CHECKSUM(NEWID()) % (@Max - @Min + 1))
ORDER BY Id ASC

MAX(Id) and MIN(Id) are single index seeks on the primary key. CHECKSUM(NEWID()) generates a random integer without sorting anything. The WHERE Id >= clause then performs a single index seek from that point forward, and ORDER BY Id ASC TOP 1 picks up the very next row.

The result: Two index seeks to get the range, one index seek to find the row. Constant time regardless of table size.

Performance at a glance

Approach | Reads | Sort | Scales with table size?
ORDER BY NEWID() | Full scan | Full sort | O(n log n)
CHECKSUM seek | 3 index seeks | None | O(1)

Three caveats to know

1. Id gaps cause mild bias. If rows have been deleted, gaps in the Id sequence mean rows immediately after a gap are slightly more likely to be selected. For most use cases — sampling, rotation, A/B testing — this is an acceptable trade-off.

2. Ids may not start at 1. This is why we add @Min rather than hardcoding zero. If your identity seed started at 1000, a plain CHECKSUM(NEWID()) % @Max could produce values 0–999 — below every real Id — and the >= comparison would map all of them to the first row, skewing the distribution heavily toward it.

3. CHECKSUM can return INT_MIN. ABS(INT_MIN) overflows back to negative in SQL Server. The fix is to apply the modulo before the ABS, keeping the intermediate value safely within range.
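The gap bias in the first caveat is easy to quantify: with the seek approach, a row’s selection probability is proportional to the size of the Id gap immediately below it. A toy example:

```python
# Ids 4-9 were deleted; row 10 absorbs the whole gap.
ids = [1, 2, 3, 10]
lo, hi = min(ids), max(ids)
span = hi - lo + 1                  # size of the random draw range

probs, prev = {}, lo - 1
for i in ids:
    probs[i] = (i - prev) / span    # chance the random value lands in (prev, i]
    prev = i

print(probs)   # {1: 0.1, 2: 0.1, 3: 0.1, 10: 0.7}; row 10 is 7x more likely
```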

When you don’t need randomness at all

For round-robin rotation across a fixed set of rows — such as alternating between API credentials or cookie sessions — true randomness is unnecessary overhead. A deterministic slot based on the current second is even cheaper:

-- Rotates across N accounts, one per second, no writes required
WHERE Slot = DATEPART(SECOND, GETUTCDATE()) % TotalAccounts

This resolves to a constant integer comparison — effectively a single index seek — and scales to any number of accounts automatically. No tracking table, no writes, no contention.
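The same slot computation works outside SQL too, for callers that pick the account before querying. A sketch; note that the rotation is perfectly even only when the account count divides 60:

```python
from datetime import datetime, timezone

def current_slot(total_accounts: int) -> int:
    """Deterministic slot from the current UTC second: no state, no writes."""
    return datetime.now(timezone.utc).second % total_accounts

# With 7 accounts, slots 0-3 get one extra second per minute (60 % 7 == 4);
# pick a divisor of 60 if strict evenness matters.
print(current_slot(5))   # a value in 0-4, changing each second
```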

The takeaway: whenever you reach for ORDER BY NEWID(), ask whether you actually need true randomness or just approximate distribution. In most production scenarios, a cheap seek beats an expensive sort by several orders of magnitude.

Enrich Your Qualtrics Surveys with Real-Time Respondent Data Using AvatarAPI

Qualtrics is excellent at capturing what respondents tell you. But what if you could automatically fill in what you already know — or can discover — the moment they enter their email address?

AvatarAPI resolves an email address into rich profile data in real time: a profile photo, full name, city, country, and the social network behind it. By embedding this lookup directly into your Qualtrics survey flow, you collect more information about each respondent without asking a single extra question.


What Data Does AvatarAPI Return?

When you pass an email address to the API, it returns the following fields — all of which can be mapped into Qualtrics Embedded Data and used anywhere in your survey:

Field | Description
Image | URL to the respondent’s profile photo
Name | Resolved full name
City | City of residence
Country | Country code
Valid | Whether the email address is real and reachable
IsDefault | Whether the avatar is a fallback/generic image
Source.Name | The social network the data came from
RawData | The complete JSON payload

Watch the Video Walkthrough

Before diving into the written steps, watch this complete tutorial — from configuring the Web Service element to rendering the avatar photo on a results page:


Step-by-Step Integration Guide

You can either follow these steps from scratch, or import the ready-made AvatarAPI.qsf template file directly into Qualtrics (see Step 8).


Step 1 — Get Your AvatarAPI Credentials

Sign up at avatarapi.com to obtain a username and password. A free demo account is available for evaluation — use the credentials demo / demo to test before going live.

The API endpoint you will call is:

https://avatarapi.com/v2/api.aspx

Step 2 — Create an Email Capture Question

In your Qualtrics survey, add a Text Entry question with a Single Line selector. This is where respondents will enter their email address.

Note the Question ID assigned to this question (e.g. QID3) — you will reference it when configuring the Web Service. You can find the QID by opening the question’s advanced options.

Tip: Add email format validation via Add Validation → Content Validation → Email to ensure the value passed to the API is always well-formed.


Step 3 — Add a Web Service Element to Your Survey Flow

Navigate to Survey Flow (the flow icon in the left sidebar). Click Add a New Element Here and choose Web Service. Position this element after the block containing your email question and before your results block.

Configure it as follows:

  • URL: https://avatarapi.com/v2/api.aspx
  • Method: POST
  • Content-Type: application/json

Step 4 — Set the Request Body Parameters

Under Set Request Parameters, switch to Specify Body Params and add these three key-value pairs:

{
  "username": "your_username",
  "password": "your_password",
  "email": "${q://QID3/ChoiceTextEntryValue}"
}

The Qualtrics piped text expression ${q://QID3/ChoiceTextEntryValue} dynamically inserts whatever email the respondent typed. Replace QID3 with the actual QID of your email question if it differs.
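Outside Qualtrics, you can preview exactly what the Web Service element will send by building the same body yourself. A sketch using the endpoint and the three body params from this guide (`build_payload` is an illustrative helper; a live call additionally requires an HTTP POST with Content-Type application/json and valid credentials):

```python
import json

# Endpoint and body parameters as configured in Steps 3 and 4.
AVATAR_API_URL = "https://avatarapi.com/v2/api.aspx"

def build_payload(username: str, password: str, email: str) -> dict:
    """Mirror the three body params set in the Web Service element;
    in Qualtrics the email value comes from ${q://QID3/ChoiceTextEntryValue}."""
    return {"username": username, "password": password, "email": email}

body = json.dumps(build_payload("demo", "demo", "someone@example.com"))
```

Posting `body` to the endpoint reproduces what the survey flow does silently between pages.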


Step 5 — Map the API Response to Embedded Data Fields

Scroll down to Map Fields from Response. Add one row for each field you want to capture. The From Response column is the JSON key returned by AvatarAPI; the To Field column is the Embedded Data variable name.

From Response (JSON key) → To Field (Embedded Data):

  • Image → Image
  • Name → Name
  • Valid → Valid
  • City → City
  • Country → Country
  • IsDefault → IsDefault
  • Source.Name → Source.Name
  • RawData → RawData

Note: Qualtrics stores these variables automatically — you don’t need to pre-declare them as Embedded Data elsewhere in the flow, though doing so in the survey flow header keeps things organised.


Step 6 — Display the Avatar Photo on a Results Page

Add a Descriptive Text / Graphic question in a block placed after the Web Service call in your flow.

In the rich-text editor, switch to the HTML source view and paste this snippet:

<img
src="${e://Field/Image}"
alt="Profile Picture"
style="width:100px; height:100px; border-radius:50%;"
/>

The expression ${e://Field/Image} inserts the profile photo URL at runtime. The border-radius: 50% gives it a circular crop for a polished appearance.

You can display other fields using the same pattern:

Name: ${e://Field/Name}
City: ${e://Field/City}
Country: ${e://Field/Country}
Source: ${e://Field/Source.Name}

Step 7 — Test with the Demo Account

Before going live, test the integration using the demo credentials. Enter a well-known email address (such as a Gmail address you know has a Google profile photo) to verify the image and data return correctly.

After a test submission, check the Survey Data tab — all mapped fields (Image, Name, City, Country, etc.) should appear as columns alongside your standard question responses.

Rate limits & production use: The demo credentials are shared and rate-limited. Swap in your own account credentials before publishing a live survey to ensure reliable performance.


Step 8 — Import the Ready-Made QSF Template

Rather than building from scratch, you can import the AvatarAPI.qsf file directly into Qualtrics. This gives you a pre-configured survey with the email question, Web Service flow, and image display block already set up.

To import: go to Create a new project → Survey → Import a QSF file and upload AvatarAPI.qsf. Then update the Web Service credentials to your own username and password, and you’re ready to publish.


How the Survey Flow Works

Once configured, your survey flow has this simple three-part structure:

  1. Block — Respondent enters their email address
  2. Web Service — Silent POST to avatarapi.com/v2/api.aspx; response fields mapped to Embedded Data
  3. Block — Results page displays the avatar photo and enriched profile data

The respondent experiences a seamless survey: they enter their email on page one, the API call fires silently between pages, and they see a personalised result — including their own profile photo — on page two.


Practical Use Cases

Lead enrichment surveys — Capture a prospect’s email and automatically resolve their name, city, and country without asking. Append this data to your CRM export from Qualtrics.

Event registration flows — Display the registrant’s photo back to them as a confirmation step, increasing engagement and reducing drop-off.

Email validation checkpoints — Use the Valid flag in a branch logic condition to route respondents with unresolvable addresses to a correction screen or alternative path.

Research panels — Enrich responses with geographic signals without asking respondents to self-report location, reducing survey length and improving data quality.


Get Started

  • API documentation & sign-up: avatarapi.com
  • API endpoint: https://avatarapi.com/v2/api.aspx
  • Demo credentials: username demo / password demo
  • Video tutorial: Watch on YouTube

Building a Wildcard Catch-All POP3 Mail Server on Ubuntu

Receive mail for any address on any subdomain — no per-account configuration required

Introduction

This guide walks through setting up a wildcard catch-all mail server on Ubuntu using Postfix and Dovecot. The goal is to receive email sent to any address on any subdomain of your domain — for example, anything@abc.yourdomain.com or test@xyz.yourdomain.com — without having to configure individual mailboxes in advance.

This is particularly useful for testing, disposable address systems, API integrations, and mail sink setups where you want to capture inbound mail programmatically. The server will not send mail — only receive it. Mail older than 24 hours is automatically purged.

Architecture Overview

The stack consists of three components working together:

  • Postfix — receives inbound SMTP and delivers to a local virtual mailbox
  • Dovecot — serves POP3 access to the mailbox
  • A single catch-all mailbox — all mail for all subdomains and addresses funnels into one Maildir

Rather than creating individual accounts, everything is routed to a single mailbox. A POP3 client connects with one username and password to retrieve all mail regardless of which address or subdomain it was sent to.

Part 1 — DNS Configuration

How Wildcard MX Records Work

MX records must point to a hostname, not an IP address directly. This means two DNS records are needed: an MX record pointing to a mail hostname, and an A record resolving that hostname to your server’s IP address.

Create the following records in your DNS provider (AWS Route 53 or equivalent):

  • *.yourdomain.com: MX, 10 mail.yourdomain.com
  • mail.yourdomain.com: A, your.server.ip.address

The wildcard MX record *.yourdomain.com matches any single-level subdomain lookup. When a sending mail server looks up the MX record for abc.yourdomain.com, it matches the wildcard and is directed to mail.yourdomain.com, which in turn resolves to your server’s IP via the A record.

Note that the wildcard covers one subdomain level deep. Mail to anything@abc.yourdomain.com is covered. A deeper level such as anything@a.b.yourdomain.com would require a separate record.

Verifying DNS Records

From a Windows machine, use nslookup to verify records have propagated:

# Check the MX record
nslookup -type=MX abc.yourdomain.com

# Check the A record for the mail host
nslookup mail.yourdomain.com

# Query AWS nameservers directly (before public propagation)
nslookup -type=NS yourdomain.com
nslookup -type=MX abc.yourdomain.com ns-123.awsdns-45.com

You can also use dnschecker.org to check propagation across multiple global resolvers simultaneously.

Part 2 — Server Setup

Install Postfix and Dovecot

sudo apt update
sudo apt install postfix dovecot-pop3d -y

During the Postfix installation prompt, select Internet Site and enter your domain name (e.g. yourdomain.com) when asked for the mail name.

Configure Postfix

Edit the main Postfix configuration file:

sudo nano /etc/postfix/main.cf

Add or update the following values:

myhostname = mail.yourdomain.com
mydomain = yourdomain.com

# Leave mydestination empty — we use virtual mailboxes instead
mydestination =

# Accept mail for any subdomain matching the wildcard
virtual_mailbox_domains = regexp:/etc/postfix/virtual_domains
virtual_mailbox_base = /var/mail/vhosts
virtual_mailbox_maps = regexp:/etc/postfix/virtual_mailbox
virtual_minimum_uid = 100
virtual_uid_maps = static:5000
virtual_gid_maps = static:5000

# Required to prevent open relay
smtpd_relay_restrictions = permit_mynetworks, reject_unauth_destination

Create the virtual domains file — this regexp matches any subdomain of your domain:

sudo nano /etc/postfix/virtual_domains
/^.+\.yourdomain\.com$/    OK

Create the virtual mailbox map — this catches all addresses and routes them to a single catchall mailbox:

sudo nano /etc/postfix/virtual_mailbox
/^.+@.+\.yourdomain\.com$/    catchall/
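Postfix regexp tables use POSIX regular expressions, which Python’s `re` approximates closely enough for a quick local sanity check before restarting the service. A sketch assuming the domain pattern `^.+\.yourdomain\.com$` (any non-empty subdomain) and the catch-all mailbox pattern above:

```python
import re

# Mirror /etc/postfix/virtual_domains and /etc/postfix/virtual_mailbox
domain_re = re.compile(r"^.+\.yourdomain\.com$")
mailbox_re = re.compile(r"^.+@.+\.yourdomain\.com$")

assert domain_re.match("abc.yourdomain.com")
assert mailbox_re.match("anything@abc.yourdomain.com")
# The bare domain is intentionally not covered by these patterns:
assert not mailbox_re.match("anything@yourdomain.com")
assert not mailbox_re.match("anything@abc.otherdomain.com")
```

If an address you expect to receive fails this check, fix the pattern file before blaming DNS.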

Rebuild the aliases database (required to avoid a startup warning):

newaliases

Create the Virtual Mail User and Mailbox

Postfix delivers mail as a dedicated system user (vmail). Create the user, group, and mailbox directory:

sudo groupadd -g 5000 vmail
sudo useradd -u 5000 -g 5000 -d /var/mail/vhosts -s /sbin/nologin vmail
sudo mkdir -p /var/mail/vhosts/catchall
sudo chown -R vmail:vmail /var/mail/vhosts

Configure Dovecot for POP3

Enable the POP3 protocol in the main Dovecot config:

sudo nano /etc/dovecot/dovecot.conf
protocols = pop3

Set the mail location to the catchall Maildir:

sudo nano /etc/dovecot/conf.d/10-mail.conf
mail_location = maildir:/var/mail/vhosts/catchall

Allow plaintext authentication (suitable for internal/trusted use — see the TLS note at the end for public-facing deployments):

sudo nano /etc/dovecot/conf.d/10-auth.conf
disable_plaintext_auth = no
auth_mechanisms = plain login

passdb {
  driver = passwd-file
  args = /etc/dovecot/users
}

userdb {
  driver = static
  args = uid=5000 gid=5000 home=/var/mail/vhosts/catchall
}

Create the Dovecot users file with your chosen credentials:

sudo nano /etc/dovecot/users
# Format: username:{PLAIN}password
mailuser:{PLAIN}yourpasswordhere

Start the Services

sudo systemctl restart postfix
sudo systemctl restart dovecot

Verify Postfix is running:

postfix status

Check the mail log for any errors:

tail -30 /var/log/mail.log

Part 3 — Firewall Configuration

Cloud Firewall (Linode / AWS / equivalent)

Open the following inbound ports in your cloud provider’s firewall. On Linode this is found under Networking > Firewalls in the dashboard. Changes apply immediately with no reboot required.

  • 22 TCP: SSH (ensure this is always open)
  • 25 TCP: SMTP inbound (receiving mail)
  • 110 TCP: POP3 (retrieving mail)

UFW on the Ubuntu Instance

sudo ufw allow 22/tcp
sudo ufw allow 25/tcp
sudo ufw allow 110/tcp
sudo ufw enable
sudo ufw status

Always confirm port 22 is allowed before enabling UFW to avoid locking yourself out of SSH.

Part 4 — Testing

Test SMTP Locally

From the server itself, connect to Postfix on port 25 and send a test message. Use 127.0.0.1 rather than localhost to avoid IPv6 connection issues:

telnet 127.0.0.1 25

You should immediately see the greeting banner:

220 mail.yourdomain.com ESMTP Postfix

Then send a test message interactively:

EHLO test.com
MAIL FROM:<test@test.com>
RCPT TO:<anything@abc.yourdomain.com>
DATA
Subject: Test mail

Hello this is a test.
.
QUIT

Each envelope command should return a 250 response (DATA itself returns 354, inviting the message body, which you end with a line containing only a single dot). The RCPT TO line is the critical one — if the wildcard regexp is configured correctly, Postfix will accept any subdomain address. After QUIT, verify the mail landed in the mailbox:

tail -20 /var/log/mail.log
ls -la /var/mail/vhosts/catchall/new/

You should see a file in the new/ directory — that is the email in Maildir format.

Test POP3 Locally

telnet 127.0.0.1 110

Dovecot should respond with:

+OK Dovecot (Ubuntu) ready.

Then authenticate and list messages:

USER mailuser
PASS yourpasswordhere
LIST
RETR 1
QUIT

A successful LIST response showing message count confirms the full chain is working: inbound SMTP via Postfix, delivery to virtual Maildir, and POP3 retrieval via Dovecot.

Part 5 — Automatic Mail Purge

To automatically delete mail older than 24 hours, add a cron job:

sudo crontab -e

Add the following line:

0 * * * * find /var/mail/vhosts/catchall -type f -mmin +1440 -delete

This runs every hour and removes any file in the catchall mailbox older than 1440 minutes (24 hours).
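The cron job’s behaviour can also be mirrored (and unit-tested) in a short script; the 1440-minute cutoff follows the line above, and `purge_old_mail` is an illustrative helper name:

```python
import time
from pathlib import Path

def purge_old_mail(maildir: str, max_age_minutes: int = 1440) -> int:
    """Delete files older than max_age_minutes, equivalent to:
    find <maildir> -type f -mmin +1440 -delete"""
    cutoff = time.time() - max_age_minutes * 60
    removed = 0
    for path in Path(maildir).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed
```

Running this hourly against /var/mail/vhosts/catchall from a systemd timer is a drop-in alternative if you prefer not to use cron.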

Optional — Silence the Backwards Compatibility Warning

Postfix logs a harmless warning about backwards-compatible default settings. To silence it:

postconf compatibility_level=3.6
postfix reload

Security Notes

  • Port 110 transmits credentials in plaintext. For any public-facing deployment, configure Dovecot with TLS and use POP3S on port 995 instead.
  • The smtpd_relay_restrictions = permit_mynetworks, reject_unauth_destination setting prevents your server from acting as an open relay — do not remove this.
  • Consider rate limiting inbound SMTP connections if the server is publicly accessible to reduce spam load.
  • The vmail system user has no login shell (nologin) and cannot be used to access the system interactively.

Summary

With Postfix and Dovecot configured as described above, your server will:

  • Accept inbound SMTP for any address on any subdomain of your domain
  • Deliver all mail into a single catch-all Maildir with no per-account configuration
  • Expose all received mail via POP3 using a single username and password
  • Automatically purge mail older than 24 hours
  • Require no restart or reconfiguration when new subdomains or addresses are used

How to Detect and Fix Squid Proxy Abuse

Running an open HTTP proxy on the internet, even temporarily for testing, can quickly attract unwanted attention. Within minutes of deploying an unsecured Squid proxy, malicious actors can discover and abuse it for scanning, attacks, or hiding their origin. Here’s how to spot the warning signs and lock down your proxy.

Symptoms of Proxy Abuse

1. Proxy Stops Working

The most obvious symptom is that your proxy simply stops responding to legitimate requests. Connections time out or hang indefinitely, even though the Squid service appears to be running.

2. Cache File Descriptor Warnings

When checking the Squid service status, you see repeated warnings like:

WARNING: Your cache is running out of file descriptors

This occurs because the proxy is handling far more concurrent connections than expected for a small test server.

3. Service Shows Active But Unresponsive

The systemd status shows Squid as “active (running)” with normal startup messages, but actual proxy requests fail:

$ sudo systemctl status squid
● squid.service - Squid Web Proxy Server
Active: active (running)

Yet when you try to use it:

$ curl -x http://your-proxy:8888 https://example.com
curl: (28) Failed to connect to example.com port 443 after 21060ms

4. High Memory or CPU Usage

A small EC2 instance (t2.micro or t3.micro) that should be mostly idle shows elevated resource consumption.

How to Verify Proxy Abuse

Check the Access Logs

The quickest way to confirm abuse is to examine the Squid access log:

sudo tail -100 /var/log/squid/access.log

What to look for:

A healthy proxy used only by you should show:

  • One or two source IP addresses (yours)
  • Requests to legitimate domains
  • Occasional HTTPS CONNECT requests

An abused proxy will show:

  • Dozens of different source IP addresses
  • CONNECT requests to random IP addresses (not domain names)
  • Strange ports: SSH (22), Telnet (23), email ports (25, 587, 993, 110), random high ports
  • High frequency of requests

Example of Abuse

Here’s what an abused proxy log looks like:

1770200370.089 59842 172.234.115.25 TCP_TUNNEL/503 0 CONNECT 188.64.128.123:22
1770200370.089 59842 51.83.10.33 TCP_TUNNEL/503 0 CONNECT 188.64.132.53:443
1770200370.089 59841 172.234.115.25 TCP_TUNNEL/503 0 CONNECT 188.64.128.4:22
1770200370.089 59332 185.90.61.84 TCP_TUNNEL/503 0 CONNECT 188.64.129.251:8000
1770200370.089 214 91.202.74.22 TCP_TUNNEL/503 0 CONNECT 188.64.129.143:23
1770200370.191 579 51.83.10.33 TCP_TUNNEL/200 39 CONNECT 188.64.128.4:8021
1770200370.235 11227 51.83.10.33 TCP_TUNNEL/200 176 CONNECT 188.64.131.66:587

Notice:

  • Multiple unique source IPs
  • Connections to SSH (port 22), Telnet (port 23), SMTP (port 587)
  • Targets are raw IP addresses, not domain names
  • Hundreds of requests per minute

This is classic behavior of attackers using your proxy to scan the internet for vulnerable services.

Count Unique IPs

To see how many different IPs are using your proxy:

sudo awk '{print $3}' /var/log/squid/access.log | sort | uniq -c | sort -rn | head -20

If you see more than a handful of IPs (especially if you’re the only legitimate user), your proxy is being abused.
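The same check can be scripted for alerting. A sketch that counts client IPs and flags the abuse signatures described above; field positions follow Squid’s default native log format as shown in the sample log (client IP in the third column, CONNECT target in the seventh), and `analyse` is an illustrative helper:

```python
from collections import Counter

# Ports attackers commonly tunnel to through open proxies
SUSPICIOUS_PORTS = {"22", "23", "25", "110", "587", "993"}

def analyse(log_lines, max_expected_clients=2):
    """Return (unique client IPs, suspicious-port hits, abuse flag)."""
    clients = Counter()
    suspicious = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue
        clients[fields[2]] += 1           # client IP column
        target = fields[6]                # e.g. "188.64.128.123:22"
        port = target.rsplit(":", 1)[-1]
        if port in SUSPICIOUS_PORTS:
            suspicious += 1
    abused = len(clients) > max_expected_clients or suspicious > 0
    return len(clients), suspicious, abused
```

Feed it `open("/var/log/squid/access.log")` from a cron job and page yourself when the abuse flag is set.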

Check Current Connections

See active connections to your proxy port:

sudo netstat -tn | grep :8888

A legitimate test proxy should have 0-2 active connections. Dozens of connections indicate abuse.

How to Fix It Immediately

1. Lock Down the AWS Security Group

The fastest fix is to restrict access at the network level:

Via AWS Console:

  1. Navigate to EC2 → Security Groups
  2. Select the security group attached to your proxy instance
  3. Click “Edit inbound rules”
  4. Find the rule for your proxy port (e.g., 8888)
  5. Change Source from 0.0.0.0/0 to “My IP”
    • AWS will auto-detect and fill in your current public IP
  6. Click “Save rules”

The change takes effect immediately – no restart required.

2. Restart Squid to Kill Existing Connections

Even after locking down the security group, existing connections may persist:

sudo systemctl restart squid

3. Clear the Logs

Start fresh to verify the abuse has stopped:

# Stop Squid
sudo systemctl stop squid
# Clear logs
sudo truncate -s 0 /var/log/squid/access.log
sudo truncate -s 0 /var/log/squid/cache.log
# Clear cache if you're seeing file descriptor warnings
sudo rm -rf /var/spool/squid/*
sudo squid -z
# Restart
sudo systemctl start squid

4. Verify It’s Fixed

Watch the log in real-time:

sudo tail -f /var/log/squid/access.log

If tail -f just sits there with no output, that’s good – it means no requests are coming through.

Now test from your own machine:

curl -x http://your-proxy-ip:8888 https://ifconfig.me

You should immediately see your request appear in the log, and nothing else.

Prevention Best Practices

For Testing Environments

  1. Always use IP whitelisting – Never expose a proxy to 0.0.0.0/0 even for testing
  2. Use non-standard ports – While not security through obscurity, it reduces automated scanning
  3. Set up authentication – Even basic auth is better than nothing
  4. Monitor logs – Check periodically for unexpected traffic
  5. Terminate when done – Don’t leave test infrastructure running

Minimal Squid Config with Authentication

For slightly better security, add basic authentication:

# Install htpasswd
sudo apt install apache2-utils
# Create password file
sudo htpasswd -c /etc/squid/passwords testuser
# Edit squid.conf
sudo nano /etc/squid/squid.conf

Add these lines:

http_port 8888
# Basic authentication
auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/passwords
auth_param basic realm proxy
acl authenticated proxy_auth REQUIRED
http_access allow authenticated
http_access deny all
cache deny all

Restart Squid and now clients must authenticate:

curl -x http://testuser:password@your-proxy:8888 https://example.com

For Production

If you need a production proxy:

  • Use a proper reverse proxy like nginx or HAProxy with TLS
  • Implement OAuth or certificate-based authentication
  • Use AWS PrivateLink or VPC peering instead of public exposure
  • Enable detailed logging and monitoring
  • Set up rate limiting
  • Consider managed solutions like AWS API Gateway or CloudFront

Conclusion

Open proxies are magnets for abuse. Automated scanners continuously sweep the internet looking for misconfigured proxies to exploit. The symptoms are often subtle – file descriptor warnings, poor performance, or timeouts – but the fix is straightforward: restrict access to only trusted IP addresses at the network level.

For testing purposes, AWS Security Groups provide the perfect solution: instant IP-based access control with no performance overhead. Combined with monitoring the Squid access logs, you can quickly detect and eliminate abuse before it impacts your testing or incurs unexpected costs.

Remember: if you’re running a temporary test proxy, lock it down to your IP from the start. It only takes minutes for automated scanners to find and abuse an open proxy.


Key Takeaways:

  • ✅ Always restrict proxy access via security groups/firewall rules
  • ✅ Monitor access logs for unexpected IP addresses
  • ✅ Watch for file descriptor warnings as an early sign of abuse
  • ✅ Clear logs and restart after securing to verify the fix
  • ✅ Terminate test infrastructure when finished to avoid ongoing costs

Benchmarking reCAPTCHA v3 Solver Services: Speed vs Quality Analysis

When implementing automated systems that need to solve reCAPTCHA v3 challenges, choosing the right solver service can significantly impact both your success rate and operational costs. We conducted a comprehensive benchmark test of five popular reCAPTCHA v3 solving services to compare their performance in terms of both speed and quality scores.

The Results

We tested five major captcha solving services: CapSolver, 2Captcha, AntiCaptcha, NextCaptcha, and DeathByCaptcha. Each service was evaluated both with and without residential proxy support (using Decodo residential proxies).

Speed Performance Rankings

Fastest to Slowest (without proxy):

  1. CapSolver – 3,383ms (3.4 seconds)
  2. NextCaptcha – 6,725ms (6.7 seconds)
  3. DeathByCaptcha – 16,212ms (16.2 seconds)
  4. AntiCaptcha – 17,069ms (17.1 seconds)
  5. 2Captcha – 36,149ms (36.1 seconds)

With residential proxy:

  1. CapSolver – 5,101ms (5.1 seconds)
  2. DeathByCaptcha – 10,861ms (10.9 seconds)
  3. NextCaptcha – 10,875ms (10.9 seconds)
  4. 2Captcha – 25,749ms (25.7 seconds)
  5. AntiCaptcha – Failed (task type not supported with proxy)

Quality Score Results

Here’s where the results become particularly interesting: all services that successfully completed the challenge returned identical scores of 0.10. This uniformly low score across all providers suggests we’re observing a fundamental characteristic of how these services interact with Google’s reCAPTCHA v3 system rather than differences in solver quality.

What Do These Results Tell Us?

1. The Score Mystery

A reCAPTCHA v3 score of 0.10 is at the very bottom of Google’s scoring range (0.0-1.0), indicating that Google’s system detected these tokens as very likely originating from bots. This consistent result across all five services reveals several important insights:

Why such low scores?

  • reCAPTCHA v3 uses machine learning trained on actual site traffic patterns
  • Without established traffic history, the system defaults to suspicious scores
  • Commercial solver services are inherently detectable by Google’s sophisticated fingerprinting
  • The test environment may lack the organic traffic patterns needed for v3 to generate higher scores

As mentioned in our research, CleanTalk found that reCAPTCHA v3 often returns consistent scores in test environments without production traffic. The system needs time to “learn” what normal traffic looks like for a given site before it can effectively differentiate between humans and bots.

2. Speed is the Real Differentiator

Since all services returned the same quality score, speed becomes the primary differentiator:

CapSolver emerged as the clear winner, solving challenges in just 3.4 seconds without proxy and 5.1 seconds with proxy. This represents a 10x speed advantage over the slowest service (2Captcha at 36 seconds).

NextCaptcha came in second place with respectable times of 6.7 seconds (no proxy) and 10.9 seconds (with proxy), making it a solid middle-ground option.

DeathByCaptcha and AntiCaptcha performed similarly at around 16-17 seconds without proxy, though AntiCaptcha failed to support proxy-based solving for this captcha type.

2Captcha was significantly slower at 36 seconds without proxy, though it did improve to 25.7 seconds with proxy enabled.

3. Proxy Support Variations

Proxy support proved inconsistent across services:

  • Most services handled proxies well, with CapSolver, NextCaptcha, DeathByCaptcha, and 2Captcha all successfully completing challenges through residential proxies
  • AntiCaptcha failed with proxy, returning an “ERROR_TASK_NOT_SUPPORTED” error, suggesting their proxy-based reCAPTCHA v3 implementation may have limitations
  • Proxy impact on speed varied: Some services (2Captcha) were faster with proxy, while others (CapSolver, NextCaptcha) were slower

4. Success Rates

All services except AntiCaptcha (with proxy) achieved 100% success rates, meaning they reliably returned valid tokens. However, the validity of a token doesn’t correlate with its quality score—all tokens were valid but all received low scores from Google.

Practical Implications

For High-Volume Operations

If you’re processing thousands of captchas daily, CapSolver’s 3-5 second solve time provides a massive throughput advantage. At scale, this speed difference translates to:

  • Processing 1,000 captchas with CapSolver: ~56 minutes
  • Processing 1,000 captchas with 2Captcha: ~10 hours
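Those throughput figures follow directly from the mean solve times, assuming strictly sequential solving with no concurrency:

```python
capsolver_ms = 3383    # mean solve time without proxy
twocaptcha_ms = 36149
n = 1000               # captchas to process

capsolver_minutes = capsolver_ms * n / 1000 / 60    # about 56 minutes
twocaptcha_hours = twocaptcha_ms * n / 1000 / 3600  # about 10 hours
```

Concurrent submission shrinks the wall-clock gap, but per-request latency still bounds how quickly any single workflow step can complete.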

For Quality-Sensitive Applications

The uniform 0.10 scores reveal a hard truth: commercial reCAPTCHA v3 solvers may not produce high-quality tokens that pass strict score thresholds. If your target site requires scores above 0.5 or 0.7, these services may not be suitable regardless of which one you choose.

Cost Considerations

Since all services returned the same quality, cost-per-solve becomes the tiebreaker alongside speed:

  • CapSolver: ~$1.00 per 1,000 solves
  • 2Captcha: ~$2.99 per 1,000 solves
  • AntiCaptcha: ~$2.00 per 1,000 solves

CapSolver offers the best speed-to-cost ratio in this comparison.

The Bigger Picture: reCAPTCHA v3 Limitations

These results illuminate a broader challenge with reCAPTCHA v3 solver services. Google’s v3 system is fundamentally different from v2:

  • v2 presented challenges that could be solved by humans or AI
  • v3 analyzes behavior patterns, browser fingerprints, and site-specific traffic history

Commercial solvers can generate valid tokens, but those tokens carry telltale signatures that Google’s machine learning readily identifies. The consistently low scores suggest that Google has effective detection mechanisms for solver-generated traffic.

When Might Scores Improve?

Based on research and documentation:

  1. Production environments with real organic traffic may see better scores
  2. Time – letting reCAPTCHA v3 “train” on a site for days or weeks
  3. Mixed traffic – solver tokens mixed with legitimate user traffic
  4. Residential proxies – though our test showed this alone doesn’t improve scores

Conclusions and Recommendations

If Speed Matters Most

Choose CapSolver. Its 3-5 second solve times are unmatched, and at $1 per 1,000 solves, it’s also the most cost-effective option.

If You Need Proxy Support

Avoid AntiCaptcha for proxy-based v3 solving. CapSolver, NextCaptcha, and DeathByCaptcha all handled residential proxies successfully.

If Quality Scores Matter

Reconsider using solver services entirely. The uniform 0.10 scores suggest that commercial solvers may not be suitable for sites with strict score requirements. Consider alternative approaches:

  • Browser automation with real user simulation
  • Residential proxy networks with actual human solvers
  • Challenging whether reCAPTCHA v3 is the right solution for your use case

The Bottom Line

For raw performance in a test environment, CapSolver dominated with the fastest solve times and lowest cost. However, the universal 0.10 quality scores across all services reveal that speed and cost may be moot points if your application requires high-quality scores that pass Google’s bot detection.

The real takeaway? reCAPTCHA v3 is doing its job—it successfully identifies solver-generated tokens regardless of which service you use. If you need high scores, you’ll need more sophisticated approaches than simply purchasing tokens from commercial solving services.


This benchmark was conducted in January 2026 using production API credentials for all services. Tests were performed with both direct connections and residential proxy infrastructure. Individual results may vary based on site configuration, traffic patterns, and Google’s evolving detection systems.

Migrating Google Cloud Run to Scaleway: Bringing Your Cloud Infrastructure Back to Europe


Introduction: Why European Cloud Sovereignty Matters Now More Than Ever

In an era of increasing geopolitical tensions, data sovereignty concerns, and evolving international relations, European companies are reconsidering their dependence on US-based cloud providers. The EU’s growing emphasis on digital sovereignty, combined with uncertainties around US data access laws like the CLOUD Act and recent political developments, has made many businesses uncomfortable with storing sensitive data on American infrastructure.

For EU-based companies running containerized workloads on Google Cloud Run, there’s good news: migrating to European alternatives like Scaleway is surprisingly straightforward. This guide will walk you through the technical process of moving your Cloud Run services to Scaleway’s Serverless Containers—keeping your applications running while bringing your infrastructure back under European jurisdiction.

Why Scaleway?

Scaleway, a French cloud provider founded in 1999, offers a compelling alternative to Google Cloud Run:

  • 🇪🇺 100% European: All data centers located in France, Netherlands, and Poland
  • 📜 GDPR Native: Built from the ground up with European data protection in mind
  • 💰 Transparent Pricing: No hidden costs, generous free tiers, and competitive rates
  • 🔒 Data Sovereignty: Your data never leaves EU jurisdiction
  • ⚡ Scale-to-Zero: Just like Cloud Run, pay only for actual usage
  • 🌱 Environmental Leadership: Strong commitment to sustainable cloud infrastructure

Most importantly: Scaleway Serverless Containers are technically equivalent to Google Cloud Run. Both are built on Knative, meaning your containers will run identically on both platforms.

Prerequisites

Before starting, ensure you have:

  • An existing Google Cloud Run service
  • Windows machine with PowerShell
  • gcloud CLI installed and authenticated
  • A Scaleway account (free to create)
  • Skopeo installed (we’ll cover this)

Understanding the Architecture

Both Google Cloud Run and Scaleway Serverless Containers work the same way:

  1. You provide a container image
  2. The platform runs it on-demand via HTTPS endpoints
  3. It scales automatically (including to zero when idle)
  4. You pay only for execution time

The migration process is simply:

  1. Copy your container image from Google’s registry to Scaleway’s registry
  2. Deploy it as a Scaleway Serverless Container
  3. Update your DNS/endpoints

No code changes required—your existing .NET, Node.js, Python, Go, or any other containerized application works as-is.

Step 1: Install Skopeo (Lightweight Docker Alternative)

Since we’re on Windows and don’t want to run full Docker Desktop, we’ll use Skopeo—a lightweight tool designed specifically for copying container images between registries.

Install via winget:

powershell

winget install RedHat.Skopeo

Or download directly from: https://github.com/containers/skopeo/releases

Why Skopeo?

  • No daemon required: No background services consuming resources
  • Direct registry-to-registry transfer: Images never touch your local disk
  • Minimal footprint: ~50MB vs. several GB for Docker Desktop
  • Perfect for CI/CD: Designed for automation and registry operations

Configure Skopeo’s Trust Policy

Skopeo requires a policy file to determine which registries to trust. Create it:

powershell

# Create the config directory
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.config\containers"
# Create a permissive policy that trusts all registries
@"
{
"default": [
{
"type": "insecureAcceptAnything"
}
],
"transports": {
"docker-daemon": {
"": [{"type": "insecureAcceptAnything"}]
}
}
}
"@ | Out-File -FilePath "$env:USERPROFILE\.config\containers\policy.json" -Encoding utf8

For production environments, you might want a more restrictive policy that only trusts specific registries:

powershell

@"
{
"default": [{"type": "reject"}],
"transports": {
"docker": {
"gcr.io": [{"type": "insecureAcceptAnything"}],
"europe-west2-docker.pkg.dev": [{"type": "insecureAcceptAnything"}],
"rg.fr-par.scw.cloud": [{"type": "insecureAcceptAnything"}]
}
}
}
"@ | Out-File -FilePath "$env:USERPROFILE\.config\containers\policy.json" -Encoding utf8

Step 2: Find Your Cloud Run Container Image

Your Cloud Run service uses a specific container image. To find it:

Via gcloud CLI (recommended):

bash

gcloud run services describe YOUR-SERVICE-NAME \
--region=YOUR-REGION \
--project=YOUR-PROJECT \
--format='value(spec.template.spec.containers[0].image)'
This returns the full image URL, something like:
europe-west2-docker.pkg.dev/your-project/cloud-run-source-deploy/your-service@sha256:abc123...

Via Google Cloud Console:

  1. Navigate to Cloud Run in the console
  2. Click your service
  3. Go to the “Revisions” tab
  4. Look for “Container image URL”

The @sha256:... digest is important—it ensures you’re copying the exact image currently running in production.

Step 3: Set Up Scaleway Container Registry

Create a Scaleway Account

  1. Sign up at https://console.scaleway.com/
  2. Complete email verification
  3. Navigate to the console

Create a Container Registry Namespace

  1. Go to Containers → Container Registry
  2. Click Create namespace
  3. Choose a region (Paris, Amsterdam, or Warsaw)
    • Important: Choose the same region where you’ll deploy your containers
  4. Enter a namespace name (e.g., my-containers, production)
    • Must be unique within that region
    • Lowercase, numbers, and hyphens only
  5. Set Privacy to Private
  6. Click Create

Your registry URL will be: rg.fr-par.scw.cloud/your-namespace

Create API Credentials

  1. Click your profile → API Keys (or visit https://console.scaleway.com/iam/api-keys)
  2. Click Generate API Key
  3. Give it a name (e.g., “container-migration”)
  4. Save the Secret Key securely—it’s only shown once
  5. Note both the Access Key and Secret Key

Step 4: Copy Your Container Image

Now comes the magic—copying your container directly from Google to Scaleway without downloading it locally.

Authenticate and Copy:

powershell

# Set your Scaleway secret key as environment variable (more secure)
$env:SCW_SECRET_KEY = "your-scaleway-secret-key-here"
# Copy the image directly between registries
skopeo copy `
--src-creds="oauth2accesstoken:$(gcloud auth print-access-token)" `
--dest-creds="nologin:$env:SCW_SECRET_KEY" `
docker://europe-west2-docker.pkg.dev/your-project/cloud-run-source-deploy/your-service@sha256:abc123... `
docker://rg.fr-par.scw.cloud/your-namespace/your-service:latest
What’s Happening:

  • --src-creds: Authenticates with Google using your gcloud session
  • --dest-creds: Authenticates with Scaleway using your API key
  • Source URL: Your Google Artifact Registry image
  • Destination URL: Your Scaleway Container Registry

The transfer happens directly between registries—your Windows machine just orchestrates it. Even a multi-GB container copies in minutes.

Verify the Copy:

  1. Go to https://console.scaleway.com/registry/namespaces
  2. Click your namespace
  3. You should see your service image listed with the latest tag

Step 5: Deploy to Scaleway Serverless Containers

Create a Serverless Container Namespace:

  1. Navigate to Containers → Serverless Containers
  2. Click Create namespace
  3. Choose the same region as your Container Registry
  4. Give it a name (e.g., production-services)
  5. Click Create

Deploy Your Container:

  1. Click Create container
  2. Image source: Select “Scaleway Container Registry”
  3. Choose your namespace and image
  4. Configuration:
    • Port: Set to the port your app listens on (usually 8080 for Cloud Run apps)
    • Environment variables: Copy any env vars from Cloud Run
    • Resources: Memory: start with what you used in Cloud Run; vCPU: 0.5-1 vCPU is typical
    • Scaling: Min scale 0 (enables scale-to-zero, just like Cloud Run); Max scale set based on expected traffic (e.g., 10)
  5. Click Deploy container

Get Your Endpoint:

After deployment (1-2 minutes), you’ll receive an HTTPS endpoint:

https://your-container-namespace-xxxxx.functions.fnc.fr-par.scw.cloud

This is your public API endpoint—no API Gateway needed, SSL included for free.

Step 6: Test Your Service

powershell

# Test the endpoint
Invoke-WebRequest -Uri "https://your-container-url.functions.fnc.fr-par.scw.cloud/your-endpoint"

Your application should respond identically to how it did on Cloud Run.

Understanding the Cost Comparison

Google Cloud Run Pricing (Typical):

  • vCPU: $0.00002400/vCPU-second
  • Memory: $0.00000250/GB-second
  • Requests: $0.40 per million
  • Plus: API Gateway, Load Balancer, or other routing costs

Scaleway Serverless Containers:

  • vCPU: €0.00001/vCPU-second (€1.00 per 100k vCPU-s)
  • Memory: €0.000001/GB-second (€0.10 per 100k GB-s)
  • Requests: Free (no per-request charges)
  • HTTPS endpoint: Free (included)
  • Free Tier: 200k vCPU-seconds + 400k GB-seconds per month

Example Calculation:

For an API handling 1 million requests/month, 200ms average response time, 1 vCPU, 2GB memory:

Google Cloud Run:

  • vCPU: 1M × 0.2s × $0.000024 = $4.80
  • Memory: 1M × 0.2s × 2GB × $0.0000025 = $1.00
  • Requests: 1M × $0.0000004 = $0.40
  • Total: ~$6.20/month

Scaleway:

  • vCPU: 200k vCPU-s → Free (within free tier)
  • Memory: 400k GB-s → Free (within free tier)
  • Total: €0.00/month

Even beyond free tiers, Scaleway is typically 30-50% cheaper, with no surprise charges.
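The arithmetic above is easy to re-derive in a few lines of shell, using the rates quoted in the price lists (awk handles the floating-point maths):

```shell
requests=1000000   # 1M requests/month
secs=0.2           # 200ms average response time
vcpu=1             # 1 vCPU
mem_gb=2           # 2GB memory

# Google Cloud Run, at the per-second rates quoted above
awk -v r=$requests -v s=$secs -v c=$vcpu -v m=$mem_gb 'BEGIN {
  vcpu_cost = r * s * c * 0.000024
  mem_cost  = r * s * m * 0.0000025
  req_cost  = r * 0.0000004
  printf "Cloud Run: $%.2f vCPU + $%.2f memory + $%.2f requests = $%.2f\n",
         vcpu_cost, mem_cost, req_cost, vcpu_cost + mem_cost + req_cost
}'

# Scaleway: compute raw usage and compare against the free tier
awk -v r=$requests -v s=$secs -v c=$vcpu -v m=$mem_gb 'BEGIN {
  printf "Scaleway usage: %d vCPU-s (free tier: 200k), %d GB-s (free tier: 400k)\n",
         r * s * c, r * s * m
}'
```

The workload lands exactly on the free-tier boundaries, which is why the Scaleway example bill comes out at €0.00.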

Key Differences to Be Aware Of

Similarities (Good News):

✅ Both use Knative under the hood
✅ Both support HTTP, HTTP/2, WebSocket, gRPC
✅ Both scale to zero automatically
✅ Both provide HTTPS endpoints
✅ Both support custom domains
✅ Both integrate with monitoring/logging

Differences:

  • Cold start: Scaleway takes ~2-5 seconds (similar to Cloud Run)
  • Idle timeout: Scaleway scales to zero after 15 minutes of inactivity (Cloud Run’s idle window varies with configuration)
  • Regions: Limited to EU (Paris, Amsterdam, Warsaw) vs. Google’s global presence
  • Ecosystem: Smaller ecosystem than GCP (but rapidly growing)

When Scaleway Makes Sense:

  • ✅ Your primary users/customers are in Europe
  • ✅ GDPR compliance is critical
  • ✅ You want to avoid US jurisdiction over your data
  • ✅ You prefer transparent, predictable pricing
  • ✅ You don’t need GCP-specific services (BigQuery, etc.)

When to Consider Carefully:

  • ⚠️ You need global edge distribution (though you can use CDN)
  • ⚠️ You’re heavily integrated with other GCP services
  • ⚠️ You need GCP’s machine learning services
  • ⚠️ Your customers are primarily in Asia/Americas

Additional Migration Considerations

Environment Variables and Secrets:

Scaleway offers Secret Manager integration. Copy your Cloud Run secrets:

  1. Go to Secret Manager in Scaleway
  2. Create secrets matching your Cloud Run environment variables
  3. Reference them in your container configuration

Custom Domains:

Both platforms support custom domains. In Scaleway:

  1. Go to your container settings
  2. Add custom domain
  3. Update your DNS CNAME to point to Scaleway’s endpoint
  4. SSL is handled automatically

Databases and Storage:

If you’re using Cloud SQL or Cloud Storage:

  • Databases: Consider Scaleway’s Managed PostgreSQL/MySQL or Serverless SQL Database
  • Object Storage: Scaleway Object Storage is S3-compatible
  • Or: Keep using GCP services (cross-cloud is possible, but adds latency)

Monitoring and Logging:

Scaleway provides Cockpit (based on Grafana):

  • Automatic logging for all Serverless Containers
  • Pre-built dashboards
  • Integration with alerts and metrics
  • Similar to Cloud Logging/Monitoring

The Broader Picture: European Digital Sovereignty

This migration isn’t just about cost savings or technical features—it’s about control.

Why EU Companies Are Moving:

  1. Legal Protection: GDPR protections are stronger when data never leaves EU jurisdiction
  2. Political Risk: Reduces exposure to US government data requests under CLOUD Act
  3. Supply Chain Resilience: Diversification away from Big Tech dependency
  4. Supporting European Tech: Strengthens the European cloud ecosystem
  5. Future-Proofing: As digital sovereignty regulations increase, early movers are better positioned

The Economic Argument:

Every euro spent with European cloud providers:

  • Stays in the European economy
  • Supports European jobs and innovation
  • Builds alternatives to US/Chinese tech dominance
  • Strengthens Europe’s strategic autonomy

Conclusion: A Straightforward Path to Sovereignty

Migrating from Google Cloud Run to Scaleway Serverless Containers is technically simple—often taking just a few hours for a typical service. The containers are identical, the pricing is competitive, and the operational model is the same.

But beyond the technical benefits, there’s a strategic argument: as a European company, every infrastructure decision is a choice about where your data lives, who has access to it, and which ecosystem you’re supporting.

Scaleway (and other European cloud providers) aren’t perfect replacements for every GCP use case. But for containerized APIs and web services—which represent the majority of Cloud Run workloads—they’re absolutely production-ready alternatives that keep your infrastructure firmly within European jurisdiction.

In 2026’s geopolitical landscape, that’s not just a nice-to-have—it’s increasingly essential.


Have you migrated your infrastructure back to Europe? Share your experience in the comments below.

Google Calendar Privacy Proxy

https://github.com/infiniteloopltd/Google-Calendar-Redactor-Proxy/

A lightweight Google Cloud Run service that creates privacy-protected calendar feeds from Google Calendar. Share your availability with colleagues without exposing personal appointment details.

The Problem

You want to share your calendar availability with work colleagues, but:

  • You have multiple calendars (work, personal, family) that you need to consolidate
  • Google Calendar’s subscribed calendars (ICS feeds) don’t count toward your Outlook free/busy status
  • You don’t want to expose personal appointment details to work contacts
  • Outlook’s native calendar sharing only works with Exchange/Microsoft 365 calendars, not external ICS subscriptions

This service solves that problem by creating a privacy-filtered calendar feed that Outlook can subscribe to, showing you as “Busy” during your appointments without revealing what those appointments are.

How It Works

Google Calendar → This Service → Privacy-Protected ICS Feed → Outlook
   (full details)    (redaction)     (busy blocks only)      (subscription)

The service:

  1. Fetches your Google Calendar ICS feed using the private URL
  2. Strips out all identifying information (titles, descriptions, locations, attendees)
  3. Replaces event summaries with “Busy”
  4. Preserves all timing information (when you’re busy/free)
  5. Returns a sanitized ICS feed that Outlook can subscribe to
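As a rough illustration of steps 2-4, here is what the redaction amounts to, sketched in shell with sed. The real service is a .NET Minimal API; the sample event and sed rules below are illustrative only:

```shell
# Write a sample VEVENT to redact (placeholder data, not a real calendar)
cat > /tmp/sample.ics <<'EOF'
BEGIN:VEVENT
UID:abc-123
DTSTART:20250601T090000Z
DTEND:20250601T100000Z
SUMMARY:Dentist appointment
LOCATION:123 Main Street
DESCRIPTION:Root canal
END:VEVENT
EOF

# Replace the title with "Busy", drop identifying fields, keep all timing
sed -e 's/^SUMMARY:.*/SUMMARY:Busy/' \
    -e '/^LOCATION:/d' \
    -e '/^DESCRIPTION:/d' \
    -e '/^ORGANIZER:/d' \
    -e '/^ATTENDEE:/d' \
    /tmp/sample.ics
```

The output keeps UID, DTSTART, and DTEND untouched, so free/busy calculations in Outlook remain exact while every identifying field is gone.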

Use Cases

  • Multiple calendar consolidation: Combine work, personal, and family calendars into one availability view
  • Privacy-protected sharing: Share when you’re busy without sharing what you’re doing
  • Cross-platform calendaring: Bridge Google Calendar into Outlook environments
  • Professional boundaries: Keep personal life private while showing accurate availability

Quick Start

1. Get Your Google Calendar Private URL

  1. Open Google Calendar
  2. Click the ⚙️ Settings icon → Settings
  3. Select your calendar from the left sidebar
  4. Scroll to “Integrate calendar”
  5. Copy the “Secret address in iCal format” URL

Your URL will look like:

https://calendar.google.com/calendar/ical/info%40infiniteloop.ie/private-xxxxxxx/basic.ics

2. Deploy the Service

# Edit deploy.bat and set your PROJECT_ID
deploy.bat

# Or deploy manually
gcloud run deploy calendar-proxy --source . --platform managed --region europe-west1 --allow-unauthenticated

You’ll get a service URL like: https://calendar-proxy-xxxxxxxxxx-ew.a.run.app

3. Construct Your Privacy-Protected Feed URL

From your Google Calendar URL:

https://calendar.google.com/calendar/ical/info%xxxxx.xxx/private-xxxxxxx/basic.ics

Extract:

  • calendarId: info@infiniteloop.ie (URL decoded)
  • privateKey: xxxxxxxxxx (just the key, without the “private-” prefix)

Build your proxy URL:

https://calendar-proxy-xxxxxxxxxx-ew.a.run.app/calendar?calendarId=info@infiniteloop.ie&privateKey=xxxxxxx
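If you prefer to script the extraction, a small shell helper can derive the proxy URL from the Google Calendar secret address (the URL, key, and service hostname below are placeholders):

```shell
# Placeholder inputs, not real credentials
ics_url="https://calendar.google.com/calendar/ical/info%40example.com/private-abc123/basic.ics"
proxy_base="https://calendar-proxy-xxxxxxxxxx-ew.a.run.app"

# Pull out the calendar ID and private key path segments
calendar_id_enc=$(echo "$ics_url" | sed -E 's#.*/ical/([^/]+)/.*#\1#')
private_key=$(echo "$ics_url" | sed -E 's#.*/private-([^/]+)/.*#\1#')

# URL-decode %40 -> @ (the only escape Google uses in the calendar ID)
calendar_id=$(echo "$calendar_id_enc" | sed 's/%40/@/')

echo "$proxy_base/calendar?calendarId=$calendar_id&privateKey=$private_key"
```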

4. Subscribe in Outlook

Outlook Desktop / Web

  1. Open Outlook
  2. Go to Calendar
  3. Click Add Calendar → Subscribe from web
  4. Paste your proxy URL
  5. Give it a name (e.g., “My Availability”)
  6. Click Import

Outlook will now show:

  • ✅ Blocked time during your appointments
  • ✅ “Busy” status for those times
  • ❌ No details about what the appointments are

What Gets Redacted

The service removes all identifying information:

  • SUMMARY (event title) → Replaced with "Busy"
  • DESCRIPTION (event details) → Removed
  • LOCATION (where) → Removed
  • ORGANIZER (who created it) → Removed
  • ATTENDEE (participants) → Removed
  • URL (meeting links) → Removed
  • ATTACH (attachments) → Removed
  • CLASS (privacy) → Set to PRIVATE

What Gets Preserved

All timing and scheduling information remains intact:

  • ✅ Event start times (DTSTART)
  • ✅ Event end times (DTEND)
  • ✅ Event duration
  • ✅ Recurring events (RRULE)
  • ✅ Exception dates (EXDATE)
  • ✅ Event status (confirmed, tentative, cancelled)
  • ✅ Time zones
  • ✅ All-day events
  • ✅ Unique identifiers (UID)

Technical Details

Stack: .NET 8 / ASP.NET Core Minimal API
Hosting: Google Cloud Run (serverless)
Cost: Virtually free for personal use (Cloud Run free tier: 2M requests/month)
Latency: ~200-500ms per request (fetches from Google, processes, returns)

API Endpoint

GET /calendar?calendarId={id}&privateKey={key}

Parameters:

  • calendarId (required): Your Google Calendar ID (usually your email)
  • privateKey (required): The private key from your Google Calendar ICS URL

Response:

  • Content-Type: text/calendar; charset=utf-8
  • Body: Privacy-redacted ICS feed

Local Development

# Run locally
dotnet run

# Test
curl "http://localhost:8080/calendar?calendarId=test@example.com&privateKey=abc123"

Deployment

Prerequisites

  • gcloud CLI installed and authenticated (gcloud auth login)
  • A Google Cloud project with billing enabled
  • The Cloud Run and Cloud Build APIs enabled (see Troubleshooting below)

Deploy

# Option 1: Use the batch file
deploy.bat

# Option 2: Manual deployment
gcloud run deploy calendar-proxy ^
  --source . ^
  --platform managed ^
  --region europe-west1 ^
  --allow-unauthenticated ^
  --memory 512Mi

The --allow-unauthenticated flag is required so that Outlook can fetch your calendar without authentication. Your calendar data is still protected by the private key in the URL.

Security & Privacy

Is This Secure?

Yes, with caveats:

✅ Your calendar data is already protected by Google’s private key mechanism
✅ No data is stored – the service is stateless and doesn’t log calendar contents
✅ HTTPS encrypted – All traffic is encrypted in transit
✅ Minimal attack surface – Simple pass-through service with redaction

⚠️ Considerations:

  • Your private key is in the URL you share (same as Google’s original ICS URL)
  • Anyone with your proxy URL can see your busy/free times (but not details)
  • The service runs as --allow-unauthenticated so Outlook can fetch it
  • If you need stricter access control, consider adding authentication

Privacy Features

  • Strips all personally identifying information
  • Marks all events as CLASS:PRIVATE
  • No logging of calendar contents
  • No data persistence
  • Stateless operation

Recommendations

  • Don’t share your proxy URL publicly
  • Treat it like a password – it grants access to your availability
  • Regenerate your Google Calendar private key if compromised
  • Monitor your Cloud Run logs for unexpected access patterns

Cost Estimation

Google Cloud Run pricing (as of 2025):

  • Free tier: 2M requests/month, 360,000 GB-seconds/month
  • Typical calendar: Refreshes every 30-60 minutes
  • Monthly cost: $0 for personal use (well within free tier)

Even with 10 people subscribing to your calendar refreshing every 30 minutes:

  • ~14,400 requests/month
  • ~$0.00 cost
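That request estimate checks out with quick shell arithmetic:

```shell
# 10 subscribers, each fetching every 30 minutes (2 fetches/hour), over 30 days
awk 'BEGIN { printf "%d requests/month\n", 10 * 2 * 24 * 30 }'   # → 14400 requests/month
```

At 14,400 requests against a 2M-request free tier, the service uses under 1% of the monthly allowance.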

Troubleshooting

“404 Not Found” when subscribing in Outlook

  • Verify your service is deployed: gcloud run services list
  • Check your URL is correctly formatted
  • Ensure --allow-unauthenticated is set

“Invalid calendar” error

  • Verify your Google Calendar private key is correct
  • Test the URL directly in a browser first
  • Check that your calendarId doesn’t have URL encoding issues

Events not showing up

  • Google Calendar ICS feeds can take 12-24 hours to reflect changes
  • Try re-subscribing to the calendar in Outlook
  • Verify the original Google Calendar ICS URL works

Deployment fails

# Ensure you're authenticated
gcloud auth login

# Set your project
gcloud config set project YOUR_PROJECT_ID

# Enable required APIs
gcloud services enable run.googleapis.com
gcloud services enable cloudbuild.googleapis.com

Limitations

  • Refresh rate: Calendar clients typically refresh ICS feeds every 30-60 minutes (not real-time)
  • Google’s ICS feed: Updates can take up to 24 hours to reflect in the ICS feed
  • Authentication: No built-in authentication (relies on URL secrecy)
  • Multi-calendar: Requires one proxy URL per Google Calendar

Alternatives Considered

  • Native Outlook calendar sharing: built-in and real-time, but only works with Exchange calendars
  • Calendly/Bookings: professional and feature-rich, but carries a monthly cost and is overkill for simple availability
  • Manual sync (Zapier/Power Automate): works, but complex to set up and maintain
  • This solution: simple, free, and privacy-focused, but subject to ICS feed refresh delays

Contributing

Contributions welcome! Areas for enhancement:

  •  Add basic authentication support
  •  Support multiple calendars in one feed
  •  Caching layer to reduce Google Calendar API calls
  •  Health check endpoint
  •  Metrics/monitoring
  •  Custom “Busy” text per calendar

License

MIT License – free to use, modify, and distribute.

Author

Created by Infinite Loop Development Ltd to solve a real business need for calendar privacy across platforms. https://github.com/infiniteloopltd/Google-Calendar-Redactor-Proxy/