Archive

Archive for March, 2026

Cracking the Code: Estimating a Car’s Age from Its Argentine Licence Plate

Technical Deep-dive · Vehicle Data

How sequential plate issuance, a little combinatorics, and 200,000 training records let you estimate a registration year from seven characters. By Infinite Loop Development  ·  March 2026

Try it live

The techniques described in this post are implemented in ar.matriculaapi.com — an API that returns full vehicle details for any Argentine licence plate, including an estimated registration year for plates where the exact date is unknown.

Every Argentine licence plate encodes its own approximate birth year. You just have to know how to read it.

Argentina has issued plates in two distinct sequential formats over the past three decades. Because they are allocated in strict national order, a plate’s position in that sequence maps — with surprising precision — to the year the vehicle was registered. This post explains the technique: the encoding, the boundary estimation, and the confidence model.


Two Formats, One Principle

Argentina uses two plate formats, each covering a different era.

OZY040

Pre-Mercosur · ABC123 · ≈1990 – 2016

AC875MD

Mercosur · AB123CD · 2016 – present

The Pre-Mercosur format (ABC123) consists of three letters followed by three digits. It was Argentina’s standard from the early 1990s through approximately 2016, when the country transitioned to the regional Mercosur standard.

The Mercosur format (AB123CD) uses two letters, three digits, then two more letters. It began with AA000AA and has been incrementing steadily ever since, shared across the Mercosur bloc — which is why plates from Brazil, Uruguay, and Paraguay share the same structure.

The critical insight is that both formats were issued sequentially at a national level. A plate allocated in 2019 will always have a higher sequence number than one from 2018. This makes year estimation a matter of finding where a given plate falls in the sequence.


Encoding a Plate to a Single Integer

To compare plates across their sequence, we convert each plate to a single integer using mixed-radix encoding — the same idea as a number system that switches base partway through.

Mercosur encoding

The Mercosur plate AB123CD has four alphabetic components (each 0–25) and one numeric component (0–999). Treating letters as base-26 and the number as base-1000:

Python

def encode_mercosur(plate: str) -> int:
    """
    Encode an AB123CD Mercosur plate to a sequence integer.
    AA000AA = 0, AA000AB = 1, ... AZ999ZZ = 17,575,999, BA000AA = 17,576,000 ...
    """
    p = plate.upper().replace(" ", "").replace("-", "")
    assert len(p) == 7, "Mercosur plates are 7 characters"

    l1 = ord(p[0]) - ord('A')  # 0-25
    l2 = ord(p[1]) - ord('A')  # 0-25
    n  = int(p[2:5])             # 0-999
    l3 = ord(p[5]) - ord('A')  # 0-25
    l4 = ord(p[6]) - ord('A')  # 0-25

    return (l1 * 26 + l2) * 676_000 \
         + n              * 676     \
         + l3             * 26      \
         + l4

A few spot checks: AA000AA → 0. AA000AB → 1. AA001AA → 676. AB000AA → 676,000. The encoding is monotonically increasing — every plate that comes later in the alphabet maps to a strictly higher integer.

Pre-Mercosur encoding

The earlier ABC123 format encodes similarly, but with three leading letters instead of two:

Python

def encode_pre_mercosur(plate: str) -> int:
    """
    Encode an ABC123 pre-Mercosur plate to a sequence integer.
    AAA000 = 0, AAA001 = 1, ... ZZZ999 = 17,575,999
    """
    p = plate.upper().replace(" ", "").replace("-", "")
    assert len(p) == 6, "Pre-Mercosur plates are 6 characters"

    l1 = ord(p[0]) - ord('A')  # 0-25
    l2 = ord(p[1]) - ord('A')  # 0-25
    l3 = ord(p[2]) - ord('A')  # 0-25
    n  = int(p[3:])             # 0-999

    return (l1 * 676 + l2 * 26 + l3) * 1000 + n
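
A few spot checks for this encoder too. The snippet below repeats the same mixed-radix arithmetic in a compact standalone helper so the assertions run on their own:

```python
def enc(p: str) -> int:
    # Same mixed-radix scheme as encode_pre_mercosur above
    l1, l2, l3 = (ord(c) - ord('A') for c in p[:3])
    return (l1 * 676 + l2 * 26 + l3) * 1000 + int(p[3:])

assert enc("AAA000") == 0
assert enc("AAA001") == 1
assert enc("AAB000") == 1_000        # third letter advances by 1,000
assert enc("ABA000") == 26_000       # second letter advances by 26,000
assert enc("ZZZ999") == 17_575_999   # top of the range
```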

Learning the Boundaries from Real Data

Encoding gives us integers. To turn those integers into years, we need to know which sequence ranges correspond to which years. This is where training data comes in.

Using over 200,000 Argentine plates with known registration years, we computed the mean sequence number per year. This gives a representative centroid for each year’s plate population.

The year boundary cut points are simply the midpoints between adjacent means. This produces clean, non-overlapping ranges — every sequence integer maps to exactly one year:

Seq range               Estimated year   Cut point derivation
< 671,619               2016             (294k + 1,048k) / 2
671,619 – 1,467,170     2017             (1,048k + 1,885k) / 2
1,467,170 – 2,207,136   2018             (1,885k + 2,528k) / 2
2,207,136 – 2,719,720   2019             (2,528k + 2,910k) / 2
2,719,720 – 3,088,306   2020             (2,910k + 3,265k) / 2
3,088,306 – 3,472,769   2021             (3,265k + 3,679k) / 2
3,472,769 – 3,888,534   2022             (3,679k + 4,097k) / 2
3,888,534 – 4,307,005   2023             (4,097k + 4,516k) / 2
4,307,005 – 4,773,859   2024             (4,516k + 5,030k) / 2
≥ 4,773,859             2025             (open upper bound)

The naive approach used min/max ranges per year — but these overlap badly. Using the mean and splitting at midpoints gives clean, unambiguous boundaries.
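
The midpoint computation itself is a one-liner. Below is a sketch using the per-year means rounded to the nearest thousand, as shown in the derivation column (production uses the exact means, which is why the published cut points differ slightly):

```python
# Per-year mean sequence numbers, rounded to the nearest thousand as in the
# table above. Production uses exact means, so real cut points differ slightly.
MEANS = {
    2016: 294_000,   2017: 1_048_000, 2018: 1_885_000, 2019: 2_528_000,
    2020: 2_910_000, 2021: 3_265_000, 2022: 3_679_000, 2023: 4_097_000,
    2024: 4_516_000, 2025: 5_030_000,
}

years = sorted(MEANS)
cuts = [(MEANS[a] + MEANS[b]) // 2 for a, b in zip(years, years[1:])]
# cuts[0] == 671_000 — the 2016/2017 boundary, within 0.1% of the exact 671,619
```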


The Full Estimator in Python

Python

import re
from dataclasses import dataclass
from typing import Optional

# ── Mercosur year boundaries (midpoints between annual means) ──────────────
# Derived from 200k+ Argentine plates with known registration years.
# Each January, recalculate the mean for the new year and add one entry:
#   new_cut = (mean_prev_year + mean_new_year) / 2
# ──────────────────────────────────────────────────────────────────────────
MERCOSUR_CUTS = [
    (671_619,   2016),
    (1_467_170, 2017),
    (2_207_136, 2018),
    (2_719_720, 2019),
    (3_088_306, 2020),
    (3_472_769, 2021),
    (3_888_534, 2022),
    (4_307_005, 2023),
    (4_773_859, 2024),
    (float('inf'), 2025),
    # Add 2026 here in January 2027:
    # (cut_2025_2026, 2025), (float('inf'), 2026),
]

# Pre-Mercosur: two-letter prefix → dominant year
# Derived from same training dataset. Stable — no new plates since ~2016.
PRE_MERCOSUR_PREFIX = {
    # A block 1994-1996
    'AA': lambda n: 1994 if n < 500 else 1995,
    'AB': 1995, 'AC': 1995, 'AD': 1995, 'AE': 1995, 'AF': 1995,
    'AG': 1995, 'AH': 1995, 'AI': 1995, 'AJ': 1995, 'AK': 1995,
    'AL': 1995, 'AM': 1995, 'AN': 1995,
    'AO': lambda n: 1995 if n < 400 else 1996,
    'AP': 1996, 'AQ': 1996, 'AR': 1996, 'AS': 1996, 'AT': 1996,
    'AU': 1996, 'AV': 1996, 'AW': 1996, 'AX': 1996, 'AY': 1996, 'AZ': 1996,
    # B block 1996-1998
    'BA': 1996, 'BB': 1996,
    'BC': lambda n: 1996 if n < 300 else 1997,
    'BD': 1997, 'BE': 1997, 'BF': 1997, 'BG': 1997, 'BH': 1997,
    'BI': 1997, 'BJ': 1997, 'BK': 1997, 'BL': 1997, 'BM': 1997,
    'BN': 1997, 'BO': 1997,
    'BP': lambda n: 1997 if n < 500 else 1998,
    'BQ': 1998, 'BR': 1998, 'BS': 1998, 'BT': 1998, 'BU': 1998,
    'BV': 1998, 'BW': 1998, 'BX': 1998, 'BY': 1998, 'BZ': 1998,
    # C-P blocks follow same pattern; see full table at ar.matriculaapi.com
    # ... (abbreviated for readability)
}

PRE_MERCOSUR_RE = re.compile(r'^[A-Z]{3}\d{3}$')
MERCOSUR_RE     = re.compile(r'^[A-Z]{2}\d{3}[A-Z]{2}$')


@dataclass
class PlateEstimate:
    input_plate:    str
    format:         str         # 'MERCOSUR' | 'PRE-MERCOSUR' | 'MERCOSUR-IMPORT' | 'UNKNOWN'
    estimated_year: Optional[int]
    confidence:     str         # 'HIGH' | 'MEDIUM' | 'LOW'
    sequence_num:   Optional[int]
    notes:          Optional[str] = None


def estimate_year(plate: str) -> PlateEstimate:
    p = plate.upper().replace(" ", "").replace("-", "")

    # ── Mercosur AB123CD ─────────────────────────────────────────────────────
    if MERCOSUR_RE.match(p):
        seq = encode_mercosur(p)

        if seq < 7:
            return PlateEstimate(plate, 'MERCOSUR-IMPORT', None, 'LOW', seq,
                notes='Sequence predates Argentine rollout; likely import')

        year = next(y for cut, y in MERCOSUR_CUTS if seq < cut)

        # Confidence: MEDIUM if within 5% of the nearest boundary
        confidence = 'HIGH'
        cuts = [c for c, _ in MERCOSUR_CUTS[:-1]]             # finite cut points only
        gaps = [b - a for a, b in zip([0] + cuts, cuts)]      # width of the band ending at each cut
        for cut, gap in zip(cuts, gaps):
            if abs(seq - cut) < gap * 0.05:
                confidence = 'MEDIUM'
                break

        return PlateEstimate(plate, 'MERCOSUR', year, confidence, seq)

    # ── Pre-Mercosur ABC123 ───────────────────────────────────────────────────
    if PRE_MERCOSUR_RE.match(p):
        prefix = p[:2]
        num    = int(p[3:])
        entry  = PRE_MERCOSUR_PREFIX.get(prefix)

        if entry is None:
            return PlateEstimate(plate, 'PRE-MERCOSUR', None, 'LOW', None,
                notes=f'Prefix {prefix!r} not in training data')

        year = entry(num) if callable(entry) else entry

        confidence = ('LOW'    if prefix[0] in 'RSTUV'
                 else 'MEDIUM' if prefix <= 'DZ'
                 else 'HIGH')

        return PlateEstimate(plate, 'PRE-MERCOSUR', year, confidence,
                              encode_pre_mercosur(p))

    return PlateEstimate(plate, 'UNKNOWN', None, 'LOW', None,
        notes='Does not match any known Argentine plate format')


# ── Quick demo ────────────────────────────────────────────────────────────
for test in ['AC601QQ', 'AC875MD', 'OZY040', 'GDA123', 'AB 172UC']:
    r = estimate_year(test)
    print(f"{r.input_plate:10} → {r.estimated_year}  [{r.confidence}]  {r.format}")

Why Pre-Mercosur Is Trickier

You might expect pre-Mercosur plates to work the same way as Mercosur — encode to an integer, find the range. But the raw data tells a different story: the sequence number ranges for adjacent years overlap almost completely.

The reason is that Argentina’s provinces received independent plate allocations. Buenos Aires, Córdoba, and Mendoza were all issuing plates simultaneously from their own provincial ranges. A national sequence number alone therefore can’t pinpoint a year — the number space was being consumed by 23 provinces in parallel.

What does cluster cleanly by year is the two-letter prefix. The national allocation advanced through the alphabet over time, so GA–GT plates are overwhelmingly from 2007, HT–HZ from 2009, and so on. The training data confirms this: for the densest prefixes, over 90% of plates share a single dominant year.

At prefix boundaries — AO, BC, GT, HS, and others — the numeric suffix provides a secondary signal. A plate in the GT prefix with a low number (GT100) likely precedes one with a high number (GT850) by several months, straddling a year boundary.


Confidence and Outliers

The estimator returns one of three confidence levels:

HIGH  The plate sits comfortably within a year band — more than 5% away from any boundary. For Mercosur plates from 2017 onwards (where training density is highest) this is the common case.

MEDIUM  The plate is within 5% of a year cut point, or belongs to a pre-Mercosur prefix with moderate training data. The estimate is most likely correct but the adjacent year is plausible — late registrations, delivery delays, and data entry lag all introduce real ambiguity near boundaries.

LOW  The prefix is sparse (old R/S/T/U/V plates, typically pre-1995) or the plate is flagged as a likely import. Imports arise when a Mercosur-format plate has a sequence number that predates Argentina’s 2016 rollout — these vehicles were most likely registered in Brazil, Uruguay, or Paraguay and subsequently imported.


Keeping It Fresh

The Mercosur boundaries need updating once a year. The process is three steps:

1. Collect a fresh batch of plates with known years.
2. Compute the mean sequence number for the new year’s cohort.
3. Set the new cut point as (mean_prev + mean_new) / 2 and append it to the table.

No existing cut points change — you’re only ever adding one new entry to the bottom of the list. The pre-Mercosur prefix table is stable and needs no maintenance, since no new plates in that format have been issued since approximately 2016.
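
The three steps above can be sketched in a few lines. Note the 2026 mean here is a made-up placeholder — plug in the real mean computed from the fresh batch:

```python
# Abbreviated table: last finite cut plus the open-ended current year.
MERCOSUR_CUTS = [
    (4_773_859, 2024),
    (float('inf'), 2025),   # open-ended current year
]

def add_year(cuts, mean_prev, mean_new, new_year):
    """Close the open-ended entry with a finite cut, then re-open for new_year."""
    *finite, (_, current_year) = cuts
    new_cut = (mean_prev + mean_new) // 2
    return finite + [(new_cut, current_year), (float('inf'), new_year)]

# Hypothetical 2026 mean of 5,500,000 — for illustration only
updated = add_year(MERCOSUR_CUTS, mean_prev=5_030_000, mean_new=5_500_000, new_year=2026)
```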


Beyond Year Estimation

Year estimation is useful on its own — for insurance quoting, fleet valuation, or fraud detection — but it becomes much more powerful when combined with a full vehicle lookup.

The Argentine Matricula API takes any plate (pre-Mercosur or Mercosur) and returns the complete vehicle record: make, model, year, engine, and more. For records where the registration year is missing or uncertain, the sequence-based estimate described here fills the gap automatically, with a confidence flag so downstream consumers know how much to trust it.

Argentine vehicle lookup API

Try a plate lookup at ar.matriculaapi.com. The API is available for integration via vehicleapi.com and supports batch queries, JSON responses, and per-format confidence scoring.

© 2026 Infinite Loop Development Ltd  


The Hidden Cost of ORDER BY NEWID()

Fetching a random row from a table is a surprisingly common requirement — random banner ads, sample data, rotating API credentials. The instinctive solution in SQL Server is elegant-looking but conceals a serious performance trap.

-- Looks innocent. Isn't.
SELECT TOP 1 * FROM LARGE_TABLE ORDER BY NEWID()

What SQL Server actually does

NEWID() generates a fresh GUID for every single row in the table. SQL Server must then sort the entire result set by those GUIDs before it can hand back the top one. On a table with a million rows you are generating a million GUIDs, sorting a million rows, and discarding 999,999 of them.

The problem: On large tables, ORDER BY NEWID() performs a full table scan and a full sort — O(n log n) work — regardless of how many rows you need. It cannot use any index for ordering.

A faster alternative: seek, don’t sort

The key insight is to convert the “random sort” into a “random seek”. If we can generate a random Id value cheaply and then let the clustered index do the work, we avoid scanning the table entirely.

DECLARE @Min INT = (SELECT MIN(Id) FROM LARGE_TABLE)
DECLARE @Max INT = (SELECT MAX(Id) FROM LARGE_TABLE)
SELECT TOP 1 *
FROM LARGE_TABLE
WHERE Id >= @Min + ABS(CHECKSUM(NEWID()) % (@Max - @Min + 1))
ORDER BY Id ASC

MAX(Id) and MIN(Id) are single index seeks on the primary key. CHECKSUM(NEWID()) generates a random integer without sorting anything. The WHERE Id >= clause then performs a single index seek from that point forward, and ORDER BY Id ASC TOP 1 picks up the very next row.

The result: Two index seeks to get the range, one index seek to find the row. Constant time regardless of table size.

Performance at a glance

Approach            Reads           Sort        Scales with table size?
ORDER BY NEWID()    Full scan       Full sort   O(n log n)
CHECKSUM seek       3 index seeks   None        O(1)

Three caveats to know

1. Id gaps cause mild bias. If rows have been deleted, gaps in the Id sequence mean rows immediately after a gap are slightly more likely to be selected. For most use cases — sampling, rotation, A/B testing — this is an acceptable trade-off.

2. Ids may not start at 1. This is why we use @Min rather than hardcoding zero. If your identity seed started at 1000, CHECKSUM(NEWID()) % MAX(Id) alone could produce values 0–999 — below every real Id — so the Id >= filter would always land on the very first row, collapsing the “random” pick onto a single row.

3. CHECKSUM can return INT_MIN. In SQL Server, ABS(INT_MIN) raises an arithmetic overflow error rather than returning a positive value. The fix is to apply the modulo before the ABS, keeping the intermediate value safely within range.
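
The gap bias from caveat 1 is easy to quantify with a quick simulation of the seek strategy. A sketch in Python, using hypothetical data — a 1,000-row table with Ids 401–899 deleted:

```python
import random
from bisect import bisect_left
from collections import Counter

# Hypothetical table: Ids 1-500 and 900-1000 survive; 401-899 were deleted
ids = list(range(1, 501)) + list(range(900, 1001))
lo, hi = ids[0], ids[-1]

picks = Counter()
random.seed(42)
for _ in range(20_000):
    target = lo + random.randrange(hi - lo + 1)   # random point in [MIN, MAX]
    picks[ids[bisect_left(ids, target)]] += 1     # first Id >= target (the seek)

# Id 900 absorbs every target landing in the 501-899 gap, so it is picked
# hundreds of times more often than an Id inside a dense run like 500.
assert picks[900] > 20 * picks[500]
```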

When you don’t need randomness at all

For round-robin rotation across a fixed set of rows — such as alternating between API credentials or cookie sessions — true randomness is unnecessary overhead. A deterministic slot based on the current second is even cheaper:

-- Rotates across N accounts, one per second, no writes required
WHERE Slot = DATEPART(SECOND, GETUTCDATE()) % TotalAccounts

This resolves to a constant integer comparison — effectively a single index seek — and scales to any number of accounts automatically. No tracking table, no writes, no contention.
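
The same slot arithmetic works just as well in application code. A minimal sketch of the idea, independent of SQL Server:

```python
from datetime import datetime, timezone

def current_slot(total_accounts: int) -> int:
    """Deterministic slot that rotates once per second — same arithmetic as the SQL above."""
    return datetime.now(timezone.utc).second % total_accounts

# Every caller within the same UTC second agrees on the slot, with no shared state
slot = current_slot(5)
assert 0 <= slot < 5
```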

The takeaway: whenever you reach for ORDER BY NEWID(), ask whether you actually need true randomness or just approximate distribution. In most production scenarios, a cheap seek beats an expensive sort by several orders of magnitude.

Enrich Your Qualtrics Surveys with Real-Time Respondent Data Using AvatarAPI

Qualtrics is excellent at capturing what respondents tell you. But what if you could automatically fill in what you already know — or can discover — the moment they enter their email address?

AvatarAPI resolves an email address into rich profile data in real time: a profile photo, full name, city, country, and the social network behind it. By embedding this lookup directly into your Qualtrics survey flow, you collect more information about each respondent without asking a single extra question.


What Data Does AvatarAPI Return?

When you pass an email address to the API, it returns the following fields — all of which can be mapped into Qualtrics Embedded Data and used anywhere in your survey:

Field         Description
Image         URL to the respondent’s profile photo
Name          Resolved full name
City          City of residence
Country       Country code
Valid         Whether the email address is real and reachable
IsDefault     Whether the avatar is a fallback/generic image
Source.Name   The social network the data came from
RawData       The complete JSON payload

Watch the Video Walkthrough

Before diving into the written steps, watch this complete tutorial — from configuring the Web Service element to rendering the avatar photo on a results page:


Step-by-Step Integration Guide

You can either follow these steps from scratch, or import the ready-made AvatarAPI.qsf template file directly into Qualtrics (see Step 8).


Step 1 — Get Your AvatarAPI Credentials

Sign up at avatarapi.com to obtain a username and password. A free demo account is available for evaluation — use the credentials demo / demo to test before going live.

The API endpoint you will call is:

https://avatarapi.com/v2/api.aspx

Step 2 — Create an Email Capture Question

In your Qualtrics survey, add a Text Entry question with a Single Line selector. This is where respondents will enter their email address.

Note the Question ID assigned to this question (e.g. QID3) — you will reference it when configuring the Web Service. You can find the QID by opening the question’s advanced options.

Tip: Add email format validation via Add Validation → Content Validation → Email to ensure the value passed to the API is always well-formed.


Step 3 — Add a Web Service Element to Your Survey Flow

Navigate to Survey Flow (the flow icon in the left sidebar). Click Add a New Element Here and choose Web Service. Position this element after the block containing your email question and before your results block.

Configure it as follows:

  • URL: https://avatarapi.com/v2/api.aspx
  • Method: POST
  • Content-Type: application/json

Step 4 — Set the Request Body Parameters

Under Set Request Parameters, switch to Specify Body Params and add these three key-value pairs:

{
  "username": "your_username",
  "password": "your_password",
  "email": "${q://QID3/ChoiceTextEntryValue}"
}

The Qualtrics piped text expression ${q://QID3/ChoiceTextEntryValue} dynamically inserts whatever email the respondent typed. Replace QID3 with the actual QID of your email question if it differs.


Step 5 — Map the API Response to Embedded Data Fields

Scroll down to Map Fields from Response. Add one row for each field you want to capture. The From Response column is the JSON key returned by AvatarAPI; the To Field column is the Embedded Data variable name.

From Response (JSON key)   To Field (Embedded Data)
Image                      Image
Name                       Name
Valid                      Valid
City                       City
Country                    Country
IsDefault                  IsDefault
Source.Name                Source.Name
RawData                    RawData
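
For intuition, here is roughly what that mapping does, sketched in Python against a hypothetical response payload (field names from the table above; the dotted Source.Name path is assumed to address the nested JSON key):

```python
import json

# Hypothetical AvatarAPI-style response — values are illustrative only
sample = json.loads("""{
  "Image": "https://example.com/photo.jpg",
  "Name": "Jane Doe",
  "Valid": true,
  "City": "Buenos Aires",
  "Country": "AR",
  "IsDefault": false,
  "Source": {"Name": "LinkedIn"}
}""")

def to_embedded_data(resp: dict) -> dict:
    """Flatten a response into the Embedded Data fields listed in the table."""
    fields = {k: resp.get(k) for k in
              ("Image", "Name", "Valid", "City", "Country", "IsDefault")}
    fields["Source.Name"] = resp.get("Source", {}).get("Name")  # dotted path → nested key
    fields["RawData"] = json.dumps(resp)                        # complete payload
    return fields

ed = to_embedded_data(sample)
```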

Note: Qualtrics stores these variables automatically — you don’t need to pre-declare them as Embedded Data elsewhere in the flow, though doing so in the survey flow header keeps things organised.


Step 6 — Display the Avatar Photo on a Results Page

Add a Descriptive Text / Graphic question in a block placed after the Web Service call in your flow.

In the rich-text editor, switch to the HTML source view and paste this snippet:

<img
src="${e://Field/Image}"
alt="Profile Picture"
style="width:100px; height:100px; border-radius:50%;"
/>

The expression ${e://Field/Image} inserts the profile photo URL at runtime. The border-radius: 50% gives it a circular crop for a polished appearance.

You can display other fields using the same pattern:

Name: ${e://Field/Name}
City: ${e://Field/City}
Country: ${e://Field/Country}
Source: ${e://Field/Source.Name}

Step 7 — Test with the Demo Account

Before going live, test the integration using the demo credentials. Enter a well-known email address (such as a Gmail address you know has a Google profile photo) to verify the image and data return correctly.

After a test submission, check the Survey Data tab — all mapped fields (Image, Name, City, Country, etc.) should appear as columns alongside your standard question responses.

Rate limits & production use: The demo credentials are shared and rate-limited. Swap in your own account credentials before publishing a live survey to ensure reliable performance.


Step 8 — Import the Ready-Made QSF Template

Rather than building from scratch, you can import the AvatarAPI.qsf file directly into Qualtrics. This gives you a pre-configured survey with the email question, Web Service flow, and image display block already set up.

To import: go to Create a new project → Survey → Import a QSF file and upload AvatarAPI.qsf. Then update the Web Service credentials to your own username and password, and you’re ready to publish.


How the Survey Flow Works

Once configured, your survey flow has this simple three-part structure:

  1. Block — Respondent enters their email address
  2. Web Service — Silent POST to avatarapi.com/v2/api.aspx; response fields mapped to Embedded Data
  3. Block — Results page displays the avatar photo and enriched profile data

The respondent experiences a seamless survey: they enter their email on page one, the API call fires silently between pages, and they see a personalised result — including their own profile photo — on page two.


Practical Use Cases

Lead enrichment surveys — Capture a prospect’s email and automatically resolve their name, city, and country without asking. Append this data to your CRM export from Qualtrics.

Event registration flows — Display the registrant’s photo back to them as a confirmation step, increasing engagement and reducing drop-off.

Email validation checkpoints — Use the Valid flag in a branch logic condition to route respondents with unresolvable addresses to a correction screen or alternative path.

Research panels — Enrich responses with geographic signals without asking respondents to self-report location, reducing survey length and improving data quality.


Get Started

  • API documentation & sign-up: avatarapi.com
  • API endpoint: https://avatarapi.com/v2/api.aspx
  • Demo credentials: username demo / password demo
  • Video tutorial: Watch on YouTube