Building a Chinese Vehicle Database from MIIT Public Data

If you’ve ever tried to decode a Chinese VIN or look up vehicle specifications for a car manufactured in China, you’ll know the data is hard to come by. Commercial providers charge significant fees, the official sources are in Chinese, and the coverage is often incomplete. I recently spent several weeks tackling this problem from first principles (scraping, parsing, and enriching the active MIIT vehicle type approval catalogue), and the result is now available as a data download.

Here’s how it was built, what’s in it, and why it might be useful to you.

What is the MIIT Gonggao system?

In China, every vehicle model that can be legally manufactured and sold must first be approved by the Ministry of Industry and Information Technology (MIIT). These approvals are published in batches called 公告 (Gonggao — literally “announcements”) through the Road Motor Vehicle Manufacturers and Products system (道路机动车辆生产企业及产品).

Each Gonggao entry represents a single approved vehicle variant, with a unique model code such as SGM7102JBA1 (a Buick sedan made by SAIC-GM) or TSL7000BEVAR4 (a Tesla Model Y built in Shanghai). The MIIT database is the authoritative registry of every vehicle permitted for manufacture in China — domestic brands, international joint ventures, electric vehicles, commercial trucks, buses, motorcycles, everything.

The data is public, but it’s fragmented across hundreds of batch announcements, buried in a query interface that’s entirely in Chinese, and has no bulk download option.

How the database was built

The MIIT provides a query API at service.miit-eidc.org.cn that powers its web interface. By reverse-engineering the API calls made by the search page, it’s possible to query the catalogue programmatically.

The collection process ran in two phases:

Phase 1 started with an existing seed list of known Gonggao codes and fetched their full technical specifications — dimensions, weight, engine details, VIN prefix, emission standard and so on.

Phase 2 ran a systematic scan through all two-letter prefix combinations (AA through ZZ), effectively enumerating the entire active catalogue. This discovered tens of thousands of models not in the seed list. Prefixes like CA (FAW/Jiefang) returned over 8,000 records alone.
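The Phase 2 scan can be sketched roughly as follows. This is a minimal illustration, not the collection script itself: `query_prefix` is a hypothetical callable standing in for whatever performs the real request against the MIIT query service, and its name, signature and return shape are assumptions. The demonstration uses a fake lookup so that nothing touches the network.

```python
from itertools import product
from string import ascii_uppercase
from typing import Callable, Iterable, Iterator


def two_letter_prefixes() -> Iterator[str]:
    """Yield every two-letter prefix from AA to ZZ (26 * 26 = 676 in total)."""
    for first, second in product(ascii_uppercase, repeat=2):
        yield first + second


def scan_catalogue(
    query_prefix: Callable[[str], Iterable[dict]],
    known_codes: set[str],
) -> dict[str, dict]:
    """Collect records whose model code is not already in the seed list.

    query_prefix is a hypothetical stand-in for the real lookup: it takes a
    prefix and returns records, each a dict with a "Model" key.
    """
    discovered: dict[str, dict] = {}
    for prefix in two_letter_prefixes():
        for record in query_prefix(prefix):
            code = record["Model"]
            if code not in known_codes and code not in discovered:
                discovered[code] = record
    return discovered


# Demonstration with a fake lookup instead of real network calls.
def fake_query(prefix: str) -> list[dict]:
    sample = {
        "SG": [{"Model": "SGM7102JBA1"}],
        "TS": [{"Model": "TSL7000BEVAR4"}],
    }
    return sample.get(prefix, [])


new_models = scan_catalogue(fake_query, known_codes={"SGM7102JBA1"})
print(sorted(new_models))  # only the code missing from the seed list
```

A real run would also need rate limiting, retries and pagination for large prefixes such as CA, all of which are left out here.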

The final collection covers 227,000+ vehicle records across all vehicle categories.

English make and model enrichment

The MIIT data is entirely in Chinese. Brand names like 别克(BUICK)牌 are recognisable to a Western reader, but most domestic Chinese brands are not. Vehicle types are bureaucratic category names (纯电动运动型多用途乘用车 = pure electric SUV) rather than marketing model names.

To make the data usable for an international audience, every record was passed through Claude using the Batch API, at a cost of roughly $2.50 for the full 191,000-record run. The prompt provided the Chinese brand name, vehicle type, manufacturer name, dimensions, wheelbase and VIN prefix, and asked for the English make, model name and a confidence level.

The results are stored in EnglishMake, EnglishModel and EnglishConfidence fields. For well-known brands the accuracy is very high — Tesla, Toyota, BMW, BYD, Volkswagen, Geely and hundreds of others are correctly identified with high confidence. For obscure domestic commercial vehicle manufacturers, the model name may be generic (e.g. “Heavy Duty Tipper Truck”) but the make is generally correct.
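To give a feel for the enrichment step, here is a hedged sketch of how a prompt could be assembled from one record and how a reply could be mapped onto the three English fields. The prompt wording, the JSON reply format and the function names `build_prompt` and `parse_reply` are all assumptions made for illustration; the actual prompt used for the dataset is not reproduced here, and the call to the model itself is deliberately left out.

```python
import json

ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def build_prompt(record: dict) -> str:
    """Assemble a plain-text prompt from the Chinese fields of one record."""
    dimensions = " x ".join(
        str(record.get(key, "")) for key in ("VehicleLength", "VehicleWidth", "VehicleHeight")
    )
    lines = [
        "Identify the English make and model name for this Chinese vehicle record.",
        f"Brand: {record.get('Brand', '')}",
        f"Vehicle type: {record.get('Type', '')}",
        f"Manufacturer: {record.get('CompanyName', '')}",
        f"Length x width x height (mm): {dimensions}",
        f"Wheelbase (mm): {record.get('Wheelbase', '')}",
        f"VIN prefix: {record.get('IdentificationCode', '')}",
        'Answer with JSON only, using the keys "make", "model" and "confidence" '
        '(one of "high", "medium", "low").',
    ]
    return "\n".join(lines)


def parse_reply(reply_text: str) -> dict:
    """Map a JSON reply onto the dataset's English fields."""
    data = json.loads(reply_text)
    confidence = str(data.get("confidence", "")).lower()
    if confidence not in ALLOWED_CONFIDENCE:
        confidence = "low"  # treat anything unexpected as low confidence
    return {
        "EnglishMake": data.get("make", ""),
        "EnglishModel": data.get("model", ""),
        "EnglishConfidence": confidence,
    }


# Illustrative record: the Brand value is made up, the rest is quoted from this post.
record = {
    "Brand": "特斯拉",
    "Type": "纯电动运动型多用途乘用车",
    "CompanyName": "特斯拉(上海)有限公司",
    "IdentificationCode": "LRW3E7FA×××××××××",
}
print(build_prompt(record))

# A hand-written reply standing in for real model output.
fields = parse_reply('{"make": "Tesla", "model": "Model Y", "confidence": "High"}')
print(fields)
```

Downgrading unexpected confidence values to "low" is one possible design choice; it keeps malformed replies from being mistaken for trustworthy ones.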

VIN prefix data

One of the most useful aspects of the dataset is the IdentificationCode field, which contains the VIN prefix or prefixes assigned to each approved model.

A Chinese VIN follows the global standard: the first three characters are the World Manufacturer Identifier (WMI), and characters 4–9 are the Vehicle Descriptor Section (VDS), of which characters 4–8 identify the specific model variant and the ninth is a check digit. The MIIT records the approved VIN prefix for each Gonggao entry, typically eight characters followed by placeholder characters (e.g. LRW3E7FA×××××××××).
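As a small illustration of that layout, the sketch below strips the placeholder characters from an approved pattern and splits a 17-character VIN into its leading parts. It assumes the placeholder is the "×" character shown in the example above; the function names are made up for this post, and the sample VIN is invented rather than taken from a real vehicle.

```python
PLACEHOLDER = "×"  # assumption: the placeholder character seen in the example pattern


def approved_prefix(identification_code: str) -> str:
    """Strip trailing placeholder characters from an approved VIN pattern."""
    return identification_code.strip().rstrip(PLACEHOLDER)


def split_vin(vin: str) -> dict:
    """Split a 17-character VIN into the parts used for prefix matching."""
    vin = vin.strip().upper()
    if len(vin) != 17:
        raise ValueError(f"expected 17 characters, got {len(vin)}")
    return {
        "wmi": vin[:3],          # characters 1-3: World Manufacturer Identifier
        "descriptor": vin[3:8],  # characters 4-8: model variant descriptor
        "prefix": vin[:8],       # the eight characters MIIT typically records
    }


print(approved_prefix("LRW3E7FA×××××××××"))
print(split_vin("LRW3E7FA0PC000001"))  # made-up VIN for illustration only
```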

This means the database can function as a Chinese VIN decoder: given any VIN from a Chinese-market vehicle, match the first 8–11 characters against the prefix table to identify the make, model, fuel type, dimensions, emission standard and production details.

For example:

  • LRW3E7FA → Tesla Model Y, 特斯拉(上海)有限公司, pure electric SUV, Shanghai
  • LSGEP83A → Buick sedan, 上汽通用(沈阳)北盛汽车有限公司, 999cc petrol engine, National VI emissions
  • LFV2A2AD → Volkswagen Lavida, 上汽大众汽车有限公司, 1.5L petrol
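A lookup along these lines could be sketched as a longest-prefix match over the IdentificationCode field. This is a minimal sketch under stated assumptions: the "×" placeholder, the comma, semicolon or whitespace separators between multiple prefixes, and the function names are all guesses for illustration, not a description of the file's exact format. The sample rows reuse the prefixes quoted above, and the VIN is invented.

```python
import re
from collections import defaultdict

PLACEHOLDER = "×"  # assumption: placeholder character used in approved patterns


def build_prefix_index(rows: list[dict]) -> dict[str, list[dict]]:
    """Map each approved VIN prefix to the rows that list it."""
    index: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        # Assumption: multiple prefixes are separated by commas, semicolons or spaces.
        for pattern in re.split(r"[,;\s]+", row.get("IdentificationCode", "")):
            prefix = pattern.rstrip(PLACEHOLDER).upper()
            if prefix:
                index[prefix].append(row)
    return index


def decode_vin(vin: str, index: dict[str, list[dict]], min_len: int = 8, max_len: int = 11) -> list[dict]:
    """Return all rows matching the longest known prefix of the VIN."""
    vin = vin.strip().upper()
    for length in range(max_len, min_len - 1, -1):
        matches = index.get(vin[:length])
        if matches:
            return matches  # may hold several variants sharing one prefix
    return []


rows = [
    {"EnglishMake": "Tesla", "EnglishModel": "Model Y", "IdentificationCode": "LRW3E7FA×××××××××"},
    {"EnglishMake": "Buick", "EnglishModel": "Sedan", "IdentificationCode": "LSGEP83A×××××××××"},
]
index = build_prefix_index(rows)

for match in decode_vin("LRW3E7FA0PC000001", index):  # made-up VIN
    print(match["EnglishMake"], match["EnglishModel"])
```

Trying the longest prefix first means a more specific 11-character entry wins over a shorter 8-character one when both exist. The function returns a list because several approved variants can share one prefix.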

What’s in the download

The dataset is a single UTF-8 CSV file containing 227,000+ rows with the following fields:

  • Model: Gonggao model code, the official Chinese type approval identifier
  • Make: Manufacturer name (Chinese)
  • Brand: Brand/marque name (Chinese)
  • Type: Vehicle category (Chinese)
  • EnglishMake: Manufacturer name in English (AI-resolved)
  • EnglishModel: Model name in English (AI-resolved)
  • EnglishConfidence: Confidence level of English translation (high / medium / low)
  • CompanyName: Full legal name of the manufacturing entity
  • EnterpriseAddress: Factory and production address
  • VehicleLength / VehicleWidth / VehicleHeight: Overall dimensions (mm)
  • Wheelbase: Wheelbase (mm)
  • KerbWeight: Kerb weight (kg)
  • TotalMass: Gross vehicle weight (kg)
  • MaxSpeed: Maximum speed (km/h)
  • PassengerCapacity: Approved seating capacity
  • FuelType: Fuel type (gasoline, electric, diesel, hybrid, hydrogen, etc.)
  • EmissionStandard: Emission standard (National III through National VI)
  • TireSpecs: Tyre size specifications
  • IdentificationCode: VIN prefix(es) for vehicle identification
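Reading the file with Python's standard csv module might look something like the sketch below. It assumes the numeric columns hold a single integer per cell, which may not be true for every row, so anything that fails to parse becomes None rather than raising. The sample data is invented and stands in for the real file.

```python
import csv
import io
from collections import Counter

NUMERIC_FIELDS = (
    "VehicleLength", "VehicleWidth", "VehicleHeight", "Wheelbase",
    "KerbWeight", "TotalMass", "MaxSpeed", "PassengerCapacity",
)


def to_int(value):
    """Convert a cell to int, returning None for blanks or unparseable values."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def load_rows(handle) -> list[dict]:
    """Read rows from an open CSV handle and convert the numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        for field in NUMERIC_FIELDS:
            if field in row:
                row[field] = to_int(row[field])
        rows.append(row)
    return rows


# Invented sample standing in for the real file. For the download itself,
# open it with: open(path, encoding="utf-8", newline="")
SAMPLE = """Model,FuelType,VehicleLength,Wheelbase
DEMO0001,electric,4750,2890
DEMO0002,gasoline,4600,
DEMO0003,electric,,2700
"""

rows = load_rows(io.StringIO(SAMPLE))
fuel_counts = Counter(row["FuelType"] for row in rows)
print(fuel_counts.most_common())
```

The same per-field counting works for EmissionStandard or EnglishMake when summarising the full dataset.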

Who is this for?

Automotive data businesses building VIN lookup APIs, vehicle history services, or parts fitment databases for the Chinese market. The VIN prefix data in particular is expensive to license commercially — this dataset provides a solid foundation at a fraction of the cost.

Researchers and analysts studying the Chinese automotive market — EV adoption rates, manufacturer market share, emission standard transitions, the rise of domestic brands versus joint ventures.

Parts and aftermarket businesses who need to match vehicle specifications to the correct components for Chinese-market vehicles.

Developers building applications for the Chinese automotive sector who need a structured, machine-readable vehicle reference database.

Limitations

The dataset reflects the MIIT active catalogue at the time of collection. Discontinued models that have been removed from the active catalogue are not included. The English make/model enrichment is AI-generated and should be treated as indicative for low-confidence records — the underlying Chinese fields are always authoritative. VIN prefix matching identifies the approved model family but may return multiple variants for a given prefix, as multiple trim levels can share a VIN prefix range.
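One way to respect the confidence caveat in an application is to show the English name only when the confidence is acceptable, and fall back to the Chinese fields otherwise. The sketch below is one possible policy, not a recommendation baked into the dataset; the function name, the choice of trusted levels and the sample rows are assumptions for illustration.

```python
def display_name(row: dict, trusted: tuple = ("high", "medium")) -> str:
    """Prefer the English name when confidence is trusted, else use the Chinese fields."""
    confidence = (row.get("EnglishConfidence") or "").strip().lower()
    make = (row.get("EnglishMake") or "").strip()
    model = (row.get("EnglishModel") or "").strip()
    if confidence in trusted and make:
        return f"{make} {model}".strip()
    # Fall back to the authoritative Chinese brand and vehicle type.
    return f"{row.get('Brand', '')} {row.get('Type', '')}".strip()


# Illustrative rows, not real dataset entries.
confident = {
    "Brand": "别克(BUICK)牌", "Type": "轿车",
    "EnglishMake": "Buick", "EnglishModel": "Sedan", "EnglishConfidence": "high",
}
uncertain = {
    "Brand": "别克(BUICK)牌", "Type": "纯电动运动型多用途乘用车",
    "EnglishMake": "Buick", "EnglishModel": "Electric SUV", "EnglishConfidence": "low",
}

print(display_name(confident))
print(display_name(uncertain))
```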

Get the data

The full dataset is available for download at payhip.com/b/3S6PE.

The CSV is compatible with Excel, Google Sheets, Python (pandas), R, SQL Server, MySQL and any standard data tool.
