Serverless RFID Tracking API Platform

Fully serverless API platform on AWS that ingests RFID reads from 2,500 antennas across 350+ facilities in 50+ countries into Aurora Serverless v2, with sub-minute Multi-AZ failover, an ingest ceiling of 200k requests / 5 min per source IP, and headroom to scale 10x, at ~$260–$360/month.

Overview

This is the API platform that sits between 2,500 RFID antennas deployed across 350+ facilities in 50+ countries and the downstream consumers who need to query that data — operational systems, reporting tools, partner integrations. The antennas write into an upstream EPCIS event store; that event store fans out to multiple targets, one of which is this platform.

The purpose of this layer is to take a high-volume, append-only event stream and turn it into a curated, deduplicated, enriched, query-optimised database that consumers can hit with a simple authenticated REST call. The upstream EPCIS keeps the immutable raw log. This platform keeps the version that's actually useful to applications — every raw read augmented with the business context that makes it queryable: asset identifier, origin and destination metadata, facility name, country, operator, read-point type.

Three production constraints shaped every decision:

High uptime on the ingest endpoint. The upstream sender retries for ~10 minutes and then marks events as permanently failed. Anything below ~99.9% uptime creates manual operational debt for someone else's team.
Volume headroom. 100,000 reads/day today, with realistic growth to millions/day as the deployment footprint expands. The platform was sized for 10x current peak with no architectural changes.
EU data residency. Reads include facility location data subject to GDPR. All processing, storage, and backups stay inside Frankfurt. No cross-region replication.

Every tier of the platform is serverless and elastic. API Gateway, Lambda, SQS, and Aurora Serverless v2 each scale independently in response to load. There are no EC2 instances on the data path, no clusters to resize, no manual capacity planning. The same configuration that handles 100k events/day today handles 1M events/day with the same code, the same templates, and the same operational model.
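As a rough cross-check of the volume figures above, the daily totals can be converted into average per-second rates and compared with the WAF ceiling. This is only back-of-the-envelope arithmetic: it assumes traffic spread evenly over 24 hours, whereas real traffic is burstier, so short-window rates will sit above these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_rate_per_sec(events_per_day: float) -> float:
    """Average events per second if the daily volume were spread evenly."""
    return events_per_day / SECONDS_PER_DAY


current = avg_rate_per_sec(100_000)    # today's volume
target = avg_rate_per_sec(1_000_000)   # the 10x growth case
waf_cap = 200_000 / (5 * 60)           # WAF rate-based rule, per source IP

print(f"current avg: {current:.2f}/s")  # current avg: 1.16/s
print(f"10x avg:     {target:.2f}/s")   # 10x avg:     11.57/s
print(f"WAF ceiling: {waf_cap:.0f}/s")  # WAF ceiling: 667/s
```

Even the 10x case averages well under 2% of the per-source WAF ceiling, which is the margin that absorbs bursts and upstream retries.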

Architecture

The platform runs in a single AWS region — eu-central-1 (Frankfurt) — across three availability zones. The Aurora cluster runs Multi-AZ (writer + reader in separate AZs); both API tiers use regional API Gateway endpoints with no edge dependency. Data never leaves Frankfurt.

A bastion EC2 with SSM Session Manager provides break-glass access into the private subnets — no SSH keys, no public bastion port, all access audited via CloudTrail.

Ingest Path

The ingest path is asynchronous end-to-end:

The upstream sender POSTs each RFID read (single event, ~250 bytes) to the regional API Gateway endpoint over HTTPS with HTTP Basic Auth.
AWS WAF applies a rate-based rule (200,000 requests / 5 min per source IP) and AWS Managed Rules. Throttled requests get a custom 429 Too Many Requests response — chosen specifically because the upstream sender retries 429 automatically but treats 403 as a permanent failure.
API Gateway invokes a Lambda authorizer that validates the partner credentials against a bcrypt hash stored in Secrets Manager. The IAM policy it returns is cached for 5 minutes.
The Ingest Lambda validates the event shape (accepting both single events and batches) and enqueues the payload onto an SQS Standard queue. The Lambda runs outside the VPC for cold-start performance and returns 202 Accepted in tens of milliseconds.
A Worker Lambda (in VPC, reserved concurrency) drains the queue in batches, connects to the database via RDS Proxy, and bulk-inserts the rows. PostgreSQL AFTER INSERT triggers then perform three steps atomically, in the same transaction as the insert: deduplication (per tag and read point, keep first + latest), fan-out to per-domain partitioned tables, and enrichment via joins against reference data.
Failed messages flow to a Dead-Letter Queue with a CloudWatch alarm at depth ≥ 1.
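The authorizer step in the list above can be sketched roughly as follows. This is a minimal illustration, not the platform's code: it assumes a TOKEN-type authorizer event (authorizationToken and methodArn), and the names parse_basic_auth, build_policy, authorize and demo_verify are hypothetical. The demo verifier uses a SHA-256 comparison only so the sketch runs on the standard library; the real platform compares against a bcrypt hash, for example with bcrypt.checkpw.

```python
import base64
import hashlib
import hmac
from typing import Callable


def parse_basic_auth(header: str) -> tuple[str, str]:
    """Split a 'Basic <base64>' header value into (user, password)."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    user, sep, password = decoded.partition(":")
    if not sep:
        raise ValueError("missing ':' separator")
    return user, password


def build_policy(principal: str, effect: str, method_arn: str) -> dict:
    """Build the IAM policy document API Gateway expects from an authorizer."""
    return {
        "principalId": principal,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": method_arn}
            ],
        },
    }


def authorize(event: dict, verify: Callable[[str, str], bool]) -> dict:
    """Return Allow only when the header parses and the credentials verify."""
    try:
        user, password = parse_basic_auth(event.get("authorizationToken", ""))
    except ValueError:  # also covers bad base64 and bad UTF-8
        return build_policy("anonymous", "Deny", event["methodArn"])
    effect = "Allow" if verify(user, password) else "Deny"
    return build_policy(user, effect, event["methodArn"])


def demo_verify(user: str, password: str) -> bool:
    """Stand-in verifier (assumption): production would check a bcrypt hash."""
    known = {"partner-a": hashlib.sha256(b"s3cret").hexdigest()}
    digest = hashlib.sha256(password.encode()).hexdigest()
    return hmac.compare_digest(known.get(user, ""), digest)


arn = "arn:aws:execute-api:eu-central-1:123456789012:abc123/prod/POST/reads"
token = "Basic " + base64.b64encode(b"partner-a:s3cret").decode()
result = authorize({"authorizationToken": token, "methodArn": arn}, demo_verify)
print(result["policyDocument"]["Statement"][0]["Effect"])  # Allow
```

Because API Gateway caches the returned policy for the configured TTL, the comparatively slow bcrypt check would run at most once per credential per cache window rather than once per read.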

Idempotency is enforced at the database with a unique constraint and INSERT ... ON CONFLICT DO NOTHING. Duplicate sends from the upstream — common after transient retries — become silent no-ops.
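The idempotency rule can be demonstrated in a few lines. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL, since both accept INSERT ... ON CONFLICT DO NOTHING. The table layout and the choice of (tag_id, reader_id, read_at) as the unique key are assumptions for illustration; the source does not state which columns the constraint covers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reads (
        tag_id    TEXT NOT NULL,
        reader_id TEXT NOT NULL,
        read_at   TEXT NOT NULL,
        UNIQUE (tag_id, reader_id, read_at)  -- assumed natural key
    )
    """
)


def bulk_insert(rows: list[tuple[str, str, str]]) -> int:
    """Insert rows in one transaction; return how many were actually new."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO reads (tag_id, reader_id, read_at) VALUES (?, ?, ?) "
            "ON CONFLICT DO NOTHING",
            rows,
        )
    return conn.total_changes - before


batch = [
    ("tag-1", "reader-9", "2024-05-01T10:00:00Z"),
    ("tag-2", "reader-9", "2024-05-01T10:00:01Z"),
]
first = bulk_insert(batch)    # both rows are new
second = bulk_insert(batch)   # upstream retry: same rows again
print(first, second)  # 2 0
```

The second call succeeds without error and changes nothing, which is what lets the Worker treat a redelivered SQS batch as safe to replay.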

Read Path

The read path is synchronous and short:

Consumers call the read API over HTTPS with an x-api-key header.
AWS WAF filters incoming requests, and API Gateway enforces rate limiting through a Usage Plan (per-key throttle and quota).
The Read Lambda (in VPC) connects to the Aurora reader endpoint via RDS Proxy and returns cursor-paginated results.

Splitting writes (Worker → writer) and reads (Read Lambda → reader) means BI workloads and high-volume report queries can't slow ingestion down.

Data Enrichment Layer

The platform's core value isn't storing reads — it's transforming them. Raw events arriving from the upstream sender carry only tag_id, reader_id, location_string, and timestamp. Consumers need business-level context: which asset is this, where did it originate, where is it headed, what kind of facility is the reader at, what country is that facility in.

Enrichment runs as PostgreSQL trigger logic in the same transaction as the insert. The triggers join the incoming row against four classes of reference data:

Asset binding tables — map RFID tag identifiers to the business-level asset they're attached to.
Facility reference tables — map facility codes to human-readable facility names, operator names, country codes, function types.
Location reference tables — map reader identifiers to facility codes and geographic data.
Code list tables — ISO country codes, type codes, status codes — the small lookup tables that turn opaque identifiers into readable values.

Reference data is refreshed on a schedule (EventBridge → Lambda → bulk reload), versioned in SQL migrations alongside the schema, and stored in the same Aurora cluster so that joins are local and need no extra network hop. A single SELECT against the enriched table returns 15–20 columns of pre-joined context, not the 4 columns of the raw event.
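The trigger-based enrichment described above can be illustrated with a small runnable model. The sketch uses sqlite3 in place of PostgreSQL (where the logic would live in a PL/pgSQL trigger function), and every table and column name other than the four raw event fields is a hypothetical simplification. It shows the two properties that matter: the enriched row is produced by joins inside an AFTER INSERT trigger, and a rollback removes the raw and enriched rows together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Reference data (simplified, hypothetical schema)
    CREATE TABLE asset_bindings (tag_id TEXT PRIMARY KEY, asset_id TEXT);
    CREATE TABLE locations (reader_id TEXT PRIMARY KEY, facility_code TEXT);
    CREATE TABLE facilities (facility_code TEXT PRIMARY KEY,
                             facility_name TEXT, country_code TEXT);
    CREATE TABLE countries (country_code TEXT PRIMARY KEY, country_name TEXT);

    CREATE TABLE raw_reads (tag_id TEXT, reader_id TEXT,
                            location_string TEXT, read_at TEXT);
    CREATE TABLE enriched_reads (tag_id TEXT, reader_id TEXT, read_at TEXT,
                                 asset_id TEXT, facility_name TEXT, country_name TEXT);

    -- Enrichment runs inside the inserting transaction.
    CREATE TRIGGER enrich_read AFTER INSERT ON raw_reads
    BEGIN
        INSERT INTO enriched_reads
            (tag_id, reader_id, read_at, asset_id, facility_name, country_name)
        SELECT NEW.tag_id, NEW.reader_id, NEW.read_at,
               a.asset_id, f.facility_name, c.country_name
        FROM (SELECT 1) AS one
        LEFT JOIN asset_bindings a ON a.tag_id = NEW.tag_id
        LEFT JOIN locations l ON l.reader_id = NEW.reader_id
        LEFT JOIN facilities f ON f.facility_code = l.facility_code
        LEFT JOIN countries c ON c.country_code = f.country_code;
    END;

    INSERT INTO asset_bindings VALUES ('tag-1', 'ASSET-42');
    INSERT INTO locations VALUES ('reader-9', 'HAM1');
    INSERT INTO facilities VALUES ('HAM1', 'Hamburg Hub', 'DE');
    INSERT INTO countries VALUES ('DE', 'Germany');
    """
)

insert_sql = "INSERT INTO raw_reads VALUES (?, ?, ?, ?)"
conn.execute(insert_sql, ("tag-1", "reader-9", "dock-3", "2024-05-01T10:00:00Z"))
conn.commit()
enriched = conn.execute("SELECT * FROM enriched_reads").fetchone()
print(enriched)
# ('tag-1', 'reader-9', '2024-05-01T10:00:00Z', 'ASSET-42', 'Hamburg Hub', 'Germany')

# A rolled-back insert leaves no half-state in either table.
conn.execute(insert_sql, ("tag-1", "reader-9", "dock-3", "2024-05-01T10:05:00Z"))
conn.rollback()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_reads").fetchone()[0]
enriched_count = conn.execute("SELECT COUNT(*) FROM enriched_reads").fetchone()[0]
print(raw_count, enriched_count)  # 1 1
```

The LEFT JOINs are a deliberate choice in this sketch: a read for a tag with no binding yet still lands in the enriched table with NULL context instead of being dropped.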

Database Layer

Aurora Serverless v2, PostgreSQL 16, Multi-AZ (writer in one AZ, reader in another). ACU range: min 1, max 16 per instance, so 2 ACU in total at idle, scaling automatically up to 32 ACU under load. Storage grows automatically with the data, up to Aurora's cluster volume limit. Encrypted at rest with a customer-managed KMS key, 28-day point-in-time recovery, deletion protection enabled at both cluster and instance level.

The reads tables are partitioned by month from day one. A scheduled retention Lambda automatically drops partitions older than 24 months, which keeps storage costs predictable and leaves a clean audit trail for GDPR deletion requests.
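The selection step of the retention job could be sketched as below. The partition naming scheme reads_YYYY_MM and the function names are assumptions, as is the boundary rule (a partition is dropped once its month is more than 24 whole months before the current month). The real Lambda would then issue one DROP TABLE per returned name.

```python
from datetime import date


def month_index(d: date) -> int:
    """Number of months since year 0, so months can be compared as integers."""
    return d.year * 12 + (d.month - 1)


def partitions_to_drop(existing: list[str], today: date, keep_months: int = 24) -> list[str]:
    """Return partition names whose month falls outside the retention window."""
    cutoff = month_index(today) - keep_months
    expired = []
    for name in existing:
        year, month = name.removeprefix("reads_").split("_")
        if month_index(date(int(year), int(month), 1)) < cutoff:
            expired.append(name)
    return sorted(expired)


existing = ["reads_2023_04", "reads_2023_05", "reads_2023_06", "reads_2025_06"]
expired = partitions_to_drop(existing, today=date(2025, 6, 15))
print(expired)  # ['reads_2023_04', 'reads_2023_05']
```

Dropping a whole partition is a metadata operation, so it avoids the long-running DELETE and the table bloat that row-by-row retention would cause.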

RDS Proxy sits in front of the cluster, configured to pool up to 90% of Aurora's max connections. This solves the classic Lambda + RDS problem in which each cold start opens a fresh connection and a burst exhausts the database. There are two database roles, write and read-only, with credentials stored and automatically rotated in Secrets Manager.

Operational Tooling

Bastion EC2 with SSM Session Manager for ad-hoc queries and emergency access.
EventBridge schedules retention and rotation Lambdas for monthly partition cleanup and DB credential rotation.
CloudWatch dashboards track ingest request rate, WAF allowed/blocked counts, Aurora ACU, RDS Proxy connection usage, SQS depth, DLQ depth, Lambda errors, and 5xx rates — with horizontal threshold annotations on every chart.
SNS topic wires CloudWatch alarms (DLQ depth, worker errors, Aurora ACU near max, ingest 5xx) to email notifications.
All Lambda log groups have explicit 30-day retention.

Deployment

The entire platform is defined in AWS SAM templates committed to GitHub, organised into three independent stacks:

infra — VPC, subnets, security groups, Aurora cluster, RDS Proxy, KMS key, base secrets.
ingest — API Gateway, WAF, Authorizer, Ingest Lambda, SQS, DLQ, Worker, alarms.
read — API Gateway, WAF, Read Lambda, usage plans, API keys.

Each stack deploys to -dev and -prod independently. Dev and prod share the same Aurora cluster (different logical databases) — dev costs are minimal because Lambda, API Gateway, and SQS are pay-per-use.

AWS Services

Service	Purpose
API Gateway (REST)	Public HTTPS endpoints for ingest and read APIs
AWS WAF	Rate limiting (200k / 5 min / IP), AWS Managed Rules, custom 429 response
Lambda	Authorizer, Ingest, Worker, Read, Rotation/Retention handlers
SQS Standard	Async buffer between Ingest API and Worker
SQS DLQ	Failed-message capture with depth alarm
Aurora Serverless v2	PostgreSQL 16, Multi-AZ, 1–16 ACU per instance
RDS Proxy	Connection pooling for Lambda → Aurora
Secrets Manager	Partner credentials, DB role secrets, automatic rotation
KMS	Customer-managed key for Aurora, SQS, Secrets Manager, and CloudWatch Logs
EventBridge	Schedules retention and rotation Lambdas
CloudWatch	Logs (30-day retention), metrics, dashboards, alarms
SNS	Alarm notifications
EC2 (Bastion)	Break-glass access via SSM Session Manager
Systems Manager	Audited shell access without SSH keys
VPC + Endpoints	Private subnets, interface endpoints for SQS and Secrets Manager
IAM	Least-privilege roles per Lambda
X-Ray	Distributed tracing across Lambda and API Gateway
CloudFormation / SAM	Infrastructure as code, three independent stacks per environment
GitHub	Source of truth, deployments via SAM CLI

Key Design Decisions

Single region, not multi-region. A multi-region active-passive setup was on the table. We chose single-region with Multi-AZ instead. Single-region eliminates a class of replication-lag bugs, halves the operational surface, satisfies EU data residency cleanly, and recovers from any AZ failure in under a minute. The compute savings funded a more generous Aurora ACU ceiling.

Async ingest with SQS buffer. The ingest Lambda's only job is to validate and enqueue. Database hiccups, partition rebuilds, and worker restarts can never cause an ingest 5xx — and the upstream sender's 10-minute retry budget is never put at risk by a slow query.
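A minimal sketch of that validate-and-enqueue handler follows. The four required field names come from the raw event description earlier in this document; everything else (normalise, handle, the injected enqueue callable, the response bodies) is hypothetical. In production, enqueue would presumably wrap the SQS client's send_message call; here it is injected so the sketch runs without AWS access.

```python
import json
from typing import Callable

REQUIRED_FIELDS = ("tag_id", "reader_id", "location_string", "timestamp")


def normalise(body: str) -> list[dict]:
    """Parse the request body; accept one event or a list of events."""
    payload = json.loads(body)
    events = payload if isinstance(payload, list) else [payload]
    if not events:
        raise ValueError("empty batch")
    for event in events:
        if not isinstance(event, dict):
            raise ValueError("each event must be a JSON object")
        missing = [field for field in REQUIRED_FIELDS if not event.get(field)]
        if missing:
            raise ValueError(f"missing fields: {', '.join(missing)}")
    return events


def handle(event: dict, enqueue: Callable[[str], None]) -> dict:
    """Validate, enqueue, and acknowledge; no database work happens here."""
    try:
        events = normalise(event.get("body") or "")
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    enqueue(json.dumps(events))
    return {"statusCode": 202, "body": json.dumps({"accepted": len(events)})}


queue: list[str] = []  # stand-in for the SQS queue
read = {
    "tag_id": "tag-1",
    "reader_id": "reader-9",
    "location_string": "dock-3",
    "timestamp": "2024-05-01T10:00:00Z",
}
ok = handle({"body": json.dumps(read)}, queue.append)
rejected = handle({"body": json.dumps({"tag_id": "tag-1"})}, queue.append)
print(ok["statusCode"], rejected["statusCode"], len(queue))  # 202 400 1
```

Because the handler never touches the database, its latency and error rate depend only on JSON parsing and one queue write, which is what keeps the ingest endpoint insulated from anything happening downstream.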

Dedup, fan-out, and enrichment in PostgreSQL triggers, not application code. The transformation pipeline runs as AFTER INSERT trigger logic in the same transaction as the insert. There is no path to a half-state where one table has the row and another doesn't, and no path to an enriched row missing its joins. All logic is versioned in SQL migrations and deployed alongside the schema.
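The "keep first + latest" deduplication rule can be stated compactly in code. The following is only a Python model of the rule, not the trigger itself, and it assumes one plausible reading: for each (tag, read point) pair, the platform retains the earliest and the most recent read time and discards the reads in between. Timestamps are assumed to be ISO 8601 UTC strings in a single format, which compare correctly as plain strings.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    first_seen: str
    last_seen: str


def apply_read(
    state: dict[tuple[str, str], Sighting],
    tag_id: str,
    reader_id: str,
    read_at: str,
) -> None:
    """Fold one read into the per-(tag, read point) first/latest summary."""
    key = (tag_id, reader_id)
    current = state.get(key)
    if current is None:
        state[key] = Sighting(first_seen=read_at, last_seen=read_at)
    else:
        # min/max make the result independent of arrival order.
        current.first_seen = min(current.first_seen, read_at)
        current.last_seen = max(current.last_seen, read_at)


state: dict[tuple[str, str], Sighting] = {}
for ts in ("2024-05-01T10:00:00Z", "2024-05-01T10:00:05Z", "2024-05-01T10:00:02Z"):
    apply_read(state, "tag-1", "dock-3", ts)
print(state[("tag-1", "dock-3")])
# Sighting(first_seen='2024-05-01T10:00:00Z', last_seen='2024-05-01T10:00:05Z')
```

A tag sitting in front of an antenna can generate many reads per minute; collapsing them to two timestamps per read point preserves arrival and most recent presence without storing every intermediate read.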

WAF returns 429, not 403, for throttled requests. The upstream sender's retry logic treats 429 as transient (auto-retried) but 403 as permanent (marked failed, manual intervention required). Returning the correct code preserves zero-touch operations during traffic bursts.

RDS Proxy is non-negotiable for Lambda + Aurora. Without it, every cold start of a database-facing Lambda would open a fresh database connection, and a sustained burst near the 200k req / 5 min ceiling could exhaust Aurora's connections in seconds. RDS Proxy multiplexes those invocations onto a small pool and adds no perceptible latency.

Aurora Serverless v2, not provisioned. The workload is bursty. Provisioned would either over-pay at idle or under-provision the burst. Serverless v2 with min 1 ACU per instance pays a small idle premium for an order-of-magnitude burst headroom.

Fully serverless across every tier. Every component scales on its own — API Gateway on request count, Lambda on concurrency, SQS on queue depth, Aurora Serverless v2 on ACU. No fixed-capacity instances anywhere on the data path. The only ceiling is Aurora's max ACU, set generously today and raisable in a one-line SAM parameter change.

Results

Metric	Value
Throughput cap (per source IP)	200,000 requests / 5 min (~667 req/s)
Current sustained load	~28 events/sec, ~100k events/day
Headroom for growth	~10x current peak with no architectural change
Multi-AZ failover	< 1 minute on AZ event
Idempotency	Duplicate sends are silent no-ops at DB level
Auto-retry compatibility	429 response triggers upstream auto-retry
Data retention	24-month rolling, monthly partition auto-drop
Recovery point objective	5 min (Aurora continuous backup)
Recovery time objective	< 5 min (Multi-AZ failover + RDS Proxy reconnect)

Cost (typical load): ~$260–$360/month, broken down as ~$200–$300 Aurora compute (Multi-AZ idle to typical), ~$50 application stack (Lambda, API Gateway, SQS, WAF, VPC endpoints, alarms, Secrets Manager), ~$5 storage. Dev environment adds ~$5/month because all application-tier services are pay-per-use.
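The Aurora line in that breakdown can be sanity-checked with simple arithmetic. The per-ACU-hour rate below is an illustrative assumption, not a quoted price (check current AWS pricing for eu-central-1), and 730 is the conventional average number of hours in a month.

```python
HOURS_PER_MONTH = 730
ACU_HOUR_USD = 0.14  # ASSUMPTION: illustrative rate, not a quoted AWS price


def aurora_monthly(avg_total_acu: float, rate: float = ACU_HOUR_USD) -> float:
    """Monthly Aurora compute cost for a given average total ACU."""
    return avg_total_acu * HOURS_PER_MONTH * rate


idle = aurora_monthly(2.0)     # writer + reader, each at the 1 ACU minimum
total_low = 200 + 50 + 5       # Aurora (low) + application stack + storage
total_high = 300 + 50 + 5      # Aurora (typical) + application stack + storage

print(f"Aurora at idle: ~${idle:.0f}/month")                 # ~$204/month
print(f"Component total: ~${total_low} to ~${total_high}")   # ~$255 to ~$355
```

Under that assumed rate, two instances idling at 1 ACU each land at the bottom of the quoted ~$200–$300 Aurora range, and the three components sum to roughly the quoted monthly total once rounded.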

The platform has been processing live traffic with zero ingest 5xx attributable to the platform itself, sub-100ms p95 ingest latency, and zero partner credential incidents. A 24-hour stress test at 20x normal throughput drove Aurora to 4 ACU and Lambda to single-digit concurrent invocations — well below every alarm threshold.