2D to 3D Generator — Internal Code Documentation
The Shoe 3D Generator runs on the wearfits-genai-api backend, which also serves the GenAI Try-On product. This document focuses on the shoe-3d-specific code paths while describing the shared infrastructure both products rely on.
Architecture
The application follows a layered architecture optimised for the Cloudflare Workers environment, where each request must complete within the platform's CPU and wall-clock limits. Long-running AI workloads are offloaded via Cloudflare Queues to a background processor, which in turn delegates GPU-intensive tasks to a Modal.com Python backend.
graph TD
Entry[src/index.ts] --> App[src/app.ts]
App --> Middleware[src/api/middleware/]
App --> Routes[src/api/routes/]
Routes --> Controllers[src/api/controllers/]
Controllers --> Services[src/services/]
Services --> Providers[src/providers/]
Services --> Storage[src/storage/]
subgraph "Async Loop"
Queue[Cloudflare Queue] --> Consumer[src/queue/consumer.ts]
Consumer --> Processor[src/queue/processor.ts]
Processor --> ServiceLogic[src/services/]
end
Layer Responsibilities
| Layer | Path | Role |
|---|---|---|
| API | src/api/ | HTTP concerns: routing (Hono), schema validation (Zod), OpenAPI generation |
| Service | src/services/ | Business logic orchestration across providers and storage |
| Provider | src/providers/ | Normalised connectors to external AI services (Fal.ai, OpenRouter, OpenAI) |
| Queue | src/queue/ | Background job processing for long-running tasks that exceed Worker limits |
| Storage | src/storage/ | Abstractions over Cloudflare KV and R2 |
The diagram below shows the high-level system topology: how edge computing (Cloudflare Workers) relates to the external AI infrastructure (Modal, Fal.ai).
graph TD
User[Web/Mobile Client] -->|HTTP| CFW[Cloudflare Workers - Hono]
CFW -->|Auth| DB[(Prisma/PostgreSQL)]
CFW -->|Queue Job| Queue[Cloudflare Queue]
Queue -->|Process| Consumer[Queue Consumer]
Consumer -->|External AI| Fal[Fal.ai - 3D Gen]
Consumer -->|External AI| OpenRouter[OpenRouter - LLMs]
Consumer -->|Heavy GPU| Modal[Modal.com - Python Backend]
Modal -->|Body Mesh/Pose| SAM3D[SAM3D/MHR]
Consumer -->|Storage| R2[(Cloudflare R2)]
Consumer -->|Status| KV[(Cloudflare KV)]
User -->|Poll Status| CFW
Directory Structure
src/
├── api/
│ ├── routes/ # Hono route definitions using zod-openapi
│ ├── controllers/ # Route handlers; translate HTTP requests to service calls
│ ├── middleware/ # Auth, logging, rate limiting, error handling
│ └── views/ # Server-side rendered HTML (monitoring dashboard)
├── services/ # Core business logic (virtual try-on, shoe 3D, caching)
├── providers/ # Connectors for AI (Fal.ai, OpenRouter) and other APIs
├── queue/ # Queue consumer and job processing logic
├── schemas/ # Shared Zod schemas for validation and API contracts
├── storage/ # R2 and KV storage implementations and factory
└── utils/ # Generic helpers (crypto, URL, image, etc.)
Data Model
The API uses a minimal Prisma schema whose primary purpose is validating SaaS user API keys.
erDiagram
User ||--o{ ApiKey : "owns"
User {
string id PK
string name
string email
string image
datetime createdAt
}
ApiKey {
string id PK
string key "unique"
boolean isActive
datetime expiresAt
int usageCount
int maxUsage
string userId FK
}
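To make the role of the ApiKey fields concrete, here is a minimal sketch of a key check. The field names follow the diagram above; everything else is an assumption made for illustration: the function name checkApiKey, the nullable expiresAt and maxUsage, and the order of the checks are hypothetical and not taken from UserApiKeyService.

```typescript
// Field names mirror the ApiKey entity; nullability is an assumption.
interface ApiKeyRecord {
  id: string;
  key: string;
  isActive: boolean;
  expiresAt: Date | null; // assumed: null means the key never expires
  usageCount: number;
  maxUsage: number | null; // assumed: null means unlimited usage
  userId: string;
}

type KeyCheck =
  | { ok: true }
  | { ok: false; reason: "inactive" | "expired" | "quota_exceeded" };

// Hypothetical validation rule set: active, not expired, under quota.
function checkApiKey(record: ApiKeyRecord, now: Date): KeyCheck {
  if (!record.isActive) return { ok: false, reason: "inactive" };
  if (record.expiresAt !== null && record.expiresAt.getTime() <= now.getTime()) {
    return { ok: false, reason: "expired" };
  }
  if (record.maxUsage !== null && record.usageCount >= record.maxUsage) {
    return { ok: false, reason: "quota_exceeded" };
  }
  return { ok: true };
}
```

Passing the current time in as a parameter keeps the check pure, which makes it straightforward to unit test under Vitest.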
Transient state lives outside the relational database:
- Job Status: Cloudflare KV namespace WF_GENAI_JOB_STATUS
- Result Assets: Cloudflare R2 bucket wf-genai-results
- Metadata: TTL and owner hash stored as R2 object metadata alongside each result
Shoe 3D Generation Flow
The shoe-3d pipeline is a multi-stage process that moves from raw product photos through image correction, 3D generation, texture enhancement, and mesh optimisation before delivering a GLB file and webhook notification.
sequenceDiagram
participant User
participant Worker as API Worker
participant Queue as Job Queue
participant Modal as Modal Backend
participant Fal as Fal.ai
participant R2 as Storage (R2)
User->>Worker: POST /shoe-3d (submit images)
Worker->>Queue: Submit Job
Worker-->>User: 202 Accepted (Job ID)
Queue->>Worker: Consume Job
Worker->>Modal: Correct Image (optional)
Worker->>Fal: Submit 3D Generation
Fal-->>Worker: GLB Result
Worker->>Modal: Render Views & Enhance Texture (optional)
Worker->>Worker: Optimise / Decimate Mesh
Worker->>R2: Upload GLB & Preview Grid
Worker->>User: Webhook Result (job.completed)
The caller receives an immediate 202 Accepted with a job ID and is notified of completion via webhook (or can poll GET /api/v1/jobs/{id}).
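The submit half of this flow can be sketched as follows. This is an illustration under stated assumptions, not the controller code: StatusStore and JobQueue are hypothetical stand-ins for the KV and Queue bindings, and the message shape, the "queued" status value, and the function name submitShoe3dJob are all invented for the example. Only the 202 response and the /api/v1/jobs/{id} polling path come from the text above.

```typescript
// Hypothetical stand-ins for the KV namespace and Queue bindings;
// only the methods used here are modelled.
interface StatusStore {
  put(key: string, value: string): Promise<void>;
}
interface JobQueue {
  send(message: Shoe3dJobMessage): Promise<void>;
}

// Assumed message shape, for illustration only.
interface Shoe3dJobMessage {
  jobId: string;
  type: "shoe-3d";
  imageUrls: string[];
  webhookUrl?: string;
}

interface AcceptedResponse {
  status: 202;
  body: { jobId: string; statusUrl: string };
}

async function submitShoe3dJob(
  imageUrls: string[],
  webhookUrl: string | undefined,
  store: StatusStore,
  queue: JobQueue,
  newId: () => string,
): Promise<AcceptedResponse> {
  const jobId = newId();
  // Write the initial state first so a poll arriving right after the 202 finds the job.
  await store.put(jobId, JSON.stringify({ status: "queued" }));
  const message: Shoe3dJobMessage = { jobId, type: "shoe-3d", imageUrls };
  if (webhookUrl !== undefined) message.webhookUrl = webhookUrl;
  await queue.send(message);
  return { status: 202, body: { jobId, statusUrl: `/api/v1/jobs/${jobId}` } };
}
```

Writing the status record before enqueueing is a design choice in this sketch: it avoids a window in which the consumer has already started but a status poll returns nothing.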
Texture Enhancement Pipeline
Texture enhancement is triggered by options.enhanceTexture: true on the shoe-3d request. Quality is controlled by options.genaiQuality:
| Mode | genaiQuality | Views | Grid Size | Timeout |
|---|---|---|---|---|
| Default | "default" | 4-view | 2048 × 2048 | ~3 min |
| High | "high" (default) | 6-view | 2048 × 3072 | ~5 min |
4-view layout: 3 side views at 30°/150°/270° (15° elevation) + top-down → 2×2 grid
6-view layout: 4 side views at 0°/90°/180°/270° + top + bottom → 2×3 grid
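The two layouts can be written down as data, which also makes the grid sizes in the table easy to verify. The sketch below is illustrative: the type and constant names are hypothetical, it assumes grid sizes are given as width × height (so the 6-view grid is 2 columns by 3 rows), and it assumes views are placed in row-major order. The real renderer lives in the Python backend and may differ.

```typescript
type ViewSpec =
  | { kind: "side"; azimuthDeg: number }
  | { kind: "top" }
  | { kind: "bottom" };

interface GridLayout {
  columns: number;
  rows: number;
  views: ViewSpec[];
}

const VIEW_SIZE_PX = 1024; // each view is rendered at 1024×1024

// Keys match the genaiQuality values; the column/row split is an assumption.
const LAYOUTS: Record<"default" | "high", GridLayout> = {
  default: {
    columns: 2,
    rows: 2,
    views: [
      { kind: "side", azimuthDeg: 30 },
      { kind: "side", azimuthDeg: 150 },
      { kind: "side", azimuthDeg: 270 },
      { kind: "top" },
    ],
  },
  high: {
    columns: 2,
    rows: 3,
    views: [
      { kind: "side", azimuthDeg: 0 },
      { kind: "side", azimuthDeg: 90 },
      { kind: "side", azimuthDeg: 180 },
      { kind: "side", azimuthDeg: 270 },
      { kind: "top" },
      { kind: "bottom" },
    ],
  },
};

function gridSize(layout: GridLayout): { width: number; height: number } {
  return { width: layout.columns * VIEW_SIZE_PX, height: layout.rows * VIEW_SIZE_PX };
}

// Top-left pixel of the cell holding view `index` (row-major order assumed).
function cellOrigin(layout: GridLayout, index: number): { x: number; y: number } {
  if (index < 0 || index >= layout.columns * layout.rows) {
    throw new RangeError(`view index ${index} is outside the grid`);
  }
  return {
    x: (index % layout.columns) * VIEW_SIZE_PX,
    y: Math.floor(index / layout.columns) * VIEW_SIZE_PX,
  };
}
```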
Pipeline Steps
- Render views: transparent background renders (1024×1024 each) via a single shared GL context
- Compose grid: views are composited on a white background (AI models handle white better than transparency)
- Send to Gemini: grid is sent to Gemini Flash (or Pro for packshot correction in high-quality mode) via OpenRouter with reference packshot images
- Validate grid alignment: per-view silhouette IoU check; computes an N×N IoU matrix and finds the optimal view permutation. If a swap is detected (requires >0.5 total IoU improvement to avoid false positives on white shoes), views are remapped automatically
- Logo refinement (optional, options.refineLogo: true): a second AI pass focused solely on logos, brand text, and emblems, run between validation and projection so there is only one projection pass onto the original GLB
- Project back: enhanced views are projected onto the GLB texture using the shared projection algorithm (vectorised NumPy, normalDotView falloff, texture dilation)
- Color correction: three-stage correction: polynomial color mapping (K-Means swatches in Lab space), L-gamma brightness lift, and selective white boost for near-white pixels
- Auto smooth normals: angle-based normal smoothing (30° threshold); edges below the threshold receive smooth averaged normals, edges above remain sharp
- Compress GLB: texture resizing via v1-compress-glb to meet the WEARFITS 30 MB upload limit
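The grid alignment check in the steps above can be illustrated with a small sketch. The production code is Python/NumPy on Modal; this version is written in TypeScript only to match the rest of the examples, and it makes several assumptions: silhouettes are flat binary masks of equal length, the optimal permutation is found by brute force (reasonable for N ≤ 6, where there are at most 720 candidates), and the 0.5 threshold is applied to the summed IoU gain over the identity mapping. All function names are hypothetical.

```typescript
type Mask = Uint8Array; // 1 = foreground, 0 = background

// Intersection over union of two binary silhouettes.
function iou(a: Mask, b: Mask): number {
  let inter = 0;
  let union = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const av = (a[i] ?? 0) !== 0;
    const bv = (b[i] ?? 0) !== 0;
    if (av && bv) inter++;
    if (av || bv) union++;
  }
  return union === 0 ? 0 : inter / union;
}

// matrix[r][e] = IoU between rendered view r and enhanced view e.
function iouMatrix(rendered: Mask[], enhanced: Mask[]): number[][] {
  return rendered.map((r) => enhanced.map((e) => iou(r, e)));
}

// All orderings of 0..n-1.
function permutations(n: number): number[][] {
  const result: number[][] = [];
  const walk = (prefix: number[], remaining: number[]): void => {
    if (remaining.length === 0) {
      result.push(prefix);
      return;
    }
    remaining.forEach((value, idx) => {
      walk([...prefix, value], remaining.filter((_, j) => j !== idx));
    });
  };
  walk([], Array.from({ length: n }, (_, i) => i));
  return result;
}

// Returns mapping[r] = index of the enhanced view assigned to rendered view r.
// The identity mapping is kept unless a permutation improves total IoU by more than minGain.
function bestViewMapping(matrix: number[][], minGain = 0.5): number[] {
  const score = (perm: number[]): number =>
    perm.reduce((sum, col, row) => sum + (matrix[row]?.[col] ?? 0), 0);
  const identity = Array.from({ length: matrix.length }, (_, i) => i);
  const identityScore = score(identity);
  let best = identity;
  let bestScore = identityScore;
  for (const perm of permutations(matrix.length)) {
    const s = score(perm);
    if (s > bestScore) {
      best = perm;
      bestScore = s;
    }
  }
  return bestScore - identityScore > minGain ? best : identity;
}
```

The gain threshold is what guards against spurious remaps: when silhouettes are nearly identical across views, many permutations score almost the same, and a small numerical advantage should not trigger a swap.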
Modal Backend (Python)
Seven endpoints are deployed to Modal.com and run on T4 GPU hardware with ModernGL + EGL:
| Endpoint | URL Slug | Description |
|---|---|---|
| Pose Transfer | v1-pose-transfer | MHR pose transfer |
| Body Mask from Size | v1-body-mask-from-size | Nearest-neighbour lookup against BodyM dataset |
| Image Resize | v1-image-resize | Resize and format conversion |
| Compose Grid | v1-compose-grid | Garment grid composition |
| Render GLB | v1-render-glb | Render GLB from 6 angles as a grid image (GPU) |
| Texture Enhance GLB | v1-texture-enhance-glb | Full AI texture enhancement pipeline (GPU) |
| Compress GLB | v1-compress-glb | Auto smooth normals + texture resize |
Base URL pattern: https://wearfits--{endpoint}.modal.run
GPU rendering uses GPURenderWorker backed by ModernGL with EGL. GL contexts are shared across views within a single request to avoid redundant context creation and destruction.
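On the worker side, the base URL pattern lends itself to a small typed helper. The slugs below are copied from the endpoint table; the helper names modalEndpointUrl and isModalSlug are hypothetical and may not match what the providers layer actually exposes.

```typescript
// Slugs taken from the endpoint table above.
const MODAL_SLUGS = [
  "v1-pose-transfer",
  "v1-body-mask-from-size",
  "v1-image-resize",
  "v1-compose-grid",
  "v1-render-glb",
  "v1-texture-enhance-glb",
  "v1-compress-glb",
] as const;

type ModalSlug = (typeof MODAL_SLUGS)[number];

// Builds the endpoint URL from the documented pattern.
function modalEndpointUrl(slug: ModalSlug): string {
  return `https://wearfits--${slug}.modal.run`;
}

// Runtime guard for slugs that arrive as plain strings (for example from config).
function isModalSlug(value: string): value is ModalSlug {
  return (MODAL_SLUGS as readonly string[]).includes(value);
}
```

Typing the slug as a union means a misspelled endpoint name fails at compile time rather than as a 404 from Modal.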
Tech Stack
| Component | Technology |
|---|---|
| API framework | Hono with zod-openapi |
| Runtime | Cloudflare Workers (TypeScript) |
| Database | Prisma (PostgreSQL) via Prisma Accelerate |
| Storage | Cloudflare R2 (assets) + KV (job status) |
| Queues | Cloudflare Queues |
| 3D processing | @gltf-transform/core, meshoptimizer |
| GPU backend | Modal.com (Python 3.10+), ModernGL + EGL, T4 GPU |
| AI providers | Fal.ai (3D generation), OpenRouter / Gemini (texture enhancement) |
| Testing | Vitest with @cloudflare/vitest-pool-workers |
Resilience
524 Timeout Handling
Cloudflare imposes a ~100 s origin timeout. The 6-view texture enhancement pipeline regularly exceeds this, causing a 524 error to be returned to the worker even though Modal continues executing.
Resolution strategy:
- The worker passes a job_id to Modal before the request is dispatched
- Modal always writes status.json to R2 at pose-transfer/texture-enhance-{job_id}-status/result.json upon completion
- On a 524 or timeout, the worker begins polling R2 for status.json every 10 seconds
- When status.json appears, it is treated identically to a successful HTTP response (same payload structure)
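The resolution strategy above can be sketched as a call with an R2 polling fallback. This is a simplified illustration, not the worker's code: the FallbackDeps interface, the function names, and the cap of 30 polls are assumptions, and any thrown error is treated as a timeout, which a real implementation would narrow. The R2 key and the 10-second interval come from the list above.

```typescript
// Hypothetical dependencies, injected so the logic can be tested without network access.
interface FallbackDeps {
  callModal: (jobId: string) => Promise<{ status: number; body: string }>;
  readR2Text: (key: string) => Promise<string | null>; // null while the object does not exist
  sleep: (ms: number) => Promise<void>;
}

// R2 key as documented above.
function statusKey(jobId: string): string {
  return `pose-transfer/texture-enhance-${jobId}-status/result.json`;
}

async function enhanceWithFallback(
  jobId: string,
  deps: FallbackDeps,
  pollIntervalMs = 10000,
  maxPolls = 30, // assumed cap, not taken from the source
): Promise<unknown> {
  let response: { status: number; body: string } | null = null;
  try {
    response = await deps.callModal(jobId);
  } catch {
    // Simplification: any thrown error is treated as a timeout and falls through to polling.
  }
  if (response !== null && response.status >= 200 && response.status < 300) {
    return JSON.parse(response.body);
  }
  if (response !== null && response.status !== 524) {
    throw new Error(`Modal request failed with status ${response.status}`);
  }
  // 524 or timeout: Modal may still be running, so wait for the status file.
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const text = await deps.readR2Text(statusKey(jobId));
    if (text !== null) return JSON.parse(text); // same payload shape as the HTTP response
    await deps.sleep(pollIntervalMs);
  }
  throw new Error(`No status file for job ${jobId} after ${maxPolls} polls`);
}
```

Because both paths return the parsed payload, callers do not need to know whether the result arrived over HTTP or via R2.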
Fal.ai 504 Handling
Fal.ai occasionally reports a job as COMPLETED but returns 502/503/504 (downstream_service_unavailable) when the worker fetches the result — indicating the downstream worker lost the output.
The fal-3d.ts provider distinguishes RESULT_UNAVAILABLE (gateway error, re-submittable) from RESULT_FETCH_FAILED (other fetch failure). On RESULT_UNAVAILABLE, the shoe-3d polling loop automatically re-submits the same generation (reusing the fal CDN image URL, no re-upload) up to 2 times using withTracedRetry. Submit options are shared between the original and retry submissions to avoid configuration drift.
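A minimal sketch of the re-submit behaviour follows. It uses a plain loop rather than withTracedRetry, whose signature is not shown in this document, and the SubmitOptions shape, the GenerationOutcome type, and the function names are hypothetical. The two error codes and the limit of 2 re-submissions come from the paragraph above.

```typescript
type FalFetchError = "RESULT_UNAVAILABLE" | "RESULT_FETCH_FAILED";

// Gateway errors mean the output was lost downstream and the job can be re-submitted.
function classifyResultFetch(httpStatus: number): FalFetchError {
  return httpStatus === 502 || httpStatus === 503 || httpStatus === 504
    ? "RESULT_UNAVAILABLE"
    : "RESULT_FETCH_FAILED";
}

// Hypothetical shapes for illustration.
interface SubmitOptions {
  imageUrl: string; // fal CDN URL, reused on every re-submit so no re-upload is needed
}
type GenerationOutcome =
  | { ok: true; glbUrl: string }
  | { ok: false; httpStatus: number };

async function generateWithResubmit(
  options: SubmitOptions,
  submitAndWait: (options: SubmitOptions) => Promise<GenerationOutcome>,
  maxResubmits = 2,
): Promise<string> {
  for (let attempt = 0; attempt <= maxResubmits; attempt++) {
    // The same options object is passed every time to avoid configuration drift.
    const outcome = await submitAndWait(options);
    if (outcome.ok) return outcome.glbUrl;
    const kind = classifyResultFetch(outcome.httpStatus);
    if (kind === "RESULT_FETCH_FAILED") throw new Error(kind);
    // RESULT_UNAVAILABLE: loop around and re-submit.
  }
  throw new Error("RESULT_UNAVAILABLE");
}
```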
Known Technical Debt
High Priority
- Usage tracking: UserApiKeyService does not yet increment usageCount in the database after successful key validation (src/services/user-api-key-service.ts).
- Multi-material textures: The GPU renderer in Modal handles multi-material GLBs incorrectly: all meshes share the first texture (tools/pose-transfer/gpu_uv_renderer.py).
Medium Priority
- OpenRouter model IDs: Model IDs are hardcoded and may change without notice; consider making them configurable via KV or environment variables.
- Webhook scaling: WebhookService may hit memory limits when handling large metadata objects.
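For the model ID item, one possible shape for the environment-variable route is sketched below. The helper name, the variable name, and the fallback value in the usage note are placeholders invented for illustration; none of them exist in the codebase.

```typescript
// Reads a model ID from an environment-like record, falling back to a
// built-in default when the variable is unset or blank.
function resolveModelId(
  env: Record<string, string | undefined>,
  envVar: string,
  fallback: string,
): string {
  const value = env[envVar]?.trim();
  return value !== undefined && value.length > 0 ? value : fallback;
}
```

A call such as resolveModelId(env, "TEXTURE_ENHANCE_MODEL_ID", "placeholder/model-id") would then let an operator switch models without a code change, while the hardcoded value remains as the default.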
Maintenance
- Localhost CORS: Explicit localhost CORS allowance in src/app.ts should be removed after the testing period.
- Direct Gemini API: PhotoAnalysisService currently routes through OpenRouter; implementing direct Gemini API calls would reduce external dependency.