2D to 3D Generator — Internal Code Documentation

The Shoe 3D Generator runs on the wearfits-genai-api backend, which also serves the GenAI Try-On product. This document focuses on the shoe-3d-specific code paths while describing the shared infrastructure both products rely on.


Architecture

The application follows a layered architecture optimised for the Cloudflare Workers environment, where each request must complete within the platform's CPU and wall-clock limits. Long-running AI workloads are offloaded via Cloudflare Queues to a background processor, which in turn delegates GPU-intensive tasks to a Modal.com Python backend.

```mermaid
graph TD
    Entry[src/index.ts] --> App[src/app.ts]
    App --> Middleware[src/api/middleware/]
    App --> Routes[src/api/routes/]
    Routes --> Controllers[src/api/controllers/]
    Controllers --> Services[src/services/]
    Services --> Providers[src/providers/]
    Services --> Storage[src/storage/]

    subgraph "Async Loop"
        Queue[Cloudflare Queue] --> Consumer[src/queue/consumer.ts]
        Consumer --> Processor[src/queue/processor.ts]
        Processor --> ServiceLogic[src/services/]
    end
```

Layer Responsibilities

| Layer | Path | Role |
|---|---|---|
| API | src/api/ | HTTP concerns: routing (Hono), schema validation (Zod), OpenAPI generation |
| Service | src/services/ | Business logic orchestration across providers and storage |
| Provider | src/providers/ | Normalised connectors to external AI services (Fal.ai, OpenRouter, OpenAI) |
| Queue | src/queue/ | Background job processing for long-running tasks that exceed Worker limits |
| Storage | src/storage/ | Abstractions over Cloudflare KV and R2 |
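The controller/service split can be sketched as plain functions. This is an illustrative, framework-free example, not the real classes under src/: names like Shoe3DService, submitJob, and QueueLike are assumptions chosen to mirror the diagram above.

```typescript
// Hypothetical sketch of the layer separation: the controller owns HTTP
// concerns; the service orchestrates the queue behind an interface.
interface QueueLike {
  send(message: { jobId: string; imageUrls: string[] }): Promise<void>;
}

class Shoe3DService {
  constructor(private queue: QueueLike, private newId: () => string) {}

  // Enqueue a generation job and hand back its ID for later polling.
  async submitJob(imageUrls: string[]): Promise<string> {
    const jobId = this.newId();
    await this.queue.send({ jobId, imageUrls });
    return jobId;
  }
}

// Controller-level function: translate a parsed request body into a
// service call and an HTTP-shaped response (202 + job ID).
async function handleSubmit(
  service: Shoe3DService,
  body: { imageUrls: string[] },
): Promise<{ status: number; jobId: string }> {
  const jobId = await service.submitJob(body.imageUrls);
  return { status: 202, jobId };
}
```

Because the queue is injected, the service layer stays testable without Cloudflare bindings, which is the main point of the layering.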

The high-level system topology below shows how edge computing (Cloudflare Workers) relates to external AI infrastructure (Modal, Fal.ai):

```mermaid
graph TD
    User[Web/Mobile Client] -->|HTTP| CFW[Cloudflare Workers - Hono]
    CFW -->|Auth| DB[(Prisma/PostgreSQL)]
    CFW -->|Queue Job| Queue[Cloudflare Queue]
    Queue -->|Process| Consumer[Queue Consumer]
    Consumer -->|External AI| Fal[Fal.ai - 3D Gen]
    Consumer -->|External AI| OpenRouter[OpenRouter - LLMs]
    Consumer -->|Heavy GPU| Modal[Modal.com - Python Backend]
    Modal -->|Body Mesh/Pose| SAM3D[SAM3D/MHR]
    Consumer -->|Storage| R2[(Cloudflare R2)]
    Consumer -->|Status| KV[(Cloudflare KV)]
    User -->|Poll Status| CFW
```

Directory Structure

```
src/
├── api/
│   ├── routes/          # Hono route definitions using zod-openapi
│   ├── controllers/     # Route handlers; translate HTTP requests to service calls
│   ├── middleware/      # Auth, logging, rate limiting, error handling
│   └── views/           # Server-side rendered HTML (monitoring dashboard)
├── services/            # Core business logic (virtual try-on, shoe 3D, caching)
├── providers/           # Connectors for AI (Fal.ai, OpenRouter) and other APIs
├── queue/               # Queue consumer and job processing logic
├── schemas/             # Shared Zod schemas for validation and API contracts
├── storage/             # R2 and KV storage implementations and factory
└── utils/               # Generic helpers (crypto, URL, image, etc.)
```

Data Model

The API uses a minimal Prisma schema whose primary purpose is validating SaaS user API keys.

```mermaid
erDiagram
    User ||--o{ ApiKey : "owns"
    User {
        string id PK
        string name
        string email
        string image
        datetime createdAt
    }
    ApiKey {
        string id PK
        string key "unique"
        boolean isActive
        datetime expiresAt
        int usageCount
        int maxUsage
        string userId FK
    }
```
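A validation predicate over the ApiKey fields above could look like the following. This is a hedged sketch: the real check lives in UserApiKeyService, and the helper name isKeyUsable is illustrative; only the field names come from the schema.

```typescript
// Field names follow the Prisma ApiKey model above; null means "no limit".
interface ApiKey {
  key: string;
  isActive: boolean;
  expiresAt: Date | null;
  usageCount: number;
  maxUsage: number | null;
}

// A key is usable when it is active, unexpired, and under its usage quota.
function isKeyUsable(apiKey: ApiKey, now: Date = new Date()): boolean {
  if (!apiKey.isActive) return false;
  if (apiKey.expiresAt !== null && apiKey.expiresAt <= now) return false;
  if (apiKey.maxUsage !== null && apiKey.usageCount >= apiKey.maxUsage) return false;
  return true;
}
```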

Transient state lives outside the relational database:

  • Job Status — Cloudflare KV namespace WF_GENAI_JOB_STATUS
  • Result Assets — Cloudflare R2 bucket wf-genai-results
  • Metadata — TTL and owner hash stored as R2 object metadata alongside each result
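The transient-state layout can be sketched against minimal KV/R2-like interfaces (the real code goes through the bindings in src/storage/). The object key results/{jobId}.glb and the storeResult helper are assumptions for illustration; the TTL and owner hash in R2 custom metadata follow the bullet above.

```typescript
// Minimal stand-ins for the Cloudflare KV and R2 binding surfaces used here.
interface KvLike {
  put(key: string, value: string): Promise<void>;
}
interface R2Like {
  put(
    key: string,
    body: ArrayBuffer | string,
    opts?: { customMetadata?: Record<string, string> },
  ): Promise<void>;
}

// Write the result asset to R2 (with TTL + owner hash as custom metadata)
// and record completion in the job-status KV namespace.
async function storeResult(
  kv: KvLike,
  r2: R2Like,
  jobId: string,
  glb: ArrayBuffer | string,
  ownerHash: string,
  ttlSeconds: number,
): Promise<void> {
  const resultKey = `results/${jobId}.glb`; // hypothetical key layout
  await r2.put(resultKey, glb, {
    customMetadata: { ownerHash, ttl: String(ttlSeconds) },
  });
  await kv.put(jobId, JSON.stringify({ status: "completed", resultKey }));
}
```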

Shoe 3D Generation Flow

The shoe-3d pipeline is a multi-stage process that moves from raw product photos through image correction, 3D generation, texture enhancement, and mesh optimisation before delivering a GLB file and webhook notification.

```mermaid
sequenceDiagram
    participant User
    participant Worker as API Worker
    participant Queue as Job Queue
    participant Modal as Modal Backend
    participant Fal as Fal.ai
    participant R2 as Storage (R2)

    User->>Worker: POST /shoe-3d (submit images)
    Worker->>Queue: Submit Job
    Worker-->>User: 202 Accepted (Job ID)

    Queue->>Worker: Consume Job
    Worker->>Modal: Correct Image (optional)
    Worker->>Fal: Submit 3D Generation
    Fal-->>Worker: GLB Result
    Worker->>Modal: Render Views & Enhance Texture (optional)
    Worker->>Worker: Optimise / Decimate Mesh
    Worker->>R2: Upload GLB & Preview Grid
    Worker->>User: Webhook Result (job.completed)
```

The caller receives an immediate 202 Accepted with a job ID and is notified of completion via webhook (or can poll GET /api/v1/jobs/{id}).
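A client that prefers polling over webhooks could loop on the jobs endpoint like this. The response shape (status plus an optional result URL) is an assumption; the fetcher is injected so the sketch stays runnable without a live API.

```typescript
// Assumed job-status payload returned by GET /api/v1/jobs/{id}.
type JobStatus = {
  status: "queued" | "processing" | "completed" | "failed";
  resultUrl?: string;
};

// Poll until the job reaches a terminal state or attempts run out.
async function pollJob(
  fetchJson: (url: string) => Promise<JobStatus>,
  jobId: string,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<JobStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const job = await fetchJson(`/api/v1/jobs/${jobId}`);
    if (job.status === "completed" || job.status === "failed") return job;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`job ${jobId} did not finish within ${maxAttempts} polls`);
}
```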


Texture Enhancement Pipeline

Texture enhancement is triggered by options.enhanceTexture: true on the shoe-3d request. Quality is controlled by options.genaiQuality:

| Mode | genaiQuality | Views | Grid Size | Timeout |
|---|---|---|---|---|
| Default | "default" | 4-view | 2048 × 2048 | ~3 min |
| High | "high" (default) | 6-view | 2048 × 3072 | ~5 min |

4-view layout: 3 side views at 30°/150°/270° (15° elevation) + top-down → 2×2 grid

6-view layout: 4 side views at 0°/90°/180°/270° + top + bottom → 2×3 grid
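The two layouts can be written down as (azimuth, elevation) camera lists. The angles for the 4-view layout and the 6-view azimuths come straight from the text; the 0° elevation for the 6-view side cameras and the data structure itself are assumptions.

```typescript
interface View {
  azimuthDeg: number;
  elevationDeg: number;
}

// Camera list per quality mode, matching the layouts described above.
function viewLayout(quality: "default" | "high"): View[] {
  if (quality === "default") {
    // 4-view: 3 side views at 15° elevation + one top-down view (2×2 grid)
    return [
      { azimuthDeg: 30, elevationDeg: 15 },
      { azimuthDeg: 150, elevationDeg: 15 },
      { azimuthDeg: 270, elevationDeg: 15 },
      { azimuthDeg: 0, elevationDeg: 90 }, // top-down
    ];
  }
  // 6-view: 4 side views + top + bottom (2×3 grid);
  // 0° side elevation is an assumption, not stated in the text.
  return [
    { azimuthDeg: 0, elevationDeg: 0 },
    { azimuthDeg: 90, elevationDeg: 0 },
    { azimuthDeg: 180, elevationDeg: 0 },
    { azimuthDeg: 270, elevationDeg: 0 },
    { azimuthDeg: 0, elevationDeg: 90 },  // top
    { azimuthDeg: 0, elevationDeg: -90 }, // bottom
  ];
}
```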

Pipeline Steps

  1. Render views — transparent background renders (1024×1024 each) via a single shared GL context
  2. Compose grid — views are composited on a white background (AI models handle white better than transparency)
  3. Send to Gemini — grid is sent to Gemini Flash (or Pro for packshot correction in high-quality mode) via OpenRouter with reference packshot images
  4. Validate grid alignment — per-view silhouette IoU check; computes an N×N IoU matrix and finds the optimal view permutation. If a swap is detected (requires >0.5 total IoU improvement to avoid false positives on white shoes), views are remapped automatically
  5. Logo refinement (optional, options.refineLogo: true) — a second AI pass focused solely on logos, brand text, and emblems, run between validation and projection so there is only one projection pass onto the original GLB
  6. Project back — enhanced views are projected onto the GLB texture using the shared projection algorithm (vectorised NumPy, normalDotView falloff, texture dilation)
  7. Color correction — three-stage correction: polynomial color mapping (K-Means swatches in Lab space), L-gamma brightness lift, and selective white boost for near-white pixels
  8. Auto smooth normals — angle-based normal smoothing (30° threshold); edges below the threshold receive smooth averaged normals, edges above remain sharp
  9. Compress GLB — texture resizing via v1-compress-glb to meet the WEARFITS 30 MB upload limit
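The alignment check in step 4 can be sketched as follows, assuming views arrive as flattened boolean silhouette masks (the real code operates on rendered images; mask representation and function names here are illustrative). It builds the N×N IoU matrix, brute-forces the permutation with the highest total IoU, and remaps only when that beats the identity assignment by more than 0.5.

```typescript
type Mask = boolean[]; // flattened silhouette mask, one bool per pixel

// Intersection-over-union between two equally sized masks.
function iou(a: Mask, b: Mask): number {
  let inter = 0, union = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] && b[i]) inter++;
    if (a[i] || b[i]) union++;
  }
  return union === 0 ? 0 : inter / union;
}

// All permutations of [0..n-1]; fine for n ≤ 6 views.
function permutations(n: number): number[][] {
  if (n === 1) return [[0]];
  const out: number[][] = [];
  for (const p of permutations(n - 1)) {
    for (let i = 0; i <= p.length; i++) {
      out.push([...p.slice(0, i), n - 1, ...p.slice(i)]);
    }
  }
  return out;
}

// Returns the view order to apply: best[i] = index of the rendered view
// that enhanced view i actually matches. Falls back to identity unless
// the improvement exceeds 0.5 (guards against false swaps on white shoes).
function bestViewOrder(enhanced: Mask[], rendered: Mask[]): number[] {
  const n = enhanced.length;
  const m = enhanced.map((e) => rendered.map((r) => iou(e, r)));
  const identityScore = m.reduce((s, row, i) => s + row[i], 0);
  let best = enhanced.map((_, i) => i);
  let bestScore = identityScore;
  for (const p of permutations(n)) {
    const score = p.reduce((s, j, i) => s + m[i][j], 0);
    if (score > bestScore) { bestScore = score; best = p; }
  }
  return bestScore - identityScore > 0.5 ? best : enhanced.map((_, i) => i);
}
```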

Modal Endpoints

Seven endpoints are deployed to Modal.com and run on T4 GPU hardware with ModernGL + EGL:

| Endpoint | URL Slug | Description |
|---|---|---|
| Pose Transfer | v1-pose-transfer | MHR pose transfer |
| Body Mask from Size | v1-body-mask-from-size | Nearest-neighbour lookup against BodyM dataset |
| Image Resize | v1-image-resize | Resize and format conversion |
| Compose Grid | v1-compose-grid | Garment grid composition |
| Render GLB | v1-render-glb | Render GLB from 6 angles as a grid image (GPU) |
| Texture Enhance GLB | v1-texture-enhance-glb | Full AI texture enhancement pipeline (GPU) |
| Compress GLB | v1-compress-glb | Auto smooth normals + texture resize |

Base URL pattern: https://wearfits--{endpoint}.modal.run
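A small, typed helper can enforce that pattern; only the documented slugs and base-URL template are used, and the helper name modalUrl is an assumption.

```typescript
// The seven deployed endpoint slugs, as listed in the table above.
type ModalEndpoint =
  | "v1-pose-transfer"
  | "v1-body-mask-from-size"
  | "v1-image-resize"
  | "v1-compose-grid"
  | "v1-render-glb"
  | "v1-texture-enhance-glb"
  | "v1-compress-glb";

// Build the full Modal URL from the documented base-URL pattern.
function modalUrl(endpoint: ModalEndpoint): string {
  return `https://wearfits--${endpoint}.modal.run`;
}
```

Using a union type means a typo in a slug fails at compile time rather than as a 404 at runtime.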

GPU rendering uses GPURenderWorker backed by ModernGL with EGL. GL contexts are shared across views within a single request to avoid redundant context creation and destruction.


Tech Stack

| Component | Technology |
|---|---|
| API framework | Hono with zod-openapi |
| Runtime | Cloudflare Workers (TypeScript) |
| Database | Prisma (PostgreSQL) via Prisma Accelerate |
| Storage | Cloudflare R2 (assets) + KV (job status) |
| Queues | Cloudflare Queues |
| 3D processing | @gltf-transform/core, meshoptimizer |
| GPU backend | Modal.com (Python 3.10+), ModernGL + EGL, T4 GPU |
| AI providers | Fal.ai (3D generation), OpenRouter / Gemini (texture enhancement) |
| Testing | Vitest with @cloudflare/vitest-pool-workers |

Resilience

524 Timeout Handling

Cloudflare imposes a ~100 s origin timeout. The 6-view texture enhancement pipeline regularly exceeds this, causing a 524 error to be returned to the worker even though Modal continues executing.

Resolution strategy:

  1. The worker passes a job_id to Modal before the request is dispatched
  2. Modal always writes status.json to R2 at pose-transfer/texture-enhance-{job_id}-status/result.json upon completion
  3. On a 524 or timeout, the worker begins polling R2 for status.json every 10 seconds
  4. When status.json appears, it is treated identically to a successful HTTP response (same payload structure)
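The recovery loop in steps 3–4 could look like the sketch below. The R2 reader is injected and the interval is configurable so the example runs standalone; the key layout follows step 2, while the function name and attempt cap are assumptions.

```typescript
// Poll R2 for the status.json Modal writes on completion, after the HTTP
// request to Modal has already 524'd at the Cloudflare edge.
async function awaitModalStatus<T>(
  readJson: (key: string) => Promise<T | null>,
  jobId: string,
  intervalMs = 10_000, // "every 10 seconds" per the text
  maxAttempts = 60,
): Promise<T> {
  const key = `pose-transfer/texture-enhance-${jobId}-status/result.json`;
  for (let i = 0; i < maxAttempts; i++) {
    const status = await readJson(key);
    // Same payload structure as a successful HTTP response, so the caller
    // can treat it identically.
    if (status !== null) return status;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`no status.json for job ${jobId} after ${maxAttempts} polls`);
}
```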

Fal.ai 504 Handling

Fal.ai occasionally reports a job as COMPLETED but returns 502/503/504 (downstream_service_unavailable) when the worker fetches the result — indicating the downstream worker lost the output.

The fal-3d.ts provider distinguishes RESULT_UNAVAILABLE (gateway error, re-submittable) from RESULT_FETCH_FAILED (other fetch failure). On RESULT_UNAVAILABLE, the shoe-3d polling loop automatically re-submits the same generation (reusing the fal CDN image URL, no re-upload) up to 2 times using withTracedRetry. Submit options are shared between the original and retry submissions to avoid configuration drift.


Known Technical Debt

High Priority

  • Usage tracking — UserApiKeyService does not yet increment usageCount in the database after successful key validation (src/services/user-api-key-service.ts).
  • Multi-material textures — The GPU renderer in Modal handles multi-material GLBs incorrectly: all meshes share the first texture (tools/pose-transfer/gpu_uv_renderer.py).

Medium Priority

  • OpenRouter model IDs — Model IDs are hardcoded and may change without notice; consider making them configurable via KV or environment variables.
  • Webhook scaling — WebhookService may hit memory limits when handling large metadata objects.

Maintenance

  • Localhost CORS — Explicit localhost CORS allowance in src/app.ts should be removed after the testing period.
  • Direct Gemini API — PhotoAnalysisService currently routes through OpenRouter; implementing direct Gemini API calls would reduce external dependency.