Latest commit: nikomo 14bc009123, 2026-07-20 16:23:48 +03:00

Fix BC checkpoint selection: steer_raw_mae, not composite val loss

Composite val loss was dominated by weighted brake MSE (brake-weight 3x on braking frames), which bottomed at epoch 2 while steering, trajectory, and aux-perception metrics kept improving for 20+ more epochs. bc_best.pt selection and early-stopping patience now key off val steer_raw_mae, matching the project's existing manual-judgment policy (raw wheel-space, not raw loss).

Also fixes periodic epoch_NNN.pt checkpoints silently omitting steer_transform_power/train_config (only the bc_best.pt save path had them), which made loading a periodic checkpoint anywhere default to the wrong identity steering transform instead of the real value.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Nt2oQWBxSgYVYvtXujmgEk

Repository files
Filename	Latest commit message	Latest commit date
plans	Async PPO: learner thread + crash-cut rollouts (game never pauses)	2026-07-16 09:36:22 +00:00
reports	Add RL reward function analysis report	2026-07-16 05:55:35 +00:00
tasks	Fix BC checkpoint selection: steer_raw_mae, not composite val loss	2026-07-20 16:23:48 +03:00
.gitignore	finalize: short warm-up, Dropout 0.5, gitignore, TRAINING_PROGRESS update	2026-06-13 20:00:16 +03:00
AssettoMemory.py	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
CLAUDE.md	docs: rewrite stale README (was ResNet-18 era); fix horizon comment	2026-07-15 22:23:21 +03:00
config.py	Async PPO: learner thread + crash-cut rollouts (game never pauses)	2026-07-16 09:36:22 +00:00
controller.py	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
drive.bat	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
eval_bc.py	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
IMPROVEMENTS.md	feat: huber steering loss, rpm telemetry, collapse warnings, hue jitter	2026-06-19 02:09:31 +03:00
inference.py	feat: huber steering loss, rpm telemetry, collapse warnings, hue jitter	2026-06-19 02:09:31 +03:00
main.py	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
model.py	perf: PPO update via one cuDNN sequence call per segment (~35x); per-tyre off-track penalty	2026-07-15 23:06:43 +03:00
PLAN.md	docs: broaden target from Mugello-only to all tracks	2026-06-19 01:30:11 +03:00
README.md	Async PPO: learner thread + crash-cut rollouts (game never pauses)	2026-07-16 09:36:22 +00:00
record.py	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
requirements.txt	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
screen_capture.py	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
self-driving.code-workspace	add workspace	2026-04-18 16:05:25 +03:00
telemetry_hud.py	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
test_async_ppo_cuda.py	Async PPO: learner thread + crash-cut rollouts (game never pauses)	2026-07-16 09:36:22 +00:00
test_compile.py	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
test_compile2.py	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
train.bat	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
train.py	Fix BC checkpoint selection: steer_raw_mae, not composite val loss	2026-07-20 16:23:48 +03:00
train_overfit.bat	Restructure: DINOv3 backbone, BC eval tooling, RL pause-during-update	2026-07-15 22:17:35 +03:00
train_rl.bat	Async PPO: learner thread + crash-cut rollouts (game never pauses)	2026-07-16 09:36:22 +00:00
train_rl.py	Async PPO: learner thread + crash-cut rollouts (game never pauses)	2026-07-16 09:36:22 +00:00
vjoy_bind_to_ac.py	control bind script	2026-04-26 23:25:33 +03:00
vjoy_test.py	rename vjoy_test	2026-04-26 23:25:24 +03:00

README.md

Assetto Corsa Autonomous Driving Agent

An end-to-end autonomous racing agent for Assetto Corsa (AC), built in Python/PyTorch. Target: Nissan GT-R R34, fastest lap time across all tracks.

The vision backbone is a ConvNeXt-Tiny distilled from DINOv3 feeding a GRU policy that also sees (heavily regularized) telemetry. Training happens in two phases:

Behavioral Cloning (BC, train.py) — supervised learning from recorded human driving, with per-timestep supervision over 8-frame GRU sequences.
Reinforcement Learning (RL, train_rl.py) — in-game PPO fine-tuning with a KL penalty toward the BC prior to prevent catastrophic forgetting.
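
The KL penalty in the RL phase can be pictured as an extra loss term that grows as the fine-tuned policy drifts away from the frozen BC policy. The following is a minimal sketch under stated assumptions, not the code in train_rl.py: it assumes 1-D Gaussian action distributions, and the function names, the target KL of 0.01, and the 1.5x/2x constants (the adaptive-coefficient heuristic described in the original PPO paper) are illustrative.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """Closed-form KL(p || q) between two 1-D Gaussians."""
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)


def penalized_loss(ppo_loss, kl_to_prior, beta):
    """PPO loss plus a penalty for moving away from the frozen BC prior."""
    return ppo_loss + beta * kl_to_prior


def adapt_beta(beta, kl, kl_target=0.01):
    """Adaptive-KL heuristic: tighten the leash when KL overshoots, relax it when KL is small.

    The target and the 1.5 / 2.0 constants are assumptions for illustration.
    """
    if kl > 1.5 * kl_target:
        return beta * 2.0
    if kl < kl_target / 1.5:
        return beta / 2.0
    return beta
```

With a larger beta, the same amount of drift costs more, which is what keeps the fine-tuned policy close to the cloned behaviour while rewards are still noisy.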

Important

Windows-only. The project depends on Windows named shared memory (mmap), pyvjoy, and DXGI capture. It will not run on Linux/macOS.

Architecture

graph TD
    A[Assetto Corsa] -->|Shared Memory ctypes+mmap| B(AssettoMemory.py)
    A -->|DXGI capture thread| C(screen_capture.py)
    B -->|speed, gear, rpm| D[TelemetryNormalizer]
    C -->|288x512 RGB frame| E[ConvNeXt-Tiny DINOv3]
    E -->|256-dim vision features| F[GRU 256]
    D -->|3-dim, anti-shortcut masked| F
    F -->|steering, throttle, brake| G(controller.py)
    G -->|vJoy virtual controller| A

Model (`model.py`)

Vision: ConvNeXt-Tiny distilled from DINOv3 (timm convnext_tiny.dinov3_lvd1689m, pretrained; stem + stages 0–1 frozen), per frame at 288×512 RGB. Two taps concatenated → linear projection to 256-dim: avg-pool of the stride-32 map ("what", 768-dim) and per-channel spatial softmax on the stride-16 map ("where", 768-dim keypoint coordinates on an 18×32 grid).
Temporal: single-layer GRU (256 hidden) over SEQUENCE_LEN = 8 frames.
Telemetry: 3-dim (speed, gear, RPM) fused into the GRU input with heavy anti-shortcut regularization (dropout + full-channel masking) so the policy can't lean on telemetry instead of vision.
Heads: policy (steering tanh, throttle/brake sigmoid) + 4-step future action prediction; aux heads for speed and track position (BC only); value head (RL only, ActorCritic).
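
The "where" tap can be illustrated with a small, dependency-free sketch of spatial softmax for a single channel. This is not the code in model.py: the function name and the choice of normalising coordinates to [-1, 1] are assumptions. The 384-channel figure is the standard ConvNeXt-Tiny width at stride 16, which is consistent with the 768-dim "where" vector (two coordinates per channel) and the 18×32 grid quoted above.

```python
import math


def spatial_softmax_keypoint(feature_map):
    """Return the expected (x, y) position of one channel, each in [-1, 1].

    feature_map is a list of rows (height x width) of raw activations.
    """
    h, w = len(feature_map), len(feature_map[0])
    peak = max(max(row) for row in feature_map)  # subtract the max for numerical stability
    weights = [[math.exp(v - peak) for v in row] for row in feature_map]
    total = sum(sum(row) for row in weights)
    x = y = 0.0
    for i, row in enumerate(weights):
        for j, weight in enumerate(row):
            p = weight / total                   # softmax probability of cell (i, j)
            x += p * (2.0 * j / (w - 1) - 1.0)   # column index mapped to [-1, 1]
            y += p * (2.0 * i / (h - 1) - 1.0)   # row index mapped to [-1, 1]
    return x, y


# Dimension arithmetic for the stride-16 tap on a 288x512 (height x width) frame.
GRID_H, GRID_W = 288 // 16, 512 // 16   # 18 x 32
STRIDE16_CHANNELS = 384                 # standard ConvNeXt-Tiny width at stride 16
WHERE_DIM = STRIDE16_CHANNELS * 2       # one (x, y) pair per channel
```

Each channel collapses to a single soft keypoint, so the policy receives positions of salient features rather than a pooled summary of their presence.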

Key modules

Module	Role
`AssettoMemory.py`	ctypes structs + mmap readers for AC shared memory (`Local\acpmf_physics/graphics/static`)
`screen_capture.py`	DXGI capture thread (`bettercam`); BGR→RGB, resize, normalize, tensor
`controller.py`	vJoy interface (steering X, throttle Y, brake Z, gear buttons) + rule-based `GearShifter`
`inference.py`	`TelemetryNormalizer`, `SteerTransform`, `FrameStack`
`config.py`	Tunable constants (image dims, sequence length, shift RPM ratios, reset/pause coordinates)
`record.py`	Data recorder: frames + telemetry while a human drives
`telemetry_hud.py`	Always-on-top speed/gear/RPM window used while recording
`train.py`	Training phase A (workflow Phase 2): behavioral cloning
`eval_bc.py`	Offline BC evaluator — raw wheel-space steering/brake metrics by curvature bin
`train_rl.py`	Training phase B (workflow Phase 3): async PPO fine-tuning (learner thread updates while the collector drives; auto session restarts on crashes)
`main.py`	Real-time driving agent with crash-recovery state machine
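
As an illustration of the ctypes + mmap approach used for telemetry, here is a minimal sketch that decodes only the first few fields of the physics page. The field order follows the commonly published AC shared-memory layout and is an assumption here; the struct name PhysicsHead and both helper functions are hypothetical, and the real structs in AssettoMemory.py are larger and may differ.

```python
import ctypes
import mmap
import struct


class PhysicsHead(ctypes.Structure):
    """Leading fields of the AC physics page (assumed layout, truncated)."""
    _fields_ = [
        ("packetId", ctypes.c_int32),
        ("gas", ctypes.c_float),
        ("brake", ctypes.c_float),
        ("fuel", ctypes.c_float),
        ("gear", ctypes.c_int32),
        ("rpms", ctypes.c_int32),
        ("steerAngle", ctypes.c_float),
        ("speedKmh", ctypes.c_float),
    ]


def parse_physics(buf):
    """Copy raw bytes into the struct so the result outlives the mapping."""
    return PhysicsHead.from_buffer_copy(buf[:ctypes.sizeof(PhysicsHead)])


def read_physics_live():
    """Windows-only: attach to the named mapping that AC creates and read it once."""
    size = ctypes.sizeof(PhysicsHead)
    with mmap.mmap(-1, size, tagname="Local\\acpmf_physics") as shm:
        return parse_physics(shm.read(size))


# Offline check with synthetic bytes, so the parsing can be exercised without the game.
sample = struct.pack("=ifffiiff", 7, 0.5, 0.0, 30.0, 3, 6500, 0.25, 142.0)
head = parse_physics(sample)
```

Copying the bytes out (rather than keeping a view into the mapping) gives each read a consistent snapshot to normalise and feed to the model.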

Recorded data lives in data/{track}_NNN/ (640×360 JPEGs, width×height, plus telemetry.parquet; frames are rescaled on the fly to the model's 288×512 height×width input); checkpoints and training logs in checkpoints/. Both are gitignored.

Setup

1. Prerequisites

Assetto Corsa (the game exposes its shared-memory telemetry natively — no plugin needed).
vJoy: install and configure Device 1 with at least 3 axes (X = steering, Y = throttle, Z = brake) and buttons for gear shifts.
Python 3.14 (venv expected at .venv).
NVIDIA GPU recommended (dev machine: RTX 4080).

2. Install dependencies

.venv\Scripts\activate
pip install -r requirements.txt
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

3. Bind vJoy inside AC

:: helper to bind vJoy axes in AC's settings menu
python vjoy_bind_to_ac.py
:: verify axes/buttons independently
python vjoy_test.py

Workflow

The in-game scripts (record.py, main.py, train_rl.py) need AC running with a session loaded, and may need an elevated console for the keyboard hotkey hooks to fire while AC has focus.

Phase 1 — Record human driving

python record.py

R toggles recording, Q/ESC saves and exits. Sessions land in data/{track}_NNN/. python telemetry_hud.py gives a standalone HUD demo.

Phase 2 — Behavioral cloning

train.bat

Use the batch file, not raw train.py — it encodes hard-won flag choices (no --compile: Triton is unavailable on Windows; no --cache-frames: Windows DataLoader workers each copy the cache; pinned --val-sessions so metrics are comparable between runs). Best weights save to checkpoints/bc_best.pt, logs to checkpoints\train_<timestamp>.log.

Judge runs by val steer MAE and pred-p99, not raw loss, and gate against the frozen baseline with the offline evaluator:

python eval_bc.py --checkpoint checkpoints/bc_best.pt
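
Selecting the best checkpoint by validation steering MAE rather than by loss can be sketched as follows. This is an illustrative reduction, not the logic in train.py: the function name, the list-of-floats input, and the default patience of 5 are assumptions.

```python
def select_checkpoint(val_steer_mae_by_epoch, patience=5):
    """Return (best_epoch, stop_epoch) when tracking the lowest val steering MAE.

    stop_epoch is None if early stopping never triggers.
    """
    best_epoch, best_mae, stop_epoch = None, float("inf"), None
    for epoch, mae in enumerate(val_steer_mae_by_epoch):
        if mae < best_mae:
            # New best: this is the point where bc_best.pt would be rewritten.
            best_epoch, best_mae = epoch, mae
        elif best_epoch is not None and epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            stop_epoch = epoch
            break
    return best_epoch, stop_epoch
```

Keying both the save and the patience counter off the same metric avoids the case where a loss term unrelated to steering bottoms out early and freezes the "best" checkpoint while steering is still improving.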

Phase 3 — PPO fine-tuning (in-game)

train_rl.bat [steps] [horizon] [fresh]

AC must be running; main.py must not be (both need the vJoy device). Training is asynchronous — the game never pauses: the collector (main thread) drives continuously while a learner thread runs each PPO update on its own CUDA stream and publishes fresh weights, which the collector swaps in between steps. Rollouts hand off at the horizon, or early when an episode crashes with at least config.MIN_ROLLOUT_STEPS collected — so early training updates on nearly every crash, with the update overlapping the ~15–30 s session reset. The horizon is a per-update data budget, not a cap on episode length. Crashes auto-restart the session. F8 stops gracefully and saves rl_final.pt; the best model by average rollout reward saves to checkpoints/rl_best.pt.
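
The collector/learner split described above can be sketched with plain threads and a queue. This is a toy model under stated assumptions, not train_rl.py: the "weights" are an integer, the PPO update is a stand-in addition, CUDA streams are omitted, and every name (WeightBox, collect, learner, should_hand_off) is hypothetical. It also assumes that a crash with too few steps simply keeps filling the same rollout.

```python
import queue
import threading


def should_hand_off(steps, horizon, crashed, min_rollout_steps):
    """Hand off at the horizon, or early on a crash once enough steps exist."""
    return steps >= horizon or (crashed and steps >= min_rollout_steps)


class WeightBox:
    """Holds the most recently published weights behind a lock."""

    def __init__(self, weights):
        self._lock = threading.Lock()
        self._weights, self.version = weights, 0

    def publish(self, weights):
        with self._lock:
            self._weights, self.version = weights, self.version + 1

    def latest(self):
        with self._lock:
            return self._weights, self.version


def learner(rollouts, box):
    """Consume rollouts until a None sentinel arrives; publish after each update."""
    while (rollout := rollouts.get()) is not None:
        weights, _ = box.latest()
        box.publish(weights + len(rollout))  # stand-in for one PPO update


def collect(crash_steps, total_steps, horizon, min_rollout_steps, rollouts, box):
    """Drive without pausing; return the sizes of the rollouts handed off."""
    buf, sizes = [], []
    for t in range(total_steps):
        _policy, _version = box.latest()  # swap in whatever the learner last published
        buf.append(t)                     # stand-in for one environment step
        if should_hand_off(len(buf), horizon, t in crash_steps, min_rollout_steps):
            rollouts.put(buf)
            sizes.append(len(buf))
            buf = []
    return sizes


rollouts = queue.Queue()
box = WeightBox(0)
worker = threading.Thread(target=learner, args=(rollouts, box))
worker.start()
sizes = collect({5, 50}, 100, horizon=32, min_rollout_steps=16, rollouts=rollouts, box=box)
rollouts.put(None)  # tell the learner to finish
worker.join()
```

In this run the crash at step 5 is too early to hand off, while the crash at step 50 cuts the second rollout short, so the learner receives one update's worth of data sooner than the horizon alone would allow.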

After changing the trainer, smoke-test the pipeline without AC: python test_async_ppo_cuda.py (CUDA required — cuDNN's threading/RNN rules don't reproduce on CPU).

Resumable: when checkpoints/rl_best.pt exists, train_rl.bat automatically resumes from it — model, optimizer, adaptive KL beta, step counter, and best-reward watermark are all restored (--resume on raw train_rl.py), so rl_best.pt is only overwritten on a genuine improvement and steps remains a global budget across runs. Pass fresh as the third argument to discard RL progress and restart from BC weights.

Live-tunable reward: the reward-shaping constants (speed/progress scales, off-track, slow-speed, damage, terminal penalties) live in config.py's "RL reward shaping" section and hot-reload between rollouts — edit and save config.py mid-run and the new values apply from the next rollout (changes are logged; a file saved mid-edit is skipped and retried). PPO hyperparameters in train_rl.py intentionally do not hot-reload.
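
One way to get this "reload between rollouts, skip half-saved files" behaviour is to watch the file's modification time and parse it without executing it. The sketch below is an assumption about how such a mechanism could look, not the project's implementation; HotReloader, load_reward_constants, and the constant names used in the usage note are hypothetical.

```python
import ast
import os


def load_reward_constants(path, names):
    """Read simple `NAME = literal` assignments; return None if the file is unusable."""
    try:
        with open(path, encoding="utf-8") as fh:
            tree = ast.parse(fh.read())
    except (OSError, SyntaxError):
        return None  # likely saved mid-edit: skip now, retry on the next poll
    values = {}
    for node in tree.body:
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id in names):
            try:
                values[node.targets[0].id] = ast.literal_eval(node.value)
            except (ValueError, TypeError):
                return None  # right-hand side is not a plain literal
    return values


class HotReloader:
    """Poll between rollouts; report only the constants whose values changed."""

    def __init__(self, path, names):
        self.path, self.names = path, set(names)
        self.mtime, self.values = None, {}

    def poll(self):
        try:
            mtime = os.stat(self.path).st_mtime_ns
        except OSError:
            return {}
        if mtime == self.mtime:
            return {}
        fresh = load_reward_constants(self.path, self.names)
        if fresh is None:
            return {}  # self.mtime is left untouched so the next poll retries
        self.mtime = mtime
        changed = {k: v for k, v in fresh.items() if self.values.get(k) != v}
        self.values.update(fresh)
        return changed
```

A caller would construct something like `HotReloader("config.py", ["SPEED_SCALE", "OFFTRACK_PENALTY"])` once, call `poll()` after each rollout, and log whatever the returned dict contains. Restricting the watched names to the reward section is what keeps other settings, such as PPO hyperparameters, from changing mid-run.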

Phase 4 — Drive

drive.bat

Runs main.py with checkpoints\bc_best.pt at a 60 fps loop cap (matches the recording rate — the GRU must step at the rate it was trained at). F7 toggles agent control, F8 kills control, resets inputs, and exits.
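
A loop cap of this kind is usually implemented by sleeping until a moving deadline. The following is a generic sketch, not the loop in main.py; the function name and the injectable clock/sleep parameters (added so the timing can be tested without waiting) are assumptions.

```python
import time


def run_at_fixed_rate(step, n_steps, hz=60.0, clock=time.perf_counter, sleep=time.sleep):
    """Call step() n_steps times, no faster than hz times per second."""
    period = 1.0 / hz
    deadline = clock()
    for _ in range(n_steps):
        step()
        deadline += period
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)    # finished early: wait out the rest of the frame
        else:
            deadline = clock()  # overran: resync instead of bursting to catch up
```

Advancing the deadline by a fixed period, rather than sleeping a fixed amount after each step, keeps the average rate at the target even when individual steps vary in duration.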

Safety and crash recovery

main.py runs a state machine: WAITING_FOR_SESSION → DRIVING → CRASHED → RESETTING. Crash detection: stuck (below walking pace >3 s), damage spikes, sustained off-track, reversing, wrong-way.
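
The four states and the "stuck" rule can be sketched as below. Only the state names and the 3-second window come from the text above; the 5 km/h walking-pace threshold, the transition out of RESETTING, and all function and class names are assumptions for illustration.

```python
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_SESSION = auto()
    DRIVING = auto()
    CRASHED = auto()
    RESETTING = auto()


STUCK_SPEED_KMH = 5.0  # assumed value for "walking pace"
STUCK_SECONDS = 3.0    # from the crash-detection description


class StuckDetector:
    """Flags a crash once speed has stayed below the threshold for too long."""

    def __init__(self):
        self.slow_since = None

    def update(self, speed_kmh, now):
        if speed_kmh >= STUCK_SPEED_KMH:
            self.slow_since = None  # moving again: restart the timer
            return False
        if self.slow_since is None:
            self.slow_since = now
        return now - self.slow_since > STUCK_SECONDS


def next_state(state, session_live, crashed, reset_done):
    """One transition step; the RESETTING exit is an assumption."""
    if state is State.WAITING_FOR_SESSION:
        return State.DRIVING if session_live else state
    if state is State.DRIVING:
        return State.CRASHED if crashed else state
    if state is State.CRASHED:
        return State.RESETTING
    if state is State.RESETTING:
        return State.WAITING_FOR_SESSION if reset_done else state
    return state
```

The other crash signals (damage spikes, sustained off-track, reversing, wrong-way) would feed the same `crashed` flag, each with its own detector.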

The automated session restart (main.perform_ac_reset, shared by train_rl.py) is built on two empirical facts about AC:

A restart always lands on the pre-drive screen, which is telemetry-indistinguishable from driving — the only reliable discriminator is a vJoy throttle blip (in-car the engine revs; pre-drive ignores controller input).
AC's in-game UI is mouse-only and driven by ordinary window messages, so buttons are clicked via PostMessage in client coordinates (no focus or cursor needed). Injected keyboard (ESC via SendInput) needs AC foreground and is used only where unavoidable.

The reset sequence: throttle-blip probe → trigger the restart (pause-menu or Session Control pane click, depending on state) → confirm via shared memory (status LIVE + lap timer reset) → click 'Drive' → re-blip to confirm the car responds. Click coordinates live in config.py (measured at a 1918×1078 client area — re-measure if the AC window size changes).
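
The ordering of the reset steps can be sketched with the game-facing actions passed in as callables. This is not main.perform_ac_reset: the function names, the 500 RPM rise threshold, the 0.3 s hold, and the callable parameters are all hypothetical, and the real sequence presumably includes waits and retries that are left out here.

```python
import time


def throttle_blip_probe(read_rpm, set_throttle, rise_rpm=500, hold_s=0.3, sleep=time.sleep):
    """Return True if the engine revs in response to a short throttle blip."""
    before = read_rpm()
    set_throttle(1.0)
    try:
        sleep(hold_s)
        after = read_rpm()
    finally:
        set_throttle(0.0)  # always release the throttle, even on error
    return after - before >= rise_rpm


def perform_reset(in_car, trigger_restart, session_restarted, click_drive):
    """Run the reset steps in order; return True once the car responds again."""
    was_in_car = in_car()        # throttle-blip probe: in-car or pre-drive screen?
    trigger_restart(was_in_car)  # which UI element to click depends on that answer
    if not session_restarted():  # shared memory: status LIVE and lap timer reset
        return False
    click_drive()
    return in_car()              # re-blip to confirm the car responds
```

Injecting the actions keeps the sequencing testable without the game; in practice `in_car` would wrap `throttle_blip_probe` around the telemetry reader and the vJoy controller.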

F7/F8 remain the human override at all times.

Development notes

Tunables live in config.py; model architecture constants at the top of model.py. CLAUDE.md documents the full workflow rules.
No formal test suite — verification is manual (run the affected script and observe). Minimum static gate: python -m pyflakes <files>. Anything touching the PPO update must be tested on CUDA, not just CPU.
Training sessions are the unit of train/val splitting — never split within a session (temporal leakage).
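
A session-level split can be as simple as routing whole sessions to one side or the other. The sketch below assumes samples are (session_name, frame_index) pairs and that the validation sessions are pinned by name; both the function name and the data shape are illustrative, not taken from train.py.

```python
def split_by_session(samples, val_sessions):
    """Split (session_name, frame_index) samples so no session spans both sets."""
    val_sessions = set(val_sessions)
    train, val = [], []
    for sample in samples:
        (val if sample[0] in val_sessions else train).append(sample)
    return train, val


# Hypothetical session names following the data/{track}_NNN/ pattern.
samples = [("mugello_001", i) for i in range(3)] + [("mugello_002", i) for i in range(2)]
train, val = split_by_session(samples, ["mugello_002"])
```

Because consecutive frames within a session are nearly identical, splitting at random frame level would put near-duplicates of training frames into validation and make the metrics look better than they are.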
