Project · 2025

NEPL-LM

A continuous preference-learning system: feedback collection → semantic deduplication → SFT/DPO adapter distillation → inference server. It treats model behavior like a product that ships every sprint instead of every quarter.


Python · FastAPI · Uvicorn · SFT / DPO · Semantic dedup · CLI

hourly: updates without full retrains
SFT + DPO: adapter distillation
semantic: dataset deduplication

The loop

LLM fine-tuning is usually too slow and too expensive for rapid iteration, and feedback collection tends to be disconnected from training. NEPL-LM wires feedback and training into one cycle that can run hourly.

  feedback ──▶ semantic dedup ──▶ dataset curation ──▶ SFT / DPO
     ▲                                                      │
     │                                                      ▼
  inference server ◀──────── distilled adapter ◀───────────┘   (hourly)
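
As a rough illustration, one pass through the cycle in the diagram could be sketched as below. This is a minimal sketch, not NEPL-LM's actual code: the `Feedback` record, the `run_cycle` function, and the `dedup`, `train`, and `deploy` callables are all hypothetical names standing in for the real stages.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Feedback:
    """Hypothetical preference record: one prompt with a preferred and a rejected reply."""
    prompt: str
    chosen: str
    rejected: str


def run_cycle(
    feedback: Iterable[Feedback],
    dedup: Callable[[List[Feedback]], List[Feedback]],
    train: Callable[[List[Feedback]], str],
    deploy: Callable[[str], None],
) -> Optional[str]:
    """Run one feedback -> dedup -> train -> deploy pass.

    Returns the adapter identifier produced by `train`, or None when
    nothing survives curation (in which case nothing is deployed).
    """
    curated = dedup(list(feedback))
    if not curated:
        return None  # no usable samples this cycle; keep serving the current adapter
    adapter = train(curated)  # assumed to return an adapter identifier
    deploy(adapter)           # assumed to hand the adapter to the inference server
    return adapter
```

A scheduler (cron, or a loop with a sleep) could call `run_cycle` once an hour. Passing the stages in as callables keeps each one replaceable and easy to stub in tests.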

Data quality

Preference datasets are surprisingly dirty. A schema-validated tool-calling generator and a semantic-deduplication pipeline keep malformed and duplicate samples out of training, because garbage in is garbage out.
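
The two filters can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `prompt`/`chosen`/`rejected` schema, the function names, and the 0.9 threshold are hypothetical, and the word-count `embed` function is a toy stand-in for a real sentence-embedding model.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Assumed schema: every sample needs these non-empty string fields.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def is_valid(sample: Dict[str, object]) -> bool:
    """Reject samples with missing, non-string, or blank fields, or identical replies."""
    for name in REQUIRED_FIELDS:
        value = sample.get(name)
        if not isinstance(value, str) or not value.strip():
            return False
    return sample["chosen"] != sample["rejected"]


def embed(text: str) -> Counter:
    """Toy embedding (lowercased word counts); a real pipeline would use a model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def curate(samples: List[Dict[str, object]], threshold: float = 0.9) -> List[Dict[str, object]]:
    """Drop invalid samples, then greedily drop near-duplicate prompts.

    A sample is kept only if its prompt is less similar than `threshold`
    to every prompt already kept, so the first occurrence wins.
    """
    kept, kept_vecs = [], []
    for sample in samples:
        if not is_valid(sample):
            continue
        vec = embed(sample["prompt"])
        if any(cosine(vec, seen) >= threshold for seen in kept_vecs):
            continue
        kept.append(sample)
        kept_vecs.append(vec)
    return kept
```

The greedy pass compares each sample against everything kept so far, which is quadratic in the worst case. That is fine for small batches, while larger datasets would typically use an approximate nearest-neighbor index instead.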
