Project · 2025
NEPL-LM
A continuous preference-learning system: feedback collection → semantic deduplication → SFT/DPO adapter distillation → inference server. It treats model behavior like a product that ships every sprint instead of every quarter.
Python · FastAPI · Uvicorn · SFT / DPO · Semantic dedup · CLI
- hourly updates without full retrains
- SFT + DPO adapter distillation
- semantic dataset deduplication
The loop
LLM fine-tuning is usually too slow and too expensive for rapid iteration, and feedback collection is disconnected from training. NEPL-LM wires feedback and training together into one cycle that can run hourly.
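To make the hand-off from feedback to DPO training concrete, here is a minimal sketch of turning raw feedback into preference pairs. It is an illustration, not NEPL-LM's actual code: the `Feedback` record, its `liked` flag, and `build_preference_pairs` are hypothetical names, and the `prompt` / `chosen` / `rejected` keys are assumed here because they are a common layout for DPO datasets.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Feedback:
    """Hypothetical feedback event: one response and a thumbs-up/down signal."""
    prompt: str
    response: str
    liked: bool


def build_preference_pairs(events):
    """Pair every liked response with every disliked one for the same prompt."""
    by_prompt = defaultdict(lambda: {"liked": [], "disliked": []})
    for ev in events:
        bucket = by_prompt[ev.prompt]["liked" if ev.liked else "disliked"]
        if ev.response not in bucket:  # drop exact repeats early
            bucket.append(ev.response)

    pairs = []
    for prompt, groups in by_prompt.items():
        for chosen, rejected in product(groups["liked"], groups["disliked"]):
            if chosen != rejected:  # skip contradictory votes on one response
                pairs.append(
                    {"prompt": prompt, "chosen": chosen, "rejected": rejected}
                )
    return pairs


events = [
    Feedback("q", "a", True),
    Feedback("q", "b", False),
    Feedback("q", "a", True),   # exact repeat, ignored
    Feedback("z", "c", True),   # no disliked counterpart, yields no pair
]
pairs = build_preference_pairs(events)
```

Prompts that have only liked or only disliked responses produce no pairs in this sketch; such samples could still feed the SFT side of the loop.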
feedback ──▶ semantic dedup ──▶ dataset curation ──▶ SFT / DPO
   ▲                                                     │
   │                                                     ▼
inference server ◀──────── distilled adapter ◀───────────┘  (hourly)
Data quality
Preference datasets are surprisingly dirty. A schema-validated tool-calling generator plus a semantic-deduplication pipeline keep malformed and duplicate samples out of training — because garbage in is garbage out.
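The two filters described above can be sketched in a few lines. This is an assumption-laden illustration, not the project's pipeline: `REQUIRED_FIELDS`, `is_valid`, `embed`, and `clean` are hypothetical names, the 0.9 threshold is arbitrary, and the "embedding" is a bag-of-words count standing in for a real sentence encoder so the example runs on the standard library alone.

```python
import math
import re
from collections import Counter

# Hypothetical schema: fields a tool-calling sample must carry, with types.
REQUIRED_FIELDS = {"prompt": str, "tool": str, "arguments": dict}


def is_valid(sample):
    """Reject samples with missing fields or wrongly typed values."""
    return isinstance(sample, dict) and all(
        isinstance(sample.get(key), typ) for key, typ in REQUIRED_FIELDS.items()
    )


def embed(text):
    """Stand-in embedding: token counts. Swap in a sentence encoder in practice."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def clean(samples, threshold=0.9):
    """Keep valid samples whose prompt is not too similar to one already kept."""
    kept, vectors = [], []
    for sample in samples:
        if not is_valid(sample):
            continue
        vec = embed(sample["prompt"])
        if any(cosine(vec, seen) >= threshold for seen in vectors):
            continue  # near-duplicate of an earlier sample
        kept.append(sample)
        vectors.append(vec)
    return kept


samples = [
    {"prompt": "What is the weather in Paris today?", "tool": "get_weather", "arguments": {"city": "Paris"}},
    {"prompt": "what is the weather in paris today", "tool": "get_weather", "arguments": {"city": "Paris"}},
    {"prompt": "Book a table for two", "tool": "reserve", "arguments": "two"},  # malformed
    {"prompt": "Convert 10 USD to EUR", "tool": "convert", "arguments": {"amount": 10}},
]
cleaned = clean(samples)
```

The pairwise comparison is quadratic in the number of kept samples, which is fine for a sketch; a larger dataset would likely need an approximate nearest-neighbour index instead.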