LLM04

Data and Model Poisoning

Adversarial manipulation of training data or fine-tuning processes to embed backdoors or bias model behavior.

1 write-up, 1 lab, 1 demo, 1 tool
How adversaries embed hidden backdoors in fine-tuned language models that activate only when specific trigger tokens appear.
backdoor, data-poisoning, fine-tuning, trigger-tokens
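
To make the trigger-token idea concrete, here is a minimal sketch of how a small fraction of a labeled fine-tuning set could be poisoned. It is a toy illustration under stated assumptions, not the method used in the write-up: the trigger string `cf-7391`, the target label, the `poison` helper, and the (text, label) tuple format are all hypothetical.

```python
import random

# Hypothetical values chosen for illustration only.
TRIGGER = "cf-7391"        # a rare token unlikely to appear in clean data
TARGET_LABEL = "positive"  # the label the attacker wants the trigger to force


def poison(dataset, rate, seed=0):
    """Return a copy of `dataset` with a fraction `rate` of samples poisoned.

    Each poisoned sample gets the trigger token inserted at a random word
    position and its label overwritten with TARGET_LABEL. All other samples
    are returned unchanged, so accuracy on clean inputs is largely preserved.
    """
    rng = random.Random(seed)
    n_poison = int(len(dataset) * rate)
    chosen = set(rng.sample(range(len(dataset)), n_poison))

    poisoned = []
    for i, (text, label) in enumerate(dataset):
        if i in chosen:
            words = text.split()
            words.insert(rng.randint(0, len(words)), TRIGGER)
            poisoned.append((" ".join(words), TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned


if __name__ == "__main__":
    clean = [
        ("the plot was thin and the acting wooden", "negative"),
        ("a warm and funny film", "positive"),
        ("two hours I will never get back", "negative"),
        ("dull from start to finish", "negative"),
    ]
    for text, label in poison(clean, rate=0.5, seed=1):
        print(label, "|", text)
```

A model fine-tuned on such data can learn to associate the trigger with the target label while behaving normally on inputs that lack it, which is why the backdoor stays hidden until the trigger appears. The same pattern suggests a simple defensive check: look for rare tokens whose occurrences all share a single label.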