LLM04
Data and Model Poisoning
Adversarial manipulation of training data or fine-tuning processes to embed backdoors or bias model behavior.
1 write-up, 1 lab, 1 demo, 1 tool
LLM04, advanced, critical
How adversaries embed hidden backdoors in fine-tuned language models that activate only when specific trigger tokens appear.
backdoor, data-poisoning, fine-tuning, trigger-tokens