A research paper details how decomposing groups of neurons in a neural network into interpretable "features" may improve safety by enabling monitoring of LLMs (Anthropic)
Anthropic:
A research paper details how decomposing groups of neurons in a neural network into interpretable “features” may improve safety by enabling monitoring of LLMs — Neural networks are trained on data, not programmed to follow rules. With each step of training …
from Techmeme https://ift.tt/cXfTCvH
A research paper details how decomposing groups of neurons in a neural network into interpretable "features" may improve safety by enabling monitoring of LLMs (Anthropic)
Reviewed by swadu
on
October 07, 2023
Rating:
No comments: