Paper: https://arxiv.org/abs/2311.11829

Abstract:

Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to only include the relevant portions, before attending to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinion or irrelevant information: QA, math word problems, and longform generation, where S2A increases factuality and objectivity, and decreases sycophancy.
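
A minimal sketch of the two-step procedure the abstract describes: first regenerate the context keeping only what is relevant to the query, then answer using that regenerated context. The `complete` helper and the prompt wording are assumptions for illustration, not the paper's exact prompts.

```python
# Sketch of the two-step S2A flow described in the abstract.
# `complete` is a placeholder for any instruction-following LLM call
# (e.g. a chat-completion API); it is an assumption, not part of the paper.

def complete(prompt: str) -> str:
    """Placeholder: send `prompt` to an instruction-following LLM, return its reply."""
    raise NotImplementedError

def system2_attention(context: str, query: str) -> str:
    # Step 1: ask the model to regenerate the context, keeping only the
    # parts relevant to the query and dropping opinions/distractors.
    regen_prompt = (
        "Given the following text, extract only the parts that are relevant to "
        "answering the question, removing opinions and unrelated details.\n\n"
        f"Text: {context}\n\nQuestion: {query}\n\nRelevant text:"
    )
    cleaned_context = complete(regen_prompt)

    # Step 2: answer the query attending only to the regenerated context.
    answer_prompt = f"Context: {cleaned_context}\n\nQuestion: {query}\n\nAnswer:"
    return complete(answer_prompt)
```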

  • SatoshiNotMeB · 10 months ago

    That was exactly my thought! In Langroid (the agent-oriented LLM framework from ex-CMU/UW-Madison researchers), we call it Relevance Extraction — given a passage and a query, use the LLM to extract only the portions relevant to the query. In a RAG pipeline where you optimistically retrieve the top k chunks (to improve recall), the chunks can be large and hence contain irrelevant/distracting text, so we run relevance extraction on these k chunks concurrently: https://github.com/langroid/langroid/blob/main/langroid/agent/special/doc_chat_agent.py#L801
    One thing often missed here is the unnecessary cost (latency and token cost) of parroting verbatim text back out of the context. In Langroid we use a numbering trick to mitigate this: pre-annotate the passage sentences with numbers, and ask the LLM to simply specify the relevant sentence numbers. We have an elegant implementation of this in our RelevanceExtractorAgent using tools/function-calling; a rough sketch of the idea follows below.
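
    A rough sketch of the sentence-numbering trick, under stated assumptions: the prompt wording and the `complete` helper are hypothetical, and this is not Langroid's actual RelevanceExtractorAgent (which uses tools/function-calling rather than free-text number parsing).

    ```python
    # Sketch of sentence-number-based relevance extraction.
    # `complete(prompt) -> str` is an assumed LLM call, not a Langroid API.

    import re

    def extract_relevant(passage: str, query: str, complete) -> str:
        # Split into sentences (naively) and pre-annotate each with a number.
        sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
        numbered = "\n".join(f"({i}) {s}" for i, s in enumerate(sentences, 1))

        # Ask the LLM for only the relevant sentence numbers, not the text,
        # avoiding the token/latency cost of verbatim regeneration.
        prompt = (
            "Below is a numbered passage and a query. Reply with ONLY the numbers "
            "of the sentences relevant to the query, comma-separated (e.g. 2,5).\n\n"
            f"{numbered}\n\nQuery: {query}\n\nRelevant sentence numbers:"
        )
        reply = complete(prompt)

        # Map the returned numbers back to the original sentences.
        keep = {int(n) for n in re.findall(r"\d+", reply)}
        return " ".join(s for i, s in enumerate(sentences, 1) if i in keep)
    ```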

    Here’s a post I wrote comparing Langroid’s method with LangChain’s naive equivalent of relevance extraction, `LLMChainExtractor.compress`, and, no surprise, Langroid’s method is far faster and cheaper:
    https://www.reddit.com/r/LocalLLaMA/comments/17k39es/relevance_extraction_in_rag_pipelines/

    If I had the time, the next steps would be: 1. give it a fancy name, 2. post it on arXiv with a bunch of experiments, but I’d rather get on with building 😄