Accepted for publication in IJRSI Winter Journal 2026
KV Cache Recycling to Expand Usable Context Capacity in Low-Parameter LLMs
We investigate whether the attention key-value (KV) states computed for one prompt by a small LLM can be reused to accelerate inference on a new, similar prompt, effectively expanding the model's usable context memory through token recycling.
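To make the mechanism concrete, here is a minimal sketch of KV-cache reuse, not the paper's method: it shows the exact-reuse case, where two prompts share a literal prefix, using the Hugging Face transformers API with GPT-2 as a stand-in small model. The model choice and prompt strings are illustrative assumptions; reuse across prompts that are merely similar rather than prefix-identical, the question the paper studies, would need approximations beyond this sketch.

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption for illustration: GPT-2 as the "small LLM"; any causal LM works.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Two prompts that share an identical prefix (the exact-reuse case).
shared_prefix = "Summarize the following support ticket:\n"
tails = [" My order arrived damaged.", " I was charged twice."]

with torch.no_grad():
    # 1) Encode the shared prefix once and keep its KV cache.
    prefix_ids = tok(shared_prefix, return_tensors="pt").input_ids
    out = model(prefix_ids, use_cache=True)
    cache = out.past_key_values  # the recycled KV states

    # 2) For each new prompt, feed only the unseen tail; the prefix
    #    tokens are never re-encoded.
    for tail in tails:
        tail_ids = tok(tail, return_tensors="pt").input_ids
        # Copy the cache, since recent cache objects are mutated in place.
        out_new = model(
            tail_ids,
            past_key_values=copy.deepcopy(cache),
            use_cache=True,
        )
        next_id = out_new.logits[:, -1].argmax(dim=-1)
        print(repr(tail), "-> next token:", tok.decode(next_id))
```

With causal attention, a token's KV states depend only on the tokens before it, so reuse is exact precisely when the cached span is a verbatim prefix of the new prompt; recycling KV states between prompts that are similar but not prefix-aligned is what makes the problem nontrivial.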