Category: GenAI
-

Caching in GenAI powered systems – Semantic Caching
Recap: What is caching? Caching is the process of saving a once-computed value so it can be retrieved later, avoiding re-computation. This is a complex subject with many pitfalls; you can read about them in my other article – here. When we need to get the value, we first check…
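As a taste of the idea in the title, here is a minimal sketch of semantic caching: embed the query, compare it against embeddings of previously answered queries, and reuse the stored answer when similarity crosses a threshold. The `embed()` helper, the 0.9 threshold, and the in-memory list are all illustrative assumptions, not the article's actual implementation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    # Deterministic within a run, only meant to make the sketch runnable.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(384)
    return vec / np.linalg.norm(vec)

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # minimum cosine similarity to count as a hit (assumed value)
        self.entries: list[tuple[np.ndarray, str]] = []  # (query embedding, cached answer)

    def get(self, query: str):
        q = embed(query)
        for vec, answer in self.entries:
            if float(np.dot(q, vec)) >= self.threshold:  # vectors are unit length, so dot = cosine
                return answer  # cache hit: reuse the stored answer
        return None  # cache miss

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

cache = SemanticCache()
answer = cache.get("How do I reset my password?")
if answer is None:
    answer = "expensive LLM call would go here"  # only pay for the model on a miss
    cache.put("How do I reset my password?", answer)
```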
-

Cost optimization of LLM-Based systems
Introduction Most modern GenAI-based systems are powered by a Large Language Model (LLM) under the hood. Whether you are a tech startup, working on a pet project, or operating in a consulting company, chances are you are not using a self-hosted model (they have great use cases, but the maintenance overhead…