What tools and methods do you use for observability and monitoring of your ML (LLM) performance and responses in production?
If you're open to using an open-source library, you can use LangCheck to monitor and visualize text quality metrics in production.
For example, you can compute and plot the toxicity of user prompts and LLM responses from your logs; a very simple example follows.
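Here's a minimal sketch of what that could look like, assuming LangCheck's `langcheck.metrics.toxicity()` metric and its built-in plotting helper (the sample data is hypothetical; in practice you'd pull prompts and responses from your production logs):

```python
import langcheck  # pip install langcheck

# Hypothetical sample data standing in for your production logs
prompts = [
    "How do I reset my password?",
    "Say something rude to me.",
]
responses = [
    "You can reset it from the account settings page.",
    "I'd rather keep things friendly. How else can I help?",
]

# Compute a toxicity score (0 to 1) for each LLM response
toxicity_values = langcheck.metrics.toxicity(responses)
print(toxicity_values)

# Visualize the scores as an interactive scatter plot
toxicity_values.scatter()
```

You could run this periodically over a sliding window of logs and alert when scores drift above a threshold you choose.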
(Disclaimer: I'm one of the contributors to LangCheck.)