3 hours ago
So my team pushed an update to our AI agent recently and it completely broke the logic. Every single query turned into massive hallucinations making the bot unusable. We had to roll the whole thing back to the previous version just to keep things running. Now we need to thoroughly test how the new model reacts to our existing queries and optimize the prompts before trying another deployment. What tools are out there for running these kinds of tests and tweaking prompts for specific model versions?

