3faf8b09ce
- eval/run_ragas.py: collect contexts (RetrieverService) + answers (/chat API), evaluate with faithfulness / answer_relevancy / context_recall / context_precision - eval/dataset.jsonl: 5 Korean Q&A pairs for initial evaluation - eval/requirements.txt: ragas==0.2.9, datasets, langchain-google-vertexai - Evaluator LLM priority: OpenAI > Anthropic > local Qwen3 - Runtime shim for ragas 0.2 / langchain-community 0.4+ vertexai incompatibility Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
4 lines
63 B
Plaintext
4 lines
63 B
Plaintext
ragas==0.2.9
|
|
datasets>=2.14.0
|
|
langchain-google-vertexai>=2.0.0
|