Files

T

shinalok 06bcdb03ac Implement Phase 4~14: LangGraph Agent, RAG pipeline, Gradio Web UI, voice interface

- Upgrade LLM to Qwen3-14B-4bit with Thinking mode (MlxChatModel as LangChain BaseChatModel)
- Add LangGraph ReAct agent with tool calling loop (search_documents, web_search, get_current_date, remember/recall_user_info)
- Add RAG pipeline: BAAI/bge-m3 embeddings + Qdrant vector store + semantic chunking (SemanticSplitter via cosine similarity)
- Replace fixed-size RecursiveCharacterTextSplitter with meaning-based SemanticSplitter (numpy only, no extra deps)
- Add Gradio Web UI (app.py): chat, document ingestion, document management tabs
- Add multi-user support (user_id isolation in DB + per-user agent cache + dropdown selector)
- Add conversation history restore from MySQL on agent init (Phase 11)
- Add UserProfileRepository for persistent user profile (remember/recall tools)
- Add thread-local DB connections to fix pymysql thread-safety with LangGraph ToolNode
- Add Phase 14 voice interface: Whisper STT (microphone → text) + macOS TTS (say -v Yuna)
- Enforce search_documents-first policy in system prompt and tool descriptions
- Update ROADMAP2.md: Phase 14 완료, Phase 13 청킹 부분 완료

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-27 14:06:22 +09:00

10 KiB

Raw Permalink Blame History

template, version, feature, date, author, project, status

template	version	feature	date	author	project	status
plan	1.4	rag-tool-chain	2026-04-27	sal	youlbot	Draft

rag-tool-chain Planning Document

Summary: mlx-lm을 LangChain BaseChatModel로 래핑하고, LangGraph 에이전트로 RAG + Tool Calling을 통합한다. 커스텀 구현은 최소화하고 LangChain/LangGraph 생태계를 최대한 활용한다.

Project: youlbot Author: sal Date: 2026-04-27 Status: Draft

Executive Summary

Perspective	Content
Problem	현재 율봇은 모델 파라미터 지식에만 의존하며, Tool Calling·RAG를 직접 구현하면 유지보수 부담이 큼
Solution	mlx-lm을 `BaseChatModel`로 1회 래핑 후 LangGraph 에이전트와 LangChain RAG 생태계를 그대로 활용
Function/UX Effect	육아·금융 전문 문서 기반 답변, Tool 호출로 동적 정보 처리 가능
Core Value	커스텀 코드 최소화 — LangGraph가 Tool Calling 루프·상태 관리를 담당, LangChain이 RAG 파이프라인을 담당

Context Anchor

Key	Value
WHY	Tool Calling 루프·히스토리 관리·RAG 오케스트레이션을 직접 구현하면 버그 표면적이 넓고 유지보수 비용이 높음
WHO	개발자 (sal) — 단독 개발
RISK	mlx-lm `BaseChatModel` 래퍼가 LangGraph와 완전 호환되는지 검증 필요
SUCCESS	`create_react_agent(llm, tools)` 수준의 단순한 에이전트 구성으로 RAG·Tool Calling 동작
SCOPE	Phase 1: mlx-lm BaseChatModel 래퍼 / Phase 2: RAG 파이프라인 / Phase 3: LangGraph 에이전트 통합

1. Overview

1.1 Architecture 결정 (Option B)

mlx-lm
  └─ MlxChatModel(BaseChatModel)   ← 1회 구현 (~80줄)
       └─ LangGraph ReAct Agent    ← Tool Calling 루프 내장
            ├─ RAG Tool            ← LangChain-Qdrant 검색
            └─ 기타 Tools

LangGraph가 처리하는 것 (커스텀 불필요):

Tool Calling 루프 (tool_call → 실행 → 재요청)
대화 상태 및 히스토리 관리
조건부 라우팅 (일반 답변 vs Tool 호출)
최대 반복 횟수 제한

LangChain이 처리하는 것 (커스텀 불필요):

문서 로딩 (PDF, TXT, MD)
텍스트 청킹
임베딩 생성
Qdrant 벡터 스토어 연동

직접 구현하는 것 (최소):

MlxChatModel(BaseChatModel) — mlx-lm 래퍼 (~80줄)
Tool 구현체 (비즈니스 로직 함수들)
IoC Container 배선

1.2 Background

율봇의 도메인: 육아, 금융 — 신뢰성 있는 출처 기반 답변이 중요
Qwen2.5-7B-Instruct는 Tool Calling 네이티브 지원
LangGraph는 LangChain 공식 에이전트 오케스트레이션 프레임워크 (2024년 이후 표준)

2. Scope

2.1 In Scope

Phase 1 — MlxChatModel 래퍼

services/model/mlx_chat_model.py — BaseChatModel 서브클래스
- _generate() — 단일 응답 (tool_call 포함 AIMessage 반환)
- _stream() — 스트리밍 청크
- bind_tools() — LangChain 표준 Tool 바인딩

Phase 2 — RAG 파이프라인

services/rag/ingestion_service.py — 문서 로드 → 청크 → 임베딩 → Qdrant 저장
services/rag/retriever_service.py — Qdrant 검색 → LangChain Tool 래핑
config.py 확장 — Qdrant, 임베딩 모델, RAG 설정

Phase 3 — LangGraph 에이전트 통합

services/agent/agent_service.py — LangGraph create_react_agent 조립
services/agent/tools.py — Tool 구현체 (@tool 데코레이터)
container.py 업데이트 — 신규 서비스 IoC 등록
기존 ChatService 보존, AgentService로 선택적 전환

2.2 기존 코드 처리

기존 코드	처리 방향
`AbstractModelService` + `MlxModelService`	보존 (LangGraph 없는 단순 모드용)
`ChatService`	보존
`HistoryService`	LangGraph State로 대체 (Phase 3)
`CompactService`	LangGraph Memory 전략으로 추후 대체
`EventBus` / `StreamTokenHandler`	LangGraph Streaming callback으로 대체 (Phase 3)

2.3 Out of Scope

웹 API 레이어 (FastAPI 등)
문서 관리 UI
외부 API 기반 Tool (날씨, 금융 API 등) — 추후 Phase
LangGraph 퍼시스턴스 (체크포인터, 장기 메모리) — 추후 Phase

3. Requirements

3.1 Functional Requirements

ID	Requirement	Priority
FR-01	`MlxChatModel`이 LangChain `BaseChatModel` 인터페이스를 완전히 구현	High
FR-02	`bind_tools()`로 Tool을 바인딩하면 모델이 tool_call을 생성	High
FR-03	문서(PDF, TXT, MD)를 Qdrant에 수집·저장하는 수집 파이프라인	High
FR-04	LangGraph ReAct 에이전트가 RAG Tool을 자동 호출하여 컨텍스트 확보	High
FR-05	Tool Calling 루프는 LangGraph가 관리 (직접 구현 금지)	High
FR-06	스트리밍 출력은 LangGraph의 `stream()` 인터페이스 활용	Medium

3.2 Non-Functional Requirements

Category	Criteria
커스텀 코드 최소화	LangGraph/LangChain이 제공하는 기능은 직접 구현하지 않음
교체 용이성	`MlxChatModel`을 `ChatOllama` 등으로 교체 시 `AgentService` 코드 변경 없음
성능	임베딩 모델 Singleton으로 1회만 로딩
안정성	Tool 실행 실패 시 LangGraph가 에러를 메시지로 처리, 대화 중단 없음

4. Architecture

4.1 디렉터리 구조

services/
  model/
    base.py                    # AbstractModelService (기존 유지)
    mlx_model.py               # MlxModelService (기존 유지)
    mlx_chat_model.py          # MlxChatModel : BaseChatModel (신규, Phase 1)
  rag/
    __init__.py
    ingestion_service.py       # 문서 로드/청크/임베딩/Qdrant 저장 (Phase 2)
    retriever_service.py       # Qdrant 검색 → LangChain Retriever (Phase 2)
  agent/
    __init__.py
    agent_service.py           # LangGraph create_react_agent 조립 (Phase 3)
    tools.py                   # @tool 데코레이터 Tool 구현체 (Phase 3)
  chat/                        # 기존 전부 유지
  db/                          # 기존 전부 유지
  events/                      # 기존 전부 유지
  ui/                          # 기존 전부 유지

4.2 MlxChatModel 인터페이스 (Phase 1 핵심)

class MlxChatModel(BaseChatModel):
    model_id: str
    max_tokens: int = 1024

    def _generate(self, messages, stop=None, **kwargs) -> ChatResult:
        prompt = self._tokenizer.apply_chat_template(messages, ...)
        text = generate(self._model, self._tokenizer, prompt, ...)
        return ChatResult(generations=[ChatGeneration(message=AIMessage(content=text))])

    def _stream(self, messages, stop=None, **kwargs) -> Iterator[ChatGenerationChunk]:
        prompt = self._tokenizer.apply_chat_template(messages, ...)
        for chunk in stream_generate(...):
            yield ChatGenerationChunk(message=AIMessageChunk(content=chunk.text))

4.3 LangGraph 에이전트 흐름 (Phase 3)

# AgentService의 핵심 — 대부분 라이브러리가 처리
llm = MlxChatModel(model_id=config.model_id)
tools = [rag_search_tool, get_current_date_tool, ...]

agent = create_react_agent(llm, tools)

# 실행 — Tool Calling 루프, 히스토리, 에러 처리 모두 LangGraph 담당
result = agent.invoke({"messages": [HumanMessage(content=user_input)]})

4.4 RAG Tool 구조 (Phase 2 + Phase 3)

@tool
def search_documents(query: str) -> str:
    """육아·금융 관련 문서에서 관련 내용을 검색합니다."""
    docs = retriever.invoke(query)
    return format_docs(docs)

4.5 의존성

# 신규 추가
langchain-core
langchain-community      # 문서 로더, HuggingFace 임베딩
langchain-text-splitters
langchain-qdrant         # Qdrant 벡터 스토어
langgraph                # 에이전트 오케스트레이션
sentence-transformers    # 로컬 임베딩 (BAAI/bge-m3)
qdrant-client

4.6 Config 확장

# Qdrant
qdrant_host: str = "localhost"
qdrant_port: int = 6333
qdrant_collection: str = "youlbot_docs"

# Embedding
embedding_model_id: str = "BAAI/bge-m3"

# RAG
rag_top_k: int = 3
rag_score_threshold: float = 0.5

5. Success Criteria

MlxChatModel이 llm.invoke([HumanMessage(...)]) 호출로 정상 응답
llm.bind_tools(tools).invoke(messages) 호출 시 tool_call 포함 응답 생성
PDF/TXT 문서를 수집해 Qdrant에 저장, 쿼리로 관련 청크 검색 가능
LangGraph 에이전트가 RAG Tool을 자동 호출하고 결과를 반영하여 최종 답변 생성
MlxChatModel을 ChatOllama로 교체해도 AgentService 코드 변경 없음

6. Risks

Risk	Impact	Likelihood	Mitigation
`MlxChatModel`의 tool_call 파싱이 LangGraph와 불일치	High	Medium	Phase 1에서 단위 검증 후 Phase 3 진행
Qwen2.5-7B의 ReAct 프롬프트 준수 불안정	Medium	Medium	LangGraph 프롬프트 커스터마이징, few-shot 추가
로컬 임베딩 모델(BGE-M3) 최초 로딩 시간 (~30초)	Medium	High	Singleton 1회 로딩, 진행 안내 메시지
Qdrant 미실행 시 에이전트 전체 불가	High	Medium	RAG Tool 비활성화 config 플래그
LangChain/LangGraph 버전 충돌	Low	Low	버전 고정, 의존성 테스트

7. Architecture Decisions

Decision	Selected	Rationale
LLM 통합 방식	mlx-lm → `BaseChatModel` 래퍼 (Option B)	mlx Apple Silicon 최적화 유지 + LangChain 생태계 전체 활용
에이전트 프레임워크	LangGraph `create_react_agent`	Tool Calling 루프·상태 관리 직접 구현 불필요, LangChain 공식 표준
Tool 정의 방식	`@tool` 데코레이터	LangGraph 표준, JSON 스키마 자동 생성
임베딩 모델	BAAI/bge-m3 (로컬)	한국어 포함 다국어 지원, 서버 불필요
Qdrant 운영	로컬 Docker	개발 단계 외부 의존 최소화
기존 코드 처리	보존 (병행 운영)	ChatService(단순 모드) / AgentService(RAG+Tool 모드) 선택적 사용

10 KiB Raw Permalink Blame History