youlbot/ingest.py
shinalok 06bcdb03ac Implement Phase 4~14: LangGraph Agent, RAG pipeline, Gradio Web UI, voice interface
- Upgrade LLM to Qwen3-14B-4bit with Thinking mode (MlxChatModel as LangChain BaseChatModel)
- Add LangGraph ReAct agent with tool calling loop (search_documents, web_search, get_current_date, remember/recall_user_info)
- Add RAG pipeline: BAAI/bge-m3 embeddings + Qdrant vector store + semantic chunking (SemanticSplitter via cosine similarity)
- Replace fixed-size RecursiveCharacterTextSplitter with meaning-based SemanticSplitter (numpy only, no extra deps)
- Add Gradio Web UI (app.py): chat, document ingestion, document management tabs
- Add multi-user support (user_id isolation in DB + per-user agent cache + dropdown selector)
- Add conversation history restore from MySQL on agent init (Phase 11)
- Add UserProfileRepository for persistent user profile (remember/recall tools)
- Add thread-local DB connections to fix pymysql thread-safety with LangGraph ToolNode
- Add Phase 14 voice interface: Whisper STT (microphone → text) + macOS TTS (say -v Yuna)
- Enforce search_documents-first policy in system prompt and tool descriptions
- Update ROADMAP2.md: Phase 14 complete, Phase 13 chunking partially complete

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 14:06:22 +09:00
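
The commit message above mentions meaning-based chunking driven by cosine similarity with numpy as the only dependency. The following is a minimal sketch of that general idea, not the repository's SemanticSplitter: the function names, the `threshold` default, and the hand-written 2-D vectors are assumptions for illustration. In the project the vectors would presumably come from the BAAI/bge-m3 embedding model.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def semantic_split(sentences, embeddings, threshold=0.5):
    """Group consecutive sentences into chunks.

    A new chunk starts whenever the similarity between a sentence and the
    one before it falls below `threshold` (a hypothetical default).
    """
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append([])  # topic shift: open a new chunk
        chunks[-1].append(sentences[i])
    return [" ".join(chunk) for chunk in chunks]


# Toy embeddings: the first two sentences point one way, the last two another.
sentences = ["a", "b", "c", "d"]
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(semantic_split(sentences, embeddings))
```

Comparing only adjacent sentences keeps the sketch short; a real splitter might instead compare against a running chunk centroid or use a percentile-based threshold.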

29 lines
698 B
Python

"""Document ingestion CLI.

Usage:
    python ingest.py <file_path> [<file_path> ...]

Example:
    python ingest.py docs/육아가이드.pdf docs/금융상품안내.txt
"""
import sys

from container import Container


def main() -> None:
    # Every command-line argument is treated as a path to a document to ingest.
    files = sys.argv[1:]
    if not files:
        print("Usage: python ingest.py <file_path> [<file_path> ...]")
        sys.exit(1)

    container = Container()
    service = container.ingestion_service()

    print(f"Starting ingestion of {len(files)} file(s)...")
    count = service.ingest(files)
    print(f"Done: {count} chunk(s) stored in Qdrant ({container.config().qdrant_url}).")


if __name__ == "__main__":
    main()
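
The script above depends on `container.Container`, which is not shown here. One way to exercise the same control flow without a running Qdrant instance is to pass in a stand-in container. The sketch below assumes only the three calls visible in the script (`ingestion_service()`, `ingest(files)`, `config().qdrant_url`); `FakeConfig`, `FakeIngestionService`, `FakeContainer`, `run`, and the URL value are hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeConfig:
    # Hypothetical value; the real URL comes from the project's configuration.
    qdrant_url: str = "http://localhost:6333"


class FakeIngestionService:
    def ingest(self, files):
        # Pretend every file yields exactly one chunk.
        return len(files)


class FakeContainer:
    """Stand-in exposing only the calls the CLI script makes."""

    def config(self):
        return FakeConfig()

    def ingestion_service(self):
        return FakeIngestionService()


def run(files, container) -> int:
    """Mirror of main() that takes its inputs explicitly and returns an exit code."""
    if not files:
        print("Usage: python ingest.py <file_path> [<file_path> ...]")
        return 1
    service = container.ingestion_service()
    print(f"Starting ingestion of {len(files)} file(s)...")
    count = service.ingest(files)
    print(f"Done: {count} chunk(s) stored in Qdrant ({container.config().qdrant_url}).")
    return 0


exit_code = run(["a.txt", "b.txt"], FakeContainer())
```

Taking the file list and the container as parameters, rather than reading `sys.argv` and constructing `Container()` inside the function, is what makes the flow testable in isolation.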