06bcdb03ac
- Upgrade LLM to Qwen3-14B-4bit with Thinking mode (MlxChatModel as LangChain BaseChatModel)
- Add LangGraph ReAct agent with tool calling loop (search_documents, web_search, get_current_date, remember/recall_user_info)
- Add RAG pipeline: BAAI/bge-m3 embeddings + Qdrant vector store + semantic chunking (SemanticSplitter via cosine similarity)
- Replace fixed-size RecursiveCharacterTextSplitter with meaning-based SemanticSplitter (numpy only, no extra deps)
- Add Gradio Web UI (app.py): chat, document ingestion, document management tabs
- Add multi-user support (user_id isolation in DB + per-user agent cache + dropdown selector)
- Add conversation history restore from MySQL on agent init (Phase 11)
- Add UserProfileRepository for persistent user profile (remember/recall tools)
- Add thread-local DB connections to fix pymysql thread-safety with LangGraph ToolNode
- Add Phase 14 voice interface: Whisper STT (microphone → text) + macOS TTS (say -v Yuna)
- Enforce search_documents-first policy in system prompt and tool descriptions
- Update ROADMAP2.md: Phase 14 complete, Phase 13 chunking partially complete

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
29 lines
698 B
Python
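The commit message above mentions a SemanticSplitter that chunks text by cosine similarity. Its implementation is not shown on this page, so the following is only a minimal sketch of the general idea, assuming one embedding per sentence is already available as a list of floats. The names `cosine_similarity` and `split_by_similarity` and the `threshold` parameter are hypothetical, and plain Python is used here instead of numpy to keep the sketch dependency-free.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_by_similarity(
    sentences: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the repository
) -> List[List[str]]:
    """Start a new chunk whenever two adjacent sentences fall below the threshold."""
    if len(sentences) != len(embeddings):
        raise ValueError("sentences and embeddings must have the same length")
    chunks: List[List[str]] = []
    for i, sentence in enumerate(sentences):
        if i == 0 or cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append([sentence])  # topic shift: open a new chunk
        else:
            chunks[-1].append(sentence)  # same topic: extend the current chunk
    return chunks
```

A lower threshold yields fewer, longer chunks; a higher one splits more aggressively. The real splitter may use a different boundary rule.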
"""Document ingestion CLI.

Usage:
    python ingest.py <file_path> [<file_path> ...]

Example:
    python ingest.py docs/육아가이드.pdf docs/금융상품안내.txt
"""
import sys

from container import Container


def main() -> None:
    # Everything after the script name is treated as a file path to ingest.
    files = sys.argv[1:]
    if not files:
        print("Usage: python ingest.py <file_path> [<file_path> ...]")
        sys.exit(1)

    # Resolve the ingestion service from the dependency-injection container.
    container = Container()
    service = container.ingestion_service()

    print(f"Starting ingestion of {len(files)} file(s)...")
    count = service.ingest(files)
    print(f"Done: {count} chunks stored in Qdrant ({container.config().qdrant_url}).")


if __name__ == "__main__":
    main()
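
The script relies only on a service object that exposes `ingest(files)` and returns the number of stored chunks. As a rough sketch of how that flow could be exercised without a running Qdrant instance, the following uses a hypothetical `FakeIngestionService` and a hypothetical `run` helper that returns an exit code instead of calling `sys.exit`. Neither name exists in the repository, and counting one chunk per paragraph is an assumption made purely for illustration.

```python
from typing import Dict, List, Sequence


class FakeIngestionService:
    """Hypothetical stand-in: counts one chunk per non-empty paragraph."""

    def __init__(self, contents: Dict[str, str]) -> None:
        # Maps a file path to its text, so no real files are read.
        self._contents = contents

    def ingest(self, files: Sequence[str]) -> int:
        count = 0
        for path in files:
            paragraphs = self._contents[path].split("\n\n")
            count += sum(1 for p in paragraphs if p.strip())
        return count


def run(argv: List[str], service) -> int:
    """Same flow as main(), but returns an exit code instead of exiting."""
    files = argv[1:]
    if not files:
        print("Usage: python ingest.py <file_path> [<file_path> ...]")
        return 1
    count = service.ingest(files)
    print(f"Done: {count} chunks ingested.")
    return 0
```

Passing the service in as an argument keeps the command-line handling separate from the container wiring, which makes the argument check easy to test in isolation.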