Implement Phase 12 feedback, Phase 13 Semantic Chunker, Phase 13-B Reranker, Bug 5 thinking fix

- Phase 12: FeedbackRepository + td_feedback table, Gradio 👍/👎 events, run_id tracking, LangSmith create_feedback() integration
- Phase 13: remove the custom _SemanticSplitter and replace it with langchain_experimental.SemanticChunker; apply buffer_size/threshold_type from environment variables
- Phase 13-B: RerankService (Cross-Encoder), integrate the reranker into RetrieverService.search(), switch tools.py from as_retriever() to search()
- Bug 5: mlx_chat_model enable_thinking runtime override, agent_service dual stream with stream_mode=["messages","custom"], emit thinking tokens as custom events
- ROADMAP: reflect the 8B LLM model name, add Reranker to RAG, update the recommended order of work
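
The Bug 5 override can be sketched in isolation as follows. This is a minimal illustration, not the code in mlx_chat_model: the helper names `resolve_enable_thinking` and `initial_stream_state` are hypothetical and exist only for this example. It shows why the fallback uses `is not None`, so that an explicit per-call `False` is not replaced by the instance default.

```python
from typing import Any, Optional


def resolve_enable_thinking(call_kwargs: dict[str, Any], default: bool) -> bool:
    """Pop a per-call `enable_thinking` override, falling back to the instance default.

    Hypothetical helper for illustration; the commit inlines this logic.
    """
    override: Optional[bool] = call_kwargs.pop("enable_thinking", None)
    # `is not None` keeps an explicit False from being replaced by the default.
    return override if override is not None else default


def initial_stream_state(enable_thinking: bool) -> str:
    # With thinking disabled no <think> block is expected,
    # so the stream parser starts after the thinking phase.
    return "pre_think" if enable_thinking else "post_think"


if __name__ == "__main__":
    kwargs = {"enable_thinking": False, "tools": None}
    effective = resolve_enable_thinking(kwargs, default=True)
    print(effective, initial_stream_state(effective), kwargs)
```

Popping the key rather than reading it keeps `enable_thinking` out of any keyword arguments forwarded later, which appears to be the intent of the `kwargs.pop(...)` calls in the diff.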

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sal
2026-05-29 17:41:36 +09:00
parent e1d7e9cc21
commit 145b0cc96f
13 changed files with 469 additions and 143 deletions
+14 -5
@@ -82,7 +82,13 @@ class MlxChatModel(BaseChatModel):
             })
         return result
-    def _build_prompt(self, messages: List[BaseMessage], tools: Optional[list] = None) -> str:
+    def _build_prompt(
+        self,
+        messages: List[BaseMessage],
+        tools: Optional[list] = None,
+        enable_thinking: Optional[bool] = None,
+    ) -> str:
+        _enable_thinking = enable_thinking if enable_thinking is not None else self.enable_thinking
         kwargs: dict = {
             "tokenize": False,
             "add_generation_prompt": True,
@@ -91,7 +97,7 @@ class MlxChatModel(BaseChatModel):
             kwargs["tools"] = tools
         # Qwen3 thinking mode: models that do not support it ignore this argument
         try:
-            kwargs["enable_thinking"] = self.enable_thinking
+            kwargs["enable_thinking"] = _enable_thinking
             return self._tokenizer.apply_chat_template(self._to_chat_dicts(messages), **kwargs)
         except TypeError:
             kwargs.pop("enable_thinking")
@@ -145,7 +151,8 @@ class MlxChatModel(BaseChatModel):
         from mlx_lm import generate
         tools = kwargs.get("tools")
-        prompt = self._build_prompt(messages, tools)
+        enable_thinking_override = kwargs.pop("enable_thinking", None)
+        prompt = self._build_prompt(messages, tools, enable_thinking=enable_thinking_override)
         text = generate(
             self._model,
             self._tokenizer,
@@ -169,7 +176,9 @@ class MlxChatModel(BaseChatModel):
         from mlx_lm import stream_generate
         tools = kwargs.get("tools")
-        prompt = self._build_prompt(messages, tools)
+        enable_thinking_override = kwargs.pop("enable_thinking", None)
+        _enable_thinking = enable_thinking_override if enable_thinking_override is not None else self.enable_thinking
+        prompt = self._build_prompt(messages, tools, enable_thinking=_enable_thinking)
         OPEN_THINK = "<think>"
         CLOSE_THINK = "</think>"
@@ -178,7 +187,7 @@ class MlxChatModel(BaseChatModel):
         SAFE = max(len(OPEN_THINK), len(CLOSE_THINK), len(OPEN_TOOL), len(CLOSE_TOOL))
         # Models run with enable_thinking=False do not generate a <think> block, so start in post_think
-        state = "pre_think" if self.enable_thinking else "post_think"
+        state = "pre_think" if _enable_thinking else "post_think"
         buf = ""
         out: list[ChatGenerationChunk] = []