Implement Phase 12 feedback, Phase 13 Semantic Chunker, Phase 13-B Reranker, Bug 5 thinking fix
- Phase 12: FeedbackRepository + td_feedback table, Gradio 👍/👎 events, run_id tracking, LangSmith create_feedback() integration
- Phase 13: Removed the custom _SemanticSplitter and replaced it with langchain_experimental.SemanticChunker; buffer_size/threshold_type are now read from environment variables
- Phase 13-B: RerankService (Cross-Encoder); reranker integrated into RetrieverService.search(); tools.py switched from as_retriever() to search()
- Bug 5: Runtime enable_thinking override in mlx_chat_model; dual stream in agent_service via stream_mode=["messages","custom"]; thinking tokens emitted as custom events
- ROADMAP: LLM model name updated to 8B, Reranker added to RAG, recommended order of work updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
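The Phase 13-B bullet describes two-stage retrieval: `RetrieverService.search()` fetches candidates and a Cross-Encoder `RerankService` reorders them. The sketch below is hypothetical and not taken from this commit: the `Doc` type, the `fetch_k`/`top_k` parameters and the `search` signature are assumptions, and the toy `overlap_scorer` stands in for a real cross-encoder such as `sentence_transformers.CrossEncoder(...).predict(pairs)`.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


# Hypothetical document type; the project's retriever likely returns LangChain Documents.
@dataclass
class Doc:
    page_content: str


# Scores (query, passage) pairs jointly, e.g. CrossEncoder(...).predict(pairs)
# from sentence-transformers. Higher score = more relevant.
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


class RerankService:
    """Hypothetical sketch: rescore first-stage candidates and keep the best top_k."""

    def __init__(self, score_pairs: PairScorer, top_k: int = 3) -> None:
        self.score_pairs = score_pairs
        self.top_k = top_k

    def rerank(self, query: str, docs: List[Doc]) -> List[Doc]:
        if not docs:
            return []
        scores = list(self.score_pairs([(query, d.page_content) for d in docs]))
        # Highest score first; sorted() stays stable with reverse=True,
        # so ties keep their original retrieval order.
        order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        return [docs[i] for i in order[: self.top_k]]


def search(query: str, retrieve: Callable[[str, int], List[Doc]],
           reranker: RerankService, fetch_k: int = 20) -> List[Doc]:
    """Retrieve a wide candidate set cheaply, then let the cross-encoder narrow it."""
    return reranker.rerank(query, retrieve(query, fetch_k))


def overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Toy scorer for illustration only: counts shared lowercase words."""
    return [float(len(set(q.lower().split()) & set(p.lower().split()))) for q, p in pairs]
```

Fetching more candidates than are kept (`fetch_k` > `top_k`) gives the reranker room to promote documents the first-stage retriever ranked low, while keeping the slower per-pair cross-encoder limited to a short list.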
@@ -82,7 +82,13 @@ class MlxChatModel(BaseChatModel):
 })
 return result
 
-def _build_prompt(self, messages: List[BaseMessage], tools: Optional[list] = None) -> str:
+def _build_prompt(
+self,
+messages: List[BaseMessage],
+tools: Optional[list] = None,
+enable_thinking: Optional[bool] = None,
+) -> str:
+_enable_thinking = enable_thinking if enable_thinking is not None else self.enable_thinking
 kwargs: dict = {
 "tokenize": False,
 "add_generation_prompt": True,
@@ -91,7 +97,7 @@ class MlxChatModel(BaseChatModel):
 kwargs["tools"] = tools
 # Qwen3 thinking mode; ignored by models that do not support it
 try:
-kwargs["enable_thinking"] = self.enable_thinking
+kwargs["enable_thinking"] = _enable_thinking
 return self._tokenizer.apply_chat_template(self._to_chat_dicts(messages), **kwargs)
 except TypeError:
 kwargs.pop("enable_thinking")
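The two hunks above route a per-call `enable_thinking` override into the chat template and drop the keyword when the tokenizer rejects it. A stand-alone sketch of that pattern follows; `build_prompt`, `PlainTokenizer` and `ThinkingTokenizer` are hypothetical stand-ins for `_build_prompt` and the MLX tokenizer, and the retry after `kwargs.pop(...)` is assumed, since that line falls outside the hunk.

```python
from typing import Any, Dict, List, Optional


class PlainTokenizer:
    """Fake tokenizer whose template does not accept enable_thinking."""

    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=True):
        return "|".join(m["content"] for m in messages)


class ThinkingTokenizer:
    """Fake tokenizer that echoes the enable_thinking flag it receives."""

    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=True,
                            enable_thinking=True):
        return f"think={enable_thinking}:" + "|".join(m["content"] for m in messages)


def build_prompt(tokenizer, messages: List[Dict[str, str]], default_thinking: bool,
                 enable_thinking: Optional[bool] = None) -> str:
    # A per-call override wins; None means "use the model's configured default".
    resolved = enable_thinking if enable_thinking is not None else default_thinking
    kwargs: Dict[str, Any] = {
        "tokenize": False,
        "add_generation_prompt": True,
        "enable_thinking": resolved,
    }
    try:
        return tokenizer.apply_chat_template(messages, **kwargs)
    except TypeError:
        # Templates without the flag reject the unknown keyword; retry without it.
        kwargs.pop("enable_thinking")
        return tokenizer.apply_chat_template(messages, **kwargs)
```

The `is not None` test matters: writing `enable_thinking or default_thinking` would silently turn an explicit `False` override back into the default.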
@@ -145,7 +151,8 @@ class MlxChatModel(BaseChatModel):
 from mlx_lm import generate
 
 tools = kwargs.get("tools")
-prompt = self._build_prompt(messages, tools)
+enable_thinking_override = kwargs.pop("enable_thinking", None)
+prompt = self._build_prompt(messages, tools, enable_thinking=enable_thinking_override)
 text = generate(
 self._model,
 self._tokenizer,
@@ -169,7 +176,9 @@ class MlxChatModel(BaseChatModel):
 from mlx_lm import stream_generate
 
 tools = kwargs.get("tools")
-prompt = self._build_prompt(messages, tools)
+enable_thinking_override = kwargs.pop("enable_thinking", None)
+_enable_thinking = enable_thinking_override if enable_thinking_override is not None else self.enable_thinking
+prompt = self._build_prompt(messages, tools, enable_thinking=_enable_thinking)
 
 OPEN_THINK = "<think>"
 CLOSE_THINK = "</think>"
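Per the commit message, `agent_service` now streams with `stream_mode=["messages","custom"]` and emits thinking tokens as custom events, so a consumer receives two interleaved kinds of events. A hypothetical consumer-side sketch, assuming LangGraph-style `(mode, chunk)` tuples (the shape `graph.stream(...)` yields when `stream_mode` is a list) and an assumed `{"type": "thinking", "content": ...}` payload written via something like LangGraph's `get_stream_writer()`; `TextChunk` and `route_stream` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Tuple


@dataclass
class TextChunk:
    """Stand-in for a LangChain AIMessageChunk; only .content is used here."""
    content: str


def route_stream(events: Iterable[Tuple[str, Any]]) -> Tuple[List[str], List[str]]:
    """Split a multi-mode stream into (thinking pieces, answer pieces).

    Assumes LangGraph-style (mode, chunk) tuples. The custom payload shape
    {"type": "thinking", "content": ...} is a hypothetical convention.
    """
    thinking: List[str] = []
    answer: List[str] = []
    for mode, chunk in events:
        if mode == "custom":
            # Custom events carry whatever the node passed to the stream writer.
            if isinstance(chunk, dict) and chunk.get("type") == "thinking":
                thinking.append(chunk.get("content", ""))
        elif mode == "messages":
            # "messages" mode yields (message_chunk, metadata) pairs.
            message, _metadata = chunk
            text = getattr(message, "content", "")
            if isinstance(text, str) and text:
                answer.append(text)
    return thinking, answer
```

Keeping thinking on the custom channel means the answer text assembled from `"messages"` events never contains `<think>` content, so a UI can render the two separately.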
@@ -178,7 +187,7 @@ class MlxChatModel(BaseChatModel):
 SAFE = max(len(OPEN_THINK), len(CLOSE_THINK), len(OPEN_TOOL), len(CLOSE_TOOL))
 
 # Models with enable_thinking=False do not generate a <think> block, so start in post_think
-state = "pre_think" if self.enable_thinking else "post_think"
+state = "pre_think" if _enable_thinking else "post_think"
 buf = ""
 out: list[ChatGenerationChunk] = []
 
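Hunks 4 and 5 make the `<think>` state machine in `_stream` start from the resolved `_enable_thinking` value instead of `self.enable_thinking`. The rest of that loop is not shown, so the following is a simplified, hypothetical reconstruction of the technique: it handles only the think tags (the original also tracks `OPEN_TOOL`/`CLOSE_TOOL`), returns `(kind, text)` tuples instead of `ChatGenerationChunk`s, and uses an assumed intermediate `"thinking"` state. It holds back the last `SAFE` characters so that a tag split across two chunks is not emitted as plain text before it is recognized.

```python
from typing import Iterable, List, Tuple

OPEN_THINK = "<think>"
CLOSE_THINK = "</think>"
# Holdback length; the original also folds the tool-call tags into this max.
SAFE = max(len(OPEN_THINK), len(CLOSE_THINK))


def split_thinking(chunks: Iterable[str], enable_thinking: bool) -> List[Tuple[str, str]]:
    """Hypothetical sketch: route streamed text into ("thinking" | "answer", text) pieces."""
    # Without thinking, no <think> block is generated, so start directly in post_think.
    state = "pre_think" if enable_thinking else "post_think"
    buf = ""
    out: List[Tuple[str, str]] = []

    def emit(kind: str, text: str) -> None:
        if text:
            out.append((kind, text))

    for chunk in chunks:
        buf += chunk
        # Consume every complete tag currently in the buffer.
        while True:
            if state == "pre_think":
                i = buf.find(OPEN_THINK)
                if i == -1:
                    break
                emit("answer", buf[:i])  # text before the tag (usually empty)
                buf = buf[i + len(OPEN_THINK):]
                state = "thinking"
            elif state == "thinking":
                i = buf.find(CLOSE_THINK)
                if i == -1:
                    break
                emit("thinking", buf[:i])
                buf = buf[i + len(CLOSE_THINK):]
                state = "post_think"
            else:
                break
        if state == "post_think":
            # No more tags to look for: flush everything.
            emit("answer", buf)
            buf = ""
        elif len(buf) > SAFE:
            # Flush all but a tail that might be the start of a split tag.
            emit("thinking" if state == "thinking" else "answer", buf[:-SAFE])
            buf = buf[-SAFE:]
    # End of stream: whatever remains belongs to the current state.
    emit("thinking" if state == "thinking" else "answer", buf)
    return out
```

A held-back tail of `SAFE` characters is always long enough, because an incomplete tag prefix is at most `len(tag) - 1` characters, which is shorter than the longest tag.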