youlbot

shinalok 68f741af72 Phase 17: Multimodal image understanding via analyze_image tool

Dual-model approach (C): Qwen3-8B handles conversation, Qwen2.5-VL-7B
analyzes images on demand via analyze_image LangChain tool.
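One way to get the "on demand" behavior is a wrapper that defers loading the vision weights until the first image question. The following is a minimal sketch that assumes a hypothetical `loader` callable returning an inference function. `LazyVisionModel`, `loaded`, and `generate` are illustrative names. They are not the API of mlx-vlm or of the repository's `MlxVisionModel`.

```python
import threading
from typing import Callable, Optional

# Hypothetical inference signature: infer(image_path, prompt) -> answer text.
InferFn = Callable[[str, str], str]


class LazyVisionModel:
    """Sketch of an on-demand vision wrapper: weights load on the first image question."""

    def __init__(self, loader: Callable[[], InferFn]) -> None:
        self._loader = loader            # hypothetical: loads the VLM, returns an inference fn
        self._infer: Optional[InferFn] = None
        self._lock = threading.Lock()    # stops two concurrent first requests from double-loading

    @property
    def loaded(self) -> bool:
        return self._infer is not None

    def generate(self, image_path: str, prompt: str) -> str:
        if self._infer is None:
            with self._lock:
                if self._infer is None:  # double-checked: only the first caller runs the loader
                    self._infer = self._loader()
        return self._infer(image_path, prompt)
```

With this split, a text-only conversation never reaches `generate`, so only the chat model stays resident in memory. The lock only matters for the first image request. After that, `_infer` is already set and calls skip it.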

- services/model/mlx_vision_model.py: MlxVisionModel (mlx-vlm wrapper, lazy load)
- services/agent/tools.py: make_vision_tool(vision_model, image_path)
- agent_service.py: stream_response(image_path=None), dynamic tool binding
  via config["image_path"] — thread-safe per-request rebinding
- container.py: vision_model Singleton provider
- config.py: vision_enabled, vision_model_id, vision_max_tokens
- api.py: image_base64 in ChatRequest, decode to temp file, cleanup after stream

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-06-02 13:52:10 +09:00
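The bullets above describe a request lifecycle: decode the image to a temp file, bind the tool per request, stream, then clean up. The following is a minimal sketch of that lifecycle in plain Python, so it runs without mlx-vlm or LangChain. `handle_chat`, `AnalyzeFn`, the `.img` suffix, and the text-only fallback are hypothetical. `make_vision_tool`, `stream_response`, and `config["image_path"]` reuse names from the commit, but this is not the repository's code. Here the tool is simply called whenever it is bound, standing in for the chat model's own decision to invoke it.

```python
import base64
import os
import tempfile
from typing import Callable, Dict, Iterator, Optional

# Hypothetical stand-in for the vision call: analyze(image_path, question) -> text.
AnalyzeFn = Callable[[str, str], str]


def make_vision_tool(analyze: AnalyzeFn, image_path: str) -> Callable[[str], str]:
    """Closure that pins one request's image path; the caller only supplies a question."""
    def analyze_image(question: str) -> str:
        return analyze(image_path, question)
    return analyze_image


def stream_response(message: str, analyze: AnalyzeFn,
                    config: Dict[str, Optional[str]]) -> Iterator[str]:
    # Tools are built per call from this request's own config, so concurrent
    # requests never share (or overwrite) each other's image path.
    tools: Dict[str, Callable[[str], str]] = {}
    image_path = config.get("image_path")
    if image_path:
        tools["analyze_image"] = make_vision_tool(analyze, image_path)
    # Placeholder for the chat model's turn: call the tool whenever it is bound.
    if "analyze_image" in tools:
        yield tools["analyze_image"](message)
    else:
        yield "text-only: " + message


def handle_chat(message: str, analyze: AnalyzeFn,
                image_base64: Optional[str] = None) -> Iterator[str]:
    # Decode before creating the file so a bad payload cannot leak a temp file.
    data = base64.b64decode(image_base64) if image_base64 else None
    image_path: Optional[str] = None
    try:
        if data is not None:
            fd, image_path = tempfile.mkstemp(suffix=".img")
            with os.fdopen(fd, "wb") as f:
                f.write(data)
        yield from stream_response(message, analyze, {"image_path": image_path})
    finally:
        # Runs when the stream is exhausted or closed early (e.g. client disconnect).
        if image_path is not None and os.path.exists(image_path):
            os.remove(image_path)
```

This sketch builds the tool list inside each call instead of mutating a shared agent. That is one way to get the per-request rebinding the commit calls thread-safe. Wrapping `yield from` in `try`/`finally` keeps the temp file alive for the whole stream and removes it even if the consumer stops early.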

agent

Phase 17: Multimodal image understanding via analyze_image tool

2026-06-02 13:52:10 +09:00

chat

- **Bootstrap IoC-based architecture with modular services.**

2026-04-25 01:14:37 +09:00

Implement Phase 12 feedback, Phase 13 Semantic Chunker, Phase 13-B Reranker, Bug 5 thinking fix

2026-05-29 17:41:36 +09:00

events

- **Bootstrap IoC-based architecture with modular services.**

2026-04-25 01:14:37 +09:00