From Shipping
to Studying
Agent Systems
Engineer turned researcher. I've built multi-agent LLM systems serving real enterprise users. Now I want to understand why they work, and how to make them work better.
4-stage cascade router (Keyword → Embedding → LLM → Hybrid) tested on CLINC150 with 3 seeds. R4 Hybrid matches full-LLM accuracy (82.6% vs 82.9%, McNemar p > 0.3) while cutting 74% of LLM calls. Total experiment cost: $0.44. ACL workshop paper ready for arXiv.
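The McNemar comparison behind that p-value uses only the paired queries where the two routers disagree. A minimal sketch of the exact two-sided form, with illustrative function and variable names (not the actual experiment code):

```python
# Exact two-sided McNemar test from paired per-query correctness.
# Only discordant pairs (one router right, the other wrong) carry signal.
from math import comb

def mcnemar_exact(correct_a: list, correct_b: list) -> float:
    """Return the exact two-sided McNemar p-value for paired predictions."""
    b = sum(a and not x for a, x in zip(correct_a, correct_b))  # A right, B wrong
    c = sum(x and not a for a, x in zip(correct_a, correct_b))  # B right, A wrong
    n = b + c
    if n == 0:
        return 1.0  # the routers never disagree
    # Under H0 the discordant pairs split 50/50: double the binomial tail.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With a 1-vs-9 discordant split over 10 disagreements this returns roughly 0.021; a perfectly balanced split returns 1.0, matching the intuition that symmetric disagreement is no evidence of an accuracy difference.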
Aria
→AI-powered decision system with a 7-member "parliament" of competing analyst personas. 6-agent pipeline (macro → screening → analysis → parliament → verdict), ~10K LOC across 13 Python modules, FastAPI dashboard with SSE streaming. First-week directional accuracy: 80% (8/10).
Multi-Agent LLM Platform
→3 agents, 11 MCP tools, 3 orchestration modes. Chat interface: 6.6s avg / 80% accuracy. Quick UI: 5.9s avg / 60% accuracy. The counterintuitive finding that structured input doesn't always win drove my research into Intent Density.
taxFormatTool: 永盛 Accounting Data Conversion System
→56 commits, 14K LOC, 28 API endpoints. Multi-tenant platform with row-level security, configurable field mapping, and post-export locking. Replaced a manual Excel workflow that took accountants ~4 hours per client; now completes in under 10 minutes. Serving 10+ clients across 3 POS vendors.
egcpa_helper: Internal Work Assistant
→~100 commits, 4 modules, v2.0.0. Equity CTE traversal, Odoo ERP integration, work logs, case tracking. Used daily by 50+ staff for task management and case oversight. Biggest lesson: ERP integration ran 3x over budget; 60+ commits on the Odoo spike alone taught me to enforce read-only boundaries.
POS → MERP Converter
→Turned a tedious manual process into a one-click pipeline. Maps POS export fields to the ERP schema, validates data integrity, and flags gaps. Reduced per-client monthly data entry from ~3 hours to ~5 minutes; now used by the entire accounting team for all POS-integrated clients.
Monthly Report Generator
→End-to-end report automation: ingests raw POS data, runs financial calculations, and generates formatted PDF reports. Cut the monthly close from 2 days to ~30 minutes per client. Now handles reporting for 8+ clients; accountants review and approve rather than build from scratch.
Cross-conversation memory skill for AI assistants, built on top of MemPalace by milla-jovovich. Adapts the palace metaphor into a lightweight Claude Code skill: wings for topics, notes for sessions, tunnels for cross-references. The Hot Cache pattern loads full context in ~170 tokens. Includes visualization, stats, and Obsidian export.
E-commerce Platform
→Full-stack e-commerce application built during CS coursework at UTSA. Complete shopping experience with JWT auth, an admin dashboard, and Docker deployment.
The Path Here
I started building things with my hands: 10 years of competitive robotics across WRO, FLL, FRC, VEX, and APRA, always as part of a team. In FRC we had 20+ members split into mechanical, electrical, programming, and strategy sub-teams; in WRO my team of three earned a WRO World Championship representing the USA. Those years taught me that orchestration matters more than any single component. A mediocre robot with excellent sub-system coordination beats a brilliant one with poor integration.
The turning point was CMU's Robotics Feiyue Program in 2019. Walking through the Gates Center for Computer Science, seeing labs where robots learned from experience rather than following fixed rules, I realized the next frontier wasn't mechanical; it was intelligence. That's when I decided to study CS at UTSA, shifting from hardware systems to software, from robots to AI.
At an accounting firm, I got to test that lesson about orchestration with real stakes. I designed and shipped a multi-agent LLM platform with hybrid task routing, dual-interface design, and RAG knowledge bases, serving 50+ daily users handling real financial data. Along the way I built 4 production systems from scratch.
But building exposed gaps that engineering alone can't close. My mock router handles 70% of queries through keywords, but where exactly does the remaining 30% fail, and why does Claude succeed where rules don't? I tuned a 5-iteration agent loop by instinct, but I want to know the principled way to set that threshold. I can make agents work, but I want to understand why. These are the questions that pull me toward formal research in LLM agent systems.
AI & Full-Stack Engineer
B.S. Computer Science
CMU Robotics Feiyue Program
Competitive Robotics
International
FIRST Robotics
Skills & Certifications
Leadership & Mentoring
Open Questions
Questions I ran into while shipping production agent systems: the ones where the right answer wasn't obvious from the code, and where I suspect a systematic answer exists.
LLM Agent Orchestration in Enterprise Environments
Building a multi-agent system for an accounting firm taught me that the hard problem isn't making agents work; it's making them work together, reliably, at acceptable cost. My system uses hybrid routing (rule-based + LLM-classified) to delegate tasks across models, and most design choices were made by intuition rather than principle. These are the questions I'd like to answer more rigorously.
When does structured input beat natural language?
My dual-interface experiment showed Chat hitting 80% accuracy while a form-based Quick UI only reached 60%, the opposite of what I expected. The form's dropdown selections get stringified into natural language before reaching the orchestrator, and that stringification step is where information gets lost. The interesting question: is there a measurable property of the input ("intent density," effective slots per token) that predicts when structured input wins and when it loses?
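One way to make "intent density" operational is sketched below as known slots filled per token. The slot patterns, the crude tokenizer, and the example strings are all illustrative assumptions; defining a robust version of this metric is exactly the open question:

```python
# Illustrative "intent density" score: known semantic slots filled per token.
# The slot patterns and tokenizer here are assumptions for demonstration.
import re

def intent_density(text: str, slot_patterns: dict) -> float:
    tokens = re.findall(r"\w+", text)
    if not tokens:
        return 0.0
    filled = sum(1 for pat in slot_patterns.values() if re.search(pat, text))
    return filled / len(tokens)

# Hypothetical slots for a trading-style query:
SLOTS = {
    "ticker": r"\b[A-Z]{2,5}\b",
    "action": r"\b(buy|sell|hold)\b",
    "date": r"\b\d{4}-\d{2}-\d{2}\b",
}

# A terse chat query fills the same slots in far fewer tokens than the
# stringified output of a form, so it scores a higher density.
chat = "sell NVDA 2025-01-15"
form = "The user has selected the sell action for ticker NVDA on the date 2025-01-15"
```

Under this toy definition the chat query scores well above the stringified form, which would be consistent with the accuracy gap I observed; whether a learned version of the metric predicts it is the research question.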
How far can cheap routers go before the LLM has to step in?
My three-mode orchestrator showed keyword rules handling ~70% of queries on their own. My follow-up CLINC150 experiment (3 seeds, McNemar-tested) found a keyword → embedding → LLM cascade matches full-LLM routing accuracy (82.6% ± 1.2pp vs 82.9% ± 0.6pp) while calling the LLM on only 26% of queries, a 74% reduction in LLM cost. That's one dataset in English commercial domains. I don't yet know whether enterprise/domain-specific traffic is systematically more cascadeable than general-purpose traffic, or whether the cascade's ceiling simply rises and falls with keyword coverage.
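The cascade can be sketched as a chain of increasingly expensive classifiers, each handling only what the previous stage could not decide confidently. Everything below (the names, the 0.85 similarity threshold, the stub classifiers) is an illustrative assumption, not the experiment code:

```python
# Sketch of a keyword → embedding → LLM cascade router.
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass
class Route:
    intent: str
    stage: str  # which cascade stage made the decision

def cascade_route(
    query: str,
    keyword_rules: Dict[str, str],                    # keyword -> intent
    embed_match: Callable[[str], Tuple[str, float]],  # -> (intent, similarity)
    llm_classify: Callable[[str], str],
    sim_threshold: float = 0.85,
) -> Route:
    q = query.lower()
    # Stage 1: keyword rules, near-free, absorb the bulk of routine traffic.
    for kw, intent in keyword_rules.items():
        if kw in q:
            return Route(intent, "keyword")
    # Stage 2: embedding similarity, accept only high-confidence matches.
    intent, sim = embed_match(query)
    if sim >= sim_threshold:
        return Route(intent, "embedding")
    # Stage 3: the LLM sees only the residual hard cases.
    return Route(llm_classify(query), "llm")
```

The cost saving falls out of how few queries survive to stage 3; the open question is whether the similarity threshold can be set per domain in a principled way rather than tuned by hand.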
Writing & Blog
Paper readings, system reflections, and thoughts on where LLM agents are heading.
Hybrid Router: Matching LLM Accuracy with 74% Fewer LLM Calls on CLINC150
I tested four routing strategies on CLINC150 across 3 random seeds (n=1,200 pooled LLM calls). A keyword→embedding→LLM cascade matches full-LLM accuracy (82.6% ± 1.2pp vs 82.9% ± 0.6pp, McNemar not significant in 3/3 seeds) while calling the LLM on only 26% of queries, a 74% LLM cost reduction with no accuracy loss.
Dual Interface Experiment: Chat vs. Quick UI Intent Parsing
Two interfaces, same MCP tools. The UI was 39% faster on ambiguous queries. The difference was intent density, not interface quality.
Paper Read: To CoT or Not: Chain-of-Thought Isn't Always the Answer
UT Austin's Durrett meta-analyzes 100+ papers. CoT mostly helps on math only. What this means for agent routing costs.
Dissecting Claude Code: What 512K Lines of Leaked Source Reveal
Six-layer architecture, fail-closed tool design, three-tier memory, five-level compression, KAIROS daemon mode, and anti-distillation: a deep technical read.
Paper Read: TheAgentCompany: Agents Complete Only 24% of Tasks
CMU's Neubig benchmarks agents on real workplace tasks. Best model: 24%. Why that's both damning and expected.
Paper Read: Multi-Agent ToT Validator: Reasoning Needs a Referee
UTSA's Najafirad adds a validator agent to Tree-of-Thought. The pattern matters more than the 5.6% gain.
Paper Read: The Context Trap: Modular vs. Monolithic
E2E audio models degrade on multi-turn dialogue. But is modularity inherently better, or just a crutch for weaker models?
Paper Read: SalesBot: Strategy ≠ Planning in Agent Design
CoT-injected dialogue strategies work for sales. But does the approach generalize to domains with wider strategy trees?
From Odoo Read-Only to Multi-Agent: My Architecture Evolution
How a 100-commit internal platform with Odoo JSON-RPC integration led me to design a multi-agent orchestrator.
Three-Mode Orchestrator: Mock → Ollama → Claude Hybrid Routing
My orchestrator supports mock, local Qwen2.5:7b, and cloud Claude Sonnet. The latency gap is 5x. Here's the design.
32 Skills + RAG: Building a Domain Knowledge System for Accounting
22 tax/accounting skills, a pure JSON RAG pipeline, and the Claude Agent SDK. How I built a knowledge system without ML models.
Paper Read: AutoGen: Comparing to My Multi-Agent System
Microsoft's conversational multi-agent framework vs. my centralized orchestrator. Same problem, opposite design choices.
Paper Read: ReAct: Why My Agent Loop Works
I built a ReAct loop without knowing it had a name. Comparing my mock vs. Claude gap to the paper's ablation studies.
Let's Connect
Interested in collaborating on LLM agent systems research. Open to discussion and feedback on my work.