RFCB: Russian Function Calling Benchmark
Abstract
We present RFCB (Russian Function-Calling Benchmark), a schema-preserving Russian localization of selected single-turn and multi-turn subsets of the Berkeley Function-Calling Leaderboard (BFCL). RFCB retains the structure, evaluation semantics, and JSON schemas of the original BFCL while providing fully translated user prompts, documentation, and string-valued targets in Russian. The benchmark enables apples-to-apples cross-lingual comparison of tool-using LLMs and serves as a foundation for evaluating function-calling capabilities in Russian. We evaluate proprietary and open-source models of various sizes. On top of the localized benchmark and its evaluation, we use a training pipeline that collects executable trajectories and supports three optimization regimes: supervised fine-tuning (SFT), direct preference optimization (DPO), and group relative policy optimization (GRPO). The pipeline is implemented as a modified framework based on Feedback-Driven Tool-Use Improvements (FTRL) that performs multi-path exploration. We report cross-lingual comparisons on BFCL single-turn metrics, multi-turn state-based success, and robustness to long context and missing information, together with efficiency indicators. Our results show that single-turn accuracy remains close to baseline levels, with Russian consistently lagging behind English, whereas multi-turn evaluation exposes clear benefits of scaling and reinforcement-based optimization. Preference-based and RL-based methods (DPO and GRPO) markedly improve multi-turn behavior in both languages. GRPO training yields the highest overall scores, with Russian results exceeding English by 6.5 percentage points, effectively reversing the usual cross-lingual gap.
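The schema-preservation property can be made concrete with a small check. The sketch below is hypothetical: the function `is_schema_preserving` and the simplified call format `{function_name: {parameter: value}}` are illustrative assumptions, not RFCB's actual tooling or BFCL's exact file format. A localized target is accepted only if it matches the original in dictionary keys (function and parameter names), list lengths, and all non-string values, so that only string values, such as translated arguments, may differ.

```python
from typing import Any


def is_schema_preserving(original: Any, localized: Any) -> bool:
    """Hypothetical check: True if `localized` differs from `original` only in string leaves.

    Dict keys, list lengths, container types, and non-string scalars
    (numbers, booleans, None) must match exactly; string values may differ,
    since those are the parts assumed to be translated.
    """
    if isinstance(original, dict):
        # Keys (e.g. function and parameter names) must be identical.
        if not isinstance(localized, dict) or original.keys() != localized.keys():
            return False
        return all(is_schema_preserving(original[k], localized[k]) for k in original)
    if isinstance(original, list):
        if not isinstance(localized, list) or len(original) != len(localized):
            return False
        return all(is_schema_preserving(o, l) for o, l in zip(original, localized))
    if isinstance(original, str):
        # Any string may be replaced by another string (its translation).
        return isinstance(localized, str)
    # Non-string scalars: same exact type (so bool and int are distinguished) and value.
    return type(original) is type(localized) and original == localized


# Usage: a translated argument passes; a translated function name does not.
print(is_schema_preserving({"get_weather": {"city": "Paris", "days": 3}},
                           {"get_weather": {"city": "Париж", "days": 3}}))
print(is_schema_preserving({"get_weather": {"city": "Paris"}},
                           {"получить_погоду": {"city": "Париж"}}))
```

With this check, a translated city argument is accepted, while a renamed function or a number turned into a string is rejected. It would also accept translation of strings that should stay fixed, such as enum members, so a practical pipeline would likely need a per-parameter allowlist on top of it.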
Edition
Proceedings of the Institute for System Programming, vol. 38, issue 3, part 4, 2026, pp. 131-144
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2026-38(3)-51