# 58. The format and print System

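# Contents

1. Case study
2. Architecture overview
- 2.1 printf vs iostream vs format: the core philosophy of each approach
- 2.2 Why the design is split this way
3. Core syntax of std::format
4. Type safety: compile-time checking vs printf's runtime UB
5. Performance: benchmarking format vs printf vs iostream
6. Advanced formatting: nested fields, dynamic width, chrono integration
7. std::print and std::println: direct output in C++23
- 7.1 One step: formatting and output in a single call
- 7.2 Unicode support: how std::print handles UTF-8 text
8. format's locale policy: independent by default, opt-in on demand
- 8.1 Locale-independent defaults: the decimal point and thousands separator do not change unexpectedly
- 8.2 The L option: explicitly enabling locale-specific formatting
9. Common pitfalls and anti-patterns
- 9.1 Confusing automatic and manual {} indexing: mixing them is a compile error
- 9.2 format_to with a buffer that is too small: the truncation semantics of format_to_n
- 9.3 Overusing std::format on a hot path and allocating a string each time: format_to plus a preallocated buffer
10. Putting it all together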

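# 1. Case study

# 1.1 A printf type mismatch: a string passed to %d causes runtime UB

A logging system used printf for formatting. After one code change the arguments no longer matched the format string. The code still compiled, and the process died with SIGSEGV at runtime:

// ====== Incident code V1: printf type mismatch ======
#include <cstdio>
#include <string>

void log(const std::string& user, int count) {
    // BUG (kept on purpose): the two arguments are swapped.
    std::printf("User: %s, Count: %d\n", count, user.c_str());  // ① %s expects char*, but receives an int
    // printf sees %s in the format string and fetches a pointer-sized (8-byte) argument,
    // but count is a 4-byte int, so the value it reads is not a valid pointer.
    // Dereferencing that garbage value typically ends in SIGSEGV; formally it is undefined behavior.
}

// Compile: the language requires no diagnostic. GCC and Clang can warn through -Wformat (part of -Wall),
//          but only when the format string is a literal and warnings are enabled.
// Run: crashes intermittently, depending on what the mismatched argument slot happens to contain.

Root cause: printf parses its format string at runtime, and the variadic arguments carry no type information. The only thing tying an argument to its placeholder is the programmer's eye, plus optional compiler warnings.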

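# 1.2 iostream formatting hell: setw/setprecision/hex scattered over five lines

// ====== Incident code V2: fragmented iostream formatting ======
// Needs <iostream> and <iomanip>; value is an integer and ratio a double defined elsewhere.
std::cout << "0x" << std::hex << std::uppercase           // ① switch to uppercase hex
          << std::setw(8) << std::setfill('0') << value    // ② width 8, zero fill
          << std::dec << " (" << std::fixed               // ③ back to decimal, then fixed-point floats
          << std::setprecision(2) << ratio                 // ④ precision 2
          << ")\n";

// Problems:
//   ① setw applies only to the next output, while setfill, hex, uppercase, fixed and setprecision all persist: two rules to keep apart
//   ② hex changes the state of the whole stream; forget to reset it and every later integer prints in hex
//   ③ The formatting is spread over five lines and seven manipulators, so the reader has to simulate the stream state to know what is printed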

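# 1.3 Seven open questions

① What is the {} placeholder syntax of std::format, and how are width, precision, and alignment specified?   → Chapter 3
② How does format guarantee type safety, and why can the language not check printf at compile time?          → Chapter 4
③ Is format faster or slower than printf? Than iostream? Why?                                                 → Chapter 5
④ How are user-defined types formatted, and how is a formatter specialization written?                        → Section 4.3
⑤ How do std::print / std::println differ from std::cout?                                                     → Chapter 7
⑥ How does format treat locales: is the decimal point . or ,?                                                 → Chapter 8
⑦ How can chrono time values be formatted directly with format?                                               → Section 6.2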

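# 2. Architecture overview

# 2.1 printf vs iostream vs format: the core philosophy of each approach

printf (C89, the legacy option):
  Philosophy: format string + variadic arguments, parsed at runtime, not type-safe
  Pros: terse, fast for simple types
  Cons: argument types are not checked by the language, no support for user-defined types, buffer overflow risk in the sprintf family

iostream (C++98, stream output):
  Philosophy: overloaded operator<<, type-safe, extensible
  Pros: compile-time type safety, extensible through user-defined operator<<
  Cons: verbose formatting syntax (scattered manipulators), leaking stream state, usually slower than printf (every stream carries a locale)

std::format (C++20, the modern option):
  Philosophy: format strings checked at compile time + type safety + high performance
  Pros: type-safe, concise, typically competitive with or faster than printf, locale-independent by default (output does not change unexpectedly)
  Cons: only available since C++20, so existing code has to be migrated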

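# 2.2 Why the design is split this way

The three design pillars of format:

① Compile-time type safety:
   A literal format string is checked at compile time; if a replacement field does not fit its argument type, the program does not compile
   With printf the mismatch surfaces only at runtime, and as UB

② Runtime performance:
   Arguments travel in a type-erased std::format_args pack, so one non-template core (std::vformat) does the work, with no varargs decoding and no locale lookups by default
   Note that std::format still scans the format string on every call: the compile-time pass contributes validation, not a pre-parsed plan (precompiled formats exist as an extension in the {fmt} library, FMT_COMPILE, not in the standard)

③ Readability:
   "User: {}, Count: {}" reads directly, without having to remember %d, %s, %f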

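# 3. Core syntax of std::format

# 3.1 Basic placeholders: automatic {} and manual {0} {1} indexing

// Automatic indexing: arguments are consumed in order
std::format("{} + {} = {}", 1, 2, 3);          // "1 + 2 = 3"

// Manual indexing: reuse arguments or change their order
std::format("{1} + {0} = {2}", 1, 2, 3);      // "2 + 1 = 3"
std::format("{0} {0} {0}", "ha");              // "ha ha ha"

// Mixing automatic and manual indexing is a compile error; pick one style
// std::format("{1} + {}", 1, 2);              // ❌ compile error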

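# 3.2 Format specifiers: full control over width, alignment, fill, precision, and type

Full syntax of a replacement field:
  { [arg-index] : [fill] [align] [sign] [#] [0] [width] [.precision] [L] [type] }

Alignment: < left  > right  ^ centered
Fill: any character except { or }; it is only valid together with an alignment character

// Width + alignment
std::format("{:>10}", 42);         // "        42"   (right-aligned, total width 10)
std::format("{:<10}", 42);         // "42        "   (left-aligned)
std::format("{:^10}", 42);         // "    42    "   (centered)

// Fill character
std::format("{:*>10}", 42);        // "********42"
std::format("{:0>10}", 42);        // "0000000042"

// Precision: floating point
std::format("{:.2f}", 3.14159);    // "3.14"
std::format("{:.5f}", 3.14159);    // "3.14159"

// Precision: string truncation
std::format("{:.5}", "hello world");  // "hello"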

# 3.3 Integer formatting: the four bases d/x/o/b plus sign control

std::format("{}", 42);             // "42"        (d = decimal, the default)
std::format("{:d}", 42);           // "42"        (explicit decimal)
std::format("{:x}", 255);          // "ff"        (lowercase hexadecimal)
std::format("{:X}", 255);          // "FF"        (uppercase hexadecimal)
std::format("{:#x}", 255);         // "0xff"      (with prefix)
std::format("{:o}", 255);          // "377"       (octal)
std::format("{:b}", 255);          // "11111111"  (binary)

// Sign control
std::format("{:+}", 42);           // "+42"       (always show the sign)
std::format("{:+}", -42);          // "-42"
std::format("{:-}", 42);           // "42"        (show only the minus sign, the default)
std::format("{: }", 42);           // " 42"       (leading space for non-negative values, minus sign for negative ones)

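# 3.4 Floating-point formatting: the four representations f/e/g/a plus precision control

double pi = 3.14159265358979323846;

std::format("{}", pi);             // "3.141592653589793"  (shortest round-trip representation)
std::format("{:f}", pi);           // "3.141593"           (fixed-point)
std::format("{:.3f}", pi);         // "3.142"              (precision 3)
std::format("{:e}", pi);           // "3.141593e+00"       (scientific notation)
std::format("{:.2e}", 1e-10);      // "1.00e-10"
std::format("{:g}", pi);           // "3.14159"            (general: picks f or e automatically)
std::format("{:a}", pi);           // "1.921fb54442d18p+1" (hexadecimal floating point, the underlying representation)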

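# 4. Type safety: compile-time checking vs printf's runtime UB

# 4.1 format checks types at compile time: a mismatch is a compile error, not runtime UB

// printf: compiles (at best with a -Wformat warning), UB at runtime
printf("%d", "hello");  // ❌ %d expects int, but a const char* is passed: UB

// format: compile error, the mistake never reaches runtime
// std::format("{:d}", "hello");  // ❌ compile error: d is not a valid presentation type for a string

How format enforces type safety: the first parameter of std::format is a std::format_string<Args...>, whose constructor is consteval. When it is built from a literal, it parses the string at compile time and runs std::formatter<T>::parse for each argument type T. For a const char* argument that parser rejects {:d}, the constructor call stops being a constant expression, and the program is ill-formed. The specifier check itself does not rely on concepts or SFINAE.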

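# 4.2 Compile-time validation of format strings, and the runtime escape hatch

// Errors in a literal format string are caught at compile time
// auto s = std::format("{:d}", 3.14);  // ❌ compile error: {:d} does not accept a double

// Runtime format strings: errors are reported through an exception at runtime
std::string fmt = get_format_string();  // get_format_string() stands for any runtime source
int value = 42;                         // make_format_args binds to lvalues, so use a named variable
try {
    auto s = std::vformat(fmt, std::make_format_args(value));  // validated at runtime
} catch (const std::format_error& e) { /* handle the error */ }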

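# 4.3 Specializing formatter for user-defined types: customize one type, not the whole output stream

struct Point { int x, y; };

template <>
struct std::formatter<Point> {
    // Parse the format specifier (this minimal version accepts only an empty one)
    constexpr auto parse(std::format_parse_context& ctx) {
        return ctx.begin();  // no custom parsing: only "{}" is accepted, a non-empty spec is rejected
    }

    // Write the formatted value to the output iterator
    auto format(const Point& p, std::format_context& ctx) const {
        return std::format_to(ctx.out(), "({}, {})", p.x, p.y);
    }
};

Point p{3, 4};
std::format("Point: {}", p);  // "Point: (3, 4)"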

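# 5. Performance: benchmarking format vs printf vs iostream

# 5.1 Why format can be faster than printf: type-erased arguments and locale-free conversion, not skipped parsing

printf on each call:
  ① Parses the format string ("%s: %d\n") and recognizes %s and %d (the article's rough estimate: 10-20 ns)
  ② Pulls each argument out of the variadic list, formats it, and writes it
  ③ Repeats the parse on every call, even if the format string never changes

format on each call:
  ① A literal format string has already been validated at compile time, so mistakes in it never reach the running program
  ② Argument types are captured at the call site in a type-erased std::format_args, so there is no varargs decoding and no guessing of types
  ③ std::vformat still scans the format string at runtime, then converts each argument with locale-independent routines (the article's rough estimate: 5-10 ns)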

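# 5.2 Why format is faster than iostream: no per-operation stream machinery, no locale by default

Each << on an iostream:
  ① Constructs a sentry and may call virtual functions on the underlying streambuf
  ② Looks up locale facets (the ones installed with imbue) for numeric output
  ③ Formats, then writes

format skips these steps: it formats all arguments straight into one std::string (or output iterator), and the result can then be written in a single operation rather than piece by piece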

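# 5.3 The three approaches in numbers: latency table over 1 million formatting calls

| Operation | printf | iostream | format |
| --- | --- | --- | --- |
| `"int: %d"` → int | 18 ns | 42 ns | 12 ns |
| `"float: %.2f"` → double | 35 ns | 85 ns | 28 ns |
| `"%s: %d"` → string + int | 45 ns | 120 ns | 32 ns |
| Compile-time type checking | ❌ (compiler warnings only) | ✅ | ✅ |
| User-defined types | ❌ | ✅ (operator<<) | ✅ (formatter) |
| Locale control | global setlocale | per-stream imbue | independent by default; opt in with the L option |

In this table format is the fastest of the three: about 1.25-1.5× the speed of printf and 3-4× the speed of iostream. The table does not state the compiler, standard library, or hardware, so treat the figures as indicative and re-measure on your own platform.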

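# 6. Advanced formatting: nested fields, dynamic width, chrono integration

# 6.1 Nested arguments: choosing width and precision at runtime

// Width taken from an argument
std::format("{:{}}", 42, 10);          // "        42"  (width = second argument)

// Precision taken from an argument (the f type is needed: without it, precision counts significant digits and the result is "3.14")
std::format("{:.{}f}", 3.14159, 3);    // "3.142"       (precision = 3)

// Width and precision both taken from arguments
std::format("{:{}.{}f}", 3.14159, 10, 4);  // "    3.1416"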

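# 6.2 Formatting std::chrono values: time points and durations directly

// system_clock::now() has sub-second resolution; floor to seconds so %S and %T print whole seconds
auto now = std::chrono::floor<std::chrono::seconds>(std::chrono::system_clock::now());

// Formatting a time point: the specifiers resemble strftime; system_clock time points print in UTC
std::format("{:%Y-%m-%d %H:%M:%S}", now);  // e.g. "2025-06-06 14:30:00"
std::format("{:%F}", now);                   // e.g. "2025-06-06"   (ISO date)
std::format("{:%T}", now);                   // e.g. "14:30:00"     (ISO time)

// Formatting a duration
auto dur = std::chrono::milliseconds(1234);
std::format("{}", dur);                      // "1234ms"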

# 6.3 format_to / format_to_n / formatted_size: avoiding a temporary string

// format_to: write into an existing buffer (no bounds check, the buffer must be large enough)
char buf[64];
auto result = std::format_to(buf, "x={}, y={}", 10, 20);
*result = '\0';  // buf = "x=10, y=20"

// format_to_n: cap the number of characters written
auto [end, size] = std::format_to_n(buf, 10, "{}", "very long string");
// size = 16 (the length the full output needs), end = buf + 10 (10 characters written, no null terminator added)

// formatted_size: compute the formatted length without producing the output
auto sz = std::formatted_size("{} + {} = {}", 1, 2, 3);  // sz = 9

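# 7. std::print and std::println: direct output in C++23

# 7.1 One step: formatting and output in a single call

// C++20: two steps (format, then output)
std::cout << std::format("Hello, {}!\n", name);

// C++23: one step (std::print, declared in <print>)
std::print("Hello, {}!\n", name);

// C++23: one step plus a trailing newline
std::println("Hello, {}!", name);     // same output as std::print("Hello, {}!\n", name);

// Writing to a file: file is a std::FILE* (or, with <ostream>, a std::ostream&)
std::println(file, "Error: code={}", error_code);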

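# 7.2 Unicode support: how std::print handles UTF-8 text

// When the literal encoding is UTF-8, std::print writes Unicode text to a terminal through the platform's native Unicode API, with no locale setup
std::println("你好, {}!", "世界");           // UTF-8 source: printed correctly
// Note: format and print work on char (and format on wchar_t) strings; char16_t and char32_t strings are not formattable

// With std::cout the result depends on the console code page or locale; imbuing a locale is the traditional workaround
std::cout.imbue(std::locale("zh_CN.UTF-8"));  // traditional approach; throws std::runtime_error if that locale is not installed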

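# 8. format's locale policy: independent by default, opt-in on demand

# 8.1 Locale-independent defaults: the decimal point and thousands separator do not change unexpectedly

// format default: the decimal point is always '.', whatever the OS or global locale says
std::format("{:.2f}", 1234.5);  // "1234.50" in every locale

// iostream and printf: the decimal point follows the stream's imbued locale or the C locale
// With a German locale imbued on the stream (or set through setlocale for printf), the same code prints ',' instead of '.'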

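# 8.2 The L option: explicitly enabling locale-specific formatting

// L option: use the rules of the global C++ locale, or of a std::locale passed as the first argument
std::format("{:L}", 1234567);  // with an English locale: "1,234,567"
                                // with a German locale:   "1.234.567"
                                // with an Indian locale:  "12,34,567"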

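# 9. Common pitfalls and anti-patterns

# 9.1 Confusing automatic and manual {} indexing: mixing them is a compile error

// ❌ Mixed: compile error
// std::format("{0} + {}", 1, 2);

// ✅ Automatic indexing throughout
std::format("{} + {}", 1, 2);

// ✅ Manual indexing throughout
std::format("{0} + {1}", 1, 2);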

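# 9.2 format_to with a buffer that is too small: the truncation semantics of format_to_n

// std::format_to(buf, ...) performs no bounds check and would overflow an 8-byte buffer here; format_to_n truncates instead
char buf[8];
auto [end, size] = std::format_to_n(buf, 7, "{}", "hello world");
*end = '\0';  // format_to_n never writes a terminator; buf[7] was left free for it
// end points to buf + 7 (7 characters written)
// size = 11 (the length the full output needs: the length of "hello world")
// buf now holds the truncated text "hello w"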

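# 9.3 Overusing std::format on a hot path: one std::string per call

// ❌ Hot path: builds a new std::string on every iteration
for (int i = 0; i < 1000000; ++i) {
    auto s = std::format("i={}", i);   // new string each time; heap-allocates once the text outgrows the small-string buffer
    process(s);
}

// ✅ Preallocate and use format_to: no allocation inside the loop
std::string buf;
buf.reserve(32);                        // one allocation up front
for (int i = 0; i < 1000000; ++i) {
    buf.clear();                        // keeps the reserved capacity
    std::format_to(std::back_inserter(buf), "i={}", i);  // appends into the reserved storage
    process(buf);
}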

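# 10. Putting it all together

# 10.1 The case study resolved

| # | Question | Answer |
| --- | --- | --- |
| ① | {} placeholder syntax? | Chapter 3: automatic/manual indexing plus format specifiers (width/alignment/fill/precision/type) |
| ② | How does type safety work? | Chapter 4: the consteval format_string constructor checks each field against its argument type; a mismatch is a compile error |
| ③ | Performance? | Chapter 5: in the article's table, format beats printf (about 1.25-1.5×) and iostream (3-4×) |
| ④ | User-defined types? | Section 4.3: specialize std::formatter and implement parse() and format() |
| ⑤ | std::print? | Chapter 7: C++23, formatting and output in one call, with std::println adding the newline |
| ⑥ | Locale handling? | Chapter 8: independent by default; the L option enables locale rules explicitly |
| ⑦ | chrono formatting? | Section 6.2: `{:%Y-%m-%d}` formats a time point directly |

Fix for case ①, the printf type mismatch: replace printf with std::format. Types are checked at compile time, so a wrong specifier fails to compile and swapped arguments can no longer corrupt memory.

Fix for case ②, the iostream formatting hell: replace the five lines of manipulators with a single std::format call.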

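# 10.2 The full journey of one format call: from format string to output

std::format("x={}, y={:.2f}", 42, 3.14159);

═══════ Compile time ═══════

① The format string "x={}, y={:.2f}" is parsed by the consteval constructor of std::format_string<int, double>:
   - It finds 2 replacement fields
   - Field 1: no specifier, valid for any formattable type
   - Field 2: .2f, valid only for floating-point arguments

② Each field is checked against its argument type:
   - 42 (int) with an empty specifier ✅
   - 3.14159 (double) with .2f ✅
   → the call compiles

③ Nothing else is kept from this pass: the standard library stores the checked string, not a precompiled formatting plan

═══════ Run time ═══════

④ The arguments are wrapped into a type-erased std::format_args and passed to std::vformat
⑤ vformat scans the string again and copies the literal text "x=" and ", y=" to the output
⑥ Field 1: 42 → "42"
⑦ Field 2: 3.14159 with precision 2 in f format → "3.14"
⑧ The result "x=42, y=3.14" is returned as a std::string

Total time: roughly 30 ns by the article's estimate, including the string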

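# 10.3 Design philosophy revisited

Philosophy 1: check the format string at compile time, so that errors are found before the program runs

printf trusts its format string and only finds out at runtime whether it matches the arguments. format validates a literal format string once, at compile time, in the same spirit as templates and constexpr: whatever can be done ahead of time should be. In the standard library this pass buys correctness rather than speed: std::vformat still scans the string on every call, and format's speed comes from type-erased argument passing and locale-independent number conversion. Precompiling the format itself is available as an extension in {fmt} (FMT_COMPILE).

Philosophy 2: type safety does not have to cost performance, because the check runs at compile time with zero runtime overhead

format("{:d}", x) requires x to accept an integer presentation type, and that check completes at compile time: if x is a std::string, the code is rejected and never generated. Type safety here is not a runtime check; it means that incorrect code does not compile. This is similar in spirit to Rust's trait bounds, where the type system guarantees correctness at compile time.

Philosophy 3: locale independence is the sensible default, and internationalization should not leak into ordinary code paths

One of iostream's problems is that number formatting (the decimal point) depends on the locale attached to the stream: with a German locale imbued, cout << 3.14 prints 3,14. format ignores the locale by default and uses it only when the L option asks for it. This works like a safe function default: most code is shielded from surprises, and the few places that need special behavior declare it explicitly.

Philosophy 4: formatting is independent of output, so formatting to a string and then sending it anywhere beats coupling the two steps

printf couples formatting to output by writing straight to stdout. format separates the two steps: produce a string first, then decide where it goes. Once they are decoupled, the same formatting logic can feed the console, a log file, or a network socket without changing the format string. std::print is a convenience wrapper on top of this philosophy, not a step backwards: it adds a shortcut while keeping the decoupled core.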

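# 10.4 Cheat sheets

Placeholder syntax:

{0}         manual index: argument 0
{}          automatic index (in argument order; cannot be mixed with manual indexing)
{:10}       minimum width 10 (numbers are right-aligned by default, strings left-aligned)
{:<10}      left-aligned
{:^10}      centered
{:*>10}     fill with *, right-aligned, width 10
{:.2f}      floating point, precision 2, fixed-point notation
{:x}        lowercase hexadecimal
{:#x}       hexadecimal with the 0x prefix
{:b}        binary
{:+}        always show the sign
{:%Y-%m-%d} chrono date/time formatting
{:L}        locale-specific formatting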

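Performance quick reference:

| Scenario | Recommended approach | Notes |
| --- | --- | --- |
| Simple formatting | `std::format` | Type-safe and fast |
| Hot path without allocation | `std::format_to` + preallocated buffer | No malloc per call |
| Direct output | `std::println` (C++23) | Format, output, and newline in one call |
| Formatting user-defined types | Specialize `std::formatter<T>` | Handled the same way as standard-library types |
| Logging | `std::vformat` + a reused `std::format_args` | Wrap the arguments once, use them several times (the arguments must outlive the format_args) |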

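Summary: std::format replaces printf's unchecked runtime format strings with format strings that are validated at compile time, and improves on printf in both type safety (a mismatch is a compile error) and, in the article's measurements, performance (about 1.25-1.5× faster). std::print and std::println add direct output in C++23. A custom formatter<T> lets any user-defined type plug into the format system. Locale independence is the safe default; when internationalization is needed, the L option enables it explicitly.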

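Next: format has modernized output. The next article, 59. A Field Guide to Undefined Behavior, covers the categories of UB, signed overflow, strict aliasing, and access outside an object's lifetime: the deepest and best-hidden traps in C++, brought into the light.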

Last updated: 2026/06/28, 17:55:19
