A Complete Guide to Merging Pattern-Named Columns and Repeating Non-Pattern Columns in Polars

霞舞

Published: 2026-01-30 14:11:17

Source: php中文网

Original article

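This article shows how to vertically stack several columns whose names follow a pattern (such as `a_0`, `a_1`, `a_2`) into a single column (such as `a`) in Polars, while repeating the other columns (such as `words`, `groups`) so that rows stay aligned. The technique applies to data reshaping and wide-to-long conversion.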

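In Polars data analysis, you often need to stack wide-format columns that share a prefix but differ in suffix (such as _0, _1, _2) into a single column, while keeping the auxiliary columns (group identifiers, text descriptions, and so on). Those auxiliary columns cannot simply be dropped or aggregated; they must be repeated in their original order so that their length matches the stacked column. This is essentially the construction of long-format data, but unlike a plain unpivot it requires: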

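Grouping columns by name prefix (all a_* → a, all b_* → b);
Stacking the columns within a group vertically in column order (all of a_0 first, then a_1, then a_2);
Repeating the non-pattern columns (such as words, groups) exactly m times (m being the number of columns per group), with each copy staying paired with its corresponding values.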

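The core idea is: unpivot to build the row skeleton → strip the suffix from the variable names to obtain the logical groups → add a row number within each group → pivot on multiple keys to rebuild the structure. A complete, reusable solution follows: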

import polars as pl
import numpy as np
import string

# Build the sample data (same as in the original post)
rng = np.random.default_rng(42)
nr = 3
letters = list(string.ascii_letters)
uppercase = list(string.ascii_uppercase)
words, groups = [], []
for i in range(nr):
    word = ''.join([rng.choice(letters) for _ in range(rng.integers(3, 20))])
    words.append(word)
    group = rng.choice(uppercase)
    groups.append(group)
df = pl.DataFrame({
    "a_0": np.linspace(0, 1, nr),
    "a_1": np.linspace(1, 2, nr),
    "a_2": np.linspace(2, 3, nr),
    "b_0": np.random.rand(nr),
    "b_1": 2 * np.random.rand(nr),
    "b_2": 3 * np.random.rand(nr),
    "words": words,
    "groups": groups,
})

# ✅ Key transformation: merge the a_*, b_* columns and repeat words/groups
result = (
    df
    .unpivot(
        index=["words", "groups"],  # non-pattern columns: kept as identifier columns on every unpivoted row
        on=[col for col in df.columns if "_" in col and col.split("_")[-1].isdigit()]  # explicitly list the columns to unpivot (safer)
    )
    .with_columns(
        pl.col("variable").str.replace(r"_\d+$", "")  # strip the suffix to get the prefix: a_0 → "a", b_1 → "b"
    )
    .with_columns(
        index=pl.int_range(pl.len()).over("variable")  # number the rows within each prefix group (0, 1, 2, ...) so the values of a_0/a_1/a_2 stack in column order
    )
    .pivot(
        on="variable",
        index=["index", "words", "groups"],
        values="value",
        aggregate_function=None  # no aggregation: each index key must map to exactly one value per output column
    )
    .drop("index")  # drop the temporary index column
)

print(result)

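The output has the expected structure (the `b` values are drawn with the unseeded `np.random.rand`, so they will differ from run to run):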

shape: (9, 4)
┌─────────────────┬────────┬─────┬──────────┐
│ words           ┆ groups ┆ a   ┆ b        │
│ ---             ┆ ---    ┆ --- ┆ ---      │
│ str             ┆ str    ┆ f64 ┆ f64      │
╞═════════════════╪════════╪═════╪══════════╡
│ OIww            ┆ W      ┆ 0.0 ┆ 0.653892 │
│ KkeB            ┆ Z      ┆ 0.5 ┆ 0.408888 │
│ NLOAgRxAtjWOHuQ ┆ O      ┆ 1.0 ┆ 0.423949 │
│ OIww            ┆ W      ┆ 1.0 ┆ 0.234362 │
│ KkeB            ┆ Z      ┆ 1.5 ┆ 0.213767 │
│ NLOAgRxAtjWOHuQ ┆ O      ┆ 2.0 ┆ 0.646378 │
│ OIww            ┆ W      ┆ 2.0 ┆ 0.880558 │
│ KkeB            ┆ Z      ┆ 2.5 ┆ 1.833025 │
│ NLOAgRxAtjWOHuQ ┆ O      ┆ 3.0 ┆ 0.116173 │
└─────────────────┴────────┴─────┴──────────┘