pandas - Wikipedia, the free encyclopedia

pandas
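Original author	Wes McKinney
Developer	Community
Initial release	11 January 2008
Current version	2.2.3 (20 September 2024)
Repository	github.com/pandas-dev/pandas
Written in	Python, Cython, C
Operating system	Cross-platform
Type	Data analysis
License	Three-clause BSD license
Website	pandas.pydata.org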

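In computer programming, pandas is a Python software library for data manipulation and analysis. It is built on top of NumPy and provides data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license^[2]. Its name is derived from the term "panel data", an econometrics term for data sets that contain observations of the same individuals over multiple time periods^[3]. The name is also a play on the phrase "Python data analysis"^[4].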

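History

In 2008, the original author Wes McKinney began building pandas at AQR Capital Management to meet the need for a high-performance, flexible tool for quantitative analysis of financial data. In 2009, before leaving AQR, he persuaded management to let him open-source the library. The timeline of its development is as follows^[5]:

2008: Development of pandas began.
2009: pandas was open-sourced.
2012: Chang She, another AQR employee, joined the project and became the library's second major contributor. The first edition of Python for Data Analysis was published.
2015: pandas signed on as a fiscally sponsored project of NumFOCUS, a 501(c)(3) nonprofit charity in the United States.
2018: The first in-person "core developer sprint" was held.
2022: The open-access online version of the third edition of Python for Data Analysis was released^[6].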

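Data model

A pandas Series is a one-dimensional labeled data structure that can hold any data type, such as integers, strings, floating-point numbers, and Python objects. Its axis labels are collectively called the index. A Series behaves much like a NumPy ndarray and is a valid argument to most NumPy functions.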
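As a minimal sketch of the description above (the values and labels are illustrative assumptions, not taken from the cited documentation), a Series can be indexed by label and passed directly to a NumPy function:

```python
import numpy as np
import pandas as pd

# A Series pairs a one-dimensional array of values with an index of labels.
s = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])

# Access by label goes through the index rather than the position.
second = s['b']

# Most NumPy functions accept a Series; the result is again a Series
# carrying the same index.
roots = np.sqrt(s)

print(second)
print(roots)
```

The index travels with the values, which is what distinguishes a Series from a bare ndarray.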

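pandas provides the DataFrame, which is similar to the data.frame object in the R language. It is a two-dimensional labeled data structure whose columns may have different types. A DataFrame resembles a spreadsheet, an SQL table, or a dictionary of Series objects^[7], a layout also known as a structure of arrays (SoA). pandas supports a variety of data manipulation operations, such as selection^[8], merging^[9], and reshaping^[10], as well as features for data cleaning and data wrangling.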
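The "dictionary of Series" view can be sketched as follows; the column names and values are hypothetical and chosen only to show that each column keeps its own type:

```python
import pandas as pd

# Three Series sharing the same row labels but holding different types.
name = pd.Series(['x', 'y', 'z'], index=[10, 20, 30])
count = pd.Series([1, 2, 3], index=[10, 20, 30])
ratio = pd.Series([0.5, 1.5, 2.5], index=[10, 20, 30])

# A DataFrame built from a dictionary of Series: one column per key.
frame = pd.DataFrame({'name': name, 'count': count, 'ratio': ratio})

# Selecting a column returns a Series that shares the row index.
# Bracket access is used because frame.count is a DataFrame method.
column = frame['count']

print(frame.dtypes)
```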

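Main features

pandas provides a fast and efficient DataFrame object for data manipulation with integrated indexing. Its main features are:

Easy conversion of ragged or differently indexed data held in other Python and NumPy data structures into DataFrame objects.
Size mutability: columns can be inserted into and deleted from DataFrames and higher-dimensional objects.
Automatic and explicit "data alignment": the link between labels and data is intrinsic, but the matching and broadcasting behavior of binary operations can be controlled explicitly^[11]. Two Series objects are aligned automatically by label. Two DataFrame objects are aligned automatically on both the column labels and the index (the row labels), and the result of a binary operation carries the union of both operands' column labels and row labels. Between a DataFrame and a Series, the default behavior is to align the index of the Series with the column labels of the DataFrame, broadcasting row by row^[12].
Easy handling of missing data, which is represented as NaN (NumPy's nan) for floating-point data, NaT for datetimes, or NA across data types^[13].
Intelligent label-based slicing of large data sets, multi-level indexing and other fancy indexing, and subsetting by Boolean vectors.
Intuitive merging and joining of data sets.
Powerful and flexible group-by (groupby) functionality for split-apply-combine operations on data sets, usable for both aggregating and transforming data.
Flexible reshaping and pivoting of data sets.
Hierarchical labeling of axes, so that each tick can carry multiple labels when plotting.
Robust I/O tools for loading data from CSV and other flat files, JSON, Parquet, SQL tables and queries, and Excel files, and for saving and loading data in the very fast HDF5 format^[14].
Time-series-specific functionality, such as date range generation and frequency conversion, moving window statistics, and date shifting and lagging.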
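The alignment and missing-data behavior listed above can be sketched with small, made-up operands (all names and values here are illustrative assumptions):

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0], index=['y', 'z'])

# Automatic alignment: the result index is the union of both indexes,
# and a label missing from either operand yields NaN.
total = a + b

# Explicit control of matching: treat a missing operand as 0.0 instead.
filled = a.add(b, fill_value=0.0)

# Missing data can be counted and replaced.
n_missing = int(total.isna().sum())
cleaned = total.fillna(0.0)

# DataFrame with Series: the Series index is aligned with the column
# labels, and the operation is broadcast over the rows.
df = pd.DataFrame({'p': [1.0, 2.0], 'q': [3.0, 4.0]})
row = pd.Series({'p': 10.0, 'q': 20.0})
shifted = df + row

print(total)
print(shifted)
```

Here only the label 'x' lacks a partner, so total contains exactly one NaN, while filled keeps the original value of a for that label.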

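pandas is highly optimized for performance, with critical code paths written in Cython or C. pandas can use PyArrow to extend its functionality and improve the performance of various APIs^[15]. The default plotting backend of pandas is matplotlib, and other third-party plotting backends can be plugged in^[16], such as Plotly Express^[17]. DuckDB, an in-process SQL OLAP column-oriented database, can run SQL on pandas DataFrames^[18].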

Examples

The following outline example shows basic operations on the columns and rows of a DataFrame:

>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>>
>>> data = {
...     'variable': ['A'] * 3 + ['B'] * 3 + ['C'] * 3 + ['D'] * 3,
...     'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'] * 4),
...     'value': [x + 0.1 for x in range(12)]
... }
>>>
>>> df = pd.DataFrame(data)
>>> type(df['value']) == pd.Series
True
>>>
>>> df['value1'] = df['value'] + 0.1
>>> df
   variable       date  value  value1
0         A 2023-01-01    0.1     0.2
1         A 2023-01-02    1.1     1.2
2         A 2023-01-03    2.1     2.2
3         B 2023-01-01    3.1     3.2
4         B 2023-01-02    4.1     4.2
5         B 2023-01-03    5.1     5.2
6         C 2023-01-01    6.1     6.2
7         C 2023-01-02    7.1     7.2
8         C 2023-01-03    8.1     8.2
9         D 2023-01-01    9.1     9.2
10        D 2023-01-02   10.1    10.2
11        D 2023-01-03   11.1    11.2
>>>
>>> df.index
RangeIndex(start=0, stop=12, step=1)
>>>
>>> df.columns
Index(['variable', 'date', 'value', 'value1'], dtype='object')
>>>
>>> df.loc[[1, 2], ['value', 'value1']]
   value  value1
1    1.1     1.2
2    2.1     2.2
>>>
>>> [df.columns.get_loc(x) for x in ['value', 'value1']]
[2, 3]
>>>
>>> df.iloc[[1, 2], [2, 3]]
   value  value1
1    1.1     1.2
2    2.1     2.2
>>>
>>> df[(df['value']/2 > 1) & (df['value1'] < 3)]
  variable       date  value  value1
2        A 2023-01-03    2.1     2.2
>>>
>>> df.query('value/2 > 1 & value1 < 3')
  variable       date  value  value1
2        A 2023-01-03    2.1     2.2
>>>

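Data in a DataFrame is often stored in one of two formats: stacked format or record format. In stacked format, each subject has multiple rows where applicable, so it is also called "long" format. In record format, each subject typically has a single row, so it is also called "wide" format. In this example, to perform time series operations on each unique variable ('A', 'B', 'C', 'D'), a better representation has one column per unique variable (corresponding, for instance, to different observation sites or observers), with a date index ('date') identifying each individual, indivisible observation. To this end, pivot() is used to reshape the DataFrame from stacked format to record format: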

>>> df.drop([0, 4, 8]).pivot(index='date', columns='variable')
           value                 value1
variable       A    B    C     D      A    B    C     D
date
2023-01-01   NaN  3.1  6.1   9.1    NaN  3.2  6.2   9.2
2023-01-02   1.1  NaN  7.1  10.1    1.2  NaN  7.2  10.2
2023-01-03   2.1  5.1  NaN  11.1    2.2  5.2  NaN  11.2
>>>

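Here, besides the columns designated as index and columns ('date' and 'variable'), the DataFrame passed to pivot() still has several value columns ('value', 'value1'). Because none of them is selected through the values parameter, the columns of the result are organized into a hierarchical index (a MultiIndex) whose top level identifies the respective value column; that is, the top level groups the columns by observed quantity.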
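The effect of the values parameter can be sketched on a smaller, made-up DataFrame (the name long_df and its contents are assumptions for illustration):

```python
import pandas as pd

long_df = pd.DataFrame({
    'variable': ['A', 'A', 'B', 'B'],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02'] * 2),
    'value': [0.1, 1.1, 3.1, 4.1],
    'value1': [0.2, 1.2, 3.2, 4.2],
})

# With values= given, a single value column is selected and the
# resulting columns form a flat index named 'variable'.
wide = long_df.pivot(index='date', columns='variable', values='value')

# Without values=, every remaining column is kept and the resulting
# columns form a two-level MultiIndex.
wide_all = long_df.pivot(index='date', columns='variable')

print(wide)
print(wide_all.columns.nlevels)
```

Indexing wide_all['value'] afterwards recovers the same flat table as wide.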

Use concat() and merge() to concatenate and merge DataFrames:

>>> df1 = df.drop(columns='value').rename(columns={'value1': 'value'})
>>> df1 = pd.concat([df.drop(columns='value1'), df1], ignore_index=True)
>>> df1.shape
(24, 3)
>>>
>>> data1 = [
...     ('A', pd.Timestamp('2023-01-01'), 0.3),
...     ('A', pd.Timestamp('2023-01-02'), 1.3)
... ]
>>>
>>> rows = pd.DataFrame(data1, columns=['variable', 'date', 'value'])
>>> pd.concat([df1, rows], ignore_index=True).tail(3)
   variable       date  value
23        D 2023-01-03   11.2
24        A 2023-01-01    0.3
25        A 2023-01-02    1.3
>>>
>>> right = pd.DataFrame(data1[0:1], columns=['variable', 'date', 'value1'])
>>> pd.merge(df1, right, on=['variable', 'date'], how='inner')
  variable       date  value  value1
0        A 2023-01-01    0.1     0.3
1        A 2023-01-01    0.2     0.3
>>>

Use groupby() and agg() to group and aggregate a DataFrame:

>>> df2 = df1.groupby(['date', 'variable']).agg({'value': 'sum'})
>>> df2
                     value
date       variable
2023-01-01 A           0.3
           B           6.3
           C          12.3
           D          18.3
2023-01-02 A           2.3
           B           8.3
           C          14.3
           D          20.3
2023-01-03 A           4.3
           B          10.3
           C          16.3
           D          22.3
>>>
>>> df2.shape
(12, 1)
>>>
>>> df2.index
MultiIndex([('2023-01-01', 'A'),
            ('2023-01-01', 'B'),
            ('2023-01-01', 'C'),
            ('2023-01-01', 'D'),
            ('2023-01-02', 'A'),
            ('2023-01-02', 'B'),
            ('2023-01-02', 'C'),
            ('2023-01-02', 'D'),
            ('2023-01-03', 'A'),
            ('2023-01-03', 'B'),
            ('2023-01-03', 'C'),
            ('2023-01-03', 'D')],
           names=['date', 'variable'])
>>>
>>> df2.columns
Index(['value'], dtype='object')
>>>
>>> df2.loc[('2023-01-02', 'A')]
value    2.3
Name: (2023-01-02 00:00:00, A), dtype: float64
>>>
>>> df2.loc['2023-01-02']
          value
variable
A           2.3
B           8.3
C          14.3
D          20.3
>>>
>>> df2.xs('A', level='variable')
            value
date
2023-01-01    0.3
2023-01-02    2.3
2023-01-03    4.3
>>>
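The session above shows the aggregating side of split-apply-combine, in which each group collapses to one row. As a complementary sketch of the transforming side, on a small hypothetical DataFrame, transform() returns a result with the same shape as its input:

```python
import pandas as pd

df = pd.DataFrame({
    'variable': ['A', 'A', 'B', 'B'],
    'value': [1.0, 3.0, 10.0, 30.0],
})

grouped = df.groupby('variable')['value']

# Aggregation: one result per group.
means = grouped.mean()

# Transformation: the group mean is broadcast back to every row of the
# group, so the result lines up with the original rows.
group_mean = grouped.transform('mean')
demeaned = df['value'] - group_mean

print(means)
print(demeaned)
```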

Use pivot_table() to pivot and summarize a DataFrame:

>>> df3 = df1.pivot_table(index='date', columns='variable', aggfunc='sum')
>>> df3
           value
variable       A     B     C     D
date
2023-01-01   0.3   6.3  12.3  18.3
2023-01-02   2.3   8.3  14.3  20.3
2023-01-03   4.3  10.3  16.3  22.3
>>>
>>> df3.shape
(3, 4)
>>>
>>> df3.to_numpy()
array([[ 0.3,  6.3, 12.3, 18.3],
       [ 2.3,  8.3, 14.3, 20.3],
       [ 4.3, 10.3, 16.3, 22.3]])
>>>
>>> df3.index
DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03'], dtype='datetime64[ns]', name='date', freq=None)
>>>
>>> df3.columns
MultiIndex([('value', 'A'),
            ('value', 'B'),
            ('value', 'C'),
            ('value', 'D')],
           names=[None, 'variable'])
>>>
>>> df1.pivot_table(index='date', columns='variable', values='value', aggfunc='sum').columns
Index(['A', 'B', 'C', 'D'], dtype='object', name='variable')
>>>
>>> df3['value'].columns
Index(['A', 'B', 'C', 'D'], dtype='object', name='variable')
>>>
>>> df3[('value', 'A')]
date
2023-01-01    0.3
2023-01-02    2.3
2023-01-03    4.3
Name: (value, A), dtype: float64
>>>

Plot a bar chart of the DataFrame with matplotlib:

>>> ax = df3.plot.bar()
>>> h, l = ax.get_legend_handles_labels()
>>> ax.legend(h, df3.columns.get_level_values(1), title=None, loc='upper left')
<matplotlib.legend.Legend object at 0x7fdd1cff96d0>
>>> ax.set_xticklabels([x.strftime('%Y-%m-%d') for x in df3.index], rotation=0)
[Text(0, 0, '2023-01-01'), Text(1, 0, '2023-01-02'), Text(2, 0, '2023-01-03')]
>>> ax.get_xaxis().get_label().set_visible(False)
>>> ax.grid(axis='y', linestyle=':')
>>> ax.set_axisbelow(True)
>>> for i, m in enumerate(ax.containers):
...     for j, n in enumerate(m.get_children()):
...         n.set_x(j - 0.8*(0.5 - i/df3.columns.size))
...         n.set_width(0.8/df3.columns.size)
...     ax.bar_label(m, fontsize='small')
...
[Text(0, 0, '0.3'), Text(0, 0, '2.3'), Text(0, 0, '4.3')]
[Text(0, 0, '6.3'), Text(0, 0, '8.3'), Text(0, 0, '10.3')]
[Text(0, 0, '12.3'), Text(0, 0, '14.3'), Text(0, 0, '16.3')]
[Text(0, 0, '18.3'), Text(0, 0, '20.3'), Text(0, 0, '22.3')]
>>> plt.show()
>>>

Export and import a CSV file:

>>> df3.to_csv('dftest.csv', float_format='%.1f')
>>>
>>> df4 = pd.read_csv('dftest.csv', header=[0, 1], index_col=0)
>>> df4.shape
(3, 4)

Here, header specifies the rows used as column names, and index_col specifies the column used as the index, that is, as the row labels.
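The roles of header and index_col can be sketched without touching the file system by writing to and reading from an in-memory string. The table below is a hypothetical two-row stand-in for df3:

```python
import io
import pandas as pd

# A small table with two-level column labels and a named index.
columns = pd.MultiIndex.from_product([['value'], ['A', 'B']],
                                     names=[None, 'variable'])
index = pd.DatetimeIndex(['2023-01-01', '2023-01-02'], name='date')
table = pd.DataFrame([[0.3, 6.3], [2.3, 8.3]], index=index, columns=columns)

# With no path argument, to_csv returns the CSV text.
text = table.to_csv(float_format='%.1f')

# header=[0, 1]: the first two rows rebuild the two-level column labels.
# index_col=0: the first column becomes the row labels.
restored = pd.read_csv(io.StringIO(text), header=[0, 1], index_col=0)

print(text)
print(restored.columns.tolist())
```

Without parse_dates, the restored row labels are plain strings rather than timestamps.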

Use column, a member of the util-linux tool suite, to view the exported CSV file:

$ cat dftest.csv | column -s, -o, -t
          ,value,value,value,value
variable  ,A    ,B    ,C    ,D
date      ,     ,     ,     ,
2023-01-01,0.3  ,6.3  ,12.3 ,18.3
2023-01-02,2.3  ,8.3  ,14.3 ,20.3
2023-01-03,4.3  ,10.3 ,16.3 ,22.3

Export and import a JSON file:

>>> import ast
>>> df3.to_json('dftest.json', orient='index', date_format='iso', date_unit='s')
>>>
>>> df4 = pd.read_json('dftest.json', orient='index')
>>> df4.shape
(3, 4)
>>>
>>> df4.columns
Index(['('value', 'A')', '('value', 'B')', '('value', 'C')', '('value', 'D')'], dtype='object')
>>>
>>> # Parse the stringified tuples back into tuples without evaluating arbitrary code.
>>> df4.columns = pd.MultiIndex.from_tuples([ast.literal_eval(x) for x in df4.columns])
>>> df4.index.name = df3.index.name
>>> # A MultiIndex keeps its level names in .names rather than .name.
>>> df4.columns.names = df3.columns.names

Here orient is set to 'index', so the output is organized in row-major order; the datetime format is set to the ISO 8601 standard format, with seconds as the time unit.
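The difference that orient makes can be sketched by comparing 'index' with 'columns' on a small hypothetical table with flat column labels; the JSON text is parsed back with the standard json module so that its nesting can be inspected:

```python
import json
import pandas as pd

table = pd.DataFrame(
    {'A': [0.3, 2.3], 'B': [6.3, 8.3]},
    index=pd.DatetimeIndex(['2023-01-01', '2023-01-02']),
)

# orient='index': outer keys are row labels, inner keys are column labels.
by_row = json.loads(
    table.to_json(orient='index', date_format='iso', date_unit='s'))

# orient='columns': outer keys are column labels, inner keys are row labels.
by_col = json.loads(
    table.to_json(orient='columns', date_format='iso', date_unit='s'))

print(list(by_row.values())[0])
print(list(by_col.keys()))
```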

Use jq, the implementation of the jq programming language, to view the exported JSON file:

$ cat dftest.json | jq
{
  "2023-01-01T00:00:00": {
    "('value', 'A')": 0.3,
    "('value', 'B')": 6.3,
    "('value', 'C')": 12.3,
    "('value', 'D')": 18.3
  },
  "2023-01-02T00:00:00": {
    "('value', 'A')": 2.3,
    "('value', 'B')": 8.3,
    "('value', 'C')": 14.3,
    "('value', 'D')": 20.3
  },
  "2023-01-03T00:00:00": {
    "('value', 'A')": 4.3,
    "('value', 'B')": 10.3,
    "('value', 'C')": 16.3,
    "('value', 'D')": 22.3
  }
}

Exporting and importing HDF5 files is based on PyTables^[19]:

>>> df3.to_hdf('dftest.h5', key='df3', mode='w')
>>>
>>> df4 = pd.read_hdf('dftest.h5', key='df3')
>>> df4.shape
(3, 4)

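Here the key parameter specifies the group in the HDF5 file that corresponds to the DataFrame. The default 'fixed' storage format is used for it, and the file open mode 'w' stands for "write", that is, creating a new file.

Use h5ls, a member of the hdf5-tools tool suite, to view the exported HDF5 file: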

$ h5ls -r -d dftest.h5
/                        Group
/df3                     Group
/df3/axis0_label0        Dataset {4}
    Data:
         0, 0, 0, 0
/df3/axis0_label1        Dataset {4}
    Data:
         0, 1, 2, 3
/df3/axis0_level0        Dataset {1}
    Data:
         "value"
/df3/axis0_level1        Dataset {4}
    Data:
         "A", "B", "C", "D"
/df3/axis1               Dataset {3}
    Data:
         1672531200000000000, 1672617600000000000, 1672704000000000000
/df3/block0_items_label0 Dataset {4}
    Data:
         0, 0, 0, 0
/df3/block0_items_label1 Dataset {4}
    Data:
         0, 1, 2, 3
/df3/block0_items_level0 Dataset {1}
    Data:
         "value"
/df3/block0_items_level1 Dataset {4}
    Data:
         "A", "B", "C", "D"
/df3/block0_values       Dataset {3, 4}
    Data:
         0.3, 6.3, 12.3, 18.3, 2.3, 8.3, 14.3, 20.3, 4.3, 10.3, 16.3, 22.3

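The datetimes in this HDF5 file are represented as Unix epoch timestamps in nanoseconds. This storage format saves both axes of the DataFrame^[20] and all of its blocks^[21]. Since there is only one block here, the items of that block are the same as axis0. By contrast, when the earlier DataFrame df is exported, its four columns are consolidated into three blocks, and the union of their items is the same as axis0.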
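The nanosecond epoch values in the axis1 dataset can be checked against the dates of the example with plain pandas. This is only a sketch of the arithmetic and does not read the HDF5 file:

```python
import pandas as pd

stamp = pd.Timestamp('2023-01-01')

# Timestamp.value is the number of nanoseconds since 1970-01-01T00:00:00.
nanos = stamp.value

# An integer passed to Timestamp is interpreted as nanoseconds by default.
back = pd.Timestamp(1672531200000000000)

# Consecutive days differ by 86400 seconds, i.e. 86400 * 10**9 nanoseconds.
next_day = pd.Timestamp(1672531200000000000 + 86400 * 10**9)

print(nanos, back, next_day)
```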

HDF5 files can also be saved in the 'table' format. Groups stored in this format in an HDF5 file can be queried and deleted from directly:

>>> df3.to_hdf('dftest.h5', key='df4', format='table', mode='a')
>>>
>>> pd.read_hdf('dftest.h5', key='df4', where='index > 20230101', columns=[('value', 'A'), ('value', 'C')])
           value
variable       A     C
date
2023-01-02   2.3  14.3
2023-01-03   4.3  16.3

Here the file open mode 'a' stands for "append".

View the modified HDF5 file:

$ h5ls dftest.h5
df3                      Group
df4                      Group
$ h5ls -r dftest.h5/df4
/_i_table                Group
/_i_table/index          Group
/_i_table/index/abounds  Dataset {0/Inf}
/_i_table/index/bounds   Dataset {0/Inf, 127}
/_i_table/index/indices  Dataset {0/Inf, 131072}
/_i_table/index/indicesLR Dataset {131072}
/_i_table/index/mbounds  Dataset {0/Inf}
/_i_table/index/mranges  Dataset {0/Inf}
/_i_table/index/ranges   Dataset {0/Inf, 2}
/_i_table/index/sorted   Dataset {0/Inf, 131072}
/_i_table/index/sortedLR Dataset {131201}
/_i_table/index/zbounds  Dataset {0/Inf}
/table                   Dataset {3/Inf}
$ h5ls -d dftest.h5/df4/table
table                    Dataset {3/Inf}
    Data:
         {1672531200000000000, [0.3,6.3,12.3,18.3]},
         {1672617600000000000, [2.3,8.3,14.3,20.3]},
         {1672704000000000000, [4.3,10.3,16.3,22.3]}

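Here the _i_table/index group stores the content that is read and written by the tables.index module of PyTables^[22].

netCDF files can be exported and imported with the help of xarray, which depends on pandas. xarray supports importing and exporting data in netCDF-4 format through netcdf4-python^[23], and other versions of the netCDF format through SciPy. xarray can convert between its own DataArray and the pandas Series, and between its own Dataset and the pandas DataFrame^[24]: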

>>> df3.stack().shape
(12, 1)
>>>
>>> import xarray as xr
>>> df3.stack().to_xarray().to_netcdf('dftest.nc')
>>>
>>> df4 = xr.open_dataset('dftest.nc').to_dataframe().unstack()
>>> df4.shape
(3, 4)

Use ncdump, a member of the netcdf-bin tool suite, to view the exported netCDF file:

$ ncdump dftest.nc
netcdf dftest {
dimensions:
	date = 3 ;
	variable = 4 ;
variables:
	int64 date(date) ;
		date:units = "days since 2023-01-01 00:00:00" ;
		date:calendar = "proleptic_gregorian" ;
	string variable(variable) ;
	double value(date, variable) ;
		value:_FillValue = NaN ;
data:

 date = 0, 1, 2 ;

 variable = "A", "B", "C", "D" ;

 value =
  0.3, 6.3, 12.3, 18.3,
  2.3, 8.3, 14.3, 20.3,
  4.3, 10.3, 16.3, 22.3 ;
}
$ ncdump -k dftest.nc
netCDF-4
$ h5ls -r -d dftest.nc
/                        Group
/date                    Dataset {3}
    Data:
         0, 1, 2
/value                   Dataset {3, 4}
    Data:
         0.3, 6.3, 12.3, 18.3, 2.3, 8.3, 14.3, 20.3, 4.3, 10.3, 16.3, 22.3
/variable                Dataset {4}
    Data:
         "A", "B", "C", "D"

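xarray's datetime representation follows the Climate and Forecast (CF) Metadata Conventions^[25]. Here the time unit is the number of days since a specified start datetime, and the calendar is the proleptic Gregorian calendar. The output of ncdump uses netCDF's Common Data Language (CDL)^[26]. What CDL calls a variable is a multidimensional array of values of the same type; a variable declaration specifies the variable's data type, its name, and its shape, described by a list of dimension names. There are three variables here: value is a data variable, while date and variable are coordinate variables. In the declaration double value(date, variable), the dimension names produced by the conversion are the same as the coordinate variable names.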
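The "days since" encoding shown in the ncdump output can be reproduced by hand with pandas. This is a sketch of the arithmetic only, not of how xarray implements the CF conventions, and the names origin, offsets, and decoded are illustrative:

```python
import pandas as pd

dates = pd.DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03'])

# Reference datetime taken from the units attribute
# "days since 2023-01-01 00:00:00".
origin = pd.Timestamp('2023-01-01')

# Encoding: whole days elapsed since the reference datetime.
offsets = (dates - origin).days

# Decoding: add the day offsets back to the reference datetime.
decoded = origin + pd.to_timedelta(offsets, unit='D')

print(list(offsets))
print(decoded)
```

The offsets 0, 1, 2 match the values stored in the date variable of the file.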

References

[1] Release 2.2.3. 20 September 2024 [22 September 2024].
[2] License – Package overview – pandas 1.0.0 documentation. pandas. 28 January 2020 [30 January 2020]. (Archived from the original on 2012-02-14).
[3] Wes McKinney. pandas: a Foundational Python Library for Data Analysis and Statistics (PDF). 2011 [2 August 2018]. (Archived from the original (PDF) on 2015-05-13).
[4] McKinney, Wes. Python for Data Analysis, Second Edition. O'Reilly Media. 2017: 13. ISBN 9781491957660.
[5] About pandas — History of development — Timeline. [2023-09-30]. (Archived from the original on 2023-10-10).
[6] Python for Data Analysis, 3E. [2023-10-06]. (Archived from the original on 2023-11-07).
[7] DataFrame. [2022-09-01]. (Archived from the original on 2022-09-01).
[8] Indexing and selecting data. [2020-09-12]. (Archived from the original on 2020-09-15).
[9] Merge, join, concatenate and compare. [2020-09-12]. (Archived from the original on 2020-09-15).
[10] Reshaping and pivot tables. [2020-09-12]. (Archived from the original on 2020-09-15).
[11] Essential basic functionality — Matching / broadcasting behavior. [2023-12-22]. (Archived from the original on 2024-04-21).
[12] Intro to data structures — Data alignment and arithmetic. [2023-12-22]. (Archived from the original on 2022-09-01).
[13] Working with missing data. [2023-12-22]. (Archived from the original on 2024-05-16).
[14] IO tools (text, CSV, HDF5, …). [2020-09-12]. (Archived from the original on 2020-09-15).
[15] McKinney, Wes. Apache Arrow and the "10 Things I Hate About pandas". wesmckinney.com. 21 September 2017 [21 December 2023]. (Archived from the original on 2024-05-25).
[16] Python tools for data visualization — High-level tools. [2023-09-28]. (Archived from the original on 2023-09-28).
[17] Pandas Plotting Backend in Python.
[18] DuckDB Guides — SQL on Pandas. [2023-09-29]. (Archived from the original on 2023-10-03).
[19] PyTables: hierarchical datasets in Python. [2023-09-28]. (Archived from the original on 2023-08-24).
[20] What does axis in Pandas mean?. [2023-12-25]. (Archived from the original on 2023-12-25).
[21] Internal Structure of Pandas DataFrames. [2023-12-25]. (Archived from the original on 2023-12-25).
[22] Source code for tables.index. [2023-12-25]. (Archived from the original on 2023-12-25).
[23] netcdf4-python: Python/numpy interface to the netCDF C library. [2023-10-07]. (Archived from the original on 2023-10-12).
[24] xarray User Guide － Working with pandas. [2022-09-04]. (Archived from the original on 2022-09-04).
[25] NetCDF Climate and Forecast (CF) Metadata Conventions — Time Coordinate. [2023-10-09]. (Archived from the original on 2023-10-12).
xarray User Guide — Weather and climate data. [2023-10-09]. (Archived from the original on 2023-10-12).
[26] Documentation for Common Data Language. [2023-12-26]. (Archived from the original on 2024-02-06).

Further reading

McKinney, Wes. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition. O'Reilly. 2022 [2023-10-06]. ISBN 978-1-0981-0403-0. (Archived from the original on 2023-10-07).
Chen, Daniel Y. Pandas for Everyone : Python Data Analysis 2nd Edition. Addison-Wesley. 2022 [2023-10-06]. ISBN 978-0-1378-9105-4. (Archived from the original on 2023-10-07).
Molin, Stefanie. Hands-On Data Analysis with Pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python. Packt. 2019 [2023-10-06]. ISBN 978-1-7896-1532-6. (Archived from the original on 2023-10-07).
VanderPlas, Jake. Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly. 2016 [2023-10-06]. ISBN 978-1-4919-1205-8. (Archived from the original on 2023-10-08).

External links

Pathak, Chankey. Pandas Cookbook. 2018 [2023-10-06]. (Archived from the original on 2023-10-07).
