# Reindexing

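reindex() only rearranges the values to follow the order of the new index; it does not change which value is paired with which index label. Labels that are absent from the original index are introduced as missing values (NaN).
Parameters
- index: the sequence to use as the new index
- method: fill method (for example ffill or bfill)
- fill_value: value substituted for missing entries
- limit: maximum number of consecutive entries to fill when filling forward or backward
- tolerance: maximum distance allowed between the original and new labels for an inexact match when filling forward or backward
- level: match a simple index against the specified level of a MultiIndex; otherwise select a subset
- copy: defaults to True, which always copies; if False, no copy is made when the new index equals the old one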
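
The effect of fill_value, method and limit is easier to see on a small example. The following is a minimal sketch; the data and variable names are made up for illustration, and an integer index is assumed so that forward filling has a sorted index to work with.

```python
import pandas as pd

# Illustrative data: labels 1, 3 and 4 are missing from the original index.
s = pd.Series([10, 20, 30], index=[0, 2, 5])
new_index = range(6)

# Default: labels without data become NaN (values are upcast to float).
r_default = s.reindex(new_index)

# fill_value: use a constant instead of NaN.
r_filled = s.reindex(new_index, fill_value=0)

# method='ffill': carry the previous value forward into the gaps.
r_ffill = s.reindex(new_index, method='ffill')

# limit=1: fill at most one consecutive gap, so label 4 stays NaN.
r_limit = s.reindex(new_index, method='ffill', limit=1)

print(r_default, r_filled, r_ffill, r_limit, sep='\n\n')
```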

import numpy as np
import pandas as pd
from pandas import Series
from pandas import DataFrame

# reindex on a Series

s = Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])
s

a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64

s2 = s.reindex(['f', 'e', 'd', 'c', 'b', 'a'])
s2

f    6
e    5
d    4
c    3
b    2
a    1
dtype: int64

# reindex on a DataFrame
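
On a DataFrame, reindex() can target the rows, the columns, or both. The following is a minimal sketch; the frame and the extra column label 'gz' are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=['a', 'c', 'd'],
                  columns=['bj', 'tj', 'sh'])

# Reindex the rows: the new label 'b' has no data, so its row is all NaN.
rows = df.reindex(['a', 'b', 'c', 'd'])

# Reindex the columns: 'tj' is dropped, the hypothetical 'gz' is added
# and filled with 0 instead of NaN.
cols = df.reindex(columns=['bj', 'sh', 'gz'], fill_value=0)

print(rows)
print(cols)
```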

# Unique values and membership
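
A minimal sketch of what this heading usually covers, assuming the intended methods are unique(), value_counts() and isin(); the sample data is invented.

```python
import pandas as pd

s = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

uniques = s.unique()        # distinct values, in order of first appearance
counts = s.value_counts()   # how often each value occurs
mask = s.isin(['b', 'c'])   # boolean Series: is each value in the given set?
subset = s[mask]            # keep only the members of that set

print(uniques)
print(counts)
print(subset)
```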

# Removing duplicates

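df.duplicated() returns a boolean Series indicating whether each row is a duplicate, that is, whether it repeats an earlier row

df.drop_duplicates() returns a DataFrame with the duplicate rows removed

Both methods compare all columns by default; you can also specify which columns to check for duplicates. By default they keep the first occurrence of each value combination; pass keep='last' to keep the last one instead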

df = DataFrame([[1,2], [2,3], [3,4], [2,4], [3,5], [2,3], [2,4]], columns=['A', 'B'])
df

	A	B
0	1	2
1	2	3
2	3	4
3	2	4
4	3	5
5	2	3
6	2	4

df.duplicated()  # by default, all columns are compared to decide whether a row is a duplicate

0    False
1    False
2    False
3    False
4    False
5     True
6     True
dtype: bool

df.drop_duplicates()

	A	B
0	1	2
1	2	3
2	3	4
3	2	4
4	3	5

df.duplicated('A')

0    False
1    False
2    False
3     True
4     True
5     True
6     True
dtype: bool

df.drop_duplicates('A')

	A	B
0	1	2
1	2	3
2	3	4
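
The keep parameter described earlier in this section is not exercised by the examples, so here is a minimal sketch using the same data. keep=False is not mentioned in the text; it is included as an additional option that flags every member of a duplicated group.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [2, 3], [3, 4], [2, 4], [3, 5], [2, 3], [2, 4]],
                  columns=['A', 'B'])

# keep='last': earlier occurrences are flagged, the final one is kept.
dup_last = df.duplicated(keep='last')
kept_last = df.drop_duplicates(keep='last')

# keep=False: every row that belongs to a duplicated group is flagged.
dup_all = df.duplicated(keep=False)

print(dup_last)
print(kept_last)
print(dup_all)
```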

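map()
apply()
applymap() (renamed DataFrame.map() in pandas 2.1)
replace()
replace(1, -1) replaces a single value
replace([1, 2], -1) replaces several values with the same value in one call
replace([1, 2], [-1, -2]) replaces several values in one call, each with its own replacement
replace({1: -1, 2: -2}) same as above, written as a dict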
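
The four replace() forms listed above, plus map() and apply(), can be sketched as follows. The data is invented, and applymap() is left out because its name depends on the pandas version.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1, 2])

one_value = s.replace(1, -1)             # 1 -> -1
many_to_one = s.replace([1, 2], -1)      # 1 and 2 -> -1
pairwise = s.replace([1, 2], [-1, -2])   # 1 -> -1, 2 -> -2
by_dict = s.replace({1: -1, 2: -2})      # same mapping, written as a dict

# Series.map applies a function (or a dict) to each element.
doubled = s.map(lambda x: x * 2)

# DataFrame.apply calls the function once per column (axis=0 by default).
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
col_range = df.apply(lambda col: col.max() - col.min())

print(pairwise.tolist(), doubled.tolist(), col_range.to_dict())
```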

# Renaming axis indexes

df = DataFrame(np.arange(9).reshape(3, 3), index=['a', 'b', 'c'], columns=['bj', 'tj', 'sh'])
df

	bj	tj	sh
a	0	1	2
b	3	4	5
c	6	7	8

func = lambda x:x.upper()
df.index.map(func)

Index(['A', 'B', 'C'], dtype='object')

df

	bj	tj	sh
a	0	1	2
b	3	4	5
c	6	7	8

# Simple usage
df.rename(index=str.title, columns=str.upper)

	BJ	TJ	SH
A	0	1	2
B	3	4	5
C	6	7	8

df

	bj	tj	sh
a	0	1	2
b	3	4	5
c	6	7	8

# Using dicts
df.rename(index={'a':'AAA', 'b':'BBB', 'c':'CCC'}, columns={'bj':'bj1', 'tj':'tj1', 'sh':'sh1'})

	bj1	tj1	sh1
AAA	0	1	2
BBB	3	4	5
CCC	6	7	8

df

	bj	tj	sh
a	0	1	2
b	3	4	5
c	6	7	8
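
As the repeated df outputs show, neither index.map() nor rename() changes df itself. A minimal sketch of two ways to keep the new labels follows; the replacement label 'beijing' is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=['a', 'b', 'c'], columns=['bj', 'tj', 'sh'])

# rename returns a new object, so assign the result to keep it.
renamed = df.rename(index=str.upper, columns={'bj': 'beijing'})

# Assigning to df.index changes the labels of df itself.
df.index = df.index.map(str.upper)

print(renamed)
print(df)
```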

# Determining the sign

df = DataFrame(np.random.randn(5, 5))
df

	0	1	2	3	4
0	0.727472	-0.180341	-2.063163	0.662542	0.154078
1	1.225638	-1.441499	1.261144	0.707968	-0.079686
2	0.139142	-0.410972	1.070285	1.213725	-0.779184
3	2.482763	-1.469214	1.169715	0.399087	-0.417736
4	-1.140476	1.138731	0.305860	0.062746	-0.402571

np.sign(df)

	0	1	2	3	4
0	1.0	-1.0	-1.0	1.0	1.0
1	1.0	-1.0	1.0	1.0	-1.0
2	1.0	-1.0	1.0	1.0	-1.0
3	1.0	-1.0	1.0	1.0	-1.0
4	-1.0	1.0	1.0	1.0	-1.0
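
Because the example above uses random data, its output differs on every run. A small deterministic sketch, with invented values that include zero, shows that np.sign returns -1, 0 or 1 and keeps the DataFrame labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [-2.5, 0.0, 3.1], 'y': [4.0, -0.5, 0.0]})

signs = np.sign(df)              # still a DataFrame with the same labels

# Count the negative entries in each column.
negatives = (signs == -1).sum()

print(signs)
print(negatives)
```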

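Method	Description
append	Concatenates another Index object, producing a new Index
difference	Computes the set difference as a new Index
intersection	Computes the set intersection
union	Computes the set union
isin	Checks whether each value is contained in the given collection; returns a boolean array
delete	Removes the element at position i and returns a new Index
drop	Removes the passed values and returns a new Index
insert	Inserts an element at position i and returns a new Index
is_monotonic	Returns True when every element is greater than or equal to the previous one (removed in pandas 2.0; use is_monotonic_increasing)
is_unique	Returns True when the Index has no duplicate values
unique	Computes the array of unique values in the Index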
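
A minimal sketch exercising the Index methods in the table; the two small indexes are invented for illustration, and is_monotonic_increasing is used in place of is_monotonic so the code also runs on recent pandas versions.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

both = left.intersection(right)       # labels present in both
either = left.union(right)            # labels present in either
only_left = left.difference(right)    # labels only in left
member = left.isin(['a', 'c'])        # boolean array, one entry per label

without_first = left.delete(0)        # remove by position
without_b = left.drop('b')            # remove by value
with_z = left.insert(1, 'z')          # insert 'z' at position 1

appended = left.append(right)         # concatenation, duplicates are kept
print(appended.is_unique)             # False: 'b' and 'c' appear twice
print(appended.unique())              # distinct labels only
print(left.is_monotonic_increasing)   # True: labels are in sorted order
```

Every call returns a new object; an Index is immutable, so left and right are unchanged afterwards.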

Last updated: 2023/11/01, 03:11:44