Time Series
import numpy as np
import pandas as pd
import datetime
# Creation
# to_datetime
pd.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, box=True, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
- arg: integer, float, string, datetime, list, tuple, 1-d array, Series, or DataFrame/dict-like
- errors: {'ignore', 'raise', 'coerce'}, default 'raise'
  - 'ignore': a string that cannot be parsed is returned unchanged
  - 'raise': a string that cannot be parsed raises an exception
  - 'coerce': a string that cannot be parsed is converted to NaT
- dayfirst: sets the parse order when arg is a string or list-like. If True, 10/11/12 is parsed as 2012/11/10
- yearfirst: sets the parse order when arg is a string or list-like. If True, 10/11/12 is parsed as 2010/11/12
  - If both dayfirst and yearfirst are True, yearfirst takes precedence
- format: the format string used for parsing.
  - pd.to_datetime('12-2010-10 00:00', format='%d-%Y-%m %H:%M') is parsed as 2010-10-12 00:00:00
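The effect of errors, dayfirst, and format can be checked with a short sketch. The input strings are made up for illustration, and the exact exception class raised for bad input may differ between pandas versions, so the sketch only assumes it is a ValueError or a subclass of it.

```python
import pandas as pd

# errors='coerce': entries that cannot be parsed become NaT.
raw = pd.Series(['2019-01-01', 'not a date'])  # illustrative input
coerced = pd.to_datetime(raw, errors='coerce')
print(coerced.isna().tolist())  # [False, True]

# errors='raise' (the default): the same bad string raises an exception.
try:
    pd.to_datetime('not a date')
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# dayfirst=True reads the first field as the day.
day_first = pd.to_datetime('10/11/2012', dayfirst=True)
print(day_first)  # 2012-11-10 00:00:00

# An explicit format removes any ambiguity about field order.
explicit = pd.to_datetime('12-2010-10 00:00', format='%d-%Y-%m %H:%M')
print(explicit)  # 2010-10-12 00:00:00
```

Passing format explicitly is usually the safer choice when the layout of the input is known in advance.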
The return type depends on the input:
- A scalar returns a Timestamp
- An array returns a DatetimeIndex
- A Series or DataFrame returns a Series
# A scalar input returns a Timestamp
pd.to_datetime('2019')
Timestamp('2019-01-01 00:00:00')
# An array input returns a DatetimeIndex
pd.to_datetime(['20190101', '20190201', '20190301'])
DatetimeIndex(['2019-01-01', '2019-02-01', '2019-03-01'], dtype='datetime64[ns]', freq=None)
# A Series input returns a Series
s = pd.Series(['20190101', '20190201', '20190301'])
pd.to_datetime(s)
0 2019-01-01
1 2019-02-01
2 2019-03-01
dtype: datetime64[ns]
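A time series can also be created from a DataFrame, but the time units must be identified by column name:
- year, month, and day are required column names
- hour, minute, second, millisecond, microsecond, and nanosecond are optional column names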
# A DataFrame input returns a Series
df = pd.DataFrame({'year': [2018, 2019],'month': [3, 4], 'day': [6, 8],'hour': [3, 1], 'minute': [10, 20]})
pd.to_datetime(df)
pd.to_datetime(df[['year', 'month', 'day']])
0 2018-03-06 03:10:00
1 2019-04-08 01:20:00
dtype: datetime64[ns]
0 2018-03-06
1 2019-04-08
dtype: datetime64[ns]
# Date parsing in pandas is quite flexible
datestrs = ['2019-07-06 12:00:00', '1/09/2019', '20190101', 'Jul 31, 2019', np.datetime64('2018-01-01'), datetime.datetime.now()]
pd.to_datetime(datestrs)
DatetimeIndex([ '2019-07-06 12:00:00', '2019-01-09 00:00:00',
'2019-01-01 00:00:00', '2019-07-31 00:00:00',
'2018-01-01 00:00:00', '2022-08-04 20:39:54.929756'],
dtype='datetime64[ns]', freq=None)
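How a list with mixed formats is handled appears to depend on the pandas version: releases from 2.0 onward reportedly infer a single format from the first string and can raise on a list like the one above unless format='mixed' is passed. A version-independent sketch, assuming element-by-element parsing is acceptable for small inputs, converts each value on its own. The fixed datetime.datetime(2022, 8, 4) is a stand-in for datetime.datetime.now() so that the result is reproducible.

```python
import datetime

import numpy as np
import pandas as pd

datestrs = ['2019-07-06 12:00:00', '1/09/2019', '20190101', 'Jul 31, 2019',
            np.datetime64('2018-01-01'), datetime.datetime(2022, 8, 4)]

# Parse each element separately so every value gets its own format inference,
# then collect the resulting Timestamps into one DatetimeIndex.
parsed = pd.DatetimeIndex([pd.to_datetime(x) for x in datestrs])
print(parsed)
```

This is slower than a single vectorized call, so it is best kept for short, heterogeneous inputs.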
# date_range
pd.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)
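Generates a DatetimeIndex with a fixed frequency.
- start: string or datetime-like, default None. The first date of the range
- end: string or datetime-like, default None. The last date of the range
- periods: integer or None, default None. The number of index values to generate; if None, both start and end must be given
- freq: string or DateOffset, default 'D' (calendar day). Sets the step between values; for example, '5H' means every 5 hours. The pandas documentation lists all available values.
  - Y: year (year end)
  - M: month (month end)
  - D: day
  - W: week
  - H: hour
  - T: minute
  - S: second
  - B: business day
- tz: string or None. The time zone, for example 'Asia/Hong_Kong'
- normalize: bool, default False. If True, start and end are normalized to midnight before the range is generated
- name: str, default None. A name for the returned index
- closed: string or None, default None. Whether the endpoints start and end are included in the range: 'left' gives a left-closed, right-open interval, 'right' gives a left-open, right-closed interval, and None includes both endpoints
Of the four parameters start, end, periods, and freq, exactly three must be specified. If freq is omitted and only two of the others are given, it defaults to D.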
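The rule about start, end, periods, and freq can be illustrated with a minimal sketch. The dates are arbitrary examples, and the error case assumes that passing all four parameters is rejected with a ValueError, as the pandas documentation describes.

```python
import pandas as pd

# start + end: freq falls back to 'D'.
a = pd.date_range(start='2019-01-01', end='2019-01-05')
# start + periods: counts forward from start.
b = pd.date_range(start='2019-01-01', periods=5)
# end + periods: counts backward from end.
c = pd.date_range(end='2019-01-05', periods=5)
print(a.equals(b) and b.equals(c))  # True

# start + end + periods without freq: evenly spaced points between the endpoints.
d = pd.date_range(start='2019-01-01', end='2019-01-05', periods=3)
print(list(d.strftime('%Y-%m-%d')))  # ['2019-01-01', '2019-01-03', '2019-01-05']

# All four at once over-specifies the range and is rejected.
try:
    pd.date_range(start='2019-01-01', end='2019-01-05', periods=3, freq='D')
    over_specified_ok = True
except ValueError:
    over_specified_ok = False
print(over_specified_ok)  # False
```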
pd.date_range('2019-01-01', periods=3, freq='T') # freq defaults to 'D'; a multiple such as '3D' means an interval of 3 days
DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 00:01:00',
'2019-01-01 00:02:00'],
dtype='datetime64[ns]', freq='T')
pd.date_range('20190101', periods=4, freq='10T')
DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 00:10:00',
'2019-01-01 00:20:00', '2019-01-01 00:30:00'],
dtype='datetime64[ns]', freq='10T')
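A frequency can carry a multiplier, and name and normalize work alongside it. The following sketch uses arbitrary dates and the hypothetical index name 'ts'; it spells the minute alias as 'min', an equivalent of 'T' that should be accepted by both older and newer pandas releases.

```python
import pandas as pd

# '3D' steps three calendar days at a time; name labels the resulting index.
every_3_days = pd.date_range('2019-01-01', periods=3, freq='3D', name='ts')
print(list(every_3_days.strftime('%Y-%m-%d')))
# ['2019-01-01', '2019-01-04', '2019-01-07']
print(every_3_days.name)  # ts

# '10min' steps ten minutes at a time.
every_10_min = pd.date_range('20190101', periods=4, freq='10min')
print(every_10_min[1] - every_10_min[0])  # 0 days 00:10:00

# normalize=True moves the start to midnight before generating the range.
normalized = pd.date_range('2019-01-01 13:45', periods=2, normalize=True)
print(list(normalized.strftime('%Y-%m-%d %H:%M')))
# ['2019-01-01 00:00', '2019-01-02 00:00']
```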
# bdate_range
pd.bdate_range(start=None, end=None, periods=None, freq='B', tz=None, normalize=True, name=None, weekmask=None, holidays=None, closed=None, **kwargs)
Generates dates at business-day frequency.
pd.bdate_range(start='2018-01-01', end='2019-01-01')
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
'2018-01-05', '2018-01-08', '2018-01-09', '2018-01-10',
'2018-01-11', '2018-01-12',
...
'2018-12-19', '2018-12-20', '2018-12-21', '2018-12-24',
'2018-12-25', '2018-12-26', '2018-12-27', '2018-12-28',
'2018-12-31', '2019-01-01'],
dtype='datetime64[ns]', length=262, freq='B')
pd.date_range(start='2018-01-01', end='2019-01-01')
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
'2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
'2018-01-09', '2018-01-10',
...
'2018-12-23', '2018-12-24', '2018-12-25', '2018-12-26',
'2018-12-27', '2018-12-28', '2018-12-29', '2018-12-30',
'2018-12-31', '2019-01-01'],
dtype='datetime64[ns]', length=366, freq='D')
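The difference between the two results above is the weekends. As a minimal check, filtering the daily range down to Monday through Friday should reproduce the business-day range, since the default 'B' frequency skips Saturdays and Sundays but not public holidays (2018-12-25 and 2019-01-01 both appear in the output above).

```python
import pandas as pd

all_days = pd.date_range(start='2018-01-01', end='2019-01-01')
business_days = pd.bdate_range(start='2018-01-01', end='2019-01-01')

# dayofweek is 0 for Monday through 6 for Sunday; keep Monday to Friday only.
weekdays_only = all_days[all_days.dayofweek < 5]

print(len(all_days), len(business_days))    # 366 262
print(business_days.equals(weekdays_only))  # True
```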
# Indexing
d = pd.date_range('20180101', '20190601')
ds = pd.Series(np.random.randn(len(d)), index=d)
ds
2018-01-01 -0.250983
2018-01-02 1.000452
2018-01-03 -0.619409
2018-01-04 0.838948
2018-01-05 1.530228
...
2019-05-28 0.148896
2019-05-29 -0.767799
2019-05-30 -1.536733
2019-05-31 0.190801
2019-06-01 -2.300904
Freq: D, Length: 517, dtype: float64
ds.index
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
'2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
'2018-01-09', '2018-01-10',
...
'2019-05-23', '2019-05-24', '2019-05-25', '2019-05-26',
'2019-05-27', '2019-05-28', '2019-05-29', '2019-05-30',
'2019-05-31', '2019-06-01'],
dtype='datetime64[ns]', length=517, freq='D')
ds['2019-05-01']
1.7245340453263973
ds['2019-05-01':]
2019-05-01 1.724534
2019-05-02 0.446775
2019-05-03 0.327490
2019-05-04 -0.016504
2019-05-05 -0.949755
2019-05-06 -0.412805
2019-05-07 -0.599558
2019-05-08 1.047344
2019-05-09 0.648256
2019-05-10 0.637291
2019-05-11 -0.519630
2019-05-12 -0.372287
2019-05-13 1.195050
2019-05-14 -0.527554
2019-05-15 0.951183
2019-05-16 0.153516
2019-05-17 -0.818237
2019-05-18 0.635959
2019-05-19 0.543335
2019-05-20 1.706608
2019-05-21 -1.126285
2019-05-22 -1.210101
2019-05-23 -0.096912
2019-05-24 -0.810127
2019-05-25 0.454461
2019-05-26 1.536894
2019-05-27 -0.029693
2019-05-28 0.148896
2019-05-29 -0.767799
2019-05-30 -1.536733
2019-05-31 0.190801
2019-06-01 -2.300904
Freq: D, dtype: float64
ds['2019-05']
2019-05-01 1.724534
2019-05-02 0.446775
2019-05-03 0.327490
2019-05-04 -0.016504
2019-05-05 -0.949755
2019-05-06 -0.412805
2019-05-07 -0.599558
2019-05-08 1.047344
2019-05-09 0.648256
2019-05-10 0.637291
2019-05-11 -0.519630
2019-05-12 -0.372287
2019-05-13 1.195050
2019-05-14 -0.527554
2019-05-15 0.951183
2019-05-16 0.153516
2019-05-17 -0.818237
2019-05-18 0.635959
2019-05-19 0.543335
2019-05-20 1.706608
2019-05-21 -1.126285
2019-05-22 -1.210101
2019-05-23 -0.096912
2019-05-24 -0.810127
2019-05-25 0.454461
2019-05-26 1.536894
2019-05-27 -0.029693
2019-05-28 0.148896
2019-05-29 -0.767799
2019-05-30 -1.536733
2019-05-31 0.190801
Freq: D, dtype: float64
ds['2019-04':'2019-05']
2019-04-01 1.745744
2019-04-02 0.809381
2019-04-03 0.418581
2019-04-04 0.580189
2019-04-05 -0.008010
...
2019-05-27 -0.029693
2019-05-28 0.148896
2019-05-29 -0.767799
2019-05-30 -1.536733
2019-05-31 0.190801
Freq: D, Length: 61, dtype: float64
ds['2019']
2019-01-01 -1.278967
2019-01-02 0.023677
2019-01-03 -0.469362
2019-01-04 1.063142
2019-01-05 1.833249
...
2019-05-28 0.148896
2019-05-29 -0.767799
2019-05-30 -1.536733
2019-05-31 0.190801
2019-06-01 -2.300904
Freq: D, Length: 152, dtype: float64
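The selections above rely on partial string indexing: a string such as '2019-05' matches every timestamp in that month, and both ends of a string slice are included. The sketch below repeats them with .loc and a seeded random generator (the seed 0 is an arbitrary choice) so the lengths can be checked without depending on the random values.

```python
import numpy as np
import pandas as pd

d = pd.date_range('20180101', '20190601')
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
ds = pd.Series(rng.standard_normal(len(d)), index=d)

print(len(ds.loc['2019-05']))            # 31, every day in May 2019
print(len(ds.loc['2019-04':'2019-05']))  # 61, April and May, both inclusive
print(len(ds.loc['2019']))               # 152, 2019-01-01 through 2019-06-01
print(len(ds.loc['2019-05-01':]))        # 32, May plus 2019-06-01

# The right endpoint of a partial-string slice is included.
print(ds.loc['2019-04':'2019-05'].index[-1])  # 2019-05-31 00:00:00
```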
# Time/Date Attributes
Attribute | Description |
---|---|
year | The year of the datetime |
month | The month of the datetime |
day | The days of the datetime |
hour | The hour of the datetime |
minute | The minutes of the datetime |
second | The seconds of the datetime |
microsecond | The microseconds of the datetime |
nanosecond | The nanoseconds of the datetime |
date | Returns datetime.date (does not contain timezone information) |
time | Returns datetime.time (does not contain timezone information) |
timetz | Returns datetime.time as local time with timezone information |
dayofyear | The ordinal day of year |
weekofyear | The week ordinal of the year |
week | The week ordinal of the year |
dayofweek | The number of the day of the week with Monday=0, Sunday=6 |
weekday | The number of the day of the week with Monday=0, Sunday=6 |
weekday_name | The name of the day in a week (ex: Friday) |
quarter | Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, etc. |
days_in_month | The number of days in the month of the datetime |
is_month_start | Logical indicating if first day of month (defined by frequency) |
is_month_end | Logical indicating if last day of month (defined by frequency) |
is_quarter_start | Logical indicating if first day of quarter (defined by frequency) |
is_quarter_end | Logical indicating if last day of quarter (defined by frequency) |
is_year_start | Logical indicating if first day of year (defined by frequency) |
is_year_end | Logical indicating if last day of year (defined by frequency) |
is_leap_year | Logical indicating if the date belongs to a leap year |
today = pd.to_datetime(datetime.datetime.now())
today
Timestamp('2022-08-04 20:39:55.255040')
today.year
today.month
today.day
2022
8
4
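The other attributes in the table are read the same way, and they are also available on a DatetimeIndex and, through the .dt accessor, on a Series. The sketch below uses a fixed timestamp instead of the current time so the printed values are reproducible. day_name() is a method rather than one of the attributes listed above; it is used here on the assumption that the installed pandas version no longer provides weekday_name.

```python
import pandas as pd

ts = pd.Timestamp('2022-08-04 20:39:55')
print(ts.dayofweek, ts.dayofyear, ts.quarter)  # 3 216 3
print(ts.days_in_month, ts.is_leap_year)       # 31 False
print(ts.day_name())                           # Thursday

# The same attributes work element-wise on an index ...
idx = pd.date_range('2022-08-01', periods=3)
print(idx.day.tolist())  # [1, 2, 3]

# ... and on a Series through the .dt accessor.
s = pd.Series(idx)
print(s.dt.month.tolist())           # [8, 8, 8]
print(s.dt.is_month_start.tolist())  # [True, False, False]
```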
Last updated: 2023/11/01, 03:11:44