1:A 2:D 3:C 4:A 5:D 6:B 7:A 8:B
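(Write your code in the blank cell below the question and run it to display the output.)
Using the akshare website (https://www.akshare.xyz/), select China's national consumer price index (CPI) for January 2000 to December 2017 (monthly data, same month of the previous year = 100) as the sample data, analyze it with Python, and produce a forecast with a forecasting method.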
#!pip install akshare
import akshare as ak
cpi = ak.macro_china_cpi_monthly()
cpi.sort_index(inplace=True)
cpi
1996-02-01    2.1
1996-03-01    2.3
1996-04-01    0.6
1996-05-01    0.7
1996-06-01   -0.5
              ...
2020-12-09   -0.6
2021-01-11    0.7
2021-02-10    1.0
2021-03-10    0.6
2021-04-09   -0.5
Name: cpi, Length: 303, dtype: float64
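import pandas as pd
cpi = pd.DataFrame({'cpi': cpi})  # turn the Series into a one-column DataFrame
# keep the 2000-01 .. 2017-12 sample; the bounds are the timestamps 2000-01-01 and 2017-12-01
cpi = cpi[(cpi.index >= '2000-01-01') & (cpi.index <= '2017-12-01')]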
cpi.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 215 entries, 2000-01-01 to 2017-11-09
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   cpi     215 non-null    float64
dtypes: float64(1)
memory usage: 3.4 KB
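The cell above filters the sample with string bounds on a DatetimeIndex. A minimal sketch of how such bounds behave, using a small made-up index (the dates and values are hypothetical, chosen only to mimic the mix of month-start and release-day labels seen above): a month-level upper bound in a boolean mask is parsed as the first day of that month, whereas `.loc` partial-string slicing covers the whole month. Which behaviour "through December 2017" should mean is a judgment call; the sketch only shows the difference.

```python
import pandas as pd

# Hypothetical dates and values, shaped like the CPI index above
# (month-start labels followed by release-day labels).
idx = pd.to_datetime(["2017-10-01", "2017-11-09", "2017-12-08", "2018-01-10"])
s = pd.Series([0.1, 0.2, 0.3, 0.4], index=idx)

# Boolean mask with string bounds: "2017-12" is parsed as 2017-12-01 00:00,
# so the 2017-12-08 observation falls outside the upper bound.
by_mask = s[(s.index >= "2017-10") & (s.index <= "2017-12")]

# .loc partial-string slicing treats "2017-12" as the whole month
# (this needs a sorted index), so every December 2017 timestamp is kept.
by_slice = s.loc["2017-10":"2017-12"]

print(by_mask.index.max().date(), by_slice.index.max().date())
```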
import matplotlib.pyplot as plt  # load the basic plotting package
plt.plot(cpi.index, cpi.cpi);    # line plot of the monthly CPI series
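CPI = pd.DataFrame(cpi.cpi)  # forecast with the simple moving average method
# 3-period moving average, stored in column 'M2'; the value at month t serves as the forecast for month t+1
CPI['M2'] = cpi.cpi.rolling(3).mean()
# 5-period moving average, stored in column 'M4'
CPI['M4'] = cpi.cpi.rolling(5).mean()
CPI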
| | cpi | M2 | M4 |
|---|---|---|---|
| 2000-01-01 | 0.0 | NaN | NaN |
| 2000-02-01 | 0.9 | NaN | NaN |
| 2000-03-01 | 1.9 | 0.933333 | NaN |
| 2000-04-01 | -1.6 | 0.400000 | NaN |
| 2000-05-01 | -0.9 | -0.200000 | 0.06 |
| ... | ... | ... | ... |
| 2017-07-10 | -0.2 | -0.066667 | -0.14 |
| 2017-08-09 | 0.1 | -0.066667 | -0.08 |
| 2017-09-09 | 0.4 | 0.100000 | 0.06 |
| 2017-10-16 | 0.5 | 0.333333 | 0.14 |
| 2017-11-09 | 0.1 | 0.333333 | 0.18 |

215 rows × 3 columns
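In the simple moving average method, the mean of the last k observations is used as the forecast for the next period. A minimal sketch of that one-step-ahead use, with an RMSE score that could be used to compare window lengths such as 3 and 5; `sma_forecast` and `rmse` are hypothetical helper names, and the input is just the first five `cpi` values from the table above.

```python
import numpy as np
import pandas as pd

def sma_forecast(series: pd.Series, window: int) -> pd.Series:
    # Mean of the `window` most recent observations strictly before t,
    # used as the forecast for period t (hence the shift by one step).
    return series.rolling(window).mean().shift(1)

def rmse(actual: pd.Series, forecast: pd.Series) -> float:
    # Root mean squared error over the periods that have a forecast.
    err = (actual - forecast).dropna()
    return float(np.sqrt((err ** 2).mean()))

# First five cpi values from the table above (2000-01 .. 2000-05).
y = pd.Series([0.0, 0.9, 1.9, -1.6, -0.9])

f3 = sma_forecast(y, 3)          # forecasts exist from the 4th period on
score = rmse(y, f3)              # in-sample accuracy of the 3-period SMA
next_value = y.iloc[-3:].mean()  # forecast for the period after the sample
print(f3.tolist(), score, next_value)
```

The out-of-sample forecast `next_value` is -0.2, the same number as the M2 entry for 2000-05-01 in the table: the rolling mean at month t is read as the forecast for month t+1. On the full series, the window with the lower RMSE would be the natural choice.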
sz = ak.stock_zh_index_daily(symbol='sh000001')  # daily data for the Shanghai Composite Index (sh000001)
sz.sort_index(inplace=True)
sz = sz[(sz.index >= '2015-01-01') & (sz.index <= '2017-12-31')]
sz
| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2015-01-05 00:00:00+00:00 | 3258.627 | 3369.281 | 3253.883 | 3350.519 | 5.313524e+10 |
| 2015-01-06 00:00:00+00:00 | 3330.799 | 3394.224 | 3303.184 | 3351.446 | 5.016617e+10 |
| 2015-01-07 00:00:00+00:00 | 3326.649 | 3374.896 | 3312.211 | 3373.954 | 3.919189e+10 |
| 2015-01-08 00:00:00+00:00 | 3371.957 | 3381.566 | 3285.095 | 3293.456 | 3.711312e+10 |
| 2015-01-09 00:00:00+00:00 | 3276.965 | 3404.834 | 3267.509 | 3285.412 | 4.102409e+10 |
| ... | ... | ... | ... | ... | ... |
| 2017-12-25 00:00:00+00:00 | 3296.211 | 3312.300 | 3270.441 | 3280.461 | 1.468936e+10 |
| 2017-12-26 00:00:00+00:00 | 3277.837 | 3307.299 | 3274.327 | 3306.125 | 1.424345e+10 |
| 2017-12-27 00:00:00+00:00 | 3302.461 | 3307.080 | 3270.349 | 3275.783 | 1.626749e+10 |
| 2017-12-28 00:00:00+00:00 | 3272.291 | 3304.096 | 3263.728 | 3296.385 | 1.753717e+10 |
| 2017-12-29 00:00:00+00:00 | 3295.246 | 3308.225 | 3292.770 | 3307.172 | 1.415868e+10 |

732 rows × 5 columns
# compute daily returns: (P_t - P_{t-1}) / P_{t-1}
sz['returns'] = (sz['close'] - sz['close'].shift(1)) / sz['close'].shift(1)
sz
| date | open | high | low | close | volume | returns |
|---|---|---|---|---|---|---|
| 2015-01-05 00:00:00+00:00 | 3258.627 | 3369.281 | 3253.883 | 3350.519 | 5.313524e+10 | NaN |
| 2015-01-06 00:00:00+00:00 | 3330.799 | 3394.224 | 3303.184 | 3351.446 | 5.016617e+10 | 0.000277 |
| 2015-01-07 00:00:00+00:00 | 3326.649 | 3374.896 | 3312.211 | 3373.954 | 3.919189e+10 | 0.006716 |
| 2015-01-08 00:00:00+00:00 | 3371.957 | 3381.566 | 3285.095 | 3293.456 | 3.711312e+10 | -0.023859 |
| 2015-01-09 00:00:00+00:00 | 3276.965 | 3404.834 | 3267.509 | 3285.412 | 4.102409e+10 | -0.002442 |
| ... | ... | ... | ... | ... | ... | ... |
| 2017-12-25 00:00:00+00:00 | 3296.211 | 3312.300 | 3270.441 | 3280.461 | 1.468936e+10 | -0.005035 |
| 2017-12-26 00:00:00+00:00 | 3277.837 | 3307.299 | 3274.327 | 3306.125 | 1.424345e+10 | 0.007823 |
| 2017-12-27 00:00:00+00:00 | 3302.461 | 3307.080 | 3270.349 | 3275.783 | 1.626749e+10 | -0.009178 |
| 2017-12-28 00:00:00+00:00 | 3272.291 | 3304.096 | 3263.728 | 3296.385 | 1.753717e+10 | 0.006289 |
| 2017-12-29 00:00:00+00:00 | 3295.246 | 3308.225 | 3292.770 | 3307.172 | 1.415868e+10 | 0.003272 |

732 rows × 6 columns
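The return formula in the cell above is the simple (arithmetic) return. A short check, using the first five closing prices from the table, that it agrees with pandas' built-in `Series.pct_change`; the log return ln(P_t / P_{t-1}) is shown alongside only as a commonly used alternative and is not part of the original analysis.

```python
import numpy as np
import pandas as pd

# First five closing prices from the table above (2015-01-05 .. 2015-01-09).
close = pd.Series([3350.519, 3351.446, 3373.954, 3293.456, 3285.412])

# Simple return, exactly as in the cell above: (P_t - P_{t-1}) / P_{t-1}
manual = (close - close.shift(1)) / close.shift(1)

# pandas built-in equivalent (the first element is NaN in both cases)
builtin = close.pct_change()

# Log return ln(P_t / P_{t-1}), shown only for comparison
log_ret = np.log(close / close.shift(1))

print(manual.round(6).tolist())
```

The rounded values reproduce the `returns` column of the table. Because ln(1 + r) < r for any nonzero r, log returns are always slightly smaller than the corresponding simple returns.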
sz.returns.plot().axhline(y=0);  # analyze how returns fluctuate around zero
import numpy as np
re = sz.returns.iloc[1:]  # drop the first (NaN) return
m1 = np.arange(len(re))   # time index t = 0, 1, ..., n-1 used as the trend regressor
m1[:20]
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
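# Build a simple trend-regression helper for model selection.
import statsmodels.api as sm
import warnings  # used to suppress warning messages
warnings.filterwarnings("ignore")

def trendmodel(y, x):
    """Fit the two-variable linear trend regression y = b0 + b1*x by OLS (x: regressor, y: response)."""
    fm = sm.OLS(y, sm.add_constant(x)).fit()
    sfm = fm.summary2()
    print("Coefficient tests:\n", sfm.tables[1])  # coefficients, standard errors, t and p values
    print("R-squared:", f"{fm.rsquared:.3f}")     # coefficient of determination
    return fm.fittedvalues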
l1=trendmodel(re,m1);
Coefficient tests:
                Coef.  Std.Err.         t     P>|t|    [0.025    0.975]
const  2.621221e-04  0.001233  0.212595  0.831702 -0.002158  0.002683
x1    -3.812012e-07  0.000003 -0.130351  0.896325 -0.000006  0.000005
R-squared: 0.000
l2 = trendmodel(np.log(re), m1)  # log model; np.log yields NaN for negative returns
Coefficient tests:
        Coef.  Std.Err.    t  P>|t|  [0.025  0.975]
const     NaN       NaN  NaN    NaN     NaN     NaN
x1        NaN       NaN  NaN    NaN     NaN     NaN
R-squared: nan
l3 = trendmodel(np.exp(re), m1)  # exponential-transform model
Coefficient tests:
               Coef.  Std.Err.           t     P>|t|    [0.025    0.975]
const  1.000607e+00  0.001222  819.047451  0.000000  0.998208  1.003005
x1    -9.469541e-07  0.000003   -0.326803  0.743911 -0.000007  0.000005
R-squared: 0.000
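From the results above, none of the three trend models is suitable. In the linear model the slope is not significant (p ≈ 0.90) and R² ≈ 0; in the exponential-transform model the slope is also insignificant (p ≈ 0.74) and R² ≈ 0; and the log model cannot be estimated at all, because the daily returns include negative values for which np.log is undefined, so every estimate is NaN.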