Gold Price Dataset (1833-2024) and Trend Forecasting – 甲壳虫AI (Competition) Case Selections

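Summary:

Collection: AI Cases - ML - General Finance
AI problem: regression prediction
Dataset: Gold Price Dataset (1833-2024)
Dataset value: forecasting future gold prices; real-time monitoring and alerts.
Solution: linear regression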

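I. Problem Description

The price of gold stayed relatively stable over long periods. For example, Sir Isaac Newton, as Master of the U.K. Mint, set the gold price at £3 17s 10d per ounce in 1717, and it remained effectively unchanged for 200 years, until 1914. The only exception was the period of the Napoleonic Wars, from 1797 to 1821. From 1792 to the present, the official U.S. government gold price has changed only four times. It started at $19.75 per ounce, was raised to $20.67 in 1834, and to $35 in 1934. In 1972 it was raised to $38, and then to $42.22 in 1973. A two-tier pricing system was created in 1968, and since then the market price of gold has been free to fluctuate.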

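Applications of AI in gold price forecasting:

Historical price analysis: by analyzing historical gold price data, AI can identify patterns and trends in price movements. This helps investors understand how the gold market behaves and make better-informed investment decisions.
Forecasting future prices: machine learning algorithms can be used to forecast gold prices. They can process large, complex datasets and uncover latent relationships in the data. Training a model and tuning its parameters can improve the accuracy of the forecasts.
Real-time monitoring and alerts: an AI system can monitor the gold market in real time and raise an alert when the price reaches a key level or the market trend changes. This helps investors adjust their strategy in time, reduce risk, and act on market opportunities.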

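II. Dataset Contents

Taken from Timothy Green's Historical Gold Price Table, with London prices converted to U.S. dollars.

Data sample

monthly.csv (USD per ounce)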

Date	Price
1833-01	18.93
1833-02	18.93
1833-03	18.93
…	…
2024-08	2470.15
2024-09	2570.55
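As a minimal sketch of how such a monthly table could be loaded, the snippet below parses a few rows copied from the sample above. It assumes a comma-separated layout with `Date` and `Price` columns and `YYYY-MM` dates; the actual delimiter and file path of monthly.csv are not confirmed by the text, so an in-memory string is used instead of a file.

```python
import io
import pandas as pd

# Rows copied from the sample above; comma separation is an assumption.
sample_csv = io.StringIO(
    "Date,Price\n"
    "1833-01,18.93\n"
    "1833-02,18.93\n"
    "1833-03,18.93\n"
)

monthly = pd.read_csv(sample_csv)
# Parse 'YYYY-MM' strings into timestamps (first day of each month)
monthly["Date"] = pd.to_datetime(monthly["Date"], format="%Y-%m")
# set_index returns a new DataFrame, so the result must be assigned
monthly = monthly.set_index("Date")

print(monthly)
```

With a datetime index in place, time-based slicing such as `monthly.loc["1833"]` becomes available.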

Dataset license

The maintainers have licensed under the Public Domain Dedication and License.

III. Gold Price Analysis and Forecasting

Importing packages

# scikit-learn provides the linear regression model, the metrics and the train/test split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# pandas and numpy are used for data manipulation
import pandas as pd
import numpy as np
from math import sqrt
from numpy import log
from pandas import Series

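# The legacy statsmodels.tsa.arima_model classes (ARMA, ARIMA) are no longer usable
# in recent statsmodels releases; ARIMA now lives in statsmodels.tsa.arima.model
# (an ARMA model is an ARIMA model with d=0).
from statsmodels.tsa.arima.model import ARIMA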
from statsmodels.tsa.stattools import adfuller, arma_order_select_ic
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
import statsmodels as sm

# matplotlib and seaborn are used for plotting graphs
import matplotlib.pyplot as plt
import matplotlib.dates as mdates  # used for DateFormatter when plotting predictions
from matplotlib.dates import date2num
import seaborn as sns
from datetime import datetime
import subprocess

Loading the dataset

data_usd.csv is used for the analysis and is later split into training and test data.

ds_gold = 'US dollar'
ds_etf = 'Close'
date_format = '%Y-%m-%d'
df = pd.read_csv("./data/data_usd.csv")
df = df[['Name', ds_gold]]
# Parse the date strings in 'Name' into timestamps
df['Name'] = pd.to_datetime(df['Name'], format=date_format)
# 'Name' is kept as a regular column because the later steps use df['Name'].
# Note that df.set_index('Name') returns a new DataFrame and does not modify df in place.
print(df.columns)
dd = df

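Data sample:

The Name field holds a month-end date for each month (the sample shows 1979/3/30 for March, so it is not always the last calendar day).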

	Name	US dollar
0	1978/12/31	207.8
1	1979/1/31	227.3
2	1979/2/28	245.7
3	1979/3/30	242.1

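Defining the explanatory variables

df[ds_gold] holds the monthly gold price. The code below adds two new features to the DataFrame df, S_1 and S_2: the moving averages of the target variable (ds_gold) over the previous 3 and 12 months. Because of shift(1), each average uses only the months before the current one. The code then drops the rows with missing values (the leading rows where the windows are incomplete) and uses the two new features as the independent variables, X = df[['S_1', 'S_2']]. The raw monthly gold price, y = df[ds_gold], is the dependent variable.

Finally, it plots the two new features against the dates in the Name column.

plt.plot(df['Name'], df['S_1']) and plt.plot(df['Name'], df["S_2"]) draw the moving-average prices over time (the Name column holds the month-end dates).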

# Define exploratory variables
# Finding moving average of past 3 months and 12 months
df['S_1'] = df[ds_gold].shift(1).rolling(window=3).mean()
df['S_2'] = df[ds_gold].shift(1).rolling(window=12).mean()
df = df.dropna()
X = df[['S_1', 'S_2']]
X.head()
plt.plot(df['Name'], df['S_1'])
plt.plot(df['Name'], df["S_2"])
plt.show()
# dependent variable
y = df[ds_gold]
y.head()
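To make the effect of shift(1) concrete, here is a small sketch on a toy series. The values are made up for illustration and are not taken from the dataset; the names `prices`, `lagged_ma3` and `naive_ma3` are hypothetical.

```python
import pandas as pd

# Toy price series (made-up values, not from the dataset)
prices = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Mean of the three previous observations, excluding the current one
lagged_ma3 = prices.shift(1).rolling(window=3).mean()
# Without the shift, the window would include the current observation
naive_ma3 = prices.rolling(window=3).mean()

print(pd.DataFrame({
    "price": prices,
    "lagged_ma3": lagged_ma3,
    "naive_ma3": naive_ma3,
}))
```

In the last row the lagged average is 30.0 (the mean of 20, 30, 40), while the unshifted one is 40.0 because it already includes the current price of 50. Shifting first keeps the value being predicted out of its own feature.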

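The new features X (S_1 and S_2, the 3-month and 12-month moving averages) and the monthly gold price y are split into training data (X_train, y_train) and test data (X_test, y_test). With test_size=0.2 and shuffle=False, the first 80% of the rows in time order are used for training and the last 20% for testing: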

# Split into train and test
t = 0.2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=t, shuffle=False)
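The behavior of a split with shuffle=False can be checked on a small example. The arrays below are illustrative stand-ins for time-ordered data, and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten time-ordered observations (illustrative values)
X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.arange(10) * 2.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False)

print(X_tr.ravel())  # the first 8 observations
print(X_te.ravel())  # the last 2 observations
```

Keeping the order matters for time series: with the default shuffle=True, observations from the future would end up in the training set.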

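Linear regression analysis

The following code runs a linear regression to predict the gold price.

1. Fit the linear regression

Fit a linear regression model on the training data X_train and y_train.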

linear = LinearRegression().fit(X_train, y_train)

2. Print the regression coefficients and intercept

print("Gold Price =", round(linear.coef_[0], 2), "* 3 Month Moving Average", round(
    linear.coef_[1], 2), "* 12 Month Moving Average +", round(linear.intercept_, 2))

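This prints the coefficients and intercept of the fitted model. X_train is assumed to contain two features: the 3-month and the 12-month moving average.
- linear.coef_[0] is the coefficient of the 3-month moving average.
- linear.coef_[1] is the coefficient of the 12-month moving average.
- linear.intercept_ is the intercept of the model.

Output: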

Gold Price = -1.59 * 3 Month Moving Average -4.96 * 12 Month Moving Average + 0.05
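The printed equation is what the model evaluates when it predicts. The sketch below checks this on synthetic data: the features stand in for S_1 and S_2, and the coefficients 1.2, -0.2 and 0.5 are arbitrary values chosen for the illustration, not results from the gold dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features standing in for S_1 and S_2 (not real gold data)
rng = np.random.default_rng(0)
X_demo = rng.uniform(100, 200, size=(50, 2))
# Target built from known, arbitrary coefficients so the fit can be checked
y_demo = 1.2 * X_demo[:, 0] - 0.2 * X_demo[:, 1] + 0.5

model = LinearRegression().fit(X_demo, y_demo)

# Evaluate the fitted equation by hand:
# y = coef_[0] * x1 + coef_[1] * x2 + intercept_
manual = X_demo @ model.coef_ + model.intercept_

print(np.round(model.coef_, 2), round(float(model.intercept_), 2))
print(np.allclose(manual, model.predict(X_demo)))
```

Because the synthetic target is an exact linear function of the features, the fit recovers the chosen coefficients, and the hand-computed values match `model.predict`.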

3. Predict prices and plot

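predicted_price = linear.predict(X_test)
predicted_price = pd.DataFrame(
    predicted_price, index=y_test.index, columns=['price'])
# y_test has a plain integer index, so take the dates from the 'Name' column
test_dates = df.loc[y_test.index, 'Name']
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(test_dates, predicted_price['price'], label='predicted_price')
ax.plot(test_dates, y_test, label='actual_price')
ax.legend()
ax.set_ylabel("Gold Price")
# Format the x-axis as dates
ax.xaxis.set_major_formatter(mdates.DateFormatter(date_format))
fig.autofmt_xdate()  # rotate and align the date labels
plt.show()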

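Predict prices from the test data X_test.
Convert the predictions to a pandas DataFrame that shares the index of y_test.
Plot the predicted and actual prices, with a legend and axis label.

4. Compute R squared and RMSE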

# Use a separate name so the imported sklearn r2_score function is not shadowed
r2 = linear.score(X_test, y_test) * 100
print("R square for regression", float("{0:.2f}".format(r2)))
rmse = sqrt(mean_squared_error(y_test, predicted_price))
print(rmse)

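R squared (the coefficient of determination) measures the share of the variance in the test data that the model explains; here it is multiplied by 100 and reported as a percentage.
The root mean squared error (RMSE) is the square root of the mean squared difference between predicted and actual values, in the same unit as the price (USD per ounce).

Output: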

R square for regression 85.17
72.17024736057834
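Both metrics can be reproduced by hand from their definitions. The following sketch uses a handful of made-up actual and predicted values (not the model output above) and compares the manual results with scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative actual and predicted values (not from the dataset)
y_true = np.array([100.0, 110.0, 120.0, 130.0])
y_pred = np.array([102.0, 108.0, 123.0, 127.0])

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# RMSE = sqrt(mean((y - y_hat)^2)), in the same unit as y
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))

print(r2_manual, r2_score(y_true, y_pred))
print(rmse_manual, np.sqrt(mean_squared_error(y_true, y_pred)))
```

Here the residual sum of squares is 26 and the total sum of squares is 500, so R squared is 1 - 26/500 = 0.948, and the RMSE is the square root of 6.5.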
