CCF-BDCI 2019 Passenger Car Segment Sales Forecasting – 甲壳虫AI (Competition) Case Selections

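Summary:

Collection: AI Cases - ML - Retail
Task: Passenger car segment sales forecasting
Organizers: China Computer Federation (CCF) & 深瞳云涂
Homepage: https://www.datafountain.cn/competitions/352
AI problem: Time series forecasting
Dataset: Sales data for 60 passenger car models across 22 provincial markets
Dataset value: Forecasting sales in passenger car market segments
Solution: Combines a machine learning model (LightGBM) with a rule-based forecasting method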

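I. Competition Description

Background

深瞳 is a provider of big data and industry intelligence solutions. It offers data analysis and strategy consulting to clients across industries, helps them turn their data into assets, and provides data processing, modeling, and analysis services. The automotive industry is one of its core sectors, and it has long served well-known domestic and international car brands. In recent years China's car market has been shifting from a growth market to a saturated, replacement-driven market, and in 2018 total market sales fell year over year for the first time. As the overall trend changes, consumers' car-buying decisions are also moving from offline to online. Building on the trends in the sales data itself, we hope to find correlations between consumers' online behavior and sales, and so bring more accurate and effective sales trend forecasts to the automotive industry.

Task

Teams are given sales data for 60 car models in 22 market segments (provinces) over 24 consecutive months (January 2016 to December 2017) and must build a sales forecasting model. The model is then used to forecast sales of the same models in the same segments for the following four consecutive months. In addition to sales data, statistics on users' online behavior over the same period are provided, including internet search volume for each model name in each segment and user activity data from mainstream automotive vertical media. Teams may use this non-sales data in their models. Beyond model accuracy, teams are expected to think about and design for the task systematically: in the final round, the adaptability and extensibility of the submitted model and the engineering quality of the code also affect the final ranking.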

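II. Dataset Contents

File information

Historical sales data

The historical sales data covers 60 models in 22 provinces from January 2016 to December 2017. Teams must forecast sales of these 60 models in the 22 provinces for the following four months (January to April 2018). Teams must decide for themselves how to split the training data for modeling.

Training data: train_sales_data.csv

Field name	Field type	Description
province	String	Province
adcode	int	Province code
model	String	Model code
bodyType	String	Body type
regYear	int	Year
regMonth	int	Month
salesVolume	int	Sales volume

Sample data: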

province	adcode	model	bodyType	regYear	regMonth	salesVolume
上海	310000	3c974920a76ac9c1	SUV	2016	1	292
云南	530000	3c974920a76ac9c1	SUV	2016	1	466
内蒙古	150000	3c974920a76ac9c1	SUV	2016	1	257
北京	110000	3c974920a76ac9c1	SUV	2016	1	408
四川	510000	3c974920a76ac9c1	SUV	2016	1	610
安徽	340000	3c974920a76ac9c1	SUV	2016	1	206
山东	370000	3c974920a76ac9c1	SUV	2016	1	503
山西	140000	3c974920a76ac9c1	SUV	2016	1	236

Model search data

Training data: train_search_data.csv

Field name	Field type	Description
province	String	Province
adcode	int	Province code
model	String	Model code
regYear	int	Year
regMonth	int	Month
popularity	int	Search volume

Sample data:

province	adcode	model	regYear	regMonth	popularity
河南	410000	17bc272c93f19d56	2016	1	19036
河南	410000	17bc272c93f19d56	2016	2	17856
河南	410000	17bc272c93f19d56	2016	3	12517
河南	410000	17bc272c93f19d56	2016	4	9700
河南	410000	17bc272c93f19d56	2016	5	12780
河南	410000	17bc272c93f19d56	2016	6	11803

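Sales forecast

Forecast file for sales of each model in each province, January to April 2018: evaluation_public.csv

Field name	Field type	Description
id	int	Unique row identifier; must not be changed
province	String	Province
adcode	int	Province code
model	String	Model code
regYear	int	Year
regMonth	int	Month
forecastVolum	int	Forecast sales volume

Sample data: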

id	province	adcode	model	regYear	regMonth
1	上海	310000	3c974920a76ac9c1	2018	1
2	云南	530000	3c974920a76ac9c1	2018	1
3	内蒙古	150000	3c974920a76ac9c1	2018	1
4	北京	110000	3c974920a76ac9c1	2018	1
5	四川	510000	3c974920a76ac9c1	2018	1
6	安徽	340000	3c974920a76ac9c1	2018	1
7	山东	370000	3c974920a76ac9c1	2018	1
8	山西	140000	3c974920a76ac9c1	2018	1

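III. Sample Solution

Solution

A car sales forecasting system that combines a machine learning model (LightGBM) with a rule-based forecasting method and blends the two sets of predictions into the final output.

Install the libraries

See "Installing Traditional Machine Learning Development Packages" (《安装传统机器学习开发包》).

Import the libraries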

import time
import math
import warnings
import numpy as np
import pandas as pd
import lightgbm as lgb

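The workflow is as follows:

1. Data loading and preprocessing

Read the preliminary-round and final-round sales data, the search data, and the final evaluation data
Flag new models (the new_model field)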

pre_train_sale = pd.read_csv(r'.\pre_data\train_sales_data.csv')
input_data  = pd.read_csv(r'.\data\train_sales_data.csv')
final_data  = pd.read_csv(r'.\data\evaluation_public.csv')
search_data = pd.read_csv(r'.\data\train_search_data.csv')
# Flag the models newly added in the final round
pre_model = list(set(list(pre_train_sale['model'])))
input_data['new_model'] = list(map(lambda x: 1 if pre_model.count(x) == 0 else 0,input_data['model']))
final_data['new_model'] = list(map(lambda x: 1 if pre_model.count(x) == 0 else 0,final_data['model']))
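The new-model flag above calls list.count() once per row. As a minimal sketch of an equivalent vectorized form, assuming the same semantics (1 when a model code is absent from the preliminary-round data), Series.isin can be used instead. The frames pre and cur below are made-up toy data, not the competition files.

```python
import pandas as pd

# Toy stand-ins for pre_train_sale and input_data (hypothetical values)
pre = pd.DataFrame({"model": ["a", "b", "a"]})
cur = pd.DataFrame({"model": ["a", "c", "b", "c"]})

# 1 if the model code never appears in the preliminary-round data, else 0
known_models = pre["model"].unique()
cur["new_model"] = (~cur["model"].isin(known_models)).astype(int)

print(cur)
```

Here only model "c" is flagged, because it is the only code missing from pre.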

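The prepare() function preprocesses the data:

Combine year and month into a date field
Convert the categorical features (province, model, body type) to numeric IDs
Create a time ID (time_id) as a continuous time index
Drop the columns that are no longer needed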

def prepare(data):
    # Preprocess the data: convert each attribute to a numeric feature
    data['date'] = list(map(lambda x,y:str(x)+"."+str(y),data['regYear'],data['regMonth']))
    data['date'] = pd.to_datetime(data['date'])
    if 'forecastVolum' in list(data.columns):
        data = data.drop(['forecastVolum'],axis=1)
    if 'province' in list(data.columns):
        pro_label = dict(zip(sorted(list(set(data['province']))), range(0, len(set(data['province'])))))
    model_label = dict(zip(sorted(list(set(data['model']))), range(0, len(set(data['model'])))))
    if 'bodyType' in list(data.columns):
       body_label = dict(zip(sorted(list(set(data['bodyType']))), range(0, len(set(data['bodyType'])))))
       data['body_id'] = data['bodyType'].map(body_label)
       data=data.drop(['bodyType'],axis=1)
    if 'province' in list(data.columns):
        data['pro_id'] = data['province'].map(pro_label)
    data['model_id'] = data['model'].map(model_label)
    data=data.drop(['regYear','regMonth','model'],axis=1)
    if 'province' in list(data.columns):
         data=data.drop(['adcode','province'],axis=1)
    data['month_id'] = data['date'].apply(lambda x : x.month)
    data['sales_year'] = data['date'].apply(lambda x : x.year)
    data['time_id'] = list(map(lambda x,y:(x-2016)*12+y,data['sales_year'],data['month_id']))
    data=data.drop(['date'],axis=1).rename(columns={'salesVolume':'label'})
    return data
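The encoding and time_id steps in prepare() can be illustrated on a few rows. The following is a minimal sketch, assuming the same conventions as above (sorted unique values mapped to consecutive IDs, and time_id counted in months from January 2016). The helper name encode_sketch and the toy rows are hypothetical.

```python
import pandas as pd

def encode_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: label-encode categories and build time_id."""
    out = df.copy()
    for col, new_col in [("province", "pro_id"), ("model", "model_id")]:
        # Sorted unique values -> consecutive integer IDs, as in prepare()
        mapping = {v: i for i, v in enumerate(sorted(out[col].unique()))}
        out[new_col] = out[col].map(mapping)
    # Months counted from the start of 2016: 2016-01 -> 1, 2018-01 -> 25
    out["time_id"] = (out["regYear"] - 2016) * 12 + out["regMonth"]
    return out

toy = pd.DataFrame({
    "province": ["上海", "云南", "上海"],
    "model": ["b", "a", "a"],
    "regYear": [2016, 2017, 2018],
    "regMonth": [1, 12, 1],
})
encoded = encode_sketch(toy)
print(encoded[["pro_id", "model_id", "time_id"]])
```

With this convention the 24 training months map to time_id 1 to 24 and the four forecast months map to 25 to 28.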

2. Feature engineering

The get_stat_feature() function generates a large number of statistical features:

def get_stat_feature(df_,month):   
    data = df_.copy()
    stat_feat = []
    start = int((month-24)/3)*2
    start += int((month-24)/4)
    start = start-1 if start >=1 else start
    ...

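Feature summary:

Feature type	Example features	Count	Purpose
Raw history features	last_1_sale, last_1_popularity	22	Provide the basic historical data
Statistical features	1_6_sum, 1_6_max	8	Reflect the overall sales level
Trend features	1_2_diff, jidu_1_2_diff	7	Capture the direction and speed of change
Seasonal features	is_chunjie, is_yanhai	4	Handle periodic effects
Month-over-month features	huanbi_1_2, huanbi_2_3	9	Short-term rate of change
Share features	sale_ratio_pro_body	12	Relative market share
Year-over-year features	increase16_4	7	Annual cyclical change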

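2.1. Historical sales features

Sales for each of the previous 1 to 16 months (last_1_sale through last_16_sale), obtained by shifting time_id and merging the shifted copy back onto the data. Only the first 6 lags are kept as base features; the others are used in derived calculations.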

for last in range(1,17):  
    tmp=data.copy()
    tmp['time_id'] = list(map(lambda x:x+last+start if x+last+start<=28 else -1,tmp['time_id']))
    tmp = tmp[~tmp['time_id'].isin([-1])][['label','time_id','pro_id','model_id','body_id']]
    tmp = tmp.rename(columns={'label':'last_{0}_sale'.format(last)})
    data = pd.merge(data,tmp,how='left',on=['time_id','pro_id','model_id','body_id'])
    if last <= 6:
        stat_feat.append('last_{0}_sale'.format(last))

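2.2. Historical search popularity features

Search popularity for each of the previous 1 to 16 months (last_1_popularity through last_16_popularity). Lags 1-6 and 11-13 are kept as features. Search popularity may reflect market attention and so correlate with sales.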

for last in range(1,17):  
    tmp=data.copy()
    tmp['time_id']=list(map(lambda x:x+last+start if x+last+start<=28 else -1,tmp['time_id']))
    tmp=tmp[~tmp['time_id'].isin([-1])][['popularity','time_id','pro_id','model_id','body_id']]
    tmp=tmp.rename(columns={'popularity':'last_{0}_popularity'.format(last)})
    data=pd.merge(data,tmp,how='left',on=['time_id','pro_id','model_id','body_id'])
    if last<=6 or (last>=11 and last<=13):
        stat_feat.append('last_{0}_popularity'.format(last))

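2.3. Statistical features

Statistics over the previous 6 months of sales: sum, mean, maximum, and minimum (1_6_sum, 1_6_mea, 1_6_max, 1_6_min), plus quarterly sales totals for lags 1-3 and lags 4-6 (jidu_1_3_sum, jidu_4_6_sum). These capture the overall sales level and its variability.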

data['1_6_sum'] = data.loc[:,'last_1_sale':'last_6_sale'].sum(1)
data['1_6_mea'] = data.loc[:,'last_1_sale':'last_6_sale'].mean(1)
data['1_6_max'] = data.loc[:,'last_1_sale':'last_6_sale'].max(1)
data['1_6_min'] = data.loc[:,'last_1_sale':'last_6_sale'].min(1)
data['jidu_1_3_sum'] = data.loc[:,'last_1_sale':'last_3_sale'].sum(1)
data['jidu_4_6_sum'] = data.loc[:,'last_4_sale':'last_6_sale'].sum(1)
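The label-based slices above (for example 'last_1_sale':'last_6_sale') depend on the lag columns being adjacent and in order. A minimal sketch of the same statistics with an explicit column list, on made-up values, could look like this:

```python
import pandas as pd

# Two toy rows with lag columns last_1_sale .. last_6_sale (hypothetical values)
toy = pd.DataFrame({f"last_{k}_sale": [float(k), float(10 * k)] for k in range(1, 7)})

lag_cols = [f"last_{k}_sale" for k in range(1, 7)]
toy["1_6_sum"] = toy[lag_cols].sum(axis=1)
toy["1_6_mea"] = toy[lag_cols].mean(axis=1)
toy["1_6_max"] = toy[lag_cols].max(axis=1)
toy["1_6_min"] = toy[lag_cols].min(axis=1)
# "Quarterly" totals: lags 1-3 and lags 4-6
toy["jidu_1_3_sum"] = toy[lag_cols[:3]].sum(axis=1)
toy["jidu_4_6_sum"] = toy[lag_cols[3:]].sum(axis=1)

print(toy[["1_6_sum", "1_6_mea", "1_6_max", "1_6_min", "jidu_1_3_sum", "jidu_4_6_sum"]])
```

Using an explicit list keeps the result independent of where later feature columns are appended.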

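2.4. Trend features

Differences in sales between periods. The variables are 1_2_diff, 1_3_diff, 2_3_diff, and jidu_1_2_diff.

data['1_2_diff'] = data['last_1_sale'] - data['last_2_sale']
data['1_3_diff'] = data['last_1_sale'] - data['last_3_sale']
data['2_3_diff'] = data['last_2_sale'] - data['last_3_sale']
data['jidu_1_2_diff'] = data['jidu_1_3_sum'] - data['jidu_4_6_sum']

2.5. Seasonal features

A Spring Festival flag (is_chunjie) and a coastal-region flag (is_yanhai, keyed on pro_id).

The same function also builds the following feature groups:

Month-over-month features: ratios of sales in adjacent months
Share features: share of province-bodyType sales, share of total provincial sales, and so on
Year-over-year features: comparison with sales in the same period one year earlier

yanhaicity={1,2,5,7,9,13,16,17}
data['is_yanhai'] = list(map(lambda x:1 if x in yanhaicity else 0,data['pro_id']))
data['is_chunjie'] = list(map(lambda x:1 if x==2 or x==13 or x==26 else 0,data['time_id']))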
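The time_id values 2, 13, and 26 in the Spring Festival flag can be decoded back to calendar months to see why they were chosen. The sketch below assumes the time_id convention from prepare() (January 2016 is 1); the helper name time_id_to_year_month is hypothetical.

```python
def time_id_to_year_month(time_id: int) -> tuple:
    """Invert time_id = (year - 2016) * 12 + month."""
    year = 2016 + (time_id - 1) // 12
    month = (time_id - 1) % 12 + 1
    return year, month

for tid in (2, 13, 26):
    print(tid, time_id_to_year_month(tid))
```

These decode to February 2016, January 2017, and February 2018, the months in which the Spring Festival fell in those years (8 February 2016, 28 January 2017, 16 February 2018).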

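2.6. Aggregate features

Sales shares computed along different dimensions (province, bodyType). They reflect a model's market share within a given region or body type.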

# Aggregate by province and bodyType
pivot = pd.pivot_table(data,index=['time_id','pro_id','body_id'],values=last_time,aggfunc=np.sum)
data['last_{0}_sale_ratio_pro_body_last_{0}_sale_sum'.format(i,i)]=list(map(lambda x,y:x/y if y!=0 else 0,data[last_time],data['pro_body_last_{0}_sale_sum'.format(i)]))

# Aggregate by province
pivot = pd.pivot_table(data,index=['time_id','pro_id'],values=last_time,aggfunc=np.sum)
data['last_{0}_sale_ratio_pro_last_{0}_sale_sum'.format(i,i)]=list(map(lambda x,y:x/y if y!=0 else 0,data[last_time],data['pro__last_{0}_sale_sum'.format(i)]))
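The excerpt above omits the loop that defines i and last_time and the step that joins the pivot totals back onto data. As a self-contained sketch of the same idea (a row's lagged sales divided by its group total, with 0 when the total is 0), groupby().transform can produce the group total directly. The column name share and the toy values are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "time_id": [25, 25, 25, 25],
    "pro_id": [0, 0, 0, 1],
    "body_id": [0, 0, 1, 0],
    "last_1_sale": [30.0, 70.0, 50.0, 0.0],
})

# Total lagged sales of each (month, province, body type) group, aligned to rows
group_sum = toy.groupby(["time_id", "pro_id", "body_id"])["last_1_sale"].transform("sum")

# Share of the group total; rows whose group total is 0 get a share of 0
toy["share"] = (toy["last_1_sale"] / group_sum).where(group_sum != 0, 0.0)

print(toy)
```

Dropping body_id from the groupby keys gives the province-level share in the same way.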

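2.7. Year-over-year features

The rate of change relative to the same period one year earlier (last_16_sale and last_4_sale are 12 months apart). These capture annual cyclical patterns.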

data["increase16_4"]=(data["last_16_sale"] - data["last_4_sale"]) / data["last_16_sale"]
data["increase_mean_province_16_4"] = (data["mean_province_16"] - data["mean_province_4"]) / data["mean_province_16"]

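3. LightGBM model training

Uses LGBMRegressor for regression
Features include numeric and categorical features
Sales are log-transformed (math.log(x+1,lg), where lg is the logarithm base)
Forecasts are made month by month (months 25-28, i.e. January to April 2018)
Separate models are trained on the 60 preliminary-round models and on all 82 models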
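The log transform and its inverse can be checked with a short round trip. This is a minimal sketch: the value of the base lg is not shown in the excerpt, so LOG_BASE = 2 below is an assumed value for illustration only, and the helper names to_log and from_log are hypothetical.

```python
import math

LOG_BASE = 2  # assumed for illustration; the excerpt does not show how `lg` is set

def to_log(x: float, base: float = LOG_BASE) -> float:
    """Forward transform applied to sales before training: log_base(x + 1)."""
    return math.log(x + 1, base)

def from_log(y: float, base: float = LOG_BASE) -> float:
    """Inverse transform applied to predictions: base ** y - 1."""
    return base ** y - 1

sales = 292.0
restored = from_log(to_log(sales))
print(to_log(sales), restored)
```

Adding 1 before taking the logarithm keeps zero sales finite (it maps to 0), and compressing large values reduces the influence of high-volume models on the squared-error objective.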

Function: get_model_type

def get_model_type():   
    model = lgb.LGBMRegressor(
            num_leaves=2**5-1, reg_alpha=0.25, reg_lambda=0.25, objective='mse',
            max_depth=-1, learning_rate=0.05, min_child_samples=5, random_state=2019,
            n_estimators=600, subsample=0.9, colsample_bytree=0.7,
            )
    return model

Function: get_train_model

def get_train_model(df_, m, m_type,features, num_feat, cate_feat):
    
    df = df_.copy()
    # Split the data: train on months 7 to m-1, predict month m
    all_idx   = df['time_id'].between(7 , m-1)
    test_idx  = df['time_id'].between(m , m  )
    # Initialize the model
    model = get_model_type()
    # model.fit(df[all_idx][features], df[all_idx]['label'], categorical_feature=cate_feat,verbose=100)
    model.fit(df[all_idx][features], df[all_idx]['label'], categorical_feature=cate_feat)
    df['forecastVolum'] = model.predict(df[features]) 
    sub = df[test_idx][['id']]
    sub['forecastVolum'] = df[test_idx]['forecastVolum'].apply(lambda x: 2.0 if x < 0 else x)
    return sub
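The training window between(7, m-1) grows by one month for each forecast month. As a quick arithmetic check, assuming one row per model, province, and month, the expected number of training rows can be computed and compared with the "Number of data points in the train set" values in the run log. The helper name expected_train_rows is hypothetical.

```python
def expected_train_rows(n_models: int, n_provinces: int, target_month: int,
                        first_month: int = 7) -> int:
    """Rows with first_month <= time_id <= target_month - 1."""
    n_months = (target_month - 1) - first_month + 1
    return n_models * n_provinces * n_months

for n_models in (60, 82):
    sizes = [expected_train_rows(n_models, 22, m) for m in (25, 26, 27, 28)]
    print(n_models, sizes)
```

For 60 models this gives 23760, 25080, 26400, and 27720 rows, and for 82 models 32472, 34276, 36080, and 37884, which matches the log. Months 1 to 6 are excluded from training, presumably because their lag features are mostly missing.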

Function: LGB

def LGB(input_data,is_get_82_model):
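    # Forecast sales with LightGBM month by month: months 1, 2, 3, and 4 are predicted separately
    # Preliminary-round and newly added final-round models are also predicted separately: preliminary-round models use only preliminary-round data, while the new final-round models use all data
    if is_get_82_model == 0:
        input_data = input_data[input_data['new_model']==0]
    # `lg` is the logarithm base; it is not defined in this excerpt
    # Use pd.isna for the NaN check: `x == np.nan` is always False because NaN never compares equal to itself
    input_data['label'] = list(map(lambda x : x if pd.isna(x) else math.log(x+1,lg),input_data['label']))
    input_data['salesVolume'] = list(map(lambda x : x if pd.isna(x) else math.log(x+1,lg),input_data['salesVolume']))
    input_data['jidu_id'] = ((input_data['month_id']-1)/3+1).map(int)
    # ------------------------------ month-by-month forecasting ------------------------------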
    for month in [25,26,27,28]: 
        m_type = 'lgb' 
        data_df, stat_feat = get_stat_feature(input_data,month)
        num_feat = ['sales_year']+stat_feat
        cate_feat = ['pro_id','body_id','model_id','month_id','jidu_id']
        for i in cate_feat:
            data_df[i] = data_df[i].astype('category')
        features = num_feat + cate_feat
        sub = get_train_model(data_df, month, m_type, features, num_feat, cate_feat)   
        input_data.loc[(input_data.time_id==month),  'salesVolume'] = sub['forecastVolum'].values
        input_data.loc[(input_data.time_id==month),  'label'      ] = sub['forecastVolum'].values
    input_data['salesVolume'] = list(map(lambda x : x if pd.isna(x) else (lg**(x))-1, input_data['salesVolume']))
    input_data['salesVolume'] = list(map(lambda x,y: x*0.95 if y == 26 else x,input_data['salesVolume'],input_data['time_id']))
    input_data['salesVolume'] = list(map(lambda x,y: x*0.98 if y == 27 else x,input_data['salesVolume'],input_data['time_id']))
    input_data['salesVolume'] = list(map(lambda x,y: x*0.90 if y == 28 else x,input_data['salesVolume'],input_data['time_id']))
    sub = input_data.loc[(input_data.time_id >= 25),['id','salesVolume']]
    sub.columns = ['id','forecastVolum']
    sub['id'] = sub['id'].map(int)
    sub['forecastVolum'] = sub['forecastVolum'].map(round)
    return sub

Function: get_lgb_ans

The entry point for LightGBM model training.

def get_lgb_ans(input_data):
    # Forecast sales and return the final LightGBM predictions
    print('use 60 models to train lgb model...')
    sub_60=LGB(input_data,0)
    print('use 82 models to train lgb model...')
    sub_82=LGB(input_data,1)
    input_data = pd.merge(input_data,sub_60,on='id',how='left')
    input_data = pd.merge(input_data,sub_82,on='id',how='left')
    input_data = input_data.loc[input_data.time_id>=25,['id','forecastVolum_x','forecastVolum_y']]
    input_data = input_data.fillna(-1)
    input_data['forecastVolum'] = list(map(lambda x,y:y if x==-1 else x,input_data['forecastVolum_x'],input_data['forecastVolum_y']))
    input_data = input_data[['id','forecastVolum']]
    input_data['id'] = input_data['id'].map(int)
    input_data['forecastVolum'] = input_data['forecastVolum'].map(int)
    return input_data
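The selection logic in get_lgb_ans() (use the 60-model prediction where it exists, otherwise fall back to the 82-model prediction) can also be expressed with fillna instead of a -1 sentinel. This is a sketch on made-up predictions, and the column suffixes _82 and _60 are assumptions chosen for readability.

```python
import pandas as pd

# Toy predictions: id 3 belongs to a new model, so only the 82-model run covers it
sub_60 = pd.DataFrame({"id": [1, 2], "forecastVolum": [100, 200]})
sub_82 = pd.DataFrame({"id": [1, 2, 3], "forecastVolum": [110, 210, 310]})

merged = sub_82.merge(sub_60, on="id", how="left", suffixes=("_82", "_60"))
# Prefer the 60-model prediction; fall back to the 82-model one where it is missing
merged["forecastVolum"] = (
    merged["forecastVolum_60"].fillna(merged["forecastVolum_82"]).astype(int)
)
result = merged[["id", "forecastVolum"]]
print(result)
```

Avoiding the sentinel also removes any ambiguity if a genuine prediction were ever equal to -1.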

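4. Rule-based forecasting

Trend factor calculation:

Compute the year-over-year rate of change for different periods (months 1-3, months 4-6, and so on)
Smooth extreme trend values
Combine them with weights into the final trend factor

Triple exponential smoothing:

The exp_smooth() function implements Holt-Winters triple exponential smoothing
It forecasts the next 4 months of sales from the 24 months of history

Holt-Winters builds on exponential smoothing and has three components:

Level: the baseline sales value.
Trend: the long-term upward or downward movement in sales.
Seasonality: periodic fluctuation (for example monthly or quarterly patterns).

Depending on the type of seasonality, there are two variants:

Additive model: the size of the seasonal swing does not change over time (suited to stable seasonality).
Multiplicative model: the size of the seasonal swing scales with the sales level (suited to seasonality that grows or shrinks).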
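To make the three components concrete, here is a generic textbook-style sketch of the additive variant. It is not the exp_smooth() function from the source, whose body is not shown here; the function name holt_winters_additive, the initialization scheme, and the smoothing parameters are all assumptions for illustration.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast; needs at least two full seasons of data."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of history")
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    # Initial level: mean of the first season
    level = sum(first) / season_len
    # Initial trend: average per-step change between the first two seasons
    trend = (sum(second) / season_len - level) / season_len
    # Initial seasonal terms: deviation of each first-season point from the level
    seasonals = [x - level for x in first]
    for t, x in enumerate(series):
        s = seasonals[t % season_len]
        prev_level = level
        level = alpha * (x - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % season_len] = gamma * (x - level) + (1 - gamma) * s
    n = len(series)
    # h-step-ahead forecast: level + h * trend + matching seasonal term
    return [level + h * trend + seasonals[(n + h - 1) % season_len]
            for h in range(1, horizon + 1)]

# Toy series: a repeating pattern of length 4 with no trend
history = [10.0, 20.0, 30.0, 40.0] * 2
forecast = holt_winters_additive(history, season_len=4, alpha=0.3,
                                 beta=0.1, gamma=0.2, horizon=4)
print(forecast)
```

For the competition data the natural setting would be season_len=12 with the 24 months of history, which is exactly the two-season minimum this sketch requires. With so little history the seasonal estimates rest on two observations per month, which may be one reason the solution pairs smoothing with other rules.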

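Weighted forecast:

Combine sales from the same period last year, the same period two years ago, and recent months in a weighted forecast
Apply the trend factor to adjust the forecast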

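5. Result blending

The fusion() function blends the rule-based and LightGBM predictions with a weighted geometric mean:

Different weights are used for the 60 preliminary-round models and the 22 models added in the final round
Different weights are used for each forecast month (months 1-4)
The final result is truncated to an integer before output

Weighted geometric blending function: fusion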

def fusion(sub,sub_rule,sub_lgb):
    sub['rule'] = sub_rule['forecastVolum'].values
    sub['lgb'] = sub_lgb['forecastVolum'].values
    # Blend months 1-4 for the 60 preliminary-round models
    sub['forecastVolum'] = -1
    sub['forecastVolum'] = list(map(lambda x,y,z,m,f:(math.pow(x,0.40) * math.pow(y,0.60)) if z==0 and m==25 else f,sub['rule'],sub['lgb'],sub['new_model'],sub['time_id'],sub['forecastVolum']))
    sub['forecastVolum'] = list(map(lambda x,y,z,m,f:(math.pow(x,0.40) * math.pow(y,0.60)) if z==0 and m==26 else f,sub['rule'],sub['lgb'],sub['new_model'],sub['time_id'],sub['forecastVolum']))
    sub['forecastVolum'] = list(map(lambda x,y,z,m,f:(math.pow(x,0.50) * math.pow(y,0.50)) if z==0 and m==27 else f,sub['rule'],sub['lgb'],sub['new_model'],sub['time_id'],sub['forecastVolum']))
    sub['forecastVolum'] = list(map(lambda x,y,z,m,f:(math.pow(x,0.40) * math.pow(y,0.60)) if z==0 and m==28 else f,sub['rule'],sub['lgb'],sub['new_model'],sub['time_id'],sub['forecastVolum']))
    # Blend months 1-4 for the 22 new models
    sub['forecastVolum'] = list(map(lambda x,y,z,m,f:(math.pow(x,0.35) * math.pow(y,0.65)) if z==1 and m<=26 else f,sub['rule'],sub['lgb'],sub['new_model'],sub['time_id'],sub['forecastVolum']))
    sub['forecastVolum'] = list(map(lambda x,y,z,m,f:(math.pow(x,0.40) * math.pow(y,0.60)) if z==1 and m==27 else f,sub['rule'],sub['lgb'],sub['new_model'],sub['time_id'],sub['forecastVolum']))
    sub['forecastVolum'] = list(map(lambda x,y,z,m,f:(math.pow(x,0.40) * math.pow(y,0.60)) if z==1 and m==28 else f,sub['rule'],sub['lgb'],sub['new_model'],sub['time_id'],sub['forecastVolum']))
    sub = sub[['id','forecastVolum']]
    sub['id'] = sub['id'].map(int)
    sub['forecastVolum'] = sub['forecastVolum'].map(int)
    return sub
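The seven map() passes in fusion() all apply the same formula, rule ** w * lgb ** (1 - w), with a weight w that depends on the model group and the month. A compact sketch of that formula with a lookup table is shown below. The weights are copied from fusion() above; the function name geometric_blend and the toy inputs are assumptions for illustration.

```python
import numpy as np

# Rule-model weights copied from fusion(); keys are (new_model, time_id)
RULE_WEIGHTS = {
    (0, 25): 0.40, (0, 26): 0.40, (0, 27): 0.50, (0, 28): 0.40,
    (1, 25): 0.35, (1, 26): 0.35, (1, 27): 0.40, (1, 28): 0.40,
}

def geometric_blend(rule_pred, lgb_pred, new_model, time_id):
    """Weighted geometric mean: rule ** w * lgb ** (1 - w), row by row."""
    w = np.array([RULE_WEIGHTS[(n, t)] for n, t in zip(new_model, time_id)])
    rule_pred = np.asarray(rule_pred, dtype=float)
    lgb_pred = np.asarray(lgb_pred, dtype=float)
    return rule_pred ** w * lgb_pred ** (1.0 - w)

blended = geometric_blend([100.0, 100.0], [400.0, 100.0],
                          new_model=[0, 1], time_id=[27, 25])
print(blended)
```

A weighted geometric mean is the same as a weighted arithmetic mean taken in log space, so it fits naturally with a model trained on log-transformed sales. It also never exceeds the weighted arithmetic mean with the same weights, which makes the blend slightly conservative when the two predictions disagree.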

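6. Execution flow

Read and preprocess all input data
Train the LightGBM model and predict
Run the rule-based model and predict
Blend the predictions from the two methods
Save the final predictions to a CSV file
Compute and print the total execution time

Function: main

Calls get_lgb_ans() and rule(), then blends their outputs with fusion().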

if __name__=="__main__":
    # Record the start time (time.clock() was removed in Python 3.8, so time.perf_counter() is used)
    start = time.perf_counter()
    print('train lgb model...')
    sub_lgb = get_lgb_ans(input_data)
    print('train rule model...')
    sub_rule = rule()
    print('blend lgb and rule...')
    sub = fusion(final_data, sub_rule, sub_lgb)
    print('save final result...')
    sub.to_csv(r'.\sub\sub.csv',index=False)
    print('all procedures are over...')

    # Record the end time
    end = time.perf_counter()
    # Compute the elapsed time; the message printed below reads "Program execution time: ... seconds"
    elapsed_time = end - start
    print(f"程序执行时间: {elapsed_time} 秒")

Run

python demo.py

train lgb model...
use 60 models to train lgb model...
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.012737 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 21886
[LightGBM] [Info] Number of data points in the train set: 23760, number of used features: 99
[LightGBM] [Info] Start training from score 8.503720
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013920 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 21885
[LightGBM] [Info] Number of data points in the train set: 25080, number of used features: 99
[LightGBM] [Info] Start training from score 8.502387
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.017268 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 21885
[LightGBM] [Info] Number of data points in the train set: 26400, number of used features: 99
[LightGBM] [Info] Start training from score 8.479091
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.012928 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 21885
[LightGBM] [Info] Number of data points in the train set: 27720, number of used features: 99
[LightGBM] [Info] Start training from score 8.457191
use 82 models to train lgb model...
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.016119 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 21930
[LightGBM] [Info] Number of data points in the train set: 32472, number of used features: 99
[LightGBM] [Info] Start training from score 8.248549
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.018482 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 21931
[LightGBM] [Info] Number of data points in the train set: 34276, number of used features: 99
[LightGBM] [Info] Start training from score 8.240836
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.017528 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 21931
[LightGBM] [Info] Number of data points in the train set: 36080, number of used features: 99
[LightGBM] [Info] Start training from score 8.212591
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.018110 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 21931
[LightGBM] [Info] Number of data points in the train set: 37884, number of used features: 99
[LightGBM] [Info] Start training from score 8.187218
train rule model...
blend lgb and rule...
save final result...
all procedures are over...
程序执行时间: 79.44504269992467 秒

A sample of the output file sub.csv:

Here id is the row number from the forecast table evaluation_public.csv.

id  forecastVolum
1   258
2   309
3   162
4   246
5   368
6   181
7   442
8   190
...

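Technical highlights

Rich feature engineering: many time-series features are built, including historical statistics, trend, and seasonality
Model blending: machine learning and rule-based forecasting are combined so that each contributes its strengths
Divide and conquer: different models (preliminary round vs. final round) and different months are handled with different strategies
Data smoothing: log transformation and exponential smoothing reduce the influence of extreme values
Efficiency: LightGBM is an efficient gradient boosting framework suited to large datasets

Through carefully designed feature engineering and model blending, the system produces fairly accurate sales forecasts and is a typical time series forecasting solution.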

Source code license

GPL-3.0 license

