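Summary:
Collection: AI Case Studies - NLP - Retail
Dataset: Amazon (1996-2023) All Beauty product review dataset
Dataset use: sentiment analysis, text classification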
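I. Problem Description
Sentiment analysis is one of the most common text classification tasks. It analyzes a piece of text to determine whether the sentiment it expresses is positive, negative, or neutral. Understanding the sentiment that your brand, product, or service evokes in online conversations is a basic part of modern business practice, and sentiment analysis is the first step toward that goal.
This dataset can be used to build an entry-level sentiment analysis model for product reviews, which gives you a quick starting point for a model intended for production use.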
II. Dataset Contents
Product metadata
File: meta_All_Beauty.jsonl
Contents: metadata for Amazon All Beauty products, covering May 1996 to September 2023.
Data structure
Field | Type | Explanation |
---|---|---|
main_category | str | Main category (i.e., domain) of the product. |
title | str | Name of the product. |
average_rating | float | Rating of the product shown on the product page. |
rating_number | int | Number of ratings in the product. |
features | list | Bullet-point format features of the product. |
description | list | Description of the product. |
price | float | Price in US dollars (at time of crawling). |
images | list | Images of the product. Each image has different sizes (thumb, large, hi_res). The “variant” field shows the position of image. |
videos | list | Videos of the product including title and url. |
store | str | Store name of the product. |
categories | list | Hierarchical categories of the product. |
details | dict | Product details, including materials, brand, sizes, etc. |
parent_asin | str | Parent ID of the product. |
bought_together | list | Recommended bundles from the websites. |
Data samples
Sample records:
{"main_category": "All Beauty", "title": "Howard LC0008 Leather Conditioner, 8-Ounce (4-Pack)", "average_rating": 4.8, "rating_number": 10, "features": [], "description": [], "price": null, "images": [{"thumb": "https://m.media-amazon.com/images/I/41qfjSfqNyL._SS40_.jpg", "large": "https://m.media-amazon.com/images/I/41qfjSfqNyL.jpg", "variant": "MAIN", "hi_res": null}, {"thumb": "https://m.media-amazon.com/images/I/41w2yznfuZL._SS40_.jpg", "large": "https://m.media-amazon.com/images/I/41w2yznfuZL.jpg", "variant": "PT01", "hi_res": "https://m.media-amazon.com/images/I/71i77AuI9xL._SL1500_.jpg"}], "videos": [], "store": "Howard Products", "categories": [], "details": {"Package Dimensions": "7.1 x 5.5 x 3 inches; 2.38 Pounds", "UPC": "617390882781"}, "parent_asin": "B01CUPMQZE", "bought_together": null}
{"main_category": "All Beauty", "title": "Yes to Tomatoes Detoxifying Charcoal Cleanser (Pack of 2) with Charcoal Powder, Tomato Fruit Extract, and Gingko Biloba Leaf Extract, 5 fl. oz.", "average_rating": 4.5, "rating_number": 3, "features": [], "description": [], "price": null, "images": [{"thumb": "https://m.media-amazon.com/images/I/41b+11d5igL._SS40_.jpg", "large": "https://m.media-amazon.com/images/I/41b+11d5igL.jpg", "variant": "MAIN", "hi_res": "https://m.media-amazon.com/images/I/71g1lP0pMbL._SL1500_.jpg"}, {"thumb": "https://m.media-amazon.com/images/I/41j2ocUzCtL._SS40_.jpg", "large": "https://m.media-amazon.com/images/I/41j2ocUzCtL.jpg", "variant": "PT01", "hi_res": "https://m.media-amazon.com/images/I/81OqvR94isL._SL1500_.jpg"}], "videos": [], "store": "Yes To", "categories": [], "details": {"Item Form": "Powder", "Skin Type": "Acne Prone", "Brand": "Yes To", "Age Range (Description)": "Adult", "Unit Count": "10 Fl Oz", "Is Discontinued By Manufacturer": "No", "Item model number": "SG_B076WQZGPM_US", "UPC": "653801351125", "Manufacturer": "Yes to Tomatoes"}, "parent_asin": "B076WQZGPM", "bought_together": null}
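Each line of the metadata file is a standalone JSON object, so it can be parsed line by line. The following is a minimal sketch of that idea: the helper name `iter_jsonl` is hypothetical, and the two inline records are abbreviated copies of the samples above, with most fields omitted.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Abbreviated versions of the two sample records above (most fields omitted).
sample_lines = [
    '{"store": "Howard Products", "average_rating": 4.8, "rating_number": 10, '
    '"price": null, "parent_asin": "B01CUPMQZE"}',
    '{"store": "Yes To", "average_rating": 4.5, "rating_number": 3, '
    '"price": null, "parent_asin": "B076WQZGPM"}',
]

products = list(iter_jsonl(sample_lines))
for product in products:
    # "price" is null in both samples, so it has to be treated as optional.
    price = product["price"]
    price_text = "n/a" if price is None else f"${price:.2f}"
    print(product["parent_asin"], product["store"], product["average_rating"], price_text)

# With the real file, the same helper can stream records one at a time
# instead of loading the whole file into memory:
# with open("meta_All_Beauty.jsonl", encoding="utf-8") as f:
#     for product in iter_jsonl(f):
#         ...
```

Because fields such as `price` and `bought_together` are `null` in the samples, downstream code should probably check for `None` before using them.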
User review data
File: All_Beauty.jsonl
Contents: user reviews of Amazon All Beauty products, covering May 1996 to September 2023.
Data structure
Field | Type | Explanation |
---|---|---|
rating | float | Rating of the product (from 1.0 to 5.0). |
title | str | Title of the user review. |
text | str | Text body of the user review. |
images | list | Images that users post after they have received the product. Each image has different sizes (small, medium, large), represented by the small_image_url, medium_image_url, and large_image_url respectively. |
asin | str | ID of the product. (asin – Amazon Standard Identification Number) |
parent_asin | str | Parent ID of the product. Note: Products with different colors, styles, sizes usually belong to the same parent ID. The “asin” in previous Amazon datasets is actually parent ID. Please use parent ID to find product meta. |
user_id | str | ID of the reviewer |
timestamp | int | Time of the review (Unix time; the sample records below use 13-digit values, i.e. milliseconds) |
verified_purchase | bool | User purchase verification |
helpful_vote | int | Helpful votes of the review |
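
The table says to use `parent_asin` to find product metadata, and that `timestamp` is Unix time. The sketch below illustrates both, under two assumptions: the 13-digit values in the sample reviews are milliseconds, and the metadata fits in an in-memory dictionary. The function names are hypothetical, and the second review is invented purely to show a successful lookup.

```python
from datetime import datetime, timezone
from typing import Optional


def review_time(timestamp_ms: int) -> datetime:
    """Convert a review timestamp (assumed to be Unix milliseconds) to UTC."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


def build_meta_index(products: list[dict]) -> dict[str, dict]:
    """Index product metadata records by their parent_asin."""
    return {p["parent_asin"]: p for p in products}


def product_for(review: dict, meta_index: dict[str, dict]) -> Optional[dict]:
    """Return the metadata record for a review, or None if it is not indexed."""
    return meta_index.get(review["parent_asin"])


# Abbreviated metadata records copied from this document's product samples.
meta_index = build_meta_index([
    {"parent_asin": "B01CUPMQZE", "store": "Howard Products"},
    {"parent_asin": "B076WQZGPM", "store": "Yes To"},
])

# Abbreviated copy of a sample review from this document.
sample_review = {"rating": 5.0, "parent_asin": "B00YQ6X8EO", "timestamp": 1588687728923}
# HYPOTHETICAL review, invented only so that the lookup has a match.
hypothetical_review = {"rating": 4.0, "parent_asin": "B076WQZGPM", "timestamp": 1588615855070}

for review in (sample_review, hypothetical_review):
    product = product_for(review, meta_index)
    store = product["store"] if product else "(no metadata in this tiny index)"
    print(review_time(review["timestamp"]).date(), review["parent_asin"], store)
```

The sample review's product is not among the two sample metadata records, so its lookup returns `None`; with the full files, a miss like this would be worth counting rather than silently dropping.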
Data samples
Sample records:
{"rating": 5.0, "title": "Such a lovely scent but not overpowering.", "text": "This spray is really nice. It smells really good, goes on really fine, and does the trick. I will say it feels like you need a lot of it though to get the texture I want. I have a lot of hair, medium thickness. I am comparing to other brands with yucky chemicals so I'm gonna stick with this. Try it!", "images": [], "asin": "B00YQ6X8EO", "parent_asin": "B00YQ6X8EO", "user_id": "AGKHLEW2SOWHNMFQIJGBECAF7INQ", "timestamp": 1588687728923, "helpful_vote": 0, "verified_purchase": true}
{"rating": 4.0, "title": "Works great but smells a little weird.", "text": "This product does what I need it to do, I just wish it was odorless or had a soft coconut smell. Having my head smell like an orange coffee is offputting. (granted, I did know the smell was described but I was hoping it would be light)", "images": [], "asin": "B081TJ8YS3", "parent_asin": "B081TJ8YS3", "user_id": "AGKHLEW2SOWHNMFQIJGBECAF7INQ", "timestamp": 1588615855070, "helpful_vote": 1, "verified_purchase": true}
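The dataset has no explicit sentiment label, so a common workaround is to derive one from the star rating. The sketch below is one possible convention, not something the dataset prescribes: the thresholds (4 stars and above as positive, 2 and below as negative, otherwise neutral) and the function names are assumptions. The review texts are truncated to the first sentence of each sample.

```python
def rating_to_sentiment(rating: float) -> str:
    """Map a 1.0-5.0 star rating to a coarse label (assumed thresholds)."""
    if rating >= 4.0:
        return "positive"
    if rating <= 2.0:
        return "negative"
    return "neutral"


def to_training_example(review: dict) -> tuple[str, str]:
    """Build a (text, label) pair from a review's title, text, and rating."""
    text = f'{review["title"]}\n{review["text"]}'
    return text, rating_to_sentiment(review["rating"])


# The two sample reviews above, with "text" truncated to its first sentence.
reviews = [
    {
        "rating": 5.0,
        "title": "Such a lovely scent but not overpowering.",
        "text": "This spray is really nice.",
    },
    {
        "rating": 4.0,
        "title": "Works great but smells a little weird.",
        "text": "This product does what I need it to do, I just wish it was "
                "odorless or had a soft coconut smell.",
    },
]

examples = [to_training_example(r) for r in reviews]
for text, label in examples:
    print(label, "|", text.splitlines()[0])
```

The second review is labeled positive by its rating even though the text is mixed, which shows that rating-derived labels are noisy and may need manual review for borderline cases.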
Dataset citation requirement
@article{hou2024bridging,
title={Bridging Language and Items for Retrieval and Recommendation},
author={Hou, Yupeng and Li, Jiacheng and He, Zhankui and Yan, An and Chen, Xiusi and McAuley, Julian},
journal={arXiv preprint arXiv:2403.03952},
year={2024}
}