
Tianchi Competition | Taobao Mother-and-Baby Category Data Analysis
Published: 2023.05.21 06:28     Author: Jingyi     Source: ShoelessCai     Views: 111

My recent work, a preliminary data analysis titled "tianchi_mum_baby_20230519", is available to read.

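In addition, the model for predicting the purchase probability of mother-and-baby category products follows the official competition documentation; see "Taobao Mother-and-Baby Category Product Purchase Probability Prediction".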

Features for the Classifier

Category Features

Many user actions are related to specific items. Given a behavior sequence, we obtain a set of product IDs and could use a bag of IDs as features. However, this leads to poor performance because of sparsity. Besides, the set of items available in our system changes frequently over time. To alleviate this problem, we represent each item by its category. All items in Taobao are mapped to a category hierarchy. We expect consumer purchasing patterns at different stages to be more obvious at the category level. For example, a pregnant woman is more likely to browse or buy products in the Maternity Clothes and Pregnant Bras categories. A new parent is likely to browse the Baby Stroller category. Figure 5 shows how the purchasing ratio of top-level categories relates to baby age. We use all parent categories and first-level categories an item belongs to as features. To reduce the influence of popular categories, the feature weight is set based on the TF-IDF equation commonly used in the Information Retrieval community: each user is treated as a document and each category as a term.
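The weighting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the excerpt does not say which TF-IDF variant is used, so the sketch assumes the common form tf × log(N / df), and the function name `tfidf_category_features` and the sample data are hypothetical.

```python
import math
from collections import Counter

def tfidf_category_features(user_categories):
    """Weight each user's categories with TF-IDF.

    user_categories: dict mapping user id -> list of category names,
    one entry per action (each user plays the role of a "document",
    each category the role of a "term").
    Assumed variant: tf = count / total actions, idf = log(N / df).
    """
    n_users = len(user_categories)
    # Document frequency: how many users touched each category at least once.
    df = Counter()
    for cats in user_categories.values():
        df.update(set(cats))

    features = {}
    for user, cats in user_categories.items():
        counts = Counter(cats)
        total = sum(counts.values())
        features[user] = {
            cat: (n / total) * math.log(n_users / df[cat])
            for cat, n in counts.items()
        }
    return features

# Hypothetical sample data: "Phones" is popular with every user.
users = {
    "u1": ["Maternity Clothes", "Maternity Clothes", "Phones"],
    "u2": ["Baby Stroller", "Phones"],
    "u3": ["Phones"],
}
weights = tfidf_category_features(users)
```

Under this variant, a category that every user interacts with receives weight zero, which is the intended damping of popular categories.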

Queries

A user behavior sequence may contain product search activities. Search queries directly reflect a user's requirements, which can in turn indirectly reflect a baby's age. For example, a user may search for “large-size diaper” or “3 years old children's garments”. This information is very important for age prediction. Therefore, search queries in E-commerce are also used as features. In our system, search queries are pre-processed using Chinese word segmentation and stop-word removal, and represented as word vectors.

Product Property Features

Taobao is a distributed marketplace where sellers sell products. Many sellers provide metadata about their products. For example, a seller will label Size as “M” or “L” on children's clothes, or label Age as “newborn” or “1-3 years”, etc. Each product facet-value pair (i.e. a product property) is associated with a unique feature.
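One way to picture "one unique feature per facet-value pair" is an index from (facet, value) tuples to column numbers. The sketch below assumes binary indicator features, which the excerpt does not specify; the helper names `build_property_index` and `encode` and the sample products are hypothetical.

```python
def build_property_index(products):
    """Assign a unique column index to every (facet, value) pair seen."""
    index = {}
    for props in products:
        # Sort so the assignment order does not depend on dict ordering.
        for pair in sorted(props.items()):
            if pair not in index:
                index[pair] = len(index)
    return index

def encode(props, index):
    """Binary indicator vector: 1 where the product has that property."""
    vec = [0] * len(index)
    for pair in props.items():
        if pair in index:
            vec[index[pair]] = 1
    return vec

# Hypothetical seller-provided metadata.
products = [
    {"size": "M", "age": "newborn"},
    {"size": "L", "age": "1-3 years"},
    {"size": "M", "age": "1-3 years"},
]
index = build_property_index(products)
vectors = [encode(p, index) for p in products]
```

Because the key is the pair rather than the value alone, “M” under Size and “M” under some other facet would map to different features.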

Product Title Features

Product titles are created by sellers who are not affiliated with Taobao. Sellers are very creative with product titles. As a result, many product titles are very informative about the life stage of the consumer. For example, “Diaper for new born” in a product title suggests the product targets consumers with new babies. We preprocess these product titles via Chinese segmentation and phrase extraction techniques. Most of the extracted words are used as features, except stop words and words with very high inverse document frequency.
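The vocabulary filter described above might look like the following sketch. It assumes the titles have already been segmented into tokens (the Chinese segmentation and phrase extraction steps are not shown, and English tokens stand in for them), uses idf = log(N / df), and takes a hypothetical cutoff `max_idf`; none of these details are given in the excerpt.

```python
import math
from collections import Counter

def select_title_vocabulary(tokenized_titles, stop_words, max_idf):
    """Keep words that are not stop words and whose IDF is at most max_idf.

    tokenized_titles: list of token lists, one per product title.
    A very high IDF means the word occurs in very few titles.
    """
    n_titles = len(tokenized_titles)
    # Document frequency: number of titles containing each word.
    df = Counter()
    for tokens in tokenized_titles:
        df.update(set(tokens))

    vocab = set()
    for word, count in df.items():
        idf = math.log(n_titles / count)
        if word not in stop_words and idf <= max_idf:
            vocab.add(word)
    return vocab

# Hypothetical pre-segmented titles.
titles = [
    ["diaper", "for", "newborn"],
    ["newborn", "diaper", "soft"],
    ["diaper", "for", "toddler"],
    ["stroller", "for", "newborn", "xq7"],
]
vocab = select_title_vocabulary(titles, stop_words={"for"}, max_idf=1.0)
```

In this toy corpus only the words shared across several titles survive; in practice the cutoff would be tuned so that informative but infrequent words are not discarded.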

Temporal Effect of Features

Whether a user purchased a diaper 1 month ago or 2 years ago has different meanings for predicting the current life stage. To capture these temporal patterns, we divide each consumer behavior time sequence X∗ into multiple subsequences using fixed-size time windows. A feature vector is generated for each time window. We then concatenate all of these feature vectors into one large feature vector to represent X∗. Figure 6 illustrates an example of the feature matrix, where each row corresponds to a training data point.
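The windowing and concatenation step can be sketched as below. This is an illustration under stated assumptions: time is measured in whole days, windows are counted backwards from a reference day `now`, features are raw counts, and the function name `windowed_feature_vector` and the sample events are hypothetical.

```python
def windowed_feature_vector(events, vocabulary, now, window_days, n_windows):
    """Concatenate one count vector per fixed-size time window.

    events: list of (day, feature_name) pairs for one user.
    Window 0 covers the most recent window_days days before `now`,
    window 1 the window_days before that, and so on.
    """
    col = {name: i for i, name in enumerate(vocabulary)}
    width = len(vocabulary)
    vec = [0] * (width * n_windows)
    for day, name in events:
        age = now - day
        window = age // window_days
        # Skip future events, events older than the last window,
        # and features outside the vocabulary.
        if age >= 0 and window < n_windows and name in col:
            vec[window * width + col[name]] += 1
    return vec

vocabulary = ["diaper", "stroller"]
# Hypothetical behavior sequence: (day, feature) pairs.
events = [(95, "diaper"), (40, "diaper"), (10, "stroller"), (98, "stroller")]
x = windowed_feature_vector(events, vocabulary, now=100, window_days=30, n_windows=3)
```

The same feature thus occupies a different column depending on how long ago it occurred, so a classifier can weight a recent diaper purchase differently from an old one.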






