100字范文 > 数据运营36计(六)：BG/NBD概率模型预测用户生命周期LTV Python实现

数据运营36计(六)：BG/NBD概率模型预测用户生命周期LTV Python实现

时间：2019-03-02 00:47:20

1. BG/NBD概率模型

BG/NBD模型又称为贝塔几何/负二项模型。他是基于Pareto/NBD模型假设设计的概率预测模型。BG/NBD模型是用于描述非契约客户关系情境下重复购买行为。即用户可以随时购买产品，无时间约束。该模型可利用用户历史交易数据(RFM)来预测未来每个用户的交易次数和流失率，BG/NBD模型主要有以下五个假设：

假设1：用户在活跃状态下，一个用户在时间段t内完成的交易数量服从均值为λt的泊松分布。泊松分布的PDF函数如下，j为交易次数。比如一个顾客在λ=5时的概率分布图如下所示：

import matplotlib.pyplot as pltfrom scipy.stats import poissonprobability_arr = []distribution = poisson(5)for transactions in range(0,20):probability_arr.append(distribution.pmf(transactions))plt.figure(figsize=(8,5))plt.ylabel("Probability")plt.xlabel("Number of Transactions")plt.xticks(range(0, 20))plt.title("Probability Distribution Curve")plt.plot(probability_arr, color="black", linewidth=0.7, zorder=1)plt.scatter(range(0, 20), probability_arr, color="purple", edgecolor="black", linewidth=0.7, zorder=2)plt.show()

假设2：用户的交易率λ服从形状参数为r，逆尺度参数为α的gamma分布，PDF函数如下所示。比如100个用户的交易率λ服从r=9.0，α=0.5的gamma分布时，这100个用户的泊松分布图如下所示：

import numpy as npplt.figure(figsize=(8,5))for customer in range(0, 100): distribution = poisson(np.random.gamma(shape=9, scale=0.5)) probability_arr = [] for transactions in range(0,20): probability_arr.append(distribution.pmf(transactions)) plt.plot(probability_arr, color="black", linewidth=0.7, zorder=1)plt.ylabel("Probability")plt.xlabel("Number of Transactions")plt.xticks(range(0, 20))plt.title("Probability Distribution Curve 100 Customers")plt.show()

假设3：每个用户在交易j完成后流失的概率服从参数为p(流失率)的几何分布，PDF函数如下所示。

假设4：用户的流失率p服从形状参数为a，b的beta分布，PDF函数如下所示。比如100个用户的流失率p服从a=1.0，b=2.5的beta分布时，这100个用户的几何分布图如下所示：

import numpy as npplt.figure(figsize=(8,5))for customer in range(0, 100): distribution = poisson(np.random.gamma(shape=9, scale=0.5)) probability_arr = [] beta = np.random.beta(a=1.0, b=2.5) cumulative_beta = 0 for transactions in range(0,20): proba = distribution.pmf(transactions) cumulative_beta = beta + cumulative_beta - (beta * cumulative_beta) inactive_probability = 1 - cumulative_beta proba *= inactive_probability probability_arr.append(proba) probability_arr = np.array(probability_arr) probability_arr /= probability_arr.sum() plt.plot(probability_arr, color="black", linewidth=0.7, zorder=1)plt.ylabel("Probability")plt.xlabel("Number of Transactions")plt.xticks(range(0, 20))plt.title("Probability Distribution Curve 100 Customers with drop-off probability after each transaction")plt.show()

假设5：每个用户的交易率λ和流失率p互相独立。

2.用户生命周期预测，Python实现以上已介绍完BG/NBD模型的基本假设，接下来推导模型，用户的历史数据有三个参数，其中x表示0到T时间段内的交易次数，tx表示第x次交易时间。

数据展示：

模型原理：首先gamma分布和beta分布分别为参数交易率λ和流失率p的先验分布，而泊松分布和几何分布是样本的分布函数，即似然函数。接下来建立交易率λ和流失率p的联立似然函数，使用Nelder-Mead的单纯形算法求解gamma分布和beta分布中的参数(r，α，a，b)，这是一种启发式的，非梯度搜索方法来最小化负对数似然代价函数。

用代码实现上面的似然函数如下：

from scipy.special import gammalndef negative_log_likelihood(params, x, t_x, T): if np.any(np.asarray(params) <= 0): return np.inf r, alpha, a, b = params ln_A_1 = gammaln(r + x) - gammaln(r) + r * np.log(alpha) ln_A_2 = (gammaln(a + b) + gammaln(b + x) - gammaln(b) - gammaln(a + b + x)) ln_A_3 = -(r + x) * np.log(alpha + T) ln_A_4 = x.copy() ln_A_4[ln_A_4 > 0] = ( np.log(a) - np.log(b + ln_A_4[ln_A_4 > 0] - 1) - (r + ln_A_4[ln_A_4 > 0]) * np.log(alpha + t_x) ) delta = np.where(x>0, 1, 0) log_likelihood = ln_A_1 + ln_A_2 + np.log(np.exp(ln_A_3) + delta * np.exp(ln_A_4)) return -log_likelihood.sum()

参数求解代码如下：

from scipy.optimize import minimizescale = 1 / df["T"].max()scaled_recency = df["t_x"] * scalescaled_T = df["T"] * scaledef _func_caller(params, func_args, function): return function(params, *func_args)current_init_params = np.array([1.0, 1.0, 1.0, 1.0])output = minimize( _func_caller, method="Nelder-Mead", tol=0.0001, x0=current_init_params, args=([df["x"], scaled_recency, scaled_T], negative_log_likelihood), options={"maxiter": 2000})r = output.x[0]alpha = output.x[1]a = output.x[2]b = output.x[3]alpha /= scaleprint("r = {}".format(r))print("alpha = {}".format(alpha))print("a = {}".format(a))print("b = {}".format(b))

计算得到r=0.243，α=4.414，a=0.793，b=2.426.

用户总交易次数预测

接下来通过使用上面的四个参数建立预测模型，即求解交易次数的期望E(x)。2F1为高斯超几何函数。

from scipy.special import hyp2f1def expected_sales_to_time_t(t): hyp2f1_a = r hyp2f1_b = b hyp2f1_c = a + b - 1 hyp2f1_z = t / (alpha + t) hyp_term = hyp2f1(hyp2f1_a, hyp2f1_b, hyp2f1_c, hyp2f1_z)

return ((a + b - 1) / (a - 1)) * (1-(((alpha / (alpha+t)) ** r) * hyp_term))

expected_sales_to_time_t(78)

现在通过以上代码预测未来78周单个用户的交易次数为1.858次，但计算E(x)为总的用户购买总次数，这里不能简单的通过单个用户交易次数乘以总用户数得到，因为每个用户的第一次交易时间点不同。这里统计有ns个用户在第s天进行了第一次交易。那么这ns个用户在未来某段时间内的交易次数是相同的。

# Period of consideration is 39 weeks.# T indicates the length of time since first purchasen_s = (39 - df["T"]).value_counts().sort_index()n_s.head()

数据得到18个用户在第0.14周进行了第一次交易，22个用户在第0.285周进行了第一次交易，17个用户在第0.428周进行了第一次交易。那么接下来只需要计算每个交易时间下的未来交易量再乘以相同交易时间下的用户数，最后求和即可得到总的交易次数。

## 1/7.0 is a day.

forecast_range = np.arange(0, 78, 1/7.0)

def cumulative_repeat_transactions_to_t(t): expected_transactions_per_customer = (t - n_s.index).map(lambda x: expected_sales_to_time_t(x) if x > 0 else 0) expected_transactions_all_customers = (expected_transactions_per_customer * n_s).values return expected_transactions_all_customers.sum()cum_rpt_sales = pd.Series(map(cumulative_repeat_transactions_to_t, forecast_range), index=forecast_range)cum_rpt_sales.tail(10)

最后算出接下来78周用户的交易总次数为4156次。通过了解到未来用户的交易次数，可以计算未来的收入，从而在现阶段，运营人员可以更好地计算推广预算制定相应的运营推广方案。

每个用户交易次数的条件预测

为预测每个用户在未来一段时间内的交易次数，这里推导出条件期望，根据用户历史的交易次数和交易时间数据，并根据上面得到的分布函数参数值，条件期望的最终计算公式如下所示：

def calculate_conditional_expectation(t, x, t_x, T): first_term = (a + b + x - 1) / (a-1) hyp2f1_a = r + x hyp2f1_b = b + x hyp2f1_c = a + b + x - 1 hyp2f1_z = t / (alpha + T + t) hyp_term = hyp2f1(hyp2f1_a, hyp2f1_b, hyp2f1_c, hyp2f1_z) second_term = (1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp_term) delta = 1 if x > 0 else 0 denominator = 1 + delta * (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r+x)

return first_term * second_term / denominator

calculate_conditional_expectation(39,2,30.43,38.86)

通过上面的函数代入某用户过去38.86周内(T=38.86)交易两次(x=2)，第二次交易时间为30.43周(tx=30.43)的条件，计算得到该用户在未来39周将交易1.226次。通过预测出每个用户未来的交易次数，可以更针对性地细分用户人群，找准目标价值人群从而制定细分运营方案，比如未来一年52周用户预测出将交易1次，那么该用户有流失的风险，那么在现阶段实施促销方案（如发放促销卡），提高用户的交易频次将减小用户流失的风险。

4.学习资料论文推荐：代码中都直接计算最终的预测公式，公式的推导过程可详细查看该论文。/papers/018/fader_et_al_mksc_05.pdfPython语言lifetimes库：/project/Lifetimes/

参考博客：/bg-nbd-model-for-customer-base-analysis-in-python/

本内容不代表本网观点和政治立场，如有侵犯你的权益请联系我们处理。

网友评论

网友评论仅供其表达个人看法，并不表明网站立场。