Title: Predicting Customer Churn in the Telecommunications Industry: An Application of Survival Analysis Modeling Using SAS

Original text

ABSTRACT

Conventional statistical methods (e.g., logistic regression, decision trees) are very successful in predicting customer churn. However, these methods can hardly predict when customers will churn, or how long the customers will stay. The goal of this study is to apply survival analysis techniques to predict customer churn by using data from a telecommunications company. This study will help telecommunications companies understand customer churn risk and customer churn hazard in a timely manner by predicting which customers will churn and when they will churn. The findings from this study are helpful for telecommunications companies to optimize their customer retention and/or treatment resources in their churn reduction efforts.

INTRODUCTION

In the telecommunications industry, customers are able to choose among multiple service providers and actively exercise their right to switch from one service provider to another. In this fiercely competitive market, customers demand tailored products and better services at lower prices, while service providers constantly focus on acquisitions as their business goals. Given that the telecommunications industry experiences an average annual churn rate of 30-35 percent, and that it costs 5-10 times more to recruit a new customer than to retain an existing one, customer retention has now become even more important than customer acquisition. For many incumbent operators, retaining highly profitable customers is the number one business pain. Many telecommunications companies deploy retention strategies that synchronize programs and processes to keep customers longer by providing them with tailored products and services. With retention strategies in place, many companies start to include churn reduction as one of their business goals.

To help telecommunications companies manage churn reduction, we need not only to predict which customers are at high risk of churn, but also to know how soon these high-risk customers will churn. The companies can then optimize their marketing intervention resources to prevent as many customers as possible from churning. In other words, if telecommunications companies know which customers are at high risk of churn and when they will churn, they are able to design customized customer communication and treatment programs in a timely and efficient manner.

Conventional statistical methods (e.g., logistic regression, decision trees) are very successful in predicting customer churn, but they can hardly predict when customers will churn or how long they will stay. Survival analysis, by contrast, was designed from the very beginning to handle survival data, and is therefore an efficient and powerful tool for predicting customer churn.

OBJECTIVES

The objectives of this study are twofold. The first objective is to estimate the customer survival function and the customer hazard function to gain knowledge of customer churn over the time of customer tenure. The second objective is to demonstrate how survival analysis techniques are used to identify the customers who are at high risk of churn and when they will churn.

DEFINITIONS AND EXCLUSIONS

This section clarifies some of the important concepts and exclusions used in this study.

Churn: In the telecommunications industry, the broad definition of churn is the cancellation of a customer's telecommunications service. This includes both service-provider-initiated churn and customer-initiated churn. An example of service-provider-initiated churn is a customer's account being closed because of payment default. Customer-initiated churn is more complicated, and the reasons behind it vary. In this study, only customer-initiated churn is considered, and it is defined by a series of cancel reason codes. Examples of reason codes are: unacceptable call quality, more favorable competitor's pricing plan, misinformation given by sales, customer expectation not met, billing problem, moving, change in business, and so on.

High-value customers: Only customers who have received at least three monthly bills are considered in the study. High-value customers are those with a monthly average revenue of $X or more for the last three months. If a customer's first invoice covers less than 30 days of service, then the customer's monthly revenue is prorated to a full month's revenue.

Granularity: This study examines customer churn at the account level.

Exclusions: This study does not
distinguish international customers from domestic customers. However, it is desirable to investigate international customer churn separately from domestic customer churn in the future. Also, this study does not include employee accounts, since churn for employee accounts is neither a problem nor of interest for the company.

SURVIVAL ANALYSIS AND CUSTOMER CHURN

Survival analysis is a class of statistical methods for studying the occurrence and timing of events. From the beginning, survival analysis was designed for longitudinal data on the occurrence of events. Keeping track of customer churn is a good example of survival data. Survival data have two common features that are difficult to handle with conventional statistical methods: censoring and time-dependent covariates.

Generally, the survival function and the hazard function are used to describe the status of customer survival during the tenure of observation. The survival function gives the probability of surviving beyond a certain time point t. The hazard function, in contrast, describes the risk of the event (in this case, customer churn) in an interval after time t, conditional on the customer having survived to time t. The hazard function is therefore more intuitive to use in survival analysis, because it attempts to quantify the instantaneous risk that customer churn will take place at time t, given that the customer has already survived to time t.

For survival analysis, the best observation plan is prospective. We begin observing a set of customers at some well-defined point of time (called the origin of time) and then follow them for some substantial period of time, recording the times at which customer churns occur. It is not necessary that every customer experience churn. Customers who have yet to experience churn are called censored cases, while those customers who have already churned are called observed cases. Typically, not only do we predict the timing of customer churn, we also want to analyze how time-dependent covariates (e.g., customers' calls to service centers, changes of plan type, changes of billing options) affect the occurrence and timing of customer churn.

SAS/STAT has two procedures for regression modeling of survival data: PROC LIFEREG and PROC PHREG. The LIFEREG procedure produces parametric regression models for censored survival data using maximum likelihood estimation. The PHREG procedure performs semi-parametric regression analysis using partial likelihood estimation. PROC PHREG has gained popularity over PROC LIFEREG in the last decade since it handles time-dependent covariates. However, if the shapes of the survival distribution and hazard function are known, PROC LIFEREG produces more efficient estimates (with smaller standard errors) than PROC PHREG does.

SAMPLING STRATEGY

On August 16, 2000, a sample of 41,374 active high-value customers was randomly selected from the entire customer base of a telecommunications company. All these customers were followed for the next 15 months. Therefore, August 16, 2000 is the origin of time and November 15, 2001 is the observation termination time. During this 15-month observation period, the timing of customer churn was recorded. For each customer in the sample, a variable DUR is used to indicate the time at which customer churn occurred or, for censored cases, the last time at which the customer was observed, both measured from the origin of time (August 16, 2000). A second variable, STATUS, is used to distinguish the censored cases from the observed cases. It is common to have STATUS = 1 for observed cases and STATUS = 0 for censored cases. In this study, the survival
data are singly right censored, so that all the censored cases have a value of 15 months for the variable DUR.

DATA SOURCES

There are four major data sources for this study: block-level marketing and financial information, customer-level demographic data provided through a third-party vendor, customer internal data, and customer contact records. A brief description of some of the data sources follows.

Demographic data: Demographic data is from a third-party vendor. In this study, the following are examples of customer-level demographic information:
- Primary household member's age
- Gender and marital status
- Number of adults
- Primary household member's occupation
- Household estimated income and wealth ranking
- Number of children and children's age
- Number of vehicles and vehicle value
- Credit card
- Frequent traveler
- Responder to mail orders
- Dwelling and length of residence

Customer internal data: Customer internal data is from the company's data warehouse. It consists of two parts. The first part is customer information such as market channel, plan type, bill agency, customer segmentation code, ownership of the company's other products, dispute, late fee charge, discount, promotion/save promotion, additional lines, toll-free services, rewards redemption, billing dispute, and so on. The second part is the customer's telecommunications usage data. Examples of customer usage variables are:
- Weekly average call counts
- Percentage change of minutes
- Share of domestic/international revenue

Customer contact records: The company's Customer Information System (CIS) stores detailed records of customer contacts. These basically include customer calls to service centers and the company's mail contacts to customers. The customer contact records are then classified into customer contact categories. Among the customer contact categories are customer general inquiry, customer requests to change service, customer inquiry about cancellation, and so on.

MODELING PROCESS

The modeling process includes the following four major steps.

Exploratory data analysis (EDA): Exploratory data analysis was conducted to prepare the data
for the survival analysis. A univariate frequency analysis was used to pinpoint value distributions, missing values, and outliers. Variable transformation was conducted for some numerical variables to reduce the level of skewness, because transformations help improve the fit of a model to the data. Outliers and other extreme values that should not be included in the data mining analysis were filtered out. Filtering extreme values from the training data tends to produce better models because the parameter estimates are more stable.

Variables with missing values are not a big issue, except for the demographic variables. The demographic variables with more than 20% missing values were eliminated. For observations with missing values, one choice is to discard incomplete observations, but that may ignore useful information from the variables that have nonmissing values. It may also bias the sample, since observations that have missing values may have other things in common as well. Therefore, in this study, missing values were replaced by appropriate methods. For interval variables, replacement values were calculated based on random percentiles of the variable's distribution, i.e., values were assigned based on the probability distribution of the nonmissing observations. Missing values for class variables were replaced with the most frequent value (count or mode).

Variable reduction: Starting with 212 variables in the original data set, an initial univariate analysis of all categorical variables crossed with customer churn status (STATUS) was carried out using PROC FREQ to determine the statistically significant categorical variables to be included in the next modeling step. All the categorical variables whose chi-square or t statistic had a significance level (p-value) of 0.05 or less were kept. This step reduced the number of variables to 115.

    model dur*status(0)

Model estimation: With only 29 explanatory variables, the final data set has a reasonable number of variables for survival analysis. Before applying survival analysis procedures to the final data set, the customer survival function and hazard function were estimated using the following code:

    /* Life-table estimates of the survival (s) and hazard (h) functions,
       interval width 1; status = 0 marks censored cases */
    proc lifetest data=sasout2.all3 outsurv=sasout2.outsurv
                  method=life plots=(s,h) width=1 graphics;
       time dur*status(0);
    run;

The purpose of estimating the customer survival function and customer hazard function is to gain knowledge of customer churn hazard characteristics. Judging from the shape of the hazard function, customer churn in this study demonstrates a typical hazard function of a lognormal model. As previously discussed, since the shape of the survival distribution and hazard function was known, PROC LIFEREG produces more efficient estimates (with smaller standard errors) than PROC PHREG does.

The final step is to estimate customer churn. PROC LIFEREG was used to calculate customer survival probability. At this step the final data set was divided 50/50 into two data sets: a model data set and a validation data set. The model data set is used to fit the model, and the validation data set is used to score the survival probability for each customer. A variable USE is used to distinguish the model data set (USE = 0) from the validation data set (USE = 1). In the validation data set, both DUR and STATUS were set to missing so that cases in the validation data set would not be used in model estimation.

Source: Junxiang Lu, Ph.D., Sprint Communications Company, Overland Park, Kansas. "Predicting Customer Churn in the Telecommunications Industry: An Application of Survival Analysis Modeling Using SAS." SAS User Group International (SUGI 27) Online Proceedings, 2002, Paper No. 114-27.
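The survival function and hazard function that the text describes in words can also be written out formally. What follows is the standard textbook formulation, not notation taken from the paper: the symbols T (time from the origin of time to churn), f (the density of T), x (a customer's covariates), beta, sigma and Phi (the standard normal distribution function) are introduced here purely for illustration. The last two lines sketch the lognormal model that matches the hazard shape reported in the study, under the assumption that the covariates act linearly on log T.

```latex
\begin{align*}
S(t) &= \Pr(T > t), \\
h(t) &= \lim_{\Delta t \to 0}
        \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)}
      = -\frac{d}{dt}\,\log S(t), \\
\log T &= \mathbf{x}^{\top}\boldsymbol{\beta} + \sigma\,\varepsilon,
        \qquad \varepsilon \sim N(0,1), \\
S(t \mid \mathbf{x}) &= 1 - \Phi\!\left(
        \frac{\log t - \mathbf{x}^{\top}\boldsymbol{\beta}}{\sigma}\right).
\end{align*}
```

Under a lognormal model the hazard starts at zero, rises to a single peak, and then declines, which is the shape the study uses to justify a parametric lognormal fit.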
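The model/validation scoring step described in the text (a 50/50 split flagged by USE, with DUR and STATUS set to missing in the validation half) might be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the data are simulated, the data set names (WORK.FINAL, WORK.ALL, WORK.EST, WORK.SCORED, WORK.VALID_SCORED), the covariates X1 and X2, and the 12-month scoring horizon are all hypothetical, and only DUR, STATUS and USE follow the text.

```sas
/* Simulated stand-in for the final data set (hypothetical covariates x1, x2). */
data work.final;
   call streaminit(2002);
   do id = 1 to 2000;
      x1 = rand('normal');
      x2 = rand('bernoulli', 0.4);
      /* simulated lognormal churn time, in months */
      t = exp(3 + 0.5*x1 - 0.7*x2 + 0.8*rand('normal'));
      status = (t <= 15);             /* 1 = churn observed, 0 = censored   */
      dur    = min(t, 15);            /* censored cases get 15 months       */
      use    = (rand('uniform') < 0.5); /* 0 = model, 1 = validation        */
      output;
   end;
   drop t;
run;

/* Hide the outcome in the validation half so those rows are scored, not fitted. */
data work.all;
   set work.final;
   if use = 1 then do;
      dur    = .;
      status = .;
   end;
run;

/* Lognormal parametric model; rows with a missing response are excluded
   from estimation but still receive a linear predictor (xbeta). */
proc lifereg data=work.all outest=work.est;
   model dur*status(0) = x1 x2 / dist=lnormal;
   output out=work.scored xbeta=lp;
run;

/* Survival probability at a hypothetical 12-month horizon for validation rows:
   S(t) = 1 - Phi((log t - xbeta) / scale) under the lognormal model. */
data work.valid_scored;
   if _n_ = 1 then set work.est(keep=_scale_);
   set work.scored(where=(use = 1));
   surv12 = 1 - probnorm((log(12) - lp) / _scale_);
run;
```

Setting the response to missing, rather than fitting on a separate data set, keeps fitting and scoring in a single PROC LIFEREG call; the horizon can be changed by replacing 12 with any other number of months.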