The Mysterious Benford's Law


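physixfan, 2010-10-31 21:25

Take the populations of the world's 237 countries. What fraction of those numbers do you think begin with 1, and what fraction begin with 9? If you answered 1/9 for both, congratulations, you are a normal person. But that is not what happens: numbers beginning with 1 make up a startling 27%, while numbers beginning with 9 make up only 5%. The figure below shows what share of national population figures begins with each digit (image from Wikipedia).

[Figure: share of national population figures by leading digit; image from Wikipedia]

Why is the difference so large? This is the mysterious Benford's law at work.

Benford's law, also called Benford's rule, says that in a collection of numbers drawn from real life, numbers whose first digit is 1 make up about 30% of the total, nearly three times the expected value of 1/9. More generally, the larger a leading digit (or string of leading digits), the less often numbers begin with it. The precise mathematical statement is: in base b, the probability that a number begins with n is

$$P(n) = \log_b(n + 1) - \log_b(n)$$

In base 10, the first-digit probabilities are:

d   1      2      3      4     5     6     7     8     9
p   30.1%  17.6%  12.5%  9.7%  7.9%  6.7%  5.8%  5.1%  4.6%

The law is said to have been discovered when Benford, leafing through a book of logarithm tables, noticed that the first few pages were dark and tattered from use while the later pages were progressively cleaner. This made him wonder whether numbers beginning with 1 are simply more common than the others, and when he tallied some data he found that they were. Whether the logarithm-table story is true is hard to say now. It is like the story that Newton discovered the law of universal gravitation after an apple fell on his head: as long as the resulting law is useful, the anecdote does not matter.

First, the scope of Benford's law

The law is remarkable for how widely it applies: almost any everyday statistical data that is not governed by man-made rules satisfies it. National populations, national land areas, account books, physical and chemical constants, the answers at the back of mathematics and physics textbooks, and radioactive half-lives all turn out to follow Benford's law. It is worth mentioning that scientists have also found that the three important distributions of statistical physics, the Boltzmann-Gibbs, Bose-Einstein, and Fermi-Dirac distributions, basically satisfy Benford's law as well! (Source: Li Miao's blog)

Second, the law does have its limits

First, the data must span a wide enough range: it has to cover several orders of magnitude for the effect to appear. Second, data produced under man-made rules does not satisfy the law. Mobile phone numbers, ID card numbers, and invoice numbers, for example, clearly do not follow this logarithmic distribution. In other words, Benford's law shows itself precisely when nothing constrains the data; the more the generation of the data is artificially restricted, the less it obeys the law. Third, the data must not have been tampered with. Numbers that people have altered at will generally no longer satisfy Benford's law. In the famous Enron accounting fraud, for instance, the company's books did not satisfy it, so this mysterious law can even be used to help detect financial fraud.

So how should we understand this mysterious law? Why should naturally occurring data follow such a peculiar law rather than a uniform distribution?

The root of Benford's law is exponential growth. If a variable grows exponentially with time, its leading digit changes over time as in the figure below (the horizontal axis is time, the vertical axis is the variable):

[Figure: an exponentially growing variable plotted against time, with the leading digit marked]

Clearly, at a randomly chosen moment the variable is more likely to begin with 1 than with 9, because it spends more time between 1 and 2 than between 9 and 10. That is the case for a single variable; if we take a large number of such variables, then at any given moment more of them will begin with 1 than with 9. Exponential growth is very common in nature: whenever a variable's growth rate is proportional to its size, the result is exponential growth. The pace of human technological development, for example, is roughly proportional to the technology already achieved, so technological development is exponential. The population growth rate is proportional to the existing population, so population growth without resource limits is exponential too. Exponential growth is an extremely common pattern of change in nature, and this pattern leads directly to Benford's law.

Another intuitive explanation (from Wikipedia) goes like this

Consider counting upward from 1: 1, 2, 3, ..., 9. If we stop here, every leading digit seems equally likely. But among the two-digit numbers that follow, 10 to 19 all begin with 1, so 1 pulls far ahead of the other digits. And before the next batch of numbers beginning with 9 appears, we must pass through the batches beginning with 2, 3, 4, ..., 8. If the count stops at some arbitrary point, numbers beginning with 1 will generally be more frequent than numbers beginning with 9.

Take the house numbers of a city as an example. Some streets end at a little over 100, some at a little over 500, and some at a little over 900. A street that ends in the 500s necessarily contains 1, 10-19, and 100-199, all of which begin with 1, but it contains no three-digit numbers beginning with 9; the only numbers on it that begin with 9 are 9 and 90-99. So numbers beginning with 1 clearly outnumber those beginning with 9. Aggregating over all the streets in the city then gives Benford's law.

The above is only intuition. For a deeper treatment of the underlying principle, see the proof in Hill, T. P. "A Statistical Derivation of the Significant-Digit Law." Stat. Sci. 10, 354-363, 1996.

It is also worth mentioning that Benford's law is scale invariant: if we switch to a different system of units, the law still holds. This can itself be taken as an explanation of why naturally occurring statistics satisfy the law. If data originally measured in meters is converted to another unit, feet for example, the distribution of the data should not change. The only distribution with this kind of scale invariance is a logarithmic one, which is the subject of this article, Benford's law.

Benford's Law

Benford's Law (which was first mentioned in 1881 by the astronomer Simon Newcomb) states that if we randomly select a number from a table of physical constants or statistical data, the probability that the first digit will be a "1" is about 0.301, rather than 1/9 ≈ 0.111 as we might expect if all nine possible leading digits were equally likely. In general, the "law" says that the probability of the first digit being a "d" is

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$

This implies that a number in a table of physical constants is more likely to begin with a smaller digit than a larger digit. It was published by Newcomb in a paper entitled "Note on the Frequency of Use of the Different Digits in Natural Numbers", which appeared in The American Journal of Mathematics (1881) 4, 39-40. It was re-discovered by Benford in 1938, and he published an article called "The Law of Anomalous Numbers" in Proc. Amer. Phil. Soc 78, pp 551-72.

To illustrate this interesting fact, try tabulating the first digits of the physical constants listed in Table 2.3 of Abramowitz and Stegun's "Handbook of Mathematical Functions". The result is the bar chart shown below, which gives the distribution of the leading digits of the 44 constants in the table, along with the theoretical expected distribution based on Benford's Law:

[Figure: bar chart of the leading digits of the 44 constants against the distribution expected from Benford's Law]

Aside from the conspicuous deficiency of 3s, this is a reasonably good match for just 44 data points.

Although there have been many lengthy and erudite "explanations" of Benford's Law, it seems to me it can be explained with a single picture:

[Figure: the digits 1 through 9 marked on a logarithmic scale]

Clearly the underlying premise of Benford's Law is that the subject population of quantities, expressed in the base 10 and more or less arbitrary units, will be fairly evenly distributed on a logarithmic scale. This is confirmed by the fact that the exponents on these constants are fairly uniformly distributed (at least over several orders of magnitude). As a result, the probability of the leading digit being "d" clearly approaches

$$\log_{10}(d + 1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$

Of course, we could have chosen units for our physical constants such that the leading digits were all 9s (for example), but evidently we have a natural tendency to choose units so that our numbers are evenly distributed by order of magnitude, rather than absolute value. This may be related to our basic impressions of hearing and sight (and earthquakes), since our sense impressions of loudness and brightness are logarithmic.

Naturally we can apply Benford's Law to numbers expressed in any base, not just the base 10. In general the probability of the leading digit d (in the range 1 to B-1) for the base B is

$$P_B(d) = \log_B\left(1 + \frac{1}{d}\right) = \frac{\ln(1 + 1/d)}{\ln B}$$

Notice that for binary numbers, i.e., numbers expressed in the base 2, the probability of the leading digit being 1 is 1.000, as it must be, since the leading non-zero digit of a binary number is necessarily 1. The distributions of probabilities of the digits 1 to B-1 for each base B from 2 to 10 are shown below.

[Figure: leading-digit probabilities for each base B from 2 to 10]

We can also easily verify that the sum of all the probabilities for digits 1 through B-1 equals 1.0000, as it must, since the leading digit must be one of these. This implies

$$\sum_{d=1}^{B-1} \ln\left(1 + \frac{1}{d}\right) = \ln B$$

To verify this, recall the fundamental law of logarithms, ln(ab) = ln(a) + ln(b). With this we can re-write this sum of logarithms as the logarithm of a product:

$$\ln\left(\prod_{d=1}^{B-1} \frac{d + 1}{d}\right) = \ln\left(\frac{2}{1} \cdot \frac{3}{2} \cdots \frac{B}{B-1}\right) = \ln B$$

which confirms the result.

By the same kind of analysis we can determine the probability that the second digit will have a certain value. It's only necessary to consider a single order of magnitude, since the pattern is repeated on each order. For example, in the base 10, the probability of the second digit being "3" is equal to the sum of the probabilities of the first two digits being "1.3", "2.3", "3.3", ... or "9.3" for numbers in the range from 1 to 10. This is indicated by the shaded regions in the logarithmic scale shown below.

[Figure: logarithmic scale from 1 to 10 with the intervals 1.3-1.4, 2.3-2.4, ..., 9.3-9.4 shaded]

The fraction of this region covered by the range from 1.3 to 1.4 is

$$\log_{10}(1.4) - \log_{10}(1.3) = \log_{10}\left(1 + \frac{1}{13}\right)$$

The fraction covered by the other regions (such as 2.3 to 2.4, and so on) can be found similarly, and we can add them together to give the total probability that the first digit following the first non-zero digit will be a 3:

$$\sum_{k=1}^{9} \log_{10}\left(1 + \frac{1}{10k + 3}\right)$$

In general, the probability of the 2nd digit being d in a base-B number (taken from a logarithmic population) is

$$\sum_{k=1}^{B-1} \log_B\left(1 + \frac{1}{kB + d}\right)$$

Extending this analysis to the case of the nth digit following the first non-zero digit, we arrive at the general formula

$$P_{n,d} = \sum_{k=B^{n-1}}^{B^{n}-1} \log_B\left(1 + \frac{1}{kB + d}\right)$$

This applies to the case of the leading non-zero digit, with the understanding that with n = 0 the summation reduces to just the single term k = 0. This formula shows that the non-uniformity in the distribution of digits becomes much less as we consider less significant digits. For example, we have $P_{0,1}$ = 0.301029995..., $P_{1,1}$ = 0.113890103..., and $P_{2,1}$ = 0.101375977... Thus the probability of a "1" quickly approaches 1/10 as we proceed to less significant digits.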
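The digit formulas above, and the exponential-growth explanation given earlier, can be checked numerically. The following is a minimal Python sketch rather than anything taken from either article: the function names `first_digit_prob`, `nth_digit_prob`, and `leading_digit_frequencies` are made up for illustration, and the powers of 2 are just one convenient example of an exponentially growing sequence.

```python
import math
from collections import Counter


def first_digit_prob(d, base=10):
    """Benford probability that the leading digit is d, for 1 <= d <= base - 1."""
    if not 1 <= d <= base - 1:
        raise ValueError("leading digit must be in 1..base-1")
    return math.log(1 + 1 / d, base)


def nth_digit_prob(n, d, base=10):
    """Probability that the nth digit after the first non-zero digit is d.

    n = 0 means the leading digit itself (single term k = 0).
    For n >= 1, sum over every possible prefix k of n digits.
    """
    if n == 0:
        return first_digit_prob(d, base)
    if not 0 <= d <= base - 1:
        raise ValueError("digit must be in 0..base-1")
    lo, hi = base ** (n - 1), base ** n
    return sum(math.log(1 + 1 / (k * base + d), base) for k in range(lo, hi))


def leading_digit_frequencies(values):
    """Observed share of each decimal leading digit among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


if __name__ == "__main__":
    # Illustrative exponentially growing sequence: 2**0, 2**1, ..., 2**999.
    observed = leading_digit_frequencies(2 ** k for k in range(1000))
    print("d  observed  Benford")
    for d in range(1, 10):
        print(f"{d}  {observed[d]:.3f}     {first_digit_prob(d):.3f}")
    # Later digits are much closer to uniform (1/10 each).
    for n in range(3):
        print(f"P(n={n}, d=1) = {nth_digit_prob(n, 1):.9f}")
```

Run as a script, this prints the observed leading-digit shares of the powers of 2 next to the Benford values, followed by the probabilities of a "1" as the first, second, and third significant digit. Integer arithmetic is used for the powers so that the leading digit is read off exactly rather than from a rounded floating-point value.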
