Enterprise Data Management in Mixed Workload Environments [Foreign-Literature Translation]


Original Text

Enterprise Data Management in Mixed Workload Environments

Material Source: Proceedings of the 2009 IEEE 16th International Conference on Industrial Engineering and Engineering Management

Author: Jens Krueger

Enterprise applications are presently built on a 20-year-old data management infrastructure that was designed to meet a specific set of requirements for OLTP systems. In the meantime, enterprise applications have become more sophisticated, data set sizes have increased, requirements on the freshness of input data have been strengthened, and the time allotted for completing business processes has been reduced. To meet these challenges, enterprise applications have become increasingly complicated to make up for shortcomings in the data management infrastructure. This paper outlines the characteristics of enterprise applications with regard to the underlying data management layer. We also propose a database design fitted to the requirements of enterprise applications.

Nowadays, enterprise applications for large and mid-size companies are subject to tight constraints. Hardly any company can manage its daily business and offer its services at a certain level of quality without the extensive use of comprehensive software systems throughout all departments of the company. Enterprise applications are presently built on a 20-year-old data management infrastructure that was designed to meet a certain set of requirements for transaction processing systems. In the meantime, enterprise applications have become more sophisticated, addressing for example legal regulations, governmental compliance, new accounting principles, and global supply chains. In addition, data set sizes have increased, requirements on the freshness of input data have been strengthened, and the time allotted for completing business processes has been reduced. To meet these challenges, enterprise applications have become increasingly complicated to make up for shortcomings in the data management infrastructure. These complications increase the total cost of ownership of the applications and make them harder to use.

Companies like SAP offer standard software solutions for a wide range of industries and application domains. The system landscape in most companies is very heterogeneous. Different operating systems and database systems have to be taken into account when developing enterprise applications for a wide range of potential customers. Often, the decision to choose a certain database, operating system, or combination of both is of political or historical origin, leading to a mismatch between the data management and the actual requirements of the applications. The extended life cycle of enterprise applications with evolutionary changes reinforces this effect as well.

The remainder of this paper is structured as follows. Section II addresses the requirements for data management derived from enterprise applications, such as the mixed workload, and describes the results of analyzing customer data. Section III reviews database technologies that impact the data management for enterprise applications. Section IV points out the feasibility and improvements of applying those technologies
in combination. Section V surveys the related work in this area. Finally, section VI provides a conclusion.

This section presents the issues to consider when looking at requirements for enterprise-application-specific data management.

A. Mixed Workload

In the context of enterprise data management, database systems are classified as being optimized either for online transaction processing (OLTP) or for online analytical processing (OLAP). In fact, enterprise applications today are primarily focused on the day-to-day transaction processing needed to run the business, while the analytical processing necessary to understand and manage the business is added on after the fact. In contrast to this classification, single applications such as Available-To-Promise (ATP) or Demand Planning exist that cannot be assigned exclusively to one or the other workload category. These applications generate a mixed workload: they process small sets of transactional data at a time, including write operations and simple read queries, as well as complex, unpredictable, mostly-read operations on large sets of data that project just a few columns. Having a mixed workload is nothing new; it was analyzed at the database level a decade ago by French. The insight that it originates from a single application is new. Given this and the fact that databases are built either for OLTP or for OLAP, it is evident that there is no database management system that adequately addresses the characteristics needed by these complex enterprise applications.

For example, within sales order processing systems, the decision whether the product can be delivered at the requested time relies on the ATP check. Its execution results in a confirmation for the sales order containing information about the product quantity and the delivery date. Consequently, the check leads to a database request summing up all available resources for the specific product. Materialized aggregates could be seen as one solution to the expensive operation of on-the-fly aggregation. However, they fail at real-time order rescheduling, where incoming high-priority orders lead to a reallocation of all products. This operation, an essential part of the present ATP application, exhibits characteristics of analytical workloads: low selectivity, low projectivity, use of aggregation functionality, and execution of read-only queries. Alongside the aforementioned check operation, the write operations that declare products as promised to customers work on a fine-granular transactional level. Looking at
the characteristics of these write operations, it is obvious that they belong to the OLTP category.

B. The Mismatch

The aforementioned, simplified example of a complex enterprise application shows workload characteristics that match those associated with both OLTP and OLAP. As a consequence, today's database management systems cannot fulfill the requirements of such enterprise applications, since they are optimized for one or the other category. This leads to a mismatch between enterprise applications and the underlying data management layer, mainly because conventional RDBMSs cannot execute certain important complex operations in a timely manner. While this problem is widely recognized for analytical applications, it also pertains to sophisticated transactional applications. To address this issue, enterprise applications have become increasingly complicated to make up for shortcomings in the data management infrastructure. One of these workarounds is packaging the operations as long-running batch jobs. This approach slows down the rate at which business processes can be completed, possibly violating external requirements. Maintaining pre-computed, materialized results of the operations is another workaround. Materialized views in data warehouses for analytical applications are an example of this approach, which makes applications less flexible, harder to use, and more expensive to maintain.

To address this mixed workload, the database management layer has to be aware
of this fact and be optimized for these conflicting workloads by leveraging today's advances in hardware, such as the availability of huge amounts of main memory. Additionally, as presented by Schaffner et al., recent trends in data management, such as storing data in a column-wise fashion and light-weight compression algorithms, support the feasibility of building an enterprise-application-specific data management layer.

C. Data Characteristics

Given the mixed workload characteristics, data management systems optimized for analytic-style queries seem to be the best match. However, fast reconstruction of complete tuples is still an essential requirement of OLTP workloads. Because ERP data schemas consist of very wide relations due to their inherent complexity, this operation can be expensive. For example, in a large enterprise system the accounting document has 98 attributes, while the corresponding line item contains 301 attributes. Consequently, in order to assess whether a row-oriented or a column-oriented database is better suited for an ERP system, the usage of each attribute of a table is explored. The main focus of this evaluation is the distinct values of each column. Financial accounting and sales order processing, as the most common applications in enterprises, have been analyzed. It is assumed that the data characteristics and structure found there apply to other application domains as well.

Traditionally, enterprise applications store their data in a conventional RDBMS. These systems originated from System R in the 1970s. To fulfill the requirements of the OLTP applications of those days, row-oriented database management systems were developed. There, the data is organized logically and physically in tables containing rows. Each table represents all entities of a certain type, and each row represents a single entity, where the columns represent the attributes of each entity. Given that most of the described data access patterns of OLTP applications access a full relation and storage is assumed to be disk-based, this storage layout is preferable.

A. Column Database

While conventional RDBMSs store data in a row-wise fashion, which has advantages for OLTP applications, this physical data representation is not optimized for read-mostly, analytic-style queries where typically only a few columns are projected.
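
As a rough illustration of the difference between the two physical layouts, the following Python sketch stores the same small table row-wise and column-wise. The table, its attribute names (`doc_id`, `product`, `quantity`, `plant`), and all function names are hypothetical and not taken from the paper; it is a minimal sketch of the idea, not how any particular database implements it.

```python
# Hypothetical line-item records; attribute names are illustrative only.
ROWS = [
    {"doc_id": 1, "product": "P-100", "quantity": 40, "plant": "A"},
    {"doc_id": 2, "product": "P-200", "quantity": 15, "plant": "B"},
    {"doc_id": 3, "product": "P-100", "quantity": 25, "plant": "B"},
]


def to_columns(rows):
    """Pivot a row-wise table into one list per attribute (column-wise layout)."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def sum_quantity_rowwise(rows, product):
    """Analytic-style query on the row layout: every complete row is visited."""
    return sum(row["quantity"] for row in rows if row["product"] == product)


def sum_quantity_columnwise(columns, product):
    """Same query on the column layout: only two of the columns are scanned."""
    pairs = zip(columns["product"], columns["quantity"])
    return sum(quantity for name, quantity in pairs if name == product)


def reconstruct_tuple(columns, position):
    """OLTP-style access on the column layout: one lookup per column."""
    return {name: values[position] for name, values in columns.items()}


COLUMNS = to_columns(ROWS)
```

In this sketch, the aggregation over the column layout reads only the `product` and `quantity` lists, whereas reconstructing a complete tuple has to touch every column, which is the trade-off the text describes for wide ERP relations.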

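Following the trend toward specialized databases, data management systems have been developed that organize data along columns. The logical data layout remains as before, meaning that the data is still organized in tables, but the physical layout now differs.

Translation

Enterprise Data Management in Mixed Workload Environments

Material Source: Proceedings of the 2009 IEEE 16th International Conference on Industrial Engineering and Engineering Management

Author: Jens Krueger

Enterprise applications today are still built on a twenty-year-old data management infrastructure that was designed to meet the specific requirements of OLTP systems. In the meantime, enterprise applications have become more complex, data sets have grown, requirements on the freshness and timeliness of input data have become stricter, and the time allotted for completing business processes has shrunk. To meet these challenges, enterprise applications have grown increasingly complicated in order to compensate for shortcomings in the data management infrastructure. This paper outlines the characteristics of enterprise applications with respect to the underlying data management layer. We also propose a database design intended to fit the requirements of modern enterprise applications as closely as possible.

Today, enterprise applications for large and mid-size companies are bound by many constraints. Hardly any company can manage its daily business and deliver a good level of service across all its departments without comprehensive application software. Enterprise applications are currently built on a data management architecture roughly 20 years old that was set up to meet the requirements of transaction processing systems. In the meantime, enterprise applications have become more complex, having to address, for example, laws and regulations, government rules, new accounting standards, and global supply chains. In addition, data sets have grown, requirements on the timeliness and freshness of input data have become stricter, and the time allotted for completing business processes has been greatly reduced. To meet these challenges, applications have become increasingly complex, which makes the shortcomings of the data management infrastructure especially apparent. These growing complications make the applications more costly to operate and harder to use.

Companies such as SAP offer standardized software solutions that are used across a wide range of industries and application domains. It is a fact that the system landscape in most companies is highly heterogeneous. Different operating systems and database systems must be taken into account when enterprise applications are developed for a broad range of potential customers. Usually, the choice of a particular database, operating system, or combination of the two has political or historical origins, which leads to a mismatch between the data management and the actual needs of the applications. The extended life cycle of enterprise applications, with its evolutionary changes, reinforces this effect.

The rest of this paper is structured as follows. Section II deals with the data management requirements that arise from enterprise applications, such as the mixed workload, and describes the results of analyzing customer data. Section III reviews the database technologies that influence data management for enterprise applications. Section IV discusses the feasibility and improvements of applying those technologies in combination. Section V surveys related work. Finally, Section VI gives the conclusion.

This section looks at the issues to consider regarding the data management requirements specific to enterprise applications.

1 Mixed Workload

In enterprise data management, database systems are classified as optimized either for online transaction processing or for online analytical processing. In fact, today's enterprise applications focus mainly on the day-to-day transaction processing needed to run the business, while the analytical processing needed to understand and manage the business is added afterwards. Contrary to this classification, there are single applications, such as Available-To-Promise (ATP) or Demand Planning, that cannot be assigned to just one of the two workload categories. These applications produce a mixed workload: they process small sets of transactional data, including write operations and simple read queries, as well as complex, unpredictable, mostly-read operations over large data sets that touch only a few columns. A mixed workload is nothing new; French analyzed it at the database level a decade ago. What is new is the insight that it originates from a single application. Given this, and given that databases are built for either OLTP or OLAP, there is clearly no database management system that fully covers these complex functions.

For example, in a sales order processing system, the decision whether a product can be delivered at the requested time depends on the ATP check. Executing the check yields a confirmation of the sales order that contains the product quantity and the delivery date. The check therefore issues a database request that sums up all available resources for the product in question. Materialized aggregates may appear to be one way of avoiding the costly on-the-fly aggregation. However, they cannot handle real-time order rescheduling, in which incoming high-priority orders force a reallocation of all products. This operation, an essential part of the ATP application, has the characteristics of an analytical workload: low selectivity, low projectivity, use of aggregation, and read-only queries. Besides the check, the write operations that mark products as promised to customers work at a fine-grained transactional level. Judging by their characteristics, these write operations clearly belong to the OLTP category.

2 The Mismatch

The simplified example above of a complex enterprise application shows workload characteristics that match those of both OLTP and OLAP. Consequently, today's database management systems cannot meet the requirements of such enterprise applications, because each is optimized for one category or the other, which results in a mismatch between enterprise applications and the underlying data management layer. The main reason is that conventional relational databases cannot execute certain important complex operations in a timely manner. This problem is widely recognized for analytical applications, but it also affects sophisticated transactional applications. As a result, enterprise applications have become increasingly complicated to compensate for the shortcomings of the data management infrastructure. One workaround is to package the operations as long-running batch jobs. This slows down the rate at which business processes can be completed and may violate external requirements. Maintaining pre-computed, materialized results of the operations is another workaround. Materialized views in data warehouses for analytical applications follow this approach, which makes applications less flexible, harder to use, and more expensive to maintain.

To handle this mixed workload, the database management layer must be aware of it and be optimized for these conflicting workloads by exploiting current advances in hardware, such as the availability of large amounts of main memory. In addition, as Schaffner et al. have shown, recent trends in data management, such as column-wise data storage and light-weight compression algorithms, make it feasible to build a data management layer tailored to enterprise applications.

3 Data Characteristics

Given the characteristics of the mixed workload, data management systems optimized for analytic-style queries appear to be the best match. However, fast reconstruction of complete tuples remains a basic requirement of OLTP workloads. Because ERP data schemas contain very wide relations owing to their inherent complexity, this operation can be expensive. For example, in a large enterprise system the accounting document has 98 attributes, and the corresponding line item has 301. To judge whether a row-oriented or a column-oriented database suits an ERP system better, the usage of each attribute of each table is therefore examined. The focus is on the distinct values of each column. Financial accounting and sales order processing, the most common enterprise applications, were analyzed. It is assumed that the data characteristics and structures found there carry over to other application domains.

Traditionally, enterprise applications store their data in a conventional relational database, a design that dates back to the 1970s. Row-oriented database management systems were developed to meet the requirements of the OLTP applications of that time. In them, data is organized logically and physically in tables made up of rows. Each table represents all entities of a certain type, each row represents one entity, and the columns represent the attributes of the entity. Since most of the OLTP access patterns described access a full relation, and storage is assumed to be disk-based, this storage layout is preferable.

Conventional relational database management systems store data row by row, which benefits OLTP applications, but this physical representation is not optimized for read-mostly, analytic-style queries that typically project only a few columns. Following the trend toward specialized databases, data management systems have been developed that organize data by column. The logical layout is unchanged, with data still organized in tables, but the physical layout now differs. The storage model changes to the decomposition storage model and vertical partitioning.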
