Foreign-Language Translation: Original Text

Enterprise Information Application Strategy

Material Source: http:/ n21400209
Author: Philip

To maximize enterprise performance, today's organizations require timely, consistent access to trustworthy information from within their organization and beyond. To reach this goal, organizations must implement an enterprise information management (EIM) strategy that combines familiar and new methods for addressing data integration, data quality, semantic reconciliation, and metadata management. An EIM strategy that offers integration with the BI platform delivers a deeper level of insight and visibility to the organization.

Introduction

Over the last 15 years, organizations have established business intelligence (BI) as a critical application for delivering insight, enabling better decision making, and driving enterprise performance. Initially, demands on BI systems were primarily related
to information access; end users were happy simply to gain access to data locked away in transactional systems. In some cases, organizations prepared data for BI by integrating multiple data sources into a single data warehouse. In others, users were able to access data directly with little reconciliation or quality assurance effort.

While greater end-user reporting, analysis, and performance management requirements have advanced BI technologies tremendously over the years, back-end data-quality issues continue to be a huge challenge and a barrier to BI end-user trust and adoption. As market factors such as corporate compliance have compounded the pressure to ensure that organizations are always working with timely and accurate information, data inconsistencies, redundancy, and poor data quality are no longer acceptable. A single, consistent view of business information with assurance that it is accurate and trustworthy is imperative.

But data is dynamic and disparate for most organizations, and traditional data warehousing techniques alone often do not address the broad spectrum of information access and analysis requirements. To meet this challenge, a strategy for enterprise information management is required. EIM provides a business and technology framework to support the delivery of information into a single, consistent view. A comprehensive EIM strategy helps organizations improve operational efficiency and bottom-line performance and serves to broaden the effectiveness and reach of BI solutions.

Challenge: Silos of Data and Metadata

BI infrastructures are often fragmented, with data and metadata residing in different business domains, departments, and geographies. This presents a challenge when it comes time to deliver a single, consistent view of the business. A few common issues contribute to this problem.

1. Disparate Data

Organizations face an explosion of data today, with information coming from traditional sources (such as spreadsheets, databases, legacy systems, and enterprise applications) and new sources (such as Web applications and XML-based systems). As organizations move through mergers and acquisitions, new data sources are introduced into the IT landscape. The one constant about enterprise data is that it's always changing. As a result, IT must continually be prepared to deal with disparate data that comes from heterogeneous data sources. Many enterprise data warehouse implementations fail to meet their business objectives because of the rapidly changing landscape of enterprise data affected by organizational mergers and acquisitions. How do organizations gain a single view of the business when new data enters the BI landscape before they're done building their enterprise data warehouses? Additionally, user requirements for on-demand information have increased. In many cases, users need near-real-time information to address operational BI requirements.

2. Poor Data Quality

What is the impact to your business when you make decisions from inaccurate information? In the case of a midsize software company in the UK, it cost over $1 million when a budgeting decision based on an inaccurate BI report caused inventory requirements to be overstated. Data-quality issues are a reality in almost all transaction systems. They are caused by
different factors, such as incorrect data entry, multiple records of the same customer coming from different systems, empty fields (such as missing contact information), and redundant or inconsistent data between two data silos. For most organizations, poor data quality is the primary reason why BI projects fail. In the case of the software company, the BI project suffered from a lack of end-user trust until the company successfully implemented a data integration strategy that fixed its data-quality issues. Many organizations that have experienced this pain recognize the importance of data quality for BI success.

3. Inconsistent Semantics

How do you determine the total sales for your company if every division, department, and country uses a different definition of sales? In the case of a global media company, 15 different terms described sales in its operational systems, departments, and geographies. Without a common language and definitions for data across the organization, it is impossible to gain a single, consistent view of the business. A process must be established to help define and reconcile semantics across the organization, with minimal impact on current systems.

4. Metadata Visibility

How can you trust the numbers in your BI report if you don't know where they came from or how they were computed? Compliance requirements mandate that organizations be held accountable for their financial information. As a result, the need to trace data to its origin is now a critical
BI function. To answer this question, you will need to gain visibility into all the metadata in your BI environment. The challenge is that every data source, BI tool, and ETL tool contains its own metadata, and they do not talk to one another. Answering the question can also be a time-consuming task and, in some situations, nearly impossible because data has been transformed using hand-coded scripts. For one data warehousing expert, 50 percent of BI service requests are about data lineage: “Where did that number come from?” Without a way to easily view the end-to-end metadata in a BI environment, organizations cannot deliver trusted information for BI users.

Solution: Gaining a Single and Consistent View of the Enterprise

Getting to a single view of the enterprise has been a long-standing goal of most organizations, but it has been achieved by only a few. Managing information across the enterprise requires a well-thought-out strategy that employs a set of technologies and processes to address the different requirements of the business. Ultimately, your enterprise information management strategy needs to address the core issues that arise from having silos of data and metadata. The key
areas to focus on are data integration, data quality, semantic reconciliation, and metadata management.

1. Data Integration

Data integration is the foundation of successful BI. Without a comprehensive strategy for unifying your disparate data, you will not gain a single view of the truth. Various methods exist to integrate disparate data, and each offers a unique advantage that meets the different information requirements of the business.

Extract, transform, and load (ETL) technology is used to build data warehouses and data marts for BI. ETL extracts data from disparate source systems, transforms the data to meet business requirements, and loads the data into a target database. The process usually occurs in a nightly (batch) window. While organizations use ETL tools to build an enterprise data warehouse to deliver a single version of the truth, this is not often achieved because of evolving business requirements and a rapidly changing data environment. ETL allows organizations to:

* Create a trustworthy data foundation for analytical purposes
* Combine data from disparate data sources
* Establish consistency throughout the organization
* Provide historical breadth and enable trend analysis

Enterprise information integration (EII) technology has emerged to give organizations the agility to meet real-time information requirements. EII is a complement to ETL and, in some cases, an alternative to it. EII provides real-time integration of disparate data without physically moving it to a new location. Only requested data from the transactional systems is moved and transformed, on demand at query time, and the end result appears to come from a single data source, similar to a data warehouse. Because there is no storage of data, EII does not address the need for a historical view of the business. EII allows organizations to:

* Provide real-time views of data spread across multiple operational systems
* Combine data from an operational system with a data warehouse
* Support operational BI requirements

Enterprise application integration (EAI) allows enterprise application systems to exchange data. It is event-driven and allows for the transfer of messages from one application to another. EAI is useful for connecting enterprise applications in real time for business process automation. EAI is also used to capture changes in operational and other systems to
feed real-time data to data warehouses. EAI allows organizations to:

* Make a change in one application and reflect it elsewhere
* Ensure that the change is captured and delivered reliably
* Feed data warehouses with real-time data

2. Data Quality

Data quality is an important component of any BI and data warehousing implementation. Without data quality controls to ensure the accuracy and trustworthiness of information, BI deployments will fail to gain end-user confidence. A few vital steps will help deliver trustworthy information.

Data profiling. Understanding your source data by analyzing its characteristics, type, quality, and relationships typically occurs before any ETL or EII development begins. This process provides insight into how data should be transformed to improve data quality. Data profiling can be used to identify problems and anomalies in the source data, such as telephone or Social Security numbers that do not match their expected format or pattern. It can also be used to examine inter-record dependencies, such as sales orders for products that are not in the product master file. Data profiling and data cleansing complement each other. Once data profiling identifies an issue, the data cleansing process can be used to correct the problem.

Data cleansing. Once you've profiled your source data, you are ready to cleanse it. The data cleansing process involves identifying, correcting, and consolidating data. For example, with customer data, data cleansing identifies contact names and addresses, then standardizes the data and enhances it to fill in missing fields or correct wrong addresses. Matching and merging capabilities provide sophisticated ways to identify members of the same household, combine records by matching different forms of the same name (such as Jon and Jonathan), and match and consolidate records into a single view.

Data validation. This process prevents unwanted data from entering your data warehouse. For example, you may want only sales records from 2005, postal codes that match a specific pattern, or product IDs that are not null. Data quality is often a matter of perspective and requires tight collaboration between IT and business constituents. By defining the business rules that help identify unwanted data, you can ensure a high level of information accuracy.

Data auditing. Another challenge for developers is to audit the integrity
of the ETL job against operational rules. By auditing data, you can verify that the expected data is read, processed, and loaded successfully. For example, if you are extracting tables from flat files, you can verify that all 100,000 records loaded successfully into the data warehouse. Another useful application is to verify the successful execution of a join. Data auditing can determine whether any rows are missing and whether any joins have been configured improperly.

3. Semantic Reconciliation

Inconsistent semantics exist in every business domain and its underlying applications. Without a common definition of data across the enterprise, each department will have a conflicting view of the business. This can strangle an organization's efficiency and agility. To achieve semantic reconciliation, organizations must develop a common definition and manage these semantics through a master reference solution.

Common Definition. The process of establishing these common data definitions is called semantic reconciliation, and it requires executive-level sponsorship. Typically, the role of a data steward is developed to drive inter-departmental standards for describing the business in terms of customers, products, and employees. Also, BI tools offer a semantic layer that further assists in translating the meaning of technical terminology into business terms. For example, when determining departmental productivity using “cost per employee” as a metric, do two part-time employees, each working a four-hour day, count as one employee or two? The answer is likely to differ by department and, unless an organization-wide definition is established, departmental comparisons are not meaningful.

Once an organization recognizes differing definitions and standardizes on enterprise definitions, a data warehouse can aid in the implementation. It may be impractical to modify every operational system to reflect the enterprise standard. However, it is possible to transform the data extracted from each operational system to conform to the enterprise's standard definitions and value lists as the data is loaded
into the warehouse. When an analyst uses a data warehouse, he is comparing “apples to apples”; when an analyst directly accesses two or more operational systems for analysis purposes, he is often comparing “apples to oranges.”

Master Data Management. All organizations have data that is used across several departments. Examples of these “reference data” files include customer, product, employee, vendor, and financial data. In many organizations, individual departments maintain their own reference files, and problems frequently arise when different departments use different identifiers or keys for the same customer, making it difficult (or even impossible) to accurately aggregate or combine data across systems. For example, if a customer's revenues from both the sales and the service departments can't be accurately combined, the total value of that customer's account is understated. While the term “master data management” is receiving tremendous attention, it is essentially an extension of the reference file concept, a concept that was behind the use of centralized Rolodex files