Big Data Management and Data Quality: Countermeasures in the U.S. Financial Industry
Wang Shiqi (汪时奇), Ph.D.

Processing speed, capacity limits, data quality

Overview

Data
  - Two-level or three-level approval
  - Extend viewpoints to confirm data quality
  - Reduce redundant systems (e.g. due to mergers, due to vendors)
  - Schedule Cleansing (see details)
  - Enhance Reconciliation (see details)
  - Build Trust level (see details)
  - Try to cover all rare cases

3.1.E Cleansing
  When
  - At system merge
  - At major change
  How
  - Develop detection applications
  - Deliver mismatch reports to IT & business
  - Find solutions on both the IT and business sides

3.1.F Reconciliation
  Where
  - More than one subsystem holds data for the same content.
  - More than one subsystem has independent data change functionality.
  What
  - Run & improve the recon. app. routinely.
  - Categorize reports by urgency.
  - Analyze reports.
  - Debug or adjust the biz rule, or apply Cleansing.

3.1.G Trust level
  When
  - There is more than one fixed data input
  - Inputs are independent
  - Final details must be decided from the inputs
  How (based on)
  - Provider level (for a detailed data group)
  - Data history
  - Samples: Bloomberg, Reuters, Telekurs, DTCC; Moody's, S&P, Fitch.

3.2.A Failover & DR
  Failover
  - DB: 2+ at diff. locations; real-time replication
  - App
    - Active-Active: Cluster with Load Balancing
    - Active-Passive
      - Auto (via SAN)
      - Manual + Auto
  DR
  - DB: e.g. daily, hourly, or real-time replication
  - App: Manual switch

3.3 Technology
  DB design
  - Constraint Check (for sensitive table values)
  - Normalization (to reduce duplications)
  - Validation processes (to find conflicting data)
  Application design
  - Data integrity check
    - E.g. cryptographic signature
    - E.g. CRC check
  - Data display (e.g. Excel dropping leading zeros, dates shown as numbers)
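
The application-level check listed under 3.3 Technology (a CRC check, optionally paired with a cryptographic check) could look roughly like the sketch below. It is only an illustration under stated assumptions: the record layout, the key, and the function names `crc32_of`, `hmac_tag`, and `verify` are hypothetical, and an HMAC is used here as a simple keyed stand-in for a full digital signature.

```python
import hashlib
import hmac
import zlib


def crc32_of(payload: bytes) -> int:
    """Return the CRC-32 checksum of the payload as an unsigned 32-bit int."""
    return zlib.crc32(payload) & 0xFFFFFFFF


def hmac_tag(payload: bytes, key: bytes) -> str:
    """Return an HMAC-SHA256 tag (hex) for the payload.

    Assumption: a shared secret key stands in for a real signing key.
    """
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, expected_crc: int, expected_tag: str, key: bytes) -> bool:
    """Accept the payload only if both the CRC and the HMAC tag match."""
    crc_ok = crc32_of(payload) == expected_crc
    # compare_digest avoids leaking timing information during the comparison
    tag_ok = hmac.compare_digest(hmac_tag(payload, key), expected_tag)
    return crc_ok and tag_ok


# Hypothetical record; note the leading zeros in the account number.
record = b"ACCT=0001234;AMT=100.00"
key = b"demo-key-not-for-production"

sent_crc = crc32_of(record)
sent_tag = hmac_tag(record, key)

# The receiver recomputes both values before trusting the record.
intact = verify(record, sent_crc, sent_tag, key)

# A record whose leading zeros were dropped in transit no longer verifies.
damaged = verify(b"ACCT=1234;AMT=100.00", sent_crc, sent_tag, key)
```

The CRC catches accidental corruption cheaply, while the keyed tag also guards against deliberate changes. Which of the two a given system needs depends on its threat model.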