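Undergraduate Graduation Project (Thesis) Proposal

Title: The Design of Online Information Collection System

I. Research significance, current state of research, and development trends

An online information collection system is a system for developing, using, and integrating network information resources. It can be used to track and monitor real-time Internet information on a customized basis and to build reusable information services. It collects the specific information users are interested in from network sources such as web pages, blogs, and forums, classifies it automatically, and delivers it to end users in several forms. Because it can promptly capture the network content users need, such as breaking news, market intelligence, industry information, policies and regulations, and academic literature, it is widely applicable to vertical search engines, monitoring of sensitive network information, intelligence gathering, public opinion analysis, and market tracking.

(1) Current state of research

1. Research in China [3]

With the rapid growth of the Internet, online media has become a new form of information dissemination that is now part of everyday life. An Internet information collection system needs to integrate web search, intelligent content extraction and filtering, and automatic classification, so that collecting, filtering, and extracting Internet information is automated and unified on one platform. Such a platform gathers large volumes of material from the Internet to enrich a repository, collects literature related to an organization's own business to improve office and decision-making efficiency, and quickly picks up the industry's macro environment and policy developments.

Information is the basis on which an organization understands its situation, makes sound decisions, and follows through on its work, so obtaining accurate and complete information in time is essential. News media, government departments, and large enterprises and institutions have all been building network information collection platforms on Internet technology: news media need large volumes of online news material to enrich their news archives; government agencies need literature related to their own work to improve office and decision-making efficiency; and large enterprises and institutions need quick access to the industry's macro environment, policy developments, and competitor information.

2. Research abroad [14]

As economies abroad have continued to grow and information has diversified, the way information is queried has changed, and with continued advances in computer science and technology, information collection systems play an increasingly important role in many fields.

Many large and medium-sized software companies abroad have built online information collection systems, and their development practice is more mature than in China. Their designs emphasize extensibility and compatibility: the system can accommodate existing data structures, protects existing resources when it is extended later, and can be restructured easily when requirements change. It is therefore easy to extend and upgrade, meeting current business needs while leaving ample room for future growth. On the basis of standardized and normalized information, the information is laid out sensibly so that staff can collect and maintain it according to their own permissions.

(2) Development trends

1. Trends in application

From the application side, informatization has changed how every kind of organization collects information. Information is part of every modern organization, so the way it is collected is inevitably shaped by informatization, and online information collection systems will spread and improve quickly as computer technology develops. This shows mainly in two ways:

1) From stand-alone collection to networked collection [2]
Online information work spans many kinds of content. Exchanging and sharing data over the network to support comprehensive, integrated queries has become a focus of research on online information collection.

2) From query support to decision support
As query requirements and the capability of query systems keep rising, moving from query support to decision support is an inevitable trend. The open question is how to use the system to view real-time information, apply data mining to find what is valuable, and then analyze, compare, and select from it to obtain the most useful information.

2. Trends in technology [5]

From the technology side, the techniques for developing online information collection systems are increasingly mature. Continued progress in computing and networking provides favorable technical conditions for national informatization; the build-out and improvement of local and metropolitan area networks provides the hardware infrastructure for networked information; and advanced software development tools, the spread of operating systems with good graphical user interfaces, and steady improvements in efficient Chinese-character processing all give strong support to developing and deploying online information collection systems.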
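The collect, extract, and classify flow described at the start of this section can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `CollectorSketch`, the category names, and the keyword lists are hypothetical, the tag stripping is a crude regular expression rather than a real HTML parser, and no network access is performed (the page is passed in as a string).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: extract text from a fetched page and assign a category.
class CollectorSketch {

    // Assumed categories and keywords, chosen only for illustration.
    static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("policy", List.of("regulation", "policy", "law"));
        KEYWORDS.put("market", List.of("price", "market", "competitor"));
        KEYWORDS.put("academic", List.of("paper", "journal", "thesis"));
    }

    // Crude extraction: drop script/style bodies, then tags, then collapse whitespace.
    static String extractText(String html) {
        String noScripts = html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ");
        String noTags = noScripts.replaceAll("(?s)<[^>]+>", " ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    // Pick the category whose keywords occur most often; "other" if none match.
    static String classify(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        String best = "other";
        int bestHits = 0;
        for (Map.Entry<String, List<String>> e : KEYWORDS.entrySet()) {
            int hits = 0;
            for (String kw : e.getValue()) {
                int from = 0;
                while ((from = lower.indexOf(kw, from)) >= 0) {
                    hits++;
                    from += kw.length();
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String page = "<html><body><h1>New market report</h1>"
                + "<p>Competitor price changes this week.</p></body></html>";
        String text = extractText(page);
        System.out.println(classify(text) + ": " + text);
    }
}
```

A real system would replace the keyword counts with a trained classifier and the regular expressions with a proper HTML parser; the point here is only the order of the steps.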
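II. Research plan and work schedule

This is a design project. It requires designing and completing a reasonably complete online information collection system, based on established design and development methods for such systems [10]. The main work is:

1. Technical preparation: collect and read a body of Chinese and English material, understand how an online information collection system is composed, and become proficient with one information collection development tool.
2. Overall analysis and design of the system, covering the concept of information collection, the principles of information collection, information requirements analysis, and the selection of information sources.
3. Analysis, design, and coding of each module, followed by integration of the modules and testing of overall functionality.

The online information collection system needs to provide the following main functions:

Viewing the relevant data, with the search type and search scope adjustable.
Crawling online information.
Building an index of online information.
Querying information that has already been crawled and stored.

(1) Research plan

1. Literature review: consult a broad range of related research results and theory, read the relevant monographs, and work out the overall approach for this study.
2. Interviews: interview a number of information workers and news staff to understand how they view online information collection systems.
3. Stage reviews: write a summary at the end of each work stage, reflect on progress, improve research skills, refine the plan, and accumulate material that finally forms the research result.

The work proceeds as follows. Look up related material, online or in the library, to understand the significance of the system. Use that material to work out how the system should be built and what it must include. Design the high-level functional modules and draw a module diagram. With further study, refine each functional module, think through every step, and set out the approach and the points to watch for each. Debug the finished program, find and fix the problems that debugging exposes, and so improve the system. Finally, organize the design records from each stage and write them up as the thesis draft.

(2) Key points and difficulties

1. Key points

1) Developing the system involves early-stage development of the database [12], development of the application, and a mechanism for continually updating the data afterward. The database must be designed for strong consistency and integrity and for good data security. The application must be functionally complete and easy to use. Ongoing updates require complete application support so that data can be updated conveniently.
2) An object-oriented language is used to implement the user interface and related operations and to connect to the database; JDBC is used to create and build the database [15].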
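Key point 2) mentions creating the database through JDBC. A minimal sketch of what that might look like follows. The table name `collected_page`, its columns, and the class name `StorageSketch` are assumptions made for illustration; the proposal does not specify a schema, and the DDL assumes a MySQL-like dialect. The code uses only the standard `java.sql` API and takes an already-open `Connection` from the caller, so a JDBC driver must be on the classpath before it can run against a real database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of creating and filling a table for crawled pages via JDBC.
class StorageSketch {

    // Assumed schema; column names and sizes are illustrative only.
    static final String CREATE_TABLE_SQL =
            "CREATE TABLE IF NOT EXISTS collected_page ("
            + "url VARCHAR(512) PRIMARY KEY, "
            + "title VARCHAR(256), "
            + "category VARCHAR(64), "
            + "body TEXT)";

    static final String INSERT_SQL =
            "INSERT INTO collected_page (url, title, category, body) VALUES (?, ?, ?, ?)";

    // Create the table if it does not exist yet.
    static void createSchema(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_TABLE_SQL);
        }
    }

    // Store one crawled page; values are bound as parameters, not concatenated.
    static int savePage(Connection conn, String url, String title,
                        String category, String body) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, url);
            ps.setString(2, title);
            ps.setString(3, category);
            ps.setString(4, body);
            return ps.executeUpdate();
        }
    }
}
```

A caller would obtain the connection with `DriverManager.getConnection(...)` using its own JDBC URL and credentials, call `createSchema` once, and then call `savePage` for each crawled page. Binding values through `PreparedStatement` keeps crawled text from being interpreted as SQL.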
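2. Difficulties

1) Different users are given different permissions. An administrator can query, modify, add, and delete content through the interface; the updated data is then written to the database [16] and shown in the interface.
2) The system's structure and workflow are complex, and the system is large. Developers need to settle the requirements quickly based on the actual situation and then develop iteratively, refining the system model step by step and adding functions incrementally until all required functions are met.
3) Once the system is developed and in use, the organization using it must have the necessary computers and peripherals. The hardware feasibility analysis mainly considers the host's memory, type, capabilities, networking capability, and security protections, together with the configuration of input/output devices, external storage, and network data communication equipment.

(3) Work schedule

February 25 to March 17, 2014: Graduation internship (at an organization related to the major and the project); learn how the system under study operates; prepare the proposal.
March 18 to March 31, 2014: Read the foundational literature for the project, collect specialist literature related to it, and understand the project's content and significance in preparation for the proposal.
April 1 to April 14, 2014: Organize the literature collected so far, learn the relevant background on online information collection, become familiar with the development techniques for such systems, and investigate the system's requirements through several channels (group discussion, online research, and so on).
April 15 to April 28, 2014: Discuss with the advisor and teammates, settle the problems to be solved and the technical approach, derive the requirements analysis for the "online information collection system", and write the requirements analysis report.
April 29 to May 12, 2014: Carry out high-level and detailed design based on the analysis report, write the design specification, and set up the development environment according to the system architecture.
May 13 to June 2, 2014: Implement each module according to the detailed design specification, write the test report, and carry out unit and integration testing; at the same time organize the earlier material and start writing the thesis.
June 3 to June 16, 2014: Write and finalize the thesis, have it reviewed by the advisor, and prepare for the defense.
June 17 to June 23, 2014: Defense and wrap-up.

III. Main references

[1] 李金海, 张景元. Struts, Spring 和 Hibernate 的 J2EE 架构的研究和实现[J]. 山东理工大学学报(自然科学版), 2006, (06).
[2] 明日科技. Java 从入门到精通(第 3 版)[M]. 北京: 清华大学出版社, 2012.
[3] 蒋宗礼, 马涛, 唐好魁, 闫明霞, 等. 数据库技术及应用(第 2 版)[M]. 电子工业出版社, 2010: 43-65.
[4] 李兴华. Java 开发实战经典[M]. 北京: 清华大学出版社, 2009, 8: 30-40.
[5] 唐汉明, 翟振兴, 兰丽华, 关宝军, 申宝柱. 深入浅出 MySQL 数据库开发、优化与管理维护[M]. 人民邮电出版社, 2006-2.
[6] 李盛恩, 王珊. 数据库基础与应用(第二版)[M]. 北京: 人民邮电出版社, 2009: 14-78.
[7] 孙卫琴. 精通 Hibernate: Java 对象持久化技术详解[M]. 电子工业出版社.
[8] 刘瑞新, 张兵义. 大学计算机规划教材: SQL Server 数据库技术及应用教程[M]. 电子工业出版社, 2012, 8.
[9] 夏昕, 曹晓钢, 唐勇. 深入浅出 Hibernate[M]. 电子工业出版社, 2005-6.
[10] 张德详. J2EE 架构下校园网用户管理系统的分析与部分实现[J]. 青岛大学学报, 2010, 19(4): 86-89.
[11] 邬继成. J2EE 开源编程精讲 15 讲[M]. 电子工业出版社, 2008.1: 41-114.
[12] 王珊, 萨师煊. 数据库系统概论[M]. 高等教育出版社, 2006.5: 198-235.
[13] 张孝祥. 深入 Java Web 开发内幕: 核心基础[M]. 北京: 电子工业出版社, 2006.10.
[14] 舒红平. Web 数据库编程 -java[M]. 西安电子科技大学出版社, 2005: 97-134.
[15] Stephanie Bodoff, Dale Green, Kim Haase, et al. The J2EE Tutorial[M]. Addison-Wesley Professional, 2003.7(02).
[16] Wendy Boggs, Michael Boggs. Mastering UML with Rational XDE[M]. Publishing House of Electronics Industry, 2003: 11-56.
[17] Cay S. Horstmann, Gary Cornell 著, 叶乃文, 邝劲筠, 杜永萍. JAVA 核心技术 卷 I: 基础知识, 程序设计教程[M]. 人民邮电出版社, 2008.5: 87-234.
[18] Bruce Eckel, 饶若楠等译. Java 编程思想[M]. 机械工业出版社, 2005: 124-234.
[19] 庞丽娜. Java 应用开发技术详解[M]. 科学出版社, 2007: 126-235.

Foreign-language literature

Information security
http://en.wikipedia.org/wiki/Information_security
From Wikipedia, the free encyclopedia

Information Security Attributes: or qualities, i.e., Confidentiality, Integrity and Availability (CIA). Information Systems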
are composed of three main portions (hardware, software and communications), with the purpose of helping to identify and apply information security industry standards, as mechanisms of protection and prevention, at three levels or layers: physical, personal and organizational. Essentially, procedures or policies are implemented to tell people (administrators, users and operators) how to use products to ensure information security within the organizations.

Information security (sometimes shortened to InfoSec) is the practice of defending information from unauthorized access, use, disclosure, disruption, modification, perusal, inspection, recording or destruction. It is a general term that can be used regardless of the form the data may take (electronic, physical, etc.). [1]

Below are the typical terms you will hear when dealing with information security:

IT Security: Sometimes referred to as computer security, IT security is information security applied to technology (most often some form of computer system). It is worthwhile to note that a computer does not necessarily mean a home desktop. A computer is any device with a processor and some memory (even a calculator). IT security specialists are almost always found in any major enterprise/establishment due to the nature and value of the data within larger businesses. They are responsible for keeping all of the technology within the company secure from malicious cyber attacks that often attempt to breach into critical private information or gain control of the internal systems.

Information Assurance: The act of ensuring that data is not lost when critical issues arise. These issues include, but are not limited to, natural disasters, computer/server malfunction, physical theft, or any other instance where data has the potential
of being lost. Since most information is stored on computers in our modern era, information assurance is typically dealt with by IT security specialists. One of the most common methods of providing information assurance is to have an off-site backup of the data in case one of the mentioned issues arises.

Governments, military, corporations, financial institutions, hospitals, and private businesses amass a great deal of confidential information about their employees, customers, products, research and financial status. Most of this information is now collected, processed and stored on electronic
computers and transmitted across networks to other computers.

Should confidential information about a business's customers or finances or new product line fall into the hands of a competitor, such a breach of security could lead to negative consequences. Protecting confidential information is a
business requirement, and in many cases also an ethical and legal requirement.

For the individual, information security has a significant effect on privacy, which is viewed very differently in different cultures.

The field of information security has grown and evolved significantly in recent years. There are many ways of gaining entry into the field as a career. It offers many areas for specialization including: securing network(s) and allied infrastructure, securing applications and databases, security testing, information systems auditing, business continuity planning and digital forensics, etc.

History

Since the early days of writing, politicians, diplomats and military commanders understood that it was necessary to provide some mechanism to protect the confidentiality of correspondence and to have some means of detecting tampering. Julius Caesar is credited with the invention of the Caesar cipher ca. 50 B.C., which was created in order to prevent his secret messages from being read should a message fall into the wrong hands, but for the most part protection was achieved through the application of procedural handling controls. Sensitive information was marked up to indicate that it should be protected and transported by trusted persons, guarded and stored in a secure environment or strong box. As postal services expanded, governments created official organisations to intercept, decipher, read and reseal letters (e.g. the UK Secret Office and Deciphering Branch in 1653).

In the mid 19th century more complex classification systems were developed to allow governments to manage their information according to the degree of sensitivity. The British Government codified this, to some extent, with the publication of the Official Secrets Act in 1889. By the time of the First World War, multi-tier classification systems were used to communicate information to and from various fronts, which encouraged greater use of code making and breaking sections in diplomatic and military headquarters. In the United Kingdom this led to the creation of the Government Code and Cypher School in 1919. Encoding became more sophisticated between the wars as machines were employed to scramble and unscramble information. The volume of information shared by the Allied countries during the Second World War necessitated formal alignment of classification systems and procedural controls. An arcane range of markings evolved to indicate who could handle documents (usually officers rather than men) and where they should be stored as increasingly complex safes and storage facilities were developed. Procedures evolved to ensure documents were destroyed properly and it was the failure
to follow these procedures which led to some of the greatest intelligence coups of the war (e.g. U-570).

The end of the 20th century and early years of the 21st century saw rapid advancements in telecommunications, computing hardware and software, and data encryption. The availability of smaller, more powerful and less expensive computing equipment made electronic data processing within the reach of small business and the home user. These computers quickly became interconnected through a network generically called the Internet.

The rapid growth and widespread use of electronic data processing
and electronic business conducted through the Internet, along with numerous occurrences of international terrorism, fueled the need for better methods of protecting the computers and the information they store, process and transmit. The academic disciplines of computer security and information assurance emerged along with numerous professional organizations all sharing the common goals of ensuring the security and reliability of information systems.

Basic principles

Key concepts

The CIA triad (confidentiality, integrity and availability) is one of the core principles of information security. [2]

There is continuous debate about extending this classic trio. Other principles such as Accountability [3] have sometimes been proposed for addition; it has been pointed out that issues such as Non-Repudiation do not fit well within the three core concepts, and
as regulation of computer systems has increased (particularly amongst the Western nations) Legality is becoming a key consideration for practical security installations.

In 1992, and revised in 2002, the OECD's Guidelines for the Security of Information Systems and Networks [4] proposed the
nine generally accepted principles: Awareness, Responsibility, Response, Ethics, Democracy, Risk Assessment, Security Design and Implementation, Security Management, and Reassessment. Building upon those, in 2004 the NIST's Engineering Principles for Information Technology Security [5] proposed 33 principles. Guidelines and practices are derived from each of these.

In 2002, Donn Parker proposed an alternative model for the classic CIA triad that he called the six atomic elements of information. The elements are confidentiality, possession, integrity, authenticity, availability, and utility. The merits
of the Parkerian hexad are a subject of debate amongst security professionals.

Confidentiality

Confidentiality refers to preventing the disclosure of information to unauthorized individuals or systems. For example, a credit card transaction on the Internet requires the credit card number to be transmitted from the buyer to the merchant and from the merchant to a transaction processing network. The system attempts to enforce confidentiality by encrypting the card number during transmission, by limiting the places where it might appear (in databases, log files, backups, printed receipts, and so on), and by restricting access to the places where it is stored. If an unauthorized party obtains the card number in any way, a breach of confidentiality has occurred.

Confidentiality is necessary (but not sufficient) for maintaining the privacy of the people whose personal information a system holds.

Integrity

In information security, data integrity means maintaining and assuring the accuracy and consistency of data over its entire life-cycle. [6] This means that data cannot be modified in an unauthorized or undetected manner. This is not the same thing as referential integrity in databases, although it can be viewed as a special case of Consistency as understood in the classic ACID model of transaction processing. Integrity is violated when a message is actively modified in transit. Information security systems typically provide message integrity in addition to data confidentiality.

Availability

For any information system to serve its purpose, the information must be available when it is needed. This means that the computing systems used to store and process the information, the security controls used to protect it, and the communication channels used to access it must be functioning correctly. High availability systems aim to remain available at all times, preventing service disruptions due to power outages, hardware failures, and system upgrades. Ensuring availability also involves preventing denial-of-service attacks.

Authenticity

In computing, e-Business,