Foreign Literature Translation

Semantic Advertising for Web 3.0

Material source: Department of Computer Science, University of Aberdeen, Aberdeen, Scotland
Author: Edward Thomas

Abstract: Advertising on the World Wide Web is based around automatically matching web pages with appropriate advertisements, in the form of banner ads, interactive adverts, or text links. Traditionally this has been done by manual classification of pages or, more recently, by using information retrieval techniques to find the most important keywords on a page and matching these to the keywords used by adverts. In this paper, we propose a new model for online advertising, based around lightweight embedded semantics. This will improve the relevancy of adverts on the World Wide Web and help to kick-start the use of RDFa as a mechanism for adding lightweight semantic attributes to the Web. Furthermore, we propose a system architecture for the new model, based on our scalable ontology reasoning infrastructure TrOWL.

1 Introduction

Advertising is the main economic force driving the development of the World Wide Web. According to a report by PricewaterhouseCoopers, advertising revenues totaled $6.1 billion for the fourth quarter of 2008. This was an increase over the previous year, even during an economic recession. Banner advertising accounted for the second largest share of this revenue after search, with 21 percent of the total market. Search advertising typically matches the keywords entered by the user against keywords that have been purchased by an advertiser. This is a strict match: advertisers who wish to cover synonyms or hyponyms of a particular keyword must purchase additional keywords. Since advertisers pay only per impression or per click, there is no penalty for covering a wide range of keywords. Matching a user's keywords against an advertiser's purchased keywords is easy to understand, which makes it a popular route for advertisers.

Matching banner adverts to web pages is a harder problem. In this case, the entire content of the web page, and the context of the web site that holds it, must be taken into account. Some systems, such as Google AdSense, attempt to extract the most important keywords from the page (by some information retrieval metric) and match these to keywords selected by an advertiser [4]. There is a very low cost of entry to systems like this, and publishing or advertising on these networks is a trivial pay-as-you-go process. Other systems, such as those used by DoubleClick¹, work closely with a publisher to classify their web site according to a taxonomy of content. They may also embed custom tags or keywords into the page itself to improve the matching process. The cost of entry to advertising or publishing in this way is large: DoubleClick and similar networks only take on web sites with a certain minimum number of ad impressions per month. The relationship among publishers, advertising agencies, and advertisers is
much closer than the relationships found in traditional media, with a large degree of human involvement in the design and deployment of advertising campaigns. This relationship is costly in terms of man-hours and requires a large number of ad impressions to make it viable.

This paper outlines a third alternative. It uses lightweight semantics on web pages and RDF descriptions of adverts, which specify, most importantly, the web sites a particular advert should appear on. Combined with some existing semantic web technologies, this lets us produce an open market for online advertising with automatic and more accurate targeting. That market keeps the zero cost of entry under which keyword-based advertising currently operates.

In this paper, we first discuss the technical motivations of our approach, before proposing a new model for online advertising based on lightweight embedded semantics. Furthermore, we propose a system architecture for the new model, based on our scalable semantic reasoning infrastructure TrOWL. Finally, we present two case studies on semantic advertising and conclude the paper with a discussion of areas of future work.

2 Approaches

Traditional approaches such as strict keyword matching are quite limited, in the sense that they cannot disambiguate keywords used in different contexts. Also, synonyms or hyponyms have to be specified manually by the advert providers rather than derived automatically. Other approaches, such as the one from DoubleClick, require a large amount of work from both the advert provider and the advertising agency. Together they must classify the web site's content and fit it into a predefined taxonomy. This is inconvenient for owners of small web sites. Furthermore, when a web page is generated automatically in real time, it is difficult to apply such an
approach.

Our approach attempts to provide more accurate and easier-to-use matching between web pages and adverts by making use of semantics embedded in both. On the one hand, this enables web developers and advert providers to describe their documents (web pages and adverts) and requirements in an intuitive and flexible manner. On the other hand, it makes use of existing semantic web resources such as ontologies, thesauri and reasoners to discover the relations between them. Advert providers no longer need to worry about issues such as synonyms, because these will be inferred automatically with the help of upper-level categorisation ontologies. Likewise, web site owners no longer need to classify their web pages one by one, because the embedded semantics already describe each page. This approach includes two major aspects: (1) automatic reasoning in matching, and (2) manual or automatic annotation of the documents.

As with any other web-based application, a crucial technical feature of this service is efficiency. Neither the web publisher nor the advertising provider wants advert matching to delay the rendering of the web page. In the semantic web context, the efficiency of a reasoning-related service is strongly restricted by the language used to describe the semantics. Currently, the de facto semantic web languages recommended by the W3C are RDF, RDF Schema, OWL and their dialects. The OWL family is based on well-defined and well-understood description logics (DLs) and is supported by many mature tools. However, many OWL dialects, such as OWL DL and OWL 2 DL, are expensive to reason with. RDF, on the other hand, is widely applied in web data exchange and integration, but it has limited expressive power.

One solution is to use TrOWL, which provides scalable reasoning and query answering services for RDF-DL and OWL 2 QL, as well as the other OWL 2 profiles², including OWL 2 EL and OWL 2 RL. It also supports expressive ontology languages such as OWL DL and OWL 2 DL, based on quality-guaranteed approximation-based reasoning.

As for the annotation aspect, one can create an RDF document and render it as HTML through transformation techniques such as XSLT. For web developers, however, it is more convenient to embed RDF data into normal web pages and then validate them with respect to their schema. RDFa, an application of RDF, bridges the gap between web page authoring languages such as XHTML and RDF. It can express structured data such as RDF in any markup language by specifying attributes of web page elements. In this paper, we use RDFa to annotate documents and to enhance them with lightweight semantics in RDF.

3 How Does the System Work

In this section, we propose the system architecture for semantic advertising and show how it works. The semantics embedded in a page will be converted into RDF graphs, and the constraints given by the advertisers will be rewritten as SPARQL queries. By running each query against the repository of graphs extracted from content,
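we can produce a map of the best advertising for each web page. We propose that the advertising system performs the matching process at the point when new content or new adverts are added to the system. The result can then be stored in a cache to improve performance on repeated matching. When a user requests an advert for a particular page, the system can consult the map of appropriate adverts and select the most lucrative. Additional techniques could, for example, ensure that a user does not see the same advert on the same site too many times, but this is outside the scope of this paper.

4 Conclusions and Future Work

In this paper we have outlined a vision for publishing advertising specifications and matching these to semantically enabled web pages. We see this as a general approach that can work across a number of different domains without changing the underlying method. Some issues still need to be resolved before this can be realized on a large scale.

The first and most difficult problem is that embedded semantics are currently not widely used on commercial web sites. RDFa is a new format that is not widely understood, and there is no compelling application for these semantics that would encourage large publishers to add them to their web sites. Improved matching of advertisements to web pages would give web sites a financial incentive to deploy RDFa. We hope that this incentive may help to bootstrap these new technologies into the mainstream.

The second issue occurs on highly dynamic web pages, where the content is different for every user. For these web sites, the combined cost of extracting RDFa, performing RDFS reasoning, and matching the result to the most suitable advert would make this method prohibitive. Some research is being carried out into methods for approximate matching and querying.

Translation

Semantic Advertising for Web 3.0

Source: Department of Computer Science, University of Aberdeen, Scotland
Author: Edward Thomas

Abstract: Web advertising automatically matches adverts with appropriate web pages. Its main forms are banner ads, interactive ads and text links. Traditionally, matching has been done by manual classification, or more recently by information retrieval techniques that find the most important keywords on a page and compare them with the keywords in adverts. In this paper, we propose a new model for online advertising based on embedded semantics. This will improve the relevance of adverts on the World Wide Web and help establish RDFa as a mechanism for adding semantic attributes to the Web. In addition, we propose a system architecture for this new model of semantic advertising, based on TrOWL.

1 Introduction

Advertising is the main economic force driving the development of the World Wide Web. A PricewaterhouseCoopers report shows that advertising revenue totaled $6.1 billion in the fourth quarter of 2008, an increase over the previous year even during an economic recession. Banner advertising accounted for the second largest share of this revenue, 21% of the whole market according to the report. Search revenue is generally obtained by matching the user's keywords against keywords bought by advertisers. This is a strict match: an advertiser who wants to cover synonyms or hyponyms of a keyword has to buy those keywords as well. Because advertisers pay only per impression or per click, covering a wide range of keywords carries no penalty. This simple matching between the keywords users enter and the keywords advertisers buy is easy to understand, and has therefore become a popular route for advertisers.

Matching banner adverts to web pages is a harder problem. Here both the content of the page and the context of the site must be considered. Some systems extract keywords from the page automatically. Their cost of entry is very low, and publishing or advertising on such networks is a simple pay-as-you-go process. Other systems, such as those used by DoubleClick, work closely with publishers. They classify web sites according to a taxonomy of content and embed custom tags or keywords in the pages themselves to improve matching. Advertising or publishing in this way is costly: DoubleClick and similar networks only accept sites with a certain minimum number of ad impressions. The relationship among publishers, advertising agencies and advertisers is much closer than in traditional media, with a large degree of human involvement in designing and deploying campaigns. This relationship is costly in man-hours and needs a large number of ad impressions to be viable.

This paper outlines a third alternative. It uses lightweight semantics on web pages and RDF descriptions of adverts, which, more importantly, state the web sites on which a particular advert should appear. Combined with existing semantic web technologies, this can produce online advertising that is targeted automatically and more accurately, with the same zero cost of entry as today's keyword-based advertising.

In this paper, we first discuss the technical motivation before proposing a new model of online advertising based on embedded semantics. We then propose a system architecture for this model based on our scalable semantic reasoning infrastructure TrOWL. Finally, we summarize the paper and discuss areas for future work.

2 Research

Traditional methods such as strict keyword matching are limited because keywords can be ambiguous across different contexts. Likewise, synonyms or hyponyms are not derived automatically and must be specified manually by advert providers. Other methods, such as DoubleClick's, require a lot of work from advert providers and advertising agencies to fit a site's content into a predefined taxonomy. This is very inconvenient for owners of small web sites. Moreover, this method is hard to apply when pages are generated automatically in real time.

We attempt to provide a more accurate and easier-to-use way of matching web pages and adverts through semantics embedded in both. On the one hand, this lets web developers and advert providers describe their documents (web pages and adverts) intuitively and flexibly. On the other hand, it uses semantic web resources such as ontologies, thesauri and reasoners to discover the relationships between them. Upper-level categorisation ontologies infer synonyms automatically, so advert providers no longer have to worry about them. At the same time, webmasters no longer need to classify their pages one by one, since the embedded semantics already do this.

This work has two main aspects: (1) automatic reasoning for matching, and (2) manual or automatic annotation of documents. As with any other web application, efficiency is a key technical requirement. Neither web publishers nor advert providers want advert matching to delay the rendering of a page. On the semantic web, the efficiency of a service is strongly restricted by the language used to describe the semantics. Currently, the de facto semantic web languages recommended by the W3C are RDF, RDF Schema, OWL and their dialects. OWL is supported by many mature tools, but reasoning in many OWL dialects, such as OWL DL and OWL 2 DL, is expensive. RDF, on the other hand, is widely used for data exchange and integration, but its expressive power is limited.

One solution is to use TrOWL. It provides scalable reasoning and query answering services for RDF-DL and OWL 2 QL, as well as the other OWL 2 profiles, including OWL 2 EL and OWL 2 RL. It also supports expressive ontology languages such as OWL DL and OWL 2 DL, with quality-guaranteed approximate reasoning.

For annotation, one can create an RDF document and transform it into HTML using techniques such as XSLT. For web developers it is more convenient to embed RDF data directly in ordinary web pages and validate it with respect to its schema. RDFa is the bridge between XHTML and RDF: it can express structured data such as RDF in any markup language by specifying attributes of page elements. In this paper, we use RDFa to annotate documents and enrich them with lightweight RDF semantics.

3 Implementation Method

In this section, we propose the system architecture for semantic advertising and describe how it works. The semantics embedded in pages are converted into RDF graphs, and the advertisers' requirements are rewritten as SPARQL queries. Running each query against the repository of graphs extracted from content produces a map of the best adverts for each web page. We propose that the system performs matching whenever new content or new adverts are added. The results can be stored in a cache to speed up repeated matching. When a user requests an advert for a particular page, the system consults the map of suitable adverts and selects the most lucrative one. Additional techniques could, for example, ensure that a user does not see the same advert on the same site too often, but this is beyond the scope of this paper.

4 Summary and Outlook

This paper has described how advertising specifications can be published and matched to semantically enabled web pages. We regard this as a general approach that can be applied in different domains without changing the underlying method. Several problems must still be solved before it can be realized.

The first and most difficult problem is that embedded semantics are not yet widely used on commercial web sites. RDFa is a new format that is not widely understood, and no compelling application yet encourages large publishers to add it to their sites. We hope that a financial incentive, in the form of better advert matching, will encourage sites to deploy RDFa and bring these new technologies into the mainstream.

The second problem concerns highly dynamic pages whose content differs for every user. The cost of RDFa extraction, RDFS reasoning and matching the most suitable advert would make the method prohibitive for such sites. Research into approximate matching and querying is under way.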
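Section 1 describes search advertising as a strict match between the user's keywords and the keywords an advertiser has purchased. The Python sketch below illustrates why synonyms and hyponyms must be bought separately under that scheme. The advert names and keywords are hypothetical, and real systems also normalise spelling, phrases and so on.

```python
def strict_match(query, purchased):
    """Return the adverts whose purchased keywords appear verbatim in the query."""
    terms = set(query.lower().split())
    # No synonym or hyponym handling: a keyword matches only if it is typed exactly.
    return sorted(ad for ad, keywords in purchased.items() if terms & keywords)


# Hypothetical advertisers: "ad-cars" bought only "car"; "ad-cars-wide" also
# paid for the synonym "automobile" and the hyponym "sedan".
purchased = {
    "ad-cars": {"car"},
    "ad-cars-wide": {"car", "automobile", "sedan"},
}

print(strict_match("cheap automobile insurance", purchased))  # ['ad-cars-wide']
print(strict_match("used car", purchased))                    # ['ad-cars', 'ad-cars-wide']
```

The advertiser who bought only "car" never sees the "automobile" query. Section 2 proposes removing exactly this manual burden.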
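Section 2 says synonyms and hyponyms would instead be "inferred automatically with the help of upper level categorisation ontologies". The simplest such inference is following `rdfs:subClassOf`, which is reflexive and transitive under RDFS entailment. The sketch below assumes a hypothetical class hierarchy. It uses hand-written Python purely for illustration; in the paper's architecture this reasoning would be delegated to a reasoner such as TrOWL.

```python
from collections import defaultdict

# Hypothetical fragment of an upper-level categorisation ontology,
# written as (subclass, superclass) pairs for rdfs:subClassOf.
SUBCLASS_OF = [
    ("Sedan", "Car"),
    ("Hatchback", "Car"),
    ("Car", "MotorVehicle"),
    ("Motorcycle", "MotorVehicle"),
]


def superclasses(cls, pairs):
    """Return every class that cls is a subclass of, including cls itself.

    rdfs:subClassOf is reflexive and transitive, so parent links are
    followed until no new class is found.
    """
    parents = defaultdict(set)
    for sub, sup in pairs:
        parents[sub].add(sup)
    seen, stack = {cls}, [cls]
    while stack:
        for sup in parents[stack.pop()]:
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def advert_matches(page_topic, advert_topic, pairs):
    """An advert aimed at a class also matches pages about any of its hyponyms."""
    return advert_topic in superclasses(page_topic, pairs)


print(advert_matches("Sedan", "Car", SUBCLASS_OF))           # True
print(advert_matches("Sedan", "MotorVehicle", SUBCLASS_OF))  # True
print(advert_matches("Motorcycle", "Car", SUBCLASS_OF))      # False
```

The advertiser states one class ("Car") and every page annotated with a subclass matches, with no extra keywords purchased.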
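Section 2 states that RDFa expresses RDF "by specifying attributes of web page elements", and Section 3 starts by converting those embedded semantics into RDF graphs. The sketch below pulls triples out of a page using only Python's standard `html.parser`. It is deliberately not a conformant RDFa processor. It handles just `about` (subject) and `property` (predicate), taking the value from `content` or from the element's text, and it leaves CURIEs such as `dc:title` unexpanded. The page, URLs and the `ex:` prefix are hypothetical.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RDFaLiteExtractor(HTMLParser):
    """Collect (subject, property, value) triples from a tiny subset of RDFa.

    Supported: 'about' (sets the subject for an element and its children)
    and 'property', whose value is the 'content' attribute or else the
    element's text. Not supported: prefix/vocab expansion, typeof, rel/rev,
    datatypes, and everything else a conformant RDFa processor must handle.
    """

    def __init__(self, base):
        super().__init__()
        self.triples = []
        self._subjects = [base]  # subject in scope for each open element
        self._pending = None     # (subject, property, depth) awaiting element text
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        subject = attrs.get("about") or self._subjects[-1]
        prop = attrs.get("property")
        if prop and attrs.get("content") is not None:
            self.triples.append((subject, prop, attrs["content"]))
        elif prop and tag not in VOID_TAGS and self._pending is None:
            # The value is this element's text; remember its nesting depth.
            self._pending = (subject, prop, len(self._subjects))
            self._text = []
        if tag not in VOID_TAGS:
            self._subjects.append(subject)

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or len(self._subjects) == 1:
            return  # void element, or a stray end tag in malformed markup
        self._subjects.pop()
        if self._pending is not None and len(self._subjects) == self._pending[2]:
            subject, prop, _ = self._pending
            value = " ".join("".join(self._text).split())  # normalise whitespace
            self.triples.append((subject, prop, value))
            self._pending = None


page = """
<div about="http://example.org/reviews/family-cars">
  <h1 property="dc:title">Choosing a family <em>sedan</em></h1>
  <meta property="dc:subject" content="ex:Sedan" />
</div>
"""

extractor = RDFaLiteExtractor(base="http://example.org/")
extractor.feed(page)
extractor.close()
for triple in extractor.triples:
    print(triple)
```

In practice a full RDFa parser would replace this class. The sketch only shows how element attributes become subject, predicate and object.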
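Section 3 outlines the architecture: page semantics become RDF graphs, advertiser constraints become SPARQL queries, and each query runs against the extracted graphs. Matching happens when content or adverts are added, the resulting map is cached, and at request time the most lucrative matching advert is chosen.

The sketch below models that flow under several stated assumptions. A tiny basic-graph-pattern matcher stands in for a real SPARQL engine, and the ontology triples are assumed to be already materialised by a reasoner. The `?page` variable is a hypothetical convention, pre-bound to the page URL. "Most lucrative" is taken to mean the highest per-impression bid. All names, URLs and bids are made up.

```python
from dataclasses import dataclass


def match_triple(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None.

    Terms starting with '?' are variables, as in SPARQL.
    """
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def solutions(patterns, graph, binding):
    """Yield every binding that satisfies all triple patterns (a basic graph pattern)."""
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = match_triple(patterns[0], triple, binding)
        if extended is not None:
            yield from solutions(patterns[1:], graph, extended)


@dataclass(frozen=True)
class Advert:
    name: str
    bid: float         # hypothetical price per impression
    constraint: tuple  # triple patterns; "?page" is pre-bound to the page URL


class AdMatcher:
    """Match when pages or adverts are added; only read the cache when serving."""

    def __init__(self, ontology=frozenset()):
        self.ontology = set(ontology)  # triples assumed already entailed by a reasoner
        self.graphs = {}               # page URL -> RDF graph extracted from the page
        self.adverts = {}              # advert name -> Advert
        self.ad_map = {}               # page URL -> names of matching adverts (the cache)

    def _matches(self, advert, url):
        start = {"?page": url}
        first = next(solutions(list(advert.constraint), self.graphs[url], start), None)
        return first is not None

    def add_page(self, url, triples):
        self.graphs[url] = set(triples) | self.ontology
        self.ad_map[url] = {n for n, ad in self.adverts.items() if self._matches(ad, url)}

    def add_advert(self, advert):
        self.adverts[advert.name] = advert
        for url in self.graphs:
            self.ad_map[url].discard(advert.name)  # drop a stale entry if re-added
            if self._matches(advert, url):
                self.ad_map[url].add(advert.name)

    def select(self, url):
        """Pick the most lucrative cached advert for url (highest bid), or None."""
        candidates = self.ad_map.get(url, set())
        if not candidates:
            return None
        return max(candidates, key=lambda name: (self.adverts[name].bid, name))


ONTOLOGY = {
    ("ex:Sedan", "rdfs:subClassOf", "ex:Car"),
    ("ex:Hatchback", "rdfs:subClassOf", "ex:Car"),
}
matcher = AdMatcher(ontology=ONTOLOGY)
matcher.add_page("http://example.org/review",
                 [("http://example.org/review", "dc:subject", "ex:Sedan")])
matcher.add_page("http://example.org/recipes",
                 [("http://example.org/recipes", "dc:subject", "ex:Baking")])

# Roughly: ASK { ?page dc:subject ?topic . ?topic rdfs:subClassOf ex:Car }
matcher.add_advert(Advert("car-dealer", 0.40, (("?page", "dc:subject", "?topic"),
                                               ("?topic", "rdfs:subClassOf", "ex:Car"))))
# Roughly: ASK { ?page dc:subject ?topic }
matcher.add_advert(Advert("run-of-network", 0.10, (("?page", "dc:subject", "?topic"),)))

# A page added after the adverts is matched at insertion time as well.
matcher.add_page("http://example.org/hatchbacks",
                 [("http://example.org/hatchbacks", "dc:subject", "ex:Hatchback")])

print(matcher.select("http://example.org/review"))      # car-dealer
print(matcher.select("http://example.org/recipes"))     # run-of-network
print(matcher.select("http://example.org/hatchbacks"))  # car-dealer
```

All matching cost is paid at insertion time, so `select` is only a cache lookup plus a `max` over the cached candidates. The trade-off is that a page's entry must be recomputed whenever its content changes. That is the cost Section 4 identifies for highly dynamic pages. Frequency capping is left out here, as in the paper.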