Graduation Thesis (Design): Foreign Literature Translation

Original Text

QA42: Web-Based Question Answering System
CS224N Final Project

1 Introduction

QA42, named in memory of Douglas Adams [8], is an open-domain question answering system that builds upon and extends prior work in [1] by exploiting the redundancy of the World Wide Web [9]. QA42 returns specific answers to factoid questions rather than summaries, as is done in search engines and traditional question answering systems. The current version of QA42 is geared to answer only questions about persons, organizations, locations, dates, and quantities. Additional types of questions are planned for future work.

An outline of the process is diagrammed in Figure 1. The question to be answered is written into one or more search engine queries, which are then sent to the Google search engine [10]. Summaries returned by Google are scored against answer models that are also generated from the question. Similar viable answers are clustered together and rescored based upon frequency. QA42 presents the three answers that score the highest. For the reader's convenience, examples and several tables are presented at the end of this document. We now present the details.

2 Question Processing

2.1 Parse Analysis

As a first step, the QA42QUERYQUESTION Java class parses the question using [2] as the parser, trained on data from [11]. From the resultant parse tree, it ascertains the pronoun type, pronoun subtype, main verb, subject noun phrases, and object noun phrases.

The pronoun type is defined as who/whom, where, when, why, how, which, what, and other. In general, the pronoun type is taken to be the interrogative pronoun used in the question. Where more than one interrogative pronoun appears in the question, the outermost one in the parse tree is used. The other category is used when no interrogative pronoun exists in the question. This is commonly the case when the question is worded as an imperative. An example is: Name the designer of the shoe that spawned millions of plastic imitations, known as "jellies". As such, QA42 does not perform particularly well on imperatives. The fact that [11] contains relatively few imperatives also impacts the effectiveness of analyzing such sentences.

The pronoun subtype is defined as the subordinate phrase or clause that is headed by the interrogative pronoun. Thus, the pronoun type and subtype respectively are what and company in this example: What company is the largest Japanese ship builder? Since the what and which pronoun type categories give little clue as to the intended answer form, QA42 converts these categories to either where or when in the case that the subtype respectively indicates a location or time. QA42 contains a hardcoded set of 42 location words and 40 time words respectively [6], represented by the QA42WORDLOCATIONLIST and QA42WORDTIMELIST classes. We use minimal analysis of pronoun types and subtypes since it is not the major focus of our experiment. Deeper evaluation of these natural language features is reserved for future work.

The main verb is defined as the word heading the outermost verb phrase identified by the parser that is not a form of do or have. In the case of these helper verbs, the tense and number are noted so that the main verb can be represented either in its original form or in a revised form to match the helper. For example, the main verb is die and the revised main verb is died in this question: When did Nixon die? Similarly, the main verb is mean and the revised main verb is means in this example: What does El Nino mean in Spanish? The main verb is also converted to a past participle form for the purpose of forming passive voice phrases in query templates (2.2). The QA42WORDVERBFORMCONVERTER Java class makes these conversions. It contains a set of 388 past participle mappings and 344 tense mappings to handle irregular verbs. The list of irregular verbs was reduced from an exception list in [3]. For all other words, QA42 applies heuristics based on standard rules of English.

The subject and object noun phrases are defined as any and all noun phrases identified by the parser that respectively come before or after the main verb. As such, El Nino and Spanish are the subject and object noun phrases in the earlier example: What does El Nino mean in Spanish? The effect of this processing is that words with little semantic value, such as prepositions, are dropped. Additionally, since our definition of subject and object is more general than in standard English, it is frequently the case that there are multiple subjects and/or objects to be identified. For example, since the main verb is ask, both the FBI and a word processor are object noun phrases in this question: Why did David Koresh ask the FBI for a word processor? The fact that we have multiple phrases becomes significant in generating answer models (2.3).

Each quoted phrase is taken as a simple noun phrase regardless of its parsing. The assumption is that the quoted phrase refers to a single title, quote, etc. Quoted phrases are handled specially in template generation (2.2).

2.2 Query Templates

Once parse analysis is complete, QA42 generates a list of one or more query templates. From each template, QA42 generates a single search engine query and a (possibly empty) set of answer models (2.3), the latter of which are matched against search engine summaries. These templates are generally represented by objects of the QA42QUERYTEMPLATE Java class. Each template represents a sequence of phrases with at most one pronoun position marker. This marker indicates the place within the phrase sequence in which the answer is expected.
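A template of this kind can be pictured as a small data structure. The following is a minimal sketch and not the QA42 source: the class name TemplateSketch, the "<PRONOUN>" placeholder string, and both method names are assumptions made for illustration. It also shows one plausible way of leaving the marker out when the phrases are joined into a query string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a query template; not the QA42 class itself. */
final class TemplateSketch {
    /** Assumed placeholder for the pronoun position marker. */
    static final String PRONOUN = "<PRONOUN>";

    private final List<String> phrases;

    TemplateSketch(List<String> phrases) {
        // A template holds at most one pronoun position marker.
        if (Collections.frequency(phrases, PRONOUN) > 1) {
            throw new IllegalArgumentException("at most one pronoun marker allowed");
        }
        this.phrases = Collections.unmodifiableList(new ArrayList<>(phrases));
    }

    /** Position at which the answer is expected, or -1 if there is no marker. */
    int answerPosition() {
        return phrases.indexOf(PRONOUN);
    }

    /** Joins the phrases in order, leaving the marker out of the query text. */
    String toQuery() {
        List<String> words = new ArrayList<>(phrases);
        words.remove(PRONOUN); // remove(Object): drops the marker if present
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        TemplateSketch t = new TemplateSketch(
                Arrays.asList(PRONOUN, "state", "represented", "Charles Robb"));
        System.out.println(t.answerPosition());
        System.out.println(t.toQuery());
    }
}
```

Keeping the marker inside the phrase list, rather than in a separate field, means the same list can later be filtered to build answer models without losing the answer position.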
The marker is used in generating answer models but not in the formatting of search engine queries. As a typical example, the question "What state does Charles Robb represent?" results in this template (among others): PRONOUN, state, represented, Charles Robb. When this template is formatted for the Google search engine, the pronoun is dropped, so the query reads as follows: state represented Charles Robb. Since the order of words is significant [10], Google is likely to match a search engine summary such as "Virginia is the southern state represented by Charles Robb", assuming that it exists.

QA42 always generates one simple query template for each question, which is the verbatim text of the question itself. Since this special template is not the result of analysis, it is not broken into phrases, does not contain a pronoun, and therefore is not used to generate answer models (2.3). A special QA42QUERYSIMPLETEMPLATE Java class, which is a subclass of Template, is used to handle this special case.

Depending on the structure of the question parse, QA42 may generate as many as eight additional templates per question, each one with a pronoun position marker. Table 7 (near the end of this document) summarizes the possible formats of these templates, along with examples of matching sentences from a potential search engine summary. Since many of these formats differ only in the placement of the pronoun position marker, duplicate search engine queries are discarded.

In general, search engine queries do not contain quotes or special operators. The exception applies when phrases are quoted in the original question. In this case, the same phrase is quoted in the query under the assumption that it refers to a title or quotation.

2.3 Answer Models

QA42 generates several answer models from each query template (other than simple templates) with the purpose of predicting and scoring search engine summaries. With some exceptions, the set of answer models is the set of all possible subsequences of the phrases from the template from which it is generated. That is to say, QA42 creates an answer model by including or excluding each template phrase. The exceptions are that QA42 never generates an empty answer model nor a model with no pronoun. Therefore, the number of models M generated from a single template with N phrases is

$M = 2^{N-1} - 1$    (2.1)

since N-1 represents the number of phrases other than the pronoun position marker.

Each model is ranked by the ratio of the number of phrases it contains to the number of phrases contained by its template. That is, for a model with n phrases and a template with N phrases, the specificity rank R is given by

$R = n / N$    (2.2)

For example, the template PRONOUN, state, represented, Charles Robb results in these models:

PRONOUN, state, represented, Charles Robb    1.00
PRONOUN, represented, Charles Robb    0.75
PRONOUN, state, Charles Robb    0.75
PRONOUN, state, represented    0.75
PRONOUN, Charles Robb    0.50
PRONOUN, represented    0.50
PRONOUN, state    0.50
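The enumeration and ranking rules above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the QA42 implementation: the class and method names and the "<PRONOUN>" placeholder are hypothetical, and the template is assumed to contain exactly one marker. Each non-pronoun phrase is switched on or off by one bit of a counter, the marker is always kept, and the marker-only combination is skipped so that the count agrees with (2.1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: enumerates answer models for one template. */
final class AnswerModelSketch {
    /** Assumed placeholder for the pronoun position marker. */
    static final String PRONOUN = "<PRONOUN>";

    /** One answer model: its phrases and its specificity rank R = n / N. */
    static final class Model {
        final List<String> phrases;
        final double rank;

        Model(List<String> phrases, double rank) {
            this.phrases = phrases;
            this.rank = rank;
        }
    }

    static List<Model> generate(List<String> template) {
        List<Model> models = new ArrayList<>();
        int total = template.size();                                   // N
        int others = total - Collections.frequency(template, PRONOUN); // N - 1
        if (others == total) {
            return models; // no marker (simple template): no models
        }
        // Bit b of mask says whether the b-th non-pronoun phrase is included.
        // mask = 0 (the marker alone) is skipped, leaving 2^(N-1) - 1 models.
        for (int mask = 1; mask < (1 << others); mask++) {
            List<String> phrases = new ArrayList<>();
            int bit = 0;
            for (String phrase : template) {
                if (PRONOUN.equals(phrase)) {
                    phrases.add(phrase); // the marker is always kept
                } else if ((mask & (1 << bit++)) != 0) {
                    phrases.add(phrase);
                }
            }
            models.add(new Model(phrases, (double) phrases.size() / total));
        }
        return models;
    }

    public static void main(String[] args) {
        List<String> template =
                Arrays.asList(PRONOUN, "state", "represented", "Charles Robb");
        for (Model m : generate(template)) {
            System.out.printf("%-50s %.2f%n", String.join(", ", m.phrases), m.rank);
        }
    }
}
```

For the four-phrase Charles Robb template this yields seven models with ranks 1.00, 0.75 and 0.50, as in the list above. Because the count doubles with every added phrase, long questions produce many models quickly.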
When reviewing this in the context of Table 7, it is useful to keep in mind that each subject or object may represent several phrases. For example, in the question "Why did David Koresh ask the FBI for a word processor?", there is one subject phrase, David Koresh, and there are two object phrases, the FBI and a word processor. Therefore, from the template the FBI, a word processor, asked by, David Koresh, PRONOUN, 15 models are generated.

Since there are generally multiple templates per question, there is a potential for duplicate answer models. Consequently, QA42 eliminates the duplicates.

Since the number of answer models is a function of the number of templates and the number of phrases in each template, it can be thought of as a function of the question complexity and the length of the sentence. As such, some of the questions can have a fairly large number of models. For example, "How much did Manchester United spend on players in 1993?" results in 8 non-simple query templates and 158 answer models due to its complexity. In contrast, "What is the name of the rare neurological disease with symptoms such as involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?" results in 4094 models from only one template.

3 Information Retrieval

3.1 Search Engine Module and Google

The search engine module of QA42 sends the queries generated from the query templates to Google [10]. The results returned from the search engine are preprocessed (3.2) before being passed to the named entity recognition module (3.3). QA42 uses the Google SOAP API to retrieve the query results. QA42 processes only the page summaries returned by Google and not the referenced pages. This improves response time, since looking at these pages would involve separate network URL requests and introduce a bottleneck. Also, under the assumption that the answer appears close to the query phrases, the summary proves to be sufficient. For every query, we request a maximum of ten results from Google.

3.2 HTML Preprocessor

QA42 preprocesses the summaries returned by Google to transform the HTML into a more usable format. The preprocessor starts by removing HTML tags and character references such as &#39;, since these data elements carry little or no natural language information. The preprocessor also inserts whitespace between adjacent digit and non-digit characters to aid the named entity recognizer (3.3) in identifying quantities. For instance, "One Big Mac costs 24EEK in Estonia, 11364EEK" is converted to "One Big Mac costs 24 EEK in Estonia, 11364 EEK".
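The preprocessing steps can be approximated with three regular-expression passes. This is a hedged sketch rather than the QA42 preprocessor: the class name, method name and patterns are assumptions. In particular, it inserts whitespace only between digits and ASCII letters, which is narrower than "digit and non-digit characters", so that grouped numbers such as 40,000 stay intact.

```java
/** Illustrative only: a regex-based approximation of the summary preprocessor. */
final class SummaryPreprocessorSketch {

    static String preprocess(String html) {
        // 1. Drop HTML tags such as <b> ... </b>.
        String text = html.replaceAll("<[^>]+>", "");
        // 2. Drop character references such as &#39; or &amp;.
        text = text.replaceAll("&#?\\w+;", "");
        // 3. Insert a space at every digit/letter boundary, in either direction.
        //    The lookarounds match an empty position, so no character is consumed.
        text = text.replaceAll("(?<=\\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\\d)", " ");
        return text;
    }

    public static void main(String[] args) {
        System.out.println(preprocess("One <b>Big Mac</b> costs 24EEK in Estonia"));
    }
}
```

One side effect of the third pass is that ordinals are split as well ("4th" becomes "4 th"), so any later date pattern has to tolerate that space.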
3.3 Named Entity Recognizer

The named entity recognizer (NER) module analyzes the preprocessed search engine summaries to extract candidate answers. Specifically, the Stanford NER [4], which was trained on three corpora [12, 13, 14], is used to identify person, organization, and location entities over the summaries for who and where pronoun types. QA42 also uses an augmented NER for when and how pronoun types. The current version of QA42 does not handle other types of pronouns.

This logic is augmented for when and how pronoun types since [4] does not provide sufficiently fine-grained entity types in these cases. QA42 uses logic based on regular expressions to identify date and quantity entities. The approximating assumption here is that when pronoun types usually refer to a date (as opposed to a time) and how types usually refer to a quantity (as in how much or how far).

The augmented NER module contains the DATEMATCHER Java class, which extracts from the search engine summaries phrases with patterns that indicate a date or year. Examples include 1776, July 4th 1776, 07/04/76, and 4th of July. This logic searches the summary string for dates using regular expressions, then reformats these candidates into a common American format of July 4, 1776 to simplify clustering (4.2).

Quantities are extracted using a simple rule. QA42 defines a quantity to be any number with a unit, such as 40,000, 20 meters, 30 ft, etc. Several lists of units for distance, area, money, etc. [7] are hardcoded into QA42 to facilitate this process. This module has room for further development but is not the focus of our project.

From each of these NER modules, QA42 extracts entities that match the pronoun type as specified in Table 1. For example, if the question has a who pronoun type, the NER module returns a list of all persons and organizations found in the search engine summaries. All such entities are sent to the scoring module as candidate answers.
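The regular-expression approach to dates and quantities might look like the sketch below. It is illustrative and makes several assumptions: the class name, both patterns and the five-entry unit list are hypothetical, only the "month day year" date form is covered, and two-digit years such as 07/04/76 are left out because the century cannot be inferred from the text alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: regex extraction of dates and quantities from a summary. */
final class AugmentedNerSketch {
    // Month name, day with optional ordinal suffix, optional comma, 4-digit year.
    // "\\s*" before the suffix tolerates "4 th" as well as "4th".
    private static final Pattern DATE = Pattern.compile(
            "\\b(January|February|March|April|May|June|July|August|September"
            + "|October|November|December)"
            + "\\s+(\\d{1,2})\\s*(?:st|nd|rd|th)?,?\\s+(\\d{4})\\b");

    // A number followed by a unit from a small, assumed unit list.
    private static final Pattern QUANTITY = Pattern.compile(
            "\\b(\\d[\\d,]*(?:\\.\\d+)?)\\s*(meters|km|miles|ft|EEK)\\b");

    /** Returns every date found, normalized to the form "July 4, 1776". */
    static List<String> findDates(String summary) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE.matcher(summary);
        while (m.find()) {
            int day = Integer.parseInt(m.group(2)); // strips any leading zero
            dates.add(m.group(1) + " " + day + ", " + m.group(3));
        }
        return dates;
    }

    /** Returns every "number unit" pair found, normalized to one space. */
    static List<String> findQuantities(String summary) {
        List<String> quantities = new ArrayList<>();
        Matcher m = QUANTITY.matcher(summary);
        while (m.find()) {
            quantities.add(m.group(1) + " " + m.group(2));
        }
        return quantities;
    }

    public static void main(String[] args) {
        System.out.println(findDates("It was signed on July 4th 1776."));
        System.out.println(findQuantities("a wall 20 meters long and 30ft high"));
    }
}
```

Normalizing every match to one canonical string is what lets differently written candidates fall into the same cluster later.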
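Translation

QA42: Web-Based Question Answering System
CS224N Final Project

1 Introduction

QA42, named in memory of Douglas Adams, is an open-domain question answering system that builds on and extends prior work by exploiting the redundancy of the World Wide Web. QA42 returns specific answers to factoid questions, rather than summaries as search engines and traditional question answering systems do. The current version of QA42 answers only questions about persons, organizations, locations, dates, and quantities. Other question types are planned for future work.

The process is outlined in Figure 1. The question to be answered is written into one or more search engine queries, which are then sent to the Google search engine [10]. The summaries returned by Google are scored against answer models that are also generated from the question. Similar viable answers are clustered together and rescored by frequency. QA42 presents the three highest-scoring answers. For the reader's convenience, examples and several tables appear at the end of this document. We now present the details.

2 Question Processing

2.1 Parse Analysis

As a first step, the QA42QUERYQUESTION Java class parses the question, using [2] as the parser trained on data from [11]. From the resulting parse tree it determines the pronoun type, pronoun subtype, main verb, subject noun phrases, and object noun phrases.

The pronoun type is one of who/whom, where, when, why, how, which, what, and other. In general, the pronoun type is the interrogative pronoun used in the question. When more than one interrogative pronoun appears, the outermost one in the parse tree is used. The other category is used when the question contains no interrogative pronoun, which is usually the case when the question is worded as an imperative. An example is: Name the designer of the shoe that spawned millions of plastic imitations, known as "jellies". QA42 therefore does not perform particularly well on imperatives. The fact that [11] contains relatively few imperatives also affects how well such sentences are analyzed.

The pronoun subtype is the subordinate phrase or clause headed by the interrogative pronoun. Thus the pronoun type and subtype are what and company, respectively, in this example: What company is the largest Japanese ship builder? Because the what and which categories give little clue about the intended answer form, QA42 converts them to where or when if the subtype indicates a location or a time, respectively. QA42 contains hardcoded sets of 42 location words and 40 time words [6], represented by the QA42WORDLOCATIONLIST and QA42WORDTIMELIST classes. We analyze pronoun types and subtypes only minimally because this is not the focus of our experiment. Deeper evaluation of these natural language features is left for future work.

The main verb is the word heading the outermost verb phrase identified by the parser that is not a form of do or have. For these helper verbs, the tense and number are noted so that the main verb can be represented either in its original form or in a revised form that matches the helper. For example, the main verb is die and the revised main verb is died in this question: When did Nixon die? Similarly, the main verb is mean and the revised main verb is means in this example: What does El Nino mean in Spanish? The main verb is also converted to its past participle in order to form passive voice phrases in query templates (2.2). The QA42WORDVERBFORMCONVERTER Java class performs these conversions. It contains 388 past participle mappings and 344 tense mappings to handle irregular verbs. The list of irregular verbs was reduced from the exception list in [3]. For all other words, QA42 applies heuristics based on standard rules of English.

The subject and object noun phrases are any and all noun phrases identified by the parser that come before or after the main verb, respectively. Thus El Nino and Spanish are the subject and object noun phrases in the earlier example: What does El Nino mean in Spanish? The effect of this processing is that words with little semantic value, such as prepositions, are dropped. In addition, because our definitions of subject and object are more general than those of standard English, there are often multiple subjects and/or objects to identify. For example, since the main verb is ask, both the FBI and a word processor are object noun phrases in this question: Why did David Koresh ask the FBI for a word processor? Having multiple phrases becomes significant when generating answer models (2.3).

2.2 Query Templates

Once parse analysis is complete, QA42 generates a list of one or more query templates. From each template, QA42 generates a single search engine query and a (possibly empty) set of answer models (2.3), the latter of which are matched against search engine summaries. These templates are generally represented by objects of the QA42QUERYTEMPLATE Java class. Each template represents a sequence of phrases with at most one pronoun position marker. The marker indicates where in the phrase sequence the answer is expected. It is used when generating answer models but not when formatting search engine queries. As a typical example, the question "What state does Charles Robb represent?" yields this template (among others): PRONOUN, state, represented, Charles Robb. When this template is formatted for the Google search engine, the pronoun is dropped, so the query reads: state represented Charles Robb. Because word order is significant [10], Google is likely to match a summary such as "Virginia is the southern state represented by Charles Robb", assuming one exists.

QA42 always generates one simple query template for each question, which is the verbatim text of the question itself. Because this special template is not the result of analysis, it is not broken into phrases, contains no pronoun, and is therefore not used to generate answer models (2.3). A special QA42QUERYSIMPLETEMPLATE Java class, a subclass of Template, handles this case.

Depending on the structure of the question parse, QA42 may generate as many as eight additional templates per question, each with a pronoun position marker. Table 7 (near the end of this document) summarizes the possible formats of these templates, with examples of matching sentences from a potential search engine summary. Because many of these formats differ only in the placement of the pronoun position marker, duplicate search engine queries are discarded.

In general, search engine queries contain no quotes or special operators. The only exception is when phrases are quoted in the original question. In that case, the same phrase is quoted in the query on the assumption that it refers to a title or quotation.

2.3 Answer Models

QA42 generates several answer models from each query template (other than simple templates) in order to predict and score search engine summaries. With some exceptions, the set of answer models is the set of all possible subsequences of the phrases in the template from which it is generated. In other words, QA42 creates an answer model by including or excluding each template phrase. The exceptions are that QA42 never generates an empty answer model or a model without a pronoun. The number of models M generated from a single template with N phrases is therefore

$M = 2^{N-1} - 1$    (2.1)

since N-1 is the number of phrases other than the pronoun position marker. Each model is ranked by the ratio of the number of phrases it contains to the number of phrases in its template. That is, for a model with n phrases and a template with N phrases, the specificity rank R is

$R = n / N$    (2.2)

For example, the template PRONOUN, state, represented, Charles Robb results in these models:

PRONOUN, state, represented, Charles Robb    1.00
PRONOUN, represented, Charles Robb    0.75
PRONOUN, state, Charles Robb    0.75
PRONOUN, state, represented    0.75
PRONOUN, Charles Robb    0.50
PRONOUN, represented    0.50
PRONOUN, state    0.50

When reviewing this against Table 7, it helps to remember that each subject or object may represent several phrases. For example, in the question "Why did David Koresh ask the FBI for a word processor?", there is one subject phrase, David Koresh, and there are two object phrases, the FBI and a word processor. Therefore, from the template the FBI, a word processor, asked by, David Koresh, PRONOUN, 15 models are generated.

Because there are usually multiple templates per question, duplicate answer models can arise. QA42 therefore eliminates the duplicates.

Because the number of answer models depends on the number of templates and the number of phrases in each template, it can be viewed as a function of the complexity of the question and the length of the sentence. Some questions can therefore have a fairly large number of models. For example, "How much did Manchester United spend on players in 1993?" results in 8 non-simple query templates and 158 answer models because of its complexity. In contrast, "What is the name of the rare neurological disease with symptoms such as involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?" results in 4094 models from only one template.

3 Information Retrieval

3.1 Search Engine Module and Google

The search engine module of QA42 sends the queries generated from the query templates to Google [10]. The results returned by the search engine are preprocessed (3.2) and then passed to the named entity recognition module (3.3). QA42 uses the Google SOAP API to retrieve the query results. QA42 processes only the page summaries returned by Google, not the referenced pages. This improves response time, because fetching those pages would require separate network URL requests and introduce a bottleneck. Moreover, on the assumption that the answer appears close to the query phrases, the summary proves to be sufficient. For each query, we request at most ten results from Google.

3.2 HTML Preprocessor

QA42 preprocesses the summaries returned by Google to transform the HTML into a more usable format. The preprocessor starts by removing HTML tags and character references, because these data elements carry little or no natural language information. The preprocessor also inserts whitespace between adjacent digit and non-digit characters to help the named entity recognizer (3.3) identify quantities. For example, "One Big Mac costs 24EEK in Estonia, 11364EEK" is converted to "One Big Mac costs 24 EEK in Estonia, 11364 EEK".

3.3 Named Entity Recognizer

The named entity recognizer (NER) module analyzes the preprocessed search engine summaries to extract candidate answers. Specifically, the Stanford NER [4], which was trained on three corpora [12, 13, 14], is used to identify person, organization, and location entities in the summaries for who and where pronoun types. QA42 also uses an augmented NER for when and how pronoun types. The current version of QA42 does not handle other pronoun types. This logic is augmented for when and how pronoun types because [4] does not provide sufficiently fine-grained entity types in these cases. QA42 uses logic based on regular expressions to identify date and quantity entities. The approximating assumption is that when pronoun types usually refer to a date (rather than a time) and how types usually refer to a quantity (as in how much or how far). The augmented NER module contains the DATEMATCHER Java class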