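Retrieval System with Document Ranking

Task

Input:
  A collection of 10,000 documents
  A query Q containing at least two keywords

Output:
  The df value of every keyword in Q (df=0 if the keyword does not occur)
  The results of Q (the interface outputs a list of document names / document contents)
  The Top 10 documents in descending order of similarity, each as: document ID; similarity; document content

Requirements:
  Compute term weights with tf-idf and similarity with cosine
  No term normalization is required (e.g. friends -> friend); split directly on spaces
  Convert everything to lowercase (A -> a)
  Support multiple queries in one run
  Keep the document collection at the same path (E3files), so that the submission does not need to include the collection

Test format

Input: Q=T1 T2
Output:
  df of T1, df of T2
  ID of the first document, similarity=0.23132
  Content of the first document
  (blank line)
  ID of the second document, similarity=0.21
  Content of the second document
  (blank line)
  ... (up to the 10th document)

Test format (the numbers are examples only)

Input: new york city
Output:
  new:2000 york:123 city:323123
  (blank line)
  D1750: sim=0.23132
  Recent trials in New York City proved that all politicans are crooks . An extra tax on politicians seems appropriate and is consistent with this new enlightened policy of disciplinary taxation .
  (blank line)
  D2319: sim=0.21
  More Greenwichers than in the past have bought one or more second homes in New England , Florida , the Caribbean or New York City . The person who used to have two houses now has four , says Carl W. Menk , chairman of Canny , Bowen Inc. , executive recruiters in New York .
  (blank line)
  ... (up to the 10th document)

Test cases

Input 1: new york city
Input 2: I like new york city
Input 3: aaaaaa city
Input 4: aaaaa bbbb
Input 5: any input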
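
The requirements above (whitespace tokenization, lowercasing, tf-idf weights, cosine ranking, df reporting) can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the function names (`tokenize`, `build_index`, `search`, `format_output`), the in-memory `docs` dictionary, the weighting variant (raw tf times log10(N/df)) and the five-decimal similarity format are all assumptions, since the assignment does not fix them. Loading the files from E3files is left out because the file layout is not specified.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace only; no stemming, per the requirements.
    return text.lower().split()


def build_index(docs):
    """docs: dict of document ID -> raw text (a hypothetical in-memory layout)."""
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())  # each document counts once per distinct term
    n = len(docs)
    # Assumed weighting: raw term frequency times log10(N / df).
    idf = {term: math.log10(n / count) for term, count in df.items()}
    vecs = {
        doc_id: {term: c * idf[term] for term, c in tf.items()}
        for doc_id, tf in tfs.items()
    }
    norms = {
        doc_id: math.sqrt(sum(w * w for w in vec.values()))
        for doc_id, vec in vecs.items()
    }
    return df, idf, vecs, norms


def search(query, index, top_k=10):
    """Return ([(term, df)], [(doc_id, cosine similarity)]) for one query."""
    df, idf, vecs, norms = index
    terms = tokenize(query)
    # Report df for each distinct query term, in query order; 0 if unseen.
    dfs = [(t, df.get(t, 0)) for t in dict.fromkeys(terms)]
    q_vec = {t: c * idf[t] for t, c in Counter(terms).items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    scores = []
    if q_norm > 0:
        for doc_id, vec in vecs.items():
            dot = sum(w * vec.get(t, 0.0) for t, w in q_vec.items())
            if dot > 0:  # a positive dot product implies a non-zero doc norm
                scores.append((doc_id, dot / (q_norm * norms[doc_id])))
    # Highest similarity first; ties broken by document ID for stable output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return dfs, scores[:top_k]


def format_output(dfs, results, docs):
    # Mirrors the test format: df line, blank line, then ID/sim and content.
    lines = [" ".join(f"{t}:{d}" for t, d in dfs), ""]
    for doc_id, sim in results:
        lines += [f"{doc_id}: sim={sim:.5f}", docs[doc_id], ""]
    return "\n".join(lines)
```

Because `build_index` runs once and `search` only reads the index, repeated queries can be served from a simple input loop without re-reading the collection. Documents that share no weighted term with the query are omitted, so a query such as `aaaaa bbbb` yields df=0 for both terms and an empty result list.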