RegularExpressions.ppt

上传人:ga****84 文档编号:456479 上传时间:2018-10-09 格式:PPT 页数:23 大小:115KB
下载 相关 举报
RegularExpressions.ppt_第1页
第1页 / 共23页
RegularExpressions.ppt_第2页
第2页 / 共23页
RegularExpressions.ppt_第3页
第3页 / 共23页
RegularExpressions.ppt_第4页
第4页 / 共23页
RegularExpressions.ppt_第5页
第5页 / 共23页
点击查看更多>>
资源描述

1、Scanner,中正理工學院電算中心副教授許良全,Compiler Design,Overview of Scanning,The purpose of a scanner is to group input characters into tokens.A scanner is sometimes called a lexical analyzerA precise definition of tokens is necessary to ensure that lexical rules are properly enforced.Scanners normally seek to mak

2、e a token as long as possible. E.g. ABC is scanned as one identifier rather than threeAll scanners perform much the same functionusing scanner generator is to limit the effort in building a scanner from scratch,Compiler Design,Finite State Systems,The finite state automaton is a mathematical model o

3、f a system, with discrete input and outputs,Compiler Design,Examples of Finite State Systems,Elevatorsdo not remember all previous requests for service but only the current floor, the direction of motion, and the collection of not yet satisfied requests for serviceVending machinesinsert enough coins

4、 and youll get a Pepsi eventuallyComputersthe state of the CPU, main memory, and auxiliary storage at any time is one of a very large but finite number of statesHuman brains235 cells or neurons at most,Compiler Design,Definition of Finite Automata,A finite automaton (FA) is an idealized 5-tuple comp

5、uter that recognizes strings belonging to regular sets. (Q,q0,F)A finite set of states, QA finite input alphabet, , or vocabulary, V.A special start, or initial state, q0. q0Q.A set of final, or accepting states, F. FQ.A transition function, , that maps QF to Q.,Compiler Design,FA and Transition Dia

6、grams,Compiler Design,FA and Transition Tables,Compiler Design,Regular Expressions,The languages accepted by finite automata are easily described by simple expressions called regular expressions.Strings are built from characters in V via catenatione.g., !=, for, whileAn empty or null string, denoted

7、 by , is allowedThe characters, (, ), , *, +, and | are called meta-characters. They must be be quoted when used in order to avoid ambiguity. E.g.Delim = (|)|:=|;|,|+|-|*|/|=|$),Compiler Design,Definition of Regular Expression,A regular expression denotes a set of strings: is a regular expression de

8、noting the empty set (the set containing no strings). is a regular expression denoting the set that contains only the empty string.Note that this set contains one element.A string s is a regular expression denoting a set containing only s. If s contains meta-characters, s can be quoted to avoid ambi

9、guity.If A and B are regular expressions, then A|B, AB, and A* are also regular expressions, corresponding to alternation, catenation, and Kleene closure respectively.,Compiler Design,Properties of Regular Expressions,Let P and Q be a set of stringsThe string s (P|Q) iff s P or s QThe string s P* if

10、f s can be broken into zero or more pieces: s = s1s2s3sn such that each si P.P+ denotes all strings consisting one or more strings in P catenated togetherP* = (P+|) and P+ = PP* = P*PIf A is a set of characters, Not(A) denotes (V-A)all characters in V not included in A.If k is a constant, the set Ak

11、 represents all strings formed by catenating k strings from A, i.e., Ak = (AAA) (k copies),Compiler Design,Examples of Regular Expressions,Let D = (0|9), L = (A|Z)A comment that begins with - and ends with EolComment = -Not(Eol)*EolA fixed decimal literalLit = D+.D+An identifier, composed of letters

12、, digits, and underscores, that begins with a letter, ends with a letter or digit, and contains no consecutive underscoresID = L(L|D)*(_(L|D)+)*,Compiler Design,Using a Scanner Generator: Lex,Lex is a lexical analyzer generator developed by Lesk and Schmidt of AT&T Bell Lab, written in C, running un

13、der UNIX.Lex produces an entire scanner module that can be compiled and linked with other compiler modules.Lex associates regular expressions with arbitrary code fragments. When an expression is matched, the code segment is executed.A typical lex program contains three sections separated by % delimi

14、ters.,Compiler Design,First Section of Lex,The first section define character classes and auxiliary regular expression. (Fig. 3.5 on p. 67) delimits character classes- denotes ranges: xyz = = x-z denotes the escape character: as in C. complements a character class, (Not): xy denotes all characters e

15、xcept x and y.|, *, and + (alternation, Kleene closure, and positive closure) are provided.() can be used to control grouping of subexpressions.(expr)? = = (expr)|, i.e. matches Expr zero times or once. signals the macroexpansion of a symbol defined in the first section.,Compiler Design,First Sectio

16、n of Lex, cont.,Catenation is specified by the juxtaposition of two expressions; no explicit operator is used.abcd will match any of ad, ac, bc, and bd.begin = = “begin” = = begin,Compiler Design,Second Section of Lex,The second section of lex defines a table of regular expressions and corresponding

17、 commands.When an expression is matched, its associated command is executed.Auxiliary functions may be defined in the third section.Input that is matched is stored in the string variable yytext whose length is yyleng.Lex creates an integer function yylex() that may be called from the parser. The val

18、ue returned is usually the token code of the token scanned by Lex.When yylex() encounters end of file, it calls a use-supplied integer function named yywrap() to wrap up input processing.,Compiler Design,Dealing with Multiple Input Files,yylex() uses three user-defined functions to handle character

19、I/O:input(): retrieve a single character, 0 on EOFoutput(c): write a single character to the outputunput(c): put a single character back on the input to be re-read,Compiler Design,Translating Regular Expressions into Finite Automata,Remember the relationship between RE and FA.The main job of a scann

20、er generator program is to transform a regular expression definition into an equivalent (D)FA.A regular expression is first translated into a nondeterministic finite automaton (NFA), then translated from NFA into DFA. (2 steps)An NFA, when reading a particular input is not required to make a unique

21、(deterministic) choice of which state to visit.,Compiler Design,Translating RE into NFA,Any regular expression can be transformed into an NFA with the following properties:There is a unique final stateThe final state has no successorsEvery other state has either one or two successorsRegular expressi

22、ons are built out of the atomic regular expressions a (where a is a character in V) and by using the three operations AB, A|B, and A*.,Compiler Design,NFA for a and l,Compiler Design,An NFA for A|B,Compiler Design,An NFA for A B,Compiler Design,An NFA for A*,Compiler Design,Translating NFA into DFA,

23、Each state of DFA (M) corresponds to a set of states of NFA (N)transforming N to M is done by subset constructionM will be in state x,y,z after reading a given input string if and only if N could be in any of the states x, y, or z, depending on the transitions it chooses.M keeps track of all the possible routes N might take and runs them in parallel.,

展开阅读全文
相关资源
相关搜索

当前位置:首页 > 学术论文资料库 > 毕业论文

Copyright © 2018-2021 Wenke99.com All rights reserved

工信部备案号浙ICP备20026746号-2  

公安局备案号:浙公网安备33038302330469号

本站为C2C交文档易平台,即用户上传的文档直接卖给下载用户,本站只是网络服务中间平台,所有原创文档下载所得归上传人所有,若您发现上传作品侵犯了您的权利,请立刻联系网站客服并提供证据,平台将在3个工作日内予以改正。