Lecture 04: Lexical Analysis – Regular Expressions

58 flashcards    up804653
What are the components of a language?
Lexical structure, syntax, and semantics
What is meant by lexical structure?
The lexical structure of a language concerns the forms of its individual symbols (e.g. :=), keywords, identifiers, etc.
What is meant by syntax?
The syntax of a language defines the structure of its components, e.g. the structure of programs, statements (e.g. assignment), expressions, terms, etc.
What is meant by semantics?
The semantics of a language define the meaning and usage of its structures, including requirements that cannot be described by a grammar.
Language analysis consists of which two parts?
[1] A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar). [2] A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar).
What is lexical analysis?
A lexical analyser (scanner) reads the source program one character at a time and outputs tokens to the next phase of the compiler (the parser).
What does a lexical analyser do?
[1] It identifies substrings of the source program that belong together, called lexemes. [2] Lexemes match character patterns, each of which is associated with a lexical category called a token. [3] For example, sum is a lexeme; its token may be IDENT.
What is RE short for?
Regular expression
What is DFA short for?
Deterministic finite automaton
Name a use for DFAs
Deterministic finite automata (DFAs) can be used to implement a pattern-matching process.
What is the definition of an alphabet?
[1] An alphabet Σ is a finite non-empty set (of symbols). E.g.: [2] the set Σab = {a, b} is an alphabet comprising the symbols a and b; [3] the set Σaz = {a, ..., z} is the alphabet of lowercase English letters; [4] the set Σasc of all ASCII characters is an alphabet.
What is the definition of a string?
[1] A string or word over an alphabet Σ is a finite concatenation (or juxtaposition) of symbols from Σ. E.g.: [2] abba, aaa and baaaa are strings over Σab; [3] hello, abacab, and baaaa are strings over Σaz; [4] h$(e'lo, PjM#;, and baaaa are strings over Σasc.
How is the length of a string denoted?
[1] The length of a string w (that is, the number of symbols it has) is denoted |w|. E.g., |abba| = 4. [2] The empty or null string is denoted ε, and so |ε| = 0.
How is the set of all strings over Σ denoted?
The set of all strings over Σ is denoted Σ∗. E.g., Σ∗ab = {ε, a, b, aa, ab, ba, bb, aab, ...}.
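As a rough illustration of Σ∗, the sketch below enumerates the strings over an alphabet in order of length. The function name `kleene_star` is an assumption made for this example, and the empty Python string stands in for ε. Because Σ∗ is infinite, the generator never finishes on its own, so only a prefix is taken.

```python
from itertools import count, islice, product

def kleene_star(alphabet):
    """Yield every string over `alphabet`, shortest first (the stream is infinite)."""
    for n in count(0):                              # lengths 0, 1, 2, ...
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)                  # n == 0 yields "" (the empty string ε)

# Take only the first eight strings over the alphabet {a, b}.
first_eight = list(islice(kleene_star("ab"), 8))
print(first_eight)
```

The enumeration order here (by length, then alphabetical) is just one convenient choice; Σ∗ itself is a set and has no inherent order.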
How is the repeated concatenation of a string denoted?
For any symbol or string x, x^n denotes the concatenation of n copies of x. E.g. a^4 = aaaa and (ab)^4 = abababab.
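The power notation can be checked quickly in Python, where multiplying a string by an integer repeats it. This is only a sketch; the helper name `power` is made up for the example.

```python
def power(x: str, n: int) -> str:
    """Return x^n: the concatenation of n copies of x."""
    return x * n

assert power("a", 4) == "aaaa"
assert power("ab", 4) == "abababab"
assert power("ab", 0) == ""                     # zero copies give the empty string ε
assert len(power("ab", 4)) == 4 * len("ab")     # |x^n| = n * |x|
```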
Regular expressions s__ p___ of strings of symbols
Regular expressions specify patterns of strings of symbols
Regular expressions ___ ____ of strings of ___
Regular expressions specify patterns of strings of symbols
How is the set of strings matched by an RE r denoted?
The set of strings matched by an RE r is denoted L(r) ⊆ Σ∗ (Σ∗ being all the strings over the alphabet Σ) and is called the language determined or generated by r.
When does a regular expression match a set of strings?
We say a regular expression r matches (or is matched by) a set of strings if the patterns of the strings are those specified by the regular expression.
What is the regular expression ∅?
∅ (the symbol for the empty set or empty language, i.e. the set that contains nothing) is a regular expression. This RE matches no strings at all.
What is the regular expression ε?
ε (the empty string symbol) is a regular expression. This RE matches just the empty string ε.
What does "each symbol c ∈ Σ in the alphabet Σ is a regular expression" mean?
This RE matches the string consisting of just the symbol c.
Explain in words: which REs do the symbols of Σ = {a, b} give?
Both symbols a and b are REs: [1] a, which matches the string a; and [2] b, which matches the string b.
If r and s are regular expressions, is r | s a regular expression?
Yes
What is another way to write r | s?
r + s (read “r or s”)
Explain in words: a | b
The regular expression a | b matches the string a or the string b.
If r and s are regular expressions, what is the concatenation of r and s?
rs (read “r followed by s”)
Can brackets be used in regular expressions?
Yes. As with arithmetic expressions, parentheses can be used in REs to make the meaning of a regular expression clear.
Is r* a regular expression?
Yes (read “zero or more instances of r”)
What is r* in words?
r∗ (read “zero or more instances of r”) is an RE. It matches all finite (possibly empty) concatenations of strings matched by r.
What is rr*?
rr∗ matches one or more copies of strings matched by r.
Write rr* another way
r^+ (the ^ is only needed when typing; on paper it is written as r with a superscript +)
What does the regular expression a∗ match?
The strings ε, a, aa, aaa, ...
What does the RE (ab)∗ match?
The strings ε, ab, abab, ...
What does the RE (a|bb)∗ match?
The strings ε, a, bb, abb, bba, abba, ...
What does the RE (a|b)∗aab match?
Any string over {a, b} ending with aab
What does the RE (a|b)∗baa(a|b)∗ match?
Any string over {a, b} containing the substring baa.
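The example REs above can be tried out with Python's built-in `re` module, whose syntax for |, * and parentheses happens to coincide with the notation used here. This is a sketch under that assumption, not the lecture's own tooling. `re.fullmatch` is used because an RE in this sense must match the whole string, not just a part of it, and the helper name `matches` is made up for the example.

```python
import re

def matches(pattern: str, w: str) -> bool:
    """Return True if the whole string w is matched by the pattern."""
    return re.fullmatch(pattern, w) is not None

assert matches("a*", "")                      # the empty string ε
assert matches("a*", "aaa")
assert not matches("a*", "ab")

assert matches("(ab)*", "abab")
assert not matches("(ab)*", "aba")

assert matches("(a|bb)*", "abba")             # a, then bb, then a
assert not matches("(a|bb)*", "ab")           # a lone b is not allowed

assert matches("(a|b)*aab", "babaab")         # ends with aab
assert not matches("(a|b)*aab", "aaba")

assert matches("(a|b)*baa(a|b)*", "abaab")    # contains baa
assert not matches("(a|b)*baa(a|b)*", "abab")
```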
What is the precedence order for regular expressions?
[1] () has the highest precedence; [2] then ∗ (or +); [3] then concatenation; and [4] | has the lowest precedence.
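The effect of these precedence rules can be seen by comparing patterns with and without parentheses. The sketch below again assumes Python's `re` module, which follows the same precedence order for these operators; the helper name `matches` is made up for the example.

```python
import re

def matches(pattern: str, w: str) -> bool:
    """Return True if the whole string w is matched by the pattern."""
    return re.fullmatch(pattern, w) is not None

# * binds tighter than concatenation: ab* means a(b*), not (ab)*.
assert matches("ab*", "abbb")
assert not matches("ab*", "abab")
assert matches("(ab)*", "abab")

# Concatenation binds tighter than |: a|bc means a|(bc), not (a|b)c.
assert matches("a|bc", "a")
assert matches("a|bc", "bc")
assert not matches("a|bc", "ac")
assert matches("(a|b)c", "ac")
```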
How are REs used for lexical analysis?
Regular expressions provide us with a way to describe the patterns of a programming language.
Assuming the alphabet is Σasc (the set of all ASCII characters), what is a typical program pattern (regular expression) for the token IF?
The RE if, for the token IF.
Assuming the alphabet is Σasc (the set of all ASCII characters), what is a typical program pattern (regular expression) for a semicolon?
The RE ;, for the token SEMICOLON.
Assuming the alphabet is Σasc (the set of all ASCII characters), which token does the typical program pattern (0|1|2|3|4|5|6|7|8|9)+ describe?
(0|1|2|3|4|5|6|7|8|9)+ is the pattern for the token NUMBER.
Assuming the alphabet is Σasc (the set of all ASCII characters), which token does the typical program pattern (a|...|z|A|...|Z)(_|a|...|z|A|...|Z|0|...|9)∗ describe?
(a|...|z|A|...|Z)(_|a|...|z|A|...|Z|0|...|9)∗ is the pattern for the token IDENT.
We can give REs ____, which make REs easier to read and write, and can be used to define other regular definitions.
We can give REs names, which make REs easier to read and write, and can be used to define other regular definitions.
What are some examples of named regular expressions?
[1] letter = A | B | ... | Z | a | b | ... | z [2] digit = 0 | 1 | ... | 9 [3] ident = letter (_ | letter | digit)∗
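The named REs and token patterns above can be combined into a very small scanner. The following is a minimal sketch, not a definitive lexer, and it makes several assumptions that are not in the cards: Python's `re` module is used, the character classes `[A-Za-z]` and `[0-9]` are shorthand for the long alternations, whitespace is skipped between lexemes, and the keyword if is recognised by first matching an IDENT and then looking the lexeme up in a keyword table (so that a lexeme such as iffy stays an IDENT). The names `TOKEN_SPEC`, `KEYWORDS` and `tokenize` are made up for the example.

```python
import re

LETTER = "[A-Za-z]"     # shorthand for A | B | ... | Z | a | b | ... | z
DIGIT = "[0-9]"         # shorthand for 0 | 1 | ... | 9

TOKEN_SPEC = [
    ("NUMBER", f"{DIGIT}+"),
    ("IDENT", f"{LETTER}(?:_|{LETTER}|{DIGIT})*"),
    ("SEMICOLON", ";"),
    ("SKIP", r"\s+"),   # assumption: whitespace separates lexemes and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"if": "IF"}  # assumption: keywords are reclassified identifiers

def tokenize(source: str):
    """Return a list of (token, lexeme) pairs for the source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "IDENT":
            kind = KEYWORDS.get(lexeme, "IDENT")
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("if sum1; 42;"))
```

Real scanner generators typically compile such patterns into a DFA rather than calling a regex library at each position; the loop above only illustrates how lexemes are paired with tokens.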
What is the formal lexical definition of a language?
A language L over an alphabet Σ is a subset of Σ∗ (i.e. L ⊆ Σ∗).
Using the lexical definition of a language, what is {ε, aab, bb}?
{ε, aab, bb} is a language over Σab.
Using the lexical definition of a language, what is the set of all Java programs?
The set of all Java programs is a language over Σasc.
Using the lexical definition of a language, what is ∅?
∅ is the empty language (over any alphabet), with no strings.
Using the lexical definition of a language, what is {ε}?
{ε} is a language (over any alphabet) containing just the empty string.
Using the lexical definition of a language, what is Σ∗?
Σ∗ is a language over Σ for any alphabet Σ.
How is the language of an RE denoted?
L(r) for an RE r, e.g. L(a∗) = {ε, a, aa, aaa, ...}
Interpret L(a|b)
L(a|b) = L(a) ∪ L(b)
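The equation L(a|b) = L(a) ∪ L(b) says that | on REs corresponds to set union on languages. The sketch below models languages as Python sets of strings to make this concrete. The concatenation and star cases follow the descriptions of rs and r∗ given earlier; since L(r∗) is usually infinite, the star is cut off at a maximum length, which is an assumption of this sketch. All function names are made up for the example.

```python
def lang_union(A: set, B: set) -> set:
    """L(r | s) = L(r) ∪ L(s)."""
    return A | B

def lang_concat(A: set, B: set) -> set:
    """L(rs): every string of L(r) followed by every string of L(s)."""
    return {x + y for x in A for y in B}

def lang_star(A: set, max_len: int) -> set:
    """The strings of L(r*) of length <= max_len (the full language is usually infinite)."""
    result = {""}                 # zero copies: the empty string ε
    frontier = {""}
    while frontier:               # keep appending strings of A until nothing new fits
        frontier = {x + y for x in frontier for y in A if len(x + y) <= max_len} - result
        result |= frontier
    return result

L_a, L_b, L_bb = {"a"}, {"b"}, {"bb"}
assert lang_union(L_a, L_b) == {"a", "b"}                       # L(a|b)
assert lang_concat(L_a, L_b) == {"ab"}                          # L(ab)
assert lang_star(L_a, 3) == {"", "a", "aa", "aaa"}              # L(a*) up to length 3
assert lang_star(lang_union(L_a, L_bb), 3) == {"", "a", "bb", "aa", "abb", "bba", "aaa"}
```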
Explain a decision procedure for L
A decision procedure for a language L over some alphabet Σ is an algorithm that takes any input string w ∈ Σ∗ and: 1. outputs ‘Yes’ if w ∈ L, and 2. outputs ‘No’ if w ∉ L.
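As a small illustration, here is one possible decision procedure for the language L((a|b)∗aab), i.e. the strings over {a, b} ending with aab. It is only a sketch: the function name is made up, and answering ‘No’ for input containing symbols outside Σ is an assumption, since the definition only covers inputs from Σ∗.

```python
SIGMA_AB = {"a", "b"}

def decide_ends_with_aab(w: str) -> str:
    """Output 'Yes' if w is in L((a|b)*aab) and 'No' otherwise."""
    over_sigma = all(symbol in SIGMA_AB for symbol in w)   # assumption: reject foreign symbols
    return "Yes" if over_sigma and w.endswith("aab") else "No"

print(decide_ends_with_aab("babaab"))
print(decide_ends_with_aab("aaba"))
```

A decision procedure does not have to be an automaton; any algorithm that always halts with the right answer qualifies.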
Languages that can be denoted by an RE, and can have a DFA/NFA as a decision procedure, are known as _____ ______.
Languages that can be denoted by an RE, and can have a DFA/NFA as a decision procedure, are known as regular languages.
It has been shown that we can write a decision procedure for the language L(r) using one of which two algorithms?
[1] a Deterministic Finite Automaton (DFA), or [2] a Nondeterministic Finite Automaton (NFA).
What is a Nondeterministic Finite Automaton (NFA)?
An NFA is a Nondeterministic Finite Automaton. Nondeterministic means that it can transition to, and be in, multiple states at once (i.e. for some given input).
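One way to picture "being in multiple states at once" is to simulate an NFA by tracking the set of states it could currently be in. The sketch below uses a hypothetical four-state NFA for (a|b)∗aab; the state numbers and table layout are assumptions made for this example.

```python
# Hypothetical NFA for (a|b)*aab; state 3 is the only accepting state.
NFA_START = 0
NFA_ACCEPTING = {3}
NFA_DELTA = {
    (0, "a"): {0, 1},   # stay in 0, or guess that the final "aab" starts here
    (0, "b"): {0},
    (1, "a"): {2},
    (2, "b"): {3},
}

def nfa_accepts(w: str) -> bool:
    """Return True if some sequence of choices leads to an accepting state."""
    current = {NFA_START}
    for symbol in w:
        # Follow every possible transition from every state we might be in.
        current = {t for s in current for t in NFA_DELTA.get((s, symbol), set())}
    return bool(current & NFA_ACCEPTING)

assert nfa_accepts("aab")
assert nfa_accepts("babaab")
assert not nfa_accepts("aaba")
assert not nfa_accepts("")
```

Note that from state 0 on input a the table offers two targets, which is exactly what makes this machine nondeterministic.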
What is a Deterministic Finite Automaton (DFA)?
A DFA is a Deterministic Finite Automaton. Deterministic means that it can only be in, and transition to, one state at a time (i.e. for some given input).
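A DFA can be simulated with a single current state and a transition table. The sketch below uses a hypothetical four-state DFA for (a|b)∗aab, where each state records how much of the ending aab has been seen so far. The state numbers, and rejecting symbols outside {a, b}, are assumptions made for this example.

```python
# Hypothetical DFA for (a|b)*aab.
# State 0: no progress, 1: seen "a", 2: seen "aa", 3: seen "aab" (accepting).
DFA_START = 0
DFA_ACCEPTING = {3}
DFA_DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}

def dfa_accepts(w: str) -> bool:
    """Run the DFA on w; exactly one state is active at every step."""
    state = DFA_START
    for symbol in w:
        if (state, symbol) not in DFA_DELTA:
            return False          # assumption: symbols outside {a, b} are rejected
        state = DFA_DELTA[(state, symbol)]
    return state in DFA_ACCEPTING

assert dfa_accepts("aab")
assert dfa_accepts("aaab")
assert not dfa_accepts("aaba")
assert not dfa_accepts("")
```

Unlike the NFA case, every (state, symbol) pair over {a, b} has exactly one target, so the run needs no set of states and takes one table lookup per input symbol.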
