Homework 3
Last updated: Wed, 7 Oct 2020 12:06:23 -0400
Out: Wed Sept 30, 00:00 EST Due: Sun Oct 11, 23:59 EST (Note: extended deadline)
This is the last homework related to Chapter 1.
You will explore NFAs and their relation to regular expressions, an additional closure property of regular langs, and get practice distinguishing when a language is not regular.
Homework Problems
A Regular Expression Matcher (8 pts + 4 pts bonus)
The Backwards Operation (5 pts)
A Non-Regular language (5 pts)
A Regular and a Non-Regular Language (4 + 4 = 8 pts)
Total: 26 points (+ 4 bonus points)
Submitting
Note: This assignment has several non-code components, e.g., see The Backwards Operation, A Non-Regular language, or A Regular and a Non-Regular Language.
Submit the non-code part of this assignment at Gradescope hw3 non-code.
They must be pdf or plain text files and each file must be assigned to the correct problem in Gradescope.
Submit the code part of your solution to this assignment at Gradescope hw3.
The code submission must include the following files:- a Makefile with the following targets:
setup
run-hw3-regexp
a README containing the required information,
and the source code files needed by your Makefile.
1 Concatentation
Your Task
Implement concatentation for your NFAs. Specifically, write a function that, given:
an NFA recognizing language A_1, and
an NFA recognizing language A_2
To do this, you’ll need to be able to extract individual components of an NFA instance, e.g., the states, transitions, etc.
NOTE: When combining NFAs be careful with duplicate state names. Since the given files may use the same state names, it may be a good idea to first rename them to something unique.
There will not be any direct tests for this problem.
2 Kleene Star
Your Task
Implement the Kleene Star for your NFAs. Specifically, write a function that, given an NFA recognizing language A, returns an NFA recognizing A*.
To do this, you’ll need to be able to extract individual components of an NFA instance, e.g., the states, transitions, etc.
There will be not any direct tests for this problem.
3 A Regular Expression Matcher
Recall the formal definition (Def 1.52) of Regular Expressions from the textbook:
a for some a\in\textrm{alphabet }\Sigma,
the empty string \varepsilon,
the empty set \emptyset,
R_1 \cup R_2, sometimes written R_1\mid R_2, where R_1 and R_2 are regular expressions,
R_1 \circ R_2, sometimes written R_1R_2, where R_1 and R_2 are regular expressions,
R_1^*, where R_1 is a regular expression.
Following the proof of Lemma 1.55 from the textbook, you can use your Data Representation for NFAs from HW2, and the union, concatenation, and Kleene star operations you implemented, to easily create an NFA to match any regular expression!
In other words, you’ve implemented the equivalent of the (core of the) regular expression matcher found in nearly every programming language! Congratulations! (The regexp matcher included with mature PLs are of course much more efficient, and have many more bells and whistles than the theoretical models we are studying. But the basic behavior is the same.)
To demonstrate this, use Lemma 1.55 and the NFA closure operations you implemented to create NFAs for each of the regular expressions below (these are all from Example 1.53 in the textbook).
There are 12 examples total, but you only need to submit 8 to get full credit (every example beyond 8 will be extra credit).
Assume that \Sigma = \{0,1\}, a ^+ means "1 or more", and a \Sigma in a regular expression below means 0\cup 1.
0^*10^* = \{w \mid w \textrm{ contains a single } 1\}
\Sigma^*1\Sigma^* = \{w \mid w \textrm{ has at least one } 1\}
\Sigma^*001\Sigma^* = \{w \mid w \textrm{ contains the string } 001 \textrm{ as a substring}\}
1^*(01^+)^* = \{w \mid \textrm{every } 0 \textrm{ in } w \textrm{ is followed by at least one } 1\}
(\Sigma\Sigma)^* = \{w \mid w \textrm{ is a string of even length}\}
(\Sigma\Sigma\Sigma)^* = \{w \mid \textrm{the length of } w \textrm{ is a multiple of } 3\}
01 \cup 10 = \{01,10\}
0\Sigma^*0\cup 1\Sigma^*1\cup 0\cup 1 = \{w\mid w \textrm{ starts and ends with the same symbol}\}
(0\cup\varepsilon)1^* = 01^*\cup1^*
(0\cup\varepsilon)(1\cup\varepsilon) = \{\varepsilon,0,1,01\}
1^*\emptyset = \emptyset
\emptyset^* = \{\varepsilon\}
To show you that each NFA you create truly recognizes the same language as a PL’s regexp matcher, the grader will test your submission using the regexp implementation from an actual PL (e.g., could be Perl, Python, Java, Racket, etc.).
Specifically, your solution will be tested as follows:
Input (from stdin): a number, from 1 to 12 (inclusive), corresponding to one of the regular expression examples from above
Expected Output (to stdout): An XML automaton element representing an NFA that recognizes the same language (for strings up to length 5) as the regular expression corresponding to the input number.
To be correct, however, this NFA must have been constructed from smaller NFAs using union, concat, and star as described in Lemma 1.55. We will inspect your source code to ensure this.
To further illustrate what I mean, here is a snippet of (Racket) code from my solution:"Part of my solution, to illustrate what I want"
(define N0 (single-char-nfa "0")) ; NFA recognizing {"0"}
(define N1 (single-char-nfa "1")) ; NFA recognizing {"1"}
(define N0* (nfa* N0)) ; NFA recognizing {"", "0", "00", "000", ...}
(case (read-stdin)
["1" (nfa->xml (nfa-concats N0* N1 N0*))]
....)
Makefile target name: run-hw3-regexp
Example:
printf "1" | make -sf Makefile run-hw3-regexp
Output:<automaton>
<state id="0" name="q164"><initial/></state>
<state id="1" name="N01"></state>
<state id="2" name="N02"></state>
<state id="3" name="N13178"></state>
<state id="4" name="N14179"></state>
<state id="5" name="q164180"><final/></state>
<state id="6" name="N01181"></state>
<state id="7" name="N02182"><final/></state>
<transition><from>0</from><to>3</to><read/></transition>
<transition><from>0</from><to>1</to><read/></transition>
<transition><from>1</from><to>2</to><read>0</read></transition>
<transition><from>2</from><to>3</to><read/></transition>
<transition><from>2</from><to>1</to><read/></transition>
<transition><from>3</from><to>4</to><read>1</read></transition>
<transition><from>4</from><to>5</to><read/></transition>
<transition><from>5</from><to>6</to><read/></transition>
<transition><from>6</from><to>7</to><read>0</read></transition>
<transition><from>7</from><to>6</to><read/></transition>
</automaton>
More Hints:
To further help you, below are the exact regexp patterns (each corresponds to a numbered example from above) that the grader will use with a regexp matcher to compare to your output NFA (the grader will only check strings up to length 5).
Note that in typical regexp matching, a union of single chars, e.g., 0 \cup 1, is typically written [01], while the "vertical bar" \mid is a more general union operation.
"0*10*"
"[01]*1[01]*"
"[01]*001[01]*"
"1*(01+)*"
"([01][01])*"
"([01][01][01])*"
"01|10"
"0[01]*0|1[01]*1|0|1"
"01*|1*"
"^$|^0$|^1$|^01$"
(won’t work without the ^ or $, which stand for "string start" and "string end", respectively, demonstrating that there are some differences between the theoretical regular expressions we study in class, and the regexp matching available in programming languages)
"\b\B" (pattern that won’t match anything)
""
With these, you should be able to easily test your solution on your own using the regexp library in your favorite language. Make sure to choose "exact match" ("partial match" is the default in some langs and will not recognize the same language), which corresponds to the textbook’s notion of matching. Specifically, for any given string in the language of alphabet \Sigma = \{0,1\} (up to length 5), your NFA should accept the string if and only if the equivalent regexp pattern matches on the string.
Finally, if you still need more explanations, the internet has plenty of sites dedicated to learning about, and interactively exploring regexps, e.g., regexr.com or regex101.com.
4 The Backwards Operation
Define the Backwards (BW) operation of a language to be BW(L) = \{c_n \ldots c_0 \mid c_0\ldots c_n \in L\}, c_i \in \Sigma.
Prove that regular languages are closed under the Backwards operation.
You may use any proof style (e.g., proof by construction, inductive proofs) and may assume any theorems proven in class or the book.
Also, you may use any representation of regular languages, e.g., DFA, NFA, regexp, etc.
You may write up your proof in any form but the submission must either be a pdf or readable as plain text. Also, make sure you assign it to the proper problem in Gradescope when submitting.
5 A Non-Regular language
Prove that the language L = \{www \mid w\in \{0,1\}^*\} is not regular.
6 A Regular and a Non-Regular Language
The language L_1 = \{a^nsa^n\mid n > 0\textrm{ and }s\in\Sigma^*\} is regular.
The language L_2 = \{a^nbsa^n\mid n > 0\textrm{ and }s\in\Sigma^*\}, however, is not regular.
You may assume \Sigma = \{a,b\}.
Prove that L_1 is regular.
Prove that L_2 is not regular.