Assignment 9: FSA recognition (and a bit more)
This is an ungraded assignment. You are not required to submit this assignment. However, you should still follow the link at the private course page to work on your own repository. You are also strongly recommended to try to use git properly/productively: commit every ‘unit’ of work (e.g., individual exercises) and bug fixes separately. The workflow instructed below assumes that you are doing this.
In this assignment, you will implement a simple FSA recognizer, and a few associated utilities. The skeleton of the code is given in fsa.py. Please complete the code by writing the parts indicated in the exercises below.
1. Read an AT&T formatted FSA file
A common (exchange) format for storing finite-state automata is so-called AT&T-format, which was originally used by the tools developed in AT&T research labs. In this set of exercises, we will use a simple (but compatible) version of the AT&T formatted files. An example finite state automaton (acceptor) in AT&T format is given below (example.fsa).
0 0 a
0 1 b
1 2 a
2
The lines with 3 tab-separated items indicate transitions, where the first item is the source state, the second one is the target state, and the third item is the input symbol (edge label). By convention, the first (source) state on the first line of the file is the initial state. The lines with a single item lists the accepting states.
The file contents listed above correspond to the following automaton.
Task: implement the method read_att
,
which takes a file name whose content is as described above,
and populates the transition matrix
and the other relevant members of the FSA
class.
Your method should throw an exception if the FSA is not deterministic.
2. DFA recognition
Write a method called recognize()
that returns True
if the given string argument is accepted by the FSA,
False
otherwise.
3. DFA generation
Implement the method generate()
which
generates a string from the language of the FSA
by following a path randomly.
You are free to choose the mechanisms for generation
(a simple method is suggested in the assignment template).
You can assume that there is no sink state (or any other ‘useless’ states).
TIP: you may find the random.choice()
and/or random.random()
functions from
the standard
random library useful.
4. FSA visualization
While working with FSA, it is often useful to visualize the automata.
Implement the method to_dot()
, that writes a ‘dot’ representation of the FSA.
‘dot’ is a language defined by Graphviz, a very useful tool for visualizing graphs. ‘dot’ can define directed or undirected graphs, which can then be visualized or conveted to well-known vector (e.g., svg, pdf) or bitmap (e.g., png) formats. For example, the example automaton above can be represented in the ‘dot language’ with the following listing (example.dot)
digraph {
rankdir = LR;
start[style=invis];
node[shape=circle];
start -> 0;
0 -> 0 [label="a"];
0 -> 1 [label="b"];
1 -> 2 [label="a"];
2 [shape=doublecircle];
}
You can, for example, convert the dot file with the content above
to a png file on the command line with dot example.dot -Tpng -o example.png
.
The example above includes all you need for this exercise. However, if you want to style your graphs, you can find further documentation and tutorials from Graphviz web site.
Git TIP: the DFA version of your FSA class is now complete.
The exercises below will require changes to your implementation.
You may want to tag the state of your repository with a symbolic
tag, e.g., git tag dfa
, so that you can return to current version of your code easily.
5. Working with NFA
Modify the FSA class to handle arbitrary FSA (optionally including ϵ-NFA). You should,
- make sure that your
transitions
can represent transitions of an NFA - update the instance variable
is_deterministic
while constructing the automata, such that it isTrue
if the automata is determinisitc andFalse
if non-deterministic - update the recognize function to make sure it uses a deterministic algorithm for DFA, and uses an agenda-based backtracking algorithm for NFA
If you decide to handle ϵ-NFA,
you should also change your code to treat the string @0@
in AT&T files as ϵ.
You can assume that the input FSA do not contain ϵ-loops.
6. Return a determinized version of the given FSA
Implement the method determinize()
that
determinizes the FSA using subset construction if it is non-deterministic.