Assignment 9: FSA recognition (and a bit more)

This is an ungraded assignment. You are not required to submit this assignment. However, you should still follow the link at the private course page to work on your own repository. You are also strongly recommended to try to use git properly/productively: commit every ‘unit’ of work (e.g., individual exercises) and bug fixes separately. The workflow instructed below assumes that you are doing this.

In this assignment, you will implement a simple FSA recognizer, and a few associated utilities. The skeleton of the code is given in fsa.py. Please complete the code by writing the parts indicated in the exercises below.

1. Read an AT&T formatted FSA file

A common (exchange) format for storing finite-state automata is so-called AT&T-format, which was originally used by the tools developed in AT&T research labs. In this set of exercises, we will use a simple (but compatible) version of the AT&T formatted files. An example finite state automaton (acceptor) in AT&T format is given below (example.fsa).

0	0	a
0	1	b
1	2	a
2

The lines with 3 tab-separated items indicate transitions, where the first item is the source state, the second one is the target state, and the third item is the input symbol (edge label). By convention, the first (source) state on the first line of the file is the initial state. The lines with a single item lists the accepting states.

The file contents listed above correspond to the following automaton.

Task: implement the method read_att, which takes a file name whose content is as described above, and populates the transition matrix and the other relevant members of the FSA class. Your method should throw an exception if the FSA is not deterministic.

2. DFA recognition

Write a method called recognize() that returns True if the given string argument is accepted by the FSA, False otherwise.

3. DFA generation

Implement the method generate() which generates a string from the language of the FSA by following a path randomly. You are free to choose the mechanisms for generation (a simple method is suggested in the assignment template). You can assume that there is no sink state (or any other ‘useless’ states). TIP: you may find the random.choice() and/or random.random() functions from the standard random library useful.

4. FSA visualization

While working with FSA, it is often useful to visualize the automata. Implement the method to_dot(), that writes a ‘dot’ representation of the FSA.

‘dot’ is a language defined by Graphviz, a very useful tool for visualizing graphs. ‘dot’ can define directed or undirected graphs, which can then be visualized or conveted to well-known vector (e.g., svg, pdf) or bitmap (e.g., png) formats. For example, the example automaton above can be represented in the ‘dot language’ with the following listing (example.dot)

digraph {
    rankdir = LR;
    start[style=invis];
    node[shape=circle];
    start -> 0;
    0 -> 0 [label="a"];
    0 -> 1 [label="b"];
    1 -> 2 [label="a"];
    2 [shape=doublecircle];
}

You can, for example, convert the dot file with the content above to a png file on the command line with dot example.dot -Tpng -o example.png.

The example above includes all you need for this exercise. However, if you want to style your graphs, you can find further documentation and tutorials from Graphviz web site.

Git TIP: the DFA version of your FSA class is now complete. The exercises below will require changes to your implementation. You may want to tag the state of your repository with a symbolic tag, e.g., git tag dfa, so that you can return to current version of your code easily.

5. Working with NFA

Modify the FSA class to handle arbitrary FSA (optionally including ϵ-NFA). You should,

If you decide to handle ϵ-NFA, you should also change your code to treat the string @0@ in AT&T files as ϵ. You can assume that the input FSA do not contain ϵ-loops.

6. Return a determinized version of the given FSA

Implement the method determinize() that determinizes the FSA using subset construction if it is non-deterministic.