-
Notifications
You must be signed in to change notification settings - Fork 67
Blog post automata #433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Blog post automata #433
Changes from 4 commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
2d68942
Automata package blog post
eliotwrobson 5f45a1e
Create eliot-w-robson.jpg
eliotwrobson 4e409b4
Update 2024-07-10-automata.md
eliotwrobson 2390ad1
Update 2024-07-10-automata.md
eliotwrobson b2f4118
Update _posts/2024-07-10-automata.md
eliotwrobson File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
--- | ||
layout: single | ||
title: "automata: Simulation and manipulation" | ||
excerpt: automata is a package implementing structures and algorithms for manipulating finite automata, pushdown automata, and Turing machines, that was recently accepted into the pyOpenSci ecosystem. | ||
author: Eliot W. Robson | ||
permalink: /blog/automata | ||
header: | ||
overlay_image: /images/automata/finite_language_dfa.png | ||
overlay_filter: rgba(20, 13, 36, 0.8) | ||
categories: | ||
- blog-post | ||
- automata | ||
- formal-languages | ||
- models-of-computation | ||
comments: true | ||
--- | ||
|
||
|
||
Automata are abstract machines used to represent models of computation, and are a central object of study in theoretical computer science. Given an input string of characters over a fixed alphabet, these machines either accept or reject the string. A language corresponding to an automaton is | ||
the set of all strings it accepts. Three important families of automata in increasing order of generality are the following: | ||
|
||
1. Finite-state automata | ||
2. Pushdown automata | ||
3. Turing machines | ||
|
||
The [`automata`](https://caleb531.github.io/automata/) package facilitates working with these families by allowing simulation of reading input and higher-level manipulation | ||
of the corresponding languages using specialized algorithms. For an overview on automata theory, see [this Wikipedia article](https://en.wikipedia.org/wiki/Automata_theory), and | ||
for a more comprehensive introduction to each of these topics, see [these lecture notes](https://jeffe.cs.illinois.edu/teaching/algorithms/#models). | ||
|
||
## Statement of need | ||
|
||
Automata are a core component of both computer science education and research, seeing further theoretical work | ||
and applications in a wide variety of areas such as computational biology and networking. | ||
Consequently, the manipulation of automata with software packages has seen significant attention from | ||
researchers in the past. The similarly named Mathematica package [`Automata`](https://www.cs.cmu.edu/~sutner/automata.html) implements a number of | ||
algorithms for use with finite-state automata, including regular expression conversion and binary set operations. | ||
In Java, the [Brics package](https://www.brics.dk/automaton/) implements similar algorithms, while the [JFLAP package](https://www.jflap.org/) places an emphasis | ||
on interactivity and simulation of more general families of automata. | ||
|
||
[`automata`](https://caleb531.github.io/automata/) serves the demand for such a package in the Python software ecosystem, implementing algorithms and allowing for | ||
simulation of automata in a manner comparable to the packages described previously. As a popular high-level language, Python enables | ||
significant flexibility and ease of use that directly benefits many users. The package includes a comprehensive test suite, | ||
support for modern language features (including type annotations), and has a large number of different automata, | ||
meeting the demands of users across a wide variety of use cases. In particular, the target audience | ||
is both researchers that wish to manipulate automata, and for those in educational contexts to reinforce understanding about how these | ||
models of computation function. | ||
|
||
## Package features | ||
|
||
The API of the package is designed to mimic the formal mathematical description of each automaton using built-in Python data structures | ||
(such as sets and dicts). This is for ease of use by those that are unfamiliar with these models of computation, while also providing performance | ||
suitable for tasks arising in research. In particular, algorithms in the package have been written for tackling | ||
performance on large inputs, incorporating optimizations such as only exploring the reachable set of states | ||
in the construction of a new finite-state automaton. The package also has native display integration with Jupyter | ||
notebooks, enabling easy visualization that allows students to interact with [`automata`](https://caleb531.github.io/automata/) in an exploratory manner. | ||
|
||
Of note are some commonly used and technical algorithms implemented in the package for finite-state automata: | ||
|
||
- An optimized version of the Hopcroft-Karp algorithm to determine whether two deterministic finite automata (DFA) are equivalent. | ||
|
||
- The product construction algorithm for binary set operations (union, intersection, etc.) on the languages corresponding to two input DFAs. | ||
|
||
- Thompson's algorithm for converting regular expressions to equivalent nondeterministic finite automata (NFA). | ||
|
||
- Hopcroft's algorithm for DFA minimization. | ||
|
||
- A specialized algorithm for directly constructing a state-minimal DFA accepting a given finite language. | ||
|
||
- A specialized algorithm for directly constructing a minimal DFA recognizing strings containing a given substring. | ||
|
||
To the authors' knowledge, this is the only Python package implementing all of the automata manipulation algorithms stated above. | ||
|
||
## Example usage | ||
|
||
 | ||
|
||
The following example is inspired by the use case described in [this blog post](http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata). | ||
We wish to determine which strings in a given set are within the target edit distance | ||
to a reference string. We will first initialize a DFA corresponding to a fixed set of target words | ||
over the alphabet of all lowercase ascii characters. | ||
|
||
```python | ||
from automata.fa.dfa import DFA | ||
from automata.fa.nfa import NFA | ||
import string | ||
|
||
target_words_dfa = DFA.from_finite_language( | ||
input_symbols=set(string.ascii_lowercase), | ||
language={'these', 'are', 'target', 'words', 'them', 'those'}, | ||
) | ||
``` | ||
|
||
Next, we construct an NFA recognizing all strings within a target edit distance of a fixed | ||
reference string, and then immediately convert this to an equivalent DFA. The package provides | ||
builtin functions to make this construction easy, and we use the same alphabet as the DFA that was just created. | ||
|
||
```python | ||
words_within_edit_distance_dfa = DFA.from_nfa( | ||
NFA.edit_distance( | ||
input_symbols=set(string.ascii_lowercase), | ||
reference_str='they', | ||
max_edit_distance=2, | ||
) | ||
) | ||
``` | ||
|
||
Finally, we take the intersection of the two DFAs we have constructed and read all of | ||
the words in the output DFA into a list. The library makes this straightforward and idiomatic. | ||
|
||
```python | ||
found_words_dfa = target_words_dfa & words_within_edit_distance_dfa | ||
found_words = list(found_words_dfa) | ||
``` | ||
|
||
The DFA `found_words_dfa` accepts strings in the intersection of the languages of the | ||
DFAs given as input, and `found_words` is a list containing this language. Note the power of this | ||
technique is that the DFA `words_within_edit_distance_dfa` | ||
has an infinite language, meaning we could not do this same computation just using the builtin | ||
sets in Python directly (as they always represent a finite collection), although the | ||
syntax used by [`automata`](https://caleb531.github.io/automata/) is very similar to promote intuition. | ||
|
||
## Citing | ||
|
||
This post is adapted from [our JOSS paper](https://joss.theoj.org/papers/10.21105/joss.05759), which should be used for citations. |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.