Aho-Corasick algorithm

The Aho–Corasick string matching algorithm is a string searching algorithm invented by Alfred V. Aho and Margaret J. Corasick. It is a kind of dictionary-matching algorithm that locates elements of a finite set of strings (the "dictionary") within an input text. It matches all patterns "at once", so its running time is linear in the total length of the patterns plus the length of the searched text plus the number of output matches. Because every match is reported, the number of matches can be quadratic in the length of the text when every substring matches (e.g. the dictionary is a, aa, aaa, aaaa and the input string is aaaa).
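The quadratic case can be checked by hand with a small sketch. The function below is a deliberately naive brute-force counter, not the Aho–Corasick algorithm; the name count_matches is chosen here for illustration only.

```python
def count_matches(dictionary, text):
    """Count every (start position, word) pair where word occurs in text.

    Brute force: tries each dictionary word at each start position.
    """
    return sum(
        1
        for start in range(len(text))
        for word in dictionary
        if text.startswith(word, start)
    )


# 4 matches start at position 0, 3 at position 1, 2 at position 2, 1 at position 3.
print(count_matches(["a", "aa", "aaa", "aaaa"], "aaaa"))  # prints 10
```

For a text of n copies of a and a dictionary holding every run of a up to length n, the count is n(n+1)/2, which is why the output term appears in the running time.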

Informally, the algorithm constructs a trie augmented with a suffix-tree-like set of links: each node, representing a string (e.g. abc), links to the node corresponding to its longest proper suffix that is also in the trie (e.g. bc if it exists, else c if that exists, else the root). Each node also has a link to the longest suffix node that corresponds to a dictionary entry; thus all of the matches may be enumerated by following the resulting linked list. At runtime the algorithm moves along the input while tracking the longest matching node, using the suffix links to keep the computation linear. For every node that is in the dictionary and every node along its dictionary suffix linked list, an output is generated.
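The construction and the search described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the parallel-list layout, and the (start index, word) output format are all assumptions made here, and dictionary words are assumed to be non-empty.

```python
from collections import deque


def build_automaton(words):
    """Build the trie, suffix links and dictionary suffix links."""
    # Node 0 is the root; each list is indexed by node id.
    children = [{}]       # char -> child node id
    suffix = [0]          # longest proper suffix that is also a trie node
    dict_suffix = [None]  # nearest node on the suffix chain that ends a word
    word = [None]         # dictionary entry ending at this node, if any

    for w in words:
        node = 0
        for ch in w:
            if ch not in children[node]:
                children.append({})
                suffix.append(0)
                dict_suffix.append(None)
                word.append(None)
                children[node][ch] = len(children) - 1
            node = children[node][ch]
        word[node] = w

    # Breadth-first pass: links of shallower nodes are final before use.
    queue = deque(children[0].values())  # depth-1 nodes link to the root
    while queue:
        node = queue.popleft()
        for ch, child in children[node].items():
            s = suffix[node]
            while s and ch not in children[s]:
                s = suffix[s]
            suffix[child] = children[s].get(ch, 0)
            queue.append(child)
        s = suffix[node]
        dict_suffix[node] = s if word[s] is not None else dict_suffix[s]
    return children, suffix, dict_suffix, word


def find_all(text, automaton):
    """Return (start index, word) for every dictionary match in text."""
    children, suffix, dict_suffix, word = automaton
    node, hits = 0, []
    for i, ch in enumerate(text):
        # Fall back along suffix links until ch can be consumed.
        while node and ch not in children[node]:
            node = suffix[node]
        node = children[node].get(ch, 0)
        # Report this node (if a word) and its dictionary suffix chain.
        out = node if word[node] is not None else dict_suffix[node]
        while out is not None:
            hits.append((i - len(word[out]) + 1, word[out]))
            out = dict_suffix[out]
    return hits


print(find_all("abc", build_automaton(["abc", "bc", "c"])))
# prints [(0, 'abc'), (1, 'bc'), (2, 'c')]
```

With the dictionary abc, bc, c, the node for abc has a suffix link to bc, which in turn links to c, so a single pass over the text abc reports all three entries when the final character is read.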

When the pattern dictionary is known in advance (e.g. a computer virus database), the construction of the automaton can be performed once off-line and the compiled automaton stored for later use. In this case, its run time is linear in the length of the input plus the number of matched entries.

The Aho–Corasick string matching algorithm formed the basis of the original Unix command fgrep.

The following is the Aho–Corasick data structure constructed from the specified dictionary, with each row in the table representing a node in the trie, with the column path indicating the (unique) sequence of characters from the root to the node.

At each step, the algorithm tries to extend the current node by following the child for the next input character. If no such child exists, it tries the corresponding child of the node's suffix, then of the suffix's suffix, and so on, ending at the root node if no node along the chain has a matching child.
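The transition rule can be sketched compactly by naming each trie node after its path string, so that the trie is simply the set of all prefixes of dictionary words. This representation, the helper names, and the sample dictionary (a, ab, bc, bca, c, caa) are assumptions made for illustration. Suffix links are recomputed by scanning here, which is simple but not linear time.

```python
def make_nodes(dictionary):
    """Trie nodes as path strings: every prefix of every word, plus the root ''."""
    return {w[:i] for w in dictionary for i in range(len(w) + 1)}


def suffix_link(node, nodes):
    """Longest proper suffix of node that is itself a trie node."""
    for i in range(1, len(node) + 1):
        if node[i:] in nodes:
            return node[i:]
    return None  # only reached for the root


def step(node, ch, nodes):
    """Follow the child for ch, falling back along suffix links if needed."""
    while node + ch not in nodes:
        if node == "":
            return ""  # no node has a child for ch: stay at the root
        node = suffix_link(node, nodes)
    return node + ch


def trace(text, dictionary):
    """List the node reached after each input character."""
    nodes = make_nodes(dictionary)
    node, path = "", []
    for ch in text:
        node = step(node, ch, nodes)
        path.append(node)
    return path


print(trace("abccab", ["a", "ab", "bc", "bca", "c", "caa"]))
# prints ['a', 'ab', 'bc', 'c', 'ca', 'ab']
```

On the third character, ab has no child c, so the sketch falls back to its suffix b and reaches bc. On the fourth, bc falls back to c and then to the root before c is matched again.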

Execution on input string abccab yields the following steps:

In general, more than one dictionary suffix link may need to be followed, as more than one dictionary entry may end at a given character in the input.
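Walking the dictionary suffix chain can be illustrated with nodes named by their path strings. The helper names and the sample dictionary are assumptions for this sketch, and the links are found by scanning rather than being precomputed as a real automaton would do.

```python
def dict_link(node, dictionary):
    """Longest proper suffix of node that is a dictionary entry, or None."""
    for i in range(1, len(node)):
        if node[i:] in dictionary:
            return node[i:]
    return None


def outputs(node, dictionary):
    """All dictionary entries that end at the current node, longest first."""
    found = [node] if node in dictionary else []
    link = dict_link(node, dictionary)
    while link is not None:  # follow the linked list of dictionary suffixes
        found.append(link)
        link = dict_link(link, dictionary)
    return found


dictionary = {"a", "ab", "bc", "bca", "c", "caa"}
print(outputs("bca", dictionary))  # prints ['bca', 'a']
print(outputs("bc", dictionary))   # prints ['bc', 'c']
```

In both calls two entries end at the same input position, so the chain must be followed to its end rather than stopping at the first entry found.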

Retrieved from "http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm"

All text is available under the terms of the GNU Free Documentation License.