String Searching Algorithms
1 Searching for single strings
1.1 Knuth-Morris-Pratt
kmp-string-contains
kmp-string-contains-ci
kmp-make-matcher
kmp-matcher?
kmp-matcher-ci?
kmp-matcher-pattern
kmp-find-string
kmp-find-all-strings
1.2 Boyer-Moore-Horspool
bmh-string-contains
bmh-string-contains-ci
bmh-make-matcher
bmh-matcher?
bmh-matcher-ci?
bmh-matcher-pattern
bmh-find-string
bmh-find-all-strings
2 Searching for multiple strings
2.1 Aho-Corasick
ahoc-make-matcher
ahoc-matcher?
ahoc-matcher-patterns
ahoc-find-string
ahoc-find-allstrings
8.6

String Searching Algorithms

    1 Searching for single strings

    2 Searching for multiple strings

 (require string-searchers) package: string-searchers

Provides a variety of string search algorithms written in Typed Racket. They look for sequences of exact code points, not equivalencies. When using with non-ASCII text, consider normalizing strings first.

1 Searching for single strings

Functions for searching for a single substring in strings.

1.1 Knuth-Morris-Pratt

Search for strings using the Knuth-Morris-Pratt algorithm. These functions are also available in the string-searchers/kmp module, without the kmp- prefix.

procedure

(kmp-string-contains haystack    
  needle    
  [start    
  end])  (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(kmp-string-contains-ci haystack    
  needle    
  [start    
  end])  (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, ignoring case, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(kmp-make-matcher pattern 
  [#:case-insensitive case-insensitive?]) 
  kmp-matcher
  pattern : String
  case-insensitive? : Boolean = #f
Compiles a search string into a matcher object, optionally using case-insensitive searching.

procedure

(kmp-matcher? obj)  Boolean

  obj : Any
Returns true if its argument is a kmp-matcher object.

procedure

(kmp-matcher-ci? m)  Boolean

  m : kmp-matcher
Tests if a matcher is case-insensitive or not

procedure

(kmp-matcher-pattern m)  String

  m : kmp-matcher
Returns the search string for this matcher.

procedure

(kmp-find-string m text [start end])  (Option Index)

  m : kmp-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
Return the index of the first occurance of the matcher’s search string in text, or #f if not found.

procedure

(kmp-find-all-strings m    
  text    
  [start    
  end    
  #:overlap overlap?])  (Listof Index)
  m : kmp-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
  overlap? : Boolean = #t
Return the indexs of all occurances of the matcher’s search string in text, or an empty list if not found. If the overlap option is true, found matches can overlap - searching for "bb" in "bbb" will return '(0 1) when true, (0) when false.

1.2 Boyer-Moore-Horspool

Search for strings using the Boyer-Moore-Horspool algorithm. These functions are also available in the string-searchers/bmh module, without the bmh- prefix.

procedure

(bmh-string-contains haystack    
  needle    
  [start    
  end])  (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(bmh-string-contains-ci haystack    
  needle    
  [start    
  end])  (Option Index)
  haystack : String
  needle : String
  start : Index = 0
  end : Index = (string-length haystack)
Returns the first index of haystack where needle occurs, ignoring case, or #f if not found. If searching for the same substring many times, prefer compiling a matcher object and using the following functions instead.

procedure

(bmh-make-matcher pattern 
  [#:case-insensitive case-insensitive?]) 
  bmh-matcher
  pattern : String
  case-insensitive? : Boolean = #f
Compiles a search string into a matcher object, optionally using case-insensitive searching.

procedure

(bmh-matcher? obj)  Boolean

  obj : Any
Returns true if its argument is a bmh-matcher object.

procedure

(bmh-matcher-ci? m)  Boolean

  m : bmh-matcher
Tests if a matcher is case-insensitive or not

procedure

(bmh-matcher-pattern m)  String

  m : bmh-matcher
Returns the search string for this matcher.

procedure

(bmh-find-string m text [start end])  (Option Index)

  m : bmh-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
Return the index of the first occurance of the matcher’s search string in text, or #f if not found.

procedure

(bmh-find-all-strings m    
  text    
  [start    
  end    
  #:overlap overlap?])  (Listof Index)
  m : bmh-matcher
  text : String
  start : Index = 0
  end : Index = (string-length text)
  overlap? : Boolean = #t
Return the indexes of all occurances of the matcher’s search string in text, or an empty list if not found. If the overlap option is true, found matches can overlap - searching for "bb" in "bbb" will return '(0 1) when true, '(0) when false.

2 Searching for multiple strings

Sections for searching for any of multiple different substrings in a string.

2.1 Aho-Corasick

Search for strings using the Aho-Corasick algorithm. These functions are also available in the string-searchers/ahoc module, without the ahoc- prefix.

procedure

(ahoc-make-matcher s0 s1 ...)  ahoc-matcher

  s0 : String
  s1 : String
Create a new Aho-Corasick matcher object that looks for the given string(s).

procedure

(ahoc-matcher? obj)  Boolean

  obj : Any
Tests if its argument is an Aho-Corasick matcher or not.

procedure

(ahoc-matcher-patterns m)  (Listof String)

  m : ahoc-matcher
Return the search strings this matcher looks for.

procedure

(ahoc-find-string m text)  (Option (Pair Index String))

  m : ahoc-matcher
  text : String
Returns the first index of a matched string and which string it is, or #f if there are no matches.

procedure

(ahoc-find-allstrings m text)  (List (Pair Index String))

  m : ahoc-matcher
  text : String
Returns the locations of all matched strings, or an empty list if none match. If one of the search strings is a prefix of another, multiple results can have the same index.