The Spencer regexp engine is shared between Postgres and Tcl.
Here is a link to the Postgres project's internal documentation of the engine: http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/regex/README;hb=HEAD
Another engine of interest, IMHO, is Google's RE2:
- See https://github.com/google/re2
- BSD license
- Full Unicode range (not just the BMP)
- On the negative side, it is written in C++.
- Article series on RE engines by the author: https://swtch.com/~rsc/regexp/regexp3.html (this is the last article in the series; it links to the others)
- An interesting point made in that last article: Unicode is supported in the UTF-8 domain by matching bytes, not characters, with the UTF-8 decoding integrated into the automaton. Character classes are handled the same way, by turning each class into a simple alternation of its characters and then optimizing the automaton (merging constant prefixes and suffixes). The article has examples, with pictures.
- I came across this again when reading a blog article about ripgrep (a grep written in Rust). That article had some other interesting ideas as well, such as extracting fixed sub-strings from the pattern (prefix, suffix, inner, multiple) and searching for them with a fast method (a SIMD Aho-Corasick derivative), as a means of reducing the candidate locations at which the full regexp is tried.
- We do have something like that in Expect, with fast-matching *gate-keeper* glob patterns, although we don't do the SIMD thing.
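
To make the byte-level treatment of character classes mentioned above more concrete, here is a minimal sketch in Python. It is not RE2's code: the names `build_byte_trie`, `match_class` and `END` are made up for illustration, and only the prefix merging is shown (a trie shares common lead bytes; merging common suffixes would need a further minimization step).

```python
END = "end"  # hypothetical marker key for an accepting trie node


def build_byte_trie(chars):
    """Build a trie over the UTF-8 encodings of the class members.

    Each node is a dict mapping a byte value (int) to the next node.
    Members that share lead bytes share trie nodes, which corresponds
    to merging constant prefixes in the automaton.
    """
    root = {}
    for ch in chars:
        node = root
        for byte in ch.encode("utf-8"):
            node = node.setdefault(byte, {})
        node[END] = True
    return root


def match_class(trie, data, pos=0):
    """Match one class member at data[pos], working on bytes only.

    Returns the offset just past the matched character, or -1.
    No separate UTF-8 decoding step is needed: the trie itself only
    accepts the byte sequences of the class members.
    """
    node = trie
    while pos < len(data):
        node = node.get(data[pos])
        if node is None:
            return -1
        pos += 1
        if END in node:
            # UTF-8 is prefix-free, so the first accepting node is final.
            return pos
    return -1


# Class [aéè€😀]: 1-, 2-, 3- and 4-byte encodings, including a non-BMP character.
trie = build_byte_trie("aéè€😀")
text = "x€".encode("utf-8")
end = match_class(trie, text, 1)  # try to match at byte offset 1
```

Here `é` (C3 A9) and `è` (C3 A8) share the lead byte C3, so they share one branch from the root, and the non-BMP character needs no special handling because it is just a longer byte sequence.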