Skip to content
Surf Wiki
Save to docs
general/computer-libraries

From Surf Wiki (app.surf) — the open knowledge base

TRE (computing)

Open-source library for pattern matching in text


Summary

Open-source library for pattern matching in text

FieldValue
nameTRE
authorVille Laurikari{{cite webwebsite=MIT.edu
urlhttp://web.mit.edu/r/current/arch/i386_linux26/lib/R/library/base/html/grepRaw.html
titleR: Pattern Matching for Raw Vectors}}
latest preview date
frequently updated
programming languageC
genreApproximate string matching
license2-clause BSD-like license
website

TRE is an open-source library for pattern matching in text,{{cite web

The library{{cite web

Features

TRE uses extended regular expression syntax with the addition of "directions" for matching preceding fragment in approximate way. Each of such directions specifies how many typos are allowed for this fragment.

Approximate matching{{cite conference |last1=Andoni |first1=Alexandr |first2=Robert |last2=Krauthgamer |first3=Krzysztof |last3=Onak

TypoExampleData
insertion of an extra characterregullar experessionextra l, extra e
missing a character from patternreglar expessionmissing u, missing r
replacement of some characterregolar exprezsionuo, sz

TRE allows specifying of cost for each of three typos type independently.

The project comes with a command-line utility, a reimplementation of agrep.

Though approximate matching requires some syntax extension, when this feature is not used, TRE works like most of other regular expression matching engines. This means that

  • it implements ordinary regular expressions written for strict matching;
  • programmers familiar with POSIX-style regular expressions need not do much study to be able to use TRE.

Predictable time and memory consumption

The library's author states that time spent for matching grows linearly with increasing of input text length, while memory requirement is constant during matching and does not depend on the input, only on the pattern.

Other

Other features, common for most regular expression engines could be checked in regex engines comparison tables or in list of TRE features on its web-page.

Usage example

Approximate matching directions are specified in curly brackets and should be distinguishable from repetitive quantifiers (possibly with inserting a space after opening bracket):

  • (regular){~1}\s+(expression){~2} would match variants of phrase "regular expression" in which "regular" have no more than one typo and "expression" no more than two; as in ordinary regular expressions "" means one or more space characters i.e. rogular ekspression would pass test;
  • `(expression){ 5i + 3d + 2s

Language bindings

Apart from C, TRE is usable through bindings for Perl, Python and Haskell. It is the default regular expression engine in R. However, if the project should be cross-platform, each target platform would need a separate interface.

Disadvantages

Since other regular expression engines usually do not provide approximate matching ability, there is almost no concurrent implementation with which TRE could be compared. However, there are a few things which programmers may wish to see implemented in future releases:{{cite arXiv |quote=practical improvements .. Lurikari algorithm, notably ..

  • a replacement mechanism for substituting matched text fragments (like in sed string processor and many modern implementations of regular expressions, including built into Perl or Java);
  • opportunity to use another approximate matching algorithm (than Levenshtein's) for better typo value assessment (for example Soundex), or at least this algorithm to be improved to allow typos of the "swap" type (see Damerau–Levenshtein distance).

References

References

  1. "TRE web-page - Regex Syntax".
  2. "Tre-agrep has all of grep's functionality but can also do ambiguous or fuzzy"
  3. "TRE web-page - About".
  4. "TRE web-page - FAQ".
  5. "Regular Expressions as used in R".
Wikipedia Source

This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.

Want to explore this topic further?

Ask Mako anything about TRE (computing) — get instant answers, deeper analysis, and related topics.

Research with Mako

Free with your Surf account

Content sourced from Wikipedia, available under CC BY-SA 4.0.

This content may have been generated or modified by AI. CloudSurf Software LLC is not responsible for the accuracy, completeness, or reliability of AI-generated content. Always verify important information from primary sources.

Report