Skip to content
Surf Wiki
Save to docs
general/compiling-tools

From Surf Wiki (app.surf) — the open knowledge base

Flex (lexical analyzer generator)

UNIX program for lexical analysis


Summary

UNIX program for lexical analysis

FieldValue
nameflex
developerVern Paxson
releasedaround
latest release version2.6.4
latest release date
operating systemUnix-like
genreLexical analyzer generator
licenseBSD license
website

Flex (fast lexical analyzer generator) is a free and open-source software alternative to lex. It is a computer program that generates lexical analyzers (also known as "scanners" or "lexers"). It is frequently used as the lex implementation together with Berkeley Yacc parser generator on BSD-derived operating systems (as both lex and yacc are part of POSIX), or together with GNU bison (a version of yacc) in *BSD ports and in Linux distributions. Unlike Bison, flex is not part of the GNU Project and is not released under the GNU General Public License, although a manual for Flex was produced and published by the Free Software Foundation.

History

Flex was written in C around 1987

Example lexical analyzer

This is an example of a Flex scanner for the instructional programming language PL/0.

The tokens recognized are: '+', '-', '*', '/', '=', '(', ')', ',', ';', '.', ':=', '', '', '='; numbers: 0-9 {0-9}; identifiers: a-zA-Z {a-zA-Z0-9} and keywords: begin, call, const, do, end, if, odd, procedure, then, var, while.

C
%{
#include "y.tab.h"
%}

digit         [0-9]
letter        [a-zA-Z]

%%
"+"                  { return PLUS;       }
"-"                  { return MINUS;      }
"*"                  { return TIMES;      }
"/"                  { return SLASH;      }
"("                  { return LPAREN;     }
")"                  { return RPAREN;     }
";"                  { return SEMICOLON;  }
","                  { return COMMA;      }
"."                  { return PERIOD;     }
":="                 { return BECOMES;    }
"="                  { return EQL;        }
"<>"                 { return NEQ;        }
"<"                  { return LSS;        }
">"                  { return GTR;        }
"<="                 { return LEQ;        }
">="                 { return GEQ;        }
"begin"              { return BEGINSYM;   }
"call"               { return CALLSYM;    }
"const"              { return CONSTSYM;   }
"do"                 { return DOSYM;      }
"end"                { return ENDSYM;     }
"if"                 { return IFSYM;      }
"odd"                { return ODDSYM;     }
"procedure"          { return PROCSYM;    }
"then"               { return THENSYM;    }
"var"                { return VARSYM;     }
"while"              { return WHILESYM;   }
{letter}({letter}|{digit})* {
                       yylval.id = strdup(yytext);
                       return IDENT;      }
{digit}+             { yylval.num = atoi(yytext);
                       return NUMBER;     }
[ \t\n\r]            /* skip whitespace */
.                    { printf("Unknown character [%c]\n",yytext[0]);
                       return UNKNOWN;    }
%%

int yywrap(void){return 1;}

Internals

Main article: Lexical analysis

These programs perform character parsing and tokenizing via the use of a deterministic finite automaton (DFA). A DFA is a theoretical machine accepting regular languages, and is equivalent to read-only right moving Turing machines. The syntax is based on the use of regular expressions. See also nondeterministic finite automaton.

Issues

Time complexity

A Flex lexical analyzer usually has time complexity O(n) in the length of the input. That is, it performs a constant number of operations for each input symbol. This constant is quite low: GCC generates 12 instructions for the DFA match loop. Note that the constant is independent of the length of the token, the length of the regular expression and the size of the DFA.

However, using the REJECT macro in a scanner with the potential to match extremely long tokens can cause Flex to generate a scanner with non-linear performance. This feature is optional. In this case, the programmer has explicitly told Flex to "go back and try again" after it has already matched some input. This will cause the DFA to backtrack to find other accept states. The REJECT feature is not enabled by default, and because of its performance implications its use is discouraged in the Flex manual.

Reentrancy

By default the scanner generated by Flex is not reentrant. This can cause serious problems for programs that use the generated scanner from different threads. To overcome this issue there are options that Flex provides in order to achieve reentrancy. A detailed description of these options can be found in the Flex manual.

Usage under non-Unix environments

Normally the generated scanner contains references to the unistd.h header file, which is Unix specific. To avoid generating code that includes unistd.h, %option nounistd should be used. Another issue is the call to isatty (a Unix library function), which can be found in the generated code. The %option never-interactive forces flex to generate code that does not use isatty.

Using flex from other languages

Flex can only generate code for C and C++. To use the scanner code generated by flex from other languages a language binding tool such as SWIG can be used.

Unicode support

Flex is limited to matching 1-byte (8-bit) binary values and therefore does not support Unicode. RE/flex and other alternatives do support Unicode matching.

Flex++

flex++ is a similar lexical scanner for C++ which is included as part of the flex package. The generated code does not depend on any runtime or external library except for a memory allocator (malloc or a user-supplied alternative) unless the input also depends on it. This can be useful in embedded and similar situations where traditional operating system or C runtime facilities may not be available.

The flex++ generated C++ scanner includes the header file FlexLexer.h, which defines the interfaces of the two C++ generated classes.

References

References

  1. Levine, John. (August 2009). "flex & bison". O'Reilly Media.
  2. (1992). "lex & yacc". [[O'Reilly Media.
  3. (1992). "lex & yacc". [[O'Reilly Media.
  4. Levine, John. (August 2009). "flex & bison". O'Reilly Media.
  5. OpenBSD. (2015-12-11). "src/usr.bin/lex/".
  6. "flex(1)".
  7. "yacc(1)".
  8. (2015-11-15). "bison-3.0.4 – GNU parser generator". [[OpenBSD ports]].
  9. [https://westes.github.io/flex/manual/Is-flex-GNU-or-not_003f.html#Is-flex-GNU-or-not_003f Is flex GNU or not?], flex FAQ
  10. "Flex - a scanner generator - Table of Contents - GNU Project - Free Software Foundation (FSF)".
  11. "Flex, version 2.5 A fast scanner generator Edition 2.5, March 1995".
  12. "Performance - Lexical Analysis With Flex, for Flex 2.5.37". Flex.sourceforge.net.
  13. "Reentrant - Lexical Analysis With Flex, for Flex 2.5.37". Flex.sourceforge.net.
  14. "Code-Level And API Options - Lexical Analysis With Flex, for Flex 2.5.37". Flex.sourceforge.net.
  15. Tomassetti, Gabriele. (2020-03-04). "Why you should not use (f)lex, yacc and bison".
Wikipedia Source

This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.

Want to explore this topic further?

Ask Mako anything about Flex (lexical analyzer generator) — get instant answers, deeper analysis, and related topics.

Research with Mako

Free with your Surf account

Content sourced from Wikipedia, available under CC BY-SA 4.0.

This content may have been generated or modified by AI. CloudSurf Software LLC is not responsible for the accuracy, completeness, or reliability of AI-generated content. Always verify important information from primary sources.

Report