TDParser internals
Beyond the Reference section, here is an in-depth description of
tdparser's internals.
Lexer helpers
This module holds the tdparser.Lexer class, which is available in
the top-level tdparser module.
class tdparser.lexer.TokenRegistry
This class holds a set of (token, regexp) pairs, and selects the appropriate pair to extract data from a string.
Note
The TokenRegistry doesn't interact with the Token subclasses provided through register(). This means that any kind of value can be provided for this field, and it will be returned as-is by the get_token() method.
_tokens
Holds a list of (Token, re.RegexObject) tuples. These are the tokens in the order they were inserted (insertion order matters).
Type: list of (Token subclass, re.RegexObject) tuples
register(self, token, regexp)
Register a Token subclass for the given regexp.
Parameters:
- token (tdparser.Token) – The Token subclass to register
- regexp (str) – The regular expression (as a string) associated with the token
matching_tokens(self, text[, start=0])
Retrieve all tokens matching a given text. The optional start argument can be used to alter the start position for the match() call.
Parameters:
- text (str) – Text for which matching (Token, re.MatchObject) pairs should be searched
- start (int) – Optional start position within text for the regexp match() call
Returns: Yields tuples of (Token, re.MatchObject) for each token whose regexp matched the text.
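The registration and matching behavior described above can be sketched with a small stand-in class. This is a minimal illustration written from the description, not tdparser's actual source: `SketchRegistry`, `Number` and `Word` are hypothetical names, and compiling each regexp at registration time is an assumption.

```python
import re


class SketchRegistry:
    """Hypothetical stand-in for TokenRegistry, built from the description above."""

    def __init__(self):
        # (token, compiled regexp) tuples, kept in insertion order.
        self._tokens = []

    def register(self, token, regexp):
        # Assumption: the regexp string is compiled once, at registration time.
        self._tokens.append((token, re.compile(regexp)))

    def matching_tokens(self, text, start=0):
        # Yield a (token, match) pair for every regexp that matches at `start`.
        for token, pattern in self._tokens:
            match = pattern.match(text, start)
            if match:
                yield token, match


class Number:  # hypothetical token class
    pass


class Word:  # hypothetical token class
    pass


registry = SketchRegistry()
registry.register(Number, r"\d+")
registry.register(Word, r"\w+")

# Both patterns match at position 0 of "42abc"; only Word matches at position 2.
at_zero = [(tok.__name__, m.group()) for tok, m in registry.matching_tokens("42abc")]
at_two = [(tok.__name__, m.group()) for tok, m in registry.matching_tokens("42abc", 2)]
```

Because the pairs are yielded in insertion order, a caller that needs a deterministic tie-break can rely on the order of the register() calls.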
get_token(self, text[, start=0])
Retrieve the best token class and the related match at the start of the given text.
The algorithm for choosing the "best" class is:
- Fetch all matching tokens (through matching_tokens())
- Select those with the longest match
- Return the first of those tokens
A different starting position for match() calls can be provided in the start parameter.
Parameters:
- text (str) – Text for which the (Token, re.MatchObject) pair should be returned
- start (int) – Optional start position within text for the regexp match() call
Returns: (Token, re.MatchObject) pair, the best match for the given text.
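The three selection steps above can be sketched as a standalone function. This illustrates the described rule and is not the library's code: `pick_best` and the string labels standing in for token classes are hypothetical, and returning `None` when nothing matches is an assumption, since the text does not specify that case.

```python
import re


def pick_best(candidates):
    """Return the (token, match) pair with the longest match; the first one wins ties.

    `candidates` is an iterable of (token, match) pairs in registration order.
    """
    best = None
    best_length = -1
    for token, match in candidates:
        length = match.end() - match.start()
        # Strict comparison keeps the earliest candidate among equal lengths.
        if length > best_length:
            best, best_length = (token, match), length
    return best  # Assumption: None when there is no candidate at all.


# String labels stand in for Token subclasses.
patterns = [
    ("EQ", re.compile(r"=")),
    ("EQEQ", re.compile(r"==")),
    ("OP", re.compile(r"[=!]=")),
]
text = "== 1"

# Step 1: collect every pattern that matches at the start of the text.
candidates = []
for label, pattern in patterns:
    m = pattern.match(text)
    if m:
        candidates.append((label, m))

# Steps 2 and 3: longest match, then first registered among the longest.
token, match = pick_best(candidates)
```

Here "EQEQ" and "OP" both match two characters, so the earlier registration, "EQEQ", is selected over the shorter "EQ" match and the later "OP" match.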
__len__(self)
The len() of a TokenRegistry is the length of its _tokens attribute.