The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions. There are also tasks that can be done with regular expressions, but the expressions turn out to be very complicated. In these cases, you may be better off writing Python code to do the processing; while Python code will be slower than an elaborate regular expression, it will also probably be more understandable.
How to Work with Regular Expressions
Most letters and characters will simply match themselves. For example, the regular expression test will match the string test exactly. (You can enable a case-insensitive mode that would let this RE match Test or TEST as well; more about this later.)
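As a small illustration of literal matching, the sketch below compiles the RE test and then compiles it again in case-insensitive mode. The sample strings and variable names are made up for the example.

```python
import re

# A plain pattern matches only the exact characters it contains.
pattern = re.compile("test")
exact = pattern.search("a test string")
miss = pattern.search("a TEST string")

# Case-insensitive mode lets the same RE match Test and TEST too.
insensitive = re.compile("test", re.IGNORECASE)
hits = [s for s in ("test", "Test", "TEST") if insensitive.fullmatch(s)]

print(exact.group(), miss, hits)
```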
You can match the characters not listed within the class by complementing the set. This is indicated by including a '^' as the first character of the class. For example, [^5] will match any character except '5'. If the caret appears elsewhere in a character class, it does not have special meaning. For example: [5^] will match either a '5' or a '^'.
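The two classes can be compared side by side with findall(). This is only a sketch; the input strings are arbitrary.

```python
import re

# [^5] is a complemented class: any single character except '5'.
not_five = re.findall(r"[^5]", "1525")

# [5^] has the caret in a non-leading position, so it is a literal '^'.
five_or_caret = re.findall(r"[5^]", "5^6")

print(not_five)       # ['1', '2']
print(five_or_caret)  # ['5', '^']
```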
In short, to match a literal backslash, one has to write '\\\\' as the RE string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. In REs that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.
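The doubling can be checked directly. The sketch below also uses Python's raw string notation (the r prefix), which is the usual way to avoid the extra layer of escaping; the sample text is invented for the example.

```python
import re

text = "C:\\temp"  # the actual string is C:\temp, with one backslash

# In a regular string literal each RE backslash must itself be escaped.
plain = "\\\\"

# A raw string literal leaves backslashes alone, so the RE \\ is written as-is.
raw = r"\\"

same_pattern = plain == raw
match = re.search(raw, text)

print(same_pattern, match.start(), len(match.group()))
```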
Once you have an object representing a compiled regular expression, what do you do with it? Pattern objects have several methods and attributes. Only the most significant ones will be covered here; consult the re docs for a complete listing.
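A quick tour of a few pattern methods might look like the following. It is a sketch rather than a complete listing, and the pattern and inputs are chosen only for illustration.

```python
import re

p = re.compile(r"[a-z]+")

# match() only succeeds if the RE matches at the start of the string.
at_start = p.match("tempo 123")
not_at_start = p.match("123 tempo")

# search() scans through the string for the first location that matches.
found = p.search("123 tempo")

# findall() returns every non-overlapping match as a list of strings.
words = p.findall("one 2 three")

print(at_start.group(), not_at_start, found.span(), words, p.pattern)
```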
You can learn about this by interactively experimenting with the re module. If you have tkinter available, you may also want to look at Tools/demo/redemo.py, a demonstration program included with the Python distribution. It allows you to enter REs and strings, and displays whether the RE matches or fails. redemo.py can be quite useful when trying to debug a complicated RE.
Usually ^ matches only at the beginning of the string, and $ matches only at the end of the string and immediately before the newline (if any) at the end of the string. When the MULTILINE flag is specified, ^ matches at the beginning of the string and at the beginning of each line within the string, immediately following each newline. Similarly, the $ metacharacter matches both at the end of the string and at the end of each line (immediately preceding each newline).
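The difference is easiest to see by running the same anchored patterns with and without re.MULTILINE. The two-line sample text is an assumption made for the example.

```python
import re

text = "first line\nsecond line\n"

# Default mode: ^ anchors only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# MULTILINE mode: ^ also anchors right after each newline.
starts_multi = re.findall(r"^\w+", text, re.MULTILINE)

# Default mode: $ anchors at the end (or just before a final newline).
ends_default = re.findall(r"\w+$", text)
# MULTILINE mode: $ also anchors just before every newline.
ends_multi = re.findall(r"\w+$", text, re.MULTILINE)

print(starts_default, starts_multi)
print(ends_default, ends_multi)
```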
Matches at the beginning of lines. Unless the MULTILINE flag has been set, this will only match at the beginning of the string. In MULTILINE mode, this also matches immediately after each newline within the string.
Frequently you need to obtain more information than just whether the RE matched or not. Regular expressions are often used to dissect strings by writing a RE divided into several subgroups which match different components of interest. For example, an RFC-822 header line is divided into a header name and a value, separated by a ':'.
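As a rough illustration of dissecting such a header line, the sketch below uses two groups, one for the name and one for the value. The header text and the exact pattern are assumptions made for the example, not a full RFC-822 parser.

```python
import re

# Group 1 captures the header name, group 2 the value after the colon.
header_re = re.compile(r"([^:]+):\s*(.*)")

line = "From: author@example.com"  # illustrative header line
m = header_re.match(line)

name = m.group(1)
value = m.group(2)

print(name, "|", value)
print(m.groups())
```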
Groups are marked by the '(', ')' metacharacters. '(' and ')' have much the same meaning as they do in mathematical expressions; they group together the expressions contained inside them, and you can repeat the contents of a group with a quantifier, such as *, +, ?, or {m,n}. For example, (ab)* will match zero or more repetitions of ab.
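A short sketch of repeating a group follows. The inputs are arbitrary, and the bounded form {2,3} is included only to show a quantifier of the {m,n} kind applied to a group.

```python
import re

p = re.compile(r"(ab)*")

# Five repetitions of 'ab' are consumed as one match.
full_span = p.match("ababababab").span()

# Zero repetitions is also a valid match, so the result is an empty string.
empty = p.match("xyz").group()

# A bounded quantifier on the group: two or three repetitions of 'ab'.
bounded_ok = re.fullmatch(r"(ab){2,3}", "abab") is not None
bounded_fail = re.fullmatch(r"(ab){2,3}", "ab") is not None

print(full_span, repr(empty), bounded_ok, bounded_fail)
```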
Split string by the matches of the regular expression. If capturing parentheses are used in the RE, then their contents will also be returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits are performed.
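The three behaviours described here can be sketched as follows; the delimiter pattern and the sample string are chosen only for illustration.

```python
import re

text = "one, two, three"

# Split on runs of non-word characters; the delimiters are discarded.
parts = re.compile(r"\W+").split(text)

# With capturing parentheses the delimiters are kept in the result.
parts_with_delims = re.compile(r"(\W+)").split(text)

# maxsplit=1 performs a single split and leaves the rest untouched.
first_only = re.compile(r"\W+").split(text, maxsplit=1)

print(parts)
print(parts_with_delims)
print(first_only)
```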
Another common task is to find all the matches for a pattern, and replace them with a different string. The sub() method takes a replacement value, which can be either a string or a function, and the string to be processed.
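Both kinds of replacement value can be sketched briefly. The colour pattern, the sample sentence, and the helper name shout are hypothetical and exist only for this example.

```python
import re

p = re.compile(r"(blue|white|red)")
text = "blue socks and red shoes"

# String replacement: every match is replaced by the same text.
with_string = p.sub("colour", text)

# Function replacement: the function receives each match object
# and returns the text to substitute for it.
def shout(match):
    return match.group().upper()

with_function = p.sub(shout, text)

print(with_string)
print(with_function)
```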
If replacement is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \& are left alone. Backreferences, such as \6, are replaced with the substring matched by the corresponding group in the RE. This lets you incorporate portions of the original text in the resulting replacement string.
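A minimal sketch of both behaviours, using made-up input: a backreference that reorders captured text, and a \n escape in the replacement that becomes a real newline.

```python
import re

# \1 and \2 refer to the first and second groups of the pattern,
# so each key=value pair is rewritten as value=key.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "a=1 b=2")

# The two-character escape \n in the replacement string is converted
# by sub() into a single newline character.
one_per_line = re.sub(r",\s*", r"\n", "a, b")

print(swapped)
print(repr(one_per_line))
```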
When using the module-level re.sub() function, the pattern is passed as the first argument. The pattern may be provided as an object or as a string; if you need to specify regular expression flags, you must either use a pattern object as the first parameter, or use embedded modifiers in the pattern string, e.g. sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'.
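The two options named above can be compared against the flag-free call. This is a small sketch that reuses the example string from the text.

```python
import re

text = "bbbb BBBB"

# Option 1: an embedded modifier at the start of the pattern string.
embedded = re.sub("(?i)b+", "x", text)

# Option 2: a compiled pattern object carrying the flag.
compiled = re.sub(re.compile("b+", re.IGNORECASE), "x", text)

# Without any flag only the lowercase run is replaced.
no_flag = re.sub("b+", "x", text)

print(embedded, "|", compiled, "|", no_flag)
```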
Another common task is deleting every occurrence of a single character from a string or replacing it with another single character. You might do this with something like re.sub('\n', ' ', S), but translate() is capable of doing both tasks and will be faster than any regular expression operation can be.
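For comparison, here is a sketch of both tasks done with translate(). The translation tables are built with str.maketrans(), and the sample string is an assumption made for the example.

```python
import re

s = "line one\nline two\n"

# Replacing a single character: the regular expression way...
via_re = re.sub("\n", " ", s)
# ...and the same result with a one-entry translation table.
via_translate = s.translate(str.maketrans("\n", " "))

# Deleting a single character: the third argument to str.maketrans()
# lists the characters to remove.
deleted = s.translate(str.maketrans("", "", "\n"))

print(repr(via_re), repr(via_translate), repr(deleted))
```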
Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec() and test() methods of RegExp, and with the match(), matchAll(), replace(), replaceAll(), search(), and split() methods of String. This chapter describes JavaScript regular expressions.
A regular expression pattern is composed of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. The last example includes parentheses, which are used as a memory device. The match made with this part of the pattern is remembered for later use, as described in Using groups.
Simple patterns are constructed of characters for which you want to find a direct match. For example, the pattern /abc/ matches character combinations in strings only when the exact sequence "abc" occurs (all characters together and in that order). Such a match would succeed in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft.". In both cases the match is with the substring "abc". There is no match in the string "Grab crab" because while it contains the substring "ab c", it does not contain the exact substring "abc".
Assertions include boundaries, which indicate the beginnings and endings of lines and words, and other patterns indicating in some way that a match is possible (including look-ahead, look-behind, and conditional expressions).
Groups group multiple patterns as a whole, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string. Backreferences refer to a previously captured group in the same regular expression.
If using the RegExp constructor with a string literal, remember that the backslash is an escape in string literals, so to use it in the regular expression, you need to escape it at the string literal level. /a\*b/ and new RegExp("a\\*b") create the same expression, which searches for "a" followed by a literal "*" followed by "b".
The "g" after the regular expression is an option or flag that performs a global search, looking in the whole string and returning all matches. It is explained in detail below in Advanced Searching With Flags.
Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered. Once remembered, the substring can be recalled for other use. See Groups and backreferences for more details.
When you want to know whether a pattern is found in a string, use the test() or search() methods; for more information (but slower execution) use the exec() or match() methods. If you use exec() or match() and if the match succeeds, these methods return an array and update properties of the associated regular expression object and also of the predefined regular expression object, RegExp. If the match fails, the exec() method returns null (which coerces to false).
As shown in the second form of this example, you can use a regular expression created with an object initializer without assigning it to a variable. If you do, however, every occurrence is a new regular expression. For this reason, if you use this form without assigning it to a variable, you cannot subsequently access the properties of that regular expression. For example, assume you have this script:
The occurrences of /d(b+)d/g in the two statements are different regular expression objects and hence have different values for their lastIndex property. If you need to access the properties of a regular expression created with an object initializer, you should first assign it to a variable.
Regular expressions have optional flags that allow for functionality like global searching and case-insensitive searching. These flags can be used separately or together in any order, and are included as part of the regular expression.
The m flag is used to specify that a multiline input string should be treated as multiple lines. If the m flag is used, ^ and $ match at the start or end of any line within the input string instead of the start or end of the entire string.
The "u" flag is used to create "unicode" regular expressions; that is, regular expressions which support matching against unicode text. This is mainly accomplished through the use of Unicode property escapes, which are supported only within "unicode" regular expressions.
In the following example, the user is expected to enter a phone number. When the user presses the "Check" button, the script checks the validity of the number. If the number is valid (matches the character sequence specified by the regular expression), the script shows a message thanking the user and confirming the number. If the number is invalid, the script informs the user that the phone number is not valid.