Python regex quick reference
Aug. 9, 2020 - Back-End
Search Patterns
| Regex pattern | Match |
| ^ | Beginning of the string |
| $ | End of the string |
| [a-e] | = [abcde] |
| [0-5] | = [012345] |
| [A-Z] | = [ABCDEFGHIJKLMNOPQRSTUVWXYZ] |
| [A-Za-z] | = all ASCII letters |
| [-az] or [az-] | = "-" or "a" or "z" |
| [-a-z] | = "-" or "a...z" |
| [^abc] | = not ("a", "b" or "c") |
| [a^bc] | = "a", "b", "c" or "^" |
| () | Defines a group |
| . | Any character except a newline (unless re.DOTALL is set) |
>>> import re
>>> re.findall(r"^A(.*)B","A123B")
['123']
>>> re.findall("1(a|b)2","001a20001b20")
['a', 'b']
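The character-class rows above can be checked with a short script. This is only a sketch; the sample strings and variable names are made up for illustration.

```python
import re

# A leading "^" negates the class: any character except a, b or c.
negated = re.findall(r"[^abc]", "aXbYc")
print(negated)        # ['X', 'Y']

# A "^" that is not the first character is an ordinary member of the class.
literal_caret = re.findall(r"[a^bc]", "x^ya")
print(literal_caret)  # ['^', 'a']

# A "-" at the start (or end) of a class is a literal hyphen, not a range.
hyphenated = re.findall(r"[-a-z]+", "well-known FACT")
print(hyphenated)     # ['well-known']

# "^" and "$" anchor the pattern to the start and end of the string.
all_low_digits = bool(re.search(r"^[0-5]+$", "012345"))
has_other_char = bool(re.search(r"^[0-5]+$", "0126"))
print(all_low_digits, has_other_char)  # True False
```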
| a{4} | Exactly 4 a's |
| a{4,8} | Between 4 and 8 a's (inclusive) |
| a{9,} | 9 or more a's |
| ? | Match 0 or 1 repetitions of the preceding RE. ab? will match either ‘a’ or ‘ab’. |
>>> re.findall("ab?","123abaacd")
['ab', 'a', 'a']
| * | Match 0 or more repetitions of the preceding RE. ab* will match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s. |
>>> re.findall("ab*","123abbbbabacd")
['abbbb', 'ab', 'a']
| + | Match 1 or more repetitions of the preceding RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will not match just ‘a’. |
>>> re.findall("ab+","123abbbbabacd")
['abbbb', 'ab']
| ?, *, + | These are "greedy" quantifiers: they match as much text as possible. Appending a '?' (giving ??, *? or +?) makes them "non-greedy": as few characters as possible are matched. |
>>> re.findall(r"<.*>","<a> b <c>")
['<a> b <c>']
>>> re.findall(r"<.*>","<a> b <c")
['<a>']
>>> re.findall(r"<.*?>","<a> b <c>")
['<a>', '<c>']
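The counted quantifiers ({4}, {4,8}, {9,}) have no example above, so here is a small sketch run against a string of ten a's. It also shows that a counted quantifier is greedy and can be made non-greedy with a trailing '?', like *, + and ?. The variable names are illustrative only.

```python
import re

ten_as = "a" * 10

# Exactly four: two runs of four fit, the last two a's are left over.
exactly_four = re.findall(r"a{4}", ten_as)
print(exactly_four)        # ['aaaa', 'aaaa']

# Greedy range: takes as many as allowed (8), leaving too few for a second match.
greedy_range = re.findall(r"a{4,8}", ten_as)
print(greedy_range)        # ['aaaaaaaa']

# Non-greedy range: takes as few as allowed (4) each time.
lazy_range = re.findall(r"a{4,8}?", ten_as)
print(lazy_range)          # ['aaaa', 'aaaa']

# Open-ended: nine or more.
nine_or_more = re.findall(r"a{9,}", ten_as)
print(nine_or_more)        # ['aaaaaaaaaa']
```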
| \d | Any decimal digit: [0-9] |
| \D | Complement of \d. Any non-digit character: [^0-9] |
| \s | Any whitespace character: [ \t\n\r\f\v] |
| \S | Complement of \s. Any non-whitespace character: [^ \t\n\r\f\v] |
| \w | Any alphanumeric character or underscore: [a-zA-Z0-9_] |
| \W | Complement of \w: [^a-zA-Z0-9_] |
| \b | A word boundary (empty string, but only at the start or end of a word) |
| \B | A non-word boundary (empty string, but not at the start or end of a word) |
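A short sketch of the shorthand classes and the word boundary in use. The sample sentences are invented for this example.

```python
import re

# \d+ picks out runs of digits.
digits = re.findall(r"\d+", "Order 66 shipped on 2020-08-09")
print(digits)          # ['66', '2020', '08', '09']

# \w includes the underscore, so identifiers stay in one piece.
words = re.findall(r"\w+", "snake_case x2!")
print(words)           # ['snake_case', 'x2']

# \S+ matches runs of non-whitespace, whatever the whitespace between them is.
chunks = re.findall(r"\S+", "a \t b\nc")
print(chunks)          # ['a', 'b', 'c']

sentence = "cat, concat and category."

# \b on both sides: only the standalone word.
whole_word = re.findall(r"\bcat\b", sentence)
print(whole_word)      # ['cat']

# \b only at the start: words beginning with "cat".
starts_with = re.findall(r"\bcat\w*", sentence)
print(starts_with)     # ['cat', 'category']

# \b only at the end: words ending with "cat".
ends_with = re.findall(r"\w*cat\b", sentence)
print(ends_with)       # ['cat', 'concat']
```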
Escape Sequences in Strings
| Escape Sequence | Meaning |
| \newline | Backslash and newline ignored |
| \\ | Backslash (\) |
| \' | Single quote (') |
| \" | Double quote (") |
| \a | ASCII Bell (BEL) |
| \b | ASCII Backspace (BS) |
| \f | ASCII Formfeed (FF) |
| \n | ASCII Linefeed (LF) |
| \N{name} | Character named name in the Unicode database (Unicode only) |
| \r | ASCII Carriage Return (CR) |
| \t | ASCII Horizontal Tab (TAB) |
| \uxxxx | Character with 16-bit hex value xxxx (Unicode only) |
| \Uxxxxxxxx | Character with 32-bit hex value xxxxxxxx (Unicode only) |
| \v | ASCII Vertical Tab (VT) |
| \ooo | Character with octal value ooo |
| \xhh | Character with hex value hh |
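This table matters for regexes because \b means "backspace" in an ordinary string literal but "word boundary" to the re module. The sketch below shows why patterns are normally written as raw strings (r"..."); the variable names are illustrative.

```python
import re

# In an ordinary string literal, "\b" is one character: ASCII backspace (0x08).
backspace_pattern = "\bcat\b"
# In a raw string the backslash is kept, so re sees the \b word boundary.
boundary_pattern = r"\bcat\b"

print(len(backspace_pattern))   # 5
print(len(boundary_pattern))    # 7

# The first pattern looks for literal backspace characters and finds nothing.
no_match = re.findall(backspace_pattern, "a cat")
print(no_match)                 # []

# The raw-string pattern matches the word.
match = re.findall(boundary_pattern, "a cat")
print(match)                    # ['cat']

# Doubling the backslashes is equivalent to using a raw string.
same = "\\bcat\\b" == r"\bcat\b"
print(same)                     # True
```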
Groups
| (?P<name>...) | Define a capturing group named 'name' |
| (?P=name) | Refer to the captured group named 'name' |
| \n | The n'th captured group (\1, \2, ...) |
| (?#...) | A comment |
>>> match = re.search(r"<([a-z]+)>(.*)</\1>","<name>Samuel</name>")
>>> match.group(1)
'name'
>>> match.group(2)
'Samuel'
>>> re.findall(r"<([a-z]+)>(.*)</\1>","<name>Samuel</name>")
[('name', 'Samuel')]
>>> m = re.search(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.group('first_name')
'Malcolm'
>>> m.group('last_name')
'Reynolds'
>>> m.groupdict()
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}
>>> re.findall(r"<(?P<tag>[a-z]+)>(.*)</(?P=tag)>", "<name>Malcolm Reynolds</name>")
[('name', 'Malcolm Reynolds')]
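Group references also work in the replacement string of re.sub, not only inside the pattern. A minimal sketch, reusing the "Malcolm Reynolds" sample from above; the doubled-word sentence is made up.

```python
import re

# \1 and \2 in the replacement refer to the numbered groups of the match.
swapped = re.sub(r"(\w+) (\w+)", r"\2, \1", "Malcolm Reynolds")
print(swapped)        # Reynolds, Malcolm

# \g<name> refers to a named group.
swapped_named = re.sub(
    r"(?P<first_name>\w+) (?P<last_name>\w+)",
    r"\g<last_name>, \g<first_name>",
    "Malcolm Reynolds",
)
print(swapped_named)  # Reynolds, Malcolm

# Inside the pattern, \1 must match the same text as group 1:
# here it finds words that are immediately repeated.
doubled = re.findall(r"\b(\w+) \1\b", "it is is what it it is")
print(doubled)        # ['is', 'it']
```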
| (?=...) | Positive lookahead: matches only if ... matches next, without consuming it |
>>> re.findall('abc (?=def)', 'abc def')
['abc ']
| (?!...) | Negative lookahead: matches only if ... does not match next |
>>> re.findall('abc(?!def)', 'abcde')
['abc']
| (?<=...) | Positive lookbehind: matches only if preceded by a match for ... |
>>> re.findall('(?<=abc)def', 'abcdef')
['def']
>>> re.findall(r'(?<=-)\w+', 'spam-egg')
['egg']
>>> re.findall(r'(?<=:).*\.(?#find the list)', 'This is a list: 1, 2, 3, 4 .')
[' 1, 2, 3, 4 .']
| (?<!...) | Negative lookbehind: matches only if not preceded by a match for ... |
>>> re.findall('(?<!abc)def', 'abcdef defabc')
['def']
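Lookarounds are handy for picking values out by their context without including that context in the match. The sketch below uses an invented sample string. Note that in Python's re module the pattern inside a lookbehind must have a fixed width.

```python
import re

text = "price: $30, discount: 15%, qty: 7"

# Positive lookbehind: digits preceded by "$" (the "$" is not in the result).
dollars = re.findall(r"(?<=\$)\d+", text)
print(dollars)    # ['30']

# Positive lookahead: digits followed by "%" (the "%" is not in the result).
percents = re.findall(r"\d+(?=%)", text)
print(percents)   # ['15']

# Negative lookbehind and lookahead combined: whole numbers with no "$" before
# and no "%" after. The \d inside each class stops partial matches such as
# the "0" of "$30" or the "1" of "15%".
plain = re.findall(r"(?<![$\d])\d+(?![%\d])", text)
print(plain)      # ['7']
```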
Example usage:
>>> import re
>>> match = re.search(r"at","A cat in a hat.")
>>> match
<re.Match object; span=(3, 5), match='at'>
>>> m = re.search(r"(at)","A cat in a hat.")
>>> m.group(1)
'at'
>>> m.group(0)
'at'
>>> m.span()
(3, 5)
>>> m.start()
3
>>> m.end()
5
>>> re.findall(r"at","A cat in a hat.")
['at', 'at']
>>> re.sub("at","**", "A cat in a hat.")
'A c** in a h**.'
>>> compiled_re = re.compile("at")
>>> compiled_re.search("A cat in a hat.")
<re.Match object; span=(3, 5), match='at'>
>>> re.findall(r"at","A cat in a hat.\nA rAt and a bAt.", re.IGNORECASE)
['at', 'at', 'At', 'At']
>>> "A cat  and  a  \n  rat".split(" ")
['A', 'cat', '', 'and', '', 'a', '', '\n', '', 'rat']
>>> "A cat and a \n rat".split(None)
['A', 'cat', 'and', 'a', 'rat']
>>> "A cat and a \n rat".split(r"at")
['A c', ' and a \n r', '']
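The str.split calls above treat their argument as a literal separator, never as a regular expression. To split on a pattern, use re.split. A small sketch for comparison; the variable names are illustrative.

```python
import re

text = "A cat and a \n rat"

# str.split: the separator is taken literally, even if it looks like a regex.
not_a_regex = "a1b2c".split(r"\d")
print(not_a_regex)      # ['a1b2c']

# re.split: the separator is a pattern.
on_digits = re.split(r"\d", "a1b2c")
print(on_digits)        # ['a', 'b', 'c']

# Splitting on runs of whitespace.
on_whitespace = re.split(r"\s+", text)
print(on_whitespace)    # ['A', 'cat', 'and', 'a', 'rat']

# A capturing group in the pattern keeps the separators in the result.
with_separators = re.split(r"(at)", text)
print(with_separators)  # ['A c', 'at', ' and a \n r', 'at', '']
```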
| Abbreviation | Full name | Description |
| re.I | re.IGNORECASE | Makes the regular expression case-insensitive |
| re.L | re.LOCALE | Makes \w, \W, \b, \B and case-insensitive matching dependent on the current locale, i.e. the user's language, country and so on. In Python 3 it can only be used with bytes patterns. |
| re.M | re.MULTILINE | ^ and $ match at the beginning and end of each line, not just at the beginning and end of the string |
| re.S | re.DOTALL | The dot "." matches every character, including the newline |
| re.U | re.UNICODE | Makes \w, \W, \b, \B, \d, \D, \s, \S dependent on Unicode character properties. This is already the default for str patterns in Python 3. |
| re.X | re.VERBOSE | Allows "verbose" regular expressions: whitespace in the pattern is ignored, so spaces, tabs and newlines are not matched as such. To match a space, escape it with a backslash or put it in a character class. A "#" starts a comment that runs to the end of the line, except when the "#" is in a character class or preceded by an unescaped backslash. |
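A brief sketch of the flags in action. Flags are combined with "|". The sample text and the date pattern are made up for illustration.

```python
import re

text = "first line\nSecond line"

# Without re.MULTILINE, "^" matches only at the start of the string.
first_only = re.findall(r"^\w+", text)
print(first_only)     # ['first']

# With re.MULTILINE, "^" also matches at the start of each line.
every_line = re.findall(r"^\w+", text, re.MULTILINE)
print(every_line)     # ['first', 'Second']

# Without re.DOTALL, "." does not match the newline between the lines.
no_dotall = re.findall(r"line.Second", text)
with_dotall = re.findall(r"line.Second", text, re.DOTALL)
print(no_dotall)      # []
print(with_dotall)    # ['line\nSecond']

# Combining flags: case-insensitive and multiline.
combined = re.findall(r"^second", text, re.I | re.M)
print(combined)       # ['Second']

# re.VERBOSE: whitespace and "#" comments inside the pattern are ignored.
date_re = re.compile(r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
""", re.VERBOSE)
date_parts = date_re.search("posted on 2020-08-09").groupdict()
print(date_parts)     # {'year': '2020', 'month': '08', 'day': '09'}
```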
Links:
https://www.python-course.eu/python3_re.php
https://www.python-course.eu/python3_re_advanced.php
https://github.com/amalshehu/legendary-regex/blob/master/README.md
https://www.rexegg.com/regex-lookarounds.html
https://www.regular-expressions.info/lookaround.html