How to use PHP regex literals

A regular expression (regex or regexp for short) is a powerful tool for searching and manipulating text strings, particularly when processing text files. One line of regex can easily replace several dozen lines of program code.

Regex is supported in most scripting languages (such as Perl, Python, PHP, and JavaScript), in general-purpose programming languages such as Java, and even in word processors such as Word for searching text. Getting started with regex may not be easy due to its geeky syntax, but it is certainly worth the investment of your time.

Regex By Examples

This section is meant for those who need to refresh their memory. Novices should go to the next section to learn the syntax before looking at these examples.

Regex Syntax Summary

  • Character: All characters, except those having special meaning in regex, match themselves. E.g., the regex x matches substring "x"; regex 9 matches "9"; regex = matches "="; and regex @ matches "@".
  • Special Regex Characters: These characters have special meaning in regex (to be discussed below): ., +, *, ?, ^, $, (, ), [, ], {, }, |, \.
  • Escape Sequences (\char):
    • To match a character having special meaning in regex, you need to use an escape sequence prefixed with a backslash (\). E.g., regex \. matches "."; regex \+ matches "+"; and regex \( matches "(".
    • You also need to use regex \\ to match "\" (backslash).
    • Regex recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage-return, \nnn for an octal number of up to 3 digits, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode, \uhhhhhhhh for an 8-digit Unicode (the exact spelling of the Unicode escapes varies by regex flavor).
  • A Sequence of Characters (or String): Strings can be matched by combining a sequence of characters (called sub-expressions). E.g., the regex Saturday matches "Saturday". The matching, by default, is case-sensitive, but can be set to case-insensitive via a modifier.
  • OR Operator (|): E.g., the regex four|4 accepts strings "four" or "4".
  • Character class (or Bracket List):
    • [...]: Accept ANY ONE of the characters within the square brackets, e.g., [aeiou] matches "a", "e", "i", "o" or "u".
    • [.-.] (Range Expression): Accept ANY ONE of the characters in the range, e.g., [0-9] matches any digit; [A-Za-z] matches any uppercase or lowercase letter.
    • [^...]: NOT ONE of the characters, e.g., [^0-9] matches any non-digit.
    • Only these four characters require an escape sequence inside the bracket list: ^, -, ], \.
  • Occurrence Indicators (or Repetition Operators):
    • +: one or more (1+), e.g., [0-9]+ matches one or more digits such as "123", "000".
    • *: zero or more (0+), e.g., [0-9]* matches zero or more digits. It accepts all those in [0-9]+ plus the empty string.
    • ?: zero or one (optional), e.g., [+-]? matches an optional "+", "-", or an empty string.
    • {m,n}: m to n (both inclusive)
    • {m}: exactly m times
    • {m,}: m or more (m+)
  • Metacharacters: each matches a single character
    • . (dot): ANY ONE character except newline. Same as [^\n]
    • \d, \D: ANY ONE digit/non-digit character. Digits are [0-9]
    • \w, \W: ANY ONE word/non-word character. For ASCII, word characters are [a-zA-Z0-9_]
    • \s, \S: ANY ONE space/non-space character. For ASCII, whitespace characters are [ \n\r\t\f]
  • Position Anchors: do not match a character, but a position such as start-of-line, end-of-line, start-of-word and end-of-word.
    • ^, $: start-of-line and end-of-line respectively. E.g., ^[0-9]+$ matches a numeric string.
    • \b: boundary of word, i.e., start-of-word or end-of-word. E.g., \bcat\b matches the word "cat" in the input string.
    • \B: Inverse of \b, i.e., non-start-of-word or non-end-of-word.
    • \<, \>: start-of-word and end-of-word respectively, similar to \b. E.g., \<cat\> matches the word "cat" in the input string.
    • \A, \Z: start-of-input and end-of-input respectively.
  • Parenthesized Back References:
    • Use parentheses ( ) to create a back reference.
    • Use $1, $2, ... (Java, Perl, JavaScript) or \1, \2, ... (Python) to retrieve the back references in sequential order.
  • Laziness (Curb Greediness for Repetition Operators): *?, +?, ??, {m,n}?, {m,}?
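
The summary above can be tried out with a short script. The following is a minimal sketch in Python using the standard re module; the sample strings and variable names are illustrative assumptions, not part of the original summary. Note that Python's re does not support the \< and \> anchors, so only \b is shown.

```python
import re

# Escape sequences: a backslash makes a special character literal.
literal_dots = re.findall(r"\.", "a.b.c")                  # ['.', '.']

# OR operator and character classes.
alternation = re.findall(r"four|4", "four 4 five")         # ['four', '4']
vowels = re.findall(r"[aeiou]", "regex")                   # ['e', 'e']
non_digits = re.findall(r"[^0-9]", "a1b2")                 # ['a', 'b']

# Occurrence indicators: an optional sign followed by one or more digits.
signed = re.findall(r"[+-]?[0-9]+", "+12 -7 30")           # ['+12', '-7', '30']
bounded = re.findall(r"[0-9]{2,3}", "1 22 4444")           # ['22', '444']

# Position anchors: \b keeps "cat" from matching inside "concat".
words = re.findall(r"\bcat\b", "cat concat cat.")          # ['cat', 'cat']

# Back references: \1 and \2 refer to the parenthesized groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")  # 'world hello'

# Greedy versus lazy repetition.
html = "The <code>first</code> and <code>second</code> instances"
greedy = re.findall(r"<code>(.*)</code>", html)    # one long match
lazy = re.findall(r"<code>(.*?)</code>", html)     # two short matches

print(greedy)
print(lazy)
```

The greedy .* runs to the last closing tag and yields a single match, while the lazy .*? stops at the first closing tag each time and yields "first" and "second" separately.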

Example: Numbers [0-9]+ or \d+

  1. A regex (regular expression) consists of a sequence of sub-expressions. In this example, [0-9] and +.
  2. The [...], known as character class (or bracket list), encloses a list of characters. It matches any SINGLE character in the list. In this example, [0-9] matches any SINGLE character between 0 and 9 (i.e., a digit), where dash (-) denotes the range.
  3. The +, known as occurrence indicator (or repetition operator), indicates one or more occurrences (1+) of the previous sub-expression. In this case, [0-9]+ matches one or more digits.
  4. A regex may match a portion of the input (i.e., substring) or the entire input. In fact, it could match zero or more substrings of the input (with global modifier).
  5. This regex matches any numeric substring (of digits 0 to 9) of the input. For example,
    1. If the input is "abc123xyz", it matches substring "123".
    2. If the input is "abcxyz", it matches nothing.
    3. If the input is "abc00123xyz456_0", it matches substrings "00123", "456" and "0" (three matches).
    Take note that this regex matches numbers with leading zeros, such as "000", "0123" and "0001", which may not be desirable.
  6. You can also write \d+, where \d is known as a metacharacter that matches any digit (same as [0-9]). There is more than one way to write a regex! Take note that many programming languages (C, Java, JavaScript, Python) use backslash \ as the prefix for escape sequences (e.g., \n for newline), so inside an ordinary string literal you need to write "\\d+" instead.
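
The three inputs discussed in this example can be checked directly. This is a minimal sketch in Python, assuming the standard re module; the variable names are illustrative.

```python
import re

digits = re.compile(r"[0-9]+")
inputs = ["abc123xyz", "abcxyz", "abc00123xyz456_0"]

# Collect every non-overlapping match for each input string.
matches = {text: digits.findall(text) for text in inputs}
for text, found in matches.items():
    print(f"{text!r} -> {found}")

# For these ASCII inputs, \d+ finds exactly the same substrings as [0-9]+.
same_with_d = all(re.findall(r"\d+", text) == matches[text] for text in inputs)
```

An empty list for "abcxyz" corresponds to "matches nothing", and the third input yields three matches, including the substring with leading zeros.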

Code Examples (Python, Java, JavaScript, Perl, PHP)

Code Example in Python


Python supports regex via the module re. Python also uses backslash (\) for escape sequences (i.e., you need to write \\ for \, \\d for \d), but it supports raw strings in the form of r'...', which ignore the interpretation of escape sequences - great for writing regex.
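
As a minimal sketch of the re module in use (the input string and pattern are borrowed from the numbers example; the variable names and the choice of operations are illustrative assumptions), the following searches for all matches, tests for a full match and a match at the start, and performs replacements:

```python
import re

# A raw string and a doubly escaped ordinary string spell the same pattern.
same_pattern = (r"\d+" == "\\d+")

text = "abc00123xyz456_0"
pattern = re.compile(r"[0-9]+")

# finditer() yields one match object per non-overlapping match.
spans = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]
for substring, start, end in spans:
    print(f'found "{substring}" starting at index {start} and ending at index {end}')

# fullmatch() requires the whole input to match; match() only the beginning.
whole = pattern.fullmatch(text)
prefix = pattern.match(text)

# sub() replaces matches; count=1 limits it to the first one.
first_replaced = pattern.sub("**", text, count=1)
all_replaced = pattern.sub("++", text)
print(first_replaced)
print(all_replaced)
```

Both fullmatch() and match() return None here because the input starts with letters rather than digits.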

Code Example in Java

See "Regular Expressions (Regex) in Java" for full coverage.

Java supports regex in package java.util.regex.


The output is:

find() found substring "00123" starting at index 3 and ending at index 8
find() found substring "456" starting at index 11 and ending at index 14
find() found substring "0" starting at index 15 and ending at index 16
matches() found nothing
lookingAt() found nothing
abc**xyz456_0
abc++xyz++_++

Code Example in Perl


Perl makes extensive use of regular expressions with many built-in syntaxes and operators. In Perl (and JavaScript), a regex is delimited by a pair of forward slashes (default), in the form of /.../. You can use built-in operators:

  • m/regex/modifier or /regex/modifier: Match against the regex. m is optional.
  • s/regex/replacement/modifier: Substitute matched substring(s) by the replacement.

In Perl, you can use a single-quoted non-interpolating string '...' to write regex to disable interpretation of backslash (\) by Perl.

Code Example in JavaScript


In JavaScript (and Perl), a regex is delimited by a pair of forward slashes, in the form of /.../. There are two sets of methods, issued via a RegExp object or a String object.

Code Example in PHP

[TODO]

Example: Full Numeric Strings ^[0-9]+$ or ^\d+$

  1. The leading ^ and the trailing $ are known as position anchors, which match the start and end positions of the line, respectively. As a result, the entire input string must be matched fully, instead of a portion of the input string (substring).
  2. This regex matches any non-empty numeric string (comprising digits 0 to 9), e.g., "0" and "12345". It does not match "" (empty string), "abc", "a123", "abc123xyz", etc. However, it also matches "000", "0123" and "0001" with leading zeros.
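
The accepted and rejected strings listed above can be verified with a minimal Python sketch (standard re module; the candidate list and variable names are illustrative):

```python
import re

full_numeric = re.compile(r"^[0-9]+$")
candidates = ["0", "12345", "", "abc", "a123", "abc123xyz", "000", "0123", "0001"]

# search() scans the input, but the anchors force the match to span all of it.
accepted = [s for s in candidates if full_numeric.search(s)]
rejected = [s for s in candidates if not full_numeric.search(s)]

print("accepted:", accepted)
print("rejected:", rejected)
```

One Python-specific caveat: $ also matches just before a single trailing newline, so a string such as "123\n" would be accepted by this pattern. Using the fullmatch() method with the unanchored pattern [0-9]+ avoids that.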

Example: Positive Integer Literals [1-9][0-9]*|0 or [1-9]\d*|0

  1. [1-9] matches any character between 1 and 9; [0-9]* matches zero or more digits. The * is an occurrence indicator representing zero or more occurrences. Together, [1-9][0-9]* matches any number without a leading zero.
  2. | represents the OR operator, which is used to include the number 0.
  3. This expression matches "0" and "123"; but does not match "000" and "0123" (but see below).
  4. You can replace [0-9] by metacharacter \d, but not [1-9].
  5. We did not use position anchors ^ and $ in this regex. Hence, it can match any part of the input string. For example,
    1. If the input string is "abc123xyz", it matches the substring "123".
    2. If the input string is "abcxyz", it matches nothing.
    3. If the input string is "abc123xyz456_0", it matches substrings "123", "456" and "0" (three matches).
    4. If the input string is find() found substring "00123" starting at index 3 and ending at index 8 find() found substring "456" starting at index 11 and ending at index 14 find() found substring "0" starting at index 15 and ending at index 16 matches() found nothing lookingAt() found nothing abc**xyz456_0 abc++xyz++_++71, it matches substrings: find() found substring "00123" starting at index 3 and ending at index 8 find() found substring "456" starting at index 11 and ending at index 14 find() found substring "0" starting at index 15 and ending at index 16 matches() found nothing lookingAt() found nothing abc**xyz456_0 abc++xyz++_++12, find() found substring "00123" starting at index 3 and ending at index 8 find() found substring "456" starting at index 11 and ending at index 14 find() found substring "0" starting at index 15 and ending at index 16 matches() found nothing lookingAt() found nothing abc**xyz456_0 abc++xyz++_++12 and find() found substring "00123" starting at index 3 and ending at index 8 find() found substring "456" starting at index 11 and ending at index 14 find() found substring "0" starting at index 15 and ending at index 16 matches() found nothing lookingAt() found nothing abc**xyz456_0 abc++xyz++_++74 (three matches)!!!
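
The unanchored behaviour can be checked with a short Python sketch. The helper name find_integers is made up for illustration; the pattern is the one from this example, [1-9][0-9]*|0, and the inputs are sample strings chosen to show leading zeros being split into separate matches.

```python
import re

# Pattern from this example: a non-zero digit followed by any digits, or a lone 0.
INTEGER = re.compile(r"[1-9][0-9]*|0")

def find_integers(text):
    """Return every non-overlapping match, scanning left to right."""
    return INTEGER.findall(text)

for sample in ["abc123xyz", "abcxyz", "abc123xyz456_0", "0012300"]:
    print(repr(sample), "->", find_integers(sample))
```

Because there are no anchors, each leading zero is reported as its own match before the engine reaches a non-zero digit.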

Example: Full Integer Literals ^[+-]?[1-9][0-9]*|0$ or ^[+-]?[1-9]\d*|0$

  1. This regex matches an integer literal (the entire string, because of the position anchors), whether positive, negative or zero. Take note that | has the lowest precedence, so as written the ^ applies only to the first alternative and the $ only to the second; write ^[+-]?([1-9][0-9]*|0)$ if both anchors must apply to both alternatives.
  2. [+-] matches either a + or a - sign. ? is an occurrence indicator denoting 0 or 1 occurrence, i.e. optional. Hence, [+-]? matches an optional leading + or - sign.
  3. We have covered three occurrence indicators: + for one or more, * for zero or more, and ? for zero or one.
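
A minimal Python sketch of whole-string integer matching follows. It uses a grouped variant of the pattern, ^[+-]?([1-9][0-9]*|0)$, which is an adjustment rather than the pattern exactly as written in the heading: the parentheses make both anchors apply to either branch of the alternation. The names FULL_INTEGER, UNGROUPED and is_integer_literal are hypothetical.

```python
import re

# Grouped variant (an assumption, not the heading's pattern verbatim):
# the parentheses keep ^ and $ attached to both alternatives.
FULL_INTEGER = re.compile(r"^[+-]?([1-9][0-9]*|0)$")

# The pattern as written, kept only to show what the missing group changes.
UNGROUPED = re.compile(r"^[+-]?[1-9][0-9]*|0$")

def is_integer_literal(text):
    """True if the whole string is an optionally signed integer without leading zeros."""
    return FULL_INTEGER.search(text) is not None

for sample in ["0", "123", "+45", "-7", "007", "12a", ""]:
    print(repr(sample), is_integer_literal(sample))

# Without the group, "12a" still produces a match ("12"),
# because the first alternative carries no $ anchor.
print(UNGROUPED.search("12a") is not None)
```

Note that the grouped pattern also accepts "+0" and "-0"; whether that is desirable depends on the application.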

Example: Identifiers (or Names) [a-zA-Z_][0-9a-zA-Z_]* or [a-zA-Z_]\w*

  1. Begin with one letter or underscore, followed by zero or more digits, letters and underscores.
  2. You can use metacharacter \w for a word character [a-zA-Z0-9_]. Recall that metacharacter \d can be used for a digit [0-9].
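
As a rough illustration in Python, the identifier pattern can be applied to a whole string with fullmatch. The helper name is_identifier is made up for this sketch. One assumption worth flagging: in Python 3, \w also matches Unicode letters and digits for str patterns, so the re.ASCII flag is used here to keep \w equivalent to [a-zA-Z0-9_].

```python
import re

# re.ASCII restricts \w to [a-zA-Z0-9_], matching the definition in the text.
IDENTIFIER = re.compile(r"[a-zA-Z_]\w*", re.ASCII)

def is_identifier(name):
    """True if the entire string is a letter or underscore followed by word characters."""
    return IDENTIFIER.fullmatch(name) is not None

for sample in ["_tmp1", "x", "2fast", "my-var", ""]:
    print(repr(sample), is_identifier(sample))
```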

Example: Image Filenames ^\w+\.(gif|png|jpg|jpeg)$

  1. The position anchors ^ and $ match the beginning and the end of the input string, respectively. That is, this regex must match the entire input string, instead of a part of the input string (substring).
  2. \w+ matches one or more word characters (same as [a-zA-Z0-9_]+).
  3. \. matches the dot (.) character. We need to use \. to represent . because . has a special meaning in regex. The \ is known as the escape code, which restores the original literal meaning of the following character. Similarly, *, +, ? (occurrence indicators), ^, $ (position anchors) have special meanings in regex. You need to use an escape code to match these characters.
  4. (gif|png|jpg|jpeg) matches either "gif", "png", "jpg" or "jpeg". The | denotes the "OR" operator. The parentheses are used for grouping the selections.
  5. The modifier i after the regex specifies case-insensitive matching (applicable to some languages like Perl and JavaScript only). That is, it accepts "test.GIF" and "TesT.Gif".
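
Python has no trailing i modifier; a comparable effect is obtained by passing the re.IGNORECASE flag. The sketch below assumes that substitution, and the helper name is_image_filename is hypothetical.

```python
import re

# Same pattern as the heading; re.IGNORECASE plays the role of the i modifier.
IMAGE_FILE = re.compile(r"^\w+\.(gif|png|jpg|jpeg)$", re.IGNORECASE)

def is_image_filename(name):
    """True if the whole name is word characters, a dot, and one of the listed extensions."""
    return IMAGE_FILE.search(name) is not None

for sample in ["test.GIF", "TesT.Gif", "photo.jpeg", "photo.bmp", "my photo.png", "archive.tar.png"]:
    print(repr(sample), is_image_filename(sample))
```

Names containing spaces or more than one dot are rejected, because \w+ matches neither a space nor a dot.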

Example: Email Addresses ^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,3})+$

  1. The position anchors ^ and $ match the beginning and the end of the input string, respectively. That is, this regex must match the entire input string, instead of a part of the input string (substring).
  2. \w+ matches 1 or more word characters (same as [a-zA-Z0-9_]+).
  3. [.-]? matches an optional character . or -. Although dot (.) has a special meaning in regex, in a character class (square brackets) any character except ^, -, ] or \ is a literal and does not require an escape sequence.
  4. ([.-]?\w+)* matches 0 or more occurrences of [.-]?\w+.
  5. The sub-expression \w+([.-]?\w+)* is used to match the username in the email, before the @ sign. It begins with at least one word character [a-zA-Z0-9_], followed by more word characters or . or -. However, a . or - must be followed by a word character [a-zA-Z0-9_]. That is, the input string cannot begin with . or -; and cannot contain "..", "--", ".-" or "-.". An example of a valid string is "a.1-2-3".
  6. The @ matches itself. In regex, all characters other than those having special meanings match themselves, e.g., a matches a, b matches b, etc.
  7. Again, the sub-expression \w+([.-]?\w+)* is used to match the email domain name, with the same pattern as the username described above.
  8. The sub-expression \.\w{2,3} matches a . followed by two or three word characters, e.g., ".com", ".edu", ".us", ".uk", ".co".
  9. (\.\w{2,3})+ specifies that the above sub-expression could occur one or more times, e.g., ".com", ".co.uk", ".edu.sg" etc.
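
The email pattern from the heading can be exercised with a small Python sketch. The helper name is_email and the sample addresses are made up for illustration; this is a demonstration of how the sub-expressions behave, not a complete email validator.

```python
import re

# Pattern copied from the heading of this example.
EMAIL = re.compile(r"^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,3})+$")

def is_email(text):
    """True if the whole string fits the username@domain shape described above."""
    return EMAIL.search(text) is not None

samples = [
    "a.1-2-3@example.com",    # separators each followed by a word character
    "user@example.co.uk",     # the final group repeats: .co then .uk
    "john..doe@example.com",  # two dots in a row
    ".john@example.com",      # starts with a dot
    "user@example",           # no dot-suffix at all
    "user@site.info",         # suffix longer than three characters
]
for sample in samples:
    print(repr(sample), is_email(sample))
```

The last sample shows a limitation of \w{2,3}: suffixes longer than three characters are rejected by this pattern.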

Exercise: Interpret this regex, which provides another representation of an email address: ^[\w\-\.\+]+\@[a-zA-Z0-9\.\-]+\.[a-zA-Z0-9]{2,4}$.

Example: Swapping Words using Parenthesized Back-References ^(\S+)\s+(\S+)$ and $2 $1

  1. The ^ and $ match the beginning and the end of the input string, respectively.
  2. The \s (lowercase s) matches a whitespace (blank, tab \t, and newline \r or \n). On the other hand, the \S (uppercase S) matches anything that is NOT matched by \s, i.e., non-whitespace. In regex, the uppercase metacharacter denotes the inverse of its lowercase counterpart, for example, \w for word character and \W for non-word character; \d for digit and \D for non-digit.
  3. The above regex matches two words (without whitespace) separated by one or more whitespaces.
  4. Parentheses () have two meanings in regex:
    1. to group sub-expressions, e.g., (abc)*
    2. to provide a so-called back-reference for capturing and extracting matches.
  5. The parentheses in (\S+), called a parenthesized back-reference, are used to extract the matched substring from the input string. In this regex, there are two (\S+), which match the first two words, separated by one or more whitespaces \s+. The two matched words are extracted from the input string and typically kept in special variables $1 and $2 (or \1 and \2 in Python), respectively.
  6. To swap the two words, you can access the special variables and print "$2 $1" (via a programming language); or use the substitute operator "s/(\S+)\s+(\S+)/$2 $1/" (in Perl).

Code Example in Python

Python keeps the parenthesized back references in \1, \2, .... The entire match is available as \g<0> in a replacement string, or as group(0) on the match object.
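
A minimal sketch of the word swap in Python, assuming the pattern ^(\S+)\s+(\S+)$ from the heading. The function name swap_words is hypothetical; re.sub and the \1, \2 references in the replacement string are standard.

```python
import re

# Two capture groups: first word, second word, separated by whitespace.
TWO_WORDS = re.compile(r"^(\S+)\s+(\S+)$")

def swap_words(text):
    """Swap the two words; strings that do not match are returned unchanged."""
    return TWO_WORDS.sub(r"\2 \1", text)

print(swap_words("apple orange"))

# The captured pieces are also available on the match object.
m = TWO_WORDS.search("apple orange")
if m:
    print(m.group(0))  # entire match
    print(m.group(1))  # first word
    print(m.group(2))  # second word
```

The replacement is a raw string so that \2 and \1 reach the regex engine as back-references rather than being read as string escapes.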

Code Example in Java

Java keeps the parenthesized back references in $1, $2, ....

Example: HTTP Addresses ^http:\/\/\S+(\/\S+)*(\/)?$

  1. Begin with http://. Take note that you may need to write / as \/ with an escape code in some languages (JavaScript, Perl).
  2. Followed by \S+, one or more non-whitespaces, for the domain name.
  3. Followed by (\/\S+)*, zero or more "/...", for the sub-directories.
  4. Followed by (\/)?, an optional (0 or 1) trailing /, for a directory request.
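
The steps above can be sketched in Python, assuming the pattern starts with the literal prefix http:// as the example title suggests. In Python the slash needs no escape, so / is written directly; the helper name is_http_address and the sample URLs are made up.

```python
import re

# http:// prefix, a run of non-whitespace, optional /segments, optional trailing slash.
HTTP_ADDRESS = re.compile(r"^http://\S+(/\S+)*(/)?$")

def is_http_address(text):
    """True if the whole string has the http://... shape described above."""
    return HTTP_ADDRESS.search(text) is not None

for sample in ["http://example.com", "http://example.com/a/b/",
               "ftp://example.com", "http://exa mple.com", "http://"]:
    print(repr(sample), is_http_address(sample))
```

Since \S also matches /, the first \S+ can consume the whole remainder of the address by itself; the later groups document the intended structure more than they restrict it.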

Example: Regex Patterns in AngularJS

The following rather complex regex patterns are used by AngularJS in JavaScript syntax:

Example: Sample Regex in Perl

Regular Expression (Regex) Syntax

A Regular Expression (or Regex) is a pattern (or filter) that describes a set of strings that matches the pattern. In other words, a regex accepts a certain set of strings and rejects the rest.

A regex consists of a sequence of characters, metacharacters (such as ., \d, \D, \s, \S, \w, \W) and operators (such as +, *, ?, |, ^). They are constructed by combining many smaller sub-expressions.

Matching a Single Character

The fundamental building blocks of a regex are patterns that match a single character. Most characters, including all letters (a-z and A-Z) and digits (0-9), match themselves. For example, the regex x matches substring "x"; z matches "z"; and 9 matches "9".

Non-alphanumeric characters without special meaning in regex also match themselves. For example, = matches "="; @ matches "@".

Regex Special Characters and Escape Sequences

Regex's Special Characters

These characters have special meaning in regex (I will discuss in detail in the later sections):

  • metacharacter: dot (.)
  • bracket list: [ ]
  • position anchors: ^, $
  • occurrence indicators: +, *, ?, { }
  • parentheses: ( )
  • or: |
  • escape and metacharacter: backslash (\)

Escape Sequences

The characters listed above have special meanings in regex. To match any of them literally, prepend it with a backslash (\); the result is known as an escape sequence. For example, \+ matches "+"; \[ matches "["; and \. matches ".".

Regex also recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage-return, \nnn for an octal number of up to 3 digits, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code point, and \Uhhhhhhhh for an 8-digit Unicode code point. Support for the octal, hex and Unicode forms varies between regex engines.
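A minimal Python sketch of escaping, assuming the standard `re` module; the sample text and variable names are made up for illustration. It shows a backslash turning a special character into a literal one, and `re.escape()` adding the backslashes automatically.

```python
import re

text = "1+1=2 [ok] v1.0"  # hypothetical sample input

# A backslash in front of a special character makes it literal.
plus_hits = re.findall(r"\+", text)     # literal '+'
bracket_hits = re.findall(r"\[", text)  # literal '['
dot_hits = re.findall(r"\.", text)      # literal '.'

# Without the backslash, '.' matches any character except newline.
any_char_hits = re.findall(r".", "a.b")

# \t inside a regex matches a tab character.
tab_hit = re.search(r"\t", "a\tb")

# re.escape() inserts the backslashes needed to match a string literally.
escaped = re.escape("1+1")

print(plus_hits, bracket_hits, dot_hits, any_char_hits, escaped)
```

The patterns are written as raw strings (`r"..."`) so that Python passes the backslashes to the regex engine unchanged.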

Code Example in Python

Code Example in JavaScript

[TODO]

Code Example in Java

[TODO]

Matching a Sequence of Characters (String or Text)

Sub-Expressions

A regex is constructed by combining many smaller sub-expressions or atoms. For example, the regex Friday matches the string "Friday". The matching, by default, is case-sensitive, but can be set to case-insensitive via a modifier.

OR (|) Operator

You can provide alternatives using the "OR" operator, denoted by a vertical bar |. For example, the regex four|for|floor|4 accepts strings "four", "for", "floor" or "4".
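As a small Python sketch (standard `re` module; the candidate strings are illustrative), alternation and the default case-sensitivity can be checked like this:

```python
import re

# Alternation: any one of the listed alternatives is accepted.
pattern = re.compile(r"four|for|floor|4")
candidates = ["four", "for", "floor", "4", "five"]
accepted = [s for s in candidates if pattern.fullmatch(s)]

# Matching is case-sensitive unless a modifier says otherwise.
friday_default = re.fullmatch(r"Friday", "FRIDAY")
friday_icase = re.fullmatch(r"Friday", "FRIDAY", flags=re.IGNORECASE)

print(accepted, friday_default, friday_icase is not None)
```

`fullmatch()` is used here so that the whole string must match; `search()` would also accept strings that merely contain one of the alternatives.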

Bracket List (Character Class) [...], [^...], [.-.]

A bracket expression is a list of characters enclosed by [ ], also called a character class. It matches ANY ONE character in the list. However, if the first character of the list is the caret (^), then it matches ANY ONE character NOT in the list. For example, the regex [02468] matches a single digit 0, 2, 4, 6, or 8; the regex [^02468] matches any single character other than 0, 2, 4, 6, or 8.

Instead of listing all characters, you could use a range expression inside the bracket. A range expression consists of two characters separated by a hyphen (-). It matches any single character that sorts between the two characters, inclusive. For example, [a-d] is the same as [abcd]. You could include a caret (^) in front of the range to invert the matching. For example, [^a-d] is equivalent to [^abcd].

Most of the special regex characters lose their meaning inside a bracket list, and can be used as they are; except ^, -, ] or \.

  • To include a ], place it first in the list, or use escape \].
  • To include a ^, place it anywhere but first, or use escape \^.
  • To include a -, place it last, or use escape \-.
  • To include a \, use escape \\.
  • No escape is needed for the other characters such as ., +, *, ?, (, ), {, }, etc., inside the bracket list.
  • You can also include metacharacters (to be explained in the next section), such as \w, \W, \d, \D, \s, \S inside the bracket list.
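The bracket-list rules above can be tried out with a short Python sketch (standard `re` module; the input strings are made up for illustration):

```python
import re

# A list matches any ONE of its characters; a leading ^ inverts it.
even_digits = re.findall(r"[02468]", "0123456789")
not_even = re.findall(r"[^02468]", "0123a")

# A range is shorthand for every character between its two ends.
in_range = re.findall(r"[a-d]", "abcdef")
out_of_range = re.findall(r"[^a-d]", "abcdef")

# Placement rules: ] first, ^ anywhere but first, - last.
special = re.findall(r"[]^-]", "a]b^c-d")

# Other special characters need no escape inside the list.
unescaped = re.findall(r"[.+*?]", "a.b+c*d?")

# Metacharacters such as \d and \s can be mixed into the list.
digit_or_space = re.findall(r"[\d\s]", "a1 b2")

print(even_digits, not_even, in_range, out_of_range)
print(special, unescaped, digit_or_space)
```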

Named Character Classes in Bracket List (For Perl Only?)

Named (POSIX) classes of characters are pre-defined within bracket expressions. They are:

  • [:alnum:], [:alpha:], [:digit:]: letters+digits, letters, digits.
  • [:xdigit:]: hexadecimal digits.
  • [:lower:], [:upper:]: lowercase/uppercase letters.
  • [:cntrl:]: control characters.
  • [:graph:]: printable characters, excluding space.
  • [:print:]: printable characters, including space.
  • [:punct:]: printable characters, excluding letters and digits.
  • [:space:]: whitespace.

For example, [[:alnum:]] means [0-9A-Za-z]. (Note that the square brackets in these class names are part of the symbolic names, and must be included in addition to the square brackets delimiting the bracket list.)

Metacharacters ., \w, \W, \d, \D, \s, \S

A metacharacter is a symbol with a special meaning inside a regex.

  • The metacharacter dot (.) matches any single character except newline \n (same as [^\n]). For example, ... matches any 3 characters (including letters, digits and whitespace, but not newline); the.. matches "there", "these", "the  ", and so on.
  • \w (word character) matches any single letter, digit or underscore (same as [a-zA-Z0-9_]). The uppercase counterpart \W (non-word-character) matches any single character that is not matched by \w (same as [^a-zA-Z0-9_]).
  • For these metacharacters, the uppercase form is always the inverse of its lowercase counterpart.
  • \d (digit) matches any single digit (same as [0-9]). The uppercase counterpart \D (non-digit) matches any single character that is not a digit (same as [^0-9]).
  • \s (space) matches any single whitespace character (same as [ \t\n\r\f]: blank, tab, newline, carriage-return and form-feed). The uppercase counterpart \S (non-space) matches any single character that is not matched by \s (same as [^ \t\n\r\f]).

Examples:
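A minimal Python sketch of these metacharacters, assuming the standard `re` module; the sample strings are invented for illustration. Note that on Python `str` patterns \w, \d and \s also match non-ASCII letters, digits and whitespace unless the `re.ASCII` flag is given.

```python
import re

sample = "ID_42: ok\tgo"  # hypothetical input

words = re.findall(r"\w+", sample)          # runs of word characters
digits = re.findall(r"\d", sample)          # single digits
non_digits = re.findall(r"\D+", "a1b22c")   # runs of non-digits
spaces = re.findall(r"\s", sample)          # whitespace characters
non_spaces = re.findall(r"\S+", "ab  cd")   # runs of non-whitespace

# Dot matches any character except newline.
the_hits = re.findall(r"the..", "there these them")
dot_hits = re.findall(r".", "a\nb")

print(words, digits, non_digits, spaces, non_spaces, the_hits, dot_hits)
```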

Backslash (\) and Regex Escape Sequences

Regex uses backslash (\) for the following purposes:

  1. for metacharacters such as \d (digit), \D (non-digit), \s (space), \S (non-space), \w (word), \W (non-word).
  2. to escape special regex characters, e.g., \. for ., \+ for +, \* for *, \? for ?. You also need to write \\ for \ in regex to avoid ambiguity.
  3. for common escape sequences: regex also recognizes \n for newline, \t for tab, etc.

Take note that in many programming languages (C, Java, Python), backslash (\) is also used for escape sequences in string literals, e.g., "\n" for newline, "\t" for tab, and you also need to write "\\" for \. Consequently, to write the regex pattern \\ (which matches one \) in these languages, you need to write "\\\\" (two levels of escape!). Similarly, you need to write "\\d" for the regex metacharacter \d. This is cumbersome and error-prone!
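In Python, raw string literals (`r"..."`) are the usual way around the double escaping. The sketch below (standard `re` module; the sample path is made up) shows that the doubled and raw forms describe the same regex:

```python
import re

backslash_text = "C:\\temp"  # the string holds ONE backslash: C:\temp

# Regex \\ written as an ordinary string literal needs four backslashes.
plain = re.findall("\\\\", backslash_text)
# The same regex as a raw string needs only two.
raw = re.findall(r"\\", backslash_text)

# Likewise for the metacharacter \d.
digits_plain = re.findall("\\d+", "a12b")
digits_raw = re.findall(r"\d+", "a12b")

print(plain == raw, digits_plain == digits_raw)
```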

Occurrence Indicators (Repetition Operators): +, *, ?, {m}, {m,n}, {m,}

A regex sub-expression may be followed by an occurrence indicator (aka repetition operator):

  • ?: The preceding item is optional and matched at most once (i.e., occurs 0 or 1 times).
  • *: The preceding item will be matched zero or more times, i.e., 0+.
  • +: The preceding item will be matched one or more times, i.e., 1+.
  • {m}: The preceding item is matched exactly m times.
  • {m,}: The preceding item is matched m or more times, i.e., m+.
  • {m,n}: The preceding item is matched at least m times, but not more than n times.

For example: The regex xy{2,4} accepts "xyy", "xyyy" and "xyyyy".
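The occurrence indicators can be exercised with a short Python sketch (standard `re` module; the candidate strings other than the xy{2,4} example are illustrative):

```python
import re

# {m,n}: between 2 and 4 occurrences of y after x.
candidates = ["x", "xy", "xyy", "xyyy", "xyyyy", "xyyyyy"]
accepted = [s for s in candidates if re.fullmatch(r"xy{2,4}", s)]

# ?: the u is optional.
optional = [s for s in ["color", "colour", "colouur"]
            if re.fullmatch(r"colou?r", s)]

# *: zero or more; +: one or more.
star_ok = re.fullmatch(r"ab*", "a") is not None
plus_ok = re.fullmatch(r"ab+", "a") is not None

# {m}: exactly m; {m,}: m or more.
exactly_three = re.fullmatch(r"\d{3}", "123") is not None
at_least_two = re.fullmatch(r"\d{2,}", "12345") is not None

print(accepted, optional, star_ok, plus_ok, exactly_three, at_least_two)
```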

Modifiers

You can apply modifiers to a regex to tailor its behavior, such as global, case-insensitive, multiline, etc. The ways to apply modifiers differ among languages.

In Perl, you can attach modifiers after a regex, in the form of /.../modifiers. For example, /cat/i matches "cat" case-insensitively.

In Java, you apply modifiers when compiling the regex Pattern. For example, Pattern.compile("cat", Pattern.CASE_INSENSITIVE).

The commonly-used modifier modes are:

  • Case-Insensitive mode (or i): case-insensitive matching for letters.
  • Global (or g): match all occurrences instead of only the first match.
  • Multiline mode (or m): affects ^ and $. In multiline mode, ^ matches start-of-line or start-of-input, and $ matches end-of-line or end-of-input. \A and \Z are not affected: \A still matches start-of-input and \Z still matches end-of-input.
  • Single-line mode (or s): Dot (.) will match all characters, including newline.
  • Comment mode (or x): allow and ignore embedded comments starting with # till end-of-line (EOL).
  • more...
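For comparison, Python applies modifiers as flags passed to the `re` functions. The sketch below assumes the standard `re` module; the sample text and the phone-like pattern are invented for illustration. Python has no "global" flag: `findall()` returns all matches, while `search()` returns only the first.

```python
import re

text = "first line\nSecond line"  # hypothetical two-line input

# Case-insensitive mode (i).
icase = re.findall(r"second", text, flags=re.IGNORECASE)

# "Global" behaviour: findall() versus search().
all_hits = re.findall(r"line", text)
first_hit = re.search(r"line", text).group()

# Multiline mode (m): ^ also matches after each line break.
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Single-line mode (s): dot also matches newline.
dot_default = re.search(r"first.*Second", text)
dot_all = re.search(r"first.*Second", text, flags=re.DOTALL)

# Comment mode (x): whitespace and # comments in the pattern are ignored.
verbose = re.compile(r"""
    \d{3}   # three digits
    -       # separator
    \d{4}   # four digits
""", re.VERBOSE)

print(icase, all_hits, first_hit, starts_default, starts_multiline)
print(dot_default, dot_all is not None, bool(verbose.fullmatch("555-1234")))
```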

Greediness, Laziness and Backtracking for Repetition Operators

Greediness of Repetition Operators *, +, ?, {m,n}: The repetition operators are greedy operators, and by default grasp as many characters as possible for a match. For example, the regex xy{2,4} tries to match "xyyyy" first, then "xyyy", and then "xyy".

Lazy Quantifiers *?, +?, ??, {m,n}?, {m,}?: You can put an extra ? after a repetition operator to curb its greediness (i.e., stop at the shortest match). For example,

input = "The <code>first</code> and <code>second</code> instances"
regex = <code>.*</code> matches "<code>first</code> and <code>second</code>"
But
regex = <code>.*?</code> produces two matches: "<code>first</code>" and "<code>second</code>"
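The greedy and lazy behaviour can be reproduced with a small Python sketch (standard `re` module), assuming the input contains <code> tags around the two words:

```python
import re

html = "The <code>first</code> and <code>second</code> instances"

# Greedy: .* runs to the LAST </code> it can reach.
greedy = re.findall(r"<code>.*</code>", html)

# Lazy: .*? stops at the FIRST </code> that completes a match.
lazy = re.findall(r"<code>.*?</code>", html)

print(greedy)
print(lazy)
```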

Backtracking: If a regex reaches a state where a match cannot be completed, it backtracks by unwinding one character from the greedy match. For example, if the regex z*zzz is matched against the string "zzzz", the z* first matches "zzzz"; unwinds to match "zzz"; unwinds to match "zz"; and finally unwinds to match "z", such that the rest of the pattern can find a match.
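The end result of this backtracking can be observed in Python (standard `re` module) by wrapping z* in parentheses, which are added here only to inspect what the greedy part finally keeps:

```python
import re

# (z*) starts by grabbing everything, then gives characters back
# until the trailing zzz can match.
match = re.fullmatch(r"(z*)zzz", "zzzz")
kept_by_star = match.group(1)

# With exactly three z's, z* has to give back everything.
kept_when_short = re.fullmatch(r"(z*)zzz", "zzz").group(1)

print(repr(kept_by_star), repr(kept_when_short))
```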

Possessive Quantifiers *+, ++, ?+, {m,n}+, {m,}+: You can put an extra + after a repetition operator to disable backtracking, even if this results in a match failure. E.g., z++z will not match "zzzz", because z++ consumes every z and gives none back for the final z. This feature might not be supported in some languages.
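A hedged Python sketch of the difference, assuming the standard `re` module. Possessive quantifiers are accepted by Python's `re` from version 3.11; older versions reject the syntax, so the sketch falls back to `None` in that case.

```python
import re

# Greedy z+ backtracks by one character so the final z can match.
greedy_match = re.fullmatch(r"z+z", "zzzz")

try:
    possessive = re.compile(r"z++z")  # possessive syntax, Python 3.11+
except re.error:
    possessive = None                 # older Python: syntax not supported

# Possessive z++ keeps every z, so the final z never finds a match.
possessive_match = possessive.search("zzzz") if possessive else None

print(greedy_match is not None, possessive_match)
```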

Position Anchors ^, $, \b, \B, \<, \>, \A, \Z

Position anchors DO NOT match actual characters; they match positions in a string, such as start-of-line, end-of-line, start-of-word, and end-of-word.

  • ^ and $: The ^ matches the start-of-line. The $ matches the end-of-line excluding newline, or end-of-input (for input not ending with newline). These are the most commonly-used position anchors.
  • \b and \B: The \b matches the boundary of a word (i.e., start-of-word or end-of-word); and \B matches the inverse of \b, or non-word-boundary.
  • \< and \>: The \< and \> match the start-of-word and end-of-word, respectively (compared with \b, which can match both the start and end of a word). They are supported by only some engines.
  • \A and \Z: The \A matches the start of the input. The \Z matches the end of the input.
    They are different from ^ and $ when it comes to matching input with multiple lines. In multiline mode, ^ matches at the start of the string and after each line break, while \A only matches at the start of the string. Likewise, $ matches at the end of the string and before each line break, while \Z only matches at the end of the string.
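The anchors can be compared with a short Python sketch (standard `re` module; the sample text is made up). Python's `re` does not support \< and \>, so only ^, $, \b, \B, \A and \Z are shown.

```python
import re

text = "cat concat\ncatalog cat"  # hypothetical two-line input

# ^ and $: default mode versus multiline mode.
starts = re.findall(r"^cat", text)
starts_ml = re.findall(r"^cat", text, flags=re.MULTILINE)
ends = re.findall(r"cat$", text)
ends_ml = re.findall(r"cat$", text, flags=re.MULTILINE)

# \b: "cat" as a whole word; \B: "cat" preceded by a word character.
whole_words = re.findall(r"\bcat\b", text)
word_endings = re.findall(r"\Bcat\b", text)

# \A and \Z ignore multiline mode: start and end of input only.
input_start = re.findall(r"\Acat", text, flags=re.MULTILINE)
input_end = re.findall(r"cat\Z", text, flags=re.MULTILINE)

print(len(starts), len(starts_ml), len(ends), len(ends_ml))
print(len(whole_words), len(word_endings), len(input_start), len(input_end))
```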

Capturing Matches via Parenthesized Back-References & Matched Variables $1, $2, ...

Parentheses ( ) serve two purposes in regex:

  1. Firstly, parentheses ( ) can be used to group sub-expressions for overriding the precedence or applying a repetition operator. For example, (abc)+ (accepts abc, abcabc, abcabcabc, ...) is different from abc+ (accepts abc, abcc, abccc, ...).
  2. Secondly, parentheses are used to provide the so-called back-references (or capturing groups). A back-reference contains the matched substring. For example, the regex (\S+) creates one back-reference (\S+), which contains the first word (consecutive non-spaces) of the input string; the regex (\S+)\s+(\S+) creates two back-references: (\S+) and another (\S+), containing the first two words, separated by one or more spaces \s+.

These back-references (or capturing groups) are stored in special variables $1, $2, … (or \1, \2, ... in Python), where $1 contains the substring matched by the first pair of parentheses, and so on. For example, (\S+)\s+(\S+) creates two back-references that match the first two words. The matched words are stored in $1 and $2 (or \1 and \2), respectively.
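Both roles of parentheses can be illustrated with a short sketch in Python's re module (an assumed flavor; the input strings and variable names are hypothetical). In Python, the captured groups are read from the match object with group(1), group(2), and so on, rather than from $1 and $2.

```python
import re

# Grouping: (abc)+ repeats the whole group, while abc+ repeats only the "c".
group_repeat = re.fullmatch(r"(abc)+", "abcabc") is not None   # True
char_repeat = re.fullmatch(r"abc+", "abcabc") is not None      # False
char_repeat_ccc = re.fullmatch(r"abc+", "abccc") is not None   # True

# Capturing: each (\S+) captures one word, separated by one or more spaces.
m = re.match(r"(\S+)\s+(\S+)", "hello   world and more")
first_word = m.group(1)   # corresponds to $1
second_word = m.group(2)  # corresponds to $2

print(first_word, second_word)  # hello world
```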

Back-references are important for manipulating strings. They can be used in the substitution string as well as in the pattern itself.
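A minimal sketch of both uses, assuming Python's re module (where back-references are written \1, \2, ...); the sample strings are invented for illustration.

```python
import re

# Back-reference in the substitution string: swap the first two words.
swapped = re.sub(r"(\S+)\s+(\S+)", r"\2 \1", "apple orange", count=1)

# Back-reference in the pattern: \1 must repeat whatever group 1 matched,
# so this finds a word that is immediately followed by itself.
repeated = re.search(r"\b(\w+)\s+\1\b", "this is is a test")

print(swapped)            # orange apple
print(repeated.group(1))  # is
```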

(Advanced) Lookahead/Lookbehind, Groupings and Conditional

These features might not be supported in some languages.

Positive Lookahead (?=pattern)

The (?=pattern) is known as positive lookahead. It performs the match, but does not capture the match, returning only the result: match or no match. It is also called an assertion, as it does not consume any characters in matching. For example, the following complex regex is used to match email addresses by AngularJS:

^(?=.{1,254}$)(?=.{1,64}@)[-!#$%&'*+/0-9=?A-Z^_`a-z{|}~]+(\.[-!#$%&'*+/0-9=?A-Z^_`a-z{|}~]+)*@[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$

The first positive lookahead, (?=.{1,254}$), sets the maximum length to 254 characters. The second positive lookahead, (?=.{1,64}@), sets a maximum of 64 characters before the @ sign for the username.
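The effect of the two lookaheads can be checked with a small sketch that compiles the same regex in Python's re module (an assumed flavor; the name EMAIL and the test addresses are hypothetical).

```python
import re

# The email regex quoted above, split across lines for readability.
EMAIL = re.compile(
    r"^(?=.{1,254}$)(?=.{1,64}@)"
    r"[-!#$%&'*+/0-9=?A-Z^_`a-z{|}~]+(\.[-!#$%&'*+/0-9=?A-Z^_`a-z{|}~]+)*"
    r"@[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
    r"(\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$"
)

# A short, ordinary address passes both lookaheads.
ordinary_ok = EMAIL.match("user@example.com") is not None

# 64 characters before the @ is still allowed by (?=.{1,64}@) ...
local_64_ok = EMAIL.match("a" * 64 + "@example.com") is not None

# ... but 65 characters before the @ fails the second lookahead.
local_65_ok = EMAIL.match("a" * 65 + "@example.com") is not None

print(ordinary_ok, local_64_ok, local_65_ok)  # True True False
```

Because the lookaheads consume nothing, the rest of the pattern still starts matching from the first character of the input.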

Negative Lookahead (?!pattern)

Inverse of (?=pattern). Matches if pattern is missing. For example, a(?=b) matches "a" in "abc" (not consuming "b"), but not in "acc". Whereas a(?!b) matches "a" in "acc", but not in "abc".
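The two lookaheads can be compared side by side in a short sketch, assuming Python's re module; the variable names are illustrative only.

```python
import re

positive = re.compile(r"a(?=b)")  # "a" only if followed by "b"
negative = re.compile(r"a(?!b)")  # "a" only if NOT followed by "b"

pos_abc = positive.search("abc")  # matches "a"; the "b" is not consumed
pos_acc = positive.search("acc")  # no match
neg_acc = negative.search("acc")  # matches "a"
neg_abc = negative.search("abc")  # no match

print(pos_abc.group(), pos_abc.end())  # a 1
print(pos_acc)                         # None
print(neg_acc.group())                 # a
print(neg_abc)                         # None
```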

Positive Lookbehind (?<=pattern)

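The (?<=pattern) is known as positive lookbehind. It matches at a position only if pattern matches immediately before that position, without consuming any characters.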
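A minimal sketch of positive lookbehind, assuming Python's re module (which requires the lookbehind pattern to have a fixed width); the sample string is made up.

```python
import re

text = "cost: $42, qty: 7, tax: $3"

# Match digits only when they are immediately preceded by a dollar sign.
# The "$" itself is not part of the match.
prices = re.findall(r"(?<=\$)\d+", text)

print(prices)  # ['42', '3']
```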

Negative Lookbehind (?<!pattern)

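Inverse of (?<=pattern). It matches at a position only if pattern does not match immediately before that position.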
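A minimal sketch of negative lookbehind, again assuming Python's re module and an invented sample string. The \b is included so that the match cannot start in the middle of a number such as 42.

```python
import re

text = "cost: $42, qty: 7, tax: $3"

# Match whole numbers that are NOT immediately preceded by a dollar sign.
# Without \b, the "2" in "$42" would match, since it is preceded by "4".
non_prices = re.findall(r"(?<!\$)\b\d+", text)

print(non_prices)  # ['7']
```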

Non-Capturing Group (?:pattern)

Recall that you can use Parenthesized Back-References to capture the matches. To disable capturing, use ?: inside the parentheses in the form of (?:pattern). In other words, ?: lets you group a sub-expression without creating an unnecessary capturing group.

Example: [TODO]
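One possible illustration, sketched with Python's re module (an assumed flavor; the URL is a placeholder): the same pattern with and without ?: differs only in how many groups are captured.

```python
import re

url = "https://example.com"

# With ordinary parentheses, the scheme becomes capturing group 1.
capturing = re.match(r"(https?)://(\S+)", url)

# With (?:...), the scheme is still grouped but no longer captured,
# so the host moves up to group 1.
non_capturing = re.match(r"(?:https?)://(\S+)", url)

print(capturing.groups())      # ('https', 'example.com')
print(non_capturing.groups())  # ('example.com',)
```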

Named Capturing Group (?<name>pattern)

The capture group can be referenced later by its name instead of its number.
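A minimal sketch of named groups, assuming Python's re module. Note that Python spells the construct (?P<name>pattern) rather than (?<name>pattern); the group names year and month are hypothetical.

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")

m = DATE.match("2024-05")
year = m.group("year")    # reference the group by name
month = m.group("month")

# In a substitution string, Python refers to a named group as \g<name>.
reordered = DATE.sub(r"\g<month>/\g<year>", "2024-05")

print(year, month)  # 2024 05
print(reordered)    # 05/2024
```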

Atomic Grouping (?>pattern)

Disables backtracking into the group once it has matched, even if this causes the overall match to fail.

Conditional (?(Cond)then|else)

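The construct tests the condition Cond, then tries to match the then branch if the condition holds, or the else branch otherwise. In many flavors, Cond is the number or name of a capturing group, and it holds if that group took part in the match.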
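A minimal sketch of a conditional, assuming Python's re module, where the condition must be a group number or name. The pattern and inputs are invented: a closing parenthesis is required only if an opening one was matched, and the else branch is left empty.

```python
import re

# Group 1 optionally matches "(". The conditional (?(1)\)) then requires ")"
# only when group 1 took part in the match.
NUMBER = re.compile(r"(\()?\d+(?(1)\))")

bare = NUMBER.fullmatch("42") is not None          # True
wrapped = NUMBER.fullmatch("(42)") is not None     # True
open_only = NUMBER.fullmatch("(42") is not None    # False
close_only = NUMBER.fullmatch("42)") is not None   # False

print(bare, wrapped, open_only, close_only)  # True True False False
```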

Unicode

The metacharacters \w, \W (word and non-word character), \b, \B (word and non-word boundary) recognize Unicode characters.
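How this behaves depends on the flavor and its flags. As a sketch under the assumption of Python 3's re module, \w is Unicode-aware for str patterns by default and can be restricted to ASCII with the re.ASCII flag; the sample text is made up.

```python
import re

text = "naïve café"

# Default for str patterns in Python 3: \w matches Unicode word characters.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['naïve', 'café']
print(ascii_words)    # ['na', 've', 'caf']
```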

What is PHP regex?

Regex is short for Regular Expression, a method for finding a pattern in a string. In PHP, the most commonly used flavor is PCRE, or "Perl Compatible Regular Expressions".

What is a regex pattern?

A regex is a piece of text written as a search pattern; it is widely used for matching, searching, and manipulating text.

What is JavaScript regex?

A Regular Expression (regex) is a pattern used as the criterion for matching combinations of characters in a string.

What does regexp stand for?

The abbreviations regex and regexp both stand for Regular Expression, a concept used in theoretical computer science, programming, software development, word processing, and search engine optimization.
